I'm trying to design a CNN in Keras to classify small images of emojis in other images. Below is an example of one of the 13 classes. All images are the same size and all the emojis are of the same size as well. I would think that one should rather easily be able to achieve VERY high accuracy when classifying, as emojis from one class is exactly the same! My intuition told me that if an emoji is 50x50 I could create a convolutional layer of the same size to match one type of emoji. My supervisor did not think that was feasible however. Anyway, my problem is that, no matter how I design my model, I always get the same validation accuracy for each epoch, which corresponds to 1/13 (or simply guessing that each emoji belongs to the same class).
My model looks like this:
model = Sequential()
model.add(Conv2D(16, kernel_size=3, activation="relu", input_shape=IMG_SIZE))
model.add(Dropout(0.5))
model.add(Conv2D(32, kernel_size=3, activation="relu"))
model.add(Conv2D(64, kernel_size=3, activation="relu"))
model.add(Conv2D(128, kernel_size=3, activation="relu"))
#model.add(Conv2D(256, kernel_size=3, activation="relu"))
model.add(Dropout(0.5))
model.add(Flatten())
#model.add(Dense(256, activation="relu"))
model.add(Dense(128, activation="relu"))
model.add(Dense(64, activation="relu"))
model.add(Dense(NUM_CLASSES, activation='softmax', name="Output"))
And I train it like this:
# ------------------ Compile and train ---------------
sgd = optimizers.SGD(lr=0.001, decay=1e-6, momentum=0.9, nesterov=True)
rms = optimizers.RMSprop(lr=0.004, rho=0.9, epsilon=None, decay=0.0)
model.compile(optimizer=sgd, loss='categorical_crossentropy', metrics=["accuracy"]) # TODO Read more about this
train_hist = model.fit_generator(
train_generator,
steps_per_epoch=train_generator.n // BATCH_SIZE,
validation_steps=validation_generator.n // BATCH_SIZE, # TODO que?
epochs=EPOCHS,
validation_data=validation_generator,
#callbacks=[EarlyStopping(patience=3, restore_best_weights=True)]
)
Even with this model, that has over 200 million parameters, I get exactly 0.0773 in validation accuracy for each epoch:
Epoch 1/10
56/56 [==============================] - 21s 379ms/step - loss: 14.9091 - acc: 0.0737 - val_loss: 14.8719 - val_acc: 0.0773
Epoch 2/10
56/56 [==============================] - 6s 108ms/step - loss: 14.9308 - acc: 0.0737 - val_loss: 14.8719 - val_acc: 0.0773
Epoch 3/10
56/56 [==============================] - 6s 108ms/step - loss: 14.7869 - acc: 0.0826 - val_loss: 14.8719 - val_acc: 0.0773
Epoch 4/10
56/56 [==============================] - 6s 108ms/step - loss: 14.8948 - acc: 0.0759 - val_loss: 14.8719 - val_acc: 0.0773
Epoch 5/10
56/56 [==============================] - 6s 109ms/step - loss: 14.8897 - acc: 0.0762 - val_loss: 14.8719 - val_acc: 0.0773
Epoch 6/10
56/56 [==============================] - 6s 109ms/step - loss: 14.8178 - acc: 0.0807 - val_loss: 14.8719 - val_acc: 0.0773
Epoch 7/10
56/56 [==============================] - 6s 108ms/step - loss: 15.0747 - acc: 0.0647 - val_loss: 14.8719 - val_acc: 0.0773
Epoch 8/10
56/56 [==============================] - 6s 108ms/step - loss: 14.7509 - acc: 0.0848 - val_loss: 14.8719 - val_acc: 0.0773
Epoch 9/10
56/56 [==============================] - 6s 108ms/step - loss: 14.8948 - acc: 0.0759 - val_loss: 14.8719 - val_acc: 0.0773
Epoch 10/10
56/56 [==============================] - 6s 108ms/step - loss: 14.8228 - acc: 0.0804 - val_loss: 14.8719 - val_acc: 0.0773
Because its not learning anything, I'm starting to think that it's not my models fault, but maybe the dataset or how I train it. I have tried training with "adam" as well but get the same result. I try to change the input size of the images but still, same result. Below is a sample from my dataset. Do you guys have any ideas what could be wrong?
I think the main issue currently is that your model has way too many parameters relative to how few samples you have for training. For image classification nowadays, you generally want to just have conv layers, a GlobalSomethingPooling layer, and then a single Dense layer for your outputs. You just need to make sure that your conv section ends up with a large enough receptive field to be able to find all the features you need.
The first thing to think about is making sure that you have a large enough receptive field (further reading about that here). There are three main ways to achieve that: pooling, stride >= 2, and/or dilation >= 2. Because you have only 13 "features" that you want to identify, and all of them will always be pixel-perfect, I'm thinking dilation will be the way to go so that the model can easily "overfit" on those 13 "features". If we use 4 conv layers with dilations of 1, 2, 4, and 8, respectively, then we'll end up with a receptive field of 31. This should be enough to easily recognize 50-pixel emoji.
Next, how many filters should each layer have? Normally, you start with a few filters, and increase as you go through the model, as you are doing here. However, we want to "overfit" on specific features, so we should probably increase the amounts in the earlier layers. Just to make it easy, let's give all layers 64 filters.
Last, how do we convert this all to a single prediction? Rather than using a dense layer, which would use a ton of parameters and would not be translation invariant, people nowadays use GlobalAveragePooling or GlobalMaxPooling. GlobalAveragePooling is more common, because it's good for helping to find combinations of many features. However, we just want to find 13 or so exact features here, so GlobalMaxPooling might work even better. Then a single dense layer after that will be enough to get a prediction. Because we're using GlobalMaxPooling, it doesn't need to be flattened first—global pooling already does that for us.
Resulting model:
Conv2D(64, kernel_size=3, activation="relu", input_shape=IMG_SIZE)
Conv2D(64, kernel_size=3, dilation_rate=2, activation="relu")
Conv2D(64, kernel_size=3, dilation_rate=4, activation="relu")
Conv2D(64, kernel_size=3, dilation_rate=8, activation="relu")
GlobalMaxPooling()
Dense(NUM_CLASSES, activation='softmax', name="Output")
Try that. You'll probably also want to add BatchNormalization layers in there after every layers except the last. Once you get decent training accuracy, check if it's overfitting. If it is (which is likely), try these steps:
batch norm if you haven't already
weight decay on all conv and dense layers
SpatialDropout2D after the conv layers and regular Dropout after the pooling layer
augment your data with an ImageDataGenerator
Designing nets like this is almost more of an art than a science, so change stuff around if you think you should. Eventually, you will get an intuition for what is more or less likely to work in any given situation.
Related
I have a LSTM model that predicts tomorrow's water outflow volume based on today's outflow volume, temperature, and precipitation.
model = Sequential()
model.add(LSTM(units=24, return_sequences=True,
input_shape=(X_Train.shape[1],X_Train.shape[2])))
model.add(Dropout(0.2))
model.add(LSTM(units=50))
model.add(Dropout(0.2))
model.add(Dense(20, activation='relu'))
model.add(Dense(1, activation='linear'))
model.compile(optimizer = 'adam', loss = 'mean_squared_error')
history = model.fit(X_Train, Y_Train, epochs=8,
validation_data=(X_Test, Y_Test))
While training I got:
Epoch 1/8
4638/4638 [==============================] - 78s 17ms/step - loss:
1.9951e-04 - val_loss: 1.5074e-04
Epoch 2/8
4638/4638 [==============================] - 77s 17ms/step - loss:
9.6735e-05 - val_loss: 1.0922e-04
Epoch 3/8
4638/4638 [==============================] - 78s 17ms/step - loss:
6.5202e-05 - val_loss: 5.9079e-05
Epoch 4/8
4638/4638 [==============================] - 77s 17ms/step - loss:
5.1011e-05 - val_loss: 4.9478e-05
Epoch 5/8
4638/4638 [==============================] - 77s 17ms/step - loss:
4.3992e-05 - val_loss: 5.1148e-05
Epoch 6/8
4638/4638 [==============================] - 77s 17ms/step - loss:
3.9901e-05 - val_loss: 4.2351e-05
Epoch 7/8
4638/4638 [==============================] - 74s 16ms/step - loss:
3.6884e-05 - val_loss: 4.0763e-05
Epoch 8/8
4638/4638 [==============================] - 74s 16ms/step - loss:
3.5287e-05 - val_loss: 3.6736e-05
But when I manually calculate the mean-square error, I get a different result
mean_square_root = mean_squared_error(predicted_y_values_unnor, Y_test_actual)
130.755469707972
Manual Calculation:
I wanted to know why they would the validation loss be different while training than while calculating manually. How is the loss calculated while training?
The loss you have chosen is mean_squared_error in the line
model.compile(optimizer = 'adam', loss = 'mean_squared_error')
That is the loss your LSTM model is minimizing.
The Mean Squared Error, or MSE, loss is the default loss to use for regression problems. Mean squared error is calculated as the average of the squared differences between the predicted and actual values. The result is always positive regardless of the sign of the predicted and actual values and a perfect value is 0.0. The squaring means that larger mistakes result in more error than smaller mistakes, meaning that the model is punished for making larger mistakes.
LSTM is a general model and you can choose lots of different loss functions. Here is keras inbuilt available functions list
https://keras.io/api/losses/
You need to select a loss function as per your problem.
I was just having the same exact problem and i solved it with flatten() function.
When you do: "mean_squared_error(predicted_y_values_unnor, Y_test_actual)" the shape of "Y_test_actual" is just a one dimensional array. Example: array([297, 290, 308, 308, 214]) but the other argument is a bidimensional array in wich one of the dimensions is 1. Example: "array([300.53693, 299.9416 , 295.61334, 218.2563 , 219.74983],
dtype=float32)"
you just have to do flatten() on "predicted_y_values_unnor"
I have problem with image classification using Keras. I always got poor accuracy with only 0.02. I tried to follow cat and dog classification which has 0.8 in accuracy but it could not work in my case with 30 classes.
Let say I have datasets with around 100K of images and categorized within 30 classes. I split it with 80% for training and 20% validation.
The structure folder is look like this.
|-train
|---category1
|---category2
|---category3
|---category4
|---.....
|---category30
|
|-validation
|---category1
|---category2
|---category3
|---category4
|---.....
|---category30
Each category in train folder contains around 2000 to 4000 images.
my model
model = tf.keras.Sequential([
Conv2D(kernel_size=3, filters=16, padding='same', activation='relu', input_shape=[150,150, 3]),
Conv2D(kernel_size=3, filters=30, padding='same', activation='relu'),
MaxPooling2D(pool_size=2),
Conv2D(kernel_size=3, filters=60, padding='same', activation='relu'),
MaxPooling2D(pool_size=2),
Conv2D(kernel_size=3, filters=90, padding='same', activation='relu'),
MaxPooling2D(pool_size=2),
Conv2D(kernel_size=3, filters=110, padding='same', activation='relu'),
MaxPooling2D(pool_size=2),
Conv2D(kernel_size=3, filters=130, padding='same', activation='relu'),
Conv2D(kernel_size=1, filters=40, padding='same', activation='relu'),
GlobalAveragePooling2D(),
Dense(1,'sigmoid'),
Activation('softmax')
])
model.compile(optimizer=keras.optimizers.Adam(lr=.00001),
loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
metrics=['accuracy'])
Train the datasets
history = model.fit_generator(
train_generator,
steps_per_epoch=100,
epochs=10,
validation_data=validation_generator,
validation_steps=50,
verbose=2)
And I always got low accuracy like 0.02 or 0.03
Epoch 6/10
100/100 - 172s - loss: -1.8906e+01 - accuracy: 0.0265 - val_loss: -1.8923e+01 - val_accuracy: 0.0270
Epoch 7/10
100/100 - 171s - loss: -1.8773e+01 - accuracy: 0.0230 - val_loss: -1.8396e+01 - val_accuracy: 0.0330
Epoch 8/10
100/100 - 170s - loss: -1.8780e+01 - accuracy: 0.0295 - val_loss: -1.9882e+01 - val_accuracy: 0.0180
Epoch 9/10
100/100 - 170s - loss: -1.8895e+01 - accuracy: 0.0240 - val_loss: -1.8572e+01 - val_accuracy: 0.0210
Epoch 10/10
100/100 - 170s - loss: -1.9091e+01 - accuracy: 0.0265 - val_loss: -1.8685e+01 - val_accuracy: 0.0300
So how can I improve my model? is there something wrong?
You should have as many neurons in your final layer as you have classes. So your final dense layer should be:
Dense(n_classes),
Activation('softmax')
Also, since your task is not binary classification, your loss function should be:
loss=tf.keras.losses.CategoricalCrossentropy()
from_logits=True should only be set to true if you don't have an activation function on your final dense layer (which you have). If you want to keep from_logits=True, remove the softmax activation.
For this loss function, make sure that in your flow_from _directory call, that class_mode='categorical'.
One more thing, your learning rate seems very small. The default learning rate of 0.001 should be fine.
First of all, you should change the following lines:
Dense(1,'sigmoid'),
Activation('softmax')
into:
Dense(number_of_classes,'softmax'),
and down in the model.fit()
loss=tf.keras.losses.CategoricalCrossentropy(from_logits=False)
where number_of_classes is the number of categories in your case (30).
Second of all, since you have a very small number for each of your classes, you should use a pretrained network. A good starting point would be a ResNet50.
I am using the below LeNet architecture to train my image classification model , I have noticed that both train , val accuracy not improving for each iteration . Can any one expertise in this area explain what might have gone wrong ?
training samples - 110 images belonging to 2 classes.
validation - 50 images belonging to 2 classes.
#LeNet
import keras
from keras.models import Sequential
from keras.layers import Conv2D
from keras.layers import MaxPooling2D
from keras.layers import Flatten
from keras.layers import Dense
#import dropout class if needed
from keras.layers import Dropout
from keras import regularizers
model = Sequential()
#Layer 1
#Conv Layer 1
model.add(Conv2D(filters = 6,
kernel_size = 5,
strides = 1,
activation = 'relu',
input_shape = (32,32,3)))
#Pooling layer 1
model.add(MaxPooling2D(pool_size = 2, strides = 2))
#Layer 2
#Conv Layer 2
model.add(Conv2D(filters = 16,
kernel_size = 5,
strides = 1,
activation = 'relu',
input_shape = (14,14,6)))
#Pooling Layer 2
model.add(MaxPooling2D(pool_size = 2, strides = 2))
#Flatten
model.add(Flatten())
#Layer 3
#Fully connected layer 1
model.add(Dense(units=128,activation='relu',kernel_initializer='uniform'
,kernel_regularizer=regularizers.l2(0.01)))
model.add(Dropout(rate=0.2))
#Layer 4
#Fully connected layer 2
model.add(Dense(units=64,activation='relu',kernel_initializer='uniform'
,kernel_regularizer=regularizers.l2(0.01)))
model.add(Dropout(rate=0.2))
#layer 5
#Fully connected layer 3
model.add(Dense(units=64,activation='relu',kernel_initializer='uniform'
,kernel_regularizer=regularizers.l2(0.01)))
model.add(Dropout(rate=0.2))
#layer 6
#Fully connected layer 4
model.add(Dense(units=64,activation='relu',kernel_initializer='uniform'
,kernel_regularizer=regularizers.l2(0.01)))
model.add(Dropout(rate=0.2))
#Layer 7
#Output Layer
model.add(Dense(units = 2, activation = 'softmax'))
model.compile(optimizer = 'adam', loss = 'categorical_crossentropy', metrics = ['accuracy'])
from keras.preprocessing.image import ImageDataGenerator
#Image Augmentation
train_datagen = ImageDataGenerator(
rescale=1./255, #rescaling pixel value bw 0 and 1
shear_range=0.2,
zoom_range=0.2,
horizontal_flip=True)
#Just Feature scaling
test_datagen = ImageDataGenerator(rescale=1./255)
training_set = train_datagen.flow_from_directory(
'/Dataset/Skin_cancer/training',
target_size=(32, 32),
batch_size=32,
class_mode='categorical')
test_set = test_datagen.flow_from_directory(
'/Dataset/Skin_cancer/testing',
target_size=(32, 32),
batch_size=32,
class_mode='categorical')
model.fit_generator(
training_set,
steps_per_epoch=50, #number of input (image)
epochs=25,
validation_data=test_set,
validation_steps=10) # number of training sample
Epoch 1/25
50/50 [==============================] - 52s 1s/step - loss: 0.8568 - accuracy: 0.4963 - val_loss: 0.7004 - val_accuracy: 0.5000
Epoch 2/25
50/50 [==============================] - 50s 1s/step - loss: 0.6940 - accuracy: 0.5000 - val_loss: 0.6932 - val_accuracy: 0.5000
Epoch 3/25
50/50 [==============================] - 48s 967ms/step - loss: 0.6932 - accuracy: 0.5065 - val_loss: 0.6932 - val_accuracy: 0.5000
Epoch 4/25
50/50 [==============================] - 50s 1s/step - loss: 0.6932 - accuracy: 0.4824 - val_loss: 0.6933 - val_accuracy: 0.5000
Epoch 5/25
50/50 [==============================] - 49s 974ms/step - loss: 0.6932 - accuracy: 0.4949 - val_loss: 0.6932 - val_accuracy: 0.5000
Epoch 6/25
50/50 [==============================] - 51s 1s/step - loss: 0.6932 - accuracy: 0.4854 - val_loss: 0.6931 - val_accuracy: 0.5000
Epoch 7/25
50/50 [==============================] - 49s 976ms/step - loss: 0.6931 - accuracy: 0.5015 - val_loss: 0.6918 - val_accuracy: 0.5000
Epoch 8/25
50/50 [==============================] - 51s 1s/step - loss: 0.6932 - accuracy: 0.4986 - val_loss: 0.6932 - val_accuracy: 0.5000
Epoch 9/25
50/50 [==============================] - 49s 973ms/step - loss: 0.6932 - accuracy: 0.5000 - val_loss: 0.6929 - val_accuracy: 0.5000
Epoch 10/25
50/50 [==============================] - 50s 1s/step - loss: 0.6931 - accuracy: 0.5044 - val_loss: 0.6932 - val_accuracy: 0.5000
Epoch 11/25
50/50 [==============================] - 49s 976ms/step - loss: 0.6931 - accuracy: 0.5022 - val_loss: 0.6932 - val_accuracy: 0.5000
Epoch 12/25
Most importantly is that you are using loss = 'categorical_crossentropy', change it to loss = 'binary_crossentropy' as you have just 2 classes. And also change class_mode='categorical' to class_mode='binary' in flow_from_directory.
As #desertnaut rightly mentioned, categorical_crossentropy goes hand in hand with softmax activation in the last layer, and if you change the loss to binary_crossentropy the last activation should also be changed to sigmoid.
Other Improvements:
You have very limited data (160 images) and you have used almost 50% of data as validation data.
As you are building the model for image classification, you just have two Conv2D Layer and 4 dense Layer. The Dense layers are adding huge amount of weights to be learnt. Add few more conv2d layer and reduce the Dense layer.
Set batch_size = 1 and remove steps_per_epoch. As you have very less input let every epoch have same number of steps as input records.
Use the default glorot_uniform kernel initializer.
To further tune your model, build model using multiple Conv2D layer, followed by GlobalAveragePooling2D layer and FC Layer and final softmax layer.
Use Data Augmentation technique like horizontal_flip, vertical_flip, shear_range, zoom_range of ImageDataGenerator to increase the number of training and validation images.
Moving the comments to answer section as suggested by #desertnaut -
Question - Thanks ! Yes , less data is the problem I figured . One additional
question - why is that adding more dense layer than conv layer
negatively affecting the model, is there any rule to follow when we
decide how many conv and dense layer we gonna use ? –
Arun_Ramji_Shanmugam 2 days ago
Answer - To answer the first part of your question, Conv2D layer maintains the
spatial information of the image and weights to be learnt depend on
the kernel size and stride mentioned in the layer,where as the Dense
layer needs the output of Conv2D to be flattened and used further
hence losing the spatial information. Also dense layer adds more
number of weights, for example 2 dense layers of 512 adds
(512*512)=262144 params or weights to the model(has to be learnt by
the model).That means you have to train for more number of epochs and
with good hype parameters settings for learning of these weights. –
Tensorflow Warriors 2 days ago
Answer - To answer the second part of your question,use systematic experiments
to discover what works best for your specific dataset. Also it depends
on processing power you hold. Remember, deeper networks is always
better, at the cost of more data and increased complexity of learning.
A conventional approach is to look for similar problems and deep
learning architectures which have already been shown to work. Also we
have the flexibility to utilize the pretrained models like resnet, vgg
etc, use these models by freezing the part of the layers and training
on remaining layers. – Tensorflow Warriors 2 days ago
Question - Thank you for detailed answer !! If you don't bother one more question
- so when we are using already trained model (may be some layers) , isn't it required to be trained on same input data as the one we gonna
work ? – Arun_Ramji_Shanmugam yesterday
Answer - The intuition behind transfer learning for image classification is
that if a model is trained on a large and general enough dataset, this
model will effectively serve as a generic model of the visual world.
You can find transfer learning example with explanation here -
tensorflow.org/tutorials/images/transfer_learning . – Tensorflow
Warriors yesterday
Remove all kernel_initializer='uniform' arguments from your layers; don't specify anything here, the default initializer glorot_uniform is the highly recommended one (and the uniform is a particularly bad one).
As a general rule, keep in mind that the default values for such rather advanced settings are there for your convenience, they are implicitly recommended, and you should better not mess with them unless you have specific reasons to do so and you know exactly what you are doing.
For the kernel_initializer argument in particular, I have started believing that it has caused a lot of unnecessary pain to people (just see here for the most recent example).
Also, dropout should not be used by default, especially in cases like here where the model seems to struggle to learn anything; start without any dropout (comment out the respective layers), and only add it back if you see signs of overfitting.
I'm creating a very simple 2 layer feed forward network but am finding that the loss is not updating at all. I have some ideas but I wanted to get additional feedback/guidance.
Details about the data:
X_train:
(336876, 158)
X_dev:
(42109, 158)
Y_train counts:
0 285793
1 51083
Name: default, dtype: int64
Y_dev counts:
0 35724
1 6385
Name: default, dtype: int64
And here is my model architecture:
# define the architecture of the network
model = Sequential()
model.add(Dense(50, input_dim=X_train.shape[1], init="uniform", activation="relu"))
model.add(Dense(3print("[INFO] compiling model...")
adam = Adam(lr=0.01)
model.compile(loss="binary_crossentropy", optimizer=adam,
metrics=['accuracy'])
model.fit(np.array(X_train), np.array(Y_train), epochs=12, batch_size=128, verbose=1)Dense(1, activation = 'sigmoid'))
Now, with this, my loss after the first few epochs are as follows:
Epoch 1/12
336876/336876 [==============================] - 8s - loss: 2.4441 - acc: 0.8484
Epoch 2/12
336876/336876 [==============================] - 7s - loss: 2.4441 - acc: 0.8484
Epoch 3/12
336876/336876 [==============================] - 6s - loss: 2.4441 - acc: 0.8484
Epoch 4/12
336876/336876 [==============================] - 7s - loss: 2.4441 - acc: 0.8484
Epoch 5/12
336876/336876 [==============================] - 7s - loss: 2.4441 - acc: 0.8484
Epoch 6/12
336876/336876 [==============================] - 7s - loss: 2.4441 - acc: 0.8484
Epoch 7/12
336876/336876 [==============================] - 7s - loss: 2.4441 - acc: 0.8484
Epoch 8/12
336876/336876 [==============================] - 6s - loss: 2.4441 - acc: 0.8484
Epoch 9/12
336876/336876 [==============================] - 6s - loss: 2.4441 - acc: 0.8484
And when I test the model after, my f1_score is 0. My main thought was that I may need more data but I'd still expect it to perform better than it is now on the test set. Could it be that it is overfitting? I added Dropout but no luck there either.
Any help would be much appreciated.
at first glance, I believe that your learning rate is too high. Also, please consider normalizing your data especially if different features have different ranges of values (look at Scaling). Also, please consider changing your layer activations depending on whether your labels are multi-class or not. Assuming your code is of this form (you seem to have some typos in problem description):
# define the architecture of the network
model = Sequential()
#also what is the init="uniform" argument? I did not find this in keras documentation, consider removing this.
model.add(Dense(50, input_dim=X_train.shape[1], init="uniform",
activation="relu"))
model.add(Dense(1, activation = 'sigmoid')))
#a slightly more conservative learning rate, play around with this.
adam = Adam(lr=0.0001)
model.compile(loss="binary_crossentropy", optimizer=adam,
metrics=['accuracy'])
model.fit(np.array(X_train), np.array(Y_train), epochs=12, batch_size=128,
verbose=1)
This should lead the loss to converge. If not, please consider deepening your neural net (think about how many parameters you may need).
Consider adding the classification layer before compiling your model.
model.add(Dense(1, activation = 'sigmoid'))
adam = Adam(lr=0.01)
model.compile(loss="binary_crossentropy", optimizer=adam,
metrics=['accuracy'])
model.fit(np.array(X_train), np.array(Y_train), epochs=12, batch_size=128, verbose=1)
I have a dataset on which I train a DNN model.
My dataset is like this (single example row):
hash name surname datetime total nb_success nb_loss
axBBdnn78 aaa bbb 2016-01-01 00:01:26 50.00 1 2
I replaced the header names (confidentiality), but they are all important. The value I try to predict is the state column (present on the dataset), which can take 2 values: 'ok' and 'nok'.
By doing some preparation on the dataset and one-encoding the string with this code:
data = data.select_dtypes(exclude=['number']).apply(LabelEncoder().fit_transform).join(data.select_dtypes(include=['number']))
I then got my final training set, which looks like this:
hash name surname datetime total nb_success nb_loss
1696 4 37 01 50.00 1 2
I then use the following Keras DNN model:
model = Sequential()
model.add(Dense(16, input_shape=(7,)))
model.add(Activation('relu'))
# model.add(Dropout(0.5))
model.add(Activation('relu'))
model.add(Dense(1)) # 2 outputs possible (ok or nok)
model.add(Activation('relu'))
model.compile(loss='mse', optimizer='sgd', metrics=['accuracy'])
But, neither my loss decreases, or my accuracy increases.
5000/5000 [==============================] - 0s - loss: 0.5070 - acc: 0.4930 - val_loss: 0.4900 - val_acc: 0.5100
Epoch 2/10
5000/5000 [==============================] - 0s - loss: 0.5054 - acc: 0.4946 - val_loss: 0.5100 - val_acc: 0.4900
Epoch 3/10
5000/5000 [==============================] - 0s - loss: 0.5112 - acc: 0.4888 - val_loss: 0.4140 - val_acc: 0.5860
Epoch 4/10
5000/5000 [==============================] - 0s - loss: 0.4900 - acc: 0.5100 - val_loss: 0.4660 - val_acc: 0.5340
I tried several other loss functions as well as other optimizers, but each time I only achieve roughly 50% accuracy (so, nothing since the output is 2 classes).
I have two questions:
Is my one-encoding method correct?
Which part do I miss for the model to actually train/converge?
You have label-encoded your data but not one-hot encoded it. For every label you have in each of your features you need to create one binary label.
E.g.: If you have 4 names than you need 3 binary labels that represent the 4 possible states.
You can use Scikit-learn´s OneHotEncoder: http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OneHotEncoder.html
One of the reasons for this is that right now name #37 is "more" or "higher" than name #36 or #10 or #5 which doesn´t make any sense (since these are categorical values rather than continuous ones)
On your last Dense layer you need to use sigmoid as the activation function.
One more thing: You might need to increase your models size. And you have two lines of model.add(Activation('relu')) in your code.