I am training a Convolutional Neural Network to recognize MRZ (Machine Readable Zone) characters on a smartphone. To improve accuracy, should I train it with multiple fonts, even though the MRZ only uses OCR-B? Also, the model does not reach the same accuracy on the device as in the Python code I use to train/test it. Any ideas?
This is the architecture I'm using:
from keras.models import Sequential
from keras.layers import Convolution2D, MaxPooling2D, Dropout, Flatten, Dense

model = Sequential()
# First conv block: stride 2 plus pooling quickly shrinks the feature maps
model.add(Convolution2D(filters=32, kernel_size=(3, 3), strides=(2, 2), activation='relu', input_shape=input_shape))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.5))
# 1x1 convolutions mix channels without widening the receptive field
model.add(Convolution2D(filters=64, kernel_size=(1, 1), strides=(1, 1), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.2))
# Classifier head
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dense(num_classes, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
If the MRZ uses only one font, then you should train your CNN on that font alone.
To improve results, preprocess the image before passing it to the CNN: for example, first locate the text zones in the image and then pass only those through the CNN.
The model's accuracy can also differ from one device to another because of the processing unit architecture; for example, a CPU and a GPU can produce slightly different results due to floating-point numerical behavior.
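As an illustration of the preprocessing step, here is a minimal sketch with OpenCV (assuming OpenCV 4.x; the filename and the size-filter thresholds are placeholders, not values from the question):

import cv2

# Load the document photo in grayscale and binarize it (Otsu picks the threshold).
img = cv2.imread('passport.png', cv2.IMREAD_GRAYSCALE)
_, thresh = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

# Find connected components and crop each candidate character box.
contours, _ = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
for c in contours:
    x, y, w, h = cv2.boundingRect(c)
    if w > 5 and h > 10:  # crude size filter to drop speckle noise
        char_crop = img[y:y + h, x:x + w]
        # resize char_crop to the CNN's input_shape and classify it

Cropping characters this way also keeps the on-device input closer to the training distribution.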
I wrote a very simple CNN that trains on spectrogram images, but the accuracy is only about 0.3-0.4. What else should I add or change to improve the accuracy?
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Activation, Dropout, Flatten, Dense

model = Sequential()
model.add(Conv2D(32, (3, 3), input_shape=X_train.shape[1:], padding='same'))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
# model.add(Dropout(0.25))
model.add(Conv2D(64, (3, 3), padding='same'))
model.add(Activation('relu'))
model.add(Conv2D(128, (3, 3), padding='same'))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
# model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(64))
model.add(Dense(32))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(14))
model.add(Activation('softmax'))
With only the information you provide, it is hard to pin down the problem. The definition of your model looks correct (though note that the first dense layer, Dense(64), has no activation function; check whether that is intentional). So here are some considerations:
Do you train long enough? Your model is quite big, so it needs both a long time to converge and a large dataset to train on.
Is your dataset large enough, and does it contain enough variance? If your dataset doesn't represent your problem well, you can't train a good model.
Look at the loss curves of both your training and validation sets. Are you overfitting or underfitting?
Do you correctly normalize and preprocess your dataset? Try scaling the image values to a range of -1 to 1 or 0 to 1 with a float datatype (see the sketch after this list).
Is your dataset balanced? Since you are softmaxing over 14 classes, you need a balanced dataset in order to train every single class.
Hope this helps a little. If you need further help, please provide a detailed description of your problem and of your whole process.
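A minimal sketch of the normalization and loss-curve points above (assuming the X_train/y_train arrays and the model from the question; the epoch count and validation split are placeholders):

import matplotlib.pyplot as plt

# Scale 8-bit pixel values into [0, 1] as float32 before training.
X_train = X_train.astype('float32') / 255.0

# Keep the History object so the loss curves can be inspected afterwards.
history = model.fit(X_train, y_train, validation_split=0.2, epochs=50, batch_size=32)

# Plot training vs. validation loss to diagnose over-/underfitting:
# diverging curves suggest overfitting, two flat high curves suggest underfitting.
plt.plot(history.history['loss'], label='train loss')
plt.plot(history.history['val_loss'], label='val loss')
plt.legend()
plt.show()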
So I train my model on a dataset, and for each epoch I can see the loss and val_loss go down (note that val_loss only goes down to a certain point and then stagnates, with some minor ups and downs) and the accuracy go up, but for some reason the val_accuracy stays at roughly 0.33.
From what I have read, this seems to be an overfitting problem, so I added Dropout layers and l2 regularization on some layers of the model, but it seems to have no effect. So I would like to ask what you think I could improve in my model so that val_loss keeps going down and val_accuracy stops stagnating and keeps going up.
I've tried using more images, but the problem seems to stay the same. I'm not sure whether my increase in images was big enough, though.
Should I add Dropout layers between the Conv2D layers?
Should I use less or more l2 regularization?
Should I use even more images?
These are just some questions that might have something to do with my problem.
My model is below:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Conv2D, MaxPooling2D, BatchNormalization,
                                     Flatten, Dense, Dropout, Reshape)

model = Sequential()
model.add(Conv2D(16, kernel_size=(3, 3), input_shape=(580, 360, 1), padding='same', activation='relu'))
model.add(BatchNormalization())
model.add(Conv2D(16, kernel_size=(3, 3), activation='relu'))
model.add(BatchNormalization())
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(32, kernel_size=(3, 3), activation='relu', kernel_regularizer=tf.keras.regularizers.l2(0.05)))
model.add(BatchNormalization())
model.add(Conv2D(32, kernel_size=(3, 3), activation='relu', kernel_regularizer=tf.keras.regularizers.l2(0.05)))
model.add(BatchNormalization())
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(64, kernel_size=(3, 3), activation='relu', kernel_regularizer=tf.keras.regularizers.l2(0.01)))
model.add(BatchNormalization())
model.add(Conv2D(64, kernel_size=(3, 3), activation='relu', kernel_regularizer=tf.keras.regularizers.l2(0.01)))
model.add(BatchNormalization())
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(128, kernel_size=(3, 3), activation='relu', kernel_regularizer=tf.keras.regularizers.l2(0.02)))
model.add(BatchNormalization())
model.add(Conv2D(128, kernel_size=(3, 3), activation='relu', kernel_regularizer=tf.keras.regularizers.l2(0.02)))
model.add(BatchNormalization())
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(64, kernel_size=(3, 3), activation='relu', kernel_regularizer=tf.keras.regularizers.l2(0.01)))
model.add(BatchNormalization())
model.add(Conv2D(64, kernel_size=(3, 3), activation='relu', kernel_regularizer=tf.keras.regularizers.l2(0.05)))
model.add(BatchNormalization())
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten()) # Flattening the 2D arrays for fully connected layers
model.add(Dense(532, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(266, activation='softmax'))
model.add(Reshape((7, 38)))
print(model.summary())
optimizer = keras.optimizers.SGD(lr=0.00001)
# Note: the SGD instance above is never passed to compile(); the string 'SGD'
# below makes Keras use a default-configured SGD optimizer instead.
model.compile(optimizer='SGD', loss='categorical_crossentropy', metrics=['accuracy'])
Thanks in advance!
PS: Here is the graph of training (plot not reproduced here).
PS2: Here is the end of training:
Epoch 40/40
209/209 [==============================] - 68s 327ms/step - loss: 0.7421 - accuracy: 0.9160 - val_loss: 3.8159 - val_accuracy: 0.3152
This looks like a classic overfitting problem.
It would be nice to have a more detailed introduction to the problem: is it a classification task? Are your images grayscale? What is the purpose of this network?
With the information given, I would say that any proper regularization of the network should help. Some things you could try (see the sketch after this list):
For conv layers, I recommend SpatialDropout2D layers.
Get more data (if possible).
Use data augmentation (if possible).
Increase the rate of the dropout layers.
Try reducing the complexity of your model architecture (maybe fewer layers, fewer filters in general, fewer neurons in the dense layers, etc.).
Hope this helps!
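A minimal sketch of the first three suggestions (assuming tf.keras and the X_train/y_train arrays from the question; the dropout rate and augmentation ranges are placeholder assumptions):

from tensorflow.keras.layers import SpatialDropout2D
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# SpatialDropout2D drops whole feature maps instead of single activations,
# which regularizes convolutional features more effectively. Insert it after
# a conv block, e.g.:
#   model.add(Conv2D(32, kernel_size=(3, 3), activation='relu'))
#   model.add(SpatialDropout2D(0.2))

# Light augmentation that keeps the labels valid for grayscale inputs.
datagen = ImageDataGenerator(width_shift_range=0.1,
                             height_shift_range=0.1,
                             zoom_range=0.1)
model.fit(datagen.flow(X_train, y_train, batch_size=32), epochs=40)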
Just a hint:
You have a problem with your CNN architecture: the number of filters should get smaller and smaller with each convolution, but in your case it is growing; you have 16, 32, 64, 128. You should do this the other way around. Starting from input_shape=(580, 360), you might go, let us say, to 256, 128, 64, 32 filters for the Conv2D layers.
Let's say I have a binary classification task and I'm using a CNN. Simply visualizing the CNN isn't very helpful, as the input isn't images. However, I would like to know which particular filters contribute the most to an input sample being assigned to a particular class.
Given the following architecture (implemented using Keras), how do I achieve this?
import keras
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten, Dense

model = Sequential()
model.add(Conv2D(32, kernel_size=(10, 3),
                 activation='relu',
                 input_shape=input_shape))
model.add(Conv2D(64, (10, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(2, activation='sigmoid'))
model.compile(loss='binary_crossentropy',
              optimizer=keras.optimizers.Adadelta(),
              metrics=['accuracy'])
I explored resources A and B, but neither seems helpful for what I want to do. If there are other suggestions for understanding what a network learns from non-image datasets, that would be really helpful.
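One common starting point (a sketch, not a confirmed answer from this thread; it assumes the model above has been trained and that sample is a single preprocessed input array) is to read off the per-filter activation magnitudes for a given sample:

import numpy as np
from keras.models import Model

# Build a sub-model that exposes the output of the second conv layer.
conv_layer = model.layers[1]  # the Conv2D(64, (10, 3)) layer
activation_model = Model(inputs=model.input, outputs=conv_layer.output)

# Average each filter's activation map; large values mark filters that
# respond strongly to this particular sample.
acts = activation_model.predict(sample[np.newaxis, ...])  # shape (1, H, W, 64)
per_filter = acts.mean(axis=(0, 1, 2))
print('most active filters:', per_filter.argsort()[::-1][:5])

Comparing these rankings between samples of the two classes gives a rough picture of which filters drive each prediction.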
I want to add some extra information to a CNN, such as gender, age, or a feature vector.
My CNN takes as input matrices that represent voice histograms with dimensions 125x64. Since they come from different persons, I would like to add that information to the model. Besides, I would like to add a 125x1 vector that represents the pitch or the energy of the voice (obtained from feature extraction), but I think it is not a good idea to attach it to the histogram.
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout, Activation

model = Sequential()
model.add(Conv2D(32, (3, 3), padding='valid', strides=1,
                 input_shape=input_shape, activation='relu'))
model.add(MaxPooling2D(pool_size=(4, 3), strides=(1, 3)))
model.add(Conv2D(32, (1, 3), padding='valid', strides=1, activation='relu'))
model.add(MaxPooling2D(pool_size=(1, 3), strides=(1, 3)))
model.add(Flatten())
model.add(Dense(512, activation='relu'))
model.add(Dense(512, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(nb_classes))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy',
              optimizer='adadelta',
              metrics=['accuracy'])
It indeed doesn't make much sense to attach that data to the histogram. The Keras documentation explains how to use multiple inputs in a model: https://keras.io/getting-started/functional-api-guide/. The paragraph 'Multi-input and multi-output models' seems to be what you're looking for.
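A minimal sketch of that pattern for this case (the auxiliary input size of 127 — 125 pitch values plus gender and age — is an assumption for illustration):

from keras.models import Model
from keras.layers import (Input, Conv2D, MaxPooling2D, Flatten,
                          Dense, Dropout, Concatenate)

# Main branch: the 125x64 voice histogram.
histogram_in = Input(shape=(125, 64, 1))
x = Conv2D(32, (3, 3), activation='relu')(histogram_in)
x = MaxPooling2D(pool_size=(4, 3), strides=(1, 3))(x)
x = Flatten()(x)

# Auxiliary branch: per-speaker features (gender, age, pitch vector, ...).
extra_in = Input(shape=(127,))

# Merge both branches before the classifier head.
merged = Concatenate()([x, extra_in])
merged = Dense(512, activation='relu')(merged)
merged = Dropout(0.5)(merged)
out = Dense(nb_classes, activation='softmax')(merged)

model = Model(inputs=[histogram_in, extra_in], outputs=out)
model.compile(loss='categorical_crossentropy', optimizer='adadelta',
              metrics=['accuracy'])
# model.fit([histograms, extra_features], labels, ...)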
I'm trying to build a CNN to play an online game, this one to be precise:
https://www.gameeapp.com/game-bot/ibBTDViUP
I've collected images and labels for each image. These labels tell the network to press SPACE (output 1) or do nothing (output 0).
I'm training the network using Keras, like this:
history = model.fit_generator(
    train_generator,
    steps_per_epoch=2000 // batch_size,
    epochs=3,
    validation_data=validation_generator,
    validation_steps=800 // batch_size)
The network looks like this:
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Activation, Dropout, Flatten, Dense

model = Sequential()
model.add(Conv2D(32, (3, 3), input_shape=(275, 208, 1)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(64, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(64, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(64))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(1))
model.add(Activation('sigmoid'))
model.compile(loss='binary_crossentropy',
optimizer='rmsprop',
metrics=['accuracy'])
The thing is, most of the time the network ends up always outputting 1 or always outputting 0, even when the images are completely unrelated to the game images.
Am I modelling this problem the right way?
What is the best way to get the network to learn when to "not do" anything?
Please let me know if the question isn't clear, and thanks in advance!
You want to do binary image classification (binary: is / is not), and I think your net looks good. In "Binary Image Classification with CNN - best practices for choosing 'negative' dataset?" there are general hints for training binary image classification networks. https://medium.com/@kylepob61392/airplane-image-classification-using-a-keras-cnn-22be506fdb53 is a complete guide to setting up an image classification network in Keras. I am not sure about the training; possibly use plain model.fit() as in that guide.
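A common cause of a network always predicting the same class is class imbalance: "do nothing" frames usually far outnumber "press SPACE" frames. A minimal sketch of weighting the loss to compensate (an addition, not from the answer above; it assumes in-memory X_train/y_train/X_val/y_val arrays rather than the generators, and scikit-learn as an extra dependency):

import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Weight each class inversely to its frequency so the network cannot win
# by always predicting the majority class.
classes = np.unique(y_train)
weights = compute_class_weight(class_weight='balanced', classes=classes, y=y_train)
class_weight = dict(zip(classes, weights))

model.fit(X_train, y_train,
          batch_size=32,
          epochs=3,
          validation_data=(X_val, y_val),
          class_weight=class_weight)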