Scikit-learn SVM digit recognition - python

I want to make a program to recognize the digit in an image. I follow the tutorial in scikit learn .
I can train and fit the svm classifier like the following.
First, I import the libraries and dataset
from sklearn import datasets, svm, metrics
digits = datasets.load_digits()
n_samples = len(digits.images)
data = digits.images.reshape((n_samples, -1))
Second, I create the SVM model and train it with the dataset.
classifier = svm.SVC(gamma = 0.001)
classifier.fit(data[:n_samples], digits.target[:n_samples])
And then, I try to read my own image and use the function predict() to recognize the digit.
Here is my image:
I reshape the image into (8, 8) and then convert it to a 1D array.
img = misc.imread("w1.jpg")
img = misc.imresize(img, (8, 8))
img = img[:, :, 0]
Finally, when I print out the prediction, it returns [1]
predicted = classifier.predict(img.reshape((1,img.shape[0]*img.shape[1] )))
print predicted
Whatever I user others images, it still returns [1]
When I print out the "default" dataset of number "9", it looks like:
My image number "9" :
You can see the non-zero number is quite large for my image.
I dont know why. I am looking for help to solve my problem. Thanks

My best bet would be that there is a problem with your data types and array shapes.
It looks like you are training on numpy arrays that are of the type np.float64 (or possibly np.float32 on 32 bit systems, I don't remember) and where each image has the shape (64,).
Meanwhile your input image for prediction, after the resizing operation in your code, is of type uint8 and shape (1, 64).
I would first try changing the shape of your input image since dtype conversions often just work as you would expect. So change this line:
predicted = classifier.predict(img.reshape((1,img.shape[0]*img.shape[1] )))
to this:
predicted = classifier.predict(img.reshape(img.shape[0]*img.shape[1]))
If that doesn't fix it, you can always try recasting the data type as well with
img = img.astype(digits.images.dtype).
I hope that helps. Debugging by proxy is a lot harder than actually sitting in front of your computer :)
Edit: According to the SciPy documentation, the training data contains integer values from 0 to 16. The values in your input image should be scaled to fit the same interval. (http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_digits.html#sklearn.datasets.load_digits)

1) You need to create your own training set - based on data similar to what you will be making predictions. The call to datasets.load_digits() in scikit-learn is loading a preprocessed version of the MNIST Digits dataset, which, for all we know, could have very different images to the ones that you are trying to recognise.
2) You need to set the parameters of your classifier properly. The call to svm.SVC(gamma = 0.001) is just choosing an arbitrary value of the gamma parameter in SVC, which may not be the best option. In addition, you are not configuring the C parameter - which is pretty important for SVMs. I'd bet that this is one of the reasons why your output is 'always 1'.
3) Whatever final settings you choose for your model, you'll need to use a cross-validation scheme to ensure that the algorithm is effectively learning
There's a lot of Machine Learning theory behind this, but, as a good start, I would really recommend to have a look at SVM - scikit-learn for a more in-depth description of how the SVC implementation in sickit-learn works, and GridSearchCV for a simple technique for parameter setting.

It's just a guess but... The Training set from Sk-Learn are black numbers on a white background. And you are trying to predict numbers which are white on a black background...
I think you should either train on your training set, or train on the negative version of your pictures.
I hope this help !

If you look at:
http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_digits.html#sklearn.datasets.load_digits
you can see that each point in the matrix as a value between 0-16.
You can try to transform the values of the image to between 0-16. I did it and now the prediction works well for the digit 9 but not for 8 and 6. It doesn't give 1 any more.
from sklearn import datasets, svm, metrics
import cv2
import numpy as np
# Load digit database
digits = datasets.load_digits()
n_samples = len(digits.images)
data = digits.images.reshape((n_samples, -1))
# Train SVM classifier
classifier = svm.SVC(gamma = 0.001)
classifier.fit(data[:n_samples], digits.target[:n_samples])
# Read image "9"
img = cv2.imread("w1.jpg")
img = img[:,:,0];
img = cv2.resize(img, (8, 8))
# Normalize the values in the image to 0-16
minValueInImage = np.min(img)
maxValueInImage = np.max(img)
normaliizeImg = np.floor(np.divide((img - minValueInImage).astype(np.float),(maxValueInImage-minValueInImage).astype(np.float))*16)
# Predict
predicted = classifier.predict(normaliizeImg.reshape((1,normaliizeImg.shape[0]*normaliizeImg.shape[1] )))
print predicted

I have solved this problem using below methods:
check the number of attributes, too large or too small.
check the scale of your gray value, I change to [0,16].
check data type, I change it to uint8.
check the number of training data, too small or not.
I hope it helps. ^.^

Hi in addition to #carrdelling respond, i will add that you may use the same training set, if you normalize your images to have the same range of value.
For example you could binaries your data ( 1 if > 0, 0 else ) or you could divide by the maximum intensity in your image to have an arbitrary interval [0;1].

You probably want to extract features relevant to to your data set from the images and train your model on them.
One example I copied from here.
surf = cv2.SURF(400)
kp, des = surf.detectAndCompute(img,None)
But the SURF features may not be the most useful or relevant to your dataset and training task. You should try others too like HOG or others.
Remember this more high level the features you extract the more general/error-tolerant your model will be to unseen images. However, you may be sacrificing accuracy in your known samples and test cases.

Related

SVM testing - normalization of test data [duplicate]

This question already has an answer here:
what is the difference between fit() ,fit_transform() and transform() in scikit_learn? [duplicate]
(1 answer)
Closed 1 year ago.
I'm working with SVM model to classify 5 different classes. (N1, N2, N3, W, R)
Feature extractions -> Data normalization -> train SVM
when I tested the model (20%, 80% usual train-test-split), it shows high accuracy enter image description here
But when I tried testing with a completely new dataset, with the same method of
Feature extractions -> Data normalization -> test on trained SVM model
It came out really badly.
Let's say the original dataset used in training is A, and the new test dataset is B.
when I trained the model only with A and tested B, it came out really badly.
First I thought it was model overfitting so I included A and B to train the model and tested with B. It came out badly again...
I think the problem is the normaliztion process. It eventually worked when I tried new dataset C, but this time I brought the train A data, concatenated A+C to normalize, and then cut only C dataset out from it. And when I compared that with the data C normalized alone, it was different..
I used MinMaxScaler from sklearn.
I mean mathematically speaking of course it's different.. because every dataset has different minimum maximum value and normalized data will be different when mixed with other data.
My question is, when you test with new dataset, is it normal to bring the train dataset to normalize it together and then take out the test datapart only?? It's like mixing A(112x12), B(15x12) -> normalize (127x12) together -> take out (15x12)
Or should I start from fixing the code from feature extraction and training SVM?
(I attached the code, and each feature has 12x1 shape which means each stage has 12xN matrix.)
from sklearn import preprocessing
scaler = preprocessing.MinMaxScaler()
# Load training data
N1_train = pd.read_pickle("C:/Users/User/Desktop/EWHADATASETS/Features/Train_N1_features")
N2_train = pd.read_pickle("C:/Users/User/Desktop/EWHADATASETS/Features/Train_N2_features")
N3_train = pd.read_pickle("C:/Users/User/Desktop/EWHADATASETS/Features/Train_N3_features")
W_train = pd.read_pickle("C:/Users/User/Desktop/EWHADATASETS/Features/Train_W_features")
R_train = pd.read_pickle("C:/Users/User/Desktop/EWHADATASETS/Features/Train_R_features")
# Load test data
N1_test = pd.read_pickle("C:/Users/User/Desktop/EWHADATASETS/Features/Test_N1_features")
N2_test = pd.read_pickle("C:/Users/User/Desktop/EWHADATASETS/Features/Test_N2_features")
N3_test = pd.read_pickle("C:/Users/User/Desktop/EWHADATASETS/Features/Test_N3_features")
W_test = pd.read_pickle("C:/Users/User/Desktop/EWHADATASETS/Features/Test_W_features")
R_test = pd.read_pickle("C:/Users/User/Desktop/EWHADATASETS/Features/Test_R_features")
# normalize with original raw features and take only test out
N1_scaled_test = features.normalize_together(N1_test, N1_train, "N1")
N2_scaled_test = features.normalize_together(N2_test, N2_train, "N2")
N3_scaled_test = features.normalize_together(N3_test, N3_train, "N3")
W_scaled_test = features.normalize_together(W_test, W_train, "W")
R_scaled_test = features.normalize_together(R_test, R_train, "R")
def normalize_together(test, raw, stage_no):
together = pd.concat([test, raw], ignore_index=True)
scaled_test = pd.DataFrame(scaler.fit_transform(together.iloc[:, :-1]))
scaled_test['label'] = "{}".format(stage_no)
scaled_test = scaled_test.iloc[0:test.shape[0], :]
return scaled_test
Test data should remain unseen during training (includes preprocessing) - don't use both test + train data to compute a common normalisation factor. Normalise the training set. Separately, normalise the test set.
Why? It's vital to use an unseen test partition to evaluate your trained model. Otherwise you have not tested the ability for your model to generalise - imagine playing a game of cards where you have already have prior knowledge of the cards or order of the deck.

Understanding what exactly neural network is predicting in documentation example (MNIST)

I've taken a quick course in neural networks to better understand them and now I'm trying them out for myself in R. I'm following this documentation of Keras.
The way I understand what is happening:
We are inputting a series of images and transforming these images to numerical matrices based on the arrangement of the pixels and colors in those pixels. We then build a neural network model to learn the pattern of these arrangements, depending on the classification (0 to 9). We then use the model to predict which class an image belongs to. I'll be honest and admit I'm not entirely sure what y_train and x_train is. I simply see it as one training and one validation set so I'm not sure what the difference between x and y is.
My question:
I've followed the steps to the T and the model runs fine and the predictions look like they do in the documentation. Ultimately, the prediction looks like this:
I take this to mean that observation 1 in x_test is predicted to be a category 7.
However, looking at x_test it looks like this:
There is a 0 in every column and row, also if I scroll further down. This is where I get confused. I'm also not sure how I view the original images to view for myself how well they are predicting them. I would eventually like to draw a number myself in paint or so and then see if the model can predict it, but for that I need to first understand what is going on. I feel I am close but I just need a little nudge!
I think if you read more about the input and output layer's dimensions, that would help.
In your example:
Input layer:
A single training example of image has two dimensions 28*28, which is then converted to a single vector of dimension 784. This acts as the input layer for the neural network.
So for m training examples, your input layer will have dimensions (m, 784). Analogically speaking (to traditional ML systems), you can imagine that each pixel of an image is converted into a feature (or x1, x2, ... x784), and your training set is a dataframe with m rows and 784 columns, which is then fed into neural network to compute y_hat = f(x1,x2,x3,...x784).
Output layer:
As an output for our neural network, we want it to predict which number it is from 0 to 9. So for a single training example, the output layer has dimension 10, representing each number from 0 to 9 and for n testing examples the output layer would be a matrix with dimension n*10.
Our y is a vector of length n which would be something like [1,7,8,2,.....] containing true value for each testing example. But to match the dimension of output layer, the y vector's dimension are converted using one-hot encoding. Imagine a length 10 vector, representing number 7 by putting 1 at 7th place and rest of the positions zeros something like [0,0,0,0,0,0,1,0,0,0].
So in your question, if you wish to see the original image, you should be able to see it before reshaping the training examples with something like image(mnist$test$x[1, , ]
Hope this helps!!
y_train are the labels and x_train is the training data, so images in this example. You need to use some kind of plotting library to plot x'es. In this example you probably are not expected to input your own drawings and if you want you would need to preprocess them in the same way as in MNIST and pass them to the model.

Is it possible to use a different data set as input for prediction in AdaBoostRegressor (sklearn)?

I am just beginner of machine learning and I am playing sklearn now.
I copied the example of AdaBoostRegressor from official site at here and added the following.
X_pred = np.linspace (6, 12, 100)[:, np.newaxis]
y_pred = regr_2.predict(X_1)
As the training data set X is ranged from 0 to 6, I am trying to get a prediction for a different data set X_pred ranged from 6 to 12.
However, I found that the value of y_pred is always -1.05382839 which is the last value of training set output y.
I am wondering whether it is possible to use a non-input-sample data set as the input of prediction.
Is it possible to do that? If so what is the correct usage?
BTW, attached pic is the output.
Red and green are the predicated output based on training set input (0-6) and blue is the output of X_pred (6 - 12).
In short - no. This is not what regression is about. Regression is about interpolation, not extrapolation. Pretty much none of the regressors can make any predictions about data outside of the training set.

How can I use the Keras OCR example?

I found examples/image_ocr.py which seems to for OCR. Hence it should be possible to give the model an image and receive text. However, I have no idea how to do so. How do I feed the model with a new image? Which kind of preprocessing is necessary?
What I did
Installing the depencencies:
Install cairocffi: sudo apt-get install python-cairocffi
Install editdistance: sudo -H pip install editdistance
Change train to return the model and save the trained model.
Run the script to train the model.
Now I have a model.h5. What's next?
See https://github.com/MartinThoma/algorithms/tree/master/ML/ocr/keras for my current code. I know how to load the model (see below) and this seems to work. The problem is that I don't know how to feed new scans of images with text to the model.
Related side questions
What is CTC? Connectionist Temporal Classification?
Are there algorithms which reliably detect the rotation of a document?
Are there algorithms which reliably detect lines / text blocks / tables / images (hence make a reasonable segmentation)? I guess edge detection with smoothing and line-wise histograms already works reasonably well for that?
What I tried
#!/usr/bin/env python
from keras import backend as K
import keras
from keras.models import load_model
import os
from image_ocr import ctc_lambda_func, create_model, TextImageGenerator
from keras.layers import Lambda
from keras.utils.data_utils import get_file
import scipy.ndimage
import numpy
img_h = 64
img_w = 512
pool_size = 2
words_per_epoch = 16000
val_split = 0.2
val_words = int(words_per_epoch * (val_split))
if K.image_data_format() == 'channels_first':
input_shape = (1, img_w, img_h)
else:
input_shape = (img_w, img_h, 1)
fdir = os.path.dirname(get_file('wordlists.tgz',
origin='http://www.mythic-ai.com/datasets/wordlists.tgz', untar=True))
img_gen = TextImageGenerator(monogram_file=os.path.join(fdir, 'wordlist_mono_clean.txt'),
bigram_file=os.path.join(fdir, 'wordlist_bi_clean.txt'),
minibatch_size=32,
img_w=img_w,
img_h=img_h,
downsample_factor=(pool_size ** 2),
val_split=words_per_epoch - val_words
)
print("Input shape: {}".format(input_shape))
model, _, _ = create_model(input_shape, img_gen, pool_size, img_w, img_h)
model.load_weights("my_model.h5")
x = scipy.ndimage.imread('example.png', mode='L').transpose()
x = x.reshape(x.shape + (1,))
# Does not work
print(model.predict(x))
this gives
2017-07-05 22:07:58.695665: I tensorflow/core/common_runtime/gpu/gpu_device.cc:996] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX TITAN Black, pci bus id: 0000:01:00.0)
Traceback (most recent call last):
File "eval_example.py", line 45, in <module>
print(model.predict(x))
File "/usr/local/lib/python2.7/dist-packages/keras/engine/training.py", line 1567, in predict
check_batch_axis=False)
File "/usr/local/lib/python2.7/dist-packages/keras/engine/training.py", line 106, in _standardize_input_data
'Found: array with shape ' + str(data.shape))
ValueError: The model expects 4 arrays, but only received one array. Found: array with shape (512, 64, 1)
Well, I will try to answer everything you asked here:
As commented in the OCR code, Keras doesn't support losses with multiple parameters, so it calculated the NN loss in a lambda layer. What does this mean in this case?
The neural network may look confusing because it is using 4 inputs ([input_data, labels, input_length, label_length]) and loss_out as output. Besides input_data, everything else is information used only for calculating the loss, it means it is only used for training. We desire something like in line 468 of the original code:
Model(inputs=input_data, outputs=y_pred).summary()
which means "I have an image as input, please tell me what is written here". So how to achieve it?
1) Keep the original training code as it is, do the training normally;
2) After training, save this model Model(inputs=input_data, outputs=y_pred)in a .h5 file to be loaded wherever you want;
3) Do the prediction: if you take a look at the code, the input image is inverted and translated, so you can use this code to make it easy:
from scipy.misc import imread, imresize
#use width and height from your neural network here.
def load_for_nn(img_file):
image = imread(img_file, flatten=True)
image = imresize(image,(height, width))
image = image.T
images = np.ones((1,width,height)) #change 1 to any number of images you want to predict, here I just want to predict one
images[0] = image
images = images[:,:,:,np.newaxis]
images /= 255
return images
With the image loaded, let's do the prediction:
def predict_image(image_path): #insert the path of your image
image = load_for_nn(image_path) #load from the snippet code
raw_word = model.predict(image) #do the prediction with the neural network
final_word = decode_output(raw_word)[0] #the output of our neural network is only numbers. Use decode_output from image_ocr.py to get the desirable string.
return final_word
This should be enough. From my experience, the images used in the training are not good enough to make good predictions, I will release a code using other datasets that improved my results later if necessary.
Answering related questions:
What is CTC? Connectionist Temporal Classification?
It is a technique used to improve sequence classification. The original paper proves it improves results on discovering what is said in audio. In this case it is a sequence of characters. The explanation is a bit trick but you can find a good one here.
Are there algorithms which reliably detect the rotation of a document?
I am not sure but you could take a look at Attention mechanism in neural networks. I don't have any good link now but I know it could be the case.
Are there algorithms which reliably detect lines / text blocks / tables / images (hence make a reasonable segmentation)? I guess edge detection with smoothing and line-wise histograms already works reasonably well for that?
OpenCV implements Maximally Stable Extremal Regions (known as MSER). I really like the results of this algorithm, it is fast and was good enough for me when I needed.
As I said before, I will release a code soon. I will edit the question with the repository when I do, but I believe the information here is enough to get the example running.
Now I have a model.h5. What's next?
First I should comment that the model.h5 contains the weights of your network, if you wish to save the architecture of your network as well you should save it as a json like this example:
model_json = model_json = model.to_json()
with open("model_arch.json", "w") as json_file:
json_file.write(model_json)
Now, once you have your model and its weights you can load them on demand by doing the following:
json_file = open('model_arch.json', 'r')
loaded_model_json = json_file.read()
json_file.close()
loaded_model = model_from_json(loaded_model_json)
# load weights into new model
# if you already have a loaded model and dont need to save start from here
loaded_model.load_weights("model.h5")
# compile loaded model with certain specifications
sgd = SGD(lr=0.01)
loaded_model.compile(loss="binary_crossentropy", optimizer=sgd, metrics=["accuracy"])
Then, with that loaded_module you can proceed to predict the classification of certain input like this:
prediction = loaded_model.predict(some_input, batch_size=20, verbose=0)
Which will return the classification of that input.
About the Side Questions:
CTC seems to be a term they are defining in the paper you refered, extracting from it says:
In what follows, we refer to the task of labelling un-
segmented data sequences as
temporal classification
(Kadous, 2002), and to our use of RNNs for this pur-
pose as
connectionist temporal classification
(CTC).
To compensate the rotation of a document, images, or similar you could either generate more data from your current one by applying such transformations (take a look at this blog post that explains a way to do that ), or you could use a Convolutional Neural Network approach, which also is actually what that Keras example you are using does, as we can see from that git:
This example uses a convolutional stack followed by a recurrent stack
and a CTC logloss function to perform optical character recognition
of generated text images.
You can check this tutorial that is related to what you are doing and where they also explain more about Convolutional Neural Networks.
Well this one is a broad question but to detect lines you could use the Hough Line Transform, or also Canny Edge Detection could be good options.
Edit: The error you are getting is because it is expected more parameters instead of 1, from the keras docs we can see:
predict(self, x, batch_size=32, verbose=0)
Raises ValueError: In case of mismatch between the provided input data and the model's expectations, or in case a stateful model receives a number of samples that is not a multiple of the batch size.
Here, you created a model that needs 4 inputs:
model = Model(inputs=[input_data, labels, input_length, label_length], outputs=loss_out)
Your predict attempt, on the other hand, is loading just an image.
Hence the message: The model expects 4 arrays, but only received one array
From your code, the necessary inputs are:
input_data = Input(name='the_input', shape=input_shape, dtype='float32')
labels = Input(name='the_labels', shape=[img_gen.absolute_max_string_len],dtype='float32')
input_length = Input(name='input_length', shape=[1], dtype='int64')
label_length = Input(name='label_length', shape=[1], dtype='int64')
The original code and your training work because they're using the TextImageGenerator. This generator cares to give you the four necessary inputs for the model.
So, what you have to do is to predict using the generator. As you have the fit_generator() method for training with the generator, you also have the predict_generator() method for predicting with the generator.
Now, for a complete answer and solution, I'd have to study your generator and see how it works (which would take me some time). But now you know what is to be done, you can probably figure it out.
You can either use the generator as it is, and predict probably a huge lot of data, or you can try to replicate a generator that will yield just one or a few images with the necessary labels, length and label length.
Or maybe, if possible, just create the 3 remaining arrays manually, but making sure they have the same shapes (except for the first, which is the batch size) as the generator outputs.
The one thing you must assert, though, is: have 4 arrays with the same shapes as the generator outputs, except for the first dimension.
Hi You can Look in to my github repo for the same. You need to train the model for type of images you want to do the ocr.
# USE GOOGLE COLAB
import matplotlib.pyplot as plt
import keras_ocr
images = [keras_ocr.tools.read("/content/sample_data/IMG_20200224_113657.jpg")] #Image path
pipeline = keras_ocr.pipeline.Pipeline()
prediction = pipeline.recognize(images)
x_max = 0
temp_str = ""
myfile = open("/content/sample_data/my_file.txt", "a+")#Text File Path to save text
for i in prediction[0]:
x_max_local = i[1][:, 0].max()
if x_max_local > x_max:
x_max = x_max_local
temp_str = temp_str + " " + i[0].ljust(15)
else:
x_max = 0
temp_str = temp_str + "\n"
myfile.write(temp_str)
print(temp_str)
temp_str = ""
myfile.close()

Preprocessing Image dataset with numpy for CNN:Memory Error

I have a dataset (71094 train images and 17000 test) for which i need to train a CNN.During preprocessing , i tried creating a matrix using numpy that turns out to be ridiculously large(71094*100*100*3 for the train data) [all images are RGB 100 by 100].. Hence i get a memory error.How do i tackle the situation.??Pls help .
This is my code..
import numpy as np
import cv2
from matplotlib import pyplot as plt
data_dir = './fashion-data/images/'
train_data = './fashion-data/train.txt'
test_data = './fashion-data/test.txt'
f = open(train_data, 'r').read()
ims = f.split('\n')
print len(ims)
train = np.zeros((71094, 100, 100, 3)) #this line causes the error..
for ix in range(train.shape[0]):
i = cv2.imread(data_dir + ims[ix] + '.jpg')
label = ims[ix].split('/')[0]
train[ix, :, :, :] = cv2.resize(i, (100, 100))
print train[0]
train_labels = np.zeros((71094, 1))
for ix in range(train_labels.shape[0]):
l = ims[ix].split('/')[0]
train_labels[ix] = int(l)
print train_labels[0]
np.save('./data/train', train)
np.save('./data/train_labels', train_labels)
I recently ran into the same problem, and I believe its a common problem when working with image data.
There are a number of methods you can use to tackle this problem depending on what you would like to do.
1) It can make sense to sample the data from each image when training, so not to train on ALL 71094*100*100 pixels. This can be done simply, by creating a function which loads one image at a time, and samples your pixels. There is some argument that doing this randomly for each epoch can reduce overfitting, but again depends on the exact problem. Stratified sampling may also help to balance the classes should you be working with pixel classification.
2) mini-batch training - split your training data into small "mini-batches" and train on each of these individually. Your epoch will end after you have completed training on all of your mini-batches, with all of your data. Here you should, each epoch, randomise the order of the data in order to avoid overfitting.
3) Load and train one image at a time - similar to mini-batch training, but just use one image as a "mini-batch" for each iteration, and run a for-loop through all of the images in the folder. this way only 1x100x100x3 is stored in memory at a time. depending on the size of your memory, you could perhaps use more than one image per mini batch i.e. - Nx100x100x3, and run for 71094/N iterations to go over all training data
I hope this was clear.. and that it helps somewhat!

Categories

Resources