I am new to deep learning and am using Keras to learn it. I followed the instructions at this link to build a handwritten digit classifier using the MNIST dataset. It worked fine, and my evaluation results were comparable to the tutorial's. I used TensorFlow as the Keras backend.
Now I want to read an image file containing a handwritten digit and predict the digit using the same model. I think the image first needs to be transformed to 28x28 dimensions with a depth of 255? I am not sure whether my understanding is correct to begin with. If so, how can I do this transformation in Python? If my understanding is incorrect, what kind of transformation is required?
Thank you in advance!
To my knowledge, you will need to turn this into a 28x28 grayscale image in order to use it with your model. That's the same shape and scheme as the images in the MNIST training set: the model expects inputs of 784 (28 * 28) values, each between 0 and 255 before any scaling that your tutorial applies.
To resize an image you could use PIL or Pillow. See this SO post or this page in the Pillow docs (linked to by Wtower in the previously mentioned post, copied here for ease of access) on resizing and keeping aspect ratio, if that's what you want to do.
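A minimal sketch of that conversion with Pillow and NumPy follows. The function name `prepare_digit` and the `invert` flag are illustrative, not part of any library. Whether to divide by 255 and whether to invert depend on how the tutorial's training data was prepared, so treat both as assumptions to check against your own training code.

```python
import numpy as np
from PIL import Image


def prepare_digit(img, invert=False):
    """Turn a PIL image into a 28x28 float array scaled to [0, 1].

    invert=True is for dark ink on light paper, since MNIST digits
    are light strokes on a dark background.
    """
    # "L" is Pillow's 8-bit grayscale mode.
    gray = img.convert("L").resize((28, 28), Image.LANCZOS)
    pixels = np.asarray(gray, dtype=np.float32)  # shape (28, 28), values 0..255
    if invert:
        pixels = 255.0 - pixels
    # Assumption: the model was trained on pixels divided by 255.
    return pixels / 255.0
```

With a file on disk this could be used as `x = prepare_digit(Image.open("digit.png"), invert=True)` (the filename is hypothetical), followed by `x.reshape(1, 784)` or `x.reshape(1, 28, 28, 1)` depending on the input shape the model expects. Note that a plain `resize` stretches non-square images; `Image.thumbnail` keeps the aspect ratio instead.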
HTH!
Cheers,
-Maashu
Related
I have a simple question. I have worked through some image classification tutorials, only the simpler ones such as the MNIST dataset. I noticed that they do this:
train_images = train_images / 255.0
I know that this divides every value in the matrix (which is the image) by 255.0. If I remember correctly, this is called normalization, right? (Please correct me if I am wrong, or confirm if I am right.)
I'm curious whether there is a better way, another way, or a best way to pre-process or clean images before they are fed to the network for training.
If you would like to provide sample source code, please be my guest. I would love to look at code samples.
Thank you!
Pre-processing images prior to image classification can include the following:
normalisation: which you already mentioned
reshaping into a uniform resolution (img height x img width): a higher resolution preserves more detail at a higher computational cost, while a smaller resolution may lose important features. Some models have a default input size that you can refer to. The average size of all images can also be used.
color channels: 1 means gray-scale and 3 means RGB. You can set this depending on your application.
data augmentation: if your model is overfitting or your dataset is small, you can enlarge your dataset by altering the original images (flipping, rotating, cropping, zooming, ...)
image segmentation: segmentation can be performed to highlight the areas or boundaries that may benefit your application. For example, in medical image classification, some parts of the body may be masked to enhance classification performance.
For example, I recently worked on image classification of lung CT scan images. For pre-processing, I reshaped the images and made them gray-scale. Then I performed image segmentation to highlight the lungs in the images. Finally, I normalised the image pixels before feeding them into my classification model. Depending on your application, there may be other pre-processing techniques you might want to consider.
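Since the question asked for code, here is a small NumPy-only sketch of three of the steps listed above: channel reduction, normalisation and a simple augmentation. The function names are made up for illustration, and the grayscale weights are the common BT.601 luma coefficients, which is one reasonable choice rather than the only one.

```python
import numpy as np


def to_grayscale(rgb):
    """Reduce an (H, W, 3) RGB array to (H, W) with BT.601 luma weights."""
    weights = np.array([0.299, 0.587, 0.114])
    return rgb.astype(np.float32) @ weights


def normalise(img):
    """Scale 0..255 pixel values to the range [0, 1]."""
    return img.astype(np.float32) / 255.0


def augment(img, rng):
    """Random horizontal flip plus a zero-padded shift of up to 2 pixels.

    Flipping suits many natural-image tasks but not digits or text,
    where a mirrored image is no longer a valid example.
    """
    out = img[:, ::-1] if rng.random() < 0.5 else img
    dy, dx = rng.integers(-2, 3, size=2)  # each offset is in -2..2
    padded = np.pad(out, 2, mode="constant")
    h, w = img.shape
    return padded[2 + dy : 2 + dy + h, 2 + dx : 2 + dx + w]
```

A typical call would be `augment(normalise(to_grayscale(rgb)), np.random.default_rng(0))`. Augmentation is normally applied only to the training set.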
I have trained my CNN in TensorFlow using the MNIST dataset; when I tested it on the test data, it worked very well. To validate my model further, I built another set by randomly taking images from the train and test sets. I removed those images from the original sets, so the model never saw them during training. It worked very well on that set too, but with an image downloaded from Google it doesn't classify well. So my question is: do I have to apply any filter to that image before I give it to the prediction part?
I resized the image and converted it to grayscale beforehand.
MNIST is an easy dataset. Your model (CNN) structure may do quite well on MNIST, but there is no guarantee that it does well on more complex images too. You can add some more layers and try different activation functions (such as ReLU, ELU, etc.). Normalizing your image pixel values to a small range, for example between -1 and 1, may help too.
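As a sketch of the last suggestion, 8-bit pixels can be mapped to the range -1 to 1 as below. The function name is illustrative. Whatever scaling is chosen, the same one has to be applied to the training images and to any image passed in for prediction.

```python
import numpy as np


def scale_to_unit_range(img_uint8):
    """Map pixel values 0..255 linearly onto [-1, 1]."""
    return img_uint8.astype(np.float32) / 127.5 - 1.0
```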
I am trying to do handwritten character recognition using TensorFlow in Google Colab.
I have trained and tested a model with an accuracy of 91%.
I tried it on the image given in the tutorial, and it worked correctly.
That image was resized to 28*28.
When I try it on my own input image, it predicts wrong results such as 2 or 3, although my input image shows the digit 6.
The problem may be in the image operations performed before the image is passed to the model.
Later, I also want to use such images for real-time recognition.
I resize and invert the image to make it compatible with my training data.
The OpenCV input image uses the opposite convention from the training images: in the current matrix, black is 0 and white is 255.
My Jupyter notebook on GitHub follows a tutorial from DigitalOcean's blog.
How can I upload an image taken with a phone or webcam and recognize characters in that image?
Where am I making mistakes in processing the image?
Later, I want to use this in a project for real-time recognition of characters.
testing images are
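Did you know that the images in the MNIST dataset are padded? Each digit is size-normalized to fit a 20x20 box and centered in the 28x28 image.
Your own images need appropriate processing to match this before real-time recognition will work.
This is a useful article about that: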
https://link.medium.com/0ySCmyMpzU
The following is my project, a simple MNIST game:
https://github.com/mym0404/Math-Writer
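The padding point can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the code from the linked article or project: it assumes a grayscale photo with dark ink on a light background, uses a fixed threshold of 128, resizes with nearest-neighbour sampling, and centers the digit by its bounding box, whereas MNIST itself centers by center of mass. All function names are hypothetical.

```python
import numpy as np


def resize_nearest(img, new_h, new_w):
    """Nearest-neighbour resize of a 2-D array."""
    h, w = img.shape
    rows = np.arange(new_h) * h // new_h
    cols = np.arange(new_w) * w // new_w
    return img[rows[:, None], cols]


def to_mnist_layout(gray, box=20, size=28, threshold=128):
    """Place a dark-on-light digit into an MNIST-like 28x28 frame."""
    ink = 255 - gray.astype(np.int32)  # invert: strokes bright, paper dark
    mask = ink > threshold
    if not mask.any():
        raise ValueError("no ink found above the threshold")
    ink[~mask] = 0  # suppress paper texture and faint noise
    ys, xs = np.nonzero(mask)
    crop = ink[ys.min() : ys.max() + 1, xs.min() : xs.max() + 1]
    # Scale the longer side to `box` pixels, keeping the aspect ratio.
    h, w = crop.shape
    scale = box / max(h, w)
    new_h = max(1, round(h * scale))
    new_w = max(1, round(w * scale))
    small = resize_nearest(crop, new_h, new_w)
    # Paste into the middle of a black canvas (bounding-box centering).
    canvas = np.zeros((size, size), dtype=np.uint8)
    top = (size - new_h) // 2
    left = (size - new_w) // 2
    canvas[top : top + new_h, left : left + new_w] = small
    return canvas
```

The result still needs the same scaling as the training data (for example division by 255) before it is passed to the model.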
For my deep learning course, I need to implement a neural network which is exactly the same as the one in the TensorFlow MNIST for Experts tutorial.
The only difference is that I need to down-sample the dataset and then feed it into the neural network. Should I crop and resize, or should I implement the neural network with parameters that accept multiple input sizes (28x28 and 14x14)?
All of the parameters in the TensorFlow tutorial are static, so I couldn't find a way to feed the algorithm a 14x14 image. Which tool should I use for 'optimal' down-sampling?
You need to resize the input images to a fixed size (which appears to be 14*14 from your description). There are different ways of doing this: for example, you can use interpolation to resize, simply crop the central part or some corner of the image, or randomly choose one or many patches (all of the same size as your network's input) from a given image. You can also combine these methods. For example, in VGG, they first do an aspect-preserving resize using bilinear interpolation and then take a random patch from the resulting image (for the test phase they take the central crop). You can find VGG's preprocessing source code in TensorFlow at the following link:
https://github.com/tensorflow/models/blob/master/slim/preprocessing/vgg_preprocessing.py
The only parameters of the sample code in the tutorial you mentioned that need to be changed are those related to the input image size. For example, you need to change 28s to 14s and 784s to 196s (14 * 14 = 196). These are just examples; there are other weight sizes that you will need to change as well.
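To make the arithmetic concrete, the sketch below computes the two sizes that depend on the input side length. It assumes the tutorial's network has two 2x2 max-pooling layers with stride 2 and 'SAME' padding, and 64 channels after the second convolution; check these against your own code. The function name is illustrative.

```python
import math


def flat_sizes(side, n_pools=2, channels=64):
    """Return (flattened input size, flattened size after the last pool)."""
    input_size = side * side
    s = side
    for _ in range(n_pools):
        s = math.ceil(s / 2)  # 'SAME' pooling with stride 2 rounds up
    return input_size, s * s * channels


print(flat_sizes(28))  # (784, 3136), i.e. 7 * 7 * 64
print(flat_sizes(14))  # (196, 1024), i.e. 4 * 4 * 64
```

Under those assumptions the first fully connected layer's input shrinks from 7 * 7 * 64 to 4 * 4 * 64, because 14 pools down to 7 and then to 4.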
I have been working on the MNIST dataset to learn how to use TensorFlow and Python for my deep learning course.
I can read the data internally or externally and train both a softmax model and a CNN, thanks to the TensorFlow tutorial on the website. In the end, I got accuracies above 90% with softmax and above 98% with the CNN.
My problem is that I want to resize all MNIST images to 14x14 and train again, and also to augment all of them (adding noise, rotating, etc.) and train again. In the end, I want to be able to compare the accuracies on these three different datasets.
Could you please help me solve this? How do I resize all the images, and how should the model change?
Thanks!
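One way to resize images is the SciPy imresize function (note that scipy.misc.imresize was deprecated in SciPy 1.0 and removed in SciPy 1.3, so this only works with older versions):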
from scipy.misc import imresize
img = imresize(yourimage, (14, 14))
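Because `scipy.misc.imresize` is not available in current SciPy releases, here is a dependency-light alternative for the specific 28x28 to 14x14 case. It swaps in a different technique, 2x2 block averaging in NumPy, rather than reproducing what `imresize` did, and the function name is illustrative.

```python
import numpy as np


def downsample_2x(images):
    """Halve the height and width of (N, H, W) images by 2x2 averaging.

    H and W must be even, e.g. 28x28 becomes 14x14.
    """
    n, h, w = images.shape
    blocks = images.reshape(n, h // 2, 2, w // 2, 2)
    return blocks.mean(axis=(2, 4))
```

If the images are stored flattened as (N, 784), reshape them with `images.reshape(-1, 28, 28)` first and flatten the result back to (N, 196) before feeding the softmax model.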
But my real advice to you is that you should take a look at the Kadenze course "Creative applications of deep learning". This is a notebook for lecture two: https://github.com/pkmital/CADL/blob/master/session-2/lecture-2.ipynb
This course is really good at helping you understand using images and Tensorflow.
What you need is an image processing library such as OpenCV or PIL. If you are using the dataset downloaded through TensorFlow, it will be a 3D array (an array of 2D arrays, one per image) or have more dimensions, depending on how it is stored (I'm not sure). You can treat NumPy arrays as images and use them with any image processing library you like, but check which datatype they are in and whether it is compatible with the libraries you are using.
TensorFlow also has such functions, if you want to keep everything in TensorFlow.
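The datatype caveat can be illustrated with a short NumPy sketch. It assumes the images arrive as flattened float arrays in [0, 1], which is how the tutorial's loader delivers them as far as I know, while most image libraries expect 2-D uint8 arrays in 0..255. Both function names are made up for illustration.

```python
import numpy as np


def flat_float_to_uint8_images(flat, side=28):
    """(N, side*side) floats in [0, 1] -> (N, side, side) uint8 in 0..255."""
    imgs = np.clip(flat, 0.0, 1.0).reshape(-1, side, side)
    return np.round(imgs * 255.0).astype(np.uint8)


def uint8_images_to_flat_float(imgs):
    """(N, H, W) uint8 -> (N, H*W) float32 in [0, 1], ready for the model."""
    n = imgs.shape[0]
    return imgs.reshape(n, -1).astype(np.float32) / 255.0
```

Converting to uint8, processing with the image library, and converting back keeps the values in the range the model was trained on.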