CNTK evaluation for image classification - python

I built an image classifier using CNTK. The images are grayscale, so I set the number of channels to 1, which means the model expects (1x64x64) data (64 being the image height and width).
The problem is that when I try to predict the class of a new image, it is read in as (64x64) only, so the code errors out due to a shape mismatch.
Therefore, I reshaped the image using:
image_data = image_data.reshape((1, image_data.shape[0], image_data.shape[1]))
This produced (1x64x64), which worked. However, the prediction now comes out as the same class for every image I select. I wonder whether this is because of the reshaping. Can someone chime in? Thanks!

Reshaping your input does not affect the output of the model. If it predicts only one class for every image, that points to a problem with model training. I would suggest predicting on your training data: if the model predicts a single class there as well, it is definitely a training issue.
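To see that the reshape is purely a view change, here is a minimal NumPy sketch (the random array stands in for your real image):

import numpy as np

# Stand-in for a real 64x64 grayscale image (values 0-255).
image_data = np.random.randint(0, 256, size=(64, 64)).astype(np.float32)

# Adding a leading channel axis changes only the shape, not the pixel
# values, so the reshape itself cannot change the model's predictions.
chw = image_data.reshape((1,) + image_data.shape)  # shape (1, 64, 64)
assert np.array_equal(chw[0], image_data)

# np.expand_dims is an equivalent, arguably clearer, alternative.
assert np.array_equal(chw, np.expand_dims(image_data, axis=0))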

Related

Tensorflow Flower Classifier consistent predictions with varied input

I am following this tensorflow tutorial notebook to classify images of flowers:
https://colab.research.google.com/github/tensorflow/docs/blob/master/site/en/tutorials/images/classification.ipynb#scrollTo=U-e-XzMeyH2O
Everything seems OK until the final cell in the notebook where the trained model is used to predict the class of a new image. I am getting identical predictions for all inputs.
I tried adding:
print(predictions)
print(score)
Then predicting on the sample image (of a sunflower):
sunflower_url = "https://storage.googleapis.com/download.tensorflow.org/example_images/592px-Red_sunflower.jpg"
outputs:
[[-2.1131027 -1.3355725 0.29224062 3.8924832 1.3749899 ]]
tf.Tensor([0.00220911 0.00480723 0.02448191 0.896212 0.07228985], shape=(5,), dtype=float32)
This image most likely belongs to sunflowers with a 89.62 percent confidence.
But if I just change the input to a picture of a rose, like:
sunflower_url = "https://images.photowall.com/products/64377/rose-flower.jpg"
outputs:
[[-2.1131027 -1.3355725 0.29224062 3.8924832 1.3749899 ]]
tf.Tensor([0.00220911 0.00480723 0.02448191 0.896212 0.07228985], shape=(5,), dtype=float32)
This image most likely belongs to sunflowers with a 89.62 percent confidence.
I have seen that many model-related issues (scaling, overfitting, etc.) can cause identical outputs, but it seems strange that a tutorial example would fail in this way, so I suspect I am missing something more obvious.
The issue here was related to the line
sunflower_path = tf.keras.utils.get_file('Red_sunflower', origin=sunflower_url)
When downloading a new image, get_file was NOT overwriting the stored image: it caches downloads under the given file name ('Red_sunflower'), so changing only the URL meant the model was making a prediction against the same cached input every time.
I manually defined the save location, made sure to load the input from there, and then it worked.
Note that you do not need to rescale the image yourself; the model defined in the tutorial already includes a Rescaling layer, so this is handled when you call predict().
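A minimal sketch of the fix, assuming you simply want each URL cached under its own name (the fname value below is illustrative):

import tensorflow as tf

rose_url = "https://images.photowall.com/products/64377/rose-flower.jpg"

# get_file caches by fname; reusing 'Red_sunflower' for a new URL returns
# the old cached file. A unique fname per image forces a fresh download.
rose_path = tf.keras.utils.get_file("rose-flower.jpg", origin=rose_url)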

How do we really clean or pre-process Image for Image Classification?

I have a simple question. I have worked through some image classification tutorials, only simpler ones like the MNIST dataset, and I noticed that they do this:
train_images = train_images / 255.0
Now I know that every value in the matrix (which is the image) gets divided by 255.0. If I remember correctly, this is called normalization, right? (Please correct me if I am wrong; otherwise tell me that I am right.)
I'm just curious whether there is a "better way", "another way", or "the best way" to pre-process or clean images before they are fed to the network for training.
If you would like to provide sample source code, please be my guest. I would love to look at code samples.
Thank you!
Pre-processing images prior to image classification can include the following:
normalisation: which you already mentioned
reshaping into uniform resolution (img height x img width): higher resolution gives the network more detail to learn from, while lower resolution may lose important features. Some models have a default input size you can refer to; an average size across all images can be used too.
color channel: 1 refers to grayscale and 3 to RGB. Depending on your application you can set this.
data augmentation: if your model is overfitting or your dataset is small, you can enlarge your dataset by altering the original images (flipping, rotating, cropping, zooming, ...)
image segmentation: segmentation can be performed to highlight the areas or boundaries that benefit your application. For example, in medical image classification, some part of the body may be masked to enhance classification performance.
For example, I recently worked on image classification of lung CT scan images. For pre-processing, I reshaped the images and made them grayscale. Then I performed image segmentation to highlight the lungs in the images, and I normalised the image pixels before putting them into my classification model. Depending on your application, there may be other pre-processing techniques you might want to consider; a sketch of a basic pipeline follows.
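A minimal sketch of such a pipeline using TensorFlow ops (the file path and the 64x64 target size are illustrative assumptions):

import tensorflow as tf

def preprocess(path, img_size=(64, 64)):
    raw = tf.io.read_file(path)                 # read the raw bytes
    img = tf.io.decode_jpeg(raw, channels=1)    # 1 channel = grayscale
    img = tf.image.resize(img, img_size)        # uniform resolution
    return tf.cast(img, tf.float32) / 255.0     # normalise to [0, 1]

# Simple augmentation step: randomly flip the preprocessed image.
def augment(img):
    return tf.image.random_flip_left_right(img)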

How do I have to process an image to test it in a CNN?

I have trained my CNN in TensorFlow using the MNIST data set, and when I tested it, it worked very well on the test data. To prove my model in a better way, I made another set by taking images randomly from the train and test sets; at the same time I deleted those images so they were never given to my model during training. It worked very well on that set too, but with an image downloaded from Google it doesn't classify well, so my question is: do I have to apply some filter to that image before I give it to the prediction part?
I resized the image and converted it to gray scale before.
MNIST is an easy dataset. Your model (CNN) structure may do quite well for MNIST, but there is no guarantee that it does well for more complex images too. You can add some more layers and try different activation functions (like ReLU, ELU, etc.). Normalizing your image pixel values to a small range, e.g. between -1 and 1, may help too.
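As a rough sketch of the kind of preparation that helps for downloaded images (Pillow usage; the file name, the inversion heuristic, and the [-1, 1] scaling are illustrative choices):

import numpy as np
from PIL import Image

img = Image.open("digit_from_google.png").convert("L")  # grayscale
img = img.resize((28, 28))                              # MNIST resolution
arr = np.asarray(img, dtype=np.float32)

# MNIST digits are light strokes on a dark background; web images are
# usually the opposite, so inverting often helps.
if arr.mean() > 127:
    arr = 255.0 - arr

arr = arr / 127.5 - 1.0           # scale pixel values to [-1, 1]
arr = arr.reshape(1, 28, 28, 1)   # add batch and channel axes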

Following a TensorFlow tutorial - questions around IMG_SIZE and image size consistency

I am following a tutorial on TensorFlow image classifications.
In the first CreateData section of the tutorial it has a parameter:
# The size of the images that your neural network will use
IMG_SIZE = 50
I am wondering what this is. Is it a 50x50 pixel image? In my case I would like to train my model on, and have it predict from, differently sized photos; they won't all have the same resolution or size. How would I alter the code in this tutorial to cater for that?
Also, the code used to run a prediction against a test piece of data has the same IMG_SIZE parameter, so the same question applies there: in my use case the test data will come in different sizes, so how can we cater for this?
Additionally, how many photos should I have for each of my trained classes at a minimum? Right now I have about 22 data points for each of my 3 classes, and the results are very poor, but that could equally be down to the IMG_SIZE issue above.
Any and all advice greatly appreciated.
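For context, tutorials of this kind typically use IMG_SIZE as a square target resolution that every image, whatever its original size, is resized to before training and prediction, roughly as in this sketch (the OpenCV usage and file name are assumptions, since the tutorial's code is not shown here):

import cv2

IMG_SIZE = 50  # every image becomes 50x50 pixels

img = cv2.imread("some_photo.jpg", cv2.IMREAD_GRAYSCALE)
img = cv2.resize(img, (IMG_SIZE, IMG_SIZE))  # same resize at train and test time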

prediction of MNIST hand-written digit classifier

I am new to Deep Learning and am using Keras to learn it. I followed instructions at this link to build a handwritten digit recognition classifier using MNIST dataset. It worked fine in terms of seeing comparable evaluation results. I used tensorflow as the backend of Keras.
Now I want to read an image file with a handwritten digit and predict its digit using the same model. I think the image needs to be transformed to 28x28 pixels with values in the 0-255 range first? I am not sure whether my understanding is correct to begin with. If so, how can I do this transformation in Python? If my understanding is incorrect, what kind of transformation is required?
Thank you in advance!
To my knowledge, you will need to turn this into a 28x28 grayscale image in order to work with it in Python. That is the same shape and scheme as the images used to train MNIST, and the model expects 784 (28 * 28) values as input, each between 0 and 255.
To resize an image you could use PIL or Pillow. See this SO post or this page in the Pillow docs (linked to by Wtower in the previously mentioned post, copied here for ease of access) on resizing while keeping the aspect ratio, if that's what you want to do.
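A minimal Pillow sketch of resizing to 28x28 while keeping the aspect ratio, by letterboxing onto a black canvas as in MNIST (the file name is a placeholder):

from PIL import Image

img = Image.open("digit.png").convert("L")  # grayscale
img.thumbnail((28, 28))                     # shrink, preserving aspect ratio
canvas = Image.new("L", (28, 28), 0)        # black background, as in MNIST
canvas.paste(img, ((28 - img.width) // 2, (28 - img.height) // 2))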
HTH!
Cheers,
-Maashu
