I have built a neural network to detect handwritten digits using the MNIST dataset.
The network takes an input shape of (28,28) as the training MNIST images are 28x28 grayscale.
I now want to test my neural network on some of my own handwriting.
The images I have are not 28x28 grayscale, so I am trying to convert them into a format my model will accept for predictions.
Currently I have the following:
from PIL import Image
import numpy as np

img = Image.open('image.png').convert('LA')
newImg = img.resize((28, 28), Image.ANTIALIAS)
toPredict = np.array(newImg)
However, this is giving me a numpy array of shape (28, 28, 2), which I don't understand.
After converting to grayscale and resizing, I expected a 28x28 array (28 pixels high by 28 pixels wide), so I don't understand where the extra dimension comes from.
Can anyone help me get the shape to be 28x28 (and explain why it isn't already) so I can pass this to my neural network?
Thank you!
You're almost there.
img = Image.open('image.png').convert('LA') gives you a 28x28x2 array because 'LA' mode is greyscale ('L') with an alpha channel ('A').
Instead convert it to just greyscale with:
img = Image.open('image.png').convert('L')
You can see more information on the modes here:
https://pillow.readthedocs.io/en/latest/handbook/concepts.html#modes
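For completeness, here is a minimal sketch of the whole preprocessing step, assuming your network was trained on MNIST pixel values scaled to [0, 1] and expects a leading batch dimension (adjust the last two lines to whatever your input layer actually expects):
from PIL import Image
import numpy as np

img = Image.open('image.png').convert('L')      # 'L' = single-channel greyscale, no alpha
newImg = img.resize((28, 28), Image.ANTIALIAS)  # downsample to the MNIST size
toPredict = np.array(newImg) / 255.0            # shape (28, 28), values scaled to [0, 1]
toPredict = toPredict.reshape(1, 28, 28)        # add a batch dimension of 1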
I am currently working with RGB images loaded as tensors, and I would like to reshape them into 2D tensors so I can run deep neural networks on them.
The shape I am currently working with is:
images.shape
torch.Size([32, 3, 244, 244])
I don't know how to deal with the last two dimensions, or how to flatten the 3 colour channels.
Your requirement is too hazy and it's unclear what you want to achieve with these images. Do they come with labels? If not, do you want to use an unsupervised method such as an autoencoder? Looking at the shape of your images tensor:
torch.Size([32, 3, 244, 244])
This means that there are 32 color (RGB) images in this tensor. If your definition of 2D means converting them to grayscale images, then you can use the torchvision library.
import torch
import torchvision

images = [torchvision.transforms.ToPILImage()(img) for img in images]
images = [torchvision.transforms.Grayscale()(img) for img in images]
And to convert the PIL grayscale images back to torch tensors, use:
images = [torchvision.transforms.ToTensor()(img) for img in images]
images = torch.stack(images).to(device)
Now, the shape of images would be [32, 1, 244, 244]; ToTensor keeps a single greyscale channel, so squeeze dimension 1 if you want [32, 244, 244].
Flattening such a high-resolution image at the very first layer is not recommended. That is why, in the computer vision literature, you see people apply a few convolution layers at the beginning of the model architecture to downsample the images into smaller (lower-resolution) feature descriptors.
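For illustration, a minimal sketch of that idea for your [32, 3, 244, 244] batch; the filter counts here are arbitrary, the point is that the strided convolutions shrink the spatial resolution before anything is flattened:
import torch
import torch.nn as nn

# two strided conv layers to downsample 3x244x244 inputs before flattening
downsample = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1),   # -> 16 x 122 x 122
    nn.ReLU(),
    nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1),  # -> 32 x 61 x 61
    nn.ReLU(),
)

features = downsample(torch.randn(32, 3, 244, 244))  # dummy batch with your shape
flat = features.view(features.shape[0], -1)          # [32, 32 * 61 * 61]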
You can do something simple like this:
image = image.view(image.shape[0], -1)
That will flatten the image to two dimensions: your batch size and the product of the other three dimensions.
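A quick usage sketch, assuming images is your original [32, 3, 244, 244] tensor:
flat = images.view(images.shape[0], -1)
print(flat.shape)  # torch.Size([32, 178608]); 3 * 244 * 244 = 178608 values per image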
I have tried the TensorFlow example with the Zalando MNIST (Fashion-MNIST) dataset here:
https://www.tensorflow.org/tutorials/keras/basic_classification
After that I replaced the clothing images with the handwritten MNIST database, which also works.
Now I want to train the AI on the handwritten MNIST database, take a picture of my own handwritten "1", and let the AI guess the number.
I appended some lines of code after training the AI.
What I tried is this:
import imageio

ownPicArr = imageio.imread(filename)  # it is a 28x28 PNG file
ownPicArr = ownPicArr / 255.0
pred = model.predict(ownPicArr)
I got the following error:
ValueError: Error when checking input: expected flatten_input to have 3 dimensions, but got array with shape (28, 28)
How can I solve this problem? Thank you...
Even if the colours of your picture were inverted, this is how you could perform the predictions using OpenCV:
import cv2
import numpy as np
import tensorflow as tf
from PIL import Image

image = cv2.imread(imagePath)
image_from_array = Image.fromarray(image, 'RGB')
size_image = image_from_array.resize((28, 28))
p = np.expand_dims(size_image, 0)
img = tf.cast(p, tf.float32)
pred = model.predict(img)
First we read the image using OpenCV, which stores it as an array. We then convert the array to a PIL image and specify the colour channels. After resizing the image, we create a batch of a single image, change the datatype to float32 (or whatever datatype matches your model), and finally make the predictions.
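Alternatively, if your PNG already reads in as a 28x28 greyscale array, the same batching idea applies directly to your original code (a sketch, assuming the model was trained on inputs scaled to [0, 1]):
import imageio
import numpy as np

ownPicArr = imageio.imread(filename)                 # shape (28, 28)
ownPicArr = ownPicArr / 255.0
pred = model.predict(np.expand_dims(ownPicArr, 0))   # shape (1, 28, 28): a batch of one image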
I have a numpy array representation of an image and I want to turn it into a tensor so I can feed it through my pytorch neural network.
I understand that the neural network takes in transformed tensors arranged as [3,100,100] rather than [100,100,3], that the pixel values are rescaled, and that the images must be passed in batches.
So I did the following:
import cv2
import numpy as np
import torch

my_img = cv2.imread('testset/img0.png')
my_img.shape  # returns (100, 100, 3), a 3-channel image with 100x100 resolution
my_img = np.transpose(my_img,(2,0,1))
my_img.shape #returns [3,100,100]
#convert the numpy array to tensor
my_img_tensor = torch.from_numpy(my_img)
#rescale to be [0,1] like the data it was trained on by default
my_img_tensor *= (1/255)
#turn the tensor into a batch of size 1
my_img_tensor = my_img_tensor.unsqueeze(0)
#send image to gpu (note: .to(device) returns a new tensor, so assign it back)
my_img_tensor = my_img_tensor.to(device)
#put forward through my neural network.
net(my_img_tensor)
However this returns the error:
RuntimeError: _thnn_conv2d_forward is not implemented for type torch.ByteTensor
The problem is that the input you give to your network is of type ByteTensor, while only float operations are implemented for conv-like operations. Try the following:
my_img_tensor = my_img_tensor.type('torch.DoubleTensor')
# for converting to double tensor
Source PyTorch Discussion Forum
Thanks to AlbanD
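Alternatively (a sketch of the same idea), you can convert to a 32-bit float tensor right after creating it, which also makes the rescaling a true float division:
my_img_tensor = torch.from_numpy(my_img).float()  # FloatTensor instead of ByteTensor
my_img_tensor /= 255.0                            # rescale to [0, 1] as float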
There are other posts with similar questions, but none of the answers are helping me. I'm new to this CNN world.
I followed this tutorial for training a CNN with Keras, using Theano as the backend, on the MNIST dataset. Now I want to pass my own jpg image to the CNN, but I don't know how to reshape it. Can you help me please? I'm super new at this.
So far, I tried this to reshape it:
image = np.expand_dims(image, axis=0)
image = preprocess_input(image)
but I get the following error when predicting:
ValueError: Error when checking : expected conv2d_1_input to have shape (None, 1, 28, 28) but got array with shape (1, 3, 28, 28)
As you can see, my CNN uses width = 28, height = 28 and depth =1.
Try using NumPy for reshaping. Since you are using a 2D convolutional model that expects input of shape (None, 1, 28, 28), reshape the (single-channel) image to add the batch and channel dimensions:
image = np.reshape(image, (1, 1, 28, 28))
The error message shows that the network expects the image shape to be 1*28*28, but your input is 3*28*28. I guess the image you are inputting is a colour image with 3 channels (RGB), while the network expects a grayscale image with one channel.
When you call OpenCV to read the image, please use the code below:
img = cv2.imread(imgfile, cv2.IMREAD_GRAYSCALE)
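Putting the two steps together, a minimal sketch for your model, assuming (as the error message suggests) a Theano-style channels-first input of shape (1, 1, 28, 28) and pixel values scaled to [0, 1]:
import cv2
import numpy as np

img = cv2.imread(imgfile, cv2.IMREAD_GRAYSCALE)   # single channel
img = cv2.resize(img, (28, 28))                   # in case the file is not already 28x28
img = img.reshape(1, 1, 28, 28).astype('float32') / 255.0
pred = model.predict(img)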
Simply use:
image = np.reshape(image, (28, 28, 1))
I want to capture frames from a video with Python and OpenCV and then classify the captured Mat images with TensorFlow. The problem is that I don't know how to convert the Mat format to a 3D Tensor variable. This is how I am doing it now with TensorFlow (loading the image from a file):
image_data = tf.gfile.FastGFile(imagePath, 'rb').read()
with tf.Session() as sess:
softmax_tensor = sess.graph.get_tensor_by_name('final_result:0')
predictions = sess.run(softmax_tensor,
{'DecodeJpeg/contents:0': image_data})
I would appreciate any help, thanks in advance.
Load the OpenCV image using imread, then convert it to a numpy array.
For feeding into Inception v3, you need to use the Mul:0 Tensor as the entry point; this expects a 4-dimensional Tensor that has the layout [Batch index, Width, Height, Channel].
The last three are perfectly fine coming from a cv::Mat; the first one just needs a new axis at position 0, as you are feeding a single image rather than a batch of images.
The code looks like:
import cv2
import numpy as np

#Loading the file
img2 = cv2.imread(file)
#Resize to the format the Mul:0 Tensor expects
img2 = cv2.resize(img2, dsize=(299, 299), interpolation=cv2.INTER_CUBIC)
#Numpy array
np_image_data = np.asarray(img2)
#maybe insert the float conversion here - see the edit remark!
np_final = np.expand_dims(np_image_data, axis=0)
#now feeding it into the session:
#[... initialization of session and loading of graph etc]
predictions = sess.run(softmax_tensor,
{'Mul:0': np_final})
#fin!
Kind regards,
Chris
Edit: I just noticed that the inception network wants intensity values normalized as floats to [-0.5, 0.5], so please use this code to convert them before building the RGB image:
np_image_data=cv2.normalize(np_image_data.astype('float'), None, -0.5, .5, cv2.NORM_MINMAX)
With TensorFlow 2.0 and OpenCV 4.2.0, you can convert it this way:
import numpy as np
import tensorflow as tf
import cv2 as cv
width = 32
height = 32
#Load image by OpenCV
img = cv.imread('img.jpg')
#Resize to respect the input_shape
inp = cv.resize(img, (width, height))
#Convert img to RGB
rgb = cv.cvtColor(inp, cv.COLOR_BGR2RGB)
#Optional, but I recommend it (float conversion and converting img to a tensor image)
rgb_tensor = tf.convert_to_tensor(rgb, dtype=tf.float32)
#Add a batch dimension to rgb_tensor
rgb_tensor = tf.expand_dims(rgb_tensor, 0)
#Now you can use rgb_tensor to predict a label, for example:
#Load a pretrained model, made from: https://www.tensorflow.org/tutorials/images/cnn
model = tf.keras.models.load_model('cifar10_model.h5')
#Create probability model
probability_model = tf.keras.Sequential([model,
tf.keras.layers.Softmax()])
#Predict label
predictions = probability_model.predict(rgb_tensor, steps=1)
It looks like you're using the pre-trained and pre-defined Inception model, which has a tensor named DecodeJpeg/contents:0. If so, this tensor expects a scalar string containing the bytes for a JPEG image.
You have a couple of options, one is to look further down the network for the node where the JPEG is converted to a matrix. I'm not sure what the MAT format is, but this will be a [height, width, colour_depth] representation. If you can get your image in that format you can replace the DecodeJpeg... string with the name of the node you want to feed into.
The other option is to simply convert your images to JPEGs and feed them straight in.
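A minimal sketch of that second option, assuming img is your OpenCV Mat (a BGR numpy array) and sess and softmax_tensor are set up as in your question:
import cv2

# re-encode the in-memory Mat as JPEG bytes and feed those to the DecodeJpeg input
jpeg_bytes = cv2.imencode('.jpg', img)[1].tobytes()
predictions = sess.run(softmax_tensor, {'DecodeJpeg/contents:0': jpeg_bytes})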
You should be able to convert the OpenCV Mat format to a numpy array as:
np_image_data = np.asarray(image_data)
Once you have the data as a numpy array you can pass it to TensorFlow through a feeding mechanism, as in the link that @thesonyman101 referenced:
feed_dict = {some_tf_input:np_image_data}
predictions = sess.run(some_tf_output, feed_dict=feed_dict)
In my case I had to read an image from a file, do some processing, and then inject it into Inception to obtain the output from a features layer, called last_layer.
My solution is short but effective.
img = cv2.imread(file)
... do some processing
img_as_string = cv2.imencode('.jpg', img)[1].tostring()
features = sess.run(last_layer, {'DecodeJpeg/contents:0': img_as_string})