How can I reshape these images as 2D image tensors? - python

I am currently working with RGB images loaded as tensors, and I would like to reshape them into 2D tensors so I can run deep neural networks on them.
The shape I am currently working with is:
images.shape
torch.Size([32, 3, 244, 244])
I don't know how to deal with the last two dimensions, or how to flatten the 3 color channels.

Your requirement is too hazy and it's unclear what you want to achieve with these images. Do they come with labels? If not, do you want to use an unsupervised method such as an autoencoder? Looking at the shape of your images tensor:
torch.Size([32, 3, 244, 244])
This means that there are 32 color (RGB) images in this tensor. If your definition of 2D means converting them to grayscale images, then you can use the torchvision library.
images = [torchvision.transforms.ToPILImage()(img) for img in images]
images = [torchvision.transforms.Grayscale()(img) for img in images]
And to convert the PIL grayscale images back to torch tensor, use:
images = [torchvision.transforms.ToTensor()(img) for img in images]
images = torch.stack(images).to(device)
Now the shape of images is [32, 1, 244, 244], because ToTensor keeps a single channel dimension for grayscale images; call images.squeeze(1) to get [32, 244, 244].
Flattening a high-resolution image at the very first layer is not recommended. That is why models in the computer vision literature usually start with a few convolution layers that downsample the input into smaller feature maps.
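If one grayscale plane per image is the goal, the PIL round trip can also be avoided by taking a weighted sum over the channel dimension. This is only a minimal sketch: the luma weights (0.299, 0.587, 0.114) are an assumption, and random data stands in for the real batch.

```python
import torch

# Random stand-in for the batch from the question: 32 RGB images, 244x244.
images = torch.rand(32, 3, 244, 244)

# Luma weights for R, G, B (an assumption; other weightings are possible).
weights = torch.tensor([0.299, 0.587, 0.114]).view(1, 3, 1, 1)

# Weighted sum over the channel dimension, keeping a singleton channel axis.
gray = (images * weights).sum(dim=1, keepdim=True)
print(gray.shape)      # torch.Size([32, 1, 244, 244])

# Drop the channel axis to get one 2D plane per image.
gray_2d = gray.squeeze(1)
print(gray_2d.shape)   # torch.Size([32, 244, 244])
```

The [32, 1, 244, 244] form is what convolution layers expect for single-channel input; the squeezed form is only needed if each image really has to be a plain 2D tensor.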

You can do something simple like this
image = image.view(image.shape[0], -1)
That will flatten the image to two dimensions: your batch size, and the product of the three other dimensions.
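As a quick check of the resulting shape, here is a small sketch using a random tensor in place of the real batch (the variable names are illustrative only):

```python
import torch

# Random stand-in for the batch from the question.
images = torch.rand(32, 3, 244, 244)

# Keep the batch dimension, merge channels, height and width into one axis.
flat = images.view(images.shape[0], -1)
print(flat.shape)   # torch.Size([32, 178608]), since 3 * 244 * 244 = 178608

# torch.flatten gives the same result and also works on non-contiguous tensors.
flat_alt = torch.flatten(images, start_dim=1)
```

Each row of flat is one image, so it can be passed directly to a fully connected layer with 178608 input features.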

Related

What does the 1 in torch.Size([64, 1, 28, 28]) mean when I check a tensor shape?

I'm following this tutorial on towardsdatascience.com because I wanted to try the MNIST dataset using Pytorch since I've already done it using keras.
So in Step 2, knowing the dataset better, they print the trainloader's shape and it returns torch.Size([64, 1, 28, 28]). I understand that 64 is the number of images in that loader and that each one is a 28x28 image but what does the 1 mean exactly?
It simply means that each 28x28 image has 1 channel, i.e. it is a grayscale image. A color image would have 3 there instead of 1, one for each RGB channel.
It's the number of channels in the input. In the MNIST dataset the images are grayscale, so each image has a single channel. Note that PyTorch puts the channel dimension first, so one image has shape [1, 28, 28] rather than [28, 28, 1].
Once the images are loaded in batches, the batch dimension is added in front, which gives the shape you are seeing.
Refer to the MNIST dataset link, where it states:
The original black and white (bilevel) images from NIST were size
normalized to fit in a 20x20 pixel box while preserving their aspect
ratio. The resulting images contain grey levels as a result of the
anti-aliasing technique used by the normalization algorithm. the
images were centered in a 28x28 image by computing the center of mass
of the pixels, and translating the image so as to position this point
at the center of the 28x28 field.
In short, it's just the number of channels your 28x28 image has.
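To make the role of each dimension concrete, here is a small sketch with NumPy arrays of zeros standing in for real images (the names are illustrative):

```python
import numpy as np

# One grayscale image: height x width only.
gray_image = np.zeros((28, 28), dtype=np.float32)

# Add the channel axis in front, as PyTorch expects: (channels, height, width).
chw = gray_image[np.newaxis, :, :]
print(chw.shape)         # (1, 28, 28)

# Stack 64 such images into one batch: (batch, channels, height, width).
batch = np.stack([chw] * 64)
print(batch.shape)       # (64, 1, 28, 28)

# An RGB batch of the same size would have 3 in the channel position.
rgb_batch = np.zeros((64, 3, 28, 28), dtype=np.float32)
print(rgb_batch.shape)   # (64, 3, 28, 28)
```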
Note that the 64 is the batch size, i.e. the number of images grouped into one batch, not the number of batches. Think of batches as groups: 64 images can form 1 batch of 64 or, if you change the batch size, 2 batches of 32 images each. The batch size influences the memory and computation needed per training step.
And, of course, depending on the library used (especially in the training/testing loop), the code looks slightly different with just 1 batch than with X batches.
For example, with 50 epochs: if the whole dataset is a single batch, the training loop simply trains the model once per epoch. With several batches, you have to loop over the epochs and, inside each epoch, over every batch.
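The nested loop can be sketched in plain Python. The helper iterate_batches is hypothetical, the dataset is a list of placeholder integers, and 2 epochs are used instead of 50 to keep the output short:

```python
def iterate_batches(items, batch_size):
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


dataset = list(range(64))   # 64 placeholder "images"
num_epochs = 2              # shortened from 50 for illustration

steps = 0
for epoch in range(num_epochs):
    for batch in iterate_batches(dataset, batch_size=32):
        # A real training loop would run the forward and backward pass here.
        steps += 1

print(steps)   # 4: 2 epochs x 2 batches of 32
```

With batch_size=64 the inner loop runs once per epoch, which is the single-batch case described above.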

Unexpected shape in numpy array of a pillow image?

I have built a neural network to detect handwritten digits using the MNIST dataset.
The network takes an input shape of (28,28) as the training MNIST images are 28x28 grayscale.
I now want to test my neural network on some of my own handwriting.
The images I have are not 28x28 grayscale images so I am trying to convert them so that my model will accept them to make predictions.
Currently I have the following:
from PIL import Image
import numpy as np
img = Image.open('image.png').convert('LA')
newImg = img.resize((28,28), Image.LANCZOS)  # Image.ANTIALIAS was the older name for this filter
toPredict = np.array(newImg)
However, this is giving me a numpy array of shape (28, 28, 2), which I don't understand.
After conversion to grayscale and resizing I should have a 28x28 shaped array (28 pixels height multiplied by 28 pixels width).
I don't understand why the shape is not that.
Can anyone help me get the shape to be 28x28 (and explain why it isn't already) so I can pass this to my neural network?
Thank you!
You're almost there.
img = Image.open('image.png').convert('LA') is 28x28x2 because it is greyscale with an alpha channel.
Instead convert it to just greyscale with:
img = Image.open('image.png').convert('L')
You can see more information on the modes here:
https://pillow.readthedocs.io/en/latest/handbook/concepts.html#modes
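The difference between the two modes is easy to check on a synthetic image. In this sketch a blank in-memory image stands in for 'image.png':

```python
import numpy as np
from PIL import Image

# Blank stand-in image; a real script would use Image.open('image.png').
img = Image.new('RGB', (100, 80), color='white')

# 'LA' = luminance + alpha, so the array gets two channels.
with_alpha = np.array(img.convert('LA').resize((28, 28)))
print(with_alpha.shape)   # (28, 28, 2)

# 'L' = luminance only, so the array is a plain 2D grid.
gray_only = np.array(img.convert('L').resize((28, 28)))
print(gray_only.shape)    # (28, 28)
```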

Tensorflow example with own handwritten images

I have tried the TensorFlow example with the Zalando (Fashion) MNIST dataset here:
https://www.tensorflow.org/tutorials/keras/basic_classification
After that I replaced the clothing images with the MNIST handwritten digit database, which also works.
Now I want to train the model on the MNIST handwritten digit database, take a picture of my own handwritten "1", and let the AI guess the number.
After the training code, I appended a few lines.
What I tried is this:
ownPicArr = imageio.imread(filename) #it is a 28x28 PNG file
ownPicArr = ownPicArr / 255.0
pred = model.predict(ownPicArr)
I got following error:
ValueError: Error when checking input: expected flatten_input to have 3 dimensions, but got array with shape (28, 28)
How can I solve this problem? Thank you.
Even if the colours of your picture are inverted, this is how you could run the prediction using OpenCV, reading the picture as grayscale so that its shape matches the model input:
import cv2
import numpy as np
import tensorflow as tf
# imagePath is the path to your own picture
image = cv2.imread(imagePath, cv2.IMREAD_GRAYSCALE)
size_image = cv2.resize(image, (28, 28))
p = np.expand_dims(size_image, 0)
img = tf.cast(p, tf.float32) / 255.0
pred = model.predict(img)
First we read the image with OpenCV, which stores it as an array; IMREAD_GRAYSCALE gives a single 2D plane instead of three colour channels. After resizing the image we create a batch containing a single image, which adds the third dimension the error message asks for. Finally we cast to float32 (or whatever datatype matches your model), scale to [0, 1] as in the question, and make the prediction.
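The error in the question comes down to the missing batch dimension. A minimal NumPy-only sketch, with random pixels standing in for the 28x28 PNG:

```python
import numpy as np

# Random stand-in for the 28x28 grayscale picture read from disk.
own_pic = np.random.randint(0, 256, size=(28, 28), dtype=np.uint8)

# Scale to [0, 1] as in the question.
scaled = own_pic.astype(np.float32) / 255.0

# Add a leading batch axis: (28, 28) -> (1, 28, 28).
batch = np.expand_dims(scaled, axis=0)
print(batch.shape)   # (1, 28, 28)
```

An array of this shape has the 3 dimensions that the error message says the model's first layer expects, so it is what would be passed to the model.predict call from the question.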

Inputting numpy array images into a PyTorch neural net

I have a numpy array representation of an image and I want to turn it into a tensor so I can feed it through my pytorch neural network.
I understand that the neural networks take in transformed tensors which are not arranged in [100,100,3] but [3,100,100] and the pixels are rescaled and the images must be in batches.
So I did the following:
import cv2
my_img = cv2.imread('testset/img0.png')
my_img.shape #returns [100,100,3] a 3 channel image with 100x100 resolution
my_img = np.transpose(my_img,(2,0,1))
my_img.shape #returns [3,100,100]
#convert the numpy array to tensor
my_img_tensor = torch.from_numpy(my_img)
#rescale to be [0,1] like the data it was trained on by default
my_img_tensor *= (1/255)
#turn the tensor into a batch of size 1
my_img_tensor = my_img_tensor.unsqueeze(0)
#send image to gpu
my_img_tensor.to(device)
#put forward through my neural network.
net(my_img_tensor)
However this returns the error:
RuntimeError: _thnn_conv2d_forward is not implemented for type torch.ByteTensor
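The problem is that the input you give your network is a ByteTensor, while convolution-like operations are implemented only for floating-point types. Convert the tensor first, before the *= (1/255) rescaling, since an integer tensor cannot hold the fractional values:
my_img_tensor = my_img_tensor.float()
# converts to float32, the default dtype of a network's weights; use .double() only if the network itself was converted to double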
Source PyTorch Discussion Forum
Thanks to AlbanD
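Putting the steps from the question together with the dtype fix, the preprocessing might look like the following sketch. A random uint8 array stands in for the output of cv2.imread, and the final call to the network is omitted because net is not defined here:

```python
import numpy as np
import torch

# Random stand-in for cv2.imread(...): height x width x channels, uint8.
my_img = np.random.randint(0, 256, size=(100, 100, 3), dtype=np.uint8)

# HWC -> CHW, the layout PyTorch convolution layers expect.
chw = np.transpose(my_img, (2, 0, 1))

# Convert to float32 first, then rescale to [0, 1].
my_img_tensor = torch.from_numpy(chw).float() / 255.0

# Turn the tensor into a batch of size 1.
my_img_tensor = my_img_tensor.unsqueeze(0)

# .to() returns a new tensor, so the result has to be assigned.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
my_img_tensor = my_img_tensor.to(device)

print(my_img_tensor.dtype, my_img_tensor.shape)
```

Note the assignment on the .to(device) line: in the question the result of that call was discarded, so the tensor would have stayed on the CPU.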

Keras CNN with varying image sizes

I'm trying to use the VOC2012 dataset for training a CNN. For my project, I require B&W data, so I extracted the R components. So far so good. The trouble is that the images are of different sizes, so I can't figure out how to pass it to the model. I compiled my model, and then created my mini-batches of size 32 as below (where X_train and Y_train are the paths to the files).
for x in X_train:
    img = plt.imread(x)
    img = img.reshape(*(img.shape), 1)
    X.append(img)
for y in Y_train:
    img = plt.imread(y)
    img = img.reshape(*(img.shape), 1)
    Y.append(img)
model.train_on_batch(np.array(X), np.array(Y))
However, I suspect that because the images are all of different sizes, the numpy array has a shape (32,) rather than (32, height, width, 1) as I'd expect. How do I take care of this?
According to some sources, one is indeed able to train at least some architectures with varying input sizes. (Quora, Cross Validated)
When it comes to generating an array of arrays varying in size, one might simply use a Python list of NumPy arrays, or an ndarray of type object to collect all the image data. Then in the training process, the Quora answer mentioned that only batch size 1 can be used, or one might clump several images together based on the sizes. Even padding with zeros could be used to make the images evenly sized, but I can't say much about the validity of that approach.
Best of luck in your research!
Example code for illustration:
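import numpy as np
# Generate 10 "images" with different sizes, kept in a plain Python list
images = [np.zeros((i+5, i+10)) for i in range(10)]
# Or as an object array; dtype=object is needed because the shapes differ
images = np.array([np.zeros((i+5, i+10)) for i in range(10)], dtype=object)
# Or an empty array to append to
images = np.array([], dtype=object)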
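The zero-padding option mentioned above can be sketched as follows. The helper pad_to_common_size is hypothetical, it pads at the bottom and right only, and whether padding is appropriate for a given model is left open, as in the answer:

```python
import numpy as np


def pad_to_common_size(images):
    """Zero-pad 2D arrays to the largest height/width and stack them.

    Returns an array of shape (n_images, max_height, max_width, 1).
    """
    max_h = max(img.shape[0] for img in images)
    max_w = max(img.shape[1] for img in images)
    batch = np.zeros((len(images), max_h, max_w, 1), dtype=np.float32)
    for i, img in enumerate(images):
        h, w = img.shape
        # Copy the image into the top-left corner; the rest stays zero.
        batch[i, :h, :w, 0] = img
    return batch


# 10 "images" of ones with different sizes, from 5x10 up to 14x19.
images = [np.ones((i + 5, i + 10)) for i in range(10)]
batch = pad_to_common_size(images)
print(batch.shape)   # (10, 14, 19, 1)
```

The result is a regular 4D array of the (batch, height, width, 1) form the question expected, at the cost of the network also seeing the padded zeros.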
