Save PyTorch 4D tensor as image - python

I have a 4-D PyTorch tensor that I would like to save to disk as a .jpg file.
My tensor has the following size:
print(image_tensor.size())
>>>torch.Size([1, 3, 400, 711])
I can view the entire tensor as one image within my IDE:
ax1.imshow(im_convert(image_tensor))
Since I am able to view the entire tensor as one image, I am assuming there is a way to also save it as such. However, when I try to save the image, it looks like it only saves the blue color channel. I would like to save the entire tensor as a single image.
img1 = image_tensor[0]
save_image(img1, 'img1.jpg')

In PyTorch, the following snippet works and saves the image:
import torch
from torchvision.utils import save_image
tensor = torch.rand(2, 3, 400, 711)
img1 = tensor[0]  # first image of the batch, shape (3, 400, 711)
save_image(img1, 'img1.png')
Before saving, check the shape of img1 in case something unexpected happened to it along the way.
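To make the shape check concrete, here is a minimal sketch of how a (1, 3, H, W) batch maps onto the (H, W, 3) layout that most image writers expect. It uses a NumPy array as a stand-in for the tensor (for example one obtained with image_tensor.detach().cpu().numpy()) and assumes the values lie in [0, 1]; the helper name chw_batch_to_hwc_uint8 is made up for illustration.

```python
import numpy as np

def chw_batch_to_hwc_uint8(batch):
    """Turn a (1, C, H, W) float array in [0, 1] into an (H, W, C) uint8 array."""
    if batch.ndim != 4 or batch.shape[0] != 1:
        raise ValueError(f"expected shape (1, C, H, W), got {batch.shape}")
    image = batch[0]                         # drop the batch dimension -> (C, H, W)
    image = np.transpose(image, (1, 2, 0))   # channels last -> (H, W, C)
    image = np.clip(image, 0.0, 1.0)         # guard against out-of-range values
    return np.round(image * 255).astype(np.uint8)

# Random stand-in with the same shape as the tensor in the question
batch = np.random.rand(1, 3, 400, 711)
hwc = chw_batch_to_hwc_uint8(batch)
print(hwc.shape, hwc.dtype)
```

If the shape printed for your own data does not end in 3, the channel dimension was probably lost or reordered somewhere before the save call.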

Related

Tensorflow dataset generator error in channel dimension

I am trying to train a model for an image segmentation task, and I am using a generator to yield the dataset. I have trained it multiple times before, but recently I started getting this error:
ValueError: 'generator' yielded an element of shape (128,192,3) where an element of shape (128,192,1) was expected.
When I print the shapes of the image and mask that come out of the generator, I get:
image:(128,192,1)
mask:(128,192,3)
Each generator element contains both the image and the mask, loaded from the TensorFlow dataset. How can the mask of a grayscale image end up with 3 channels when the input image is grayscale with 1 channel?
How can I convert the mask back to a single channel?
Unfortunately, I cannot post the complete code to reproduce this, for privacy reasons.
Without knowing more about the library you're using to read the image, it's hard to say. Assuming you're using PIL, I would do:
from PIL import Image
img = Image.open('im1.jpg','r')
img = img.convert('L')
'L' is the grayscale mode; the other modes are listed at https://pillow.readthedocs.io/en/stable/handbook/concepts.html
While you're updating your code, you could also look at the Keras preprocessing module, in which case the code becomes:
import numpy as np
import tensorflow as tf
# Returns a PIL image (image_path and model are defined elsewhere)
image = tf.keras.preprocessing.image.load_img(image_path, color_mode="grayscale")
input_arr = tf.keras.preprocessing.image.img_to_array(image)
input_arr = np.array([input_arr])  # Convert single image to a batch.
predictions = model.predict(input_arr)
More information -> https://www.tensorflow.org/api_docs/python/tf/keras/preprocessing/image/load_img
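If the mask has already been loaded as an array, the same fix can be sketched directly in NumPy. This assumes the three channels of the mask are identical copies of one grayscale channel, which is common when a grayscale file is read as RGB but is not confirmed by the question; the helper name mask_to_single_channel is hypothetical.

```python
import numpy as np

def mask_to_single_channel(mask):
    """Collapse an (H, W, 3) mask to (H, W, 1); pass (H, W, 1) through unchanged."""
    if mask.ndim == 3 and mask.shape[-1] == 1:
        return mask
    if mask.ndim == 3 and mask.shape[-1] == 3:
        # Assumption: all three channels are identical, so keeping the first loses nothing
        same = (np.array_equal(mask[..., 0], mask[..., 1])
                and np.array_equal(mask[..., 0], mask[..., 2]))
        if not same:
            raise ValueError("channels differ; convert to grayscale explicitly instead")
        return mask[..., :1]
    raise ValueError(f"unexpected mask shape {mask.shape}")

# Toy mask with the shapes from the question
gray = np.random.randint(0, 2, size=(128, 192), dtype=np.uint8)
mask_rgb = np.stack([gray, gray, gray], axis=-1)   # shape (128, 192, 3)
mask = mask_to_single_channel(mask_rgb)
print(mask.shape)
```

Slicing with [..., :1] rather than [..., 0] keeps the trailing channel axis, so the result matches the (128,192,1) shape the generator signature expects.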

Restore grayscale image from jpg file

I have a 2-D NumPy array that I save as a .jpg image.
For simplicity, let's assume my array contains the numbers 0 to 255.
My problem is that once I save this array as a .jpg image, I can't restore its values.
So my code is:
import cv2
import numpy as np
from scipy.ndimage import imread  # note: removed from recent SciPy releases
arr = np.array(range(256)).reshape(16, 16)
cv2.imwrite('arr.jpg', arr)
restored = imread('arr.jpg')
print((arr == restored).sum())  # output is 224 rather than 256, i.e. 32 pixels are different!
So, how can I save the image so that I can view it and still restore the values afterwards?
Any help will be appreciated!
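One likely explanation is that JPEG is a lossy format, so small changes in pixel values after a save and reload are expected, whereas a lossless format such as PNG preserves them. The sketch below compares the two in memory. It uses Pillow instead of cv2 only to avoid touching the disk, and the helper round_trip is made up for illustration.

```python
import io
import numpy as np
from PIL import Image

arr = np.arange(256, dtype=np.uint8).reshape(16, 16)

def round_trip(array, fmt):
    """Encode a 2-D uint8 array in the given image format in memory and decode it again."""
    buffer = io.BytesIO()
    Image.fromarray(array).save(buffer, format=fmt)
    buffer.seek(0)
    return np.array(Image.open(buffer))

png_restored = round_trip(arr, "PNG")
jpg_restored = round_trip(arr, "JPEG")
print("PNG identical:", np.array_equal(arr, png_restored))
print("JPEG pixels that changed:", int((arr != jpg_restored).sum()))
```

The number of changed JPEG pixels depends on the encoder and its quality setting, so it need not match the 32 reported in the question.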

How to interpret the file mean.binaryproto when loading a Neural Network?

I want to load a Neural Network that has been trained with caffe for image classification.
The NN contains a file mean.binaryproto which has the means to be subtracted before inputting an image to be classified.
I am trying to understand what is contained in this file so I used Google Colab to see what is inside it.
The code to load it is the following:
# Load the Drive helper and mount
from google.colab import drive
# This will prompt for authorization.
drive.mount('/content/drive')
!ls "/content/drive/My Drive"
#install packages
!apt install -y caffe-cuda
!apt update
!apt upgrade
!apt dist-upgrade
!ls "/content/drive/My Drive/NeuralNetwork/CNRPark-Trained-Models/mAlexNet-on-CNRPark/"
import caffe
import numpy as np
with open('/content/drive/My Drive/NeuralNetwork/CNRPark-Trained-Models/mAlexNet-on-CNRPark/mean.binaryproto', 'rb') as f:
    blob = caffe.proto.caffe_pb2.BlobProto()
    blob.ParseFromString(f.read())
arr = np.array( caffe.io.blobproto_to_array(blob) )
print(arr.shape)
out = arr[0]
data = np.array(blob.data).reshape([blob.channels, blob.height, blob.width])
print (data.shape)
print(data[0])
#display the mean image
from PIL import Image
from IPython.display import Image as Im, display
display(Image.fromarray(data[0], 'RGB'))
which outputs:
(1, 3, 256, 256)
(3, 256, 256)
What I have understood is that the file contains the means, and since the images are 3-channel images, there is a mean for each channel.
However, I was expecting a single value per channel; instead I found a 256x256 array. Does this mean that a mean has been taken for each pixel of each channel?
Another question: I want to use this NN with OpenCV, which uses BGR instead of RGB. How can I know whether the 3x256x256 mean uses RGB or BGR?
The link to the model is this. The model I am looking at is contained in the zip file CNRPark-Trained-Models.zip within the folder: mAlexNet-on-CNRPark.
However I was expecting a single value per channel instead I found a
256x256 array: does it mean that they took a mean on each pixel of each
channel?
Exactly. Judging by its shape, mean.binaryproto holds the average image of some dataset, which means the mean was taken for each pixel (feature) of each channel.
This should not be confused with the mean pixel, which, as you stated, is a single value for each channel.
For example, the mean pixel was adopted by Very Deep Convolutional Networks for Large-Scale Image Recognition. According to their paper:
The only pre-processing we do is subtracting the mean RGB value,
computed on the training set, from each pixel
In other words, if you consider an RGB image to be 3 feature arrays of size N x N, the average image keeps one mean per pixel position (so it is still N x N per channel), while the mean pixel also averages over all positions, leaving a single value per channel.
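The difference between the two is just the set of axes the mean is taken over. A minimal NumPy sketch, using a random made-up dataset laid out as (N, C, H, W):

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical dataset: 10 images, 3 channels, 256 x 256 pixels, laid out as (N, C, H, W)
images = rng.random((10, 3, 256, 256))

mean_image = images.mean(axis=0)          # average over images -> (3, 256, 256)
mean_pixel = images.mean(axis=(0, 2, 3))  # average over images and positions -> (3,)

print(mean_image.shape)
print(mean_pixel.shape)
# The mean pixel is also the spatial average of the mean image
print(np.allclose(mean_image.mean(axis=(1, 2)), mean_pixel))
```

So a mean pixel can always be derived from an average image such as the one in mean.binaryproto, but not the other way around.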
Another question is the following: I want to use such NN with OpenCV
which instead of RGB uses BGR: How to know if the mean 3x256x256 uses
RGB or BGR?
I doubt the binary file you are reading stores any information about its color format, but a practical way to figure it out is to plot the image using matplotlib and see if the colors make sense.
Take face images, for example: if the red and blue channels are swapped, the skin tone will look bluish.
In fact, the image above is an example of an average image (of face images) :)
You could also assume it is BGR, since OpenCV uses this color format.
However, the reliable way to find out how this mean.binaryproto was generated is to look at the model's repository or to ask the owner of the model.
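Whichever order the file turns out to use, switching between RGB and BGR is just a reversal of the channel axis. A small sketch, with a toy array standing in for the (3, 256, 256) mean:

```python
import numpy as np

# Toy stand-in for the (3, 256, 256) array read from mean.binaryproto
mean_chw = np.arange(3 * 4 * 4, dtype=np.float32).reshape(3, 4, 4)

# Reversing the channel axis converts RGB <-> BGR (the operation is its own inverse)
mean_swapped = mean_chw[::-1]

# Channels-last layout (H, W, C), as expected by matplotlib and OpenCV
mean_hwc = np.transpose(mean_chw, (1, 2, 0))
mean_hwc_swapped = mean_hwc[..., ::-1]

print(mean_swapped.shape, mean_hwc.shape)
```

Plotting both mean_hwc and mean_hwc_swapped side by side is one way to apply the visual check described above.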
import caffe
import cv2
import numpy as np
mean_file = "path/to/file/mean.binaryproto"
# Convert the mean file to an image
blob = caffe.proto.caffe_pb2.BlobProto()
with open(mean_file, 'rb') as f:
    blob.ParseFromString(f.read())
# blobproto_to_array returns (num, channels, height, width); take the first entry
arr = np.uint8(np.array(caffe.io.blobproto_to_array(blob))[0])
# Reorder (channels, height, width) to (height, width, channels) for OpenCV,
# using the array's own size instead of a hard-coded 128 x 200
img = np.transpose(arr, (1, 2, 0))
cv2.imwrite(mean_file.replace(".binaryproto", ".bmp"), img)

PIL image to array and back

EDIT: Sorry, the first version of the code was bullshit, I tried to remove useless information and made a mistake. Problem stays the same, but now it's the code I actually used
I think my problem is probably very basic, but I can't find a solution. I just wanted to play around with PIL: convert an image to an array and back, then save the image. It should look the same, right? In my case the new image is just gibberish. It seems to have some structure, but it is not the picture of a plane it should be:
def array_image_save(array, image_path='plane_2.bmp'):
    image = Image.fromarray(array, 'RGB')
    image.save(image_path)
    print("Saved image: {}".format(image_path))
im = Image.open('plane.bmp').convert('L')
w, h = im.size
array_image_save(np.array(list(im.getdata())).reshape((w, h)))
Not entirely sure what you are trying to achieve but if you just want to transform the image to a numpy array and back, the following works:
from PIL import Image
import numpy as np
def array_image_save(array, image_path='plane_2.bmp'):
    image = Image.fromarray(array)
    image.save(image_path)
    print("Saved image: {}".format(image_path))
im = Image.open('plane.bmp')
array_image_save(np.array(im))
You can just pass a PIL image to np.array and it takes care of the proper shaping. The reason you get distorted data is that you convert the PIL image to greyscale (.convert('L')) but then try to save it as RGB.
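A second thing worth checking in the question's code, offered here as an observation rather than a confirmed cause: im.size returns (width, height), while NumPy shapes are (rows, columns), i.e. (height, width). For a non-square image, reshaping the flat pixel data to (w, h) cuts the rows in the wrong places. A NumPy-only sketch with a toy 2 x 3 "image":

```python
import numpy as np

height, width = 2, 3
# Row-major pixel values, as a flat sequence like the one im.getdata() yields
flat = np.arange(height * width)          # [0, 1, 2, 3, 4, 5]

correct = flat.reshape((height, width))   # rows stay intact: [[0, 1, 2], [3, 4, 5]]
wrong = flat.reshape((width, height))     # rows cut at the wrong place: [[0, 1], [2, 3], [4, 5]]

print(correct)
print(wrong)
```

Passing the image straight to np.array, as in the answer above, avoids the manual reshape altogether.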

Image height and width getting swapped when read using opencv imread

When I read an image using the OpenCV imread function, its height and width come out swapped compared to what I expect. My original image has dimensions 610 by 406, but after reading it with imread its dimensions are 406 by 610. Also, if I rotate the original image before passing it to the function, nothing changes: the image that is read still has the original dimensions.
Please see the example code and images for clarification:
Below I have provided the input images: one is the original and the second one is rotated (I rotated it using the Windows rotate command, by right-clicking and selecting 'rotate right'). The output I get for both images is the same. It seems to me that rotating the image did not actually change its shape. I think so because when I tried to put the rotated image here, the preview still showed the un-rotated version, so I had to take a screen capture of it and paste that here.
This is the code:
import cv2
import numpy as np
import sys
import os
image = cv2.imread("C:/img_8075.jpg")
print("image shape: ", image.shape)
cv2.imshow("image", image)
cv2.waitKey(0)
image2 = cv2.imread("C:/img_8075_Rotated.jpg")
print("image shape: ", image2.shape)
cv2.imshow("image", image2)
cv2.waitKey(0)
The result I get is the same for both images:
image shape: (406,610,3)
image shape: (406,610,3)
I am unable to paste input/output pictures here since, it says you should have '10 reputations' and I have just joined.
Any suggestions would be helpful. thanks!
I believe you are just getting the conventions mixed up. OpenCV Mat structures are indexed as (ROW, COLUMN).
So a 1920x1080 image will be 1080 ROWS by 1920 COLUMNS, i.e. (1080, 1920).
Mat.rows represents the image's height, and Mat.cols represents the image's width.
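In Python the same convention shows up in the shape of the array, which is ordered (height, width, channels). A short sketch, using a blank NumPy array as a stand-in for what cv2.imread returns for the image in the question:

```python
import numpy as np

# Stand-in for what cv2.imread returns for a 610-wide, 406-high colour image
image = np.zeros((406, 610, 3), dtype=np.uint8)

height, width, channels = image.shape
print(f"width x height = {width} x {height}")

# Indexing follows the same order: image[row, column] == image[y, x]
image[10, 600] = (255, 255, 255)   # valid: row 10, column 600
```

So a reported shape of (406, 610, 3) is exactly what a 610 by 406 image should give.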
