Tensorflow dataset generator error in channel dimension - python

I am trying to train a model for an image segmentation task, and I am using a generator to yield the dataset. I have trained it multiple times before, but recently I started getting this error:
ValueError:'generator' yielded an element of shape (128,192,3) where an element of shape (128,192,1) was expected.
When I print the shapes of the image and mask that come out of the generator, I get:
image:(128,192,1)
mask:(128,192,3)
The generator yields both the image and the mask, loaded from the TensorFlow dataset. How does the mask of a grayscale image end up with 3 channels when the input image itself has only 1 channel?
How can I convert the mask back to a single channel?
Unfortunately, I cannot post the complete code to reproduce this because it is private.

Without knowing more about the library you're using to read the images, it's hard to say. Assuming you're using PIL, you can convert the image to grayscale right after opening it:
from PIL import Image
img = Image.open('im1.jpg','r')
img = img.convert('L')
'L' is the grayscale mode; the other modes are listed at https://pillow.readthedocs.io/en/stable/handbook/concepts.html
While you're updating your code, you should also look at the Keras preprocessing module. With it, the code becomes:
import numpy as np
import tensorflow as tf

# load_img returns a PIL image; color_mode="grayscale" forces a single channel
image = tf.keras.preprocessing.image.load_img(image_path, color_mode="grayscale")
input_arr = tf.keras.preprocessing.image.img_to_array(image)  # shape (height, width, 1)
input_arr = np.array([input_arr])  # Convert single image to a batch.
predictions = model.predict(input_arr)
More information -> https://www.tensorflow.org/api_docs/python/tf/keras/preprocessing/image/load_img
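If the mask already arrives in the generator as an array, it can also be reduced to one channel there. The following is only a minimal NumPy sketch: it assumes the three channels of the mask are identical copies of the grayscale values (which is what typically happens when a grayscale file is decoded as RGB), and the helper name to_single_channel is made up for illustration.

```python
import numpy as np

def to_single_channel(mask):
    """Return a (H, W, 1) mask from a (H, W), (H, W, 1) or (H, W, 3) array."""
    mask = np.asarray(mask)
    if mask.ndim == 2:
        # No channel axis yet: add one
        return mask[..., np.newaxis]
    if mask.shape[-1] == 1:
        return mask
    # Assumption: all channels hold the same values, so keep only the first
    return mask[..., :1]

# Stand-in for a mask that was decoded with 3 identical channels
rng = np.random.default_rng(0)
gray = rng.integers(0, 2, size=(128, 192, 1), dtype=np.uint8)
mask = np.repeat(gray, 3, axis=-1)

single = to_single_channel(mask)
print(mask.shape, "->", single.shape)
```

If the channels are not identical (for example a color-coded mask), slicing would discard information and a mapping from colors to class indices would be needed instead.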

Related

Error when running reshape on image dataset

I am trying to reshape the images so that I can run GAN models, but I get this error:
AttributeError: 'BatchDataset' object has no attribute 'reshape'
I'm assuming you're trying to use a TensorFlow Dataset with an existing Keras example. In that case, you'll need to use a map function to resize the images and convert them to grayscale, which would match the MNIST input shape I believe you're going for. The example below shows how to do this in your code.
import tensorflow as tf

a = tf.ones([180, 180, 3])
dataset = tf.data.Dataset.from_tensors(a)
# First use tf.image.resize to reduce the image size to 28x28, then use rgb_to_grayscale to convert from 3 channels to 1.
dataset = dataset.map(lambda x: tf.image.rgb_to_grayscale(tf.image.resize(x, (28, 28))))

Save Pytorch 4D tensor as image

I have a 4-D PyTorch tensor that I would like to save to disk as a .jpg.
My tensor is the following size:
print(image_tensor.size())
>>>torch.Size([1, 3, 400, 711])
I can view the entire tensor as one image within my IDE:
ax1.imshow(im_convert(image_tensor))
Since I am able to view the entire tensor as one image, I am assuming there is a way to also save it as such. However, when I try to save the image, it looks like it only saves the blue color channel. I would like to save the entire tensor as a single image.
img1 = image_tensor[0]
save_image(img1, 'img1.jpg')
In PyTorch, this snippet works and saves the image:
from torchvision.utils import save_image
import torch
import torchvision
tensor= torch.rand(2, 3, 400, 711)
img1 = tensor[0]
save_image(img1, 'img1.png')
Before saving the image, check the shape of img1 in case something unexpected happened to it along the way.

Apply mean filter to all images while resizing them by utf code. Working on TensorFlow

In the image, the code uses the UTF code to resize the images. How can I apply a mean filter to all images when loading the dataset?
There are various filters available for image transformation in TensorFlow addons.
Judging from the code in your parse function, you are reading the image and you want to implement your logic inside the parse function.
Note: tensorflow_addons is a separate package and you have to install it separately.
import matplotlib.pyplot as plt
import tensorflow as tf
import tensorflow_addons as tfa

img_raw = tf.io.read_file(img_path)  # string tensor holding the raw file bytes
img = tf.io.decode_image(img_raw)  # tensor with dtype uint8
img = tf.image.convert_image_dtype(img, tf.float32)  # tensor with dtype float32, scaled to [0, 1]
img = tf.image.resize(img, [500, 500])
plt.title("TensorFlow Logo with shape {}".format(img.shape))
_ = plt.imshow(img)
Now apply the mean filter using tfa.image.mean_filter2d(), which is described in the TensorFlow Addons documentation:
mean = tfa.image.mean_filter2d(img, filter_shape=11)
_ = plt.imshow(mean)
TensorFlow Addons provides various other image filters as well.
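To make concrete what a mean filter computes, here is a plain NumPy sketch for a single-channel image. It is not the TensorFlow Addons implementation: the function name box_mean_filter, the reflect padding at the borders, and the restriction to odd window sizes are assumptions chosen for illustration.

```python
import numpy as np

def box_mean_filter(img, size=3):
    """Replace each pixel by the mean of its size x size neighbourhood.

    img is a 2-D array; size is assumed to be odd.
    """
    pad = size // 2
    # Assumption: reflect the image at the borders so every pixel has a full window
    padded = np.pad(img.astype(np.float32), pad, mode="reflect")
    out = np.zeros(img.shape, dtype=np.float32)
    height, width = img.shape
    # Sum the shifted copies of the image that make up the window
    for dy in range(size):
        for dx in range(size):
            out += padded[dy:dy + height, dx:dx + width]
    return out / (size * size)

img = np.zeros((5, 5), dtype=np.float32)
img[2, 2] = 9.0  # a single bright pixel
smoothed = box_mean_filter(img, size=3)
print(smoothed)
```

The single bright pixel is spread evenly over its 3 x 3 neighbourhood, which is the blurring effect a larger window such as filter_shape=11 produces more strongly.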

How does ImageDataGenerator work? Does it rescale input automatically?

I'm trying to train a neural net (autoencoder) by reading '.tif' images from a folder, so I decided to use the ImageDataGenerator class. The image values vary: sometimes the maximum is 4000, sometimes it is 0.5. But when I use this class and its methods (flow_from_directory or flow_from_dataframe), the images seem to be rescaled automatically. Is it possible to leave the values as they were? Is there anything wrong with the code?
train_datagen = ImageDataGenerator(
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True,
    dtype='float32')
train_generator = train_datagen.flow_from_directory(
    directory=train_data_dir,
    color_mode='grayscale',
    target_size=(img_width, img_height),
    batch_size=batch_size,
    class_mode='input')
I check the input images this way:
batch = np.concatenate([next(train_generator)[0] for _ in range(2)])
I expected the input images to have different ranges of values, but every image seems to have pixel values in the range [0, 255].
Under the hood, ImageDataGenerator uses PIL to load images. You'll find that your .tif images are opened with PIL and converted to 'L' mode (Luminance, see this excellent explanation on different color modes in PIL) when setting the color mode to grayscale:
...
img = pil_image.open(path)
if color_mode == 'grayscale':
    if img.mode != 'L':
        img = img.convert('L')
...
L mode means that your image will be represented by a single-channel array containing 1-byte luminance values. These are the values between 0 and 255 that you mention.
Now, PIL is probably not the best library for reading tif images. If you want to pass the images to your neural network with their original values, you will probably need to write a custom Python generator (there are plenty of tutorials for this) that reads the images with a third-party library suited to tif files and converts them to NumPy arrays.
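A custom generator of that kind could look roughly like the sketch below. It assumes all images have the same size (no resizing or augmentation is done), and the names raw_batch_generator, load_fn and fake_loader are hypothetical. load_fn stands for whatever reader you choose (for instance tifffile.imread, if that package suits your files); a stand-in loader is used here so the example runs without any image files.

```python
import numpy as np

def raw_batch_generator(paths, load_fn, batch_size):
    """Yield (inputs, targets) batches as float32 arrays with untouched values."""
    while True:  # Keras-style generators are expected to loop forever
        for start in range(0, len(paths), batch_size):
            batch_paths = paths[start:start + batch_size]
            # load_fn must return a 2-D array (height, width) with the raw pixel values
            images = [np.asarray(load_fn(p), dtype=np.float32) for p in batch_paths]
            batch = np.stack(images)[..., np.newaxis]  # add the channel axis
            yield batch, batch  # autoencoder: the target is the input itself

def fake_loader(path):
    # Stand-in for a real tif reader: returns images with very different ranges
    value = 4000.0 if path.endswith("a.tif") else 0.5
    return np.full((8, 8), value)

gen = raw_batch_generator(["a.tif", "b.tif"], fake_loader, batch_size=2)
x, y = next(gen)
print(x.shape, x.min(), x.max())
```

Because nothing converts the data to 8-bit, the batch keeps the original value range of each file.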

How to interpret the file mean.binaryproto when loading a Neural Network?

I want to load a Neural Network that has been trained with caffe for image classification.
The NN contains a file mean.binaryproto which has the means to be subtracted before inputting an image to be classified.
I am trying to understand what is contained in this file so I used Google Colab to see what is inside it.
The code to load it is the following:
# Load the Drive helper and mount
from google.colab import drive
# This will prompt for authorization.
drive.mount('/content/drive')
!ls "/content/drive/My Drive"
#install packages
!apt install -y caffe-cuda
!apt update
!apt upgrade
!apt dist-upgrade
!ls "/content/drive/My Drive/NeuralNetwork/CNRPark-Trained-Models/mAlexNet-on-CNRPark/"
import caffe
import numpy as np
with open('/content/drive/My Drive/NeuralNetwork/CNRPark-Trained-Models/mAlexNet-on-CNRPark/mean.binaryproto', 'rb') as f:
    blob = caffe.proto.caffe_pb2.BlobProto()
    blob.ParseFromString(f.read())
arr = np.array( caffe.io.blobproto_to_array(blob) )
print(arr.shape)
out = arr[0]
data = np.array(blob.data).reshape([blob.channels, blob.height, blob.width])
print (data.shape)
print(data[0])
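#display the mean image
from PIL import Image
from IPython.display import Image as Im, display
# PIL expects (height, width, channels) with uint8 values, so reorder the axes and cast
display(Image.fromarray(np.uint8(data.transpose(1, 2, 0))))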
which outputs:
(1, 3, 256, 256)
(3, 256, 256)
What I have understood is that the file contains the means, and since the images are 3-channel images, there is a mean for each channel.
However, I was expecting a single value per channel; instead I found a 256x256 array. Does that mean a mean was taken for each pixel of each channel?
Another question is the following: I want to use such NN with OpenCV which instead of RGB uses BGR: How to know if the mean 3x256x256 uses RGB or BGR?
The link to the model is this. The model I am looking at is contained in the zip file CNRPark-Trained-Models.zip within the folder: mAlexNet-on-CNRPark.
"However I was expecting a single value per channel instead I found a 256x256 array: does it mean that they took a mean on each pixel of each channel?"
Exactly. According to the shape of mean.binaryproto, this file is the average image of some dataset, which means that it took the mean of each pixel (feature) for each channel.
This should not be confused with the mean pixel, which, as you stated, is a single value for each channel.
For example, the mean pixel was adopted by Very Deep Convolutional Networks for Large-Scale Image Recognition. According to their paper:
"The only pre-processing we do is subtracting the mean RGB value, computed on the training set, from each pixel"
In other words, if you consider an RGB image to be 3 feature arrays of size N x N, the average image keeps a separate mean for every pixel position in each channel (3 x N x N values), whereas the mean pixel also averages over all pixel positions, leaving one value per channel (3 values).
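The difference is easy to see in NumPy. This is a small sketch on made-up random data; the array name images and its (N, C, H, W) layout are assumptions for illustration and are not taken from the model in question.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical dataset: 10 images, 3 channels, 4 x 4 pixels, laid out as (N, C, H, W)
images = rng.uniform(0, 255, size=(10, 3, 4, 4))

# Average image: mean over the dataset axis only, one value per channel and pixel
mean_image = images.mean(axis=0)
# Mean pixel: mean over the dataset and both spatial axes, one value per channel
mean_pixel = images.mean(axis=(0, 2, 3))

print(mean_image.shape)  # (3, 4, 4)
print(mean_pixel.shape)  # (3,)

# Subtracting either one relies on broadcasting
centered_by_image = images - mean_image
centered_by_pixel = images - mean_pixel[:, np.newaxis, np.newaxis]
```

Averaging the average image over its spatial axes gives back the mean pixel, so a file like mean.binaryproto can be reduced to per-channel values if a pipeline expects those.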
"Another question is the following: I want to use such NN with OpenCV which instead of RGB uses BGR: How to know if the mean 3x256x256 uses RGB or BGR?"
I doubt the binary file you are reading stores any information about its color format, but a practical way to figure it out is to plot the image using matplotlib and see if the colors make sense.
Take face images, for example: if the red and blue channels are swapped, the skin tone will look bluish.
In fact, the image above is an example of average image (face images) :)
You could also assume it is BGR since OpenCV uses this color format.
However, the correct way to find out how this mean.binaryproto was generated is by looking at their repositories or by asking the owner of the model.
import caffe
import cv2
import numpy as np

mean_file = "path/to/file/mean.binaryproto"

# Parse the binary protobuf file into a BlobProto message
blob = caffe.proto.caffe_pb2.BlobProto()
with open(mean_file, 'rb') as f:
    blob.ParseFromString(f.read())

# blobproto_to_array returns shape (1, channels, height, width); take the first entry
arr = np.uint8(caffe.io.blobproto_to_array(blob)[0])

# Reorder from (channels, height, width) to (height, width, channels) for OpenCV
img = np.ascontiguousarray(arr.transpose(1, 2, 0))

# Save the mean image next to the original file; cv2.imwrite treats the channels as BGR
cv2.imwrite(mean_file.replace(".binaryproto", ".bmp"), img)
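For the visual RGB-versus-BGR check, it helps to have both channel orders of the mean at hand. The sketch below uses a tiny made-up array in place of the real 3x256x256 mean; the helper names swap_channel_order and to_hwc are hypothetical.

```python
import numpy as np

def swap_channel_order(mean_chw):
    """Reverse the channel axis of a (3, H, W) array: RGB -> BGR or BGR -> RGB."""
    return mean_chw[::-1]

def to_hwc(mean_chw):
    """Reorder (channels, height, width) to (height, width, channels) for plotting."""
    return np.ascontiguousarray(mean_chw.transpose(1, 2, 0))

# Stand-in for the array loaded from mean.binaryproto
mean_chw = np.arange(3 * 2 * 2, dtype=np.float32).reshape(3, 2, 2)

as_stored = to_hwc(mean_chw)                       # channels in file order
swapped = to_hwc(swap_channel_order(mean_chw))     # first and last channel exchanged
print(as_stored[0, 0], swapped[0, 0])
```

Plotting both variants side by side (for example with matplotlib's imshow, after scaling to the 0 to 255 range and casting to uint8) shows which order gives natural-looking colors.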
