I want to adjust the brightness of an input image that is fed into a Keras model. The data is supplied from a simulator and fed into the model in real time, so I need a way to adjust the image data in the model itself. I am currently using my own layer with OpenCV to perform the task, but I am getting the following error.
File "/usr/lib/python3/dist-packages/numpy/core/_methods.py", line 70, in _mean
ret = ret.dtype.type(ret / rcount)
AttributeError: 'DType' object has no attribute 'type'
The issue appears to be with 'gamma = np.median(img) / 25' and the code trying to do numpy maths on a class 'tensorflow.python.framework.ops.Tensor'.
My class code is
class ImageLayer(Layer):
    def __init__(self, **kwargs):
        super(ImageLayer, self).__init__(**kwargs)

    def call(self, img, mask=None):
        print(type(img))
        # adjust the image brightness to help normalise dark and light images
        gamma = np.median(img) / 25
        if gamma > 5.:
            gamma = 5
        elif gamma < 0.5:
            gamma = 0.5
        # build a lookup table mapping the pixel values [0, 255] to
        # their adjusted gamma values
        # http://www.pyimagesearch.com/2015/10/05/opencv-gamma-correction/
        invGamma = 1.0 / gamma
        table = np.array([((i / 255.0) ** invGamma) * 255
                          for i in np.arange(0, 256)]).astype("uint8")
        # apply gamma correction using the lookup table
        return cv2.LUT(img, table)
The model calls the class from the model
inputs = Input(shape=(160, 320, 3), dtype='int8')
x = Cropping2D(cropping=((50,0), (0,0)), input_shape=(160, 320, 3), dim_ordering='tf')(inputs)
x = ImageLayer()(x)
x = BatchNormalization(epsilon=0.001, mode=0, axis=2, momentum=0.99)(x)
Is it possible to do what I want to do?
Is it possible to perform NumPy arithmetic in Keras? I know that you can in TensorFlow with .eval().
As far as I see, you take an image and then 1. crop it 2. change the brightness. Then feed it into your model. So instead of defining the Input layer of the shape (160, 320, 3), why don't you define one of the shape that you will get after cropping and changing brightness. Then define the rest of your model as usual. If you do this, then instead of writing a layer, you will only have to write your own generator, in which you can change brightness/crop etc. using normal opencv/python/numpy. For example, see my post for how to define a multi-threaded generator capable of working with multiple workers.
Do not do this if you want to treat the change in brightness as a learnable parameter or include it in backpropagation. In other words, use the above technique if the brightness change is a pre-processing operation and has nothing to do with how you learn.
A simple generator (works with only 1 worker) on MNIST data is given below which fetches 32 images at a time. You may include your brightness change operation immediately after you read the image. Treat this code only as a skeleton. I have not defined all the variables and it will not work out of the box.
def myGenerator():  # write the definition of your data generator
    while True:
        count = 0
        for i in range(len(allImgFilenames)):
            if count == 0:
                imgBatch = np.empty((batchSize, 3, 32, 32), dtype=float)
                labelsBatch = np.empty((batchSize,), dtype=int)
            img = cv2.imread(allImgFilenames[i])
            img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)  # change the brightness here
            img = np.float32(img) / 255.
            imgBatch[count, :, :, :] = np.transpose(img, (2, 0, 1))
            labelsBatch[count] = np.random.randint(0, 10)  # placeholder label
            count += 1
            if count == batchSize:
                count = 0
                yield (imgBatch, labelsBatch)
Call the generator in the fit function as follows:
my_generator = myGenerator()
print("Built the generator")
model.fit_generator(my_generator, samples_per_epoch=60000, nb_epoch=10)
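The brightness step itself can be done in plain NumPy inside the generator, with no OpenCV call and no custom layer. The sketch below reuses the gamma rule from the question (median / 25, clamped to [0.5, 5]) and assumes the image is a uint8 array; the name adjust_gamma and its keyword arguments are made up for illustration.

```python
import numpy as np

def adjust_gamma(img, divisor=25.0, lo=0.5, hi=5.0):
    """Gamma-correct a uint8 image with a 256-entry lookup table.

    The gamma rule (median / 25, clamped to [0.5, 5]) is taken from the
    question; the function name and parameters are illustrative only.
    """
    gamma = float(np.clip(np.median(img) / divisor, lo, hi))
    inv_gamma = 1.0 / gamma
    # Lookup table mapping each pixel value in [0, 255] to its corrected value.
    table = (((np.arange(256) / 255.0) ** inv_gamma) * 255).astype("uint8")
    # Fancy indexing applies the table to every pixel, like cv2.LUT would.
    return table[img]
```

This would be called right after the image is read and before it is converted to float, since the table is indexed by integer pixel values.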
Testing:
You want to get the data from simulator in real-time. For this, you can replace cv2.imread() by a function which gets the data from simulator. You may also change the batch size to 1, if you want to classify the image as soon as it is simulated. Fetch the image from generator as follows:
img, label = next(my_generator)  # this will give you `batchSize` number of samples.
model.predict(img)  # `img` should have 4 dimensions if RGB, img.shape = (1,3,nRows,nCols)
I hope this helps.
Related
I'm here asking a general question about image processing applied to a machine learning pipeline. In this post, I will refer to ML as every algorithm that is not deep learning (therefore it doesn't use a neural network).
I'm developing a classifier to catalog different clothes .png images. I have labels (for each image I know the category) so it's a supervised learning problem.
My objective is to use PCA to reduce the problem's dimensionality and then use bag of visual words to perform the classification. I'm using python for this project.
The problem is that each photo has a different size and a different ratio between width and height (therefore I can't only resize them because I wouldn't have a unique height value for each image).
My, inelegant, solution is to fix the width at 200 px and then pad a bunch of zeros rows to each image (each image is a NumPy array of maximum_h rows and each row is width long).
Here the script:
# helper function to load an image and optionally resize it
def get_image(image_path: str, resize=True, w=300):
    """
    :param image_path: string, path of the image
    :param resize: boolean, if True the image is resized. Default: True
    :param w: integer, specify the width of the resized image
    :return: greyscale PIL image, or None if the file cannot be read
    """
    try:
        image = Image.open(image_path).convert("L")
        if resize:
            wpercent = (w / float(image.size[0]))
            hsize = int((float(image.size[1]) * float(wpercent)))
            # Image.ANTIALIAS was removed in Pillow 10; Image.LANCZOS is the same filter
            image = image.resize((w, hsize), Image.ANTIALIAS)
        #pixel_values = np.array(image.getdata())
        return image
    except Exception:
        #AI19/04442.png corrupted
        #AI18/02971.png corrupted
        #print(image_path)
        return None
def extract_images(paths: list, categories: list, w: int, maximum_h: int):
    A = np.zeros([len(paths), w * maximum_h])
    y = []
    counter = 0
    for image_path, label in tqdm(zip(paths, categories)):
        im = get_image(image_path, w=w)
        if im is not None:
            # adapt images to fit: pad rows of zeros up to maximum_h
            im = np.array(im)
            h, im_w = im.shape
            delta_h = maximum_h - h
            zeros_ = np.zeros((delta_h, im_w), dtype=int)
            im = np.concatenate((im, zeros_), axis=0)
            A[counter, :] = im.reshape(1, -1)
            y.append(label)
            counter += 1
        else:
            continue
    return (A, y)
The problem here is that the classifier performs badly (20%) because I add a significant number of zeros to each image, which increases the dimensionality but doesn't add information.
Looking at the biggest eigenvectors of the PCA algorithm, I see that a lot of information is concentrated in these "padding" areas (and this confirms my impression).
Is there a better way to handle different size images in python?
I would like to take an image and change the scale of the image, while it is a numpy array.
For example I have this image of a coca-cola bottle:
bottle-1
Which translates to a numpy array of shape (528, 203, 3) and I want to resize that to say the size of this second image:
bottle-2
Which has a shape of (140, 54, 3).
How do I change the size of the image to a certain shape while still maintaining the original image? Other answers suggest stripping every other or third row out, but what I want to do is basically shrink the image how you would via an image editor but in python code. Are there any libraries to do this in numpy/SciPy?
Yeah, you can install opencv (this is a library used for image processing, and computer vision), and use the cv2.resize function. And for instance use:
import cv2
import numpy as np
img = cv2.imread('your_image.jpg')
res = cv2.resize(img, dsize=(54, 140), interpolation=cv2.INTER_CUBIC)
Here img is thus a numpy array containing the original image, whereas res is a numpy array containing the resized image. An important aspect is the interpolation parameter: there are several ways how to resize an image. Especially since you scale down the image, and the size of the original image is not a multiple of the size of the resized image. Possible interpolation schemas are:
INTER_NEAREST - a nearest-neighbor interpolation
INTER_LINEAR - a bilinear interpolation (used by default)
INTER_AREA - resampling using pixel area relation. It may be a preferred method for image decimation, as it gives moiré-free results. But when the image is zoomed, it is similar to the INTER_NEAREST method.
INTER_CUBIC - a bicubic interpolation over 4x4 pixel neighborhood
INTER_LANCZOS4 - a Lanczos interpolation over 8x8 pixel neighborhood
Like with most options, there is no "best" option in the sense that for every resize schema, there are scenarios where one strategy can be preferred over another.
While it might be possible to use numpy alone to do this, the operation is not built-in. That said, you can use scikit-image (which is built on numpy) to do this kind of image manipulation.
Scikit-Image rescaling documentation is here.
For example, you could do the following with your image:
from skimage.transform import resize
bottle_resized = resize(bottle, (140, 54))
This will take care of things like interpolation, anti-aliasing, etc. for you.
One-line numpy solution for downsampling (by 2):
smaller_img = bigger_img[::2, ::2]
And upsampling (by 2):
bigger_img = smaller_img.repeat(2, axis=0).repeat(2, axis=1)
(This assumes an HxWxC-shaped image. Note that this method only allows whole-integer resizing, e.g. 2x but not 1.5x.)
For people coming here from Google looking for a fast way to downsample images in numpy arrays for use in Machine Learning applications, here's a super fast method (adapted from here ). This method only works when the input dimensions are a multiple of the output dimensions.
The following examples downsample from 128x128 to 64x64 (this can be easily changed).
Channels last ordering
# large image is shape (128, 128, 3)
# small image is shape (64, 64, 3)
input_size = 128
output_size = 64
bin_size = input_size // output_size
small_image = large_image.reshape((output_size, bin_size,
output_size, bin_size, 3)).max(3).max(1)
Channels first ordering
# large image is shape (3, 128, 128)
# small image is shape (3, 64, 64)
input_size = 128
output_size = 64
bin_size = input_size // output_size
small_image = large_image.reshape((3, output_size, bin_size,
output_size, bin_size)).max(4).max(2)
For grayscale images just change the 3 to a 1 like this:
Channels first ordering
# large image is shape (1, 128, 128)
# small image is shape (1, 64, 64)
input_size = 128
output_size = 64
bin_size = input_size // output_size
small_image = large_image.reshape((1, output_size, bin_size,
output_size, bin_size)).max(4).max(2)
This method uses the equivalent of max pooling. It's the fastest way to do this that I've found.
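If taking the maximum of each block is too aggressive (it brightens the result), the same reshape trick also works with a mean, which corresponds to average pooling. This is a variant of the method above rather than the original answer's code; downsample_mean and factor are illustrative names, and the sketch assumes a channels-last array whose height and width are multiples of the factor.

```python
import numpy as np

def downsample_mean(image, factor):
    """Average-pool a channels-last (H, W, C) image by an integer factor."""
    h, w, c = image.shape
    if h % factor or w % factor:
        raise ValueError("height and width must be multiples of factor")
    # Split each spatial axis into (blocks, pixels-per-block), then average
    # over the two pixels-per-block axes.
    blocks = image.reshape(h // factor, factor, w // factor, factor, c)
    return blocks.mean(axis=(1, 3))
```

For example, a (128, 128, 3) array with factor=2 comes back with shape (64, 64, 3) and float values.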
If anyone came here looking for a simple method to scale/resize an image in Python, without using additional libraries, here's a very simple image resize function:
# simple image scaling to (nR x nC) size
def scale(im, nR, nC):
    nR0 = len(im)     # source number of rows
    nC0 = len(im[0])  # source number of columns
    return [[im[int(nR0 * r / nR)][int(nC0 * c / nC)]
             for c in range(nC)] for r in range(nR)]
Example usage: resizing a (30 x 30) image to (100 x 200):
import matplotlib.pyplot as plt

def sqr(x):
    return x * x

def f(r, c, nR, nC):
    return 1.0 if sqr(c - nC/2) + sqr(r - nR/2) < sqr(nC/4) else 0.0

# a red circle on a canvas of size (nR x nC)
def circ(nR, nC):
    return [[[f(r, c, nR, nC), 0, 0]
             for c in range(nC)] for r in range(nR)]

plt.imshow(scale(circ(30, 30), 100, 200))
Output:
This works to shrink/scale images, and works fine with numpy arrays.
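When the input is already a NumPy array, the same nearest-neighbour mapping can be written with index arrays instead of nested list comprehensions. This is only a sketch of that idea; scale_nearest is a hypothetical name, and it is meant to pick the same source pixels as scale() above.

```python
import numpy as np

def scale_nearest(im, n_rows, n_cols):
    """Nearest-neighbour resize of an (H, W) or (H, W, C) array."""
    im = np.asarray(im)
    # Source row/column for every output row/column (integer floor mapping).
    rows = np.arange(n_rows) * im.shape[0] // n_rows
    cols = np.arange(n_cols) * im.shape[1] // n_cols
    # Broadcasting rows (as a column vector) against cols selects the full grid.
    return im[rows[:, None], cols]
```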
For people who want to resize (interpolate) a batch of NumPy arrays, PyTorch provides a faster function named torch.nn.functional.interpolate. Just remember to use np.transpose first to move the channel axis, from batch x W x H x 3 to batch x 3 x W x H, and to convert the array to a tensor.
SciPy's imresize() was another resize method, but it was removed in SciPy 1.3.0. SciPy refers to the PIL image resize method instead: Image.resize(size, resample=0)
size – The requested size in pixels, as a 2-tuple: (width, height).
resample – An optional resampling filter. This can be one of PIL.Image.NEAREST (use nearest neighbour), PIL.Image.BILINEAR (linear interpolation), PIL.Image.BICUBIC (cubic spline interpolation), or PIL.Image.LANCZOS (a high-quality downsampling filter). If omitted, or if the image has mode “1” or “P”, it is set to PIL.Image.NEAREST.
Link here:
https://pillow.readthedocs.io/en/3.1.x/reference/Image.html#PIL.Image.Image.resize
Stumbled back upon this after a few years. It looks like the answers so far fall into one of a few categories:
Use an external library. (OpenCV, SciPy, etc)
Use Power-of-Two Scaling
Use Nearest Neighbor
These solutions are all respectable, so I offer this only for completeness. It has three advantages over the above: (1) it will accept arbitrary resolutions, even non-power-of-two scaling factors; (2) it uses pure Python+Numpy with no external libraries; and (3) it interpolates all the pixels for an arguably 'nicer-looking' result.
It does not make good use of Numpy and, thus, is not fast, especially for large images. If you're only rescaling smaller images, it should be fine. I offer this under Apache or MIT license at the discretion of the user.
import math
import numpy

def resize_linear(image_matrix, new_height: int, new_width: int):
    """Perform a pure-numpy linear-resampled resize of an image."""
    output_image = numpy.zeros((new_height, new_width), dtype=image_matrix.dtype)
    original_height, original_width = image_matrix.shape
    inv_scale_factor_y = original_height / new_height
    inv_scale_factor_x = original_width / new_width

    # This is an ugly serial operation.
    for new_y in range(new_height):
        for new_x in range(new_width):
            # If you had a color image, you could repeat this with all channels here.
            # Find sub-pixels data:
            old_x = new_x * inv_scale_factor_x
            old_y = new_y * inv_scale_factor_y
            x_fraction = old_x - math.floor(old_x)
            y_fraction = old_y - math.floor(old_y)

            # Sample four neighboring pixels:
            left_upper = image_matrix[math.floor(old_y), math.floor(old_x)]
            right_upper = image_matrix[math.floor(old_y), min(image_matrix.shape[1] - 1, math.ceil(old_x))]
            left_lower = image_matrix[min(image_matrix.shape[0] - 1, math.ceil(old_y)), math.floor(old_x)]
            right_lower = image_matrix[min(image_matrix.shape[0] - 1, math.ceil(old_y)), min(image_matrix.shape[1] - 1, math.ceil(old_x))]

            # Interpolate horizontally:
            blend_top = (right_upper * x_fraction) + (left_upper * (1.0 - x_fraction))
            blend_bottom = (right_lower * x_fraction) + (left_lower * (1.0 - x_fraction))
            # Interpolate vertically (y_fraction is the weight of the lower row):
            final_blend = (blend_bottom * y_fraction) + (blend_top * (1.0 - y_fraction))
            output_image[new_y, new_x] = final_blend

    return output_image
Sample rescaling:
Original:
Downscaled by Half:
Upscaled by one and one quarter:
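The speed concern mentioned above can be addressed by computing all source coordinates at once and letting NumPy broadcasting do the blending. The following is a sketch of that idea, not the original author's code: resize_bilinear is a hypothetical name, it uses the same coordinate mapping as the loop version, it returns float64, and it also accepts colour (H, W, C) arrays.

```python
import numpy as np

def resize_bilinear(image, new_height, new_width):
    """Vectorised bilinear resize of an (H, W) or (H, W, C) array."""
    image = np.asarray(image, dtype=np.float64)
    old_h, old_w = image.shape[:2]

    # Fractional source coordinate for every output row and column.
    ys = np.arange(new_height) * (old_h / new_height)
    xs = np.arange(new_width) * (old_w / new_width)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, old_h - 1)  # clamp at the bottom edge
    x1 = np.minimum(x0 + 1, old_w - 1)  # clamp at the right edge

    # Blend weights, shaped so they broadcast over rows, columns and channels.
    extra = (1,) * (image.ndim - 2)
    fy = (ys - y0).reshape((-1, 1) + extra)
    fx = (xs - x0).reshape((1, -1) + extra)

    # Horizontal blend on the upper and lower source rows, then vertical blend.
    top = image[y0][:, x0] * (1 - fx) + image[y0][:, x1] * fx
    bottom = image[y1][:, x0] * (1 - fx) + image[y1][:, x1] * fx
    return top * (1 - fy) + bottom * fy
```

Cast the result back with something like .astype(np.uint8) if an integer image is needed.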
Are there any libraries to do this in numpy/SciPy
Sure. You can do this without OpenCV, scikit-image or PIL.
Image resizing is basically mapping the coordinates of each pixel from the original image to its resized position.
Since the coordinates of an image must be integers (think of it as a matrix), if the mapped coordinate has decimal values, you should interpolate the pixel value to approximate it to the integer position (e.g. getting the nearest pixel to that position is known as Nearest neighbor interpolation).
All you need is a function that does this interpolation for you. SciPy has interpolate.interp2d.
You can use it to resize an image in numpy array, say arr, as follows:
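import numpy as np
from scipy.interpolate import interp2d  # deprecated in recent SciPy releases
H, W = arr.shape[:2]  # NumPy arrays are (rows, columns) = (height, width)
new_W, new_H = (600, 300)
xrange = lambda x: np.linspace(0, 1, x)
f = interp2d(xrange(W), xrange(H), arr, kind="linear")
new_arr = f(xrange(new_W), xrange(new_H))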
Of course, if your image is RGB, you have to perform the interpolation for each channel.
If you would like to understand more, I suggest watching Resizing Images - Computerphile.
import cv2
import numpy as np

image_read = cv2.imread('filename.jpg', 0)  # read as greyscale
original_image = np.asarray(image_read)
width, height = 452, 452
# keep the source dtype so that cv2.imshow displays the result correctly
resize_image = np.zeros(shape=(width, height), dtype=original_image.dtype)
for W in range(width):
    for H in range(height):
        # nearest-neighbour lookup of the source pixel
        new_width = int(W * original_image.shape[0] / width)
        new_height = int(H * original_image.shape[1] / height)
        resize_image[W][H] = original_image[new_width][new_height]
print("Resized image size : ", resize_image.shape)
cv2.imshow('resized', resize_image)  # imshow needs a window name
cv2.waitKey(0)
I'm trying to use transforms.Compose() in my segmentation task, but I'm not sure how to use (almost) the same random transforms for both the image and the mask.
In my segmentation task I have the raw picture and the corresponding mask, and I'd like to generate more randomly transformed image pairs for training purposes. That is, any transform applied to a raw picture should also be applied to its mask, and then the pair can go into my CNN. My transformer is something like:
train_transform = transforms.Compose([
transforms.Resize(512), # resize, the smaller edge will be matched.
transforms.RandomHorizontalFlip(p=0.5),
transforms.RandomVerticalFlip(p=0.5),
transforms.RandomRotation(90),
transforms.RandomResizedCrop(320,scale=(0.3, 1.0)),
AddGaussianNoise(0., 1.),
transforms.ToTensor(), # convert a PIL image or ndarray to tensor.
transforms.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225)) # normalize to Imagenet mean and std
])
mask_transform = transforms.Compose([
transforms.Resize(512), # resize, the smaller edge will be matched.
transforms.RandomHorizontalFlip(p=0.5),
transforms.RandomVerticalFlip(p=0.5),
transforms.RandomRotation(90),
transforms.RandomResizedCrop(320,scale=(0.3, 1.0)),
##---------------------!------------------
transforms.ToTensor(), # convert a PIL image or ndarray to tensor.
transforms.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225)) # normalize to Imagenet mean and std
])
Notice that in the code block I added a class that applies random noise in the raw image transform but not in mask_transform: I want my mask images to follow the raw image transformation but ignore the random noise. So how can these two transformations happen in pairs (with the same random behaviour)?
This seems to have an answer here: How to apply same transform on a pair of picture.
Basically, you can use the torchvision functional API to get a handle to the randomly generated parameters of a random transform such as RandomCrop. Then call torchvision.transforms.functional.crop() on both images with the same parameter values. It seems a bit lengthy but gets the job done. You can skip some transforms on some images, as per your need.
Another option that I've seen elsewhere is to re-seed the random generator with the same seed, to force generation of the same random transformations twice. I would think that such implementations are hacky and keep changing with pytorch versions (e.g. whether to re-seed np.random, random, or torch.manual_seed() ?)
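The underlying idea (draw the random parameters once, then apply them to both images) does not depend on torchvision. The sketch below shows it on plain NumPy arrays as a stand-in; paired_random_crop_flip is a hypothetical helper, not part of any library, and the torchvision equivalent would use get_params plus the functional API as described above.

```python
import numpy as np

def paired_random_crop_flip(image, mask, crop_size, rng):
    """Apply one random crop and one random horizontal flip to both arrays."""
    h, w = image.shape[:2]
    crop_h, crop_w = crop_size
    # Draw the random parameters ONCE ...
    top = int(rng.integers(0, h - crop_h + 1))
    left = int(rng.integers(0, w - crop_w + 1))
    flip = rng.random() < 0.5
    # ... and reuse them for the image and the mask.
    image = image[top:top + crop_h, left:left + crop_w]
    mask = mask[top:top + crop_h, left:left + crop_w]
    if flip:
        image = image[:, ::-1]
        mask = mask[:, ::-1]
    return image, mask
```

Image-only steps such as noise would then be applied to the returned image afterwards, leaving the mask untouched.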
Sabyasachi's answer was really helpful, and I was able to use the transforms in PyTorch to transform my images. This usage of torchvision.transforms is not the most straightforward way to transform images, so I'm adding my solution, which uses torchvision.transforms.functional as well as skimage.filters. Many more transform functions are available here: https://scikit-image.org/docs/dev/api/skimage.filters.html#skimage.filters.unsharp_mask.
import random
import numpy as np
import matplotlib.pyplot as plt
import torchvision.transforms as transforms
import torchvision.transforms.functional as TF
from skimage.filters import gaussian
from skimage.filters import unsharp_mask

def transformer(image, mask):
    # image and mask are PIL image objects.
    img_w, img_h = image.size

    # Random horizontal flipping
    if random.random() > 0.5:
        image = TF.hflip(image)
        mask = TF.hflip(mask)

    # Random vertical flipping
    if random.random() > 0.5:
        image = TF.vflip(image)
        mask = TF.vflip(mask)

    # Random affine: draw the parameters once, apply them to both
    affine_param = transforms.RandomAffine.get_params(
        degrees = [-180, 180], translate = [0.3,0.3],
        img_size = [img_w, img_h], scale_ranges = [1, 1.3],
        shears = [2,2])
    image = TF.affine(image,
                      affine_param[0], affine_param[1],
                      affine_param[2], affine_param[3])
    mask = TF.affine(mask,
                     affine_param[0], affine_param[1],
                     affine_param[2], affine_param[3])

    image = np.array(image)
    mask = np.array(mask)

    # Random Gaussian blur -- only for images
    if random.random() < 0.25:
        sigma_param = random.uniform(0.01, 1)
        image = gaussian(image, sigma=sigma_param)

    # Random Gaussian noise -- only for images
    if random.random() < 0.25:
        factor_param = random.uniform(0.01, 0.5)
        # noise has the same shape as the image, so this also works for RGB
        image = image + factor_param * image.std() * np.random.randn(*image.shape)

    # Unsharp filter -- only for images
    if random.random() < 0.25:
        radius_param = random.uniform(0, 5)
        amount_param = random.uniform(0.5, 2)
        image = unsharp_mask(image, radius = radius_param, amount=amount_param)

    f, ax = plt.subplots(1, 2, figsize=(8, 8))
    ax[0].imshow(image)
    ax[1].imshow(mask)

    return image, mask
I think I have a simple solution:
If the images are concatenated, the transformations are applied to all of them identically:
import torch
import torchvision.transforms as T
# Create two fake images (identical for test purposes):
image = torch.randn((3, 128, 128))
target = image.clone()
# This is the trick (concatenate the images):
both_images = torch.cat((image.unsqueeze(0), target.unsqueeze(0)),0)
# Apply the transformations to both images simultaneously:
transformed_images = T.RandomRotation(180)(both_images)
# Get the transformed images:
image_trans = transformed_images[0]
target_trans = transformed_images[1]
# Compare the transformed images:
torch.all(image_trans == target_trans).item()
>> True
I have preprocessed some DICOM images to feed a neural network. In the image augmentation step, the image data generator expects a 4D input, while my data is 3D (200, 420, 420).
I tried reshaping the array and expanding dimensions, but in both cases I cannot plot the individual images in the array (the plot expects an image with shape (420, 420), and my new images have shape (420, 420, 1)).
Here is my code.
I have three functions to convert DICOM images into images with good contrast.
This one converts to Hounsfield units:
def transform_to_hu(medical_image, image):
    intercept = medical_image.RescaleIntercept
    slope = medical_image.RescaleSlope
    hu_image = image * slope + intercept
    return hu_image
This one sets window image values;
def window_image(image, window_center, window_width):
    img_min = window_center - window_width // 2
    img_max = window_center + window_width // 2
    window_image = image.copy()
    window_image[window_image < img_min] = img_min
    window_image[window_image > img_max] = img_max
    return window_image
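As a side note, the windowing step is a clamp of every pixel to [center - width // 2, center + width // 2], so it can probably be written more compactly with np.clip. A minimal sketch under that reading; window_image_clip is an illustrative name and is intended to behave like window_image above.

```python
import numpy as np

def window_image_clip(image, window_center, window_width):
    """Clamp pixel values to the display window; returns a new array."""
    img_min = window_center - window_width // 2
    img_max = window_center + window_width // 2
    return np.clip(image, img_min, img_max)
```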
And this function loads the image:
def load_image(file_path):
    medical_image = dicom.read_file(file_path)
    image = medical_image.pixel_array
    hu_image = transform_to_hu(medical_image, image)
    brain_image = window_image(hu_image, 40, 80)
    return brain_image
Then I load my images:
files = sorted(glob.glob(r'F:\CT_Data_Classifier\*.dcm'))
images = np.array([load_image(path) for path in files])
images.shape returns (200, 512, 512)
Everything is fine with the data. For example, I can plot the 100th image with
plt.imshow(images[100]) and it plots an image.
I then feed the data into the image data generator:
train_image_data = ImageDataGenerator(
    rescale=1./255,
    shear_range=0.,
    zoom_range=0.05,
    rotation_range=180,
    width_shift_range=0.05,
    height_shift_range=0.05,
    horizontal_flip=True,
    vertical_flip=True,
    fill_mode='constant',
    cval=0
)
But then, when I try to plot with this code:
plt.figure(figsize=(12, 12))
for X_batch, y_batch in train_image_data.flow(trainX, trainY, batch_size=9):
    for i in range(0, 9):
        plt.subplot(330 + 1 + i)
        plt.imshow(X_batch[i])
    plt.show()
    break
it returns
ValueError: ('Input data in "NumpyArrayIterator" should have rank 4. You passed an array with shape', (162, 420, 420))
I tried expand_dims and reshape to add an extra dimension at the end of the array to represent channels, but then the plt.imshow stage fails with
TypeError: Invalid shape (420, 420, 1) for image data
I'm a doctor and not an experienced programmer, so I would really appreciate your help. Cheers.
You are correct in adding an extra dimension to represent channels. That part seems fine. The problem is with plotting. For that, you can use:
plt.matshow(x[..., 0])
where x is the 3D array. The syntax x[..., 0] means take index 0 of the last dimension of array x. The ellipsis (...) is shorthand to fill in the dimensions. For a 3D array, the equivalent call would be x[:, :, 0].
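To make the shapes concrete, here is a small sketch using a zero-filled stand-in array (a smaller batch than in the question, to keep it light). It shows adding the channel axis for the generator and removing it again for plotting; the variable names are illustrative.

```python
import numpy as np

# Stand-in for the stack of greyscale slices: (n_images, height, width).
images = np.zeros((5, 420, 420), dtype=np.float32)

# Add a trailing channel axis so the generator sees a rank-4 array.
images_4d = np.expand_dims(images, axis=-1)   # (5, 420, 420, 1)

# For plotting, drop the channel axis of a single image again.
single = images_4d[3]                         # (420, 420, 1)
plane = single[..., 0]                        # (420, 420)
same_plane = np.squeeze(single, axis=-1)      # (420, 420), equivalent
```

Either plane or same_plane can be passed to plt.imshow.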
I'm attempting to train a Unet to provide each pixel of a 256x256 image with a label, similar to the tutorial given here. In the example, the predictions of the Unet are a (128x128x3) output where the 3 denotes one of the classifications assigned to each pixel. In my case, I need a (256x256x10) output having 10 different classifications (Essentially a one-hot encoded array for each pixel in the image).
I can load the images but I'm struggling to convert each image's corresponding segmentation mask to the correct format. I have created DataSets by defining a map function called process_path which takes a saved numpy representation of the mask and creates a tensor of dimension (256 256 10), but I get a ValueError when I call model.fit, telling me that it cannot call as_list because the shape of the Tensor cannot be found:
# --------------------------------------------------------------------------------------
# DECODE A NUMPY .NPY FILE INTO THE REQUIRED FORMAT FOR TRAINING
# --------------------------------------------------------------------------------------
def decode_npy(npy):
    filename = npy.numpy()
    data = np.load(filename)
    data = kerasUtils.to_categorical(data, 10)
    return data
# --------------------------------------------------------------------------------------
# DECODE AN IMAGE (PNG) FILE INTO THE REQUIRED FORMAT FOR TRAINING
# --------------------------------------------------------------------------------------
def decode_img(img):
    img = tf.image.decode_png(img, channels=3)
    return tf.image.convert_image_dtype(img, tf.float32)
# --------------------------------------------------------------------------------------
# PROCESS A FILE PATH FOR THE DATASET
# input - path to an image file
# output - an input image and output mask
# --------------------------------------------------------------------------------------
def process_path(filePath):
    parts = tf.strings.split(filePath, '/')
    fileName = parts[-1]
    parts = tf.strings.split(fileName, '.')
    prefix = tf.convert_to_tensor(maskDir, dtype=tf.string)
    suffix = tf.convert_to_tensor("-mask.png", dtype=tf.string)
    maskFileName = tf.strings.join((parts[-2], suffix))
    maskPath = tf.strings.join((prefix, maskFileName), separator='/')
    # load the raw data from the file as a string
    img = tf.io.read_file(filePath)
    img = decode_img(img)
    mask = tf.py_function(decode_npy, [maskPath], tf.float32)
    return img, mask
trainDataSet = allDataSet.take(trainSize)
trainDataSet = trainDataSet.map(process_path).batch(4)
validDataSet = allDataSet.skip(trainSize)
validDataSet = validDataSet.map(process_path).batch(4)
How can I take each images' corresponding (256 256 3) segmentation mask (stored as png) and convert it to a (256 256 10) tensor, where the i-th channel represents the pixels value as in the tutorial? Can anyone explain how this is achieved, either in the process_path function or wherever it would be most efficient to perform the conversion?
Update:
Here is an example of a segmentation mask. Every mask contains the same 10 colours shown:
import numpy as np
from cv2 import imread
im = imread('hfoa7.png', 0)  # read as grayscale to get 10 unique values
n_classes = 10
one_hot = np.zeros((im.shape[0], im.shape[1], n_classes))
for i, unique_value in enumerate(np.unique(im)):
    one_hot[:, :, i][im == unique_value] = 1
hfoa7.png is the name of the image you posted. This code snippet creates a one-hot matrix from the image.
You will want to insert this code into decode_npy(). However, since you sent me a PNG, the code above won't work with an .npy file. You could pass in the names of the PNGs instead of the .npy files. Don't worry about using kerasUtils.to_categorical: the code I posted already produces categorical labels.
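The loop over unique values can also be replaced by a single indexing step. The sketch below is a variant of the snippet above, not a drop-in for decode_npy(): one_hot_from_labels is a hypothetical name, and, like the original, it assigns class indices by sorted pixel value, so it only gives consistent channels if every mask contains the same set of values.

```python
import numpy as np

def one_hot_from_labels(label_image, n_classes):
    """One-hot encode a 2D label image by the rank of each unique value."""
    values, inverse = np.unique(label_image, return_inverse=True)
    if len(values) > n_classes:
        raise ValueError("more unique values than classes")
    # Class index per pixel, restored to the image's 2D layout.
    class_index = inverse.reshape(label_image.shape)
    # Row i of the identity matrix is the one-hot vector for class i.
    return np.eye(n_classes, dtype=np.float32)[class_index]
```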
You can do this in pure Tensorflow, see my Blogpost: https://www.spacefish.biz/2020/11/rgb-segmentation-masks-to-classes-in-tensorflow/