Reshaping Pytorch tensor - python

I have a tensor of size (24, 2, 224, 224) in PyTorch, where:

24 = batch size
2 = matrices representing foreground and background
224 = image height dimension
224 = image width dimension
This is the output of a CNN that performs binary segmentation. Each cell of the two matrices stores the probability of that pixel being foreground or background: [n][0][h][w] + [n][1][h][w] = 1 for every coordinate.
I want to reshape it into a tensor of size (24, 1, 224, 224). The values in the new layer should be 0 or 1, according to which matrix held the higher probability.
How can I do that? Which function should I use?

Using torch.argmax() (for PyTorch 0.4 and later):
prediction = torch.argmax(tensor, dim=1) # with 'dim' the considered dimension
prediction = prediction.unsqueeze(1) # to reshape from (24, 224, 224) to (24, 1, 224, 224)
If the PyTorch version is below 0.4.0, one can use tensor.max(), which returns both the max values and their indices (but which isn't differentiable over the index values):
_, prediction = tensor.max(dim=1)
prediction = prediction.unsqueeze(1) # to reshape from (24, 224, 224) to (24, 1, 224, 224)
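As a quick sanity check, here is a minimal sketch with random probabilities (the softmax input is just a stand-in for the network output); note that keepdim=True avoids the separate unsqueeze:

import torch

tensor = torch.softmax(torch.randn(24, 2, 224, 224), dim=1)  # fake network output, channels sum to 1
prediction = torch.argmax(tensor, dim=1, keepdim=True)       # keepdim keeps the channel axis
print(prediction.shape)     # torch.Size([24, 1, 224, 224])
print(prediction.unique())  # tensor([0, 1])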

Related

Resizing a numpy array to 224x224 for VGG16 Model

I am solving a multiview classification problem using the pretrained VGG16 model. In my case, I have 4 views as inputs, each of size (64, 64, 3), but VGG16 expects inputs of size (224, 224, 3).
To solve the problem, I am supposed to create my own data loader instead of using quick built-in methods like Keras load_img() or OpenCV imread(), so I am doing all of this with plain numpy arrays.
I am trying to resize my input from 64x64 to 224x224, but I am unable to do it; it keeps throwing one error or another. This is my code for the data loader:
def data_loader(dataframe, classDict, basePath, batch_size=16):
    while True:
        x_batch = np.zeros((batch_size, 4, 64, 64, 3))  # Create a zeros array for images
        y_batch = np.zeros((batch_size, 20))            # Create a zeros array for classes
        for i in range(0, batch_size):
            rndNumber = np.random.randint(len(dataframe))
            *images, class_id = dataframe.iloc[rndNumber]
            for j in range(4):
                x_batch[i, j] = plt.imread(os.path.join(basePath, images[j])) / 255.
                # x_batch[i,j] = x_batch[i,j].resize(1, 224, 224, 3)  # <--- Try(1)
            class_id = classDict[class_id]
            y_batch[i, class_id] = 1.0

        # yield {'image1': np.resize(x_batch[:, 0], (batch_size, 224, 224, 3)),  # <--- Try(2)
        #        'image2': np.resize(x_batch[:, 1], (1, 224, 224, 3)),
        #        'image3': np.resize(x_batch[:, 2], (1, 224, 224, 3)),
        #        'image4': np.resize(x_batch[:, 3], (1, 224, 224, 3))}, {'class_out': y_batch}
        # 'yield' is a keyword that is used like return, except the function will return a generator
        yield {'image1': x_batch[:, 0],
               'image2': x_batch[:, 1],
               'image3': x_batch[:, 2],
               'image4': x_batch[:, 3]}, {'class_out': y_batch}

## Testing the data loader
example, lbl = next(data_loader(df_train, classDictTrain, basePath))
print(example['image1'].shape)  # example['image1'][0].shape
print(lbl['class_out'].shape)
I have made several attempts at resizing the images. I am listing them below, with the error messages I receive for each try:
Try(1): Using x_batch[i,j] = x_batch[i,j].resize(1, 224, 224, 3) >> Error: ValueError: cannot resize this array: it does not own its data
Try(2): Using yield {'image1': np.resize(x_batch[:, 0], (batch_size, 224, 224, 3)), ... } >> The output shape is (16, 224, 224, 3), which seems fine, but when I plot it the result does not look like the original image; I need the original image, just bigger in size.
Please tell me what I am doing wrong and how I can fix it.
If I understand your problem correctly, you have an image which is 64x64 and you want to upscale it to a resolution of 224x224. Notice that the latter resolution contains many more pixels, so you cannot simply force a reshape, because the original image has far fewer pixels.
You have to upsample the image, generating the missing pixels. A tool you can try is PIL's Resize function, which can be used with different resampling filters.
As far as I know, numpy does not easily support upscaling filters. Check out this post to understand how to convert a PIL image to a numpy array, and you are ready to go.
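For example, here is a minimal sketch of upscaling a single 64x64 view with Pillow (the random array and the choice of bilinear resampling are just illustrative assumptions):

import numpy as np
from PIL import Image

small = (np.random.random((64, 64, 3)) * 255).astype(np.uint8)  # stand-in for one 64x64 view
img = Image.fromarray(small)
big = img.resize((224, 224), resample=Image.BILINEAR)  # upsample with a bilinear filter
big_arr = np.asarray(big) / 255.                       # back to a float numpy array
print(big_arr.shape)  # (224, 224, 3)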

Multi Scale Segmentation mask outputs in keras in U Net

So this is the model, with a single image as input and outputs at different scales of the image, i.e. I, 1/2 I, 1/4 I and 1/8 I: Model(inputs=[inputs], outputs=[out6, out7, out8, out9])
I am not sure how to create the training dataset. Suppose the input is data of shape (50, 192, 256, 3), where 3 is the number of channels, 192 the width and 256 the height, and there are 50 of them; how do I create a y_train that has 4 components? I have tried zip and then converting it to numpy, but that doesn't work...
If you necessarily want the model to learn to generate multi-scale masks, you can try downsampling to generate the scaled masks for supervised learning with UNET. You can use interpolation-based methods to automatically resize an image with minimal loss. Here is a post where I compare benchmarks across multiple such methods.
If you want to create [masks, masks_half, masks_quarter, masks_eighth] for your model.fit, i.e. the list of original + rescaled versions of the mask images, you may want to try a fast downsampling method (depending on the size of your dataset); a model.fit sketch follows the shape printout below.
Here I have used skimage.transform.pyramid_reduce to downsample a mask to half, quarter, and eighth of its scale. The method uses (spline) interpolation, but can be controlled via parameters. Check this for more details.
import numpy as np
from skimage.transform import pyramid_reduce

masks = np.random.random((50, 192, 256, 3))
masks_half = np.stack([pyramid_reduce(i, 2, multichannel=True) for i in masks])
masks_quarter = np.stack([pyramid_reduce(i, 4, multichannel=True) for i in masks])
masks_eighth = np.stack([pyramid_reduce(i, 8, multichannel=True) for i in masks])

print('Shape of original', masks.shape)
print('Shape of half scaled', masks_half.shape)
print('Shape of quarter scaled', masks_quarter.shape)
print('Shape of eighth scaled', masks_eighth.shape)

Shape of original (50, 192, 256, 3)
Shape of half scaled (50, 96, 128, 3)
Shape of quarter scaled (50, 48, 64, 3)
Shape of eighth scaled (50, 24, 32, 3)
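As promised above, here is a minimal model.fit sketch (assuming the model defined in the question with outputs=[out6, out7, out8, out9] and an x_train of shape (50, 192, 256, 3); batch size and epochs are arbitrary): the rescaled masks are simply passed as a list, ordered to match the model's outputs.

model.fit(x_train,
          [masks, masks_half, masks_quarter, masks_eighth],  # one target per output, same order as outputs
          batch_size=8,
          epochs=10)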
Testing on a single image/mask -
import matplotlib.pyplot as plt
from skimage.data import camera
from skimage.transform import pyramid_reduce

def plotit(img, h, q, e):
    fig, axes = plt.subplots(1, 4, figsize=(10, 15))
    axes[0].imshow(img)
    axes[1].imshow(h)
    axes[2].imshow(q)
    axes[3].imshow(e)
    axes[0].title.set_text('Original')
    axes[1].title.set_text('Half')
    axes[2].title.set_text('Quarter')
    axes[3].title.set_text('Eighth')

img = camera()               # (512, 512)
h = pyramid_reduce(img, 2)   # Half
q = pyramid_reduce(img, 4)   # Quarter
e = pyramid_reduce(img, 8)   # Eighth
plotit(img, h, q, e)
Notice the change in scale over the x and y axes in the resulting plots.

TensorFlow vs PyTorch convolution confusion

I am confused about how to replicate Keras (TensorFlow) convolutions in PyTorch.
In Keras, I can do something like this (the input size is (256, 237, 1, 21) and the output size is (256, 237, 1, 1024)):
import tensorflow as tf
x = tf.random.normal((256,237,1,21))
y = tf.keras.layers.Conv1D(filters=1024, kernel_size=5,padding="same")(x)
print(y.shape)
(256, 237, 1, 1024)
However, in PyTorch, when I try to do the same thing I get a different output size:
import torch
import torch.nn as nn
x = torch.randn(256, 237, 1, 21)
m = nn.Conv1d(in_channels=237, out_channels=1024, kernel_size=(1,5))
y = m(x)
print(y.shape)
torch.Size([256, 1024, 1, 17])
I want PyTorch to give me the same output size that Keras does:
This previous question seems to imply that Keras filters are PyTorch's out_channels, but that's what I have. I tried adding padding in PyTorch with padding=(0, 503), but that gives me torch.Size([256, 1024, 1, 1023]), which is still not correct. This also takes much longer than Keras does, so I suspect I have assigned a parameter incorrectly.
How can I replicate what Keras did with convolution in PyTorch?
In TensorFlow, tf.keras.layers.Conv1D takes in a tensor of shape (batch_shape + (steps, input_dim)). This means that what is commonly known as the channels appears on the last axis. For instance, in 2D convolution you would have (batch, height, width, channels). This is different from PyTorch, where the channel dimension is right after the batch axis: torch.nn.Conv1d takes in shapes of (batch, channel, length). So you will need to permute two axes.
For torch.nn.Conv1d:
in_channels is the number of channels in the input tensor
out_channels is the number of filters, i.e. the number of channels the output will have
stride is the step size of the convolution
padding is the zero-padding added to both sides
Older PyTorch versions have no padding='same' option (newer ones do; see the second answer below), so you will need to choose the padding yourself. Here stride=1, so padding must equal kernel_size // 2 (i.e. padding=2) in order to maintain the length of the tensor.
In your example, since x has a shape of (256, 237, 1, 21), in TensorFlow's terminology it will be considered as an input with:
a batch shape of (256, 237),
steps=1, so the length of your 1D input is 1,
21 input channels.
Whereas in PyTorch, x of shape (256, 237, 1, 21) would be:
batch shape of (256, 237),
1 input channel
a length of 21.
I have kept the input in both examples below (TensorFlow vs. PyTorch) as x.shape = (256, 237, 21), assuming 256 is the batch size, 237 is the length of the input sequence, and 21 is the number of channels (i.e. the input dimension, what I see as the dimension at each timestep).
In TensorFlow:
>>> x = tf.random.normal((256, 237, 21))
>>> m = tf.keras.layers.Conv1D(filters=1024, kernel_size=5, padding="same")
>>> y = m(x)
>>> y.shape
TensorShape([256, 237, 1024])
In PyTorch:
>>> x = torch.randn(256, 237, 21)
>>> m = nn.Conv1d(in_channels=21, out_channels=1024, kernel_size=5, padding=2)
>>> y = m(x.permute(0, 2, 1))
>>> y.permute(0, 2, 1).shape
torch.Size([256, 237, 1024])
So in the latter, you would simply work with x = torch.randn(256, 21, 237)...
PyTorch now supports 'same' convolutions out of the box; take a look at this link: [Same convolution][1].

class InceptionNet(nn.Module):
    def __init__(self, in_channels, in_1x1, in_3x3reduce, in_3x3, in_5x5reduce, in_5x5, in_1x1pool):
        super(InceptionNet, self).__init__()
        self.incep_1 = ConvBlock(in_channels, in_1x1, kernel_size=1, padding='same')

Note that a 'same' convolution only supports the default stride value of 1; anything else won't work.
[1]: https://pytorch.org/docs/stable/generated/torch.nn.Conv2d.html
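Applied to the original 1D case, here is a minimal sketch (assuming PyTorch 1.9 or later, where string padding was introduced, and the channels-first layout from the accepted answer):

import torch
import torch.nn as nn

x = torch.randn(256, 21, 237)  # (batch, channels, length)
m = nn.Conv1d(in_channels=21, out_channels=1024, kernel_size=5, padding='same')
print(m(x).shape)  # torch.Size([256, 1024, 237])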

Display extracted feature vector from trained layer of the model as an image

I am using transfer learning to recognize objects. I used the trained VGG16 model as the base model and added my classifier on top of it using Keras. I then trained the model on my data, and the model works well. Now I want to see the features generated by the intermediate layers of the model for the given data. I used the following code for this purpose:
def ModeloutputAtthisLayer(model, layernme, imgnme, width, height):
    layer_name = layernme
    intermediate_layer_model = Model(inputs=model.input,
                                     outputs=model.get_layer(layer_name).output)
    img = image.load_img(imgnme, target_size=(width, height))
    imageArray = image.img_to_array(img)
    image_batch = np.expand_dims(imageArray, axis=0)
    processed_image = preprocess_input(image_batch.copy())
    intermediate_output = intermediate_layer_model.predict(processed_image)
    print("outshape of ", layernme, "is ", intermediate_output.shape)
In the code, I used np.expand_dims to add an extra dimension for the batch, as the input to the network should be of the form (batchsize, height, width, channels). This code works fine. The shape of the feature output is (1, 224, 224, 64).
Now I wish to display this as an image. Since an additional dimension was added for the batch, I understand I should remove it. To do so, I used the following lines of code:
imge = np.squeeze(intermediate_output, axis=0)
plt.imshow(imge)
However, it throws an error:
"Invalid dimensions for image data"
I wonder how I can display the extracted feature vector as an image. Any suggestions, please?
Your feature shape is (1, 224, 224, 64); you cannot directly plot a 64-channel image. What you can do is plot the individual channels independently, like the following:
import math
import matplotlib.pyplot as plt

imge = np.squeeze(intermediate_output, axis=0)
filters = imge.shape[2]
plt.figure(1, figsize=(32, 32))  # plot figure of size (32x32)
n_columns = 8
n_rows = math.ceil(filters / n_columns) + 1
for i in range(filters):
    plt.subplot(n_rows, n_columns, i + 1)
    plt.title('Filter ' + str(i))
    plt.imshow(imge[:, :, i], interpolation="nearest", cmap="gray")
This will plot 64 images in 8 rows and 8 columns.
A possible way to go is to combine the 64 channels into a single-channel image through a weighted sum, like this:
weighted_imge = np.sum(imge*weights, axis=-1)
where weights is an array with 64 weighting coefficients.
If you wish to give all the channels the same weight you could simply compute the average:
weighted_imge = np.mean(imge, axis=-1)
Demo
import numpy as np
import matplotlib.pyplot as plt
intermediate_output = np.random.randint(size=(1, 224, 224, 64),
                                        low=0, high=2**8, dtype=np.uint8)
imge = np.squeeze(intermediate_output, axis=0)
weights = np.random.random(size=(imge.shape[-1],))
weighted_imge = np.sum(imge*weights, axis=-1)
plt.imshow(weighted_imge)
plt.colorbar()
In [33]: intermediate_output.shape
Out[33]: (1, 224, 224, 64)
In [34]: imge.shape
Out[34]: (224, 224, 64)
In [35]: weights.shape
Out[35]: (64,)
In [36]: weighted_imge.shape
Out[36]: (224, 224)

Pytorch: Size Mismatch during running a test image through a trained CNN

I am going through tutorials to train/test a convolutional neural network (CNN), and I am having an issue prepping a test image to run through the trained network. My initial guess is that it has something to do with getting the right format for the tensor input to the net.
Here is the code for the Net.
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.nn.init as I

class Net(nn.Module):

    def __init__(self):
        super(Net, self).__init__()

        ## 1. This network takes in a square (same width and height), grayscale image as input
        ## 2. It ends with a linear layer that represents the keypoints
        ##    this last layer outputs 136 values, 2 for each of the 68 keypoint (x, y) pairs

        # input size 224 x 224
        # after the first conv layer, (W-F)/S + 1 = (224-5)/1 + 1 = 220
        # after one pool layer, this becomes (32, 110, 110)
        self.conv1 = nn.Conv2d(1, 32, 5)

        # maxpool layer
        # pool with kernel_size = 2, stride = 2
        self.pool = nn.MaxPool2d(2, 2)

        # second conv layer: 32 inputs, 64 outputs, 3x3 conv
        ## output size = (W-F)/S + 1 = (110-3)/1 + 1 = 108
        ## output dimension: (64, 108, 108)
        ## after another pool layer, this becomes (64, 54, 54)
        self.conv2 = nn.Conv2d(32, 64, 3)

        # third conv layer: 64 inputs, 128 outputs, 3x3 conv
        ## output size = (W-F)/S + 1 = (54-3)/1 + 1 = 52
        ## output dimension: (128, 52, 52)
        ## after another pool layer, this becomes (128, 26, 26)
        self.conv3 = nn.Conv2d(64, 128, 3)

        self.conv_drop = nn.Dropout(p=0.2)
        self.fc_drop = nn.Dropout(p=0.4)

        # 128 outputs * 26x26 filtered/pooled map = 86528
        self.fc1 = nn.Linear(128 * 26 * 26, 1000)
        self.fc2 = nn.Linear(1000, 1000)
        self.fc3 = nn.Linear(1000, 136)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = self.pool(F.relu(self.conv3(x)))
        x = self.conv_drop(x)

        # prep for linear layer: flattening
        x = x.view(x.size(0), -1)

        # two linear layers with dropout in between
        x = F.relu(self.fc1(x))
        x = self.fc_drop(x)
        x = self.fc2(x)
        x = self.fc_drop(x)
        x = self.fc3(x)
        return x
Maybe my calculation in layer inputs is wrong?
And here is the test-running code block: (you can think of 'roi' as a standard numpy image.)
# loop over the detected faces from your haar cascade
for i, (x, y, w, h) in enumerate(faces):
    plt.figure(figsize=(10, 5))
    ax = plt.subplot(1, len(faces), i + 1)

    # Select the region of interest that is the face in the image
    roi = image_copy[y:y+h, x:x+w]

    ## TODO: Convert the face region from RGB to grayscale
    roi = cv2.cvtColor(roi, cv2.COLOR_RGB2GRAY)

    ## TODO: Normalize the grayscale image so that its color range falls in [0,1] instead of [0,255]
    roi = np.multiply(roi, 1/255)

    ## TODO: Rescale the detected face to be the expected square size for your CNN (224x224, suggested)
    roi = cv2.resize(roi, (244,244))
    roi = roi.reshape(roi.shape[0], roi.shape[1], 1)
    roi = roi.transpose((2, 0, 1))

    ## TODO: Change to tensor
    roi = torch.from_numpy(roi)
    roi = roi.type(torch.FloatTensor)
    roi = roi.unsqueeze(0)
    print(roi.shape)

    ## TODO: run it through the net
    output_pts = net(roi)
And I get the error message saying:
RuntimeError: size mismatch, m1: [1 x 100352], m2: [86528 x 1000] at /opt/conda/conda-bld/pytorch_1524584710464/work/aten/src/TH/generic/THTensorMath.c:2033
The caveat is that if I run my trained network in the provided test suite (where the tensor inputs are already prepped), it gives no errors and runs as it is supposed to. I think that means there's nothing wrong with the design of the network architecture itself; rather, there is something wrong with the way I am prepping the image.
The output of the 'roi.shape' is:
torch.Size([1, 1, 244, 244])
which should be okay, since the expected layout is [batch_size, color_channels, height, width].
UPDATE: I have printed out the shapes of the layers while running through the net. It turns out that the input dimensions for the FC layer are different between my test image and the images from the provided test suite. So I'm almost 80% sure that the way I prep the input image for the net is wrong. But how can they have different dimensions if the input tensor in both cases has the exact same shape ([1, 1, 244, 244])?
when using the provided test suite (where it runs fine):
input: torch.Size([1, 1, 224, 224])
layer before 1st CV: torch.Size([1, 1, 224, 224])
layer after 1st CV pool: torch.Size([1, 32, 110, 110])
layer after 2nd CV pool: torch.Size([1, 64, 54, 54])
layer after 3rd CV pool: torch.Size([1, 128, 26, 26])
flattened layer for the 1st FC: torch.Size([1, 86528])
When prepping/running the test image:
input: torch.Size([1, 1, 244, 244])
layer before 1st CV: torch.Size([1, 1, 244, 244])
layer after 1st CV pool: torch.Size([1, 32, 120, 120]) #<- what happened here??
layer after 2nd CV pool: torch.Size([1, 64, 59, 59])
layer after 3rd CV pool: torch.Size([1, 128, 28, 28])
flattened layer for the 1st FC: torch.Size([1, 100352])
Did you notice that you have this line in your image preparation?

## TODO: Rescale the detected face to be the expected square size for your CNN (224x224, suggested)
roi = cv2.resize(roi, (244,244))

So you resized it to 244x244 and not to 224x224.
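A quick way to confirm this is to run dummy inputs of both sizes through the conv/pool stack of the Net above (a minimal sketch, just for verification):

import torch
net = Net()
for size in (224, 244):
    x = torch.randn(1, 1, size, size)
    x = net.pool(torch.relu(net.conv1(x)))
    x = net.pool(torch.relu(net.conv2(x)))
    x = net.pool(torch.relu(net.conv3(x)))
    print(size, '->', x.view(1, -1).shape)  # expect torch.Size([1, 86528]) for 224, torch.Size([1, 100352]) for 244

Changing that one line to roi = cv2.resize(roi, (224, 224)) makes the flattened size match fc1's expected 128*26*26 = 86528 input features.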
