This is my model. It takes a single image as input and produces outputs at four different scales of the image, i.e. I, 1/2 I, 1/4 I and 1/8 I:

```python
Model(inputs=[inputs], outputs=[out6, out7, out8, out9])
```
I am not sure how to create the training dataset. Suppose the data going into y_train has shape (50, 192, 256, 3), i.e. 50 images with height 192, width 256 and 3 channels. How do I create a y_train that has 4 components? I tried zip and then converting the result to a NumPy array, but that doesn't work.
If you specifically want the model to learn to generate multi-scale masks, you can downsample the masks to generate the scaled targets for supervised training of the UNET. Interpolation-based methods let you resize an image automatically with minimal loss. Here is a post where I compare benchmarks for multiple such methods.
If you want to create [masks, masks_half, masks_quarter, masks_eighth] for your model.fit, i.e. the list of the original and rescaled versions of the mask images, you may want to try a fast downsampling method (depending on the size of your dataset).
Here I have used skimage.transform.pyramid_reduce to downsample a mask to half, a quarter, and an eighth of its scale. The method smooths and then interpolates (spline), and both steps can be controlled via its parameters. Check this for more details.
```python
import numpy as np
from skimage.transform import pyramid_reduce

# Dummy masks: 50 images of height 192, width 256, 3 channels
masks = np.random.random((50, 192, 256, 3))

# channel_axis=-1 treats the last axis as channels (scikit-image >= 0.19;
# older versions use multichannel=True instead)
masks_half = np.stack([pyramid_reduce(i, 2, channel_axis=-1) for i in masks])
masks_quarter = np.stack([pyramid_reduce(i, 4, channel_axis=-1) for i in masks])
masks_eighth = np.stack([pyramid_reduce(i, 8, channel_axis=-1) for i in masks])

print('Shape of original', masks.shape)
print('Shape of half scaled', masks_half.shape)
print('Shape of quarter scaled', masks_quarter.shape)
print('Shape of eighth scaled', masks_eighth.shape)
```

```
Shape of original (50, 192, 256, 3)
Shape of half scaled (50, 96, 128, 3)
Shape of quarter scaled (50, 48, 64, 3)
Shape of eighth scaled (50, 24, 32, 3)
```
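To turn the scaled masks into targets for model.fit, here is a minimal sketch. It assumes the four model outputs are ordered from full resolution down to 1/8 (as in outputs=[out6, out7, out8, out9]); Keras accepts a list of target arrays for a multi-output model, one array per output, in that order. To stay dependency-free, the sketch swaps pyramid_reduce for plain block averaging in NumPy (a different, simpler downsampling method), and the helper names block_mean_downsample and make_multiscale_targets are hypothetical.

```python
import numpy as np


def block_mean_downsample(batch, factor):
    """Downsample a (N, H, W, C) batch by averaging non-overlapping factor x factor blocks.

    NumPy-only alternative to pyramid_reduce; assumes H and W are divisible by factor.
    """
    n, h, w, c = batch.shape
    if h % factor or w % factor:
        raise ValueError("H and W must be divisible by factor")
    blocks = batch.reshape(n, h // factor, factor, w // factor, factor, c)
    return blocks.mean(axis=(2, 4))


def make_multiscale_targets(masks, factors=(2, 4, 8)):
    # Full-resolution masks first, then one downsampled copy per extra output,
    # in the same order as the model's outputs list.
    return [masks] + [block_mean_downsample(masks, f) for f in factors]


masks = np.random.random((50, 192, 256, 3))
y_train = make_multiscale_targets(masks)
for y in y_train:
    print(y.shape)

# model.fit(X_train, y_train, ...)  # model and X_train assumed from the question
```

If the masks are binary, both block averaging and pyramid_reduce produce fractional values along edges; thresholding the result (for example `> 0.5`) or taking every k-th pixel (`masks[:, ::k, ::k]`) keeps the targets binary.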
Testing on a single image/mask:

```python
import matplotlib.pyplot as plt
from skimage.data import camera
from skimage.transform import pyramid_reduce


def plotit(img, h, q, e):
    # Show the original and the three downsampled versions side by side
    fig, axes = plt.subplots(1, 4, figsize=(10, 15))
    axes[0].imshow(img)
    axes[1].imshow(h)
    axes[2].imshow(q)
    axes[3].imshow(e)
    axes[0].title.set_text('Original')
    axes[1].title.set_text('Half')
    axes[2].title.set_text('Quarter')
    axes[3].title.set_text('Eighth')
    plt.show()


img = camera()              # (512, 512)
h = pyramid_reduce(img, 2)  # Half
q = pyramid_reduce(img, 4)  # Quarter
e = pyramid_reduce(img, 8)  # Eighth
plotit(img, h, q, e)
```
[Figure: the camera image at original, half, quarter and eighth scale. Notice the change in scale along the x- and y-axes.]
Given groups=1, weight of size [48, 3, 3, 3], expected input [5, 128, 129, 4] to have 3 channels, but got 128 channels instead.
This is my code:
```python
model_ft.eval()
for image in test_loader:
    image = image.cuda()
    output = model_ft(image)
    output = output.cpu().detach().numpy()
    for i, (e, n) in enumerate(list(zip(output, name))):
        sub.loc[sub['id_code'] == n.split('/')[-1].split('.')[0], 'diagnosis'] = le.inverse_transform([np.argmax(e)])
sub.to_csv('submission.csv', index=False)
```
```python
print(X_test.shape)
# (3071, 128, 128, 3)

from torch.utils.data import DataLoader
test_loader = DataLoader(X_test, batch_size=5, shuffle=True)
print(train_data)
```
I don't know how to fix this problem so that I can make predictions for the competition.
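I'm assuming that by `print(X_test.shape)` returning `(3071, 128, 128, 3)` you mean that the test data has 3071 samples of 128x128 pixels with 3 color channels each.
Also, I'm assuming that the model you are using doesn't transpose the inputs itself, so its convolution layers expect PyTorch's default layout, (N, C, H, W), but you provide your data as (N, H, W, C).
Solution: reorder the axes with image = image.cuda().permute(0, 3, 1, 2) before handing the batch to the model. image.transpose(1, 3) also removes the channel error, but it produces (N, C, W, H), i.e. it swaps height and width, which only goes unnoticed here because the images are square. Separately, shuffle=True in the test DataLoader reorders the samples, which makes it hard to match predictions to their ids; shuffle=False is the usual choice for inference.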
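A small CPU-only check of the layout fix, as a sketch. It assumes X_test is a float64 NumPy array (the DataLoader then yields float64 tensors, which also need .float() to match the float32 conv weights; a uint8 array would need the same), and it uses a hypothetical stand-in first layer with the weight shape [48, 3, 3, 3] from the error message. It also shows why permute is safer than transpose(1, 3) on a non-square batch.

```python
import numpy as np
import torch
import torch.nn as nn

# Stand-in for one DataLoader batch of X_test: 5 NHWC images stored as float64
batch = torch.from_numpy(np.random.random((5, 128, 128, 3)))

# Hypothetical first layer with the weight shape from the error: [48, 3, 3, 3]
conv = nn.Conv2d(in_channels=3, out_channels=48, kernel_size=3)

# NHWC -> NCHW, then float32 to match the layer's weights
x = batch.permute(0, 3, 1, 2).float()
out = conv(x)
print(x.shape, out.shape)  # torch.Size([5, 3, 128, 128]) torch.Size([5, 48, 126, 126])

# On a non-square batch, transpose(1, 3) swaps H and W, while permute keeps them
nonsquare = torch.rand(2, 4, 6, 3)          # (N, H, W, C)
print(nonsquare.transpose(1, 3).shape)      # torch.Size([2, 3, 6, 4]) -> (N, C, W, H)
print(nonsquare.permute(0, 3, 1, 2).shape)  # torch.Size([2, 3, 4, 6]) -> (N, C, H, W)
```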
I have this code (used during testing, not training) for my input image and the first convolution + ReLU layer:
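```python
from tflearn.layers.core import input_data
from tflearn.layers.conv import conv_2d, max_pool_2d

convnet = input_data(shape=[None, IMG_SIZE, IMG_SIZE, IMAGE_CHANNELS], name='input')
convnet1 = conv_2d(convnet, FIRST_NUM_CHANNEL, FILTER_SIZE, activation='relu')
convnet1 = max_pool_2d(convnet1, FILTER_SIZE)
```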
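If I print the variable convnet1, I get Tensor("MaxPool2D/MaxPool:0", shape=(?, 52, 52, 32), dtype=float32), which is right: my input image is 256x256 and the filter size is 5x5, so with tflearn's default 'same' padding the convolution keeps 256x256, and the 5x5 max pool (whose stride defaults to the kernel size) reduces it to ceil(256/5) = 52.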
My question is how can I visualize my convnet1 data/variable? It has 32 channels so I'm assuming I can display 32 black and white images with dimensions 52x52.
If you want to plot all 32 of them in one figure, you can do something like this:
```python
import math
import matplotlib.pyplot as plt


def plot_convnet(convnet, input_num=0, sess=None, feed_dict=None):
    # convnet1 is 4-D (?, 52, 52, 32). Assuming the first dim is the batch size,
    # plot the 32 channels of the single image at index input_num in the batch.
    # If convnet is still a symbolic tensor, pass your session (and the feed_dict
    # for the input placeholder) so it gets evaluated; if it is already an
    # evaluated NumPy array, leave sess=None.
    C = sess.run(convnet, feed_dict=feed_dict) if sess is not None else convnet

    # Number of channels -- 32 in your case
    num_chnls = C.shape[3]

    # Number of grid cells per side:
    # rounded-up square root of the number of channels
    grids = math.ceil(math.sqrt(num_chnls))

    # Create a figure with a grid of sub-plots
    fig, axes = plt.subplots(grids, grids)
    for i, ax in enumerate(axes.flat):
        if i < num_chnls:
            im = C[input_num, :, :, i]
            # Plot one channel as an image
            ax.imshow(im, interpolation='nearest', cmap='seismic')
        ax.axis('off')
    plt.show()
```
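If you would rather get all channels as a single image (for one imshow call or for saving to disk), another option is to tile them into a mosaic. This is a NumPy-only sketch; channel_mosaic is a hypothetical helper, and the random array stands in for one evaluated image from convnet1. Each channel keeps its own value range, so you may want to normalize channels individually before tiling.

```python
import math
import numpy as np


def channel_mosaic(fmap, pad=1):
    """Tile the channels of one (H, W, C) feature map into a single 2-D array.

    Channels are laid out row by row on a ceil(sqrt(C)) x ceil(sqrt(C)) grid,
    separated by `pad` pixels of NaN (imshow leaves NaN cells blank).
    """
    h, w, c = fmap.shape
    grid = math.ceil(math.sqrt(c))
    mosaic = np.full((grid * (h + pad) - pad, grid * (w + pad) - pad), np.nan)
    for i in range(c):
        r, col = divmod(i, grid)
        top, left = r * (h + pad), col * (w + pad)
        mosaic[top:top + h, left:left + w] = fmap[:, :, i]
    return mosaic


fmap = np.random.random((52, 52, 32))  # stand-in for C[input_num] from convnet1
mosaic = channel_mosaic(fmap)
print(mosaic.shape)  # (317, 317): 6 x 6 grid of 52x52 tiles with 1-pixel gaps
# plt.imshow(mosaic, cmap='seismic') then shows all 32 channels at once
```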
I am using transfer learning for object recognition. I used a pretrained VGG16 model as the base model and added my own classifier on top of it using Keras. I then trained the model on my data, and it works well. Now I want to see the features generated by the intermediate layers of the model for a given input. I used the following code for this purpose:
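```python
import numpy as np
from keras.models import Model
from keras.preprocessing import image
from keras.applications.vgg16 import preprocess_input


def ModeloutputAtthisLayer(model, layernme, imgnme, width, height):
    layer_name = layernme
    # Sub-model that maps the original input to the chosen layer's output
    intermediate_layer_model = Model(inputs=model.input,
                                     outputs=model.get_layer(layer_name).output)
    # Keras expects target_size as (height, width)
    img = image.load_img(imgnme, target_size=(height, width))
    imageArray = image.img_to_array(img)
    # Add the batch dimension: (1, height, width, channels)
    image_batch = np.expand_dims(imageArray, axis=0)
    processed_image = preprocess_input(image_batch.copy())
    intermediate_output = intermediate_layer_model.predict(processed_image)
    print("outshape of ", layernme, "is ", intermediate_output.shape)
    return intermediate_output
```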
In the code, I used np.expand_dims to add an extra dimension for the batch, since the input to the network must have the form (batch_size, height, width, channels). This code works fine, and the shape of the resulting feature map is (1, 224, 224, 64).
Now I want to display this as an image. I understand that the extra batch dimension has to be removed first, so I used the following lines:

```python
imge = np.squeeze(intermediate_output, axis=0)
plt.imshow(imge)
```
However, it throws an error:
"Invalid dimensions for image data"
How can I display the extracted feature map as an image? Any suggestions?
Your feature map has shape (1, 224, 224, 64), and you cannot plot a 64-channel image directly: plt.imshow only accepts arrays of shape (M, N), (M, N, 3) or (M, N, 4). What you can do is plot each channel separately, like this:
```python
import math
import numpy as np
import matplotlib.pyplot as plt

imge = np.squeeze(intermediate_output, axis=0)  # (224, 224, 64)
filters = imge.shape[2]
plt.figure(1, figsize=(32, 32))  # figure size in inches
n_columns = 8
n_rows = math.ceil(filters / n_columns)
for i in range(filters):
    plt.subplot(n_rows, n_columns, i + 1)
    plt.title('Filter ' + str(i))
    plt.imshow(imge[:, :, i], interpolation="nearest", cmap="gray")
plt.show()
```
This will plot 64 images in 8 rows and 8 columns.
Another option is to combine the 64 channels into a single-channel image through a weighted sum:

```python
weighted_imge = np.sum(imge*weights, axis=-1)
```

where weights is an array of 64 weighting coefficients.
If you want to give all the channels the same weight, you can simply compute the average:

```python
weighted_imge = np.mean(imge, axis=-1)
```
Demo

```python
import numpy as np
import matplotlib.pyplot as plt

intermediate_output = np.random.randint(size=(1, 224, 224, 64),
                                        low=0, high=2**8, dtype=np.uint8)
imge = np.squeeze(intermediate_output, axis=0)
weights = np.random.random(size=(imge.shape[-1],))
weighted_imge = np.sum(imge*weights, axis=-1)
plt.imshow(weighted_imge)
plt.colorbar()
plt.show()
```

```
In [33]: intermediate_output.shape
Out[33]: (1, 224, 224, 64)

In [34]: imge.shape
Out[34]: (224, 224, 64)

In [35]: weights.shape
Out[35]: (64,)

In [36]: weighted_imge.shape
Out[36]: (224, 224)
```
I am going through tutorials to train and test a convolutional neural network (CNN), and I am having an issue preparing a test image to run it through the trained network. My initial guess is that it has something to do with the format of the tensor input to the net.
Here is the code for the Net.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.nn.init as I


class Net(nn.Module):

    def __init__(self):
        super(Net, self).__init__()

        ## 1. This network takes in a square (same width and height), grayscale image as input
        ## 2. It ends with a linear layer that represents the keypoints
        ## this last layer outputs 136 values, 2 for each of the 68 keypoint (x, y) pairs

        # input size 224 x 224
        # after the first conv layer, (W-F)/S + 1 = (224-5)/1 + 1 = 220
        # after one pool layer, this becomes (32, 110, 110)
        self.conv1 = nn.Conv2d(1, 32, 5)

        # maxpool layer
        # pool with kernel_size = 2, stride = 2
        self.pool = nn.MaxPool2d(2, 2)

        # second conv layer: 32 inputs, 64 outputs, 3x3 conv
        ## output size = (W-F)/S + 1 = (110-3)/1 + 1 = 108
        ## output dimension: (64, 108, 108)
        ## after another pool layer, this becomes (64, 54, 54)
        self.conv2 = nn.Conv2d(32, 64, 3)

        # third conv layer: 64 inputs, 128 outputs, 3x3 conv
        ## output size = (W-F)/S + 1 = (54-3)/1 + 1 = 52
        ## output dimension: (128, 52, 52)
        ## after another pool layer, this becomes (128, 26, 26)
        self.conv3 = nn.Conv2d(64, 128, 3)

        self.conv_drop = nn.Dropout(p=0.2)
        self.fc_drop = nn.Dropout(p=0.4)

        # 128 outputs * 26x26 filtered/pooled map = 86528
        self.fc1 = nn.Linear(128*26*26, 1000)
        self.fc2 = nn.Linear(1000, 1000)
        self.fc3 = nn.Linear(1000, 136)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = self.pool(F.relu(self.conv3(x)))
        x = self.conv_drop(x)

        # prep for linear layer
        # flattening
        x = x.view(x.size(0), -1)

        # three linear layers with dropout in between
        x = F.relu(self.fc1(x))
        x = self.fc_drop(x)
        x = self.fc2(x)
        x = self.fc_drop(x)
        x = self.fc3(x)
        return x
```
Maybe my calculation in layer inputs is wrong?
And here is the test-running code block (you can think of 'roi' as a standard NumPy image):

```python
import cv2
import numpy as np
import torch
import matplotlib.pyplot as plt

# loop over the detected faces from your haar cascade
for i, (x, y, w, h) in enumerate(faces):
    plt.figure(figsize=(10, 5))
    ax = plt.subplot(1, len(faces), i+1)

    # Select the region of interest that is the face in the image
    roi = image_copy[y:y+h, x:x+w]

    ## TODO: Convert the face region from RGB to grayscale
    roi = cv2.cvtColor(roi, cv2.COLOR_RGB2GRAY)

    ## TODO: Normalize the grayscale image so that its color range falls in [0,1] instead of [0,255]
    roi = np.multiply(roi, 1/255)

    ## TODO: Rescale the detected face to be the expected square size for your CNN (224x224, suggested)
    roi = cv2.resize(roi, (244, 244))
    roi = roi.reshape(roi.shape[0], roi.shape[1], 1)
    roi = roi.transpose((2, 0, 1))

    ## TODO: Change to tensor
    roi = torch.from_numpy(roi)
    roi = roi.type(torch.FloatTensor)
    roi = roi.unsqueeze(0)
    print(roi.shape)

    ## TODO: run it through the net
    output_pts = net(roi)
```
And I get the error message saying:
RuntimeError: size mismatch, m1: [1 x 100352], m2: [86528 x 1000] at /opt/conda/conda-bld/pytorch_1524584710464/work/aten/src/TH/generic/THTensorMath.c:2033
The caveat is that if I run my trained network on the provided test suite (where the input tensors are already prepared), it gives no errors and runs as it is supposed to. I think that means there's nothing wrong with the design of the network architecture itself, and that something is wrong with the way I am preparing the image.
The output of roi.shape is:
torch.Size([1, 1, 244, 244])
which should be okay, because the layout is [batch_size, color_channel, x, y].
UPDATE: I printed out the shapes of the layers while running through the net. It turns out the input dimensions reaching the first FC layer differ between my test image and the images from the test suite, so I'm about 80% sure that my preparation of the input image is wrong. But how can they have different dimensions if the input tensor has the exact same dimensions ([1,1,244,244]) in both cases?
when using the provided test suite (where it runs fine):
input: torch.Size([1, 1, 224, 224])
layer before 1st CV: torch.Size([1, 1, 224, 224])
layer after 1st CV pool: torch.Size([1, 32, 110, 110])
layer after 2nd CV pool: torch.Size([1, 64, 54, 54])
layer after 3rd CV pool: torch.Size([1, 128, 26, 26])
flattend layer for the 1st FC: torch.Size([1, 86528])
When prepping/running the test image:
input: torch.Size([1, 1, 244, 244])
layer before 1st CV: torch.Size([1, 1, 244, 244])
layer after 1st CV pool: torch.Size([1, 32, 120, 120]) #<- what happened here??
layer after 2nd CV pool: torch.Size([1, 64, 59, 59])
layer after 3rd CV pool: torch.Size([1, 128, 28, 28])
flattend layer for the 1st FC: torch.Size([1, 100352])
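Did you notice this line in your image preparation?

```python
## TODO: Rescale the detected face to be the expected square size for your CNN (224x224, suggested)
roi = cv2.resize(roi, (244,244))
```

You resized the face to 244x244, not 224x224. With a 244x244 input the first conv gives 240x240 and the pool 120x120 (your "what happened here??" line), so the flattened size ends up as 128*28*28 = 100352 instead of the 128*26*26 = 86528 that fc1 expects. Change it to cv2.resize(roi, (224, 224)).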
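To check this kind of mismatch before running the network, you can replay the (W-F+2P)/S + 1 arithmetic from the Net comments for a given input side. This is a small sketch; conv_out and flattened_size are hypothetical helpers that mirror Net's three conv + 2x2 max-pool stages (PyTorch's default MaxPool2d floors odd sizes).

```python
def conv_out(size, kernel, stride=1, padding=0):
    # (W - F + 2P) / S + 1, floored, for a square input of side `size`
    return (size - kernel + 2 * padding) // stride + 1


def flattened_size(side, channels_out=128):
    # Mirrors Net: three (conv -> ReLU -> 2x2 max pool) stages with
    # kernel sizes 5, 3, 3; conv stride 1 and no padding.
    for kernel in (5, 3, 3):
        side = conv_out(side, kernel)       # convolution
        side = conv_out(side, 2, stride=2)  # 2x2 max pool, stride 2
    return channels_out * side * side


print(flattened_size(224))  # 86528, matches nn.Linear(128*26*26, 1000)
print(flattened_size(244))  # 100352, the m1 size in the error message
```

The same trace reproduces the printed layer shapes: 244 goes to 120, 59 and 28 after each pool, while 224 goes to 110, 54 and 26.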
I have a tensor of size (24, 2, 224, 224) in PyTorch, where:
24 = batch size
2 = matrices representing foreground and background
224 = image height
224 = image width
This is the output of a CNN that performs binary segmentation. Each cell of the 2 matrices stores the probability of that pixel being foreground or background: [n][0][h][w] + [n][1][h][w] = 1 for every coordinate.
I want to turn it into a tensor of size (24, 1, 224, 224), where each value is 0 or 1 depending on which of the two matrices holds the higher probability.
How can I do that? Which function should I use?
Using torch.argmax() (PyTorch 0.4 and later):

```python
prediction = torch.argmax(tensor, dim=1)  # 'dim' is the dimension to reduce over
prediction = prediction.unsqueeze(1)      # reshape from (24, 224, 224) to (24, 1, 224, 224)
```

If your PyTorch version is below 0.4.0, you can use tensor.max(), which returns both the max values and their indices (the indices are not differentiable):

```python
_, prediction = tensor.max(dim=1)
prediction = prediction.unsqueeze(1)  # reshape from (24, 224, 224) to (24, 1, 224, 224)
```
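A quick self-contained check of the argmax approach, as a sketch on random data: softmax over dim=1 stands in for the CNN's probabilities, and keepdim=True is an alternative to the explicit unsqueeze that keeps the class axis directly.

```python
import torch

# Dummy segmentation output: batch of 24, 2 channels (foreground/background), 224x224
logits = torch.randn(24, 2, 224, 224)
probs = torch.softmax(logits, dim=1)  # the two channels sum to 1 at every pixel

# keepdim=True keeps the reduced axis, giving (24, 1, 224, 224) directly
mask = torch.argmax(probs, dim=1, keepdim=True)
print(mask.shape, mask.dtype)  # torch.Size([24, 1, 224, 224]) torch.int64

# Same result as argmax followed by unsqueeze(1)
assert torch.equal(mask, torch.argmax(probs, dim=1).unsqueeze(1))
```

Note that argmax returns the index of the winning channel, so a pixel gets 1 where channel 1 has the higher probability. If channel 0 is your foreground and you want foreground = 1, use 1 - mask instead.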