It's all in the title: I just want to know how I can load my own test data (image.jpg) in PyTorch in order to test my CNN.
You need to feed images to the net the same way as in training; that is, you should apply exactly the same transformations to get similar results.
Assuming your net was trained using this code (or similar), you can see that an input image (for validation) undergoes the following transformations:
transforms.Compose([
transforms.Resize(256),
transforms.CenterCrop(224),
transforms.ToTensor(),
normalize,
])
Following torchvision.transforms docs you can see that an input image goes through:
Resizing to 256x256 pixels
Cropping a 224x224 rectangle from the center of the image
Converting the image from uint8 to float in the range [0, 1] and transposing it to a 3-by-224-by-224 array
Normalizing the image by subtracting the mean and dividing by the std
You can do all this manually to any image:
import numpy as np
import torch
from PIL import Image
pil_img = Image.open('image.jpg').resize((256, 256), Image.BILINEAR) # read and resize
# center crop
w, h = pil_img.size
i = int(round((h - 224) / 2.))
j = int(round((w - 224) / 2.))
pil_img = pil_img.crop((j, i, j+224, i+224))
np_img = np.array(pil_img).astype(np.float32) / 255.
np_img = np.transpose(np_img, (2, 0, 1))
# normalize
mean = [0.485, 0.456, 0.406]
std = [0.229, 0.224, 0.225]
for c in range(3):
    np_img[c, ...] = (np_img[c, ...] - mean[c]) / std[c]
Once you have np_img ready for your model, convert it to a torch tensor and run a forward pass:
pred = model(torch.from_numpy(np_img)[None, ...])  # note that we add a singleton leading dim for the batch
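If your network contains dropout or batch-norm layers, it is also worth switching it to evaluation mode and disabling gradient tracking for inference (a small, optional addition to the call above):
model.eval()
with torch.no_grad():
    pred = model(torch.from_numpy(np_img)[None, ...])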
Thanks for your response. My problem was loading the test data, and I found a solution.
test_data = datasets.ImageFolder('root/test_cnn', transform=transform)
For example, if I have two directories, cat and dog (inside the test_cnn directory), each containing images, the ImageFolder object will automatically assign the classes cat and dog to my images.
During testing, I just have to ignore the assigned classes, as in the sketch below.
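A minimal sketch of what that could look like, assuming model is the trained network and reusing the validation transforms from above (the path and batch size are only placeholders):
import torch
from torchvision import datasets, transforms
transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])
test_data = datasets.ImageFolder('root/test_cnn', transform=transform)
test_loader = torch.utils.data.DataLoader(test_data, batch_size=32, shuffle=False)
model.eval()
with torch.no_grad():
    for images, _ in test_loader:  # the folder-derived labels are ignored here
        preds = model(images)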
Related
I am trying to implement an API for a pre-trained ResNet (machine learning model), so the server can accept a single valid image file in the request to be analyzed and return the output from running the image against the model.
I'm wondering what the general structure of my API should look like. So far I have:
app
api
__init__.py (for blueprint)
errors.py (for exceptions)
main
resnet18.py (for actual model and pic classification)
In my resnet18.py:
import torchvision.models as models
import torch
from torchvision import transforms
resnet18 = models.resnet18(pretrained=True)
resnet18.eval()  # inference mode (affects batch-norm and dropout layers)
transform = transforms.Compose([
transforms.Resize(256), # resize the image to 256*256
transforms.CenterCrop(224), # crop the image to 224*224 pixels about the center
transforms.ToTensor(), # convert the image to PyTorch Tensor data type
transforms.Normalize(mean=[0.485, 0.456, 0.406], # normalize each channel with the given mean and standard deviation
std=[0.229, 0.224, 0.225])
])
from PIL import Image
img = Image.open('dog.jpg')
img_t = transform(img)
batch_t = torch.unsqueeze(img_t, 0)
out = resnet18(batch_t)
with open('imagenet_classes.txt') as f:
classes = [line.strip() for line in f.readlines()]
_, index = torch.max(out, 1)
percentage = torch.nn.functional.softmax(out, dim=1)[0] * 100
# print(classes[index[0]], percentage[index[0]].item())
most_likely = classes[index[0]]
confidence = percentage[index[0]].item()
# _, indices = torch.sort(out, descending=True)
# [(classes[idx], percentage[idx].item()) for idx in indices[0][:5]]
def to_json():
json_prediction = {
'most_likely': most_likely,
'confidence': confidence
}
return json_prediction
So I want to call this file once I upload an image. I'm wondering how I can make it more elegant; I've never done anything like this.
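One common way to make this callable from an upload handler is to wrap the per-image work in a function and load the model and class list once at startup. Below is a minimal sketch assuming a Flask app; the /predict route, the classify helper, and the 'file' form field are hypothetical names, not part of the original code:
import io
import torch
import torchvision.models as models
from torchvision import transforms
from PIL import Image
from flask import Flask, request, jsonify
app = Flask(__name__)
# load the model and class names once, at import time
resnet18 = models.resnet18(pretrained=True)
resnet18.eval()
with open('imagenet_classes.txt') as f:
    classes = [line.strip() for line in f.readlines()]
transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])
def classify(image_bytes):
    """Run one image through the model and return the top prediction."""
    img = Image.open(io.BytesIO(image_bytes)).convert('RGB')
    batch = transform(img).unsqueeze(0)
    with torch.no_grad():
        out = resnet18(batch)
    percentage = torch.nn.functional.softmax(out, dim=1)[0] * 100
    _, index = torch.max(out, 1)
    return {'most_likely': classes[index[0]],
            'confidence': percentage[index[0]].item()}
@app.route('/predict', methods=['POST'])
def predict():
    file = request.files['file']  # the uploaded image
    return jsonify(classify(file.read()))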
First off, why am I using Keras? I'm trying to stay as high level as possible, which doesn't mean I'm scared of low-level Tensorflow; I just want to see how far I can go while keeping my code as simple and readable as possible.
I need my Keras model (custom-built using the Keras functional API) to read the left image from a stereo pair and minimize a loss function that needs to access both the right and left images. I want to store the data in a tf.data.Dataset.
What I tried:
Reading the dataset as (left image, right image), i.e. as tensors with shape ((W, H, 3), (W, H, 3)), then using a function closure: defining a keras_loss(left_images) that returns a loss(y_true, y_pred), with y_true being a tf.Tensor that holds the right image. The problem with this approach is that left_images is a tf.data.Dataset and TensorFlow complains (rightly so) that I'm trying to operate on a dataset instead of a tensor.
Reading the dataset as (left image, (left image, right image)), which should make y_true a tf.Tensor with shape ((W, H, 3), (W, H, 3)) that holds both the right and left images. The problem with this approach is that it...does not work and raises the following error:
ValueError: Error when checking model target: the list of Numpy arrays
that you are passing to your model is not the size the model expected.
Expected to see 1 array(s), for inputs ['tf_op_layer_resize/ResizeBilinear']
but instead got the following list of 2 arrays: [<tf.Tensor 'args_1:0'
shape=(None, 512, 256, 3) dtype=float32>, <tf.Tensor 'args_2:0'
shape=(None, 512, 256, 3) dtype=float32>]...
So, is there anything I did not consider? I read the documentation and found nothing about what gets considered as y_pred and what as y_true, nor about how to convert a dataset into a tensor smartly and without loading it all in memory.
My model is designed as such:
def my_model(input_shape):
width = input_shape[0]
height = input_shape[1]
inputs = tf.keras.Input(shape=input_shape)
# < a few more layers >
outputs = tf.image.resize(tf.nn.sigmoid(tf.slice(disp6, [0, 0, 0, 0], [-1, -1, -1, 2])), tf.Variable([width, height]))
model = tf.keras.Model(inputs=inputs, outputs=outputs)
return model
And my dataset is built as such (in case 2, while in case 1 only the function read_stereo_pair_from_line() changes):
def read_img_from_file(file_name):
img = tf.io.read_file(file_name)
# convert the compressed string to a 3D uint8 tensor
img = tf.image.decode_png(img, channels=3)
# Use `convert_image_dtype` to convert to floats in the [0,1] range.
img = tf.image.convert_image_dtype(img, tf.float32)
# resize the image to the desired size.
return tf.image.resize(img, [args.input_width, args.input_height])
def read_stereo_pair_from_line(line):
split_line = tf.strings.split(line, ' ')
return read_img_from_file(split_line[0]), (read_img_from_file(split_line[0]), read_img_from_file(split_line[1]))
# Dataset loading
list_ds = tf.data.TextLineDataset('test/files.txt')
images_ds = list_ds.map(lambda x: read_stereo_pair_from_line(x))
images_ds = images_ds.batch(1)
Solved. I just needed to read the dataset as (left image, [left image, right image]) instead of (left image, (left image, right image)), i.e. make the second item a list and not a tuple. I can then access the images as input_r = y_true[:, 1, :, :] and input_l = y_true[:, 0, :, :].
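For completeness, a rough sketch of how the custom loss can then unpack the two views once read_stereo_pair_from_line returns the list form described above; the loss body here is only a placeholder, not the actual reconstruction term:
import tensorflow as tf
def stereo_loss(y_true, y_pred):
    # y_true stacks both views along axis 1: index 0 = left image, index 1 = right image
    input_l = y_true[:, 0, :, :]
    input_r = y_true[:, 1, :, :]
    # placeholder term only -- replace with the real loss that uses y_pred and both views
    return tf.reduce_mean(tf.abs(input_r - input_l))
model = my_model((args.input_width, args.input_height, 3))
model.compile(optimizer='adam', loss=stereo_loss)
model.fit(images_ds)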
After training my model, I tried to plot a graph of the softmax output, but it resulted in the runtime error mentioned in the title.
Here is the code snippet:
%matplotlib inline
%config InlineBackend.figure_format = 'retina'
import helper
# Test out your network!
dataiter = iter(testloader)
images, labels = dataiter.next()
img = images[1]
# TODO: Calculate the class probabilities (softmax) for img
ps = torch.exp(model(img))
# Plot the image and probabilities
helper.view_classify(img, ps, version='Fashion')
The problem is with this part (I guess).
img = images[1]
# TODO: Calculate the class probabilities (softmax) for img
ps = torch.exp(model(img))
Problem: the image you are loading has dimensions 28x28, but the first dimension of the model's input is generally the batch size. Since there is only one image, you have to make the first dimension of size 1. To do that, use img = img.view((-1,) + img.shape) or img = img.unsqueeze(dim=0). Also, it seems that the first layer's weight is 784 x 128, i.e. the image should be flattened into a vector before being fed to the model. For that, use img = img.view(1, -1).
So, in total, you need to do:
img = images[1]
img = img.unsqueeze(dim=0)
img = img.view(1, -1)
# TODO: Calculate the class probabilities (softmax) for img
ps = torch.exp(model(img))
Or you can just use one command instead of two (the unsqueeze is unnecessary):
img = images[1]
img = img.view(1, -1)
I am using transfer learning for recognizing objects. I used a pre-trained VGG16 model as the base model and added my classifier on top of it using Keras. I then trained the model on my data, and the model works well. I want to see the features generated by the intermediate layers of the model for the given data. I used the following code for this purpose:
import numpy as np
from keras.models import Model
from keras.preprocessing import image
from keras.applications.vgg16 import preprocess_input  # assuming the VGG16 preprocessing mentioned above

def ModeloutputAtthisLayer(model, layernme, imgnme, width, height):
layer_name = layernme
intermediate_layer_model = Model(inputs=model.input,
outputs=model.get_layer(layer_name).output)
img = image.load_img(imgnme, target_size=(width, height))
imageArray = image.img_to_array(img)
image_batch = np.expand_dims(imageArray, axis=0)
processed_image = preprocess_input(image_batch.copy())
intermediate_output = intermediate_layer_model.predict(processed_image)
print("outshape of ", layernme, "is ", intermediate_output.shape)
In the code, I used np.expand_dims to add an extra dimension for the batch, since the input to the network should be of the form (batchsize, height, width, channels). This code works fine. The shape of the extracted features is (1, 224, 224, 64).
Now I wish to display this as an image. I understand an additional batch dimension was added, so I should remove it. For this I used the following lines of code:
imge = np.squeeze(intermediate_output, axis=0)
plt.imshow(imge)
However it throws an error:
"Invalid dimensions for image data"
I wonder how I can display the extracted feature maps as an image. Any suggestions, please?
Your feature shape is (1, 224, 224, 64); you cannot directly plot a 64-channel image. What you can do is plot the individual channels independently, like the following:
import math
import matplotlib.pyplot as plt

imge = np.squeeze(intermediate_output, axis=0)
filters = imge.shape[2]
plt.figure(1, figsize=(32, 32)) # plot image of size (32x32)
n_columns = 8
n_rows = math.ceil(filters / n_columns) + 1
for i in range(filters):
plt.subplot(n_rows, n_columns, i+1)
plt.title('Filter ' + str(i))
plt.imshow(imge[:,:,i], interpolation="nearest", cmap="gray")
This will plot 64 images in 8 rows and 8 columns.
A possible way to go is to combine the 64 channels into a single-channel image through a weighted sum, like this:
weighted_imge = np.sum(imge*weights, axis=-1)
where weights is an array with 64 weighting coefficients.
If you wish to give all the channels the same weight you could simply compute the average:
weighted_imge = np.mean(imge, axis=-1)
Demo
import numpy as np
import matplotlib.pyplot as plt
intermediate_output = np.random.randint(size=(1, 224, 224, 64),
low=0, high=2**8, dtype=np.uint8)
imge = np.squeeze(intermediate_output, axis=0)
weights = np.random.random(size=(imge.shape[-1],))
weighted_imge = np.sum(imge*weights, axis=-1)
plt.imshow(weighted_imge)
plt.colorbar()
In [33]: intermediate_output.shape
Out[33]: (1, 224, 224, 64)
In [34]: imge.shape
Out[34]: (224, 224, 64)
In [35]: weights.shape
Out[35]: (64,)
In [36]: weighted_imge.shape
Out[36]: (224, 224)
I am new to MXNet. I just installed MXNet 1.0.0 and Python 3.5 on an Ubuntu 14.04 machine with CUDA 8.0 and cuDNN 7.0.5.
My code is given below. I am trying to store image data in an ndarray. (see https://github.com/ypwhs/DogBreed_gluon/blob/master/get_features_v3.ipynb for the original code)
import cv2
import numpy as np
from mxnet import nd
from tqdm import tqdm

X_224 = nd.zeros((n, 3, 224, 224))
X_299 = nd.zeros((n, 3, 299, 299))
mean = np.array([0.485, 0.456, 0.406])
std = np.array([0.229, 0.224, 0.225])
for i, (fname, breed) in tqdm(df.iterrows(), total=n):
img = cv2.imread('data/train/%s.jpg' % fname)
img_224 = ((cv2.resize(img, (224, 224))[:, :, ::-1] / 255.0 - mean) / std).transpose((2, 0, 1))
img_299 = ((cv2.resize(img, (299, 299))[:, :, ::-1] / 255.0 - mean) / std).transpose((2, 0, 1))
X_224[i] = nd.array(img_224)  # <-- I get an error on this line
X_299[i] = nd.array(img_299)
Here is the error I get:
ValueError: Indexing NDArray with index=0 and type=<class 'numpy.int64'> is not supported.
I am assuming it has to do with indexing a multi-dimensional NDArray. So I tried slicing, X_224[i:i+1] = ...., but that gave me another error.
You could convert the type of the index from numpy.int64 to int; e.g. i = int(i) before trying to set the slice.
df.iterrows() returns tuples, where the type of the first element depends on the type of the DataFrame index. When running the GitHub example, df.iterrows() returned tuples of type (int, pandas.core.series.Series), so no conversion was necessary for me (using Pandas 0.22).
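Applied to the loop from the question, the cast would look like this (only the i = int(i) line is new; the 299-pixel branch is omitted for brevity):
for i, (fname, breed) in tqdm(df.iterrows(), total=n):
    i = int(i)  # cast the pandas index from numpy.int64 to a plain Python int
    img = cv2.imread('data/train/%s.jpg' % fname)
    img_224 = ((cv2.resize(img, (224, 224))[:, :, ::-1] / 255.0 - mean) / std).transpose((2, 0, 1))
    X_224[i] = nd.array(img_224)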
Aside from this specific issue, you might want to take a look at Gluon Datasets and DataLoaders for this task. mxnet.gluon.data.vision.datasets.ImageFolderDataset can be used for loading images, and it accepts an image transformation function through the transform argument.
import os
import cv2
import numpy as np
import mxnet as mx

def transform_fn(data, label):
mean = np.array([0.485, 0.456, 0.406])
std = np.array([0.229, 0.224, 0.225])
data = ((cv2.resize(data, (224, 224))[:, :, ::-1] / 255.0 - mean) / std).transpose((2, 0, 1))
return data, label
image_directory = os.path.join(data_dir, "train")
dataset = mx.gluon.data.vision.ImageFolderDataset(image_directory, transform=transform_fn)
data_loader = mx.gluon.data.DataLoader(dataset, batch_size=10, shuffle=True)
for data, label in data_loader:
...