I'm trying to implement an API for a pre-trained ResNet (machine learning model), so the server can accept a single valid image file in the request to be analyzed and return the output from running the image through the model.
I'm wondering what the general structure of my API should look like. So far I have:
app
    api
        __init__.py (for blueprint)
        errors.py (for exceptions)
    main
        resnet18.py (for actual model and pic classification)
In my resnet18.py:
import torch
import torchvision.models as models
from torchvision import transforms
from PIL import Image

resnet18 = models.resnet18(pretrained=True)
resnet18.eval()  # switch batch-norm/dropout layers to inference mode

transform = transforms.Compose([
    transforms.Resize(256),          # resize so the shorter side is 256 pixels
    transforms.CenterCrop(224),      # crop a 224x224 patch from the center
    transforms.ToTensor(),           # convert the image to a PyTorch tensor in [0, 1]
    transforms.Normalize(mean=[0.485, 0.456, 0.406],  # normalize each channel with the ImageNet
                         std=[0.229, 0.224, 0.225])   # mean and standard deviation
])

img = Image.open('dog.jpg')
img_t = transform(img)
batch_t = torch.unsqueeze(img_t, 0)  # add a batch dimension: (3, 224, 224) -> (1, 3, 224, 224)

with torch.no_grad():
    out = resnet18(batch_t)

with open('imagenet_classes.txt') as f:
    classes = [line.strip() for line in f.readlines()]

_, index = torch.max(out, 1)
percentage = torch.nn.functional.softmax(out, dim=1)[0] * 100
# print(classes[index[0]], percentage[index[0]].item())
most_likely = classes[index[0]]
confidence = percentage[index[0]].item()

# _, indices = torch.sort(out, descending=True)
# [(classes[idx], percentage[idx].item()) for idx in indices[0][:5]]

def to_json():
    json_prediction = {
        'most_likely': most_likely,
        'confidence': confidence
    }
    return json_prediction
So I want to call this file once I upload an image. I'm wondering how I can make it more elegant; I've never done anything like this before.
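For context, here is a rough sketch of the kind of endpoint I'm aiming for. It is only an illustration: it assumes the code above is wrapped in a hypothetical predict(img) helper inside resnet18.py, and the module path, blueprint, and route names are placeholders rather than part of my actual project.

# app/api/views.py -- hypothetical sketch, not working project code
from flask import Blueprint, request, jsonify
from PIL import Image

from ..main.resnet18 import predict  # assumed helper wrapping the transform/forward/to_json logic above

api = Blueprint('api', __name__)

@api.route('/predict', methods=['POST'])
def classify_image():
    # Expect a single image file under the form field "image"
    file = request.files.get('image')
    if file is None:
        return jsonify({'error': 'no image file provided'}), 400
    try:
        img = Image.open(file.stream).convert('RGB')
    except Exception:
        return jsonify({'error': 'invalid image file'}), 400
    # predict(img) is assumed to return the same dict that to_json() builds
    return jsonify(predict(img))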
I created an ESRGAN model and then tried to implement it using the following code:
import torch
from torchvision import transforms
from PIL import Image
from esrgan_model import ESRGAN
print("imports done")
# load the pretrained ESRGAN model
model = ESRGAN()
print('model imported')
# define the image preprocessing steps
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])
print("preprocessing steps defined")
# load the image
img = Image.open("example.jpg")
print("loading the image")
# preprocess the image
img = transform(img).unsqueeze(0)
print('Preprocessing the image...')
# run the image through the model
output = model(img)
print('Running the image through the model...')
# save the output image
output = output.squeeze().permute(1, 2, 0)
output = (output * 0.5) + 0.5
output = transforms.ToPILImage()(output)
output.save("output.jpg")
But when I run the code, I get the following error while saving the output image, on this line: output = output.squeeze().permute(1, 2, 0)
The error is:
output = output.squeeze().permute(1, 2, 0)
RuntimeError: number of dims don't match in permute
I looked up the documentation for the method and similar errors, but couldn't find a solution. How can I resolve this error?
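To narrow this down, my plan is to inspect what the model actually returns before reshaping it. Here is a minimal debugging sketch using the same names as in my code above; the expected shapes are only assumptions, since I don't yet know what ESRGAN() outputs:

import torch

with torch.no_grad():
    output = model(img)

print(type(output))                  # some super-resolution models return a tuple/list, not a tensor
if isinstance(output, (tuple, list)):
    output = output[0]
print(output.shape)                  # expecting something like (1, 3, H, W) for a single RGB image

# Drop only the batch dimension so no other size-1 dimension is lost by accident
output = output.squeeze(0)           # (3, H, W)
output = output.permute(1, 2, 0)     # (H, W, 3)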
I've got a neural style transfer model. I'm currently working on using different parts of an image to transfer different pictures. I'm wondering how I can get the model to use only the colours present in an image. Below is an example:
The picture above is the style image that I obtained by using thresholding along with the original image. The transferred picture is below:
Obviously it has transferred some of the black parts of the image, but I only want the non-black colours present to be transferred. Below is the code for my model:
import torch
import torch.nn as nn
import torch.optim as optim
from PIL import Image
import torchvision.transforms as transforms
import torchvision.models as models
from torchvision.utils import save_image
class VGG(nn.Module):
    def __init__(self):
        super(VGG, self).__init__()
        self.chosen_features = ["0", "5", "10", "19", "28"]
        self.model = models.vgg19(pretrained=True).features[:29]

    def forward(self, x):
        # Store relevant features
        features = []
        for layer_num, layer in enumerate(self.model):
            x = layer(x)
            if str(layer_num) in self.chosen_features:
                features.append(x)
        return features
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
def load_image(image_name):
    image = Image.open(image_name)
    image = loader(image).unsqueeze(0)
    return image.to(device)

imsize = 384

loader = transforms.Compose(
    [
        transforms.Resize((imsize, imsize)),
        transforms.ToTensor(),
        # transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ]
)
original_img = load_image("Content Image.jpg")
style_img = load_image("Adaptive Image 2.jpg")
# initialized generated as white noise or clone of original image.
# Clone seemed to work better for me.
generated = original_img.clone().requires_grad_(True)
# generated = load_image("20epoctom.png")
model = VGG().to(device).eval()
# Hyperparameters
total_steps = 10000
learning_rate = 0.001
alpha = 1
beta = 0.01
optimizer = optim.Adam([generated], lr=learning_rate)
for step in range(total_steps):
    # Obtain the convolution features in specifically chosen layers
    generated_features = model(generated)
    original_img_features = model(original_img)
    style_features = model(style_img)

    # Loss is 0 initially
    style_loss = original_loss = 0

    # iterate through all the features for the chosen layers
    for gen_feature, orig_feature, style_feature in zip(
        generated_features, original_img_features, style_features
    ):
        # batch_size will just be 1
        batch_size, channel, height, width = gen_feature.shape
        original_loss += torch.mean((gen_feature - orig_feature) ** 2)

        # Compute Gram Matrix of generated
        G = gen_feature.view(channel, height * width).mm(
            gen_feature.view(channel, height * width).t()
        )
        # Compute Gram Matrix of Style
        A = style_feature.view(channel, height * width).mm(
            style_feature.view(channel, height * width).t()
        )
        style_loss += torch.mean((G - A) ** 2)

    total_loss = alpha * original_loss + beta * style_loss
    optimizer.zero_grad()
    total_loss.backward()
    optimizer.step()

    if step % 500 == 0:
        print(total_loss)
        save_image(generated, f"Generated Pictures/{step//500} Iterations Generated Picture.png")
Any ideas on where to go from here would also be appreciated!
If you want some ways to preserve non-black color in your style transfer model, I suggest checking out the GitHub repo here. It has .ipynb notebooks with entire training pipelines, model weights, a good readme, etc. to reference. According to their readme, they try to implement this paper on preserving color in neural artistic style transfer, which should help you. You can also reference other repos and run some of them on Colab via the Papers with Code repo here, though I do suggest looking at the first repo first.
If you want to do color transfer outside your style transfer model, and instead have two images transfer color with the help of some functions in a library, then I recommend looking at this tutorial.
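As a rough illustration of that second option (not the exact method from the tutorial, just the general idea of matching color statistics between two images), here is a minimal sketch of Reinhard-style mean/std matching in LAB space; it assumes OpenCV and NumPy are installed, and the file names are placeholders taken from your code:

import cv2
import numpy as np

def reinhard_color_transfer(source_path, target_path):
    """Recolor the target image with the global color statistics of the source."""
    source = cv2.cvtColor(cv2.imread(source_path), cv2.COLOR_BGR2LAB).astype(np.float32)
    target = cv2.cvtColor(cv2.imread(target_path), cv2.COLOR_BGR2LAB).astype(np.float32)

    # Per-channel mean and standard deviation in LAB space
    src_mean, src_std = source.mean(axis=(0, 1)), source.std(axis=(0, 1))
    tgt_mean, tgt_std = target.mean(axis=(0, 1)), target.std(axis=(0, 1))

    # Shift/scale the target's statistics towards the source's
    result = (target - tgt_mean) / (tgt_std + 1e-8) * src_std + src_mean
    result = np.clip(result, 0, 255).astype(np.uint8)
    return cv2.cvtColor(result, cv2.COLOR_LAB2BGR)

recolored = reinhard_color_transfer("Adaptive Image 2.jpg", "Content Image.jpg")
cv2.imwrite("recolored.jpg", recolored)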
I am trying to implement a sign classifier using the GluonCV API as part of my final-year college project.
Data set: http://facundoq.github.io/datasets/lsa64/
I followed the "Fine-tuning SOTA video models on your own dataset" tutorial and fine-tuned two models.
Tutorial: https://cv.gluon.ai/build/examples_action_recognition/finetune_custom.html
i3d_resnet50_v1_custom
(Accuracy graph: I3D)
slowfast_4x16_resnet50_custom
(Accuracy graph: SlowFast)
The plotted graphs show almost 90% accuracy, but when I run inference I get misclassifications even on the videos I used for training.
So I am stuck; any guidance you could give would be very helpful.
Thank you
My data loader for I3D:
num_gpus = 1
ctx = [mx.gpu(i) for i in range(num_gpus)]
transform_train = video.VideoGroupTrainTransform(size=(224, 224), scale_ratios=[1.0, 0.8], mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
per_device_batch_size = 5
num_workers = 0
batch_size = per_device_batch_size * num_gpus
train_dataset = VideoClsCustom(root=os.path.expanduser('DataSet/train/'),
                               setting=os.path.expanduser('DataSet/train/train.txt'),
                               train=True,
                               new_length=64,
                               new_step=2,
                               video_loader=True,
                               use_decord=True,
                               transform=transform_train)
print('Load %d training samples.' % len(train_dataset))
train_data = gluon.data.DataLoader(train_dataset, batch_size=batch_size,
                                   shuffle=True, num_workers=num_workers)
Running inference:
from gluoncv.utils.filesystem import try_import_decord
decord = try_import_decord()
video_fname = 'DataSet/test/006_001_001.mp4'
vr = decord.VideoReader(video_fname)
frame_id_list = range(0, 64, 2)
video_data = vr.get_batch(frame_id_list).asnumpy()
clip_input = [video_data[vid, :, :, :] for vid, _ in enumerate(frame_id_list)]
transform_fn = video.VideoGroupValTransform(size=(224, 224), mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
clip_input = transform_fn(clip_input)
clip_input = np.stack(clip_input, axis=0)
clip_input = clip_input.reshape((-1,) + (32, 3, 224, 224))
clip_input = np.transpose(clip_input, (0, 2, 1, 3, 4))
print('Video data is read and preprocessed.')
# Running the prediction
pred = net(nd.array(clip_input, ctx = mx.gpu(0)))
topK = 5
ind = nd.topk(pred, k=topK)[0].astype('int')
print('The input video clip is classified to be')
for i in range(topK):
    print('\t[%s], with probability %.3f.' %
          (CLASS_MAP[ind[i].asscalar()], nd.softmax(pred)[0][ind[i]].asscalar()))
I found my bug: it was happening because of too little augmentation, so I changed my transform for both the train data loader and inference as shown below, and now it is working properly.
transform_train = transforms.Compose([
    # Fix the input video frame size as 256×340 and randomly sample the cropping width and height from
    # {256, 224, 192, 168}. After that, resize the cropped regions to 224 × 224.
    video.VideoMultiScaleCrop(size=(224, 224), scale_ratios=[1.0, 0.875, 0.75, 0.66]),
    # Randomly flip the video frames horizontally
    video.VideoRandomHorizontalFlip(),
    # Transpose the video frames from height*width*num_channels to num_channels*height*width
    # and map values from [0, 255] to [0, 1]
    video.VideoToTensor(),
    # Normalize the video frames with mean and standard deviation calculated across all images
    video.VideoNormalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])
It's all in the title: I just want to know how I can load my own test data (image.jpg) in PyTorch in order to test my CNN.
You need to feed images to the net the same way as in training: that is, you should apply exactly the same transformations to get similar results.
Assuming your net was trained using this code (or something similar), you can see that an input image (for validation) undergoes the following transformations:
transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    normalize,
])
Following the torchvision.transforms docs, you can see that the input image goes through:
Resizing so that its smaller edge is 256 px (keeping the aspect ratio)
Cropping a 224x224 rect from the center of the image
Converting from the uint8 datatype to float in the range [0, 1], and transposing to a 3-by-224-by-224 array
Normalizing by subtracting the mean and dividing by the std
You can do all of this manually for any image:
import numpy as np
import torch
from PIL import Image

# read and resize (note: this resizes to an exact 256x256 square; torchvision's
# Resize(256) instead matches the smaller edge and keeps the aspect ratio)
pil_img = Image.open('image.jpg').resize((256, 256), Image.BILINEAR)
# center crop
w, h = pil_img.size
i = int(round((h - 224) / 2.))
j = int(round((w - 224) / 2.))
pil_img = pil_img.crop((j, i, j+224, i+224))
np_img = np.array(pil_img).astype(np.float32) / 255.
np_img = np.transpose(np_img, (2, 0, 1))
# normalize each channel in place
mean = [0.485, 0.456, 0.406]
std = [0.229, 0.224, 0.225]
for c in range(3):
    np_img[c, ...] = (np_img[c, ...] - mean[c]) / std[c]
Once you have np_img ready for your model, convert it to a torch tensor and run a feed-forward pass:
pred = model(torch.from_numpy(np_img)[None, ...])  # note that we add a singleton leading dim for batch
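Equivalently, you can just reuse torchvision's transforms directly on a single image; a minimal sketch, where model stands for your trained network:

import torch
from PIL import Image
from torchvision import transforms

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

img = preprocess(Image.open('image.jpg').convert('RGB'))  # (3, 224, 224) float tensor
with torch.no_grad():
    pred = model(img.unsqueeze(0))  # add the batch dimension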
Thanks for your response. My problem was loading the test data, and I found a solution:
test_data = datasets.ImageFolder('root/test_cnn', transform=transform)
For example, if I have two directories, cat and dog (inside the test_cnn directory), that contain images, the ImageFolder object will automatically assign the classes cat and dog to my images.
During testing, I just have to drop the classes.
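For reference, here is a minimal sketch of how I then run the test set through my network; the transform and the model variable are placeholders for my own:

import torch
from torchvision import datasets, transforms

transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

test_data = datasets.ImageFolder('root/test_cnn', transform=transform)
test_loader = torch.utils.data.DataLoader(test_data, batch_size=32, shuffle=False)

model.eval()
with torch.no_grad():
    for images, _ in test_loader:             # the folder-derived labels are simply ignored
        preds = model(images).argmax(dim=1)   # predicted class index per image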
How to extract the features from a specific layer from a pre-trained PyTorch model (such as ResNet or VGG), without doing a forward pass again?
New answer
Edit: there's a new feature in torchvision v0.11.0 that allows extracting features.
For example, if you want to extract features from the layer layer4.2.relu_2, you can do the following:
import torch
from torchvision.models import resnet50
from torchvision.models.feature_extraction import create_feature_extractor
x = torch.rand(1, 3, 224, 224)
model = resnet50()
return_nodes = {
    "layer4.2.relu_2": "layer4"
}
model2 = create_feature_extractor(model, return_nodes=return_nodes)
intermediate_outputs = model2(x)
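If you are not sure which node names are available for a given model, the same torchvision feature-extraction module also provides get_graph_node_names to list them; a short sketch:

from torchvision.models import resnet50
from torchvision.models.feature_extraction import get_graph_node_names

# Node names can differ between train and eval mode, so both lists are returned
train_nodes, eval_nodes = get_graph_node_names(resnet50())
print(eval_nodes)  # inspect the available names, e.g. "layer4.2.relu_2"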
Old answer
You can register a forward hook on the specific layer you want. Something like:
def some_specific_layer_hook(module, input_, output):
    pass  # the value is in 'output'
model.some_specific_layer.register_forward_hook(some_specific_layer_hook)
model(some_input)
For example, to obtain the res5c output in ResNet, you may want to use a nonlocal variable (or global in Python 2):
res5c_output = None

def res5c_hook(module, input_, output):
    nonlocal res5c_output  # use `global` instead if this code is not wrapped in an enclosing function
    res5c_output = output

resnet.layer4.register_forward_hook(res5c_hook)
resnet(some_input)
# Then, use `res5c_output`.
The accepted answer is very helpful! I'm posting a complete example here (using a registered hook as described by @bryant1410) for the lazy ones looking for a working solution:
import torch
import torchvision.models as models
from torchvision import transforms
from PIL import Image
def get_feat_vector(path_img, model):
    '''
    Input:
        path_img: string, /path/to/image
        model: a pretrained torch model
    Output:
        my_output: torch.tensor, output of avgpool layer
    '''
    input_image = Image.open(path_img)
    preprocess = transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ])
    input_tensor = preprocess(input_image)
    input_batch = input_tensor.unsqueeze(0)

    with torch.no_grad():
        my_output = None

        def my_hook(module_, input_, output_):
            nonlocal my_output
            my_output = output_

        a_hook = model.avgpool.register_forward_hook(my_hook)
        model(input_batch)
        a_hook.remove()
        return my_output
There you have your feature extraction function; simply call it using the snippet below to obtain features from the resnet18.avgpool layer:
model = models.resnet18(pretrained=True)
model.eval()
path_ = '/path/to/image'
my_feature = get_feat_vector(path_, model)