Fine-tuning SOTA video models on your own dataset - Sign Language - python

I am trying to implement a sign language classifier using the GluonCV API as part of my final-year college project.
Data set: http://facundoq.github.io/datasets/lsa64/
I followed the "Fine-tuning SOTA video models on your own dataset" tutorial and fine-tuned two models.
Tutorial: https://cv.gluon.ai/build/examples_action_recognition/finetune_custom.html
i3d_resnet50_v1_custom
[accuracy graph for I3D]
slowfast_4x16_resnet50_custom
[accuracy graph for SlowFast]
The plotted graphs show almost 90% accuracy, but when I run inference I get misclassifications even on the videos I used for training.
So I am stuck; any guidance at all would be helpful.
Thank you
My data loader for I3D:
import os
import mxnet as mx
from mxnet import gluon
from gluoncv.data import VideoClsCustom
from gluoncv.data.transforms import video

num_gpus = 1
ctx = [mx.gpu(i) for i in range(num_gpus)]
transform_train = video.VideoGroupTrainTransform(size=(224, 224), scale_ratios=[1.0, 0.8],
                                                 mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
per_device_batch_size = 5
num_workers = 0
batch_size = per_device_batch_size * num_gpus

train_dataset = VideoClsCustom(root=os.path.expanduser('DataSet/train/'),
                               setting=os.path.expanduser('DataSet/train/train.txt'),
                               train=True,
                               new_length=64,
                               new_step=2,
                               video_loader=True,
                               use_decord=True,
                               transform=transform_train)
print('Load %d training samples.' % len(train_dataset))
train_data = gluon.data.DataLoader(train_dataset, batch_size=batch_size,
                                   shuffle=True, num_workers=num_workers)
Running inference:
import numpy as np
import mxnet as mx
from mxnet import nd
from gluoncv.data.transforms import video
from gluoncv.utils.filesystem import try_import_decord

decord = try_import_decord()

video_fname = 'DataSet/test/006_001_001.mp4'
vr = decord.VideoReader(video_fname)
frame_id_list = range(0, 64, 2)
video_data = vr.get_batch(frame_id_list).asnumpy()
clip_input = [video_data[vid, :, :, :] for vid, _ in enumerate(frame_id_list)]

transform_fn = video.VideoGroupValTransform(size=(224, 224), mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
clip_input = transform_fn(clip_input)
clip_input = np.stack(clip_input, axis=0)
clip_input = clip_input.reshape((-1,) + (32, 3, 224, 224))
clip_input = np.transpose(clip_input, (0, 2, 1, 3, 4))
print('Video data is read and preprocessed.')

# Running the prediction (net and CLASS_MAP are defined elsewhere)
pred = net(nd.array(clip_input, ctx=mx.gpu(0)))
topK = 5
ind = nd.topk(pred, k=topK)[0].astype('int')
print('The input video clip is classified to be')
for i in range(topK):
    print('\t[%s], with probability %.3f.' %
          (CLASS_MAP[ind[i].asscalar()], nd.softmax(pred)[0][ind[i]].asscalar()))

I found my bug: it was caused by too little augmentation. I changed the transform on both the training data loader and inference as below, and now it works properly.
transform_train = transforms.Compose([
    # Fix the input video frame size as 256x340 and randomly sample the cropping width and height from
    # {256, 224, 192, 168}. After that, resize the cropped regions to 224 x 224.
    video.VideoMultiScaleCrop(size=(224, 224), scale_ratios=[1.0, 0.875, 0.75, 0.66]),
    # Randomly flip the video frames horizontally
    video.VideoRandomHorizontalFlip(),
    # Transpose the video frames from height*width*num_channels to num_channels*height*width
    # and map values from [0, 255] to [0, 1]
    video.VideoToTensor(),
    # Normalize the video frames with mean and standard deviation calculated across all images
    video.VideoNormalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])
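The matching inference transform is not shown in the post. A minimal sketch of what the deterministic counterpart could look like, assuming the same gluoncv video transforms and that the 224 x 224 crop size from training is kept (the use of VideoCenterCrop here is an assumption, not the poster's code):
transform_test = transforms.Compose([
    # deterministic center crop instead of the random multi-scale crop used during training
    video.VideoCenterCrop(size=224),
    # same tensor conversion and normalization as in training
    video.VideoToTensor(),
    video.VideoNormalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])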

Related

How to resize image tensors

The following is my code, where I open every image with PIL and then turn it into a PyTorch tensor:
transform = transforms.Compose([transforms.PILToTensor()])
# choose the training and test datasets
train_data = os.listdir('data/training/')
testing_data = os.listdir('data/testing/')
train_tensors = []
test_tensors = []
for train_image in train_data:
    img = Image.open('data/training/' + train_image)
    train_tensors.append(transform(img))
for test_image in testing_data:
    img = Image.open('data/testing/' + test_image)
    test_tensors.append(transform(img))
# Print out some stats about the training and test data
print('Train data, number of images: ', len(train_data))
print('Test data, number of images: ', len(testing_data))
batch_size = 20
train_loader = DataLoader(train_tensors, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(test_tensors, batch_size=batch_size, shuffle=True)
# specify the image classes
classes = ['checked', 'unchecked', 'other']
# obtain one batch of training images
dataiter = iter(train_loader)
images, labels = dataiter.next()
images = images.numpy()
However, I am getting this error:
RuntimeError: stack expects each tensor to be equal size, but got [4, 66, 268] at entry 0 and [4, 88, 160] at entry 1
This is because my images are not resized prior to PIL -> Tensor. What is the correct way of resizing data images?
Try to use ImageFolder from torchvision. Assuming the images have different sizes, you can use CenterCrop or RandomResizedCrop, depending on your task. Check the full list of transforms in the documentation.
Here is an example:
train_dir = "data/training/"
train_dataset = datasets.ImageFolder(
    train_dir,
    transforms.Compose([
        transforms.RandomResizedCrop(img_size),  # image size int or tuple
        # Add more transforms here
        transforms.ToTensor(),  # convert to tensor at the end
    ]))
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
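Note that ImageFolder derives the labels from sub-directories, so the images would need to be laid out per class. An illustrative layout, using the classes from the question (the file names are only placeholders):
data/training/
    checked/
        img_001.png
    unchecked/
        img_002.png
    other/
        img_003.png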

Pytorch and data augmentation: how to augment data with blur, rotations, etc.

I want to do some data augmentation with PyTorch, but I don't know the libraries very well.
I tried this:
def gaussian_blur(img):
    image = np.array(img)
    image_blur = cv2.GaussianBlur(image, (65, 65), 10)
    new_image = image_blur
    im = Image.fromarray(new_image)
    return im

data_transforms = {
    'train': transforms.Compose([
        transforms.RandomRotation([-8, +8]),
        transforms.Lambda(gaussian_blur),
        transforms.ColorJitter(brightness=0, contrast=0.4, saturation=0, hue=0),
        transforms.Compose([transforms.Lambda(lambda x: x + torch.randn_like(x))]),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
    'val': transforms.Compose([
        transforms.RandomRotation([-8, +8]),
        transforms.Lambda(gaussian_blur),
        transforms.ColorJitter(brightness=0, contrast=0.4, saturation=0, hue=0),
        transforms.Compose([transforms.Lambda(lambda x: x + torch.randn_like(x))]),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
}
The effects I want are: Gaussian blur / rotation / contrast / gamma + random noise.
But I get errors about several things, for example that the sizes of the images don't match.
Any suggestions?
If you want to apply Gaussian blur, there is already a PyTorch class for it:
torchvision.transforms.GaussianBlur(kernel_size, sigma=(0.1, 2.0))
If input images are of different sizes, you have several options, depending on your project. For example, you can resize your images using transforms.Resize((w, h)) or transforms.CenterCrop((w, h)). There are several options for resizing your images so all of them have the same size; check the documentation.
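For instance, a minimal sketch combining a resize with the built-in blur (the crop size, kernel size, and sigma range here are just example values, not taken from the question):
from torchvision import transforms

transform = transforms.Compose([
    transforms.Resize((224, 224)),                               # make all images the same size
    transforms.GaussianBlur(kernel_size=9, sigma=(0.1, 2.0)),   # built-in Gaussian blur
    transforms.ToTensor(),
])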
Also, you can create your own transforms instead of using Lambda. You could do something like this
class GaussianBlur(torch.nn.Module):
    def __init__(self, kernel_size, std_dev):
        super().__init__()
        self.kernel_size = kernel_size
        self.std_dev = std_dev

    def forward(self, img):
        image = np.array(img)
        image_blur = cv2.GaussianBlur(image, self.kernel_size, self.std_dev)
        return Image.fromarray(image_blur)

data_transforms = {
    'train': transforms.Compose([
        transforms.CenterCrop(416),
        transforms.RandomRotation([-8, +8]),
        GaussianBlur((65, 65), 10),
        transforms.ColorJitter(brightness=0, contrast=0.4, saturation=0, hue=0),
        transforms.ToTensor(),
        transforms.Lambda(lambda x: x + torch.randn_like(x)),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
    'val': transforms.Compose([
        transforms.CenterCrop(416),
        transforms.RandomRotation([-8, +8]),
        GaussianBlur((65, 65), 10),
        transforms.ColorJitter(brightness=0, contrast=0.4, saturation=0, hue=0),
        transforms.ToTensor(),
        transforms.Lambda(lambda x: x + torch.randn_like(x)),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
}
I removed the redundant Compose nested inside the Compose, and in this example used a CenterCrop to get a 416 x 416 image. If you have other errors, it might help to post them.

How to load multi-image input in PyTorch?

I have a dataset with 2 RGB images per data sample (6 channels). How to read such a dataset in PyTorch?
For one RGB image I used:
data_transform = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomVerticalFlip(),
    transforms.ToTensor()
])
mel_dataset_train = datasets.ImageFolder(root='./ftrain',
                                         transform=data_transform)
train_sampler = torch.utils.data.distributed.DistributedSampler(mel_dataset_train)
dataset_loader_train = torch.utils.data.DataLoader(mel_dataset_train,
                                                   batch_size=64, shuffle=True, sampler=train_sampler,
                                                   num_workers=config.workers)
Yet I do not see how to modify it to read two images per data sample instead of one.
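No answer was posted in the thread. One common approach is a custom Dataset that loads both images of a sample and concatenates them into a single 6-channel tensor; the sketch below assumes a list of path pairs and integer labels, which is not part of the original question:
import torch
from torch.utils.data import Dataset
from torchvision import transforms
from PIL import Image

class PairedImageDataset(Dataset):
    """Hypothetical dataset: each sample is two RGB images stacked into 6 channels."""
    def __init__(self, pairs, labels, transform=None):
        # pairs: list of (path_a, path_b) tuples, labels: list of ints (assumed layout)
        self.pairs = pairs
        self.labels = labels
        # resize both images to the same size so they can be concatenated channel-wise
        self.transform = transform or transforms.Compose([
            transforms.Resize((224, 224)),
            transforms.ToTensor(),
        ])

    def __len__(self):
        return len(self.pairs)

    def __getitem__(self, idx):
        path_a, path_b = self.pairs[idx]
        img_a = self.transform(Image.open(path_a).convert('RGB'))  # 3 x H x W
        img_b = self.transform(Image.open(path_b).convert('RGB'))  # 3 x H x W
        x = torch.cat([img_a, img_b], dim=0)                       # 6 x H x W
        return x, self.labels[idx]
Random spatial augmentations (flips, crops) would need to be applied consistently to both images of a pair, e.g. after concatenation, which is a design choice left open here.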

flask image classification ML model API implementation

I am trying to implement an API for a pre-trained ResNet (machine learning model), so the server can accept a single valid image file in the request to be analyzed and return the output from running the image against the model.
I'm wondering what the general structure of my API should look like. So far I have:
app/
    api/
        __init__.py (for the blueprint)
        errors.py (for exceptions)
    main/
        resnet18.py (for the actual model and picture classification)
In my resnet18.py:
import torchvision.models as models
import torch
from torchvision import transforms

resnet18 = models.resnet18(pretrained=True)

transform = transforms.Compose([
    transforms.Resize(256),       # resize the image to 256*256
    transforms.CenterCrop(224),   # crop the image to 224*224 pixels about the center
    transforms.ToTensor(),        # convert the image to PyTorch Tensor data type
    transforms.Normalize(mean=[0.485, 0.456, 0.406],  # normalize the image with the specified mean and standard deviation
                         std=[0.229, 0.224, 0.225])
])

from PIL import Image
img = Image.open('dog.jpg')
img_t = transform(img)
batch_t = torch.unsqueeze(img_t, 0)
out = resnet18(batch_t)

with open('imagenet_classes.txt') as f:
    classes = [line.strip() for line in f.readlines()]

_, index = torch.max(out, 1)
percentage = torch.nn.functional.softmax(out, dim=1)[0] * 100
# print(classes[index[0]], percentage[index[0]].item())
most_likely = classes[index[0]]
confidence = percentage[index[0]].item()
# _, indices = torch.sort(out, descending=True)
# [(classes[idx], percentage[idx].item()) for idx in indices[0][:5]]

def to_json():
    json_prediction = {
        'most_likely': most_likely,
        'confidence': confidence
    }
    return json_prediction
I want to call this file once I upload an image. I'm wondering how I can make it more elegant; I have never done anything like this before.
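No answer was posted, but a common pattern is to move the prediction logic into a function that takes a PIL image and call it from a Flask route. A minimal sketch under that assumption (the /predict endpoint and the 'file' form field name are made up for illustration):
import io
import torch
import torchvision.models as models
from torchvision import transforms
from PIL import Image
from flask import Flask, request, jsonify

app = Flask(__name__)

# Load the model and class names once at startup, not on every request
model = models.resnet18(pretrained=True)
model.eval()
with open('imagenet_classes.txt') as f:
    classes = [line.strip() for line in f]

transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

def predict(img):
    batch = transform(img).unsqueeze(0)
    with torch.no_grad():
        out = model(batch)
    probs = torch.nn.functional.softmax(out, dim=1)[0]
    conf, idx = torch.max(probs, 0)
    return {'most_likely': classes[idx.item()], 'confidence': conf.item() * 100}

@app.route('/predict', methods=['POST'])  # hypothetical endpoint
def predict_route():
    file = request.files['file']  # assumed form field name
    img = Image.open(io.BytesIO(file.read())).convert('RGB')
    return jsonify(predict(img))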

load test data in pytorch

It's all in the title: I just want to know how I can load my own test data (image.jpg) in PyTorch in order to test my CNN.
You need to feed images to the net the same way as in training: that is, you should apply exactly the same transformations to get similar results.
Assuming your net was trained using this code (or similar), you can see that an input image (for validation) undergoes the following transformations:
transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    normalize,
])
Following the torchvision.transforms docs, you can see that an input image goes through:
Resizing to 256x256 pixels
Cropping a 224x224 rectangle from the center of the image
Conversion from uint8 to float in the range [0, 1], transposed to a 3-by-224-by-224 array
Normalization by subtracting the mean and dividing by the std
You can do all this manually for any image:
import numpy as np
import torch
from PIL import Image

pil_img = Image.open('image.jpg').resize((256, 256), Image.BILINEAR)  # read and resize
# center crop
w, h = pil_img.size
i = int(round((h - 224) / 2.))
j = int(round((w - 224) / 2.))
pil_img = pil_img.crop((j, i, j + 224, i + 224))
np_img = np.array(pil_img).astype(np.float32) / 255.
np_img = np.transpose(np_img, (2, 0, 1))
# normalize each channel in place
mean = [0.485, 0.456, 0.406]
std = [0.229, 0.224, 0.225]
for c in range(3):
    np_img[c, ...] = (np_img[c, ...] - mean[c]) / std[c]
Once you have np_img ready for your model, you can run a feed-forward pass (converting to a torch tensor first):
pred = model(torch.from_numpy(np_img)[None, ...])  # note the singleton leading dim for the batch
Thanks for your response. My problem was loading the test data, and I found a solution:
test_data = datasets.ImageFolder('root/test_cnn', transform=transform)
For example, if I have 2 directories, cat & dog (inside the test_cnn directory), that contain images, the ImageFolder object will automatically assign the classes cat and dog to my images.
During testing, I just have to drop the classes.
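For completeness, a minimal sketch of iterating such a test loader while ignoring the automatically assigned labels (the batch size and loader settings are assumptions):
from torch.utils.data import DataLoader

test_loader = DataLoader(test_data, batch_size=20, shuffle=False)
for images, _ in test_loader:   # labels come from the folder names; ignore them here
    preds = model(images)       # forward pass through the trained CNN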
