mxnet ndarray indexing in python - python

I am new to MXNet. I just installed MXNet 1.0.0 and Python 3.5 on an Ubuntu 14.04 machine with CUDA 8.0 and cuDNN 7.0.5.
My code is given below. I am trying to store image data in an NDArray (see https://github.com/ypwhs/DogBreed_gluon/blob/master/get_features_v3.ipynb for the original code).
X_224 = nd.zeros((n, 3, 224, 224))
X_299 = nd.zeros((n, 3, 299, 299))
mean = np.array([0.485, 0.456, 0.406])
std = np.array([0.229, 0.224, 0.225])

for i, (fname, breed) in tqdm(df.iterrows(), total=n):
    img = cv2.imread('data/train/%s.jpg' % fname)
    img_224 = ((cv2.resize(img, (224, 224))[:, :, ::-1] / 255.0 - mean) / std).transpose((2, 0, 1))
    img_299 = ((cv2.resize(img, (299, 299))[:, :, ::-1] / 255.0 - mean) / std).transpose((2, 0, 1))
    X_224[i] = nd.array(img_224)  # <-- I get the error on this line
    X_299[i] = nd.array(img_299)
Here is the error I get:
ValueError: Indexing NDArray with index=0 and type=<class 'numpy.int64'> is not supported.
I am assuming it has to do with indexing a multi-dimensional NDArray. So I tried slicing instead (X_224[i:i+1] = ...), but that gave me another error.

You could convert the index from numpy.int64 to a plain Python int, e.g. i = int(i), before setting the slice.
df.iterrows() returns tuples whose first element's type depends on the type of the dataframe index. When I ran the GitHub example, df.iterrows() returned tuples of type (int, pandas.core.series.Series), so no conversion was necessary for me (using pandas 0.22).
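A minimal sketch of that cast, using plain NumPy as a stand-in for mxnet.nd (the int(i) conversion is the same either way); the array shapes and loop here are made up for illustration:

```python
import numpy as np

# Stand-in for the NDArray of preprocessed images from the question
X = np.zeros((4, 3, 2, 2))

for i in [np.int64(0), np.int64(1)]:
    i = int(i)                  # cast numpy.int64 -> int before indexing
    X[i] = np.ones((3, 2, 2))   # assign the "preprocessed image"
```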
Aside from this specific issue, you might want to take a look at Gluon Datasets and DataLoaders for this task. mxnet.gluon.data.vision.datasets.ImageFolderDataset can be used for loading images, and it accepts an image transformation function through the transform argument.
def transform_fn(data, label):
    mean = np.array([0.485, 0.456, 0.406])
    std = np.array([0.229, 0.224, 0.225])
    data = ((cv2.resize(data, (224, 224))[:, :, ::-1] / 255.0 - mean) / std).transpose((2, 0, 1))
    return data, label

image_directory = os.path.join(data_dir, "train")
dataset = mx.gluon.data.vision.ImageFolderDataset(image_directory, transform=transform_fn)
data_loader = mx.gluon.data.DataLoader(dataset, batch_size=10, shuffle=True)
for data, label in data_loader:
    ...

Related

Neural Network 'numpy.float64' object cannot be interpreted as an integer

I was trying to use a simple NN on a matrix of chemical compositions of 24 elements, with a total of 270 analyses (25x270 including the label row). However, when I run the gradient descent, it shows the error 'numpy.float64' object cannot be interpreted as an integer.
data = np.array(data)
m, n = data.shape
np.random.shuffle(data)
data_dev = data[0:60].T
Y_dev = data_dev[0]
Y_dev = np.array(Y_dev, dtype=np.float_)
X_dev = data_dev[1:n]
X_dev = preprocessing.normalize(X_dev)
data_train = data[60:m].T
Y_train = data_train[0]
Y_train = np.array(Y_train, dtype=np.float_)
X_train = data_train[1:n]
X_train = preprocessing.normalize(X_train)
_,m_train = X_train.shape
This is the error I got: TypeError: 'numpy.float64' object cannot be interpreted as an integer.
I've tried the code with the MNIST dataset (changing the shape of the first W1 to 10x784 and dividing each pixel value by 255 to avoid any conflict) and it works well. I also checked the dtype of the arrays and it is the same (float64), so I don't know why there is a problem with my data.
What is the type of your Y array? If it is float64, then the problem is that Y.max() returns a float value, but you need integers when you define the sizes of arrays. So you need to change:
Y = np.array([1, 2, 3], np.float64)
one_hot_Y = np.zeros((Y.size, Y.max() + 1)) # will not work
to
one_hot_Y = np.zeros((Y.size, int(Y.max()) + 1))
Or convert the type of Y to int first: Y = Y.astype(int).
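Putting the pieces together, a runnable sketch of the one-hot encoding with the cast applied (the Y values here are made up; they stand in for the float64 labels from the question):

```python
import numpy as np

Y = np.array([0, 2, 1, 2], dtype=np.float64)     # float64 labels, as in the question
num_classes = int(Y.max()) + 1                   # cast so np.zeros gets an int size
one_hot_Y = np.zeros((Y.size, num_classes))
one_hot_Y[np.arange(Y.size), Y.astype(int)] = 1  # set the label column in each row
```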

Fine-tuning SOTA video models on your own dataset - Sign Language

I am trying to implement a sign classifier using the GluonCV API as part of my final-year college project.
Data set: http://facundoq.github.io/datasets/lsa64/
I followed the "Fine-tuning SOTA video models on your own dataset" tutorial and fine-tuned two models.
Tutorial: https://cv.gluon.ai/build/examples_action_recognition/finetune_custom.html
i3d_resnet50_v1_custom (accuracy graph: I3D)
slowfast_4x16_resnet50_custom (accuracy graph: SlowFast)
The plotted graphs show almost 90% accuracy, but when I run inference I get misclassifications, even on videos I used for training. So I am stuck; any guidance you could give would be helpful.
Thank you.
My data loader for I3D:
num_gpus = 1
ctx = [mx.gpu(i) for i in range(num_gpus)]

transform_train = video.VideoGroupTrainTransform(size=(224, 224), scale_ratios=[1.0, 0.8],
                                                 mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
per_device_batch_size = 5
num_workers = 0
batch_size = per_device_batch_size * num_gpus

train_dataset = VideoClsCustom(root=os.path.expanduser('DataSet/train/'),
                               setting=os.path.expanduser('DataSet/train/train.txt'),
                               train=True,
                               new_length=64,
                               new_step=2,
                               video_loader=True,
                               use_decord=True,
                               transform=transform_train)
print('Load %d training samples.' % len(train_dataset))

train_data = gluon.data.DataLoader(train_dataset, batch_size=batch_size,
                                   shuffle=True, num_workers=num_workers)
Inference running:
from gluoncv.utils.filesystem import try_import_decord
decord = try_import_decord()

video_fname = 'DataSet/test/006_001_001.mp4'
vr = decord.VideoReader(video_fname)
frame_id_list = range(0, 64, 2)
video_data = vr.get_batch(frame_id_list).asnumpy()
clip_input = [video_data[vid, :, :, :] for vid, _ in enumerate(frame_id_list)]

transform_fn = video.VideoGroupValTransform(size=(224, 224), mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
clip_input = transform_fn(clip_input)
clip_input = np.stack(clip_input, axis=0)
clip_input = clip_input.reshape((-1,) + (32, 3, 224, 224))
clip_input = np.transpose(clip_input, (0, 2, 1, 3, 4))
print('Video data is read and preprocessed.')

# Running the prediction
pred = net(nd.array(clip_input, ctx=mx.gpu(0)))

topK = 5
ind = nd.topk(pred, k=topK)[0].astype('int')
print('The input video clip is classified to be')
for i in range(topK):
    print('\t[%s], with probability %.3f.' %
          (CLASS_MAP[ind[i].asscalar()], nd.softmax(pred)[0][ind[i]].asscalar()))
I found my bug: it was happening because of too little augmentation. I changed the transform in both the training data loader and inference as below, and now it works properly.
transform_train = transforms.Compose([
    # Fix the input video frame size at 256x340 and randomly sample the crop width and height from
    # {256, 224, 192, 168}. After that, resize the cropped regions to 224 x 224.
    video.VideoMultiScaleCrop(size=(224, 224), scale_ratios=[1.0, 0.875, 0.75, 0.66]),
    # Randomly flip the video frames horizontally
    video.VideoRandomHorizontalFlip(),
    # Transpose the video frames from height*width*num_channels to num_channels*height*width
    # and map values from [0, 255] to [0, 1]
    video.VideoToTensor(),
    # Normalize the video frames with mean and standard deviation calculated across all images
    video.VideoNormalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])

How to find the best value for mean and STD of Normalize in torchvision.transforms

I have started working with PyTorch and cannot figure out how I am supposed to find the mean and std to pass as input parameters to Normalize.
I have seen this
transforms.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225)) #https://pytorch.org/vision/stable/transforms.html#
and in another example:
transformation = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])
])  # https://github.com/MicrosoftDocs/ml-basics/blob/master/challenges/05%20%20-%20Safari%20CNN%20Solution%20(PyTorch).ipynb
So, how am I supposed to know or compute these values if I have a set of images? Are these three parameters related to R, G, and B?
Suppose you already have X_train as a NumPy array of images with shape (N, 32, 32, 3), i.e. channels last:
X_train = X_train / 255  # normalization of pixels
train_mean = X_train.reshape(-1, 3).mean(axis=0)
train_std = X_train.reshape(-1, 3).std(axis=0)
(Note the reshape to (-1, 3): because the channels are the last axis, reshape(3, -1) would split the data into thirds of the batch rather than into per-channel values.)
Then you can pass these last two variables to your normalizer transform:
transforms.Normalize(mean=train_mean, std=train_std)
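A self-contained sketch of that computation on synthetic data (the array shape and pixel values are made up for illustration):

```python
import numpy as np

# Hypothetical batch of 100 RGB images, 32x32, channels last, pixel values 0-255
X_train = np.random.randint(0, 256, size=(100, 32, 32, 3)).astype(np.float64)

X_train = X_train / 255                           # scale pixels to [0, 1]
train_mean = X_train.reshape(-1, 3).mean(axis=0)  # one mean per channel
train_std = X_train.reshape(-1, 3).std(axis=0)    # one std per channel
```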
Like I mentioned in the comment, you can use the ImageNet statistics for general domains. Otherwise, you can use this snippet from the PyTorch forums to calculate the mean and std for your own dataset:
class MyDataset(Dataset):
    def __init__(self):
        self.data = torch.randn(100, 3, 24, 24)

    def __getitem__(self, index):
        x = self.data[index]
        return x

    def __len__(self):
        return len(self.data)

dataset = MyDataset()
loader = DataLoader(
    dataset,
    batch_size=10,
    num_workers=1,
    shuffle=False
)

mean = 0.
std = 0.
nb_samples = 0.
for data in loader:
    batch_samples = data.size(0)
    data = data.view(batch_samples, data.size(1), -1)
    mean += data.mean(2).sum(0)
    std += data.std(2).sum(0)
    nb_samples += batch_samples
mean /= nb_samples
std /= nb_samples
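Note that this snippet averages per-image standard deviations, which is only an approximation of the dataset-wide std. If you want the exact per-channel statistics, one way (a sketch, not from the original answer) is to accumulate sums and sums of squares over all pixels:

```python
import torch

def channel_stats(batches):
    """Exact per-channel mean/std over an iterable of (N, C, H, W) batches."""
    total, total_sq, n = None, None, 0
    for data in batches:
        # flatten each batch to (C, N*H*W) so we can reduce per channel
        flat = data.permute(1, 0, 2, 3).reshape(data.size(1), -1)
        if total is None:
            total = flat.sum(dim=1)
            total_sq = (flat ** 2).sum(dim=1)
        else:
            total += flat.sum(dim=1)
            total_sq += (flat ** 2).sum(dim=1)
        n += flat.size(1)
    mean = total / n
    std = (total_sq / n - mean ** 2).sqrt()  # population std
    return mean, std
```

Since it only needs an iterable of batches, channel_stats can consume the same DataLoader as above.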

I get a tensor of 600 values instead of 3 values for mean and std of train_loader in PyTorch

I am trying to normalize my image data, and for that I need to find the mean and std for train_loader.
mean = 0.0
std = 0.0
nb_samples = 0.0
for data in train_loader:
    images, landmarks = data["image"], data["landmarks"]
    batch_samples = images.size(0)
    images_data = images.view(batch_samples, images.size(1), -1)
    mean += torch.Tensor.float(images_data).mean(2).sum(0)
    std += torch.Tensor.float(images_data).std(2).sum(0)
    ###mean += images_data.mean(2).sum(0)
    ###std += images_data.std(2).sum(0)
    nb_samples += batch_samples
mean /= nb_samples
std /= nb_samples
The mean and std here are each of size torch.Size([600]). When I tried (almost) the same code on dataloader, it worked as expected:
# code from https://discuss.pytorch.org/t/about-normalization-using-pre-trained-vgg16-networks/23560/6?u=mona_jalal
mean = 0.0
std = 0.0
nb_samples = 0.0
for data in dataloader:
    images, landmarks = data["image"], data["landmarks"]
    batch_samples = images.size(0)
    images_data = images.view(batch_samples, images.size(1), -1)
    mean += images_data.mean(2).sum(0)
    std += images_data.std(2).sum(0)
    nb_samples += batch_samples
mean /= nb_samples
std /= nb_samples
# code from https://discuss.pytorch.org/t/about-normalization-using-pre-trained-vgg16-networks/23560/6?u=mona_jalal
and I got:
mean is: tensor([0.4192, 0.4195, 0.4195], dtype=torch.float64), std is: tensor([0.1182, 0.1184, 0.1186], dtype=torch.float64)
So my dataloader is:
class MothLandmarksDataset(Dataset):
    """Face Landmarks dataset."""

    def __init__(self, csv_file, root_dir, transform=None):
        """
        Args:
            csv_file (string): Path to the csv file with annotations.
            root_dir (string): Directory with all the images.
            transform (callable, optional): Optional transform to be applied
                on a sample.
        """
        self.landmarks_frame = pd.read_csv(csv_file)
        self.root_dir = root_dir
        self.transform = transform

    def __len__(self):
        return len(self.landmarks_frame)

    def __getitem__(self, idx):
        if torch.is_tensor(idx):
            idx = idx.tolist()
        img_name = os.path.join(self.root_dir, self.landmarks_frame.iloc[idx, 0])
        image = io.imread(img_name)
        landmarks = self.landmarks_frame.iloc[idx, 1:]
        landmarks = np.array([landmarks])
        landmarks = landmarks.astype('float').reshape(-1, 2)
        sample = {'image': image, 'landmarks': landmarks}
        if self.transform:
            sample = self.transform(sample)
        return sample

transformed_dataset = MothLandmarksDataset(csv_file='moth_gt.csv',
                                           root_dir='.',
                                           transform=transforms.Compose([
                                               Rescale(256),
                                               RandomCrop(224),
                                               ToTensor()
                                           ]))

dataloader = DataLoader(transformed_dataset, batch_size=3,
                        shuffle=True, num_workers=4)
and train_loader is:
# Device configuration
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

seed = 42
np.random.seed(seed)
torch.manual_seed(seed)

# split the dataset into validation and test sets
len_valid_set = int(0.1 * len(dataset))
len_train_set = len(dataset) - len_valid_set
print("The length of Train set is {}".format(len_train_set))
print("The length of Test set is {}".format(len_valid_set))
train_dataset, valid_dataset = torch.utils.data.random_split(dataset, [len_train_set, len_valid_set])

# shuffle and batch the datasets
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=8, shuffle=True, num_workers=4)
test_loader = torch.utils.data.DataLoader(valid_dataset, batch_size=8, shuffle=True, num_workers=4)
Please let me know if more information is needed.
I basically need to get 3 values for mean of train_loader and 3 values for std of train_loader to use as args for Normalize.
images_data in dataloader is torch.Size([3, 3, 50176]) inside the loop and images_data in train_loader is torch.Size([8, 600, 2400])
First, the weird shape you get for your mean and std ([600]) is unsurprising: it is due to your data having the shape [8, 600, 800, 3]. Basically, the channel dimension is the last one here, so when you try to flatten your images with
# (N, 600, 800, 3) -> [view] -> (N, 600, 2400 = 800*3)
images_data = images.view(batch_samples, images.size(1), -1)
you actually perform a weird operation that fuses together the width and channel dimensions of your image, which is now [8, 600, 2400]. Thus, applying
# (8, 600, 2400) -> [mean(2)] -> (8, 600) -> [sum(0)] -> (600)
data.mean(2).sum(0)
Creates a tensor of size [600] which is what you indeed get.
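That shape arithmetic can be checked quickly on a dummy tensor shaped like the question's data:

```python
import torch

# Dummy batch with the question's layout: (N, H, W, C) = (8, 600, 800, 3)
images = torch.zeros(8, 600, 800, 3)
flat = images.view(8, images.size(1), -1)  # fuses W and C: (8, 600, 2400)
out = flat.mean(2).sum(0)                  # reduces to (600,), not (3,)
```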
There are two quite simple solutions. Either you start by permuting the dimensions to make the second dimension the channel one:
batch_samples = images.size(0)
# (N, H, W, C) -> (N, C, H, W)
reordered = images.permute(0, 3, 1, 2)
# flatten image into (N, C, H*W)
images_data = reordered.view(batch_samples, reordered.size(1), -1)
# mean is now (C) = (3)
mean += images_data.mean(2).sum(0)
Or you change the axis along which to apply mean and sum:
batch_samples = images.size(0)
# flatten image into (N, H*W, C); careful, this is not what you did:
# the channel size is images.size(3) here, not images.size(1)
images_data = images.view(batch_samples, -1, images.size(3))
# mean is now (C) = (3)
mean += images_data.mean(1).sum(0)
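A small sanity check (a sketch on random data, with a smaller stand-in for the question's (8, 600, 800, 3) batch) showing that both variants produce the same per-channel result of shape (3,):

```python
import torch

images = torch.rand(8, 60, 80, 3)  # (N, H, W, C) dummy batch

# Variant 1: permute to channels-first, then flatten the spatial dims
m1 = images.permute(0, 3, 1, 2).reshape(8, 3, -1).mean(2).sum(0)

# Variant 2: flatten to (N, H*W, C) and reduce over the pixel axis
m2 = images.view(8, -1, images.size(3)).mean(1).sum(0)
```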
Finally, why did dataloader and train_loader behave differently? Well, I think it's because one is using dataset while the other is using transformed_dataset. In transformed_dataset, you apply the ToTensor transform, which casts a PIL image into a torch tensor, and PyTorch permutes your dimensions during this operation (putting the channels first, i.e. in the second dimension after batching). In other words, your two datasets just do not yield images in the same format; they differ by a permutation of the axes.
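A sketch of what that ToTensor conversion does to an H x W x C uint8 array, reproduced manually with plain torch so it runs without torchvision:

```python
import numpy as np
import torch

# Synthetic 600 x 800 RGB uint8 image, channels last (like the question's data)
img = (np.arange(600 * 800 * 3) % 256).astype(np.uint8).reshape(600, 800, 3)

# ToTensor casts to float in [0, 1] and moves channels to the front
t = torch.from_numpy(img).permute(2, 0, 1).float() / 255.0
```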

load test data in pytorch

It's all in the title: I just want to know how I can load my own test data (image.jpg) in PyTorch in order to test my CNN.
You need to feed images to the net the same way as in training: that is, you should apply exactly the same transformations to get similar results.
Assuming your net was trained using this code (or similar), you can see that an input image (for validation) undergoes the following transformations:
transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    normalize,
])
Following torchvision.transforms docs you can see that an input image goes through:
Resizing to 256x256 pix
Cropping 224x224 rect from the center of the image
Conversion from uint8 to float in the range [0, 1], transposed to a 3-by-224-by-224 array
Normalization by subtracting the mean and dividing by the std
You can do all this manually to any image
import numpy as np
from PIL import Image

pil_img = Image.open('image.jpg').resize((256, 256), Image.BILINEAR)  # read and resize
# center crop
w, h = pil_img.size
i = int(round((h - 224) / 2.))
j = int(round((w - 224) / 2.))
pil_img = pil_img.crop((j, i, j + 224, i + 224))
np_img = np.array(pil_img).astype(np.float32) / 255.
np_img = np.transpose(np_img, (2, 0, 1))
# normalize each channel in place
mean = [0.485, 0.456, 0.406]
std = [0.229, 0.224, 0.225]
for c in range(3):
    np_img[c, ...] = (np_img[c, ...] - mean[c]) / std[c]
Once you have np_img ready for your model, you can run a feed-forward pass (converting the array to a tensor first):
pred = model(torch.from_numpy(np_img)[None, ...])  # note the singleton leading dim for the batch
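The per-channel normalization step (written here to update each channel in place) can also be checked on a synthetic array, with no image file needed:

```python
import numpy as np

np_img = np.random.rand(3, 224, 224).astype(np.float32)  # stand-in for the preprocessed image
mean = [0.485, 0.456, 0.406]
std = [0.229, 0.224, 0.225]

normalized = np_img.copy()
for c in range(3):
    normalized[c, ...] = (normalized[c, ...] - mean[c]) / std[c]
```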
Thanks for your response. My problem was loading the test data, and I found a solution:
test_data = datasets.ImageFolder('root/test_cnn', transform=transform)
For example, if I have two directories cat and dog (inside the test_cnn directory) that contain images, the ImageFolder object will automatically assign the classes cat and dog to my images. During testing, I just have to drop the classes.
