Keras CNN with varying image sizes

I'm trying to use the VOC2012 dataset to train a CNN. For my project I require B&W data, so I extracted the R components. So far so good. The trouble is that the images are of different sizes, so I can't figure out how to pass them to the model. I compiled my model, then created my mini-batches of size 32 as below (where X_train and Y_train are lists of file paths).
X, Y = [], []  # batch accumulators
for x in X_train:
    img = plt.imread(x)
    img = img.reshape(*img.shape, 1)
    X.append(img)
for y in Y_train:
    img = plt.imread(y)
    img = img.reshape(*img.shape, 1)
    Y.append(img)
model.train_on_batch(np.array(X), np.array(Y))
However, I suspect that because the images are all of different sizes, the numpy array has shape (32,) (an object array) rather than the (32, height, width, 1) I'd expect. How do I take care of this?

According to some sources, it is indeed possible to train at least some architectures with varying input sizes (see the Quora and Cross Validated threads).
As for collecting an array of arrays of varying sizes, one can simply use a Python list of NumPy arrays, or an ndarray with dtype=object, to hold all the image data. For the training process itself, the Quora answer notes that either a batch size of 1 must be used, or images of equal size must be grouped into batches (a rough sketch of this appears after the example code below). Padding with zeros could also be used to make the images evenly sized, but I can't say much about the validity of that approach.
Best of luck in your research!
Example code for illustration:
import numpy as np

# Generate 10 "images" with different sizes, as a plain Python list
images = [np.zeros((i + 5, i + 10)) for i in range(10)]
# ...or as a NumPy object array (dtype=object is needed for ragged data)
images = np.array([np.zeros((i + 5, i + 10)) for i in range(10)], dtype=object)
# Or an empty object array to fill later
images = np.array([], dtype=object)
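To sketch the "group images by size" idea mentioned above (a rough, untested outline assuming a fully convolutional model with input shape (None, None, 1) and the X and Y lists from the question; the helper name is made up):

from collections import defaultdict
import numpy as np

# Hypothetical helper: bucket (image, target) pairs by image shape so that
# every batch contains only identically sized arrays
def batches_by_shape(X, Y):
    buckets = defaultdict(list)
    for img, target in zip(X, Y):
        buckets[img.shape].append((img, target))
    for pairs in buckets.values():
        xs, ys = zip(*pairs)
        yield np.stack(xs), np.stack(ys)

# for xb, yb in batches_by_shape(X, Y):
#     model.train_on_batch(xb, yb)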

Related

Python: How to feed large dataset to Keras Model? [duplicate]

This question already has an answer here: Keras - data generator for datasets too large to fit into memory.
Basically I have a training dataset with hundreds of thousands of labeled images that can be used to train an ML model.
However (as expected) I can't simply create a numpy array to hold the images as follows:
all_images = np.zeros(shape=(500000, 256, 256, 3), dtype="uint8")
I don't suppose large companies simply have 'huge' RAM for holding huge datasets during training.
So how can I use the entire data set for training without having to hold the entire thing in memory before calling model.fit()?
Here's the entire loading function if needed:
(details about it below)
def load_images(images: list):
    # Create empty np.ndarray to hold n images of size 256 x 256 with 3 channels (RGB)
    resized_images = np.zeros(shape=(len(images), 256, 256, 3), dtype="uint8")
    index = 0
    for image in images:
        print(index)
        # Load image with cv2
        img = cv2.imread(image)
        # Resize image to 256 width, 256 height
        img = cv2.resize(img, dsize=(256, 256))
        # Add image to ndarray 'resized_images'
        resized_images[index] = img
        index += 1
    return resized_images
The objective of this function is to resize the training images and load them into a single numpy array to be passed to the model in model.fit()
Note: I removed some np.transpose() calls to make the code more legible so this might not work if copied and pasted
So far I've tried saving the model and loading it up to continue training, without success (the loaded model doesn't retain all properties). But if this is the best way, feel free to share your method.
Consider using such a wonderful thing as a generator.
First, I would suggest paying attention to the tf.keras.preprocessing.image.ImageDataGenerator class and its method flow_from_directory().
In case you want to preprocess images in some unusual way, I would recommend creating your own generator by inheriting from the tf.keras.utils.Sequence class, like this:
class CustomImageDataGen(tf.keras.utils.Sequence):
    ...
This article may help.
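For illustration, here is a minimal, hypothetical Sequence subclass that loads and resizes one batch of image paths at a time, roughly matching the load_images function above (the constructor arguments are assumptions, not a fixed API):

import math
import cv2
import numpy as np
import tensorflow as tf

class CustomImageDataGen(tf.keras.utils.Sequence):
    def __init__(self, image_paths, labels, batch_size=32):
        self.image_paths = image_paths
        self.labels = labels
        self.batch_size = batch_size

    def __len__(self):
        # Number of batches per epoch
        return math.ceil(len(self.image_paths) / self.batch_size)

    def __getitem__(self, idx):
        # Load and resize only the images belonging to batch `idx`,
        # so the full dataset never has to sit in memory
        sl = slice(idx * self.batch_size, (idx + 1) * self.batch_size)
        batch_x = np.array([cv2.resize(cv2.imread(p), (256, 256))
                            for p in self.image_paths[sl]])
        batch_y = np.array(self.labels[sl])
        return batch_x, batch_y

# model.fit(CustomImageDataGen(train_paths, train_labels), epochs=10)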

What does the 1 in torch.Size([64, 1, 28, 28]) mean when I check a tensor shape?

I'm following this tutorial on towardsdatascience.com because I wanted to try the MNIST dataset with PyTorch, since I've already done it with Keras.
In Step 2 (getting to know the dataset better), they print the shape of a batch from the trainloader and it returns torch.Size([64, 1, 28, 28]). I understand that 64 is the number of images in the batch and that each one is a 28x28 image, but what does the 1 mean exactly?
It simply means that each 28x28 image has 1 channel, i.e. it's a grayscale image. If it were a color image, there would be a 3 instead of the 1, since a color image has 3 channels (RGB).
It's the number of channels in the input. In the MNIST dataset the images are grayscale, so each image has a single channel. Note that PyTorch puts the channel dimension first, so a single image has shape [1, 28, 28].
Of course, once loaded as batches, the total input shape is the one you are getting.
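To make the layout concrete, a tiny sketch:

import torch

# A batch of 64 grayscale 28x28 images: (batch, channels, height, width)
batch = torch.zeros(64, 1, 28, 28)
print(batch.shape)  # torch.Size([64, 1, 28, 28])

# An RGB batch would have 3 channels instead
print(torch.zeros(64, 3, 28, 28).shape)  # torch.Size([64, 3, 28, 28])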
Refer to the MNIST dataset link, where it states:
The original black and white (bilevel) images from NIST were size normalized to fit in a 20x20 pixel box while preserving their aspect ratio. The resulting images contain grey levels as a result of the anti-aliasing technique used by the normalization algorithm. The images were centered in a 28x28 image by computing the center of mass of the pixels, and translating the image so as to position this point at the center of the 28x28 field.
In short, it's just the number of channels your 28x28 image has.
This would suggest the number of batches present in the dataset. Think of it as groups: here we have 1 batch of 64 images, but you could change that and have, say, 2 batches of 32 images each. The batch size usually influences the computational complexity of training the model.
And, of course, depending on the library used (especially in the training/testing loop), the code will look slightly different for just 1 batch versus X batches.
For example (with 50 epochs/iterations): with a batch size of 1, the training loop simply trains the model once per epoch. With a batch size of x, you have to loop over each batch/group within each epoch as well.

How to configure a tf.data.Dataset for variable size images?

I'm setting up an image data pipeline on Tensorflow 2.1. I'm using a dataset with RGB images of variable shapes (h, w, 3) and I can't find a way to make it work. I get the following error when I call tf.data.Dataset.batch():
tensorflow.python.framework.errors_impl.InvalidArgumentError: Cannot batch tensors with different shapes in component 0. First element had shape [256,384,3] and element 3 had shape [160,240,3]
I found the padded_batch method but I don't want my images to be padded to the same shape.
EDIT:
I think that I found a little workaround to this by using the function tf.data.experimental.dense_to_ragged_batch (which converts the dense tensor representation to a ragged one).
Unlike tf.data.Dataset.batch, the input elements to be batched may have different shapes, and each batch will be encoded as a tf.RaggedTensor
But then I have another problem. My dataset contains images and their corresponding labels. When I use the function like this:
ds = ds.map(
    lambda x: tf.data.experimental.dense_to_ragged_batch(batch_size)
)
I get the following error, because it tries to map the function to the entire dataset (thus to both images and labels), which is not possible because it can only be applied to a single tensor, not two.
TypeError: <lambda>() takes 1 positional argument but 2 were given
Is there a way to specify which element of the two I want the transformation to be applied to ?
I just hit the same problem. The solution turned out to be loading the data as two datasets and then using tf.data.Dataset.zip() to merge them.
dataset_images = dataset.map(parse_images,
                             num_parallel_calls=tf.data.experimental.AUTOTUNE)
dataset_images = dataset_images.apply(
    tf.data.experimental.dense_to_ragged_batch(batch_size=batch_size,
                                               drop_remainder=True))
dataset_total_cost = dataset.map(get_total_cost)
dataset_total_cost = dataset_total_cost.batch(batch_size, drop_remainder=True)
dataset = tf.data.Dataset.zip((dataset_images, dataset_total_cost))
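Note that tf.data.Dataset.zip pairs the two datasets element-wise, so as long as both are batched with the same batch_size and drop_remainder setting, the ragged image batches stay aligned with their label batches.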
If you do not want to resize your images, you can only use a batch size of 1, i.e. train your model one image at a time. The error you reported clearly says that you are using a batch size bigger than 1 and trying to put two images of different shapes in one batch. You could either resize (or pad) your images to a fixed shape, or use a batch size of 1 as follows:
my_data = tf.data.Dataset(....) # with whatever arguments you use here
my_data = my_data.batch(1)

loading EMNIST-letters dataset

I have been trying to find a way to load the EMNIST-letters dataset but without much success. I have found interesting stuff in the structure and can't wrap my head around what is happening. Here is what I mean:
I downloaded the .mat format from here.
I can load the data using
import scipy.io
mat = scipy.io.loadmat('letter_data.mat')  # renamed for convenience
It is a dictionary with the following keys:
dict_keys(['__header__', '__version__', '__globals__', 'dataset'])
The only key of interest is dataset, which I haven't been able to extract data from. Printing its shape gives this:
>>>print(mat['dataset'].shape)
(1, 1)
I dug deeper and deeper to find a shape that looks somewhat like a real dataset and came across this:
>>>print(mat['dataset'][0][0][0][0][0][0].shape)
(124800, 784)
which is exactly what I wanted, but I can't find the labels or the test data. I tried many things but can't seem to understand the structure of this dataset.
If someone could tell me what is going on with this, I would appreciate it.
Because of the way the dataset is structured, the array of image arrays can be accessed with mat['dataset'][0][0][0][0][0][0] and the array of label arrays with mat['dataset'][0][0][0][0][0][1]. For instance, print(mat['dataset'][0][0][0][0][0][0][0]) will print out the pixel values of the first image, and print(mat['dataset'][0][0][0][0][0][1][0]) will print the first image's label.
For a less...convoluted dataset, I'd actually recommend using the CSV version of the EMNIST dataset on Kaggle: https://www.kaggle.com/crawford/emnist, where each row is a separate image, there are 785 columns where the first column = class_label and each column after represents one pixel value (784 total for a 28 x 28 image).
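For instance, a rough sketch of loading that CSV version with pandas (the file name is assumed from the Kaggle archive; check it before copying):

import numpy as np
import pandas as pd

# Hypothetical file name; see the Kaggle archive for the exact one
df = pd.read_csv('emnist-letters-train.csv', header=None)

labels = df.iloc[:, 0].to_numpy()                # first column = class label
images = df.iloc[:, 1:].to_numpy(dtype='uint8')  # remaining 784 columns = pixels
images = images.reshape(-1, 28, 28)              # one 28x28 image per row
# (as with the .mat data, the orientation may still need the Fortran-order
# reshape described in the answer below)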
@Josh Payne's answer is correct, but I'll expand on it for those who want to use the .mat file, with an emphasis on typical data splits.
The data itself has already been split up into a training and a test set. Here's how I accessed the data:
from scipy import io as sio
mat = sio.loadmat('emnist-letters.mat')
data = mat['dataset']
X_train = data['train'][0,0]['images'][0,0]
y_train = data['train'][0,0]['labels'][0,0]
X_test = data['test'][0,0]['images'][0,0]
y_test = data['test'][0,0]['labels'][0,0]
There is an additional field 'writers' (e.g. data['train'][0,0]['writers'][0,0]) that distinguishes the original sample writer. Finally, there is another field data['mapping'], but I'm not sure what it is mapping the digits to.
In addition, in Section II.D, the EMNIST paper states that "the last portion of the training set, equal in size to the testing set, is set aside as a validation set". Strangely, the .mat file training/testing size does not match the number listed in Table II, but it does match the size in Fig. 2.
val_start = X_train.shape[0] - X_test.shape[0]
X_val = X_train[val_start:X_train.shape[0],:]
y_val = y_train[val_start:X_train.shape[0]]
X_train = X_train[0:val_start,:]
y_train = y_train[0:val_start]
If you don't want a validation set it is fine to leave these samples in the training set.
Also, if you would like to reshape the data into 2D, 28x28 sized images instead of a 1D 784-element array, then to get the correct image orientation you'll need to do a numpy reshape using Fortran ordering (Matlab uses column-major ordering, just like Fortran; reference), e.g.:
X_train = X_train.reshape( (X_train.shape[0], 28, 28), order='F')
An alternative solution is to use the EMNIST python package. (Full details at https://pypi.org/project/emnist/)
This lets you pip install emnist in your environment and then import the datasets (they will be downloaded the first time you run the program).
Example from the site:
>>> from emnist import extract_training_samples
>>> images, labels = extract_training_samples('digits')
>>> images.shape
(240000, 28, 28)
>>> labels.shape
(240000,)
You can also list the datasets:
>>> from emnist import list_datasets
>>> list_datasets()
['balanced', 'byclass', 'bymerge', 'digits', 'letters', 'mnist']
And replace 'digits' in the first example with your choice.
This gives you all the data in numpy arrays which I have found makes things easy to work with.
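For the letters split from the question, that should just be a direct substitution (the package also provides extract_test_samples):

>>> from emnist import extract_training_samples, extract_test_samples
>>> images, labels = extract_training_samples('letters')
>>> test_images, test_labels = extract_test_samples('letters')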
I suggest downloading the 'Binary format as the original MNIST dataset' from the EMNIST download page.
Unzip the downloaded file, and then with Python:
import idx2numpy
X_train = idx2numpy.convert_from_file('./emnist-letters-train-images-idx3-ubyte')
y_train = idx2numpy.convert_from_file('./emnist-letters-train-labels-idx1-ubyte')
X_test = idx2numpy.convert_from_file('./emnist-letters-test-images-idx3-ubyte')
y_test = idx2numpy.convert_from_file('./emnist-letters-test-labels-idx1-ubyte')
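(idx2numpy is a third-party package, installable with pip install idx2numpy.)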

Save Audio Features extracted using Librosa in a multichannel Numpy array

I am trying to extract features from audio files using Librosa, to feed to a CNN as NumPy arrays.
Currently I save a single feature at a time to feed into the CNN. I save two-dimensional (single-channel) log-scaled mel-spectrogram features in Python using Librosa:
def build_features():
    y, sr = librosa.load("audio.wav")
    mel = librosa.feature.melspectrogram(
        y=y,
        sr=sr,
        n_fft=4096,
        n_mels=128,  # Mel bins
        hop_length=2048,
    )
    logamplitude = librosa.amplitude_to_db
    logspec = logamplitude(mel, ref=1.0)[np.newaxis, :, :, np.newaxis]
    return logspec
This gives the shape (1,128,323,1).
I would like to add another feature, say a tempogram. I can do this using the same code, but replacing melspectrogram with tempogram and setting the window length to 128.
This gives me a tempogram shape of (1,128,323,1).
Now I would like to "stack" these two feature layers into a multi-channel NumPy array that I can feed into a CNN in Keras.
How should I code this?
EDIT:
Think I figured it out, using np.vstack()
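For reference, a small sketch of the two common ways to combine feature maps of shape (1, 128, 323, 1), using zero arrays as stand-ins for the actual features:

import numpy as np

logspec = np.zeros((1, 128, 323, 1))    # stand-in for the log-mel features
tempogram = np.zeros((1, 128, 323, 1))  # stand-in for the tempogram features

# np.vstack stacks along the first axis -> (2, 128, 323, 1),
# i.e. two single-channel samples
stacked = np.vstack([logspec, tempogram])

# np.concatenate on the last axis -> (1, 128, 323, 2),
# i.e. one sample with two channels, which is what a channels-last
# Keras CNN input layer would expect
multichannel = np.concatenate([logspec, tempogram], axis=-1)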
