I am trying to build a CNN and want to divide my input images into non-overlapping patches and then use them for training.
However, I am unsure how to combine the extraction of patches with the code below.
I believe a function like tf.image.extract_patches should do the trick, but I am unsure how I can include it in the pipeline. It's important for me to use flow_from_directory as I have organised my dataset accordingly.
from keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(rescale=1./255)
train_generator = train_datagen.flow_from_directory(train_dir, target_size=(64, 64), class_mode='categorical', batch_size=64)
I thought of using extract_patches_2d from scikit-learn, but it has two issues:
It gives random, possibly overlapping patches.
I would need to resave all the images and reorganise my dataset again (the same issue as tf.image.extract_patches unless it is included in the pipeline; one possible way to do that is sketched below).
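One hedged possibility (a minimal, untested sketch rather than a definitive solution): wrap the flow_from_directory generator in a small Python generator that applies tf.image.extract_patches to every batch, with sizes equal to strides so the patches do not overlap. The 16x16 patch size, the patch_generator name, and the final fit call are illustrative assumptions; train_dir and model are taken from the surrounding code.

import numpy as np
import tensorflow as tf
from keras.preprocessing.image import ImageDataGenerator

PATCH = 16  # hypothetical patch size; a 64x64 input yields 16 non-overlapping 16x16 patches

def patch_generator(keras_gen, patch_size=PATCH):
    # Wrap a flow_from_directory generator and yield non-overlapping patches.
    for images, labels in keras_gen:
        patches = tf.image.extract_patches(
            images=images,
            sizes=[1, patch_size, patch_size, 1],
            strides=[1, patch_size, patch_size, 1],  # strides == sizes -> no overlap
            rates=[1, 1, 1, 1],
            padding='VALID')
        n_per_image = patches.shape[1] * patches.shape[2]
        patches = tf.reshape(patches, [-1, patch_size, patch_size, images.shape[-1]])
        # every patch inherits the label of the image it was cut from
        labels = np.repeat(labels, n_per_image, axis=0)
        yield patches.numpy(), labels

train_datagen = ImageDataGenerator(rescale=1./255)
train_generator = train_datagen.flow_from_directory(
    train_dir, target_size=(64, 64), class_mode='categorical', batch_size=64)

model.fit(patch_generator(train_generator), steps_per_epoch=len(train_generator))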
I am doing super resolution with a ResNet in Keras. I have split my data into train and test (70/30), and 20% of the test data is used for validation. I am trying to read the data with datagen.flow_from_directory, but it shows 0 images for 0 classes. The main issue is that I don't have classes: I only have high-resolution images and low-resolution images. The high-resolution images go to the output and the low-resolution images go to the input. How can I load the data without separating them into classes?
from keras.preprocessing.image import ImageDataGenerator

train_dir = r'G:\images\train'
train_datagen = ImageDataGenerator(rescale=1./255)
train_generator = train_datagen.flow_from_directory(train_dir)
To resolve "0 images for 0 classes", note that a common mistake is that the target folder you specify has no subdirectories. ImageDataGenerator splits the data into classes based on the subdirectories under the directory you pass as its first argument, so you should have at least one subdirectory under the target folder.
Furthermore, the generator labels the images in order to feed them to your network. By default it uses the categorical mode, which yields 2D one-hot encoded labels. If you want your labels handled differently, set the class_mode argument; for example, for autoencoders whose inputs have no labels, specify class_mode='input' (a sketch for the no-classes case from the question follows the list below).
Based on the docs, class_mode should be one of these:
categorical: 2D one-hot encoded labels (the default mode).
binary: 1D binary labels.
sparse: 1D integer labels.
input: images identical to the input images (mainly used with autoencoders).
None: no labels are returned; the generator only yields batches of image data, which is useful with model.predict().
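For the super-resolution question above, one hedged option (a sketch under the assumption that the low- and high-resolution images share file names and that each directory still contains at least one subdirectory, as noted above) is to use two generators with class_mode=None and the same seed, then zip them so each low-resolution batch is paired with its high-resolution target. The directory names and image sizes below are illustrative.

from keras.preprocessing.image import ImageDataGenerator

lr_datagen = ImageDataGenerator(rescale=1./255)
hr_datagen = ImageDataGenerator(rescale=1./255)

# class_mode=None: only image batches are yielded, no labels
lr_gen = lr_datagen.flow_from_directory(
    r'G:\images\train\low_res', target_size=(64, 64),
    class_mode=None, batch_size=16, shuffle=True, seed=42)
hr_gen = hr_datagen.flow_from_directory(
    r'G:\images\train\high_res', target_size=(256, 256),
    class_mode=None, batch_size=16, shuffle=True, seed=42)

# pair each low-resolution batch (input) with its high-resolution batch (target)
train_gen = zip(lr_gen, hr_gen)
model.fit(train_gen, steps_per_epoch=len(lr_gen))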
I followed a tutorial to make my first Convolutional Neural Network using Keras and I have a small question regarding the rescaling step.
So when we are importing the training set and test set, we create an instance of the tf.keras.preprocessing.image.ImageDataGenerator class and use it as:
train_datagen = ImageDataGenerator(rescale=1/255)
Along with some other augmentation parameters. My understanding is that we use the rescale parameter to normalize the pixel values of the images imported.
But when we load up a single image to run through the CNN, we write something like (code from keras docs):
import numpy as np
import tensorflow as tf

image = tf.keras.preprocessing.image.load_img(image_path)
input_arr = tf.keras.preprocessing.image.img_to_array(image)
input_arr = np.array([input_arr])  # Convert single image to a batch.
predictions = model.predict(input_arr)
My question is, I cannot see the single input image being rescaled anywhere. Is it being done implicitly, or is there no need to actually perform rescaling? If the latter, then why is it so?
Thanks!
If the images were divided by 255 during training, the single image should be normalized the same way (divided by 255); otherwise the network will not be able to interpret it.
Likewise, when we use test_datagen, we apply the rescaling by 1/255 for the predict generator.
Any normalization, mean subtraction, or standard-deviation scaling that was applied during the training stage needs to be applied at testing time as well.
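As an illustration (a minimal sketch, assuming the same 1/255 rescaling was used during training; image_path and model come from the question), the single-image prediction from above would become:

import numpy as np
import tensorflow as tf

image = tf.keras.preprocessing.image.load_img(image_path)
input_arr = tf.keras.preprocessing.image.img_to_array(image)
input_arr = np.array([input_arr])   # convert the single image to a batch of one
input_arr = input_arr / 255.0       # apply the same rescaling the training generator used
predictions = model.predict(input_arr)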
I am trying to load data from JPEG files to train a convolutional network. The images are large, however, at 24 million pixels each, so loading and using them at full resolution is not practical.
To get the images to a more useful format I am trying to load each image, rescale it and then append it to a list. Once this is done, I can then convert the list into a numpy array and feed into the network for training as usual.
My problem is that my data set is very large and it takes about a second to rescale every image, which means it is not feasible to resize every image the way I have currently implemented this:
import numpy as np
import matplotlib.pyplot as plt
from skimage.measure import block_reduce

length_training_DF = 30000
trainX = []
for i in range(length_training_DF):
    im = plt.imread(TRAIN_IM_DIR + trainDF.iloc[i]['image_name'] + '.jpg')
    image = block_reduce(im, block_size=(10, 10, 1), func=np.max)  # downsample by taking the max of each 10x10 block
    trainX.append(image)
I have also used the following:
length_training_DF = 30000

from keras.preprocessing import image

trainX = []
for i in range(50):
    img = image.load_img(TRAIN_IM_DIR + trainDF.iloc[i]['image_name'] + '.jpg', target_size=(224, 224))
    trainX.append(img)
Is there any way to load these images more quickly into a format suitable for training a network? I have thought about using a Keras dataset, perhaps via tf.keras.preprocessing.image_dataset_from_directory(), but the directory in which the image data is stored is not organised into per-class folders, as that method requires.
The images are for a binary classification problem.
The usual way would be to write a preprocessing script that loads the large images, rescales them, applies other operations if needed, and then saves each class to a separate directory, as required by ImageDataGenerator; a sketch follows the list of reasons below.
There are at least three good reasons to do that:
Typically, you will run your training process dozens of times. You don't want to redo the rescaling (or, e.g., auto white balance) every time.
ImageDataGenerator provides vital methods for augmenting your training data set.
It is a good generator out of the box; you likely don't want to load the entire data set into memory.
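A minimal sketch of such a preprocessing script, assuming the labels live in the trainDF frame from the question under a hypothetical 'target' column, and that SRC_DIR and DST_DIR are placeholder paths:

import os
from PIL import Image

SRC_DIR = 'raw_images'      # placeholder: folder with the large jpegs
DST_DIR = 'preprocessed'    # output root, one subfolder per class
TARGET_SIZE = (224, 224)

for i in range(len(trainDF)):
    name = trainDF.iloc[i]['image_name']
    label = str(trainDF.iloc[i]['target'])         # assumed binary label column
    out_dir = os.path.join(DST_DIR, label)
    os.makedirs(out_dir, exist_ok=True)
    img = Image.open(os.path.join(SRC_DIR, name + '.jpg'))
    img = img.resize(TARGET_SIZE, Image.BILINEAR)  # rescale once, up front
    img.save(os.path.join(out_dir, name + '.jpg'), quality=90)

After running this once, ImageDataGenerator.flow_from_directory can simply point at DST_DIR.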
I have a folder (on my windows desktop) containing the images I want to use to build my deep learning classifier. I also have one .csv file which has the image number (for example img_1035) and the corresponding class label. How do I load the dataset with the labels into python/jupyter notebooks?
This is the link to the dataset on kaggle (https://www.kaggle.com/debdoot/bdrw).
I would preferably like to use PyTorch to do this but any other ways would also be highly appreciated.
Luckily, PyTorch has a convenient "ImageFolder" class that you can extend to create your own dataset.
Here's an example of a dataset that uses ImageFolder:
import torchvision

class MyDataset(torchvision.datasets.ImageFolder):
    def __init__(self, train_folder_path='.', transform=None, target_transform=None):
        super().__init__(train_folder_path, transform, target_transform)
    # [ Some functions omitted ]
Then you load your set using PyTorch's "DataLoader".
Here's an example for a training set:
import torch

training_set = MyDataset(root_path, transform)
train_loader = torch.utils.data.DataLoader(training_set, batch_size=batch_size, shuffle=True)
Using the train loader you can get batches from your dataset. You can then use these batches to train / validate and so on:
batch = next(iter(train_loader))
images, labels = batch
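Since your labels come from a CSV rather than from class subfolders, an alternative is a custom Dataset that reads the CSV. This is a hedged sketch: the 'image_name' and 'label' column names are assumptions, so check them against the actual BDRW CSV.

import os
import pandas as pd
from PIL import Image
from torch.utils.data import Dataset, DataLoader

class CsvImageDataset(Dataset):
    # Images in one folder, labels in a CSV mapping image name -> class.
    def __init__(self, image_dir, csv_path, transform=None):
        self.image_dir = image_dir
        self.labels = pd.read_csv(csv_path)   # assumed columns: 'image_name', 'label'
        self.transform = transform

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        row = self.labels.iloc[idx]
        img = Image.open(os.path.join(self.image_dir, row['image_name'] + '.jpg')).convert('RGB')
        if self.transform:
            img = self.transform(img)
        return img, int(row['label'])

training_set = CsvImageDataset('images/', 'labels.csv', transform)
train_loader = DataLoader(training_set, batch_size=32, shuffle=True)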
Training is a rather involved process so I'm not entirely sure how deep you want to dive here. I hope this was a nudge in the right direction.
I have an imbalanced and small dataset which contains 4116 224x224x3 (RGB) aerial images. It's very likely that I will encounter the overfitting problem since the dataset is not big enough. Image preprocessing and data augmentation help to tackle this problem as explained below.
"Overfitting is caused by having too few samples to learn from, rendering you unable to train a model that can generalize to new data. Given infinite data, your model would be exposed to every possible aspect of the data distribution at hand: you would never overfit. Data augmentation takes the approach of generating more training data from existing training samples, by augmenting the samples via a number of random transformations that yield believable-looking images."
Deep Learning with Python by François Chollet, page 138-139, 5.2.5 Using data augmentation.
I've read Medium - Image Data Preprocessing for Neural Networks and examined Stanford's CS230 - Data Preprocessing and CS231 - Data Preprocessing courses. This is highlighted once more in a related SO question, and I understand that there is no "one fits all" solution. Here is what forced me to ask this question:
"No translation augmentation was used since we want to achieve high spatial resolution."
Reference: Researchgate - Semantic Segmentation of Small Objects and Modeling of Uncertainty in Urban Remote Sensing Images Using Deep Convolutional Neural Networks
I know that I will use the Keras ImageDataGenerator class, but I don't know which techniques and what parameters to use for the task of semantic segmentation of small objects. Could someone enlighten me? Thanks in advance. :)
from keras.preprocessing.image import ImageDataGenerator
datagen = ImageDataGenerator(
    rotation_range=20,                  # value in degrees (0-180)
    width_shift_range=0.2,              # range within which to randomly translate pictures horizontally
    height_shift_range=0.2,             # range within which to randomly translate pictures vertically
    shear_range=0.2,                    # randomly apply shearing transformations
    zoom_range=0.2,                     # randomly zoom inside pictures
    horizontal_flip=True,               # randomly flip half the images horizontally
    fill_mode='nearest',                # strategy for filling in newly created pixels, which can appear after a rotation or a width/height shift
    featurewise_center=True,
    featurewise_std_normalization=True)
datagen.fit(X_train)
The augmentation and preprocessing steps always depend on the problem you have. Think of all the possible augmentations that could enlarge your dataset. The most important thing, however, is not to perform extreme augmentations that create training samples which could not occur in real examples. If you do not expect the real examples to be horizontally flipped, do not perform a horizontal flip, since this would give your model false information. Think of all the possible changes that can happen in your input images and try to artificially produce new images from the existing ones. You can use many of Keras' built-in functions, but for each of them you should make sure it does not produce examples that are unlikely to ever appear at the input of your model.
As you said, there is no "one fits all" solution, because everything depends on the data. Analyse the data and build everything with respect to it.
About the small objects: one direction you should check is loss functions that emphasise the impact of the target regions compared to the background. Look at the Dice loss or the Generalised Dice loss.
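A minimal sketch of a soft Dice loss in Keras (this is the plain Dice loss, not the Generalised Dice loss from the referenced paper; the smooth term is a common stabiliser, and model stands for your segmentation model):

from tensorflow.keras import backend as K

def dice_loss(y_true, y_pred, smooth=1.0):
    # Soft Dice loss: rewards overlap with the (small) target regions instead of
    # per-pixel accuracy, which the large background would otherwise dominate.
    y_true_f = K.flatten(y_true)
    y_pred_f = K.flatten(y_pred)
    intersection = K.sum(y_true_f * y_pred_f)
    dice = (2.0 * intersection + smooth) / (K.sum(y_true_f) + K.sum(y_pred_f) + smooth)
    return 1.0 - dice

model.compile(optimizer='adam', loss=dice_loss)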