I followed a tutorial to make my first Convolutional Neural Network using Keras and I have a small question regarding the rescaling step.
So when we import the training set and test set, we create an instance of the tf.keras.preprocessing.image.ImageDataGenerator class and use it as:
train_datagen = ImageDataGenerator(rescale=1/255)
Along with some other augmentation parameters. My understanding is that we use the rescale parameter to normalize the pixel values of the images imported.
But when we load up a single image to run through the CNN, we write something like (code from keras docs):
import numpy as np
import tensorflow as tf
from tensorflow import keras

image = tf.keras.preprocessing.image.load_img(image_path)
input_arr = keras.preprocessing.image.img_to_array(image)
input_arr = np.array([input_arr])  # Convert single image to a batch.
predictions = model.predict(input_arr)
My question is, I cannot see the single input image being rescaled anywhere. Is it being done implicitly, or is there no need to actually perform rescaling? If the latter, then why is it so?
Thanks!
If the image was divided by 255 during training, the same rescaling must be applied at prediction time; otherwise the network will not be able to interpret a raw 0-255 input correctly.
Also, when we use test_datagen, we apply rescale=1/255 for the prediction generator as well.
In general, any normalization, mean subtraction, or standard-deviation scaling applied during training needs to be applied again at test time.
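For example, a minimal sketch of predicting on a single image with the same 1/255 rescaling applied by hand (image_path, the target size and model are placeholders from the question's context):

import numpy as np
import tensorflow as tf

# Assumes the model was trained on images rescaled by 1/255.
image = tf.keras.preprocessing.image.load_img(image_path, target_size=(64, 64))
input_arr = tf.keras.preprocessing.image.img_to_array(image)
input_arr = input_arr / 255.0                   # same rescaling as in training
input_arr = np.expand_dims(input_arr, axis=0)   # convert single image to a batch
predictions = model.predict(input_arr)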
I have trained my CNN in TensorFlow using the MNIST data set; when I tested it, it worked very well on the test data. To evaluate my model further, I made another set by taking images at random from the train and test sets; at the same time, I removed those images so they were never given to my model. It worked very well on that set too, but with an image downloaded from Google it doesn't classify well, so my question is: should I apply any filter to that image before giving it to the prediction part?
I resized the image and converted it to grayscale beforehand.
MNIST is an easy dataset. Your model (CNN) structure may do quite well on MNIST, but there is no guarantee that it will do well on more complex images too. You can add some more layers and try different activation functions (like ReLU, ELU, etc.). Normalizing your image pixel values to a small range, such as between -1 and 1, may help too.
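For example, a rough sketch of making a downloaded image look like MNIST input before prediction, assuming the model expects 28x28 images scaled to [0, 1] with light digits on a dark background (as in MNIST); the file name and model are placeholders:

import numpy as np
from PIL import Image

img = Image.open("digit.png").convert("L")     # grayscale
img = img.resize((28, 28))                     # MNIST resolution
arr = np.asarray(img, dtype=np.float32) / 255.0
# MNIST digits are light on dark; invert if the downloaded image is dark on light.
if arr.mean() > 0.5:
    arr = 1.0 - arr
arr = arr.reshape(1, 28, 28, 1)                # batch of one, single channel
predictions = model.predict(arr)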
I have an imbalanced and small dataset which contains 4116 224x224x3 (RGB) aerial images. It's very likely that I will encounter the overfitting problem since the dataset is not big enough. Image preprocessing and data augmentation help to tackle this problem as explained below.
"Overfitting is caused by having too few samples to learn from, rendering you unable to train a model that can generalize to new data. Given infinite data, your model would be exposed to every possible aspect of the data distribution at hand: you would never overfit. Data augmentation takes the approach of generating more training data from existing training samples, by augmenting the samples via a number of random transformations that yield believable-looking images."
Deep Learning with Python by François Chollet, page 138-139, 5.2.5 Using data augmentation.
I've read Medium - Image Data Preprocessing for Neural Networks and examined Stanford's CS230 - Data Preprocessing and
CS231 - Data Preprocessing notes. It is highlighted once more in this SO question, and I understand that there is no "one fits all" solution. Here is what prompted me to ask this question:
"No translation augmentation was used since we want to achieve high spatial resolution."
Reference: Researchgate - Semantic Segmentation of Small Objects and Modeling of Uncertainty in Urban Remote Sensing Images Using Deep Convolutional Neural Networks
I know that I will use the Keras ImageDataGenerator class, but I don't know which techniques and which parameters to use for the semantic segmentation of small objects task. Could someone enlighten me? Thanks in advance. :)
from keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(
    rotation_range=20,          # is a value in degrees (0–180)
    width_shift_range=0.2,      # is a range within which to randomly translate pictures horizontally
    height_shift_range=0.2,     # is a range within which to randomly translate pictures vertically
    shear_range=0.2,            # is for randomly applying shearing transformations
    zoom_range=0.2,             # is for randomly zooming inside pictures
    horizontal_flip=True,       # is for randomly flipping half the images horizontally
    fill_mode='nearest',        # is the strategy used for filling in newly created pixels, which can appear after a rotation or a width/height shift
    featurewise_center=True,
    featurewise_std_normalization=True)

datagen.fit(X_train)
The augmentation and preprocessing phases always depend on the problem you have. You have to think of all the possible augmentations that can enlarge your dataset. But the most important thing is that you should not perform extreme augmentations that create training samples that could never occur in real examples. If you do not expect the real examples to be horizontally flipped, do not perform a horizontal flip, since this will give your model false information. Think of all the possible changes that can happen in your input images and try to artificially produce new images from your existing ones. You can use a lot of built-in functions from Keras, but you should make sure that none of them produces examples that are unlikely to appear at the input of your model.
As you said, there is no "one fits all" solution, because everything is dependent on the data. Analyse the data and build everything with respect to it.
About the small objects: one direction you should check is loss functions that emphasise the impact of the target regions relative to the background. Look at the Dice loss or the generalised Dice loss; a sketch is given below.
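For example, a minimal Dice-loss sketch for binary masks in Keras (one common formulation among several; the smoothing term avoids division by zero on empty masks):

import keras.backend as K

def dice_loss(y_true, y_pred, smooth=1.0):
    # Flatten masks and compute 1 - Dice coefficient.
    y_true_f = K.flatten(y_true)
    y_pred_f = K.flatten(y_pred)
    intersection = K.sum(y_true_f * y_pred_f)
    dice = (2.0 * intersection + smooth) / (K.sum(y_true_f) + K.sum(y_pred_f) + smooth)
    return 1.0 - dice

# model.compile(optimizer='adam', loss=dice_loss)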
I was checking the design of AlexNet in MATLAB which is summarized as follows:
The input layer says 227x227x3 with zerocenter normalization. What does zerocenter normalization mean, and how could I do this in Keras?
I was going through the preprocessing documentation in Keras and was not sure whether any of the following attributes satisfy zerocenter normalization. The attributes, as given in the documentation, are:
- featurewise_center
- samplewise_center
- featurewise_std_normalization
- samplewise_std_normalization
Zero-center normalization typically means that images are normalized to have zero mean; often a standard deviation of 1 is enforced as well. If your images are NumPy arrays, you can easily achieve this:
img = (img - img.mean()) / img.std()
samplewise_center and samplewise_std_normalization do the same thing, making sure that each image has a mean of 0 and a standard deviation of 1. If you want to use the mean/std of the whole dataset instead of the per-sample mean/std, I guess you should do it manually.
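For example, a minimal sketch of the manual approach, assuming X_train and X_test are NumPy arrays of images and the statistics are computed on the training set only:

mean = X_train.mean(axis=0)
std = X_train.std(axis=0) + 1e-7      # small epsilon avoids division by zero

X_train_norm = (X_train - mean) / std
X_test_norm = (X_test - mean) / std   # reuse the training statistics at test time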
The definition of zerocenter normalization in MATLAB has been specified in imageInputLayer documentation:
'zerocenter' — Subtract the average image specified by the AverageImage property. The trainNetwork function automatically computes the average image at training time.
Therefore, the mean image is subtracted from input images to make them have a mean of zero (this helps with a smooth and faster optimization process during training of the model). So, the equivalent option in Keras would be featurewise_center:
featurewise_center: Boolean. Set input mean to 0 over the dataset, feature-wise.
Note that you need to call the fit() method of ImageDataGenerator to compute the mean image:
datagen = ImageDataGenerator(featurewise_center=True, ...)
datagen.fit(train_data)
# now you can call `flow`
I am training an image classification CNN using Keras.
Using the ImageDataGenerator function, I apply some random transformations to the training images (e.g. rotation, shearing, zooming).
My understanding is that these transformations are applied randomly to each image before it is passed to the model.
But some things are not clear to me:
1) How can I make sure that specific rotations of an image (e.g. 90°, 180°, 270°) are ALL included during training?
2) The steps_per_epoch parameter of model.fit_generator should be set to the number of unique samples in the dataset divided by the batch size defined in the flow_from_directory method. Does this still apply when using the above-mentioned image augmentation methods, since they increase the number of training images?
Thanks,
Mario
Some time ago I asked myself the same questions, and I think a possible explanation is the following:
Consider this example:
from keras.preprocessing.image import ImageDataGenerator

aug = ImageDataGenerator(rotation_range=90, width_shift_range=0.1,
                         height_shift_range=0.1, shear_range=0.2,
                         zoom_range=0.2, horizontal_flip=True,
                         fill_mode="nearest")
For question 1): I specify rotation_range=90, which means that while you flow (retrieve) the data, the generator will randomly rotate each image by a degree between -90 and 90. You cannot specify an exact angle, because that is what ImageDataGenerator does: it generates the rotation randomly. This is also very important for your second question.
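If you really do need exact multiples of 90°, one possible workaround (just a sketch, assuming square images so the shape is preserved) is to drop rotation_range and pass a preprocessing_function instead:

import numpy as np
from keras.preprocessing.image import ImageDataGenerator

def random_rot90(img):
    # Rotate by a random multiple of 90 degrees: 0, 90, 180 or 270.
    k = np.random.randint(4)
    return np.rot90(img, k)

aug = ImageDataGenerator(width_shift_range=0.1, height_shift_range=0.1,
                         horizontal_flip=True, fill_mode="nearest",
                         preprocessing_function=random_rot90)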
For question 2): Yes, it still applies when using data augmentation. I was also confused in the beginning. The reason is that, since the images are generated randomly, in each epoch the network sees images that are all different from those in the previous epoch. That's why the data is "augmented": the augmentation is not done within an epoch, but throughout the entire training process. However, I have seen other people specifying 2x the original steps_per_epoch value.
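For example, a sketch of that steps_per_epoch setting with flow_from_directory (the directory name, image size, batch size and model are placeholders):

batch_size = 32
train_gen = aug.flow_from_directory("data/train",
                                    target_size=(150, 150),
                                    batch_size=batch_size,
                                    class_mode="categorical")

model.fit_generator(train_gen,
                    steps_per_epoch=train_gen.samples // batch_size,
                    epochs=50)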
Hope this helps
I am trying to see the feasibility of using TensorFlow to identify features in my image data. I have 50x50px grayscale images of nuclei that I would like to have segmented; the desired output would be either a 0 or 1 for each pixel: 0 for the background, 1 for the nucleus.
Example input: raw input data
Example label (what the "label"/real answer would be): output data (label)
Is it even possible to use TensorFlow to perform this type of machine learning on my dataset? I could potentially have thousands of images for the training set.
A lot of the examples have a label corresponding to a single category, for example a 10-element array [0,0,0,0,0,0,0,0,0,0] for the handwritten digit data set, but I haven't seen many examples that would output a larger array. I would assume the label would be a 50x50 array?
Also, any ideas on the CPU processing time for this type of analysis?
Yes, this is possible with TensorFlow. In fact, there are many ways to approach it. Here's a very simple one:
Consider this to be a binary classification task. Each pixel needs to be classified as foreground or background. Choose a set of features by which each pixel will be classified. These features could be local features (such as a patch around the pixel in question) or global features (such as the pixel's location in the image). Or a combination of the two.
Then train a model of your choosing (such as an NN) on this dataset. Of course, your results will be highly dependent upon your choice of features.
You could also take a graph-cut approach if you can represent that computation as a computational graph using the primitives that TensorFlow provides. You could then either skip TensorFlow's optimization functions such as backprop, or, if there are differentiable variables in your computation, use TF's optimization functions to optimize those variables.
SoftmaxWithLoss() works for your image segmentation problem if you reshape the predicted label map and the true label map from [batch, height, width, channel] to [N, channel].
In your case, your final predicted map will have channel = 2, and after reshaping N = batch * height * width; then you can use SoftmaxWithLoss() or a similar loss function in TensorFlow to run the optimization.
See this question that may help.
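For example, a sketch of the reshape-then-softmax idea in TensorFlow, assuming logits has shape [batch, height, width, 2] and labels is a one-hot map of the same shape (both are placeholders here):

import tensorflow as tf

num_classes = 2
flat_logits = tf.reshape(logits, [-1, num_classes])   # [N, channel], N = batch * height * width
flat_labels = tf.reshape(labels, [-1, num_classes])

pixel_loss = tf.nn.softmax_cross_entropy_with_logits(labels=flat_labels,
                                                     logits=flat_logits)
loss = tf.reduce_mean(pixel_loss)
# train_op = tf.train.AdamOptimizer(1e-4).minimize(loss)   # TF1-style optimization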
Try using convolutional filters for the model: a stack of convolution and downsampling layers. The input should be the normalized pixel image and the output should be the mask. The last layer should be a softmaxWithLoss. HTH.
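For example, a rough sketch of such a stack in Keras for 50x50 grayscale inputs (layer sizes are illustrative, not tuned; the labels would be one-hot encoded per pixel, i.e. shape (50, 50, 2)):

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, UpSampling2D

model = Sequential([
    Conv2D(16, (3, 3), activation='relu', padding='same', input_shape=(50, 50, 1)),
    MaxPooling2D((2, 2)),                      # 50x50 -> 25x25
    Conv2D(32, (3, 3), activation='relu', padding='same'),
    UpSampling2D((2, 2)),                      # back to 50x50
    Conv2D(2, (1, 1), activation='softmax'),   # per-pixel softmax over 2 classes
])
model.compile(optimizer='adam', loss='categorical_crossentropy')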