How to flatten an image? - python

How do you flatten an image?
I know that we use conv2d and pooling to detect edges and reduce the size of the picture, so do we then flatten it after that?
Will the flattened, pooled image be a vector with the features in one row, or in one column?
Do we apply x_data = x_data / 255 after flattening, or before convolution and pooling?
I hope to know the answer.

Here's the pipeline:
Input image (could be in batches -- let's say your network processes 10 images simultaneously), so 10 images of size (28, 28) -- 28 pixels height/width -- and let's say the image has 1 channel only (grayscale).
You are supposed to provide your network an input of size (10, 28, 28, 1), which will be accepted by a convolutional layer. You are free to use max pooling and maybe an activation function. Your convolutional layer will apply a number of filters of your choice -- let's assume you want to apply 40 filters. These are 40 different kernels with different learned weights. If you want to, say, classify these images, you will (most likely) have a number of Dense layers after your convolutional layers.
Before passing the output of the convolutional layers (which is a representation of your input image after a feature extraction process) to your dense layers, you have to flatten it in some way (you may use the simplest form of flattening: just passing the numbers one after the other). So your dense layer accepts the output of these 40 filters, which are 'images' whose size depends on many things (kernel size, stride, original image size), flattened into a vector that propagates forward the information extracted by your conv layer.
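As a minimal sketch of that pipeline (Keras/TensorFlow assumed; the layer sizes are purely illustrative, not prescriptive). Note that Flatten produces one row per image, i.e. shape (batch_size, features):

import tensorflow as tf

model = tf.keras.Sequential([
    # 40 filters of size 3x3 over a (28, 28, 1) grayscale input -> (26, 26, 40)
    tf.keras.layers.Conv2D(40, (3, 3), activation='relu', input_shape=(28, 28, 1)),
    tf.keras.layers.MaxPooling2D((2, 2)),             # downsample -> (13, 13, 40)
    tf.keras.layers.Flatten(),                        # (13, 13, 40) -> vector of 6760
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax'),  # e.g. 10 classes
])
model.summary()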
Your second question regarding min-max scaling (dividing by 255): that is supposed to take place before everything else. There are other ways of normalizing your data (standard scaling -- converting to zero mean and unit variance), but keep in mind that with transformations like that you are supposed to fit the transformation on your train data and transform your test data accordingly; you are not supposed to fit and transform on your test data. Here, simply dividing everything by 255 is fine, but keep that in mind for the future.
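A short sketch of both options (MNIST is used purely for illustration):

import tensorflow as tf

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

# Min-max scaling: the 0-255 pixel range is known a priori,
# so dividing both sets by 255 is safe here.
x_train = x_train.astype('float32') / 255.0
x_test = x_test.astype('float32') / 255.0

# Standard-scaling alternative: fit the statistics on the TRAIN set only,
# then apply the same transform to the test set:
# mean, std = x_train.mean(), x_train.std()
# x_train = (x_train - mean) / std
# x_test = (x_test - mean) / std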

Related

Convolution Neural Networks Intuition - Difference in outcome between high kernel filter size vs high number of features

I wanted to understand the architectural intuition behind the difference between:
tf.keras.layers.Conv2D(64, (3,3), activation='relu', input_shape=(28, 28, 1))
and
tf.keras.layers.Conv2D(32, (7,7), activation='relu', input_shape=(28, 28, 1))
Assuming,
As kernel size increases, more complex feature-pattern matching can be performed in the convolution step.
As the number of features (filters) increases, a larger variety of smaller features can define a particular layer.
How and when (if possible kindly give scenarios) do we justify the tradeoff at an abstract level?
This can be answered from 3 different views.
Parameters:
Since you are comparing two Conv2D layers with different sizes, it's important to look at the number of trainable parameters needed for each, M*(K*K*D) + M (M filters, kernel size KxK, input depth D); more parameters in turn make your model more complex and harder to train.
Here, the number of trainable parameters increases 2.5-fold with the second configuration for Conv2D:
first conv2d layer: 64*(3*3*1)+64 = 640
second conv2d layer: 32*(7*7*1)+32 = 1600
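You can confirm these counts with throwaway single-layer models (a sketch, assuming TensorFlow/Keras):

import tensorflow as tf

m1 = tf.keras.Sequential([tf.keras.layers.Conv2D(64, (3, 3), input_shape=(28, 28, 1))])
m2 = tf.keras.Sequential([tf.keras.layers.Conv2D(32, (7, 7), input_shape=(28, 28, 1))])
print(m1.count_params())  # 640  = 64*(3*3*1) + 64
print(m2.count_params())  # 1600 = 32*(7*7*1) + 32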
Input:
Another way of deciding which filter size to use is to analyze the input data in the first place. Since the goal of the first Conv2D layer (over the input) is to capture the most basic patterns in the image, ask yourself whether the MOST basic patterns in the image really do need a larger filter to be learned.
If you think that a large region of pixels is necessary for the network to recognize the object, you will use large filters (such as 11x11 or 9x9). If you think what differentiates objects are some small and local features, you should use small filters (3x3 or 5x5).
Usually, the better practice is to stack Conv2D layers to capture bigger patterns in the image, since they are made of combinations of smaller patterns that are more easily captured by smaller filters, as the sketch below shows.
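For instance (a sketch, with 32 channels in and out everywhere as an assumption), three stacked 3x3 convolutions cover the same 7x7 receptive field as a single 7x7 convolution, end at the same spatial size, and still use fewer parameters:

import tensorflow as tf

stacked = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 32)),
    tf.keras.layers.Conv2D(32, (3, 3), activation='relu'),
    tf.keras.layers.Conv2D(32, (3, 3), activation='relu'),
])  # spatial size: 28 -> 26 -> 24 -> 22
single = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, (7, 7), activation='relu', input_shape=(28, 28, 32)),
])  # spatial size: 28 -> 22
print(stacked.count_params())  # 27744 = 3 * (32*(3*3*32) + 32)
print(single.count_params())   # 50208 = 32*(7*7*32) + 32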
End goal:
Usually the goal of a conv network is to compress the image's height and width into a large number of channels, each produced by a filter.
This process of down sampling image into its representative features allows us to finally add a few dense layers at the end to do our classification tasks.
The first Conv2D will downsample the image only a little and generate a large number of channels, while the second will downsample it a lot (a larger filter has fewer valid positions over the image) and produce fewer channels.
But the act of downsampling, to get a smaller image with fewer channels (filters), immediately causes loss of information. Therefore it's recommended to do it gradually, to retain as much information as possible from the original image.
Conv2D layers can then be stacked to reach a near-vector representation of the image before classification, as the sketch below illustrates.
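The difference in downsampling is easy to check (a sketch; with stride 1 and no padding the output side is 28 - k + 1):

import tensorflow as tf

x = tf.zeros((1, 28, 28, 1))
print(tf.keras.layers.Conv2D(64, (3, 3))(x).shape)  # (1, 26, 26, 64)
print(tf.keras.layers.Conv2D(32, (7, 7))(x).shape)  # (1, 22, 22, 32)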
Summary:
The second Conv2D will be able to capture larger, more complex patterns at once than the first Conv2D at that step.
The second Conv2D loses more information from the original image, as it skips features that come from much smaller and simpler patterns. The first Conv2D captures more basic patterns and can use combinations of those (in stacked Conv layers) to build a more robust set of features for your end task.
The second Conv2D needs more parameters than the first to learn the structure of the image.
In practice, it is recommended to use a stack of Conv layers with smaller filters to better detect larger, more complex patterns in the image.

How to apply a single fully connected layer to each point in an image

I'm trying to set up a non-conventional neural network using keras, and am having trouble efficiently setting this up.
The first few layers are standard convolutional layers, and their output has d channels, each with an image shape of n x n.
What I want to do is use a single dense layer to map this d x n x n tensor onto a single image of size n x n. I want to define a single dense layer, with input size d, and output size 1, and apply this function to each "pixel" on the input (with the inputs taken depthwise across channels).
So far, I have not found an efficient solution to this. I have tried defining a fully connected layer and then looping over each "pixel" in the input, but this takes many hours to initialize the model, and I am worried that it will slow down backprop, as the computations are likely not properly parallelized.
Is there an efficient way to do this?
What you're describing is a 1x1 convolution with output depth 1. You can implement it just as you implement the rest of the convolution layers. You might want to apply tf.squeeze afterwards to remove the depth, which should have size 1.
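A minimal sketch of that idea (d = 40 is just an illustrative channel count; the Lambda wrapper keeps the squeeze inside the Keras graph):

import tensorflow as tf

d = 40  # number of input channels, illustrative
inputs = tf.keras.Input(shape=(None, None, d))
x = tf.keras.layers.Conv2D(1, (1, 1))(inputs)  # one dense layer per pixel -> (batch, n, n, 1)
outputs = tf.keras.layers.Lambda(lambda t: tf.squeeze(t, axis=-1))(x)  # -> (batch, n, n)
model = tf.keras.Model(inputs, outputs)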

Flatten() Layer in Keras with variable input shape

I am working with a CNN implemented in Keras which at some point has a flatten layer. Now, my goal is to allow different input shaped images. So my first conv. layer looks something like:
model.add(Conv2D(..., input_shape=(None, None, 1)))
Well, in this setup my Flatten layer becomes unhappy and tells me to specify the input shape. As a workaround I am currently using a GlobalMaxPooling layer, which I would like to avoid.
After all, why does the Flatten layer care about the width and height?
Background: I am trying to train a net for classification (smaller resolution) and afterwards use this net for object detection (higher resolution).
Thanks
It cares about the shape because you will probably want to connect another layer to it,
and its feature dimension will be the basis for the next layer to create its own weights. A layer can't have a variable-size weight matrix, thus it can't have a variable-size feature input; see the sketch below.
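A minimal sketch of the usual workaround: global pooling collapses the variable spatial dimensions into a fixed channel count, so the Dense layer's weight matrix is well defined:

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, (3, 3), activation='relu', input_shape=(None, None, 1)),
    tf.keras.layers.GlobalMaxPooling2D(),  # always outputs (batch, 32)
    tf.keras.layers.Dense(10, activation='softmax'),
])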

How to Add Flattened Layer (or Similar) For Variable Input Size of a Convolutional Neural Network

I am wondering whether, and how, it is possible to add something similar to a Flatten layer for images of variable size.
Say we have an input layer for our CNN as:
input_shape=(1, None, None)
After performing your typical series of convolution/maxpooling layers, can we create a flattened layer, such that the shape is:
output_shape=(None,...)
If not, would someone be able to explain why not?
You can add GlobalMaxPooling2D and GlobalAveragePooling2D.
These will eliminate the spatial dimensions and keep only the channels dimension. Max will take the maximum values, Average will get the mean value.
I don't really know why you can't use a Flatten layer, but in fact you can't with variable dimensions.
I understand why a Dense wouldn't work: it would have a variable number of parameters, which is totally infeasible for backpropagation, weight updates and things like that. (PS: Dense layers act only on the last dimension, so that is the only one that needs to be fixed.)
Examples:
A Dense layer requires the last dimension to be fixed.
A Conv layer can have variable spatial dimensions, but needs a fixed number of channels (otherwise the number of parameters would vary).
A recurrent layer can have variable time steps, but needs fixed features, and so on.
Also, notice that:
For classification models, you'd need a fixed dimension output, so, how to flatten and still guarantee the correct number of elements in each dimension? It's impossible.
For models with variable output, why would you want to have a fixed dimension in the middle of the model anyway?
If you're going totally custom, you can always use K.reshape() inside a Lambda layer and work with the tensor shapes:
import keras.backend as K

def myReshape(x):
    # Flatten everything except the batch dimension
    shape = K.shape(x)                          # dynamic shape tensor
    batchSize = shape[:1]                       # keep the batch size
    newShape = K.constant([-1], dtype='int32')  # -1 lets reshape infer the rest
    newShape = K.concatenate([batchSize, newShape])
    return K.reshape(x, newShape)
The layer: Lambda(myReshape)
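For illustration, continuing with the myReshape function above (the layer sizes are otherwise arbitrary); note that any Dense layer placed after it would still demand a fixed size:

from keras.layers import Input, Conv2D, Lambda
from keras.models import Model

inp = Input(shape=(None, None, 1))
x = Conv2D(16, (3, 3))(inp)
out = Lambda(myReshape)(x)  # shape: (batch, None)
model = Model(inp, out)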
I don't think you can, because the compile step uses those dimensions to allocate fixed memory when your model is instantiated for training or prediction. Some dimensions need to be known ahead of time so the matrix dimensions can be allocated.
I understand why you want variable-sized image input; the world is not (226, 226, 3). It depends on your specific goals, but scaling up, or windowing to a region of interest using, say, Single Shot Detection as a preprocessing step, may be helpful. You could also just start with Keras's ImageDataGenerator to scale all images to a fixed size -- then you can see how much of a performance gain you get from conditional input sizing or windowing preprocessing.
@mikkola, I have found Flatten to be very helpful for TimeDistributed models. You can add Flatten after the convolution steps using:
your_model.add(Flatten())

Do Keras Conv2d filters have a depth of three if the input has a depth of three?

I've noticed that Conv2D layers are used in code where the input image has three dimensions. However, since we only specify two dimensions for the filter, how does the matrix multiplication take place?
Does the two-dimensional filter convolve each input channel separately (or use broadcasting) (and then just add up the results)?
Or does the depth of the filter automatically match the depth of the input (3 for color images)? If this is the case, a 3x3x3 filter should have 27 trainable weights, as opposed to 9 in the former case.
TensorFlow is more explicit about the filter dimensions for conv2d: you specify (height, width, in_channels, out_channels).
https://www.tensorflow.org/api_docs/python/tf/nn/conv2d
A Keras Conv2D layer automatically gives its convolutional filters n input channels, where n is the depth / number of channels of the preceding layer, whose output feeds into the Conv2D layer.
Assumptions like these make Keras easier to use for common cases such as chaining Conv2D layers together in deep convolutional networks.
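A quick sketch to confirm this (an 8-filter layer on an RGB input; the numbers are purely illustrative):

import tensorflow as tf

layer = tf.keras.layers.Conv2D(filters=8, kernel_size=(3, 3))
layer.build(input_shape=(None, 32, 32, 3))
print(layer.kernel.shape)    # (3, 3, 3, 8): 27 weights per filter, plus one bias each
print(layer.count_params())  # 3*3*3*8 + 8 = 224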
