Flatten() Layer in Keras with variable input shape - python

I am working with a CNN implemented in Keras which at some point has a Flatten layer. My goal is to allow images of different input shapes, so my first convolutional layer looks something like this:
model.add(Conv2D(..., input_shape=(None, None, 1)))
With this setup the Flatten layer becomes unhappy and tells me to specify the input shape, so I am currently using a GlobalMaxPooling layer instead, which I would like to avoid.
After all, why does the Flatten layer care about the width and height?
Background: I am trying to train a net for classification (at a smaller resolution) and afterwards use this net for object detection (at a higher resolution).
Thanks

It cares about the shape because you will typically connect another layer to it, and the flattened feature dimension is the basis for the next layer to create its own weights. A layer can't have a variable-size weight matrix, so it can't accept a variable-size feature input.
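To make that concrete, here is a minimal sketch (assuming the tf.keras API; the sizes are made up) showing that the Dense layer's weight matrix grows with the input resolution, which is exactly what Flatten would leave undefined for a (None, None, 1) input:

from tensorflow.keras import layers, models

def head_params(h, w):
    # Same conv -> flatten -> dense head, built for a given input size.
    m = models.Sequential([
        layers.Conv2D(8, 3, input_shape=(h, w, 1)),
        layers.Flatten(),
        layers.Dense(10),
    ])
    return m.count_params()

print(head_params(28, 28))  # small weight matrix after Flatten
print(head_params(64, 64))  # much larger one: the Dense weights depend on H and W

With input_shape=(None, None, 1) that feature count is unknown, so the Dense weights cannot be created; GlobalMaxPooling2D sidesteps this by reducing to a fixed number of channels.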

Related

How do you calculate the input to the first linear layer of a CNN?

So, I have the code of a CNN built with PyTorch (shared as an image: Pic of the CNN code 1).
Until now I've been able to calculate the input size of the linear layer (self.fc) after the last conv1d block (b5 in this case) by printing the product of the last two dimensions of f5.shape in the forward function, but I need to automate the experimentation and I can't just print the value for each test and change the code.
My input tensor is also variable in size (my data consists of signals and I'm using a window to take X points per sample; a window size of 1020, for example, results in a 1020x1x3 tensor, I guess -- it is a 1D CNN and I have 3 input channels).
So, how can I get the self.n_features parameter (the input of the linear layer) automatically using this code?
One way to do this is to use a lazy version of nn.Linear, namely nn.LazyLinear. In your case:
self.fc = nn.LazyLinear(n_class)
It will initialize its weights on the first forward pass.
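A minimal sketch of what that looks like (the layer sizes here are made up; only the LazyLinear part is the point):

import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self, n_class):
        super().__init__()
        self.conv = nn.Conv1d(3, 16, kernel_size=5)
        # in_features is inferred from the first batch that flows through
        self.fc = nn.LazyLinear(n_class)

    def forward(self, x):
        x = self.conv(x)
        x = torch.flatten(x, 1)
        return self.fc(x)

net = Net(n_class=4)
out = net(torch.randn(2, 3, 1020))  # first call materializes fc's weights
print(out.shape)  # torch.Size([2, 4])

Note that the weights are fixed after that first call, so every later batch must produce the same flattened size; LazyLinear removes the manual calculation, not the fixed-size requirement.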

How to flatten an image?

How do you flatten an image?
I know that we use Conv2D and pooling to detect edges and reduce the size of the picture, so do we then flatten it after that?
Will the flattened, pooled image be a vector with one row and the features as columns, or one column and the features as rows?
Do we apply x_data = x_data / 255 after flattening or before convolution and pooling?
I hope to know the answer.
Here's the pipeline:
Input image (could be in batches; let's say your network processes 10 images simultaneously), so 10 images of size (28, 28) -- 28 pixels height/width -- and let's say each image has 1 channel only (grayscale).
You are supposed to feed your network an input of size (10, 28, 28, 1), which a convolutional layer will accept. You are free to use max pooling and maybe an activation function. The convolutional layer applies a number of filters of your choice -- let's assume you want to apply 40. These are 40 different kernels with different weights. If you want to, say, classify these images, you will most likely have a number of Dense layers after your convolutional layers. Before passing the output of the convolutional layers (which is a representation of your input image after a feature-extraction process) to your Dense layers, you have to flatten it in some way (you may use the simplest form of flattening: just laying the numbers out one after the other). So your Dense layer accepts the output of those 40 filters, which are 'images' whose size depends on many things (kernel size, stride, original image size) and which get flattened into a vector that propagates the information extracted by your conv layers forward.
Your second question regarding MinMaxScaling (dividing by 255): that is supposed to take place before everything else. There are other ways of normalizing your data (standard scaling -- converting to zero mean and unit variance), but keep in mind that with transformations like these you are supposed to fit the transformation on your training data and transform your test data accordingly; you are not supposed to fit on your test data. Here, simply dividing everything by 255 is fine, but keep that in mind for the future.
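As a rough sketch of that pipeline in tf.keras (sizes and layer choices are illustrative):

import numpy as np
from tensorflow.keras import layers, models

# Scale first: divide by 255 before the data touches the network.
x = np.random.randint(0, 256, (10, 28, 28, 1)).astype('float32') / 255.0

model = models.Sequential([
    layers.Conv2D(40, 3, activation='relu', input_shape=(28, 28, 1)),
    layers.MaxPooling2D(),
    layers.Flatten(),  # one row per image: shape (batch, features)
    layers.Dense(10, activation='softmax'),
])
print(model.predict(x).shape)  # (10, 10)

Note that Flatten produces one row per image, which also answers the row-versus-column question.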

In a Keras Sequential model, Conv2D seems to require the kernel be narrower than the previous layer is thick. Why?

I'm creating a Sequential model in Keras that takes a colour image and convolves it through multiple Conv2D layers of approximately the same size and shape as the top layer (minus the edges sliced off by the convolutions, basically).
My understanding is as follows:
kernel_size indicates the patch size for each convolution's input
filters indicates the layer depth for each convolution's output.
I then do some other stuff after the convolutions, which isn't relevant here.
However, when I tried to compile my model prior to testing it on a little data, I discovered that TensorFlow complains when I try to make kernel_size for a given layer greater than filters for the previous layer. It doesn't actually say that; instead it says
Negative dimension size caused by subtracting 3 from 1 for 'conv2d_2/convolution' (op: 'Conv2D') with input shapes [?,1,1022,1022], [3,3,1022,1]
which isn't exactly informative. However, I noticed that the numbers it puts in correspond to
Negative dimension size caused by subtracting <this layer's kernel_size> from <previous layer's filters> ....
and setting filters to be higher stopped the error.
My question is: why should this be? I thought filters specified depth, and kernel_size specified width. There shouldn't be any need to fit a convolution patch into the thickness of the previous layer. Moreover, this problem does not occur on the first layer, whose channel depth (which I understand to be effectively equivalent to filters) is 3.
Is this a bug, or am I misinterpreting these parameters' meaning, or something else?
Code snippet:
__model = Sequential()
# feature layers
__model.add(Conv2D(input_shape=(3, iX, iY), data_format="channels_first",
                   kernel_size=kernelfilters[0][0],
                   filters=kernelfilters[0][1], activation=ACTIVATION))
for kernelfilter in kernelfilters:
    __model.add(Conv2D(kernel_size=kernelfilter[0], filters=kernelfilter[1],
                       activation=ACTIVATION))
The last line is the one that breaks.
Each kernelfilter in the kernelfilters array is a pair of numbers specifying a kernel_size and a filters value, in that order. iX and iY are the initial image dimensions. ACTIVATION is a constant, currently set to "relu" but I might change it later!
Your premise is wrong; this is not the case in general. It only happens if you fiddle with the image_data_format Keras parameter (in ~/.keras/keras.json) or with the data_format parameter of individual layers.
Changing this parameter inconsistently (in some layers only) completely scrambles how the data is interpreted, because it moves the position of the channels dimension, which can then be mistaken for one of the spatial dimensions. With the TensorFlow backend the default is channels_last, meaning the input_shape of the first layer should be a tuple of the form (height, width, channels). In your snippet, only the first layer uses channels_first, so the later layers read the previous layer's filters axis as a spatial dimension; that is why the error appears to compare kernel_size with the previous layer's filters.
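A sketch of the fix for the snippet above: keep data_format consistent on every layer (sizes are illustrative):

from tensorflow.keras import layers, models

iX = iY = 64
model = models.Sequential()
model.add(layers.Conv2D(filters=32, kernel_size=3, activation='relu',
                        input_shape=(3, iX, iY), data_format='channels_first'))
# The same data_format on every later layer; otherwise the channels axis
# is misread as a spatial axis and a 3x3 kernel can "run out" of room.
model.add(layers.Conv2D(filters=64, kernel_size=3, activation='relu',
                        data_format='channels_first'))
model.summary()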

Finding weights for conv2d layer in Tensorflow

I am pretty confused when it comes to the shape of a convolutional layer in tensorflow.
kernels = tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES, "conv1d(_.)?/kernel:0")
Running this line returns a kernel with 4 dimensions and a bias.
I expected the kernel to be [filter_width, filter_height, filter_number] plus a 2D matrix of weights. Instead I get a fourth dimension and no weight matrix at all.
Maybe I should not interchange dense and convolutional layers in my mind. However, most of the explanations I find on the internet stay at a simple level without going into the details of TensorFlow's model.
Most important for me would be getting the weights of the edges connecting the layers, as seen in this picture:
This link relates to my problem: Something I want from Tensorflow
I hope someone can follow my trouble in understanding; otherwise, do not hesitate to add comments.
Filters/kernels always have 4 dimensions: (filter_height, filter_width, in_channels, out_channels). For the first layer, in_channels equals the number of channels in the input image; for later layers it equals the number of filters of the previous layer.
The weights you are asking about belong to a fully connected (Dense) layer, not to a convolutional layer (which is what Conv2D is).
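You can check this yourself; a small sketch using tf.keras (the layer sizes are arbitrary):

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(40, 3, input_shape=(28, 28, 1)),
])
kernel, bias = model.layers[0].weights
print(kernel.shape)  # (3, 3, 1, 40): (filter_height, filter_width, in_channels, out_channels)
print(bias.shape)    # (40,)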

How to Add Flattened Layer (or Similar) For Variable Input Size of a Convolutional Neural Network

I am wondering whether, and how, it is possible to add something similar to a Flatten layer for images of variable size.
Say we have an input layer for our CNN as:
input_shape=(1, None, None)
After performing your typical series of convolution/maxpooling layers, can we create a flattened layer, such that the shape is:
output_shape=(None,...)
If not, would someone be able to explain why not?
You can add GlobalMaxPooling2D or GlobalAveragePooling2D.
These eliminate the spatial dimensions and keep only the channels dimension. Max takes the maximum values, Average takes the mean value.
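For example, a sketch of a classifier that accepts variable-sized grayscale images (layer choices are illustrative):

from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Conv2D(32, 3, activation='relu', input_shape=(None, None, 1)),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation='relu'),
    layers.GlobalMaxPooling2D(),  # collapses the variable H x W, keeps the 64 channels
    layers.Dense(10, activation='softmax'),
])
model.summary()  # builds fine despite the (None, None, 1) input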
I don't really know why you can't use a Flatten layer, but in fact you can't with variable dimensions.
I understand why a Dense layer wouldn't work: it would have a variable number of parameters, which is totally infeasible for backpropagation, weight updates and so on. (PS: Dense layers act only on the last dimension, so that is the only one that needs to be fixed.)
Examples:
A Dense layer requires the last dimension fixed
A Conv layer can have variable spatial dimensions, but needs fixed channels (otherwise the number of parameters will vary)
A recurrent layer can have variable time steps, but needs fixed features and so on
Also, notice that:
For classification models, you need a fixed-size output, so how could you flatten and still guarantee the correct number of elements in each dimension? It's impossible.
For models with variable output, why would you want to have a fixed dimension in the middle of the model anyway?
If you're going totally custom, you can always use K.reshape() inside a Lambda layer and work with the tensor shapes:
import keras.backend as K

def myReshape(x):
    # dynamic shape of the input tensor, known only at run time
    shape = K.shape(x)
    batchSize = shape[:1]
    # -1 lets reshape infer the flattened feature dimension
    newShape = K.variable([-1], dtype='int32')
    newShape = K.concatenate([batchSize, newShape])
    return K.reshape(x, newShape)
The layer: Lambda(myReshape)
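For instance, using myReshape from above (a sketch; note that whatever follows the Lambda must not depend on the flattened size, so a Dense layer still cannot come next):

from keras.layers import Conv2D, Lambda
from keras.models import Sequential

model = Sequential()
model.add(Conv2D(16, 3, input_shape=(None, None, 1)))
model.add(Lambda(myReshape))  # output shape: (batch, variable_features)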
I don't think you can, because the compile step uses those dimensions to allocate fixed memory when your model is instantiated for training or prediction. Some dimensions need to be known ahead of time so the matrix dimensions can be allocated.
I understand why you want variable-sized image input; the world is not (226, 226, 3). It depends on your specific goals, but scaling up, or windowing to a region of interest using, say, Single Shot Detection as a preprocessing step, may be helpful. You could just start with Keras's ImageDataGenerator to scale all images to a fixed size, and then see how much of a performance gain you get from variable input sizing or windowing preprocessing.
@mikkola, I have found Flatten to be very helpful for TimeDistributed models. You can add Flatten after the convolution steps using:
your_model.add(Flatten())
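For instance, a sketch where each fixed-size frame in a variable-length sequence is flattened independently (the shapes are made up):

from tensorflow.keras import layers, models

model = models.Sequential([
    # variable number of time steps, each frame fixed at 8x8x16
    layers.TimeDistributed(layers.Flatten(), input_shape=(None, 8, 8, 16)),
    layers.LSTM(32),
    layers.Dense(10, activation='softmax'),
])
model.summary()  # TimeDistributed output: (batch, time, 1024)

This works because the per-frame shape is fixed; only the number of time steps varies.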
