TensorFlow 1D model, wrong input shape for MaxPooling - python

I'm building a 1D model with TensorFlow for audio but I have a problem with the input shape during the second MaxPool1D in the model.
The problem is here, after this Pooling:
x = Convolution1D(32, 3, activation=relu, padding='valid')(x)
x = MaxPool1D(4)(x)
I get this error:
ValueError: Negative dimension size caused by subtracting 4 from 1 for 'max_pooling1d_5/MaxPool' (op: 'MaxPool') with input shapes: [?,1,1,32].
I tried to reshape x (which is a tensor) but I don't think I'm going in the right way.
In this same model, before that, I have a couple convolutional layers and a maxpooling that are working proporly.
Anyone have suggestions?
Thanks

The number of steps in the input to the MaxPool1D layer is smaller than the pool size.
In the error, it says ...input shapes: [?,1,1,32], which means the output from the Convolution1D layer has shape [1,32]. It needs to be at least 4 steps to be used as input to the MaxPool1D(4) layer, so have a minimum size of [4,32].
You can continue walking this back. For example, the Convolution1D layer will decrease the step size by kernel_size-1=2. This means the input to the Convolution1D layer needs to have at least 4+2=6 steps, meaning a shape of at least [6,?]. Continuing up to the input layer, you'll find the input size is too small.
You'll need to change the architecture to allow the input size, or, if applicable, change the input size.

Related

Neural network hidden layer vs. Convolutional hidden layer intuition

If there are 10 features and 1 output class (sigmoid activation) with a regression objective:
If I use only 5 neurons in my first dense hidden layer: will the first error be calculated solely based on half of the training feature set? Isn't it imperative to match the # of features with the neurons in hidden layer #1 so that the model can see all the features at once? Otherwise it's not getting the whole picture? The first fwd propagation iteration would use 5 out of 10 features, and get the error value (and train during backprop, assume batch grad descent). Then the 2nd fwd propagation iteration would see the remaining 5 out of 10 features with updated weights and hopefully arrive at a smaller error. BUT its only seeing half the features at a time!
Conversely, if I have a convolutional 2D layer of 64 neurons. And my training shape is: (100, 28,28,1) (pictures of cats and dogs in greyscale), will each of the 64 neurons see a different 28x28 vector? No right, because it can only send one example through the forward propagation at a time? So then only a single picture (cat or dog) should be spanned across the 64 neurons? Why would you want that since each neuron in that layer has the same filter, stride, padding and activation function? When you define a Conv2D layer...the parameters of each neuron are the same. So is only a part of the training example going into each neuron? Why have 64 neurons, for example? Just have one neuron, use a filter on it and pass it along to a second hidden layer with another filter with different parameters!
Please explain the flaws in my logic. Thanks so much.
EDIT: I just realized for Conv2D, you flatten the training data sets so it becomes a 1D vector and so a 28x28 image would mean having an input conv2d layer of 724 neurons. But I am still confused for the dense neural network (paragraph #1 above)
What is your "first" layer?
Normally you have an input layer as first layer, which does not contain any weights.
The shape of the input layer must match the shape of your feature data.
So basically when you train a model with 10 features, but only have a input layer of shape (None,5) (where none stands for the batch_size), tensorflow will raise an exception, because it needs data for all inputs in the correct shape.
So what you said is just not going to happen. If you only have 5 features, the next 5 features wont be fit into the net in the next iteration but, the next sample will be send to the model instead. (Lets say no exception is thrown) So of the next sample also only the first 5 features would be used.
What you can do instead, use a input_layer as first layer with the correct shape of your features. Then as secodn layer, you can use any shape you like, 1,10,100 dense neurons, its up to you (and what works well of course). The shape of the output again must match (this time) the shape of your label data.
I hope this makes it more clear

What value do i pass in Input(shape=...) in keras

Suppose I have a numpy array with shape = (1303, 3988, 1). What value do I need to pass to Input() so that my ai learns or do I need it, do I need to reshape it?
I understand that your data is 1303 instances of vectors size (3988,1).
The answer depend on the layer goes after the input:
If you feed it after to Conv1D layer so the input layer should be:
Input(3988,1)
Otherwise you should squeeze the layer with:
np.squeeze(your_numpy_array)
or just flatten the input after the first layer:
x=Input(3988,1)
x=Flatten()(x)

Should I use only "exactly same" input shape for transfer learning?

I trained a CNN model with (5x128x128x3) size of input shape
and I got trained weight of (5x128x128x3)
by the way, I wanna use this weight for training (7x128x128x3) size of input data
So, this is my question
should I use only same shape of input?
I wonder if I can use another size (in this case, 7x128x128x3) of input for transfer learning
ValueError: Error when checking input: expected input_1 to have shape (5, 128, 128, 3) but got arry with shape (7, 128, 128, 3)```
Let's break down the dimensions (5x128x128x3):
The first dimension is the batch size (which was 5 when the original model was trained). This is irrelevant and you can set it to None as pointed out in the comments to feed arbitrary sized batches to the model.
The second to third dimensions (128x128) are the width and height of the image and you may be able to change these, but it's hard to say for sure without knowing the model architecture and which layer output you're using for transfer learning. The reason you can change these is that 2d convolutional filters are repeated across the 2d dimensions (width and height) of the image, so they will remain valid for different widths and heights (assuming compatible padding). But if you change the 2d dimensions too much, it is possible that the receptive fields of the layers are changed in a way that hurts transfer learning performance. Eg. if the 7th conv layer in the network for 128x128 input can see the entire input image in each activation (a receptive field of 128x128), then if you double the width and height, it won't anymore and the layer may not recognize certain global features.
The fourth dimension is the number of channels in the input images and you can't change this, as the filters in the first layer will have 3 weights across the depth dimension.

Is it possible to implement a max pooling layer in Keras which provides the maximum n values from a given pool size?

I have a multidimensional time series dataset which has the following shape (n_samples, 512, 9) where 512 is the timesteps and 9 are the channels.
After the first 1D CNN layer with 64 kernels my output shape is (n_samples, 512, 64). Now I would like to have my input to next layer which is an LSTM to be of the shape (n_samples, 384, 64).
It can be achieved if I have a Maxpool layer that returns maximum 3 values from pool size of 4 but is it possible to implement this in Keras?
You can probably solve this with a keras.layers.Lambda layer, and the backend to tf.nn.in_top_k. Note that the handling is somewhat different from tf.nn.top_k, in that it does not pool if all the values are of the same value!
Now you can define yourself a function that returns the top k values (and does so
somewhat efficiently), and then pass it as a function to the lambda layer.
I sadly haven't worked enough with Keras to type out the specific code, but maybe this is help enough to point you in the right direction.
Also, there exists a similar thread for TensorFlow specifically.

Tensorflow InvalidArgumentError: Incompatible shapes after slicing input by Lambda Layer

In my Convolution Network, I recently add a Lambda Layer as the input layer for select specific channels of the input images following the answer from this question
model.add(Lambda(lambda x: x[:,:,:2], input_shape=(w, h, 3)))
When I tried to add the MaxPooling2D Layer, I got the error ValueError: Negative dimension size caused by subtracting 3 from 2 for 'max_pooling2d_14/MaxPool' (op: 'MaxPool') with input shapes: [?,250,2,64]
I thought I make some mistakes between Theano and Tensorflow dim order so I edited the Lambda Layer:
model.add(Lambda(lambda x: x[:2,:,:], input_shape=(w, h, 3)))
This time I got no problem when adding more layer, but when I tried to use fit_generator, it gets the error: InvalidArgumentError: Incompatible shapes: [64] vs. [2]
The full trace back is very long, I upload them to here.
I'm using on Linux with 4 GPU for calculation, thanks for your help.
The problem lies the way I slice the input using the Lambda Layer.
The input shape has 4 properties in this order: batch_size, width, height, channels.
For selecting multiple array of the input data, because Tensorflow is not supporting the advance indexing method from numpy, we should slice the input tensor first, use dim expanding to add the color depth, and then concatenate them later.

Categories

Resources