Should I use only "exactly same" input shape for transfer learning?


I trained a CNN model with an input shape of (5x128x128x3) and got trained weights for that (5x128x128x3) shape.
Now I want to use these weights to train on input data of size (7x128x128x3).
So my question is: must the input have exactly the same shape?
Can I use a different input size (in this case, 7x128x128x3) for transfer learning?
When I try, I get this error:
ValueError: Error when checking input: expected input_1 to have shape (5, 128, 128, 3) but got array with shape (7, 128, 128, 3)

Let's break down the dimensions (5x128x128x3):
The first dimension is the batch size (5 when the original model was trained). It is irrelevant to the weights: as pointed out in the comments, you can set it to None to feed arbitrarily sized batches to the model.
The second and third dimensions (128x128) are the height and width of the image. You may be able to change these, but it's hard to say for sure without knowing the model architecture and which layer's output you're using for transfer learning. You can change them because 2D convolutional filters are slid across the two spatial dimensions (height and width) of the image, so the same filters remain valid for different heights and widths (assuming compatible padding). However, if you change the spatial dimensions too much, each layer's receptive field will cover a different fraction of the image, which can hurt transfer-learning performance. E.g., if the 7th conv layer in the network can see the entire 128x128 input image in each activation (a receptive field of 128x128), then after you double the width and height it no longer can, and the layer may fail to recognize certain global features.
The fourth dimension is the number of channels in the input images, and you can't change it, because each filter in the first layer has exactly 3 weights along the depth (channel) dimension.
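Both the spatial point and the channel point can be checked with a little arithmetic. The sketch below is framework-free, and the layer stack is a hypothetical VGG-like example, not the asker's model. `receptive_field` uses the standard recurrence for stacked convolution/pooling layers; `conv2d_param_count` counts a Conv2D layer's weights, whose kernel Keras stores as (kernel_h, kernel_w, in_channels, filters). Both helper names are made up for illustration.

```python
def receptive_field(layers):
    """Receptive field (input pixels along one axis) after each layer.

    layers: list of (kernel_size, stride) pairs, applied in order.
    Recurrence: r_l = r_{l-1} + (k_l - 1) * j_{l-1},  j_l = j_{l-1} * s_l,
    starting from r_0 = 1 pixel and a cumulative stride j_0 = 1.
    """
    r, j = 1, 1
    fields = []
    for k, s in layers:
        r += (k - 1) * j
        j *= s
        fields.append(r)
    return fields


def conv2d_param_count(kernel_h, kernel_w, in_channels, filters):
    """Weights + biases of a Conv2D layer; there is no height/width argument."""
    return kernel_h * kernel_w * in_channels * filters + filters


# Hypothetical VGG-like stack: conv3, conv3, pool2, conv3, conv3, pool2, conv3 x3.
stack = [(3, 1), (3, 1), (2, 2), (3, 1), (3, 1), (2, 2), (3, 1), (3, 1), (3, 1)]
rf = receptive_field(stack)[-1]
print(rf)  # 40 pixels, whatever the input size
for side in (128, 256):
    # Fraction of the image width that one unit of the last layer sees.
    print(side, round(rf / side, 2))

print(conv2d_param_count(3, 3, 3, 64))  # 1792, for any height/width of RGB input
print(conv2d_param_count(3, 3, 1, 64))  # 640: a 1-channel input needs different weights
```

The receptive field in pixels is fixed by the architecture, so going from 128 to 256 pixels roughly halves the fraction of the image each deep unit sees (about 0.31 vs 0.16 of the width here), while the parameter count, and hence the transferable weights, never depends on height or width. In Keras, a fully convolutional model built on `Input(shape=(None, None, 3))` can therefore accept different spatial sizes; the batch dimension is never part of that shape.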

Related

shape of an output tensor after convolutional filter on a colour image

I find the notion of tensor shapes difficult to understand.
For VGG (https://www.tensorflow.org/api_docs/python/tf/keras/applications/VGG16), we start from a batch of colour images of shape (None, 224, 224, 3) and apply 64 2D convolutional filters.
At the output we obtain a tensor of shape (None, 224, 224, 64), as a summary of the model shows.
However, each filter must process all 3 colours, so my intuition tells me the output tensor should be (None, 224, 224, 3, 64).
Could someone explain why my reasoning is wrong?
Thank you for your explanations.

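All filters have shape (kernel_height, kernel_width, input_channels), so each filter already spans all 3 colour channels.
At every position, a filter computes a single dot product over its whole (kernel_height, kernel_width, input_channels) window, so the colour channels are summed into one number instead of being kept separate. When a filter slides over your input with 'SAME' padding and stride 1, its output therefore has shape (input_height, input_width).
Doing this for all filters and stacking the results along the last axis gives (input_height, input_width, n_filters).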
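To see where the colour axis goes, here is a naive, framework-free convolution on nested lists. It assumes stride 1, odd kernel sizes, and zero padding that keeps height and width (what 'SAME' does in that case); like deep-learning libraries, it computes cross-correlation. The function and helper names are hypothetical.

```python
def conv2d_same(image, filters):
    """Naive stride-1 convolution with 'same' zero padding (odd kernels only).

    image:   H x W x C nested lists
    filters: list of kh x kw x C nested lists, one per output channel
    returns: H x W x len(filters) nested lists
    """
    H, W, C = len(image), len(image[0]), len(image[0][0])
    out = [[[0.0] * len(filters) for _ in range(W)] for _ in range(H)]
    for n, f in enumerate(filters):
        kh, kw = len(f), len(f[0])
        ph, pw = kh // 2, kw // 2  # padding that keeps height and width
        for i in range(H):
            for j in range(W):
                total = 0.0
                for di in range(kh):
                    for dj in range(kw):
                        y, x = i + di - ph, j + dj - pw
                        if 0 <= y < H and 0 <= x < W:  # zeros outside the image
                            # The dot product runs over ALL input channels,
                            # so the colour axis is summed away here.
                            for c in range(C):
                                total += image[y][x][c] * f[di][dj][c]
                out[i][j][n] = total
    return out


def shape(t):
    """Shape of a regular nested list."""
    dims = []
    while isinstance(t, list):
        dims.append(len(t))
        t = t[0]
    return tuple(dims)


image = [[[1.0] * 3 for _ in range(5)] for _ in range(5)]  # 5x5 RGB image of ones
filters = [[[[1.0] * 3 for _ in range(3)] for _ in range(3)] for _ in range(4)]  # 4 filters, 3x3x3
out = conv2d_same(image, filters)
print(shape(filters[0]))  # (3, 3, 3)
print(shape(out))         # (5, 5, 4), not (5, 5, 3, 4)
print(out[2][2][0])       # 27.0 = 3 * 3 window positions * 3 channels
```

The centre value 27 shows a single filter reading 9 positions times 3 channels and emitting one number, which is why the output has one channel per filter.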

What input shape should I take in first layer of Sequential model when the dimensions of the images are (2048*1536)

I have an image dataset where each image has dimensions (2048, 1536). In ImageDataGenerator, to fetch the data from the directory, I used the same target size, i.e. (2048, 1536), but what input shape should I use for the first layer of the Sequential model? Should it be the same (2048, 1536), or can I use an arbitrary shape like (224, 224)?

You should probably flatten your input data into a vector of size 3145728 (2048 * 1536). If your data is in a NumPy array, you can use the array's flatten() method (e.g. x.flatten()).
Then your first layer can have the same shape as this vector.

I would first resize the images with cv2.resize(). Do you really need all the information in such a big image?
For a Sequential model, for example:
from tensorflow.keras import layers, models
model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(height, width, ndim)))
# ... further layers
where height and width are your input image dimensions, and ndim = 1 for greyscale or ndim = 3 for colour images.
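A framework-free sketch of that shape bookkeeping, with hypothetical helper names. It relies on two documented conventions: `ImageDataGenerator.flow_from_directory` takes `target_size` as (height, width) and a `color_mode` of "grayscale", "rgb" or "rgba", while `cv2.resize()` takes `dsize` as (width, height), the reverse order. It treats (2048, 1536) as (height, width); if your images are 2048 pixels wide, swap the pair.

```python
# color_mode values accepted by ImageDataGenerator.flow_from_directory
CHANNELS = {"grayscale": 1, "rgb": 3, "rgba": 4}


def input_shape_for(target_size, color_mode="rgb"):
    """channels_last input_shape for the first Conv2D layer, given the
    (height, width) target_size passed to the data generator."""
    height, width = target_size
    return (height, width, CHANNELS[color_mode])


def cv2_dsize(target_size):
    """cv2.resize() takes dsize as (width, height), so swap the pair."""
    height, width = target_size
    return (width, height)


full = input_shape_for((2048, 1536))  # (2048, 1536, 3)
small = input_shape_for((224, 224))   # (224, 224, 3)
print(full, small, cv2_dsize((2048, 1536)))

# Values per colour image before vs. after resizing (roughly 63x fewer).
print((2048 * 1536 * 3) / (224 * 224 * 3))
```

Keeping `target_size` and `input_shape` derived from one variable avoids the common mismatch where the generator yields one size and the first layer expects another.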

The size of the first (i.e. input) layer should equal the number of features in your dataset. For images, each pixel value is a feature. Since your images are (2048, 1536), you need to flatten them to get the total number of pixels (i.e. features): 2048*1536*1 for greyscale images, or 2048*1536*3 for colour images.
Alternatively, you can use the code below from the TensorFlow/Keras API when creating the Sequential model. Flatten() turns each image into a vector, and the model infers its input size from the first data it receives:
import tensorflow as tf
model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation=tf.nn.relu),   # 1st hidden layer
    tf.keras.layers.Dense(128, activation=tf.nn.relu),   # 2nd hidden layer
    tf.keras.layers.Dense(2, activation=tf.nn.softmax),  # output layer
])
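The feature counts in that answer, and what they imply for the first Dense layer, can be checked with a few lines of arithmetic. A Dense(128) layer after Flatten() has one weight per input feature per unit plus one bias per unit; the 224x224 row is only a comparison point taken from the question. The helper names are hypothetical.

```python
def flattened_features(height, width, channels):
    """Number of values Flatten() produces for one image."""
    return height * width * channels


def dense_params(n_inputs, units):
    """Weights plus biases of a fully connected (Dense) layer."""
    return n_inputs * units + units


for h, w in [(2048, 1536), (224, 224)]:
    for c in (1, 3):  # greyscale, colour
        n = flattened_features(h, w, c)
        print(f"{h}x{w}x{c}: {n} features, Dense(128) params: {dense_params(n, 128)}")
```

For full-size colour images the first hidden layer alone would hold about 1.2 billion parameters, versus about 19 million at 224x224x3, which is one concrete reason to resize first.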

How does TensorFlow train kernels?

TensorFlow's API describes the function tf.nn.conv2d(), which takes a filter argument of shape [filter_height, filter_width, in_channels, out_channels]. So if I used the MNIST dataset and ran the network on an image of the digit "5", would the filter be trained on the lower, circular bowl of the 5, or would it train on multiple parts of the image? What exactly do the filters in conv2d train on, and how?

You should read the basic principles of convolutional layers:
Every filter is small spatially (along width and height), but extends through the full depth of the input volume. For example, a typical filter on a first layer of a ConvNet might have size 5x5x3 (i.e. 5 pixels width and height, and 3 because images have depth 3, the color channels).
During the forward pass, we slide (more precisely, convolve) each filter across the width and height of the input volume and compute dot products between the entries of the filter and the input at any position. As we slide the filter over the width and height of the input volume we will produce a 2-dimensional activation map that gives the responses of that filter at every spatial position.
Intuitively, the network will learn filters that activate when they see some type of visual feature such as an edge of some orientation or a blotch of some color on the first layer, or eventually entire honeycomb or wheel-like patterns on higher layers of the network. Now, we will have an entire set of filters in each CONV layer (e.g. 12 filters), and each of them will produce a separate 2-dimensional activation map. We will stack these activation maps along the depth dimension and produce the output volume.
So, in essence, each [filter_height, filter_width] filter is going to match all patches of the same size in the input and produce a single number for each patch. Some patches can be skipped or added, depending on the stride and padding settings. In the backward pass, the filter will be updated for all of them, i.e., it is trained on the whole input.
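E.g., with stride=1 and padding=2, the input is zero-padded by 2 pixels on every side and the filter moves one pixel at a time, so it visits every patch of the padded input.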
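Both halves of that description, forward and backward, can be traced in one dimension with a framework-free sketch (a deliberate simplification of the 2D case; the function names are hypothetical). The forward pass zero-pads the input, visits every patch allowed by the stride, and takes a dot product with the filter; the filter's gradient adds up a contribution from every patch it visited.

```python
def conv_output_size(n, k, stride=1, padding=0):
    """Output length along one axis: floor((n - k + 2p) / s) + 1."""
    return (n - k + 2 * padding) // stride + 1


def conv1d_forward(x, w, stride=1, padding=0):
    """Zero-pad x, slide w with the given stride; return (outputs, patches)."""
    xp = [0.0] * padding + list(x) + [0.0] * padding
    k = len(w)
    patches = [xp[i:i + k] for i in range(0, len(xp) - k + 1, stride)]
    outputs = [sum(p_i * w_i for p_i, w_i in zip(p, w)) for p in patches]
    return outputs, patches


def conv1d_filter_grad(patches, upstream):
    """dL/dw[j] = sum over ALL patches of patch[j] * dL/dy at that patch."""
    return [sum(p[j] * g for p, g in zip(patches, upstream))
            for j in range(len(patches[0]))]


x = [1.0, 2.0, 3.0, 4.0, 5.0]
w = [1.0, 0.0, -1.0]
y, patches = conv1d_forward(x, w, stride=1, padding=2)
print(len(y), conv_output_size(len(x), len(w), stride=1, padding=2))  # 7 7
print(y)  # [-1.0, -2.0, -2.0, -2.0, -2.0, 4.0, 5.0]
# With dL/dy = 1 at every position, each weight's gradient sums what it saw.
print(conv1d_filter_grad(patches, [1.0] * len(y)))  # [15.0, 15.0, 15.0]
```

With padding=2 and a 3-tap filter, every input value lines up with every filter tap exactly once, so each weight's gradient is 1+2+3+4+5 = 15: the update depends on the whole input, not on one region such as the bowl of a "5".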

Input shape and Conv1d in Keras

The first layer of my neural network is like this:
model.add(Conv1D(filters=40,
kernel_size=25,
input_shape=x_train.shape[1:],
activation='relu',
kernel_regularizer=regularizers.l2(5e-6),
strides=1))
If my input shape is (600, 10),
I get (None, 576, 40) as the output shape.
If my input shape is (6000, 1),
I get (None, 5976, 40) as the output shape.
So my question is: what exactly is happening here? Is the first example simply ignoring 90% of the input?

It is not "ignoring" 90% of the input. The point is simply that if you perform a 1-dimensional convolution (stride 1, no dilation) with a kernel of size K over an input of length X, the result of the convolution has length X - K + 1 (here 600 - 25 + 1 = 576 and 6000 - 25 + 1 = 5976). If you want the output to have the same length as the input, you need to extend, or "pad", your data. There are several strategies for that, such as adding zeros, replicating the values at the ends, or wrapping around. Keras' Convolution1D has a padding parameter that you can set to "valid" (the default, no padding), "same" (add zeros on both sides of the input so the output has the same length as the input) or "causal" (pad with zeros at one end only, an idea taken from WaveNet).
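A framework-free sketch of the three padding modes for the stride-1, dilation-1 case, with hypothetical helper names. The split of zeros for "same" follows TensorFlow's convention of putting any odd extra zero at the end; "causal" puts all K - 1 zeros at the start.

```python
def conv1d_output_length(length, kernel_size, padding="valid"):
    """Output length of a stride-1, dilation-1 1D convolution."""
    if padding == "valid":             # only windows that fit entirely
        return length - kernel_size + 1
    if padding in ("same", "causal"):  # zero-padded to keep the length
        return length
    raise ValueError(f"unknown padding: {padding!r}")


def padding_amounts(kernel_size, padding):
    """(left, right) zeros added for stride 1; any odd extra goes on the right."""
    total = kernel_size - 1
    if padding == "valid":
        return (0, 0)
    if padding == "same":
        return (total // 2, total - total // 2)
    if padding == "causal":
        return (total, 0)  # all on the left: output t never sees inputs after t
    raise ValueError(f"unknown padding: {padding!r}")


for steps in (600, 6000):
    print(steps, {m: conv1d_output_length(steps, 25, m) for m in ("valid", "same", "causal")})
print(padding_amounts(25, "same"), padding_amounts(25, "causal"))  # (12, 12) (24, 0)
```

The "valid" lengths reproduce the 576 and 5976 from the question; switching to padding="same" would keep 600 and 6000.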
Update
About the questions in your comments: you say your input is (600, 10). I assume that is the size of one example, and you have a batch of examples with size (N, 600, 10). From the point of view of the convolution operation, this means you have N examples, each with a length of 600 (this "length" may be time or anything else; it's just the dimension along which the convolution works), and at each of these 600 points you have a vector of size 10. Each of these vectors is considered an atomic sample with 10 features (e.g. price, height, size, whatever), or, as they are sometimes called in the context of convolution, "channels" (from the RGB channels used in 2D image convolution).
The point is, the convolution has a kernel size and a number of output channels, which is the filters parameter in Keras. In your example, the convolution takes every possible slice of 25 contiguous 10-vectors and produces a single 40-vector for each (and it does this for every example in the batch, of course). So you go from 10 features or channels in your input to 40 after the convolution. It's not that it uses only one of the 10 elements in the last dimension; it uses all of them to produce the output.
If the meaning of the dimensions in your input is not what the convolution is interpreting, or if the operation it is performing is not what you were expecting, you may need to either reshape your input or use a different kind of layer.
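A framework-free sketch of what the Update describes: each output position reads a window of consecutive feature vectors across all channels and produces one value per filter. The kernel layout (kernel_size, in_channels, filters) matches how Keras stores Conv1D weights; the tiny all-ones data and the function names are made up so the numbers are easy to check.

```python
def conv1d_multichannel(x, kernel):
    """'valid', stride-1 1D convolution over a sequence of feature vectors.

    x:      L x C nested lists (L steps, C channels/features)
    kernel: K x C x F nested lists (K = kernel_size, F = filters)
    returns (L - K + 1) x F nested lists
    """
    L, C = len(x), len(x[0])
    K, F = len(kernel), len(kernel[0][0])
    return [
        [sum(x[t + k][c] * kernel[k][c][f] for k in range(K) for c in range(C))
         for f in range(F)]
        for t in range(L - K + 1)
    ]


def conv1d_param_count(kernel_size, in_channels, filters):
    return kernel_size * in_channels * filters + filters  # weights + biases


# Toy version of the (600, 10) case: 8 steps, 10 channels, kernel 3, 4 filters.
x = [[1.0] * 10 for _ in range(8)]
kernel = [[[1.0] * 4 for _ in range(10)] for _ in range(3)]
y = conv1d_multichannel(x, kernel)
print(len(y), len(y[0]))  # 6 4
print(y[0][0])            # 30.0: 3 steps x 10 channels, all channels used
print(conv1d_param_count(25, 10, 40), conv1d_param_count(25, 1, 40))  # 10040 1040
```

The (600, 10) model in the question therefore has ten times as many first-layer weights as the (6000, 1) one, because each of its 25-step windows spans all 10 channels.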

what is the difference between Convolutional 1D and 2D in Keras layers

I'm new to the Keras library and I can't tell what the dimension means in Keras layers or how to choose it (model.add(Convolution2D(...)) or model.add(Convolution1D(...))).
For example, I have a set of 9000 training traces and 1000 test traces, and each trace has 1000 samples, so I created the arrays X_train with a size of 9000*1000, X_test with a size of 1000*1000, Y_train with a size of 9000, and Y_test with a size of 1000.
My question is: how do I choose the first layer's dimension?
I tried using the same approach as the MNIST example, like this:
model.add(Convolution2D(9000, (1, 1), activation='relu', input_shape(1,9000000,1),dim_ordering='th'))
but it didn't work, and I don't even know what to put in each argument of the Convolution function.

The choice of dimension (1D, 2D, etc.) depends on the dimensions of your input. For example, the MNIST example uses 2D layers because its input is an image with height and width (two dimensions). Alternatively, if you were using text data, you might use a 1D layer because sentences are linear lists of words (one dimension).
I would suggest looking at Francois Chollet's example of a convolutional neural net with MNIST: https://github.com/fchollet/keras/blob/master/examples/mnist_cnn.py. (Note: Conv2D is the same as Convolution2D.)
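Applying that rule to the data in the question, one reasonable reading is that each trace is a 1D sequence of 1000 samples, so a 1D convolution fits. Keras Conv1D expects each example as (steps, channels), i.e. input_shape=(1000, 1), and the 9000 training examples never appear in input_shape because Keras adds the batch axis itself; a NumPy array of shape (9000, 1000) would be reshaped with `X_train.reshape(-1, 1000, 1)`. The helper below is a hypothetical, framework-free sketch of that bookkeeping, assuming one channel when the sample has no channel axis.

```python
def conv_layer_for(sample_shape, has_channel_axis=False):
    """Pick a Keras conv layer name and channels_last input_shape for ONE example.

    sample_shape:     shape of a single example without the batch axis, e.g. (1000,)
    has_channel_axis: True if the last axis of sample_shape is already channels;
                      otherwise a single channel is appended.
    """
    if has_channel_axis:
        spatial, channels = tuple(sample_shape[:-1]), sample_shape[-1]
    else:
        spatial, channels = tuple(sample_shape), 1
    layer = {1: "Conv1D", 2: "Conv2D", 3: "Conv3D"}.get(len(spatial))
    if layer is None:
        raise ValueError(f"no conv layer for {len(spatial)} spatial dimensions")
    return layer, spatial + (channels,)


print(conv_layer_for((1000,)))                              # ('Conv1D', (1000, 1)): one trace
print(conv_layer_for((28, 28)))                             # ('Conv2D', (28, 28, 1)): an MNIST digit
print(conv_layer_for((128, 128, 3), has_channel_axis=True))  # ('Conv2D', (128, 128, 3))
```

Compared with the attempted `input_shape(1,9000000,1)`, the key differences are that input_shape describes a single trace rather than the whole dataset, and that the channel axis comes last.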
