I came across the following statement:
convnet = input_data(shape=[None,img_size,img_size,1], name='input')
I tried looking for a description, but couldn't find a clear explanation.
My main question is: what does the input_data function actually do? Is it like a placeholder for our input data?
Regarding the shape, what is None at the beginning, and 1 at the end?
Thanks.
input_data is the layer that will be used as the input layer of your network. Before adding any of the usual layers to your sequential model, you need to specify what your input looks like. For example, in the MNIST dataset each sample is a 784-element array representing a 28x28 image.
In your example the network wants an input with shape [None, img_size, img_size, 1], meaning in human language:
None - however many images, i.e. the number of samples
img_size x img_size - the dimensions of each image
1 - with one color channel
If the MNIST dataset were in full RGB color, the input data would have shape [None, 28, 28, 3].
You can usually think of the None as the batch_size.
To be even more explicit: with a batch_size of 1, our RGB MNIST example would need three 28x28 matrices as input, one for the R pixels, one for the G pixels and one for the B pixels of the image. That is just one entry. In this case the None value would be 1, but usually it is whatever you decide the batch_size to be. You get the picture from here.
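A minimal sketch of how that snippet is typically used (this assumes TFLearn, which your snippet appears to come from; the layers after input_data are just illustrative):

import tflearn
from tflearn.layers.core import input_data, fully_connected
from tflearn.layers.conv import conv_2d, max_pool_2d
from tflearn.layers.estimator import regression

img_size = 28  # e.g. MNIST-sized grayscale images

net = input_data(shape=[None, img_size, img_size, 1], name='input')  # None = batch size, 1 = channels
net = conv_2d(net, 32, 3, activation='relu')
net = max_pool_2d(net, 2)
net = fully_connected(net, 10, activation='softmax')
net = regression(net, optimizer='adam', loss='categorical_crossentropy', name='targets')

model = tflearn.DNN(net)
# model.fit({'input': X}, {'targets': Y}, n_epoch=1)  # X would have shape (N, 28, 28, 1)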
Hope it clears things out.
Cheers,
Gabriel
De Santa's answer is right: input_data is a placeholder for the input features. The shape you mention holds first None (always), then the image width and height (the image seems to be square, since width = height) and the number of channels (1 in this case; with RGB you would have 3 channels). This way the net knows the dimensions of the input features.
Related
I'm following this tutorial on towardsdatascience.com because I wanted to try the MNIST dataset using PyTorch, since I've already done it using Keras.
So in Step 2, getting to know the dataset better, they print the trainloader's shape and it returns torch.Size([64, 1, 28, 28]). I understand that 64 is the number of images in each batch from that loader and that each one is a 28x28 image, but what does the 1 mean exactly?
It simply means that an image of size 28x28 has 1 channel, i.e. it's a grayscale image. If it were a color image, there would be a 3 instead of the 1, since a color image has 3 channels (RGB).
It's the number of channels in the input. In the MNIST dataset the images are grayscale, so each image has shape [1, 28, 28]. Notice that PyTorch puts the channel dimension before the spatial dimensions.
Once the images are loaded in batches, the total input shape is the one you are getting: [64, 1, 28, 28].
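A small sketch of where that shape comes from (this assumes torchvision's built-in MNIST loader, downloaded to a local "data" folder, which is roughly what such tutorials do):

import torch
from torchvision import datasets, transforms

train_set = datasets.MNIST('data', train=True, download=True, transform=transforms.ToTensor())
trainloader = torch.utils.data.DataLoader(train_set, batch_size=64, shuffle=True)

images, labels = next(iter(trainloader))
print(images.shape)  # torch.Size([64, 1, 28, 28]) -> (batch, channels, height, width)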
Refer to the MNIST dataset page, which states:
The original black and white (bilevel) images from NIST were size
normalized to fit in a 20x20 pixel box while preserving their aspect
ratio. The resulting images contain grey levels as a result of the
anti-aliasing technique used by the normalization algorithm. The
images were centered in a 28x28 image by computing the center of mass
of the pixels, and translating the image so as to position this point
at the center of the 28x28 field.
In short, it's just the number of channels your 28x28 image has.
The 64 is the batch size, i.e. the number of images the loader groups together in each batch. Think of it as groups: here each batch holds 64 images, but you could change that and, say, use batches of 32 images instead. The batch size usually influences the computational cost of training the model.
And, of course, depending on the library used (especially in the training/testing loop), the code looks slightly different if you use just 1 batch or X batches.
For example (with the number of epochs/iterations = 50): if the whole dataset fit into a single batch, the training loop would just train the model once per epoch. With a batch size of x, however, you have to loop over each epoch as well as over each batch/group, as in the sketch below.
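A rough sketch of that nested loop (the tiny model here is just a stand-in, not a recommended architecture; trainloader is the DataLoader from the snippet above, yielding batches of shape [64, 1, 28, 28]):

import torch
import torch.nn as nn

model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

n_epochs = 50
for epoch in range(n_epochs):           # outer loop: epochs
    for images, labels in trainloader:  # inner loop: one batch of 64 images at a time
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()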
I find it difficult to understand a notion about tensors.
For VGG (https://www.tensorflow.org/api_docs/python/tf/keras/applications/VGG16), we start from a batch of colour images of shape (None, 224, 224, 3) and apply 64 2D convolutional filters.
At the output we obtain a tensor of shape (None, 224, 224, 64); we can see this in the model summary.
However, a filter must process all 3 colour channels, and my intuition tells me I should get an output tensor of shape (None, 224, 224, 3, 64).
Could someone explain to me why my reasoning is wrong?
Thank you for your explanations.
Each filter has shape
(kernel_height, kernel_width, input_channels)
so a single filter spans all 3 input channels and, when slid over your input with 'SAME' padding, produces an output of shape
(input_height, input_width)
Stacking the outputs of all the filters gives
(input_height, input_width, n_filters)
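A quick way to check this, assuming tf.keras is available (weights=None just avoids downloading the pretrained weights):

import tensorflow as tf

vgg = tf.keras.applications.VGG16(weights=None)
kernels, biases = vgg.get_layer('block1_conv1').get_weights()
print(kernels.shape)                              # (3, 3, 3, 64): kernel_h, kernel_w, input_channels, n_filters
print(vgg.get_layer('block1_conv1').output.shape) # (None, 224, 224, 64)

Each of the 64 filters sums over all 3 input channels, so the channel dimension is consumed rather than kept, which is why the 3 does not show up in the output shape.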
I trained a CNN model with an input shape of (5x128x128x3),
and I got trained weights for that (5x128x128x3) input.
Now I want to use these weights to train on input data of size (7x128x128x3).
So, this is my question:
do I have to use the same input shape?
I wonder if I can use another size (in this case, 7x128x128x3) of input for transfer learning. I get this error:
ValueError: Error when checking input: expected input_1 to have shape (5, 128, 128, 3) but got array with shape (7, 128, 128, 3)
Let's break down the dimensions (5x128x128x3):
The first dimension is the batch size (which was 5 when the original model was trained). This is irrelevant, and you can set it to None, as pointed out in the comments, to feed arbitrarily sized batches to the model.
The second and third dimensions (128x128) are the width and height of the image, and you may be able to change these, but it's hard to say for sure without knowing the model architecture and which layer output you're using for transfer learning. The reason you can change them is that 2D convolutional filters are repeated across the two spatial dimensions (width and height) of the image, so they remain valid for different widths and heights (assuming compatible padding). But if you change the spatial dimensions too much, the receptive fields of the layers may change in a way that hurts transfer-learning performance. E.g. if the 7th conv layer in the network for 128x128 input can see the entire input image in each activation (a receptive field of 128x128), then after doubling the width and height it no longer can, and the layer may not recognize certain global features.
The fourth dimension is the number of channels in the input images and you can't change this, as the filters in the first layer will have 3 weights across the depth dimension.
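To illustrate just the batch-size point with a made-up toy model (not your architecture): define the input without a fixed batch dimension and any batch size is accepted.

import tensorflow as tf

inputs = tf.keras.Input(shape=(128, 128, 3))        # batch dimension is implicitly None
x = tf.keras.layers.Conv2D(16, 3, padding='same', activation='relu')(inputs)
x = tf.keras.layers.GlobalAveragePooling2D()(x)
outputs = tf.keras.layers.Dense(10, activation='softmax')(x)
model = tf.keras.Model(inputs, outputs)

print(model.input_shape)                            # (None, 128, 128, 3)
model.predict(tf.zeros((7, 128, 128, 3)))           # works: 7 is just the batch size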
I have an image dataset in which each image has dimensions (2048, 1536). In ImageDataGenerator, to fetch data from the directory, I used the same target size, i.e. (2048, 1536). But when building the first layer of the Sequential model, what input shape should I use? Does it have to be (2048, 1536), or can I pick an arbitrary shape like (224, 224)?
You should probably flatten your input data into a vector of size 3145728 (2048 * 1536). If your data is in a NumPy array you can use the array's .flatten() method (or np.ravel()).
Then your first layer can have the same shape as this vector.
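For example, a tiny sketch of the idea with a dummy grayscale image standing in for one of yours:

import numpy as np

img = np.zeros((2048, 1536))   # one grayscale image
flat = img.flatten()           # shape (3145728,) == 2048 * 1536
print(flat.shape)              # a first Dense layer would then use input_shape=(3145728,)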
I would first resize the images with cv2.resize(). Do you really need all the information from such a big image?
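A hedged example of that suggestion (the file name is hypothetical; note that cv2.resize takes the target size as (width, height)):

import cv2

img = cv2.imread('example.jpg')      # color image loaded as (height, width, 3)
small = cv2.resize(img, (224, 224))  # target size given as (width, height)
print(small.shape)                   # (224, 224, 3)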
For a Sequential model it would look like, for example:
from tensorflow.keras import models, layers

model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(height, width, ndim)))
...,
where height and width denote your input image dimensions, and ndim = 1 for grayscale or ndim = 3 for color images.
The size of the first (i.e. input) layer is supposed to be the number of features in your dataset. For images, each pixel is considered a feature. Hence in your case, where the image dimension is (2048, 1536), you need to flatten it to get the total number of pixels (i.e. features). For a grayscale image that is 2048*1536*1, and for a colour image it is 2048*1536*3.
Alternatively, you can use the code below from the TensorFlow/Keras API when creating the Sequential model, and the Flatten layer will take care of your input layer size:
tf.keras.models.Sequential([tf.keras.layers.Flatten(),
                            tf.keras.layers.Dense(128, activation=tf.nn.relu),   # 1st hidden layer
                            tf.keras.layers.Dense(128, activation=tf.nn.relu),   # 2nd hidden layer
                            tf.keras.layers.Dense(2, activation=tf.nn.softmax)]) # output layer
So far, I've been practicing neural networks on numerical datasets in pandas, but now I need to create a model that will take an image as input and output a binary mask of that image.
I have my training data as numpy arrays of shape (602, 2048, 2048, 1): 602 images of dimensions 2048x2048 with one channel. The array of output masks has the same dimensions.
What I can't figure out is how to define the first layer or how to correctly feed the data into the model. I would greatly appreciate your help on this issue
Well, this is not a "rule", but probably you will be using mostly 2D conv and related layers.
You feed everything as numpy arrays, as usual, maybe normalizing the values. Common options are:
Between 0 and 1 (just divide by 255.)
Between -1 and 1 (divide by 255., multiply by 2, subtract 1)
Caffe style: subtract from each channel a specific value to "center" the values based on their usual mean without rescaling them.
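For instance, a tiny sketch of the first two options, with a small dummy array standing in for your (602, 2048, 2048, 1) data:

import numpy as np

images = np.random.randint(0, 256, size=(4, 64, 64, 1), dtype=np.uint8)  # dummy stand-in

scaled_01 = images.astype('float32') / 255.            # option 1: values in [0, 1]
scaled_11 = images.astype('float32') / 255. * 2. - 1.  # option 2: values in [-1, 1]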
Your model should start with something like:
inputTensor = Input((2048,2048,1))
output = Conv2D(filters, kernel_size, .....)(inputTensor)
Or, in Sequential models: model.add(Conv2D(...., input_shape=(2048,2048,1)))
Later, it's up to you to decide which layers to use.
Conv2D
MaxPooling2D
UpSampling2D
Whether you're going to create a linear model or if you're going to divide branches, join branches, etc. is also your call.
Models in a U-Net style should be a good start for you.
What you can't do:
Don't use Flatten layers (actually you can, if you later reshape the output for having image dimensions... but why?)
Don't use Global Pooling layers (you don't want to sacrifice your spatial dimensions)
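As a very rough sketch of the U-Net-style direction mentioned above (a deliberately tiny encoder-decoder without skip connections, just to show how Conv2D, MaxPooling2D and UpSampling2D fit together for a binary mask):

from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D, UpSampling2D
from tensorflow.keras.models import Model

inp = Input((2048, 2048, 1))
x = Conv2D(16, 3, padding='same', activation='relu')(inp)
x = MaxPooling2D(2)(x)                        # downsample to 1024x1024
x = Conv2D(32, 3, padding='same', activation='relu')(x)
x = UpSampling2D(2)(x)                        # back to 2048x2048
out = Conv2D(1, 1, activation='sigmoid')(x)   # one channel: per-pixel mask probability

model = Model(inp, out)
model.compile(optimizer='adam', loss='binary_crossentropy')
# model.fit(images, masks, batch_size=2)  # both arrays shaped (N, 2048, 2048, 1)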