I am learning about deep learning and TensorFlow on this website: https://pythonprogramming.net/loading-custom-data-deep-learning-python-tensorflow-keras/
I want to know: is there any rule of thumb for reshaping the arrays of the datasets?
X = np.array(X).reshape(-1, IMG_SIZE, IMG_SIZE, 1)
For instance, from the code snippet above, I don't understand the meaning of the numbers passed to reshape(). How do I know the suitable numbers to pass to the function?
The 4 dimensions in
X = np.array(X).reshape(-1, IMG_SIZE, IMG_SIZE, 1)
are the following:
dimension 0: images in the batch
dimension 1: rows
dimension 2: columns
dimension 3: number of channels in image
Since this is grayscale, the image has 1 channel, so the last number is 1 (if it was RGB image, the last number would have been 3, representing the 3 channels in RGB).
Dimensions 1 and 2 are the numbers of rows and columns in every image, which in your case is just IMG_SIZE. Dimension 0 is the batch size. You can specify the number there if you know the batch size, or you can leave it as -1 and it will be determined uniquely, because all the other sizes (dimensions 1, 2, 3) are provided.
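As a concrete illustration, here is a minimal sketch with a hypothetical IMG_SIZE of 50 and 10 random grayscale images (not code from the tutorial):

import numpy as np

IMG_SIZE = 50  # hypothetical size for illustration
X = [np.random.rand(IMG_SIZE, IMG_SIZE) for _ in range(10)]  # 10 grayscale images

# -1 lets NumPy infer the batch dimension (10 here) from the total number
# of elements; the trailing 1 is the single grayscale channel.
X = np.array(X).reshape(-1, IMG_SIZE, IMG_SIZE, 1)
print(X.shape)  # (10, 50, 50, 1)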
I'm following this tutorial on towardsdatascience.com because I wanted to try the MNIST dataset using Pytorch since I've already done it using keras.
So in Step 2, knowing the dataset better, they print the trainloader's shape and it returns torch.Size([64, 1, 28, 28]). I understand that 64 is the number of images in that loader and that each one is a 28x28 image but what does the 1 mean exactly?
It simply means that an image of size 28x28 has 1 channel, i.e. it is a grayscale image. If it were a colored image, then instead of 1 there would be 3, since a colored image has 3 channels (RGB).
It's the number of channels in the input. In the MNIST dataset the images are grayscale, so the shape of a single image is [1, 28, 28]. Notice that PyTorch puts the channel dimension first (channels-first layout).
Of course, once loaded in batches, the total input shape is the one you are getting.
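To make the channels-first layout concrete, here is a small sketch (assuming PyTorch is installed; the tensor is random, not real MNIST data):

import torch

# A batch in PyTorch's NCHW layout: 64 images, 1 channel, 28x28 pixels.
batch = torch.randn(64, 1, 28, 28)
print(batch.shape)     # torch.Size([64, 1, 28, 28])
print(batch[0].shape)  # torch.Size([1, 28, 28]) - a single grayscale image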
Refer to the MNIST dataset link, where it states:
The original black and white (bilevel) images from NIST were size normalized to fit in a 20x20 pixel box while preserving their aspect ratio. The resulting images contain grey levels as a result of the anti-aliasing technique used by the normalization algorithm. The images were centered in a 28x28 image by computing the center of mass of the pixels, and translating the image so as to position this point at the center of the 28x28 field.
In short, it's just the number of channels your 28x28 image has.
This is the batch size, i.e. the number of images grouped together in one batch. Think of it as groups: here each batch holds 64 images, but you could change that and, say, use batches of 32 images instead. The batch size usually influences the computational cost of training the model.
And, of course, depending on the library used (especially in the training/testing loop), the code looks slightly different if you use just 1 batch versus X batches.
For example (with 50 epochs/iterations): if the whole dataset fits into a single batch, the training loop just trains the model once per epoch. With multiple batches, you have to loop over each batch within each epoch, as sketched below.
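A minimal sketch of that epoch/batch nesting (it assumes a trainloader that yields (images, labels) batches and an already defined model, criterion and optimizer; these names are placeholders, not code from the tutorial):

for epoch in range(50):                    # number of epochs
    for images, labels in trainloader:     # one iteration per batch of 64
        optimizer.zero_grad()              # reset gradients from the last step
        output = model(images)             # forward pass on the whole batch
        loss = criterion(output, labels)   # compute the batch loss
        loss.backward()                    # backpropagate
        optimizer.step()                   # update the weights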
So, at one point in my architecture I must replace the first 27 channels of a tensor x (with shape (1,32,32,155)) with the 27 channels of a tensor y (with shape (1,32,32,27)).
x = tf.unstack(x, axis=3)
x[0:27] = tf.unstack(y, axis=3)
x = tf.stack(x, axis=3)
This is what I have settled on. It seemingly works fine for a single image, but as soon as I train my network on a batch of images (in this case a batch of 5), it fails at this point with the error message
Shapes of all inputs must match: values[0].shape = [1,32,32] != values[27].shape = [5,32,32] [Op:Pack] name: stack
When I print x.shape during a batch training pass, it gives me a shape of (1,32,32,155) (printed once for each item in the batch); but during the unstacking of x, I noticed that the shape of an unstacked element is (5,32,32), as if each of the 155 channels had a batch size of 5.
I am not sure what I am doing wrong. Maybe, this is not even the correct way to replace elements.
Any help is appreciated,
Thanks.
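One way to sidestep the per-channel unstacking entirely, shown here only as a hedged sketch (it assumes x and y really do share the same batch dimension), is to slice and concatenate along the channel axis:

import tensorflow as tf

# Hypothetical batch of 5 for illustration.
x = tf.random.normal([5, 32, 32, 155])
y = tf.random.normal([5, 32, 32, 27])

# Replace the first 27 channels of x with y: keep channels 27..154 of x
# and put y in front of them along the last (channel) axis.
x = tf.concat([y, x[..., 27:]], axis=-1)
print(x.shape)  # (5, 32, 32, 155)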
In my problem, I want to convolve two tensors in my neural network model.
The shape of two tensors is [None, 2, 1], [None, 3, 1] respectively. The axis with dimension None means the batch size of the input tensor. For each sample in batch, I want to convolve the two tensors with shape [2, 1] and [3, 1].
However, tf.nn.conv1d in TensorFlow can only convolve the input with a fixed kernel. Is there any function that supports convolving two tensors along the batch axis, similar to tf.multiply, which multiplies two tensors sample by sample (elementwise)?
The code I ran can be simplified as follows:
input_signal = Input(shape=(L, M), name='input_signal')
input_h = Input(shape=(N,), name='input_h')
faded = Lambda(lambda x: tf.nn.conv1d(input_signal, x))(input_h)
What I want to do is convolve each sample of input_signal with the sample of input_h that has the same index. The code above only illustrates the idea and does not actually run. My question is how to modify it so that one input tensor can be convolved with another input tensor for every sample in the batch.
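As a hedged sketch of the per-sample idea, using toy random tensors and a map over the batch with tf.map_fn (not a drop-in replacement for the Lambda above):

import tensorflow as tf

signals = tf.random.normal([4, 2, 1])   # [batch, length, channels]
kernels = tf.random.normal([4, 3, 1])   # [batch, kernel_width, channels]

def conv_one(pair):
    s, k = pair
    s = tf.reshape(s, [1, -1, 1])        # conv1d expects [batch, width, in_channels]
    k = tf.reshape(k, [-1, 1, 1])        # filters are [filter_width, in_ch, out_ch]
    return tf.nn.conv1d(s, k, stride=1, padding='SAME')[0]

# Apply the per-sample convolution to each (signal, kernel) pair in the batch.
faded = tf.map_fn(conv_one, (signals, kernels), fn_output_signature=tf.float32)
print(faded.shape)  # (4, 2, 1)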
According to the documentation of the kernel_size argument for the Conv1D layer (or any other convolutional layer), you cannot add multiple filters with different kernel sizes or strides within a single layer.
Also, convolutions with kernels of different sizes will produce outputs of different height and width.
The general formula for the output size, assuming a symmetric kernel, is
(X − K + 2P)/S + 1
where X is the input height/width, K is the kernel size, P is the zero-padding, and S is the stride length.
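For example, with X = 28, K = 3, P = 0 and S = 1, the output size is (28 − 3 + 0)/1 + 1 = 26.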
So, assuming you keep the zero-padding and stride the same, you cannot have multiple kernels with different sizes in a single Conv1D layer.
You can, however, use the tf.keras.Model functional API to apply Conv1D multiple times to the same input (or, in your case, multiple Conv1D layers to different inputs with their respective kernel sizes), and then either max-pool, crop, or zero-pad to match the dimensions of the different outputs before stacking them.
Example:
inputs = tf.keras.Input(shape=(n_timesteps, n_features))
x1 = tf.keras.layers.Conv1D(filters=32, kernel_size=2)(inputs)
x2 = tf.keras.layers.Conv1D(filters=16, kernel_size=3)(inputs)
# match dimensions (height and width) of x1 and x2 here
x3 = tf.keras.layers.Concatenate(axis=-1)([x1, x2])
You can use ZeroPadding1D, Cropping1D, or MaxPooling1D to match the dimensions.
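For instance, one hedged way to fill in the dimension-matching step above (assuming the default padding='valid', so x1 is exactly one timestep longer than x2):

# Trim one timestep off the end of x1 so it matches x2's length,
# then concatenate along the channel axis.
x1 = tf.keras.layers.Cropping1D(cropping=(0, 1))(x1)
x3 = tf.keras.layers.Concatenate(axis=-1)([x1, x2])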
I came across the following statement:
convnet = input_data(shape=[None,img_size,img_size,1], name='input')
I tried looking for a description, but couldn't find a clear explanation.
My main question here is: what does the function input_data mainly do? Is it like a placeholder for our input data?
Regarding the shape, what is None at the beginning, and 1 at the end?
Thanks.
The input_data layer is used as the input layer to your network. Before adding any of the usual layers to your sequential model, you need to specify what your input looks like, as for example in the MNIST dataset, where you have 784-element arrays representing 28x28 images.
In your example the network wants an input with the shape (None, img_size, img_size, 1), meaning in human language:
None - how many images (the batch size)
img_size x img_size - the dimensions of the image
1 - one color channel
If the MNIST dataset were in full RGB color, the input data would have shape (None, 28, 28, 3).
You can usually think of None as the batch_size.
To be even more explicit: if you had a batch_size of 1, then as input in our MNIST RGB example you would need three 28x28 matrices, one for the R pixels, one for the G pixels, and one for the B pixels of the image. That is just one entry. In this case the None value would be 1, but usually it is whatever you decide the batch_size is. You get the picture from here.
Hope it clears things out.
Cheers,
Gabriel
De Santa's answer is right: input_data is a placeholder for the input features. The array you mention holds first None (always), then the image width and height (the image appears to be square, since width = height), and the number of channels (1 in this case; for an RGB image you would get 3 channels). This way the net knows the dimensions of the input features.
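As a rough sketch of where input_data sits in a TFLearn model (the layer stack below is hypothetical, chosen only to show the input layer feeding the rest of the network):

import tflearn
from tflearn.layers.core import input_data, fully_connected
from tflearn.layers.conv import conv_2d, max_pool_2d

img_size = 28  # hypothetical image size

# Placeholder for the input batch: any batch size (None),
# img_size x img_size pixels, 1 grayscale channel.
convnet = input_data(shape=[None, img_size, img_size, 1], name='input')
convnet = conv_2d(convnet, 32, 5, activation='relu')
convnet = max_pool_2d(convnet, 2)
convnet = fully_connected(convnet, 10, activation='softmax')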
Reading the Tensorflow MNIST tutorial, I stumbled over the line
x_image = tf.reshape(x, [-1,28,28,1])
28, 28 comes from width, height, 1 comes from the number of channels. But why -1?
I guess this is related to mini-batch training, but I wondered why -1 and not 1 (which seems to give the same result in numpy).
(Probably related: Why does numpy's reshape give the same result for -1, -2 and 1?)
-1 means that the length in that dimension is inferred. This is done based on the constraint that the number of elements in an ndarray or Tensor when reshaped must remain the same. In the tutorial, each image is a row vector (784 elements) and there are lots of such rows (let it be n, so there are 784n elements). So, when you write
x_image = tf.reshape(x, [-1, 28, 28, 1])
TensorFlow can infer that -1 is n.
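A small sketch of that inference, using 5 hypothetical flattened images instead of the tutorial's placeholder:

import tensorflow as tf

x = tf.random.normal([5, 784])           # 5 flattened 28x28 images

# -1 is inferred as 5, because the total number of elements (5*784)
# must equal (-1)*28*28*1.
x_image = tf.reshape(x, [-1, 28, 28, 1])
print(x_image.shape)                     # (5, 28, 28, 1)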
In the MNIST tutorial that you are reading, the desired shape for your input layer is [batch_size, 28, 28, 1]:
x_image = tf.reshape(x, [-1,28,28,1])
Here, -1 for input x specifies that this dimension should be computed dynamically based on the number of input values in x, holding the sizes of all other dimensions constant. This allows us to treat batch_size (the dimension given the value -1) as a hyperparameter that we can tune.
-1 indicates that the length along that axis should be deduced automatically, following the rule that the total number of elements of the tensor remains unchanged.