I'm currently working on a simple neural network using Keras, and I'm running into a problem with my labels. The network is making a binary choice, so my labels are all 1s and 0s. My data is a 3D NumPy array, basically pixel data from a bunch of images; its shape is (560, 560, 32086). Since the first two dimensions are only pixels, I shouldn't assign a label to each one, so I tried to make the label array with the shape (1, 1, 32086) so that each image only has one label. However, when I try to train it with the following code:
model = Sequential(
    [
        Rescaling(1.0 / 255),
        Conv1D(32, 3, input_shape=datax.shape, activation="relu"),
        Dense(750, activation='relu'),
        Dense(2, activation='sigmoid')
    ]
)
model.compile(optimizer=SGD(learning_rate=0.1), loss="binary_crossentropy", metrics=['accuracy'])
model1 = model.fit(x=datax, y=datay, batch_size=1, epochs=15, shuffle=True, verbose=2)
I get this error "ValueError: Data cardinality is ambiguous:
x sizes: 560
y sizes: 1
Make sure all arrays contain the same number of samples." I assume this means the labels have to be the same size as the input data, but it doesn't make sense for each pixel to have an individual label.
The data is collected through a for loop that iterates over the files in a directory and reads their pixel data. I then add each image's pixels to the NumPy array and its corresponding label to a label array. Any help with this problem would be greatly appreciated.
The ValueError occurs because the first dimension is supposed to be the number of samples and needs to be the same for x and y. In your example that is not the case. You would need datax to have shape (32086, 560, 560) and datay to have shape (32086,).
Have a look at this example and note how the 60000 training images have shape (60000, 28, 28).
Also I suspect a couple more mistakes have sneaked into your code:
Are you sure you want a Conv1D layer and not Conv2D? Maybe this example would be informative.
Since you are using binary crossentropy loss your last layer should only have one output instead of two.
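Putting those points together, here is a minimal sketch of what the corrected setup could look like; the global-pooling step and layer sizes are my own illustrative choices, not part of the original question:
import numpy as np
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Rescaling, Conv2D, GlobalAveragePooling2D, Dense
from tensorflow.keras.optimizers import SGD
# Assumed starting shapes from the question: datax (560, 560, 32086), datay (1, 1, 32086).
datax = np.moveaxis(datax, -1, 0)[..., np.newaxis]  # -> (32086, 560, 560, 1): samples first, channel last
datay = datay.reshape(-1)                           # -> (32086,): one label per image
model = Sequential([
    Rescaling(1.0 / 255, input_shape=(560, 560, 1)),
    Conv2D(32, 3, activation="relu"),
    GlobalAveragePooling2D(),                       # illustrative: collapse spatial dims before Dense
    Dense(750, activation="relu"),
    Dense(1, activation="sigmoid"),                 # one output for binary crossentropy
])
model.compile(optimizer=SGD(learning_rate=0.1),
              loss="binary_crossentropy", metrics=["accuracy"])
model.fit(x=datax, y=datay, batch_size=32, epochs=15, shuffle=True, verbose=2)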
I have this data:
X_regression = tf.range(0, 1000, 5)
y_regression = X_regression + 100
X_reg_train, X_reg_test = X_regression[:150], X_regression[150:]
y_reg_train, y_reg_test = y_regression[:150], y_regression[150:]
I inspect the input data:
X_reg_train[0], X_reg_train[0].shape, X_reg_train[0].ndim
and it returns:
(<tf.Tensor: shape=(), dtype=int32, numpy=0>, TensorShape([]), 0)
I build a model:
# Set the random seed
tf.random.set_seed(42)
# Create the model
model_reg = tf.keras.models.Sequential()
# Add Input layer
model_reg.add(tf.keras.layers.InputLayer(input_shape=[1]))
# Add Hidden layers
model_reg.add(tf.keras.layers.Dense(units=10, activation=tf.keras.activations.relu))
# Add last layer
model_reg.add(tf.keras.layers.Dense(units=1))
# Compile the model
model_reg.compile(optimizer=tf.keras.optimizers.Adam(),
                  loss=tf.keras.losses.mae,
                  metrics=[tf.keras.metrics.mae])
# Fit the model
model_reg.fit(X_reg_train, y_reg_train, epochs=10)
The model works.
However, I am confused about input_shape
Why is it [1] in this situation? Why is it sometimes a tuple?
Would appreciate an explanation of different formats of input_shape in different situations.
InputLayer is actually just the same as specifying the parameter input_shape in a Dense layer. Keras actually uses an InputLayer behind the scenes when you use method 2.
# Method 1
model_reg.add(tf.keras.layers.InputLayer(input_shape=(1,)))
model_reg.add(tf.keras.layers.Dense(units=10, activation=tf.keras.activations.relu))
# Method 2
model_reg.add(tf.keras.layers.Dense(units=10, input_shape=(1,), activation=tf.keras.activations.relu))
The parameter input_shape is actually supposed to be a tuple. If you noticed, I set the input_shape in your example to (1,), which is a tuple with a single element. As your data is 1D, you pass in a single element at a time, so the input shape is (1,).
If your input data were 2D, for example when trying to predict the price of a house based on multiple variables, you would have multiple rows and multiple columns of data. In this case, you pass in the last dimension of X_reg_train as the input shape, which is the number of inputs. If X_reg_train were (1000, 10), then we would use an input_shape of (10,).
model_reg.add(tf.keras.layers.Dense(units=10, input_shape=(X_reg_train.shape[1],), activation=tf.keras.activations.relu))
Ignoring the batch_size for a moment, with this we are actually just sending a single row of the data to predict a single house price. The batch_size is just there to chunk multiple rows of data together so that we do not have to load the entire dataset into memory, which is computationally expensive; instead we send small chunks, with the default value being 32. When running the training you will have noticed that under each epoch it says 5/5, which refers to the 5 batches of data you have: the training size is 150 and 150 / 32 = 5 (rounded up).
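As a quick sanity check of that arithmetic:
import math
# 150 training rows with the default batch_size of 32 -> 5 updates per epoch.
print(math.ceil(150 / 32))  # 5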
For 3D input the Dense layer effectively acts on the last axis only, i.e. the input is internally treated as (batch_size * sequence_length, dim), multiplied by the weight matrix, and reshaped back to (batch_size, sequence_length, hidden_units), which is the same as using a Conv1D layer with a kernel size of 1. So I wouldn't even use the Dense layer in this case.
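A small sketch (the shapes here are arbitrary) showing that a Dense layer on a 3D input gives the same output shape as a Conv1D with kernel size 1:
import tensorflow as tf
x = tf.random.normal((4, 7, 16))                 # (batch_size, sequence_length, dim)
dense = tf.keras.layers.Dense(32)
conv1 = tf.keras.layers.Conv1D(32, kernel_size=1)
print(dense(x).shape)   # (4, 7, 32) -- Dense maps only the last axis
print(conv1(x).shape)   # (4, 7, 32) -- the same mapping written as a 1x1 convolution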
In Keras, the input layer itself is not a layer, but a tensor. It's the starting tensor you send to the first hidden layer. This tensor must have the same shape as your training data.
Example: if you have 30 images of 50x50 pixels in RGB (3 channels), the shape of your input data is (30, 50, 50, 3). Then your input layer tensor must have this shape (see details in the "shapes in keras" section).
Each type of layer requires the input with a certain number of dimensions:
Dense layers require inputs as (batch_size, input_size) or (batch_size, optional,...,optional, input_size) or in your case just (input_size)
2D convolutional layers need inputs as:
if using channels_last: (batch_size, imageside1, imageside2, channels)
if using channels_first: (batch_size, channels, imageside1, imageside2)
1D convolutions and recurrent layers use (batch_size, sequence_length, features)
Here are some helpful links: Keras input explanation: input_shape, units, batch_size, dim, etc, and https://keras.io/api/layers/core_layers/input/
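A rough sketch of those shape conventions (the layer sizes and shapes below are arbitrary examples):
import tensorflow as tf
from tensorflow.keras import layers
# Dense: (batch_size, input_size); input_shape always omits the batch dimension.
dense_model = tf.keras.Sequential([layers.Dense(10, input_shape=(8,))])
# Conv2D with channels_last: (batch_size, height, width, channels).
conv2d_model = tf.keras.Sequential([layers.Conv2D(16, (3, 3), input_shape=(50, 50, 3))])
# Conv1D / recurrent layers: (batch_size, sequence_length, features).
conv1d_model = tf.keras.Sequential([layers.Conv1D(16, 3, input_shape=(100, 4))])
for m in (dense_model, conv2d_model, conv1d_model):
    print(m.output_shape)   # (None, 10), (None, 48, 48, 16), (None, 98, 16)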
I have an image dataset where each image has dimensions (2048, 1536). In ImageDataGenerator, to fetch data from the directory, I have used the same target size, i.e. (2048, 1536), but when building the first layer of the Sequential model, what input shape should I use? Will it be the same as (2048, 1536), or can I take some other shape like (224, 224)?
You should probably flatten your input data by making a vector of size 3145728 (2048 * 1536). If your data is in a NumPy array you can use its flatten() method (or np.ravel).
Then your first layer can have the same shape as this vector.
I would first resize the images with cv2.resize(). Do you really need all the information from such a big image?
For a Sequential model it would look, for example, like this:
model = models.Sequential()
model.add(layers.Conv2D(32,(3,3), activation='relu', input_shape = (height,width, ndim)))
...,
where height and width denote your input image dimensions and ndim = 1 for greyscale and ndim = 3 for colored images.
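For instance, here is a minimal sketch of that resize-then-train idea; the (224, 224) target size and the 'images/*.png' glob are assumptions, not requirements:
from glob import glob
import cv2
import numpy as np
from tensorflow.keras import layers, models
height, width, ndim = 224, 224, 3
image_paths = glob('images/*.png')   # hypothetical directory of colour images
# cv2.resize expects (width, height); cv2.imread loads 3-channel BGR by default.
images = np.array([cv2.resize(cv2.imread(p), (width, height)) for p in image_paths])
model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(height, width, ndim)))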
The first (i.e. input) layer is supposed to match the number of features in your dataset. For images, each pixel is considered a feature. Hence, in your case, since the image dimension is (2048, 1536), you need to flatten it to get the total number of pixels (i.e. features). If it is a greyscale image that is 2048 * 1536 * 1; if it is colour it is 2048 * 1536 * 3.
Also, if you use the code below from the TensorFlow/Keras API when creating the Sequential model, the Flatten layer will take care of your input layer size:
tf.keras.models.Sequential([tf.keras.layers.Flatten(),
                            tf.keras.layers.Dense(128, activation=tf.nn.relu),  # 1st hidden layer
                            tf.keras.layers.Dense(128, activation=tf.nn.relu),  # 2nd hidden layer
                            tf.keras.layers.Dense(2, activation=tf.nn.softmax)])  # output layer
I do not understand why the channel dimension is not included in the output dimension of a conv2D layer in Keras.
I have the following model
def create_model():
    image = Input(shape=(128, 128, 3))
    x = Conv2D(24, kernel_size=(8, 8), strides=(2, 2), activation='relu', name='conv_1')(image)
    x = Conv2D(24, kernel_size=(8, 8), strides=(2, 2), activation='relu', name='conv_2')(x)
    x = Conv2D(24, kernel_size=(8, 8), strides=(2, 2), activation='relu', name='conv_3')(x)
    flatten = Flatten(name='flatten')(x)
    output = Dense(1, activation='relu', name='output')(flatten)
    model = Model(inputs=image, outputs=output)
    return model
model = create_model()
model.summary()
The model summary is given in the figure at the end of my question. The input layer takes RGB images with width = 128 and height = 128. The first Conv2D layer tells me the output dimension is (None, 61, 61, 24). I have used a kernel size of (8, 8), a stride of (2, 2) and no padding. The values 61 = floor((128 - 8 + 2 * 0)/2 + 1) and 24 (number of kernels/filters) make sense. But why isn't the dimension for the different channels included in the output shape? As far as I can see, the parameters for the 24 filters on each of the channels are included in the number of parameters. So I would expect the output dimension to be (None, 61, 61, 24, 3) or (None, 61, 61, 24 * 3). Is this just a strange notation in Keras, or am I confused about something else?
This question is asked in various forms all over the internet and has a simple answer which is often missed or confused:
SIMPLE ANSWER:
The Keras Conv2D layer, given a multi-channel input (e.g. a color image), will apply the filter across ALL the color channels and sum the results, producing the equivalent of a monochrome convolved output image.
An example, from a CIFAR-10 CNN:
(1) You're training with the CIFAR image dataset, which is made up of 32x32 color images, i.e. each image is shape (32,32,3) (RGB = 3 channels)
(2) Your first layer of your network is a Conv2D Layer with 32 filters, each specified as 3x3, so:
Conv2D(32, (3,3), padding='same', input_shape=(32,32,3))
(3) Counter-intuitively, Keras will configure each filter as (3,3,3), i.e. a 3D volume covering the 3x3 pixels PLUS all the color channels. As a minor detail each filter has an additional weight for a BIAS value, as per normal neural network layer arithmetic.
(4) Convolution proceeds absolutely as normal, except a 3x3x3 VOLUME from the input image is convolved at each step with the 3x3x3 filter, and a single (monochrome) output value (i.e. like a pixel) is produced at each step.
(5) The result is a Keras Conv2D convolution of a specified (3,3) filter on a (32,32,3) image produces a (32,32) result because the actual filter used is (3,3,3).
(6) In this example, we have also specified 32 filters in the Conv2D layer, so the actual output is (32,32,32) for each input image (i.e. you might think of this as 32 images, one for each filter, each 32x32 monochrome pixels).
As a check, you can look at the count of weights (Param #) for the layer produced by model.summary():
Layer (type)         Output Shape          Param #
conv2d_1 (Conv2D)    (None, 32, 32, 32)    896
There are 32 filters, each 3x3x3 (i.e. 27 weights) plus 1 for the bias (i.e. total 28 weights each). And 32 filters x 28 weights each = 896 Parameters.
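You can reproduce that count directly; a quick sketch assuming TF 2.x:
import tensorflow as tf
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, (3, 3), padding='same', input_shape=(32, 32, 3))
])
model.summary()   # conv2d: Param # = (3*3*3 + 1) * 32 = 896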
Each of the convolutional filters (8 x 8) is connected to a (8 x 8) receptive field for all the channels of the image. That is why we have (61, 61, 24) as the output of the second layer. The different channels are encoded implicitly into the weights of the 24 filters. This means, that each filter does not have 8 x 8 = 64 weights but instead 8 x 8 x Number of channels = 8 x 8 x 3 = 192 weights.
See this quote from CS231n:
Left: An example input volume in red (e.g. a 32x32x3 CIFAR-10 image), and an example volume of neurons in the first Convolutional layer. Each neuron in the convolutional layer is connected only to a local region in the input volume spatially, but to the full depth (i.e. all color channels). Note, there are multiple neurons (5 in this example) along the depth, all looking at the same region in the input - see discussion of depth columns in the text below. Right: The neurons from the Neural Network chapter remain unchanged: they still compute a dot product of their weights with the input followed by a non-linearity, but their connectivity is now restricted to be local spatially.
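The same accounting can be checked for the 8x8 filters in the question; a sketch of just the first Conv2D layer:
import tensorflow as tf
inputs = tf.keras.Input(shape=(128, 128, 3))
x = tf.keras.layers.Conv2D(24, kernel_size=(8, 8), strides=(2, 2), activation='relu')(inputs)
print(x.shape)                                    # (None, 61, 61, 24): channels are summed away
print(tf.keras.Model(inputs, x).count_params())   # (8*8*3 + 1) * 24 = 4632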
My guess is that you're misunderstanding how convolutional layers are defined.
My notation for the shape of the convolutional layer is (out_channels, in_channels, k, k), where k is the size of the kernel. The out_channels is the number of filters (i.e. convolutional neurons). Consider the following image:
The 3D convolutional kernel weights in the picture slide across different data windows of A_{i-1} (i.e. the input image). Patches of 3D data from that image of shape (in_channels, k, k) are paired with individual 3D convolutional kernels of matching dimensionality. How many such 3D kernels are there? As many as the number of output channels, out_channels. The depth dimension that each kernel adopts is the in_channels of A_{i-1}. Therefore, the dimension in_channels of A_{i-1} is contracted away by the depth-wise dot product that builds up the output tensor with out_channels channels. The precise way in which the sliding windows are constructed is defined by the sampling tuple (kernel_size, stride, padding) and results in an output tensor with spatial dimensions determined by the formula that you correctly applied.
If you want to understand more, including backpropagation and implementation take a look at this paper.
The formula you're using is correct. It may be a little confusing because many popular tutorials use a number of filters equal to the number of channels in the image. The TensorFlow/Keras implementation produces its output by computing num_input_channels * num_output_channels intermediate feature maps, one per (input channel, output channel) pair, each obtained with a (kernel_size[0], kernel_size[1]) kernel slice. For each output channel, the num_input_channels intermediate maps are summed, and the results are stacked to create an output of shape (output_height, output_width, num_output_channels). Hope this clarifies Vlad's detailed answer.
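To make that concrete, here is a small sketch (assuming TF 2.x eager execution) showing that a Conv2D output is the sum of per-input-channel convolutions:
import numpy as np
import tensorflow as tf
x = tf.random.normal((1, 8, 8, 3))                          # one RGB-like input
conv = tf.keras.layers.Conv2D(4, (3, 3), use_bias=False)    # 4 output channels, bias omitted
y = conv(x)                                                  # shape (1, 6, 6, 4)
# conv.kernel has shape (3, 3, in_channels=3, out_channels=4).
manual = tf.zeros_like(y)
for c in range(3):
    # Convolve input channel c with its own kernel slice, then accumulate.
    manual += tf.nn.conv2d(x[..., c:c + 1], conv.kernel[:, :, c:c + 1, :],
                           strides=1, padding='VALID')
print(np.allclose(y, manual, atol=1e-5))   # True: per-channel results are summed, not stacked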
I'm discovering the Keras library and I can't tell what the dimension means in Keras layers or how to choose it (model.add(Convolution2D(...)) or model.add(Convolution1D(...))).
For example, I have a set of 9000 training traces and 1000 test traces, and each trace has 1000 samples, so I created the arrays X_train with a size of 9000*1000, X_test with a size of 1000*1000, Y_train with a size of 9000, and Y_test with a size of 1000.
My question is: how can I choose the first layer dimension?
I tried using the same example implemented for MNIST, such as:
model.add(Convolution2D(9000, (1, 1), activation='relu', input_shape=(1, 9000000, 1), dim_ordering='th'))
but it didn't work, and I don't even know what I should put in each argument of the Convolution function.
The choice of dimension (1D, 2D, etc.) depends on the dimensions of your input. For example, since you're using the MNIST dataset, you would use 2D layers since your input is an image with height and width (two dimensions). Alternatively, if you were using text data, you might use a 1D layer because sentences are linear lists of words (one dimension).
I would suggest looking at Francois Chollet's example of a convolutional neural net with MNIST: https://github.com/fchollet/keras/blob/master/examples/mnist_cnn.py. (Note: Conv2D is the same as Convolution2D.)
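If you treat each trace as a 1D sequence, a minimal sketch could look like the following; the layer sizes, pooling choice, and the assumption of a binary label per trace are illustrative, not requirements:
from tensorflow.keras import Sequential, layers
# Assumed shapes from the question: X_train (9000, 1000), Y_train (9000,).
X_train = X_train.reshape(9000, 1000, 1)          # (samples, sequence_length, channels)
model = Sequential([
    layers.Conv1D(32, kernel_size=3, activation='relu', input_shape=(1000, 1)),
    layers.GlobalMaxPooling1D(),
    layers.Dense(1, activation='sigmoid'),        # assumes a binary label per trace
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(X_train, Y_train, epochs=10, batch_size=32)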