Transfer learning with Keras: Input shape mismatch

I'm running a classification and prediction neural network using a pre-trained model with Keras.
I know the default input shape of the pre-trained Keras model is (224, 224, 3), but my input has the shape (180, 200, 20), and I get the following error:
ValueError: Dimension 0 in both shapes must be equal, but are 3 and 64. Shapes are [3,3,20,64] and [64,3,3,3]. for 'Assign_32' (op: 'Assign') with input shapes: [3,3,20,64], [64,3,3,3].
and here is the code:
from keras import applications
from keras.layers import Input
input_tensor = Input(shape = (180, 200, 20))
vgg_model = applications.VGG16(weights = 'imagenet', include_top = False, input_tensor = input_tensor)
Any idea how to get around this? Thank you

From the documentation:
input_shape: optional shape tuple, only to be specified if include_top is False (otherwise the input shape has to be (224, 224, 3) (with 'channels_last' data format) or (3, 224, 224) (with 'channels_first' data format)). It should have exactly 3 input channels, and width and height should be no smaller than 32. E.g. (200, 200, 3) would be one valid value.
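The documented constraints can be turned into a quick pre-flight check before building the model. This is a minimal sketch: `is_valid_vgg16_shape` is a hypothetical helper, not part of Keras, and it only encodes the rules quoted above for `include_top=False` (exactly 3 channels, width and height of at least 32).

```python
def is_valid_vgg16_shape(input_shape, data_format="channels_last"):
    """Check an input_shape against the documented VGG16 rules for include_top=False.

    Hypothetical helper: it mirrors the documentation text, not Keras internals.
    """
    if len(input_shape) != 3:
        return False
    if data_format == "channels_last":
        height, width, channels = input_shape
    else:  # "channels_first"
        channels, height, width = input_shape
    # Exactly 3 input channels, and both spatial sides at least 32 pixels.
    return channels == 3 and height >= 32 and width >= 32


print(is_valid_vgg16_shape((200, 200, 3)))   # documented example: True
print(is_valid_vgg16_shape((180, 200, 20)))  # shape from the question: False
```

The question's shape fails only on the channel count, which matches the error: the first convolution's kernel is built for 20 input channels, while the pretrained kernel has 3.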
Alternatively, you can try to build VGG16 from scratch, following "VGG16 model for Keras".

You need to resize your input image
from keras.preprocessing import image
img = image.load_img("image1.jpeg",target_size=(224,224))
If you want to learn how to do transfer learning in Keras from scratch, you can read this article, which has a step-by-step implementation.

In your case, since your images are not the right size (or number of channels), you may want to cut out large parts of the VGG network while keeping the middle layers, so that the information they contain is preserved, but I am not sure how effective that would be.
You would need to remove the first convolution layer and all the dense layers at the end, replacing them with your own layers. You would almost certainly need to retrain the whole network, so rather than transfer learning you would be doing a very smart initialization.
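One way to make that initialization "smart" is to build the new 20-channel first-layer kernel from the pretrained RGB kernel instead of from random values. This is an assumption, not something the answer spells out: the sketch averages the pretrained kernel over its 3 input channels, tiles the average across 20 channels, and rescales it so the sum over input channels (the response to an input that is constant across channels) is unchanged. The kernel here is a random stand-in in the Keras (kernel_h, kernel_w, in_channels, out_channels) layout; in practice it would come from VGG16's `block1_conv1` layer.

```python
import numpy as np

def expand_first_conv_kernel(kernel, new_in_channels):
    """Build a (kh, kw, new_in_channels, out) kernel from a pretrained (kh, kw, 3, out) one."""
    old_in_channels = kernel.shape[2]
    # Average over the RGB input channels, keeping the axis for tiling.
    mean_kernel = kernel.mean(axis=2, keepdims=True)
    # Tile across the new channels and rescale so the sum over input channels is preserved.
    scale = old_in_channels / new_in_channels
    return np.repeat(mean_kernel, new_in_channels, axis=2) * scale

rng = np.random.default_rng(0)
pretrained_kernel = rng.normal(size=(3, 3, 3, 64))  # stand-in for block1_conv1's kernel
new_kernel = expand_first_conv_kernel(pretrained_kernel, 20)
print(new_kernel.shape)  # (3, 3, 20, 64)
```

The new kernel (plus the original bias) could then be set on a rebuilt first layer with `layer.set_weights([new_kernel, bias])`; as the answer says, the whole network would still need retraining.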


Creating a Keras CNN for image alteration

I'm working on a problem that involves computationally evaluating three-dimensional data of the shape (32, 16, 5) and providing a corrected form of this data also in the shape of (32, 16, 5). The problem is relatively specific to my field, but it can be viewed as analogous to processing color images (just with five color channels instead of three). If it helps, this could be thought of as a color correction model.
In my initial efforts, I created a random forest model using XGBoost for each of these output parameters. I had good results, but found that the sheer number of output parameters (32*16*5 = 2560) made the runtime of this approach too long, so I am looking for an alternative.
I'm looking at using Keras to solve this, using a convolutional neural network approach, since the adjacent 'pixels' in my data should have some useful information about their neighbors. Note that 'adjacency' here is both spatial and in the color channels. So far, I am doing alright in creating a simple model that I believe has inputs/outputs of the correct shape, but I am running into an issue when I try to train the model on some dummy images:
#!/usr/bin/env python3
import tensorflow as tf
import pandas as pd
import numpy as np
def create_model(image_shape, batch_size = 10):
    width, height, channels = image_shape
    conv_shape = (batch_size, width, height, channels)
    model = tf.keras.models.Sequential()
    model.add(tf.keras.layers.Conv3D(filters = channels, kernel_size = 3, input_shape = conv_shape, padding = "same"))
    model.add(tf.keras.layers.Dense(channels, activation = "relu"))
    return model
if __name__ == "__main__":
    image_shape = (32, 16, 5)
    # Create test input/output data sets:
    input_img = np.random.rand(*image_shape) # Create one dummy input image
    output_img = np.random.rand(*image_shape) # Create one dummy output image
    # Create a bogus 'training set' by copying the input/output images into lists many times
    inputs = [input_img]*500
    outputs = [output_img]*500
    # Create the model and fit it to the dummy data
    model = create_model(image_shape)
    model.compile(loss = "mean_squared_error", optimizer = "adam", metrics = ["accuracy"])
    model.fit(input_img, output_img)
However, when I run this code, I get the following error:
ValueError: Input 0 of layer sequential is incompatible with the layer: : expected min_ndim=5, found ndim=3. Full shape received: [32, 16, 5]
I am not really sure what the other two expected dimensions are for the data passed into model.fit(). I suspect this is a problem with the way I am formatting my input data. Even if I pass a list of input/output images, that only brings the ndim of my data to 4, not 5.
I have been trying to find similar examples in the documentation and around the web to see what I'm doing incorrectly, but 3D convolution on a non-classifier network seems a bit off the beaten path, and I'm not having much luck (or just don't know the name of what I should search for).
I have tried passing the dummy training set to model.fit() instead of two individual images. Fitting with model.fit(inputs, outputs) instead, I get:
ValueError: Layer sequential expects 1 inputs, but it received 500 input tensors.
It seems that passing a list of tensors isn't correct here. If I convert the list of input images to numpy arrays with:
inputs = np.array(inputs)
outputs = np.array(outputs)
This does bring up the number of dimensions in my input data to 4, but Keras is still expecting 5. The error I get in this case is very similar to the first:
ValueError: Input 0 of layer sequential is incompatible with the layer: : expected min_ndim=5, found ndim=4. Full shape received: [None, 32, 16, 5]
I'm definitely not understanding something here, and any help would be appreciated.
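I think you made two mistakes in your code. First, instead of Conv3D you need to use Conv2D. Second, model.fit(input_img, output_img) should be model.fit(inputs, outputs), with inputs and outputs converted to NumPy arrays.
You need Conv2D because each sample has shape (length, width, channels): it doesn't have the extra spatial dimension that Conv3D expects. Also, input_shape should not include the batch size; Keras adds the batch dimension itself.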
Try the script below
#!/usr/bin/env python3
import tensorflow as tf
import pandas as pd
import numpy as np
def create_model(image_shape, batch_size = 10):
    width, height, channels = image_shape
    conv_shape = (width, height, channels)  # per-sample shape, without the batch dimension
    model = tf.keras.models.Sequential()
    model.add(tf.keras.layers.Conv2D(filters = channels, kernel_size = 3, input_shape = conv_shape, padding = "same"))
    model.add(tf.keras.layers.Dense(channels, activation = "relu"))
    return model
if __name__ == "__main__":
    image_shape = (32, 16, 5)
    # Create test input/output data sets:
    input_img = np.random.rand(*image_shape) # Create one dummy input image
    output_img = np.random.rand(*image_shape) # Create one dummy output image
    # Create a bogus 'training set' by copying the input/output images into arrays many times
    inputs = np.array([input_img]*500)
    outputs = np.array([output_img]*500)
    # Create the model and fit it to the dummy data
    model = create_model(image_shape)
    model.compile(loss = "mean_squared_error", optimizer = "adam", metrics = ["accuracy"])
    model.fit(inputs, outputs)
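The dimension counts in the question's errors can be checked with NumPy alone. This sketch uses the question's dummy data: stacking the images adds the batch axis, giving the 4-D (batch, height, width, channels) layout Conv2D expects, while Conv3D needs one more axis. Adding a trailing channel axis of size 1 is an alternative the answer does not suggest; it would let Conv3D treat the 5 values as a third spatial axis and convolve across them, which is one way to capture the channel adjacency mentioned in the question.

```python
import numpy as np

image_shape = (32, 16, 5)
input_img = np.random.rand(*image_shape)

# A single image is 3-D: (height, width, channels).
print(input_img.ndim)  # 3

# Stacking 500 copies adds the batch axis: the 4-D input Conv2D expects.
inputs = np.array([input_img] * 500)
print(inputs.shape)  # (500, 32, 16, 5)

# Conv3D expects 5-D input: (batch, dim1, dim2, dim3, channels).
# A trailing axis of size 1 turns the 5 "color" values into a spatial axis.
inputs_3d = inputs[..., np.newaxis]
print(inputs_3d.shape)  # (500, 32, 16, 5, 1)
```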

Inception v3 and Xception for data with 2 channels

I am trying to use the pre-trained models for my own data, which has the shape (64, 256, 2), and I am able to change the input shape for VGG16 and ResNet50 like this:
base_model = keras.applications.vgg16.VGG16(input_shape=(32,128,2), include_top=False, weights=None)
However, the same method does not work for either Inception v3 or Xception.
The error I get is:
model = keras.applications.inception_v3.InceptionV3(input_shape=(64, 256, 2), weights=None, include_top=False)
Input size must be at least 75x75; got `input_shape=(64, 256, 2)`
Any ideas on how to get around this?
Thank you!
Most convolutional neural networks have a minimum width/height for their input.
The network contains many pooling layers, each of which reduces the feature-map dimensions by a factor. The minimum guarantees that the input can pass all the way through the network without the feature maps shrinking to zero height or width. So you must use at least the specified minimum dimension for the network, in this case 75x75 for InceptionV3 (the Keras documentation gives 71x71 for Xception).
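A common workaround is to enlarge the input until it meets the minimum, by resizing or by zero-padding. The sketch below assumes zero-padding is acceptable for this data (an assumption about the data, not something the answer states); `pad_to_min_size` is a hypothetical helper that pads a (64, 256, 2) array to a height of 75, splitting the padding between top and bottom.

```python
import numpy as np

def pad_to_min_size(x, min_height=75, min_width=75):
    """Zero-pad a (height, width, channels) array so both sides reach the minimum."""
    height, width = x.shape[:2]
    pad_h = max(0, min_height - height)
    pad_w = max(0, min_width - width)
    # Split the padding as evenly as possible between the two sides.
    pad_spec = (
        (pad_h // 2, pad_h - pad_h // 2),
        (pad_w // 2, pad_w - pad_w // 2),
        (0, 0),  # never pad the channel axis
    )
    return np.pad(x, pad_spec, mode="constant")

sample = np.random.rand(64, 256, 2)
padded = pad_to_min_size(sample)
print(padded.shape)  # (75, 256, 2)
```

The model would then be built with `input_shape=(75, 256, 2)`. Inside a Keras model, `tf.keras.layers.ZeroPadding2D(padding=((5, 6), (0, 0)))` in front of the base network pads the same way.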

How to change the shape of a layer in pre-trained Keras CNN model? [duplicate]

I have a dataset containing grayscale images and I want to train a state-of-the-art CNN on them. I'd very much like to fine-tune a pre-trained model (like the ones here).
The problem is that almost all models I can find the weights for have been trained on the ImageNet dataset, which contains RGB images.
I can't use one of those models because their input layer expects a batch of shape (batch_size, height, width, 3), or (64, 224, 224, 3) in my case, but my image batches are (64, 224, 224).
Is there any way that I can use one of those models? I've thought of dropping the input layer after I've loaded the weights and adding my own (like we do for the top layers). Is this approach correct?
The model's architecture cannot be changed because the weights have been trained for a specific input configuration. Replacing the first layer with your own would pretty much render the rest of the weights useless.
-- Edit: elaboration suggested by Prune--
CNNs are built so that as they go deeper, they can extract high-level features derived from the lower-level features that the previous layers extracted. By removing the initial layers of a CNN, you are destroying that hierarchy of features because the subsequent layers won't receive the features that they are supposed to as their input. In your case the second layer has been trained to expect the features of the first layer. By replacing your first layer with random weights, you are essentially throwing away any training that has been done on the subsequent layers, as they would need to be retrained. I doubt that they could retain any of the knowledge learned during the initial training.
--- end edit ---
There is an easy way, though, to make your model work with grayscale images: make the image appear to be RGB. The easiest way to do so is to repeat the image array 3 times on a new dimension. Because you will have the same image in all 3 channels, the model sees it as an RGB image without color information, so it runs unchanged, although accuracy may be somewhat lower than on true RGB images.
In numpy this can be easily done like this:
print(grayscale_batch.shape) # (64, 224, 224)
rgb_batch = np.repeat(grayscale_batch[..., np.newaxis], 3, -1)
print(rgb_batch.shape) # (64, 224, 224, 3)
The way this works is that it first creates a new dimension (to place the channels) and then it repeats the existing array 3 times on this new dimension.
I'm also pretty sure that Keras' ImageDataGenerator can load grayscale images as RGB (flow_from_directory uses color_mode='rgb' by default).
Converting grayscale images to RGB as per the currently accepted answer is one approach to this problem, but not the most efficient. You most certainly can modify the weights of the model's first convolutional layer and achieve the stated goal. The modified model will both work out of the box (with reduced accuracy) and be finetunable. Modifying the weights of the first layer does not render the rest of the weights useless as suggested by others.
To do this, you'll have to add some code where the pretrained weights are loaded. In your framework of choice, you need to figure out how to grab the weights of the first convolutional layer in your network and modify them before assigning them to your 1-channel model. The required modification is to sum the weight tensor over the dimension of the input channels. The way the weight tensor is organized varies from framework to framework: the PyTorch default is [out_channels, in_channels, kernel_height, kernel_width], while in TensorFlow/Keras it is [kernel_height, kernel_width, in_channels, out_channels].
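Why summing works: if all three input channels carry the same grayscale image, each output of the first convolution is a sum over channels of image times kernel, and the image factor is shared, so the result equals a one-channel convolution with the channel-summed kernel. The NumPy sketch below checks this numerically with a naive "valid" cross-correlation (stride 1, no padding, written for clarity rather than speed) using the TensorFlow (kh, kw, in, out) layout and random stand-in weights; the last lines show the transpose between the PyTorch and TensorFlow layouts.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Naive cross-correlation. image: (H, W, C_in); kernel: (kh, kw, C_in, C_out)."""
    kh, kw, _, c_out = kernel.shape
    out_h, out_w = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w, c_out))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + kh, j:j + kw, :]  # (kh, kw, C_in)
            # Contract the patch against the kernel over kh, kw and C_in.
            out[i, j] = np.tensordot(patch, kernel, axes=([0, 1, 2], [0, 1, 2]))
    return out

rng = np.random.default_rng(0)
gray = rng.random((10, 10, 1))                       # one-channel image
rgb_like = np.repeat(gray, 3, axis=-1)               # same image in all three channels
kernel_rgb = rng.normal(size=(3, 3, 3, 8))           # stand-in for pretrained first-layer weights
kernel_gray = kernel_rgb.sum(axis=2, keepdims=True)  # shape (3, 3, 1, 8)

print(np.allclose(conv2d_valid(rgb_like, kernel_rgb), conv2d_valid(gray, kernel_gray)))  # True

# PyTorch stores [out, in, kh, kw]; transpose(2, 3, 1, 0) gives TensorFlow's [kh, kw, in, out].
torch_layout = rng.normal(size=(64, 3, 7, 7))
print(torch_layout.transpose(2, 3, 1, 0).shape)  # (7, 7, 3, 64)
```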
Using PyTorch as an example, in a ResNet50 model from Torchvision, the shape of the weights for conv1 is [64, 3, 7, 7]. Summing over dimension 1 results in a tensor of shape [64, 1, 7, 7]. At the bottom I've included a snippet of code that would work with the ResNet models in Torchvision, assuming that an argument (inchans) was added to specify a different number of input channels for the model.
To prove this works I did three runs of ImageNet validation on ResNet50 with pretrained weights. There is a slight difference in the numbers for run 2 & 3, but it's minimal and should be irrelevant once finetuned.
Unmodified ResNet50 w/ RGB Images : Prec #1: 75.6, Prec #5: 92.8
Unmodified ResNet50 w/ 3-chan Grayscale Images: Prec #1: 64.6, Prec #5: 86.4
Modified 1-chan ResNet50 w/ 1-chan Grayscale Images: Prec #1: 63.8, Prec #5: 86.1
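import torch.utils.model_zoo as model_zoo
# ResNet and Bottleneck are assumed to come from a copy of torchvision's resnet.py in which
# ResNet.__init__ accepts an extra `inchans` argument; model_urls is the URL table that older
# torchvision versions defined in the same module.
def _load_pretrained(model, url, inchans=3):
    state_dict = model_zoo.load_url(url)
    if inchans == 1:
        # Sum the first conv's weights over the input-channel dimension: [64, 3, 7, 7] -> [64, 1, 7, 7]
        conv1_weight = state_dict['conv1.weight']
        state_dict['conv1.weight'] = conv1_weight.sum(dim=1, keepdim=True)
    elif inchans != 3:
        raise ValueError("Invalid number of inchans for pretrained weights")
    model.load_state_dict(state_dict)
def resnet50(pretrained=False, inchans=3):
    """Constructs a ResNet-50 model.
    Args:
        pretrained (bool): If True, returns a model pre-trained on ImageNet
        inchans (int): Number of input channels (1 or 3 when pretrained)
    """
    model = ResNet(Bottleneck, [3, 4, 6, 3], inchans=inchans)
    if pretrained:
        _load_pretrained(model, model_urls['resnet50'], inchans=inchans)
    return model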
A simple way to do this is to add a convolution layer before the base model and then feed its output to the base model, like this:
from keras.models import Model
from keras.layers import Input, Conv2D
from keras.applications import ResNet50
IMG_SIZE = 224  # include_top=True with ImageNet weights expects 224x224 inputs
resnet = ResNet50(weights='imagenet', include_top=True)
input_tensor = Input(shape=(IMG_SIZE, IMG_SIZE, 1))
x = Conv2D(3, (3, 3), padding='same')(input_tensor)  # x has shape (IMG_SIZE, IMG_SIZE, 3)
out = resnet(x)
model = Model(inputs=input_tensor, outputs=out)
Why not try to convert a grayscale image to a fake "RGB" image?
Dropping the input layer will not work: all the following layers would suffer.
What you can do is concatenate 3 copies of the grayscale image to expand the color dimension.
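import tensorflow as tf
from tensorflow.keras.applications import ResNet50
img_size_target = 224  # assumed image size; 224 is ResNet50's default
img_input = tf.keras.layers.Input(shape=(img_size_target, img_size_target, 1))
img_conc = tf.keras.layers.Concatenate()([img_input, img_input, img_input])  # (224, 224, 3)
model = ResNet50(include_top=True, weights='imagenet', input_tensor=img_conc)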
I faced the same problem while working with VGG16 and grayscale images. I solved it as follows:
Let's say our training images are in train_gray_images, with shape [no_of_examples, height, width, 1]. If we pass it directly to fit(), we get an error, because the model expects a 3-channel (RGB) data set instead of a grayscale one. So before calling fit(), do the following:
Create a dummy RGB data set (here dummy_RGB_images) with the same shape as the grayscale data set, except that the number of channels is 3.
dummy_RGB_images = np.ndarray(shape=(train_gray_images.shape[0], train_gray_images.shape[1], train_gray_images.shape[2], 3), dtype=train_gray_images.dtype)  # same dtype as the grayscale data, so float images are not truncated
Then copy the grayscale data set into each of the 3 channels of dummy_RGB_images (the dimensions are [no_of_examples, height, width, channel]):
dummy_RGB_images[:, :, :, 0] = train_gray_images[:, :, :, 0]
dummy_RGB_images[:, :, :, 1] = train_gray_images[:, :, :, 0]
dummy_RGB_images[:, :, :, 2] = train_gray_images[:, :, :, 0]
Finally, pass dummy_RGB_images to fit() instead of the grayscale data set.
NumPy's depth-stack function, np.dstack((img, img, img)), is a natural way to go for a single 2-D image.
If you're already using scikit-image, you can get the desired result with gray2rgb.
from skimage.color import gray2rgb
rgb_img = gray2rgb(gray_img)
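The grayscale-to-RGB recipes in this thread agree for a single 2-D image but not for a batch. In the NumPy sketch below (shapes chosen for illustration), np.repeat on a new last axis and np.stack(..., axis=-1) both give (N, H, W, 3) for a batch, while np.dstack concatenates along axis 2 and produces the wrong shape, so np.dstack is best reserved for one (H, W) image at a time.

```python
import numpy as np

gray_img = np.random.rand(224, 224)       # one (H, W) image
gray_batch = np.random.rand(4, 224, 224)  # a batch of (N, H, W) images

# Single image: dstack and repeat give the same (H, W, 3) result.
a = np.dstack((gray_img, gray_img, gray_img))
b = np.repeat(gray_img[..., np.newaxis], 3, axis=-1)
print(a.shape, np.array_equal(a, b))  # (224, 224, 3) True

# Batch: repeat and stack add a channel axis at the end ...
print(np.repeat(gray_batch[..., np.newaxis], 3, axis=-1).shape)  # (4, 224, 224, 3)
print(np.stack([gray_batch] * 3, axis=-1).shape)                 # (4, 224, 224, 3)
# ... but dstack concatenates along axis 2, which is the width axis here.
print(np.dstack((gray_batch, gray_batch, gray_batch)).shape)     # (4, 224, 672)
```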
I believe you can use a pretrained ResNet with 1-channel grayscale images without repeating the image 3 times.
What I did was replace the first layer (this is PyTorch, not Keras, but the idea should be similar):
(conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
With the following layer:
(conv1): Conv2d(1, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
And then copy the sum (in the channel axis) of the weights to the new layer, for example, the shape of the original weights was:
torch.Size([64, 3, 7, 7])
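So I did:
model_resnet_1.conv1.weight.data = model_resnet_3.conv1.weight.data.sum(dim=1).reshape(64, 1, 7, 7)
Then I checked that the output of the new model on the one-channel image matches the output of the original model on the 3-channel grayscale image: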
y_1 = model_resnet_1(input_image_1)
y_3 = model_resnet_3(input_image_3)
print(torch.abs(y_1).sum(), torch.abs(y_3).sum())
(tensor(710.8860, grad_fn=<SumBackward0>),
tensor(710.8861, grad_fn=<SumBackward0>))
input_image_1: one channel image
input_image_3: 3 channel image (gray scale - all channels equal)
model_resnet_1: modified model
model_resnet_3: Original resnet model
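It's really easy! Example for ResNet50.
Before doing it, you should have:
import torch
import torch.nn as nn
import torchvision
resnet_50 = torchvision.models.resnet50(weights="IMAGENET1K_V1")  # pretrained weights; torchvision < 0.13 uses pretrained=True
print(resnet_50.conv1)
# Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
Keep the pretrained first-layer weights, then replace the layer:
pretrained_weight = resnet_50.conv1.weight.detach().clone()
resnet_50.conv1 = nn.Conv2d(1, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
The final step is to copy the channel-summed weights into the new layer. Assigning to a key of resnet_50.state_dict() would not change the model, because state_dict() returns a new dictionary on each call; and after the replacement, conv1 already holds freshly initialized single-channel weights.
with torch.no_grad():
    resnet_50.conv1.weight.copy_(pretrained_weight.sum(dim=1, keepdim=True))
So if you run print(resnet_50.conv1), the result is:
Conv2d(1, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
As you can see, the layer now has 1 input channel, for grayscale images.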
What I did was simply expand grayscale images into RGB images with the following transform stage (x is a (1, H, W) tensor, e.g. the output of ToTensor):
import torchvision as tv
tv.transforms.Lambda(lambda x: x.broadcast_to(3, x.shape[1], x.shape[2])),
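When you add the ResNet to your model, you should pass input_shape in the ResNet50 definition, like:
model = ResNet50(include_top=True, weights=None, input_shape=(256, 256, 1))
Note that this only works without the ImageNet weights (weights=None), so the model is trained from scratch: with weights='imagenet' and include_top=True, Keras requires input_shape=(224, 224, 3).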

How can I run my Keras net on non-square images?

I trained a UNet based image segmentation model in tf.keras which predicts if and where an object is in a given image. I train with an input shape of (None, 256, 256, 1) and output a (None, 256, 256, 3) shaped prediction.
I now want to predict larger images (eg. (520, 696)) and want to use the same model. I am aware that one can change the input shape of the model to size (None, None, None, 1). However, now it can still only predict square images – for the image mentioned above, it returns a Dimensionality Error as shapes don't match (520 != 696).
Does anyone know how to avoid this or have a working function to stitch together smaller square outputs?
Steps to error:
img = ...  # shaped (520, 696)
pred = model.predict(img[None,...,None])
InvalidArgumentError: _MklConcatOp : Dimensions of inputs should match: shape[0][1]= 64 vs. shape[1][1] = 65
[[{{node concatenate_4/concat}}]]
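I found a solution: because I trained a UNet (with concatenation layers after upsampling), each spatial dimension must be divisible by 2^k, where k is the number of downsampling steps, so that the upsampled feature maps line up with the skip connections (powers of 2 such as 256 or 512 always satisfy this). I therefore add padding to bring the image to the next power of two before prediction and remove the padding from the output.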
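A minimal NumPy sketch of the pad-then-crop approach. It assumes the UNet downsamples 4 times, so each side must be divisible by 2**4 = 16; that depth is an assumption, so match `multiple` to your own network. `pad_for_unet` and `crop_to` are hypothetical helpers, and reflection padding is one choice among several (zeros also work). Padding to the next multiple rather than the next power of two keeps the (520, 696) image at (528, 704) instead of (1024, 1024).

```python
import numpy as np

def pad_for_unet(img, multiple=16):
    """Reflect-pad a (H, W) image on the bottom/right so H and W are divisible by `multiple`."""
    h, w = img.shape
    pad_h = (-h) % multiple  # rows needed to reach the next multiple
    pad_w = (-w) % multiple
    padded = np.pad(img, ((0, pad_h), (0, pad_w)), mode="reflect")
    return padded, (h, w)

def crop_to(pred, original_hw):
    """Remove the padding from a (1, H, W, C) prediction."""
    h, w = original_hw
    return pred[:, :h, :w, :]

img = np.random.rand(520, 696)
padded, original_hw = pad_for_unet(img)
print(padded.shape)  # (528, 704)

# With the fully convolutional model from the question (input shape (None, None, None, 1)):
# pred = model.predict(padded[None, ..., None])
# pred = crop_to(pred, original_hw)  # back to (1, 520, 696, 3)
```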

Keras Error on Convolutional Layer and Input Data

I filtered the training and testing data from CIFAR-100, keeping only the fruit and vegetables superclass. Now I have 2,500 training and 500 testing samples. But I get an error saying the input to the convolutional layer has the wrong dimensions.
My array data form:
I hope someone can help me for this case, thank you.
Your input data should have shape (2500, 3, 32, 32); it seems you lost two of the dimensions in your preprocessing steps. Either fix those or reshape your data as:
inputData = inputData.reshape((2500, 3, 32, 32))
In general, with the channels_first data format, the input to a convolutional layer is (numSamples, numChannels, height, width). Note that with the TensorFlow backend the default data format is channels_last, so the channels dimension goes at the end: (numSamples, height, width, numChannels).
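If the data came from the Python version of CIFAR-100, each row of the data array holds 3072 values: the 1024 red values first, then green, then blue, each in row-major order. A plain reshape to (N, 3, 32, 32) therefore gives channels-first images, and a transpose (not a second reshape) converts them to the channels-last layout. The NumPy sketch below builds a synthetic stand-in for those rows and checks the round trip.

```python
import numpy as np

n = 5
# Synthetic (N, H, W, C) images with distinct values, standing in for CIFAR data.
images_hwc = np.arange(n * 32 * 32 * 3).reshape(n, 32, 32, 3)

# Simulate CIFAR's storage: one flat row per image, channel-major (all R, then G, then B).
rows = images_hwc.transpose(0, 3, 1, 2).reshape(n, 3072)

channels_first = rows.reshape(n, 3, 32, 32)           # (N, C, H, W)
channels_last = channels_first.transpose(0, 2, 3, 1)  # (N, H, W, C)

print(np.array_equal(channels_last, images_hwc))                # True
# Reshaping the rows straight to (N, 32, 32, 3) scrambles the pixels.
print(np.array_equal(rows.reshape(n, 32, 32, 3), images_hwc))  # False
```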

