I am trying to use pre-trained models for my own data, which has the shape (64, 256, 2), and I am able to change the input shape for VGG16 and ResNet50 like this:
base_model = keras.applications.vgg16.VGG16(input_shape=(32,128,2), include_top=False, weights=None)
However, the same method does not work for either Inception v3 or Xception.
The error I get is:
model = keras.applications.inception_v3.InceptionV3(input_shape=(64, 256, 2), weights=None, include_top=False)
Input size must be at least 75x75; got `input_shape=(64, 256, 2)`
Any ideas on how to get around this?
Thank you!
There is a minimum width/height for most of these convolutional neural networks.
https://github.com/fchollet/deep-learning-models/blob/master/inception_v3.py
There are many pooling layers in the network, each of which reduces the feature map dimensions by a factor. If your input is too small, the network cannot pass it all the way through without the feature map shrinking to zero height/width. So you must use at least the specified minimum dimensions for the network, in this case 75x75.
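If you cannot collect larger inputs, one possible workaround (a minimal sketch, assuming you are free to upsample your data; the upsampling front end below is just an illustration, not part of the original model) is to enlarge the input before it reaches InceptionV3:
from keras.layers import Input, UpSampling2D
from keras.models import Model
import keras

# Hypothetical front end: upsample (64, 256, 2) to (128, 512, 2), which
# satisfies the 75x75 minimum. weights=None because the pre-trained ImageNet
# weights expect 3 channels anyway.
inputs = Input(shape=(64, 256, 2))
x = UpSampling2D(size=(2, 2))(inputs)  # -> (128, 512, 2)
base_model = keras.applications.inception_v3.InceptionV3(
    input_shape=(128, 512, 2), include_top=False, weights=None)
outputs = base_model(x)
model = Model(inputs, outputs)
model.summary()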
I am using pretrained models to classify images. My question is what kind of layers I have to add after the pretrained model structure in my own model, and why the two implementations below differ. To be specific:
Consider two examples, one using the cats and dogs dataset:
One implementation can be found here. The crucial point is that the base model:
# Create the base model from the pre-trained model MobileNet V2
base_model = tf.keras.applications.MobileNetV2(input_shape=IMG_SHAPE,
include_top=False,
weights='imagenet')
base_model.trainable = False
is frozen and a GlobalAveragePooling2D() is added, before a final tf.keras.layers.Dense(1) is added. So the model structure looks like:
model = tf.keras.Sequential([
base_model,
global_average_layer,
prediction_layer
])
which is equivalent to:
model = tf.keras.Sequential([
base_model,
tf.keras.layers.GlobalAveragePooling2D(),
tf.keras.layers.Dense(1)
])
So they added not only a final Dense(1) layer, but also a GlobalAveragePooling2D() layer before it.
The other using the tf flowers dataset:
In this implementation it is different. A GlobalAveragePooling2D() is not added.
feature_extractor_url = "https://tfhub.dev/google/tf2-preview/mobilenet_v2/feature_vector/2"
feature_extractor_layer = hub.KerasLayer(feature_extractor_url,
input_shape=(224,224,3))
feature_extractor_layer.trainable = False
model = tf.keras.Sequential([
feature_extractor_layer,
layers.Dense(image_data.num_classes)
])
Here image_data.num_classes is 5, representing the five flower classes. So in this example a GlobalAveragePooling2D() layer is not added.
I do not understand this. Why are they different? When should I add a GlobalAveragePooling2D() and when not? And which is better / what should I do?
I am not sure if the reason is that the cats-and-dogs dataset is a binary classification task while the flowers dataset is a multiclass classification problem. Or maybe the difference is that tf.keras.applications.MobileNetV2 was used to load MobileNetV2 in one case, while hub.KerasLayer was used to get the feature_extractor in the other. When I check the model in the first implementation, I can see that its last layer is a ReLU activation layer.
When I check the feature_extractor:
model = tf.keras.Sequential([
feature_extractor,
tf.keras.layers.Dense(1)
])
model.summary()
I get the output:
So maybe the reason is also that I do not understand the difference between tf.keras.applications.MobileNetV2 and hub.KerasLayer. The hub.KerasLayer just gives me the feature extractor. I know this, but I still don't think I get the difference between these two methods.
I cannot check the layers of the feature_extractor itself, so feature_extractor.summary() and feature_extractor.layers do not work. How can I inspect its layers? And how can I know whether I should add GlobalAveragePooling2D or not?
Summary
Why are they different? When should I add a GlobalAveragePooling2D() and when not? And which is better / what should I do?
In the first case, the base model outputs 4-dimensional tensors, the raw outputs of its last convolutional layer, so you need to flatten them somehow; in that example GlobalAveragePooling2D is used (but you could use any other strategy). I can't tell which is better: it depends on your problem, and depending on how the hub.KerasLayer version implemented the flattening, they could be exactly the same. That said, I'd just pick one of them and go on: I don't see huge differences between them.
Long answer: understanding the Keras implementation
The difference is in the output of both base models: in your Keras example, outputs are of shape (bz, hh, ww, nf), where bz is the batch size, hh and ww are the height and width of the last convolutional layer in the model, and nf is the number of filters (or convolutions) applied in this last layer.
In other words, the base model returns the raw output of its last convolutional (or filter) layer.
Hence, you need to convert those outputs (which you can think of as images) to vectors of shape (bz, n_feats), where n_feats is the number of features the base model is computing. Once this conversion is done, you can stack your classification layer (or as many layers as you want), because at this point you have vectors.
How do you compute this conversion? Some common alternatives are taking the average or maximum over the convolutional output (which reduces the size), reshaping it into a single row, or adding more convolutional layers until you get a vector as an output (I strongly suggest sticking to the usual practices of average or max pooling).
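For illustration, here is a small sketch of those alternatives (the 7x7x1280 tensor is a dummy standing in for a MobileNetV2-style output):
import tensorflow as tf

# Dummy (bz, hh, ww, nf) feature maps; 7x7x1280 is what MobileNetV2 produces
# for a 224x224 input, used here only as an example shape.
feats = tf.zeros((2, 7, 7, 1280))

gap = tf.keras.layers.GlobalAveragePooling2D()  # average over hh x ww
gmp = tf.keras.layers.GlobalMaxPooling2D()      # maximum over hh x ww
flat = tf.keras.layers.Flatten()                # reshape into a single row

print(gap(feats).shape)   # (2, 1280)
print(gmp(feats).shape)   # (2, 1280)
print(flat(feats).shape)  # (2, 62720) -> a much bigger Dense layer afterwards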
In your first example, when calling tf.keras.applications.MobileNetV2, you are using the default policy with respect to this last layer, and hence the last convolutional layer is left "as is": some convolutions. You can modify this behaviour with the pooling parameter, as documented here:
pooling: Optional pooling mode for feature extraction when include_top is False.
None (default) means that the output of the model will be the 4D tensor output of the last convolutional block.
avg means that global average pooling will be applied to the output of the last convolutional block, and thus the output of the model will be a 2D tensor.
max means that global max pooling will be applied.
In summary, in your first example you build the base model without telling it explicitly what to do with the last layer, so the model keeps returning 4-dimensional tensors that you immediately convert to vectors with average pooling. You can avoid this explicit average pooling if you tell Keras to do it:
# Create the base model from the pre-trained model MobileNet V2
base_model = tf.keras.applications.MobileNetV2(input_shape=IMG_SHAPE,
include_top=False,
pooling='avg', # Tell keras to average last layer
weights='imagenet')
base_model.trainable = False
model = tf.keras.Sequential([
base_model,
# global_average_layer, -> not needed any more
prediction_layer
])
TFHub implementation
Finally, when you use the TensorFlow Hub implementation, since you picked the feature_vector version of the model, it already implements some kind of pooling (I haven't yet found exactly which) to make sure the model outputs vectors rather than 4-dimensional tensors. So you don't need to add the conversion layer explicitly, because it is already done.
Personally, I prefer the Keras implementation, since it gives you more freedom to pick the strategy you want (in fact, you could keep stacking whatever layers you want).
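If you want to convince yourself that both routes end up in the same place, a quick shape check might look like this (a sketch; it downloads the hub module from the URL in the question, and the exact feature size is not guaranteed here):
import numpy as np
import tensorflow as tf
import tensorflow_hub as hub

dummy = np.zeros((1, 224, 224, 3), dtype=np.float32)

# weights=None on the Keras side: only the output shape matters for this check
keras_base = tf.keras.applications.MobileNetV2(input_shape=(224, 224, 3),
                                               include_top=False,
                                               pooling='avg',
                                               weights=None)
hub_base = hub.KerasLayer(
    "https://tfhub.dev/google/tf2-preview/mobilenet_v2/feature_vector/2",
    input_shape=(224, 224, 3))

print(keras_base(dummy).shape)  # 2-D (batch, features) tensor, e.g. (1, 1280)
print(hub_base(dummy).shape)    # 2-D as well: the pooling is already built in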
Let's say there is a model taking [1, 208, 208, 3] images that has 6 pooling layers with kernels [2, 2, 2, 2, 2, 7], which would result in a feature column of [1, 1, 1, 2048] for the 2048 filters in the last conv layer. Note how the last pooling layer accepts [1, 7, 7, 2048] inputs.
If we relax the constraints on the input image (which is typically the case for object detection models), then after the same set of pooling layers an image of size [1, 104, 208, 3] would produce a pre-last-pooling output of [1, 4, 7, 2048] and [1, 256, 408, 3] would yield [1, 8, 13, 2048]. These maps would carry about the same amount of information as the original [1, 7, 7, 2048], but the original pooling layer would no longer produce a feature column of [1, 1, 1, N]. That is why we switch to a global pooling layer.
In short, a global pooling layer is important if we don't have a strict restriction on the input image size (and don't resize the image as the first op in the model).
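A tiny sketch of that last point, using dummy tensors with the shapes from the example above:
import tensorflow as tf

gap = tf.keras.layers.GlobalAveragePooling2D()

# All three spatial sizes from the example above collapse to the same
# fixed-length feature vector; a fixed 7x7 pooling kernel could not do this.
print(gap(tf.zeros((1, 7, 7, 2048))).shape)   # (1, 2048)
print(gap(tf.zeros((1, 4, 7, 2048))).shape)   # (1, 2048)
print(gap(tf.zeros((1, 8, 13, 2048))).shape)  # (1, 2048)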
I think the difference is in the output of the models:
"https://tfhub.dev/google/tf2-preview/mobilenet_v2/feature_vector/2" outputs a 1-D vector per batch element, so you just can't apply Conv2D to it.
The output of tf.keras.applications.MobileNetV2 is probably more complex, so you have more freedom in how you transform it.
I am trying to understand each Keras layer while implementing a CNN.
For the Conv2D layer, I understand that it produces different feature maps of the input depending on the filter (kernel) values.
Now, my questions are:
Can I see the different filter matrices that are applied to the input image to produce the convolution output?
Can I see the values of the matrices that are generated after the Conv2D step completes?
Thanks in advance
You can get the output of a certain convolutional layer in this way:
import keras.backend as K
# Build a function from the input tensor to the convolutional layer's output
func = K.function([model.get_layer('input').input],
                  [model.get_layer('conv').output])
conv_output = func([numpy_input])[0]  # numpy array of feature maps
where 'input' and 'conv' denote the names of your input layer and convolutional layer. And you can get the weights of a certain layer like this:
conv_weights = model.get_layer('conv').get_weights() # numpy array
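Equivalently (a sketch; 'conv' is again a placeholder for the actual name of your convolutional layer), you can build a small functional-API model that ends at that layer and call predict on it:
from keras.models import Model

# Model that maps the original input to the chosen convolutional layer's output
feature_map_model = Model(inputs=model.input,
                          outputs=model.get_layer('conv').output)
feature_maps = feature_map_model.predict(numpy_input)  # (batch, h, w, n_filters)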
I'm running a classification and prediction neural network using a pre-trained model with Keras.
Now, I know the expected input shape for Keras is (224, 224, 3), but my input has shape (180, 200, 20), and I get the following error:
ValueError: Dimension 0 in both shapes must be equal, but are 3 and 64. Shapes are [3,3,20,64] and [64,3,3,3]. for 'Assign_32' (op: 'Assign') with input shapes: [3,3,20,64], [64,3,3,3].
and here is the code:
from keras import applications
from keras.layers import Input
input_tensor = Input(shape = (180, 200, 20))
vgg_model = applications.VGG16(weights = 'imagenet', include_top = False, input_tensor = input_tensor)
vgg_model.summary()
Any idea how to get around this? Thank you
From Documentation:
input_shape: optional shape tuple, only to be specified if include_top is False (otherwise the input shape has to be (224, 224, 3) with 'channels_last' data format, or (3, 224, 224) with 'channels_first' data format). It should have exactly 3 input channels, and width and height should be no smaller than 32. E.g. (200, 200, 3) would be one valid value.
You can try to create a VGG16 from scratch by following this link: VGG16 model for Keras.
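Alternatively, note that in recent Keras versions the 3-channel requirement is only enforced when you load the ImageNet weights; if you just want the VGG16 architecture trained from scratch, something like this should build (a sketch, possibly with a warning about the unusual channel count):
from keras import applications

# No pre-trained weights, so 20 input channels are accepted (you may still get
# a warning that the model usually expects 1 or 3 channels)
vgg_scratch = applications.VGG16(weights=None, include_top=False,
                                 input_shape=(180, 200, 20))
vgg_scratch.summary()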
You need to resize your input image
from keras.preprocessing import image
img = image.load_img("image1.jpeg",target_size=(224,224))
If you want to learn how to do transfer learning from scratch in Keras, you can read this article, which has a step-by-step implementation:
https://medium.com/@1297rohit/transfer-learning-from-scratch-using-keras-339834b153b9
In your case, since you are not dealing with images of the right size (or number of channels), you may want to cut out large parts of the VGG network while still keeping the information contained in the middle layers, though I am not sure how effective that would be.
You would need to remove the first convolutional layer and all the dense layers at the end, replacing them with your own layers. You would certainly need to retrain the whole network, so rather than transfer learning you would be doing a very smart initialization.
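One possible alternative sketch (not the only option: instead of removing the first convolution layer, it learns a 1x1 convolution that maps your 20 channels down to 3 and then applies the pre-trained VGG16 as a sub-model):
from keras import applications
from keras.layers import Input, Conv2D
from keras.models import Model

inputs = Input(shape=(180, 200, 20))
x = Conv2D(3, (1, 1), padding='same')(inputs)   # learned 20 -> 3 channel mapping
vgg = applications.VGG16(weights='imagenet', include_top=False,
                         input_shape=(180, 200, 3))  # 180x200 is above the 32 minimum
vgg.trainable = False                           # optionally freeze the pre-trained part
outputs = vgg(x)                                # apply VGG16 as a sub-model
model = Model(inputs, outputs)
model.summary()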
I have an image dataset in which each image has dimensions (2048, 1536). In ImageDataGenerator, to fetch data from the directory, I have used the same target size, i.e. (2048, 1536), but when building the first layer of a Sequential model, what input shape do I have to use? Will it be the same (2048, 1536), or can I take an arbitrary shape like (224, 224)?
You should probably flatten your input data into a vector of size 3145728 (2048 * 1536). If your data is in a NumPy array, you can use the array's flatten() method (see numpy flatten).
Then your first layer can have the same shape as this vector.
I would first resize the images with cv2.resize(). Do you really need all the information from such a big image?
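For example (a quick sketch; the file name is a placeholder):
import cv2

img = cv2.imread("example.jpg")           # placeholder file name
small = cv2.resize(img, (224, 224))       # note: cv2.resize takes (width, height)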
For a Sequential model, it would look like this, for example:
from keras import models, layers

model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(height, width, ndim)))
...
where height and width denote your input image dimensions and ndim = 1 for greyscale and ndim = 3 for colored images.
The first (i.e. input) layer is supposed to match the number of features in your dataset. For images, each pixel is considered a feature. In your case the image dimension is (2048, 1536), so you need to flatten it to get the total number of pixels (i.e. features). If it is a greyscale image that would be 2048*1536*1; if it is colour it would be 2048*1536*3.
Also, you can use the code below from the TensorFlow/Keras API when creating a Sequential model, and it will take care of the input layer size:
tf.keras.models.Sequential([tf.keras.layers.Flatten(),
tf.keras.layers.Dense(128, activation=tf.nn.relu),  # 1st hidden layer
tf.keras.layers.Dense(128, activation=tf.nn.relu),  # 2nd hidden layer
tf.keras.layers.Dense(2, activation=tf.nn.softmax)])  # output layer
This question exists as a GitHub issue, too.
I would like to build a neural network in Keras which contains both 2D convolutions and an LSTM layer.
The network should classify MNIST.
The training data in MNIST are 60000 grey-scale images of handwritten digits from 0 to 9. Each image is 28x28 pixels.
I've split the images into four parts (left/right, top/bottom) and rearranged them in four orders to get sequences for the LSTM.
| | |1 | 2|
|image| -> ------- -> 4 sequences: |1|2|3|4|, |4|3|2|1|, |1|3|2|4|, |4|2|3|1|
| | |3 | 4|
One of the small sub-images has the dimension 14 x 14. The four sequences are stacked together along the width (shouldn't matter whether width or height).
This creates a vector with the shape [60000, 4, 1, 56, 14] where:
60000 is the number of samples
4 is the number of elements in a sequence (# of timesteps)
1 is the depth of colors (greyscale)
56 and 14 are width and height
Now this should be given to a Keras model.
The problem is to change the input dimensions between the CNN and the LSTM.
I searched online and found this question: Python keras how to change the size of input after convolution layer into lstm layer
The solution seems to be a Reshape layer which flattens the image but retains the timesteps (as opposed to a Flatten layer which would collapse everything but the batch_size).
Here's my code so far:
nb_filters=32
kernel_size=(3,3)
pool_size=(2,2)
nb_classes=10
batch_size=64
model=Sequential()
model.add(Convolution2D(nb_filters, kernel_size[0], kernel_size[1],
border_mode="valid", input_shape=[1,56,14]))
model.add(Activation("relu"))
model.add(Convolution2D(nb_filters, kernel_size[0], kernel_size[1]))
model.add(Activation("relu"))
model.add(MaxPooling2D(pool_size=pool_size))
model.add(Reshape((56*14,)))
model.add(Dropout(0.25))
model.add(LSTM(5))
model.add(Dense(50))
model.add(Dense(nb_classes))
model.add(Activation("softmax"))
This code creates an error message:
ValueError: total size of new array must be unchanged
Apparently the input to the Reshape layer is incorrect. As an alternative, I tried to pass the timesteps to the Reshape layer, too:
model.add(Reshape((4,56*14)))
This doesn't feel right and in any case, the error stays the same.
Am I doing this the right way?
Is a Reshape layer the proper tool to connect a CNN and an LSTM?
There are rather complex approaches to this problem.
Such as this:
https://github.com/fchollet/keras/pull/1456
A TimeDistributed Layer which seems to hide the timestep dimension from following layers.
Or this: https://github.com/anayebi/keras-extra
A set of special layers for combining CNNs and LSTMs.
Why are there such complicated solutions (at least they seem complicated to me) if a simple Reshape does the trick?
UPDATE:
Embarrassingly, I forgot that the dimensions will be changed by the pooling and (for lack of padding) the convolutions, too.
kgrm advised me to use model.summary() to check the dimensions.
The output of the layer before the Reshape layer is (None, 32, 26, 5), so I changed the reshape to: model.add(Reshape((32*26*5,))).
Now the ValueError is gone, instead the LSTM complains:
Exception: Input 0 is incompatible with layer lstm_5: expected ndim=3, found ndim=2
It seems like I need to pass the timestep dimension through the entire network. How can I do that? If I add it to the input_shape of the convolution, it complains too: Convolution2D(nb_filters, kernel_size[0], kernel_size[1], border_mode="valid", input_shape=[4, 1, 56, 14])
Exception: Input 0 is incompatible with layer convolution2d_44: expected ndim=4, found ndim=5
According to the Convolution2D definition, your input must be 4-dimensional, with dimensions (samples, channels, rows, cols). This is the direct reason why you are getting the error.
To resolve this, you must use the TimeDistributed wrapper. This allows you to apply static (non-recurrent) layers across time.
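A minimal sketch of what that could look like for the setup in the question (using current Keras layer names such as Conv2D/padding instead of the older Convolution2D/border_mode, and assuming the channels-first layout of your data, which you may need to adjust for your backend):
from keras.models import Sequential
from keras.layers import (TimeDistributed, Conv2D, MaxPooling2D, Flatten,
                          Dropout, LSTM, Dense, Activation)

nb_filters = 32
kernel_size = (3, 3)
pool_size = (2, 2)
nb_classes = 10

model = Sequential()
# Input per sample: (timesteps, channels, height, width) = (4, 1, 56, 14)
model.add(TimeDistributed(Conv2D(nb_filters, kernel_size, padding='valid',
                                 activation='relu',
                                 data_format='channels_first'),
                          input_shape=(4, 1, 56, 14)))
model.add(TimeDistributed(Conv2D(nb_filters, kernel_size, activation='relu',
                                 data_format='channels_first')))
model.add(TimeDistributed(MaxPooling2D(pool_size=pool_size,
                                       data_format='channels_first')))
model.add(TimeDistributed(Flatten()))   # -> (batch, 4, 32*26*5), matching the summary above
model.add(Dropout(0.25))
model.add(LSTM(5))                      # consumes the 4-step sequence
model.add(Dense(50))
model.add(Dense(nb_classes))
model.add(Activation('softmax'))
model.summary()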