Keras - specify output shape of LSTM layer - python

I have a sequential model in Keras where the input and output data each have shape (10000, 10, 300):
model = Sequential()
model.add(LSTM(input_shape=(10,300), units=300, return_sequences=False, activation="sigmoid", kernel_initializer="glorot_normal", recurrent_initializer="glorot_normal"))
model.add(LSTM(return_sequences=True, units=300, activation="sigmoid", kernel_initializer="glorot_normal", recurrent_initializer="glorot_normal"))
model.compile(loss="cosine_proximity", optimizer="adam", metrics=["accuracy"])
I want the first layer to return only its last output (the result of every cell in the layer combined, if I understand correctly, because I need the context of the whole sequence) and feed it as input to the second layer; that's why I use return_sequences=False in the first layer. The second layer should take it and output a full sequence of shape (10, 300).
I receive this error:
ValueError: Input 0 is incompatible with layer lstm_2: expected ndim=3, found ndim=2
How can I specify correct output shape of the first layer for the second layer to accept it?
Or should I do it all differently?
If you need any further information, I will provide it.
Thank you for any reply.
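The thread shows no accepted answer, but a common way to bridge the 2D last-output and the 3D input the second LSTM expects is to tile that vector across timesteps with RepeatVector. A minimal sketch, not from the original thread, with the activation and initializer arguments carried over from the question:
from keras.models import Sequential
from keras.layers import LSTM, RepeatVector

model = Sequential()
# Encoder: collapse the (10, 300) sequence into a single 300-dim vector
model.add(LSTM(units=300, input_shape=(10, 300), return_sequences=False,
               activation="sigmoid", kernel_initializer="glorot_normal",
               recurrent_initializer="glorot_normal"))
# Repeat that vector 10 times so the next LSTM receives a (10, 300) tensor
model.add(RepeatVector(10))
# Decoder: emit the full output sequence of shape (10, 300)
model.add(LSTM(units=300, return_sequences=True, activation="sigmoid",
               kernel_initializer="glorot_normal",
               recurrent_initializer="glorot_normal"))
model.compile(loss="cosine_proximity", optimizer="adam", metrics=["accuracy"])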

Related

ValueError: Input 0 of layer global_average_pooling2d is incompatible with the layer: expected ndim=4, found ndim=2. Full shape received: [None, 128]

I load the saved model and, for fine-tuning, I add classification layers to the output of the loaded model. This is what I write:
import tensorflow as tf

def create_keras_model():
    model = tf.keras.models.load_model('model.h5', compile=False)
    resnet_output = model.output
    layer1 = tf.keras.layers.GlobalAveragePooling2D()(resnet_output)
    layer2 = tf.keras.layers.Dense(units=256, use_bias=False, name='nonlinear')(layer1)
    model_output = tf.keras.layers.Dense(units=2, use_bias=False, name='output', activation='relu')(layer2)
    model = tf.keras.Model(model.input, model_output)
    return model
but I find this error:
ValueError: Input 0 of layer global_average_pooling2d is incompatible with the layer: expected ndim=4, found ndim=2. Full shape received: [None, 128]
Can anyone please tell me where this error comes from and how I can resolve it?
Thanks!
I could have answered better if you had shared the model.h5 architecture, or at least its last layer.
In your case the input dimension is 2, whereas tf.keras.layers.GlobalAveragePooling2D() expects an input dimension of 4.
As per the tf.keras.layers.GlobalAveragePooling2D documentation, the layer expects the following input shape:
Input shape: If data_format='channels_last': 4D tensor with shape
(batch_size, rows, cols, channels). If data_format='channels_first':
4D tensor with shape (batch_size, channels, rows, cols).
In this TensorFlow tutorial, you will learn how to classify images of cats and dogs by using transfer learning from a pre-trained network along with fine-tuning.
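Since the error reports a 2D tensor [None, 128], the loaded model most likely already pools or flattens its features internally. A hedged sketch of one way to adapt, assuming that is the case (the structure of model.h5 is unknown, so this is an illustration, not a confirmed fix):
import tensorflow as tf

def create_keras_model():
    base = tf.keras.models.load_model('model.h5', compile=False)
    x = base.output
    # A 2D output (batch, features) means pooling/flattening already happened
    # inside model.h5, so GlobalAveragePooling2D must be skipped in that case.
    if len(x.shape) == 4:
        x = tf.keras.layers.GlobalAveragePooling2D()(x)
    x = tf.keras.layers.Dense(units=256, use_bias=False, name='nonlinear')(x)
    out = tf.keras.layers.Dense(units=2, use_bias=False, name='output', activation='relu')(x)
    return tf.keras.Model(base.input, out)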

How to get only last output of sequence model in Keras?

I trained a Many-to-Many sequence model in Keras with return_sequences=True and a TimeDistributed wrapper on the last Dense layer:
model = Sequential()
model.add(Embedding(input_dim=vocab_size, output_dim=50))
model.add(LSTM(100, return_sequences=True))
model.add(TimeDistributed(Dense(vocab_size, activation='softmax')))
# train...
model.save_weights("weights.h5")
So during training the loss is calculated over all hidden states (at every timestep). But for inference I only need the output at the last timestep. So I load the weights into a Many-to-One sequence model for inference, without the TimeDistributed wrapper, and I set return_sequences=False to get only the last output of the LSTM layer:
inference_model = Sequential()
inference_model.add(Embedding(input_dim=vocab_size, output_dim=50))
inference_model.add(LSTM(100, return_sequences=False))
inference_model.add(Dense(vocab_size, activation='softmax'))
inference_model.load_weights("weights.h5")
When I test my inference model on a sequence of length 20, I expect to get a prediction of shape (vocab_size,), but inference_model.predict(...) still returns predictions for every timestep - a tensor of shape (20, vocab_size).
If, for whatever reason, you need only the last timestep during inference, you can build a new model which applies the trained model on the input and returns the last timestep as its output using the Lambda layer:
from keras.models import Model
from keras.layers import Input, Lambda
inp = Input(shape=put_the_input_shape_here)
x = model(inp) # apply trained model on the input
out = Lambda(lambda x: x[:,-1])(x)
inference_model = Model(inp, out)
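For instance, a quick shape check (a sketch; vocab_size comes from the question, and it assumes put_the_input_shape_here was set to (20,) for the length-20 sequences):
import numpy as np

x = np.random.randint(0, vocab_size, size=(1, 20))  # one sequence of 20 token ids
preds = inference_model.predict(x)
print(preds.shape)  # (1, vocab_size): only the last timestep remains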
Side Note: As already stated in this answer, TimeDistributed(Dense(...)) and Dense(...) are equivalent, since the Dense layer is applied on the last dimension of its input tensor. That's why you get the same output shape.

Using embedding layer with GRU

I have working code that uses a GRU, creating the input manually as a 3D array of shape (None, 10, 64). The code is:
model = Sequential()
model.add(GRU(300, return_sequences=False, input_shape=(None, 64)))
model.add(Dropout(0.8))
model.add(Dense(64, input_dim=300))
model.add(Activation("linear"))
This returns the predicted embedding given the input window. Now I want to use the Keras Embedding layer in front of the GRU. My idea is to input a 2D array (None, 10) and use the embedding layer to convert each sample to the corresponding embedding vector.
So now I have this:
model = Sequential()
model.add(Embedding(vocab_size, 64, weights=[embedding_matrix], input_length=10, trainable=False))
model.add(GRU(300, return_sequences=False))
model.add(Dropout(0.8))
model.add(Dense(64))
model.add(Activation("linear"))
I see from the summary that the output of the embedding layer is:
embedding_2 (Embedding) (None, 10, 64)
which is what I expected. But when I try to fit the model I get this error:
expected activation_2 to have shape (64,) but got array with shape (1,)
If I comment the other layers and leave only the embedding and gru I get:
expected gru_5 to have shape (300,) but got array with shape (1,)
So my question is: what is the difference between fitting a manually constructed 3D array and one generated by an embedding layer?
Your model reflects the desired computation; however, the error is in the Y you are passing to the model. You are passing a scalar target instead of an array of size (64,). To clarify: your inputs should be sequences of integers, but your targets still need to be vectors.
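A minimal sketch of the shapes fit should receive (X, Y, the loss, and vocab_size are illustrative here, not from the original thread):
import numpy as np

X = np.random.randint(0, vocab_size, size=(1000, 10))  # integer sequences of length 10
Y = np.random.rand(1000, 64)                           # 64-dim target vectors, not scalars
model.compile(loss='mse', optimizer='adam')            # assuming a regression-style loss
model.fit(X, Y, batch_size=32, epochs=10)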
Also, Dense by default has linear activation, so you don't need the Activation('linear') after Dense(64).

Keras confusion about number of layers

I'm a bit confused about the number of layers that are used in Keras models. The documentation is rather opaque on the matter.
According to Jason Brownlee, the first layer technically consists of two layers: the input layer, specified by input_dim, and a hidden layer. See the first questions on his blog.
In all of the Keras documentation the first layer is generally specified as
model.add(Dense(number_of_neurons, input_dim=number_of_cols_in_input, activation=some_activation_function))
The most basic model we could make would therefore be:
model = Sequential()
model.add(Dense(1, input_dim = 100, activation = None))
Does this model consist of a single layer, where 100 dimensional input is passed through a single input neuron, or does it consist of two layers, first a 100 dimensional input layer and second a 1 dimensional hidden layer?
Further, if I were to specify a model like this, how many layers does it have?
model = Sequential()
model.add(Dense(32, input_dim = 100, activation = 'sigmoid'))
model.add(Dense(1))
Is this a model with 1 input layer, 1 hidden layer, and 1 output layer or is this a model with 1 input layer and 1 output layer?
Your first one consists of a 100-neuron input layer connected to a single output neuron.
Your second one consists of a 100-neuron input layer, one hidden layer of 32 neurons, and one output layer with a single neuron.
You have to think of your first layer as your input layer (with the same number of neurons as the input dimension, so 100 for you) connected to another layer with as many neurons as you specify (1 in your first case, 32 in the second).
In Keras, a useful command is
model.summary()
For your first question, the model is:
1 input layer and 1 output layer.
For the second question:
1 input layer
1 hidden layer
1 activation layer (The sigmoid one)
1 output layer
For the input layer, this is abstracted by Keras with the input_dim arg or input_shape, but you can find this layer in:
from keras.layers import Input
Same for the activation layer.
from keras.layers import Activation
# Create a `Sequential` model and add a Dense layer as the first layer.
model = tf.keras.models.Sequential()
model.add(tf.keras.Input(shape=(16,)))
model.add(tf.keras.layers.Dense(32, activation='relu'))
# Now the model will take as input arrays of shape (None, 16)
# and output arrays of shape (None, 32).
# Note that after the first layer, you don't need to specify
# the size of the input anymore:
model.add(tf.keras.layers.Dense(32))
model.output_shape
(None, 32)
model.layers
[<keras.layers.core.dense.Dense at 0x7f494062e950>,
<keras.layers.core.dense.Dense at 0x7f4944048d90>]
model.summary()
Its output may help you understand the layer structure clearly.
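To make the decomposition concrete, here is a sketch of the questioner's second model with the input and activation written as explicit layers (an illustration equivalent to the compact form, not code from the thread):
import tensorflow as tf

model = tf.keras.models.Sequential()
model.add(tf.keras.Input(shape=(100,)))            # input "layer": only a shape, no weights
model.add(tf.keras.layers.Dense(32))               # hidden layer of 32 neurons
model.add(tf.keras.layers.Activation('sigmoid'))   # activation as its own layer
model.add(tf.keras.layers.Dense(1))                # output layer with a single neuron
model.summary()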

Input Shape Error in Second-layer (but not first) of Keras LSTM

I am trying to build an LSTM model, working off the documentation example at https://keras.io/layers/recurrent/
from keras.models import Sequential
from keras.layers import LSTM
The following three lines of code (plus comment) are taken directly from the documentation link above:
model = Sequential()
model.add(LSTM(32, input_dim=64, input_length=10))
# for subsequent layers, no need to specify the input size:
model.add(LSTM(16))
ValueError: Input 0 is incompatible with layer lstm_2: expected ndim=3, found ndim=2
I get that error above after executing the second model.add() statement, but before exposing the model to my data, or even compiling it.
What am I doing wrong here? I'm using Keras 1.2.1.
Edit
Just upgraded to the current 1.2.2; still having the same issue.
Thanks to patyork for answering this on Github:
the second LSTM layer is not getting the 3D input that it expects (with a shape of (batch_size, timesteps, features)). This is because the first LSTM layer has (by fortune of default values) return_sequences=False, meaning it only outputs the feature set from the last timestep, which has shape (batch_size, 32) - 2 dimensions, with no time axis.
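To see the difference concretely, a quick sketch comparing the two settings (using the Keras 1-style arguments from the question):
from keras.models import Sequential
from keras.layers import LSTM

# return_sequences=False (the default): one vector per sequence, 2D output
m1 = Sequential()
m1.add(LSTM(32, input_dim=64, input_length=10))
print(m1.output_shape)  # (None, 32)

# return_sequences=True: one vector per timestep, 3D output
m2 = Sequential()
m2.add(LSTM(32, input_dim=64, input_length=10, return_sequences=True))
print(m2.output_shape)  # (None, 10, 32)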
So to offer a code example of how to use a stacked LSTM to achieve many-to-one (return_sequences=False) sequence classification, just make sure to use return_sequences=True on the intermediate layers like this:
model = Sequential()
model.add(LSTM(32, input_dim=64, input_length=10, return_sequences=True))
model.add(LSTM(24, return_sequences=True))
model.add(LSTM(16, return_sequences=True))
model.add(LSTM(1, return_sequences=False))
model.compile(optimizer = 'RMSprop', loss = 'categorical_crossentropy')
(no errors)
