I have a model which works with Conv2D using Keras, but I would like to add an LSTM layer. This is the data I am using:
x_train with shape (13984, 334, 35, 1)
y_train with shape (13984, 5)
My model without LSTM is:
from keras.layers import Input, Conv2D, Flatten, Dense
from keras.models import Model
inputs = Input(name='input',shape=(334,35,1))
layer = Conv2D(64, kernel_size=3,activation='relu',data_format='channels_last')(inputs)
layer = Flatten()(layer)
predictions = Dense(5, activation='softmax')(layer)
network = Model(inputs=inputs, outputs=predictions)
network.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
What is the correct way of adding an LSTM layer just before the Dense layer?
I tried to use TimeDistributed or Reshape/Permute but I always get errors.
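It seems like your question is similar to one that I had yesterday. The answer can be found here: Keras functional API: Combine CNN model with a RNN to to look at sequences of images
The method explained by the user deKeijzer works. I found another way to solve the problem: use a Reshape layer just after the last Conv2D layer to turn its 4D output into a 3D (timesteps, features) tensor, and then add LSTM layers. For the model above, Conv2D(64, kernel_size=3) with the default padding='valid' outputs (332, 33, 64), so the target shape would be (332, 33 * 64) rather than the input's (334, 35).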
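A minimal sketch of the Reshape approach, assuming tf.keras (TensorFlow 2.x or Keras 3). It treats the 332 rows of the convolution output as timesteps and merges the other two axes into features; the LSTM size of 32 and the layer name 'to_sequence' are illustrative choices, not from the original.

```python
import numpy as np
from tensorflow.keras.layers import Input, Conv2D, Reshape, LSTM, Dense
from tensorflow.keras.models import Model

inputs = Input(name='input', shape=(334, 35, 1))
# Valid padding with a 3x3 kernel trims one row/column at each border:
# (334, 35, 1) -> (332, 33, 64)
layer = Conv2D(64, kernel_size=3, activation='relu', data_format='channels_last')(inputs)
# Rows become timesteps, columns x channels become features:
# (332, 33, 64) -> (332, 33 * 64)
layer = Reshape((332, 33 * 64), name='to_sequence')(layer)
# The LSTM reads the 332 timesteps and returns only its last output: (32,)
layer = LSTM(32)(layer)
predictions = Dense(5, activation='softmax')(layer)

network = Model(inputs=inputs, outputs=predictions)
network.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

# Two dummy samples, only to check the output shape
print(network.predict(np.zeros((2, 334, 35, 1)), verbose=0).shape)  # (2, 5)
```

Reshape((332, -1)) would also work and lets Keras infer the feature size; which axis you treat as time is a modelling choice.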
I trained a Many-to-Many sequence model in Keras with return_sequences=True and TimeDistributed wrapper on the last Dense layer:
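from keras.models import Sequential
from keras.layers import Embedding, LSTM, TimeDistributed, Dense
model = Sequential()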
model.add(Embedding(input_dim=vocab_size, output_dim=50))
model.add(LSTM(100, return_sequences=True))
model.add(TimeDistributed(Dense(vocab_size, activation='softmax')))
# train...
model.save_weights("weights.h5")
So during training the loss is calculated over all hidden states (at every timestep). But for inference I only need the output at the last timestep. So I load the weights into a Many-to-One sequence model for inference, without the TimeDistributed wrapper, and I set return_sequences=False to get only the last output of the LSTM layer:
inference_model = Sequential()
inference_model.add(Embedding(input_dim=vocab_size, output_dim=50))
inference_model.add(LSTM(100, return_sequences=False))
inference_model.add(Dense(vocab_size, activation='softmax'))
inference_model.load_weights("weights.h5")
When I test my inference model on a sequence of length 20, I expect a prediction of shape (vocab_size,), but inference_model.predict(...) still returns predictions for every timestep: a tensor of shape (20, vocab_size).
If, for whatever reason, you need only the last timestep during inference, you can build a new model which applies the trained model on the input and returns the last timestep as its output using the Lambda layer:
from keras.models import Model
from keras.layers import Input, Lambda
inp = Input(shape=put_the_input_shape_here)
x = model(inp) # apply trained model on the input
out = Lambda(lambda x: x[:,-1])(x)
inference_model = Model(inp, out)
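A self-contained sketch of this approach, assuming tf.keras; vocab_size = 100, the layer sizes and the random token ids are hypothetical stand-ins for the asker's setup, and Input(shape=(None,)) is one possible choice for put_the_input_shape_here (variable-length integer sequences). It checks that the wrapper returns exactly the last timestep of the trained model's output.

```python
import numpy as np
from tensorflow.keras.layers import Input, Embedding, LSTM, TimeDistributed, Dense, Lambda
from tensorflow.keras.models import Model, Sequential

vocab_size = 100  # hypothetical vocabulary size

# Many-to-many model, as it would be trained
model = Sequential([
    Input(shape=(None,)),  # variable-length sequences of token ids
    Embedding(input_dim=vocab_size, output_dim=50),
    LSTM(100, return_sequences=True),
    TimeDistributed(Dense(vocab_size, activation='softmax')),
])

# Many-to-one wrapper: reuse the trained model and keep only the last timestep
inp = Input(shape=(None,))
x = model(inp)                       # (batch, timesteps, vocab_size)
out = Lambda(lambda t: t[:, -1])(x)  # (batch, vocab_size)
inference_model = Model(inp, out)

# One sequence of 20 token ids, with an explicit batch axis of size 1
seq = np.random.randint(0, vocab_size, size=(1, 20))
print(inference_model.predict(seq, verbose=0).shape)  # (1, 100)
```

Because the wrapper calls the trained model itself, no weights need to be copied; the first axis of the input is always the batch axis.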
Side note: as already stated in this answer, TimeDistributed(Dense(...)) and Dense(...) are equivalent, since the Dense layer is applied to the last dimension of its input tensor. That is why you get the same output shape.
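A quick check of that equivalence, assuming tf.keras; the input shape (2, 10, 8) and the 4 units are arbitrary. Both layers are given the same kernel and bias, and their outputs on a 3D input then match.

```python
import numpy as np
from tensorflow.keras.layers import Dense, TimeDistributed

x = np.random.rand(2, 10, 8).astype('float32')  # (batch, timesteps, features)

dense = Dense(4)
td = TimeDistributed(Dense(4))

y_dense = np.asarray(dense(x))       # calling the layer builds it
td(x)                                # build TimeDistributed so it has weights
td.set_weights(dense.get_weights())  # copy kernel (8, 4) and bias (4,)
y_td = np.asarray(td(x))

print(y_dense.shape, y_td.shape)              # (2, 10, 4) (2, 10, 4)
print(np.allclose(y_dense, y_td, atol=1e-5))  # True
```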
In my binary multilabel sequence classification problem, I have 22 timesteps in each input sentence. After adding a 200-dimensional word embedding to each timestep, my current input shape is (number of input sentences, 22, 200). My output shape would be (number of input sentences, 4), e.g. [1,0,0,1].
My first question is: how do I build the Keras LSTM model so that it accepts 3D input and outputs 2D results? The following code produces this error:
ValueError: Error when checking target: expected dense_41 to have 3 dimensions, but got array with shape (7339, 4)
My second question is: when I add a TimeDistributed layer, should I set the number of units in the Dense layer to the number of input features, which in my case is 200?
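import datetime
from sklearn.model_selection import train_test_split
from keras.models import Sequential
from keras.layers import Embedding, LSTM, TimeDistributed, Dense
X_train, X_test, y_train, y_test = train_test_split(padded_docs2, new_y, test_size=0.33, random_state=42)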
start = datetime.datetime.now()
print(start)
# define the model
model = Sequential()
e = Embedding(input_dim=vocab_size2, input_length=22, output_dim=200, weights=[embedding_matrix2], trainable=False)
model.add(e)
model.add(LSTM(128, input_shape=(X_train.shape[1],200),dropout=0.2, recurrent_dropout=0.1, return_sequences=True))
model.add(TimeDistributed(Dense(200)))
model.add(Dense(y_train.shape[1],activation='sigmoid'))
# compile the model
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['acc'])
# summarize the model
print(model.summary())
# fit the model
model.fit(X_train, y_train, epochs=300, verbose=0)
end = datetime.datetime.now()
print(end)
print('Time taken to build the model: ', end-start)
Please let me know if I have missed out any information, thanks.
Your model's LSTM layer receives a 3D sequence and, with return_sequences=True, produces a 3D output. The same goes for the TimeDistributed layer. If you want the LSTM to return a 2D tensor, set return_sequences=False; then you don't need the TimeDistributed wrapper. With this setup your model would be:
model = Sequential()
e = Embedding(input_dim=vocab_size2, input_length=22, output_dim=200, weights=[embedding_matrix2], trainable=False)
model.add(e)
model.add(LSTM(128, input_shape=(X_train.shape[1],200),dropout=0.2, recurrent_dropout=0.1, return_sequences=False))
model.add(Dense(200))
model.add(Dense(y_train.shape[1],activation='sigmoid'))
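A runnable sketch of this corrected model, assuming tf.keras. vocab_size2, embedding_matrix2 and the random data are hypothetical stand-ins for the asker's variables. As a swap, the pretrained matrix is loaded with a Constant initializer instead of weights=[...], which should have the same effect, and an Input layer replaces input_length/input_shape.

```python
import numpy as np
from tensorflow.keras.initializers import Constant
from tensorflow.keras.layers import Input, Embedding, LSTM, Dense
from tensorflow.keras.models import Sequential

vocab_size2 = 1000  # hypothetical vocabulary size
# Hypothetical stand-in for the pretrained 200-dimensional word vectors
embedding_matrix2 = np.random.rand(vocab_size2, 200).astype('float32')

model = Sequential([
    Input(shape=(22,)),  # 22 token ids per sentence
    Embedding(input_dim=vocab_size2, output_dim=200,
              embeddings_initializer=Constant(embedding_matrix2), trainable=False),
    # return_sequences=False: only the last hidden state, shape (batch, 128)
    LSTM(128, dropout=0.2, recurrent_dropout=0.1, return_sequences=False),
    Dense(200),
    Dense(4, activation='sigmoid'),  # 4 independent labels, e.g. [1, 0, 0, 1]
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['acc'])

# Dummy batch: 8 sentences of 22 token ids and 8 multi-hot targets
X = np.random.randint(0, vocab_size2, size=(8, 22))
y = np.random.randint(0, 2, size=(8, 4)).astype('float32')
model.fit(X, y, epochs=1, verbose=0)  # 2D targets now match the 2D output
print(model.predict(X, verbose=0).shape)  # (8, 4)
```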
Edit:
TimeDistributed applies a given layer to each temporal slice of its input. In your case, for example, the temporal dimension is X_train.shape[1]. Let's assume X_train.shape[1] == 10 and consider the following line:
model.add(TimeDistributed(Dense(200)))
Here the TimeDistributed wrapper applies the same Dense(200) layer, with a single shared set of weights, to each of the 10 temporal slices. So for each timestep you get an output of shape (batch_size, 200), and the final output tensor has shape (batch_size, 10, 200). But you said you want a 2D output, so TimeDistributed won't give you a 2D output from a 3D input.
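One way to see that the weights are shared, assuming tf.keras and the sizes from this example (10 timesteps, 128 features coming out of the LSTM): TimeDistributed(Dense(200)) holds a single kernel and bias, 128 * 200 + 200 = 25,800 parameters, not ten times that.

```python
import numpy as np
from tensorflow.keras.layers import Input, TimeDistributed, Dense
from tensorflow.keras.models import Model

inp = Input(shape=(10, 128))  # (batch, timesteps=10, features=128), e.g. an LSTM output
td = TimeDistributed(Dense(200))
out = td(inp)                 # (batch, 10, 200)
model = Model(inp, out)

print(td.count_params())  # 25800 = 128 * 200 + 200, one shared Dense layer
print(model.predict(np.zeros((3, 10, 128)), verbose=0).shape)  # (3, 10, 200)
```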
The other option is to remove the TimeDistributed wrapper and use only Dense, like this:
model.add(Dense(200))
Then the Dense layer effectively flattens the input to shape (batch_size * 10, 128), 128 being the number of features the LSTM outputs, and computes the dot product with its kernel. After the dot product, it reshapes the output back to (batch_size, 10, 200): the leading dimensions of the input are kept and only the last one becomes the number of units. So it is still a 3D tensor.
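The reshape, multiply, reshape view can be checked with plain NumPy. This illustrates the arithmetic only, not Keras's internal code, which may instead use a tensordot or a broadcasting matmul over the last axis (equivalent results). The sizes assume the 128-unit LSTM, 10 timesteps and Dense(200).

```python
import numpy as np

batch_size, timesteps, features, units = 4, 10, 128, 200
x = np.random.rand(batch_size, timesteps, features)  # e.g. LSTM output with return_sequences=True
kernel = np.random.rand(features, units)             # Dense(200) kernel
bias = np.random.rand(units)                         # Dense(200) bias

# 1. Merge batch and time: (batch_size * 10, 128)
flat = x.reshape(batch_size * timesteps, features)
# 2. Fully connected layer on every row: (batch_size * 10, 200)
flat_out = flat @ kernel + bias
# 3. Split batch and time again: (batch_size, 10, 200)
y = flat_out.reshape(batch_size, timesteps, units)

# Same result as applying the kernel directly to the last axis
y_direct = x @ kernel + bias
print(y.shape)                   # (4, 10, 200)
print(np.allclose(y, y_direct))  # True
```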
But if you want to keep the first LSTM layer as it is (with return_sequences=True), you can replace the TimeDistributed layer with another LSTM layer with return_sequences set to False. Now your model would look like this:
model = Sequential()
e = Embedding(input_dim=vocab_size2, input_length=22, output_dim=200, weights=[embedding_matrix2], trainable=False)
model.add(e)
model.add(LSTM(128, input_shape=(X_train.shape[1],200),dropout=0.2, recurrent_dropout=0.1, return_sequences=True))
model.add(LSTM(200, input_shape=(X_train.shape[1],200),dropout=0.2, recurrent_dropout=0.1, return_sequences=False))
model.add(Dense(y_train.shape[1],activation='sigmoid'))
I'm trying to do a simple binary classification problem using Keras and its pre-built ImageNet CNN architecture.
For VGG16, I took the following approach:
import keras
from keras.models import Sequential
from keras.layers import Dense
vgg16_model = keras.applications.vgg16.VGG16()
# Rebuild the vgg16 using an empty sequential model
model = Sequential()
for layer in vgg16_model.layers:
    model.add(layer)
# Since the problem is binary, I got rid of the output layer and added a more appropriate output layer.
model.pop()
# Freeze other pre-trained weights
for layer in model.layers:
    layer.trainable = False
# Add the modified final layer
model.add(Dense(2, activation = 'softmax'))
And this worked marvelously with higher accuracy than my custom built CNN. But it took a while to train and I wanted to take a similar approach using Xception and InceptionV3 since they were lighter models with higher accuracy.
xception_model = keras.applications.xception.Xception()
model = Sequential()
for layer in xception_model.layers:
    model.add(layer)
When I run the above code, I get the following error:
ValueError: Input 0 is incompatible with layer conv2d_193: expected axis -1 of input shape to have value 64 but got shape (None, None, None, 128)
Basically, I would like to do the same thing as I did with VGG16 model; keep the other pretrained weights as they are and simply modify the output layer to a binary classification output instead of an output layer with 1000 outcomes. I can see that unlike VGG16, which has relatively straightforward convolution layer structure, Xception and InceptionV3 have some funky nodes that I'm not 100% familiar with and I'm assuming those are causing issues.
Your code fails because InceptionV3 and Xception are not Sequential models (i.e., they contain "branches"). So you can't just add the layers into a Sequential container.
Now since the top layers of both InceptionV3 and Xception consist of a GlobalAveragePooling2D layer and the final Dense(1000) layer,
if include_top:
    x = GlobalAveragePooling2D(name='avg_pool')(x)
    x = Dense(classes, activation='softmax', name='predictions')(x)
if you want to remove the final dense layer, you can just set include_top=False plus pooling='avg' when creating these models.
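from keras.applications.inception_v3 import InceptionV3
from keras.layers import Dense
from keras.models import Model
base_model = InceptionV3(include_top=False, pooling='avg')
for layer in base_model.layers:
    layer.trainable = False
output = Dense(2, activation='softmax')(base_model.output)
model = Model(base_model.input, output)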
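The same idea applied to Xception, as a self-contained sketch assuming tf.keras. weights=None is used here only so the sketch runs without downloading the ImageNet weights; in practice you would keep the default weights='imagenet'. Setting base_model.trainable = False freezes every layer at once, which should be equivalent to the loop above, and the 71x71 dummy image is the smallest size Xception accepts.

```python
import numpy as np
from tensorflow.keras.applications.xception import Xception
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Model

# include_top=False drops the 1000-class Dense layer; pooling='avg' keeps the
# global average pooling, so base_model.output is a 2D (batch, 2048) tensor
base_model = Xception(weights=None, include_top=False, pooling='avg')
base_model.trainable = False  # freeze all pretrained layers

output = Dense(2, activation='softmax')(base_model.output)
model = Model(base_model.input, output)
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

print(len(model.trainable_weights))  # 2: only the new Dense kernel and bias
print(model.predict(np.zeros((1, 71, 71, 3)), verbose=0).shape)  # (1, 2)
```

Because the model is built with the functional API instead of a Sequential container, Xception's branches stay connected as in the original graph.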
I am trying to use a Conv1D and Bidirectional LSTM in keras (much like in this question) for signal processing, but doing a multiclass classification of each time step.
The problem is that even though the shapes used by Conv1D and LSTM are somewhat equivalent:
Conv1D: (batch, length, channels)
LSTM: (batch, timeSteps, features)
The output length of the Conv1D (with the default padding='valid') is (length - kernel_size) // strides + 1, so the sequence no longer has 1000 timesteps and doesn't match the LSTM shape anymore, even without using MaxPooling1D and Dropout.
To be more specific, my training set X has n samples with 1000 time steps and one channel (n_samples, 1000, 1), and I used LabelEncoder and OneHotEncoder so y has n samples, 1000 time steps and 5 one hot encoded classes (n_samples, 1000, 5).
Since one class is much more prevalent than the others (it is actually the absence of signal), I am using loss='sparse_categorical_crossentropy', sample_weight_mode="temporal" and sample_weight to give a higher weight to time steps containing meaningful classes.
model = Sequential()
model.add(Conv1D(128, 3, strides=1, input_shape = (1000, 1), activation = 'relu'))
model.add(Bidirectional(LSTM(128, return_sequences=True)))
model.add(TimeDistributed(Dense(5, activation='softmax')))
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['categorical_accuracy'], sample_weight_mode="temporal")
print(model.summary())
When I try to fit the model I get this error message:
Error when checking target: expected time_distributed_1 to have shape (None, 998, 1) but got array with shape (100, 1000, 5).
Is there a way to make such a neural network configuration work?
Your convolution is cutting the tips of the sequence. Use padding='same' in the convolutional layers.
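The effect of padding on the sequence length can be computed with a small helper. This is a sketch of the standard output-length formula for dilation 1 (the function name conv1d_output_length is hypothetical), showing why padding='valid' turns 1000 steps into 998 with kernel_size=3 while padding='same' keeps 1000.

```python
def conv1d_output_length(length, kernel_size, strides=1, padding='valid'):
    """Number of output steps of a 1D convolution with dilation 1."""
    if padding == 'valid':
        # Only positions where the whole kernel fits inside the input
        return (length - kernel_size) // strides + 1
    if padding == 'same':
        # Input is zero-padded, so only the stride shrinks the sequence
        return (length + strides - 1) // strides  # ceil(length / strides)
    raise ValueError(f"unsupported padding: {padding!r}")

print(conv1d_output_length(1000, 3))                  # 998
print(conv1d_output_length(1000, 3, padding='same'))  # 1000
print(conv1d_output_length(1000, 3, strides=2))       # 499
```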
The message, though, doesn't seem to fit your model. Your model clearly has 5 output features (because of Dense(5)), but the message says it expects 1. This is probably because of the "sparse" crossentropy, which expects integer class indices rather than one-hot vectors. Given the one-hot format of your data, you should probably use "categorical_crossentropy".
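A sketch of the model with both fixes, assuming tf.keras (TensorFlow 2.x or Keras 3): padding='same' keeps all 1000 timesteps and categorical_crossentropy matches the one-hot targets. The random signals, the event at steps 400 to 449 and the weight of 5.0 are hypothetical. sample_weight_mode is left out on the assumption that your version accepts a 2D (samples, timesteps) sample_weight directly, as recent versions do; with older Keras you may still need sample_weight_mode="temporal".

```python
import numpy as np
from tensorflow.keras.layers import Input, Conv1D, Bidirectional, LSTM, TimeDistributed, Dense
from tensorflow.keras.models import Sequential

model = Sequential([
    Input(shape=(1000, 1)),
    # padding='same' keeps all 1000 timesteps (valid padding would leave 998)
    Conv1D(128, 3, strides=1, padding='same', activation='relu'),
    Bidirectional(LSTM(128, return_sequences=True)),  # (batch, 1000, 256)
    TimeDistributed(Dense(5, activation='softmax')),   # (batch, 1000, 5)
])
# One-hot targets of shape (n, 1000, 5) go with categorical_crossentropy
model.compile(loss='categorical_crossentropy', optimizer='adam',
              metrics=['categorical_accuracy'])

# Hypothetical data: 2 signals, class 0 ("no signal") except a short event of class 3
X = np.random.rand(2, 1000, 1).astype('float32')
labels = np.zeros((2, 1000), dtype=int)
labels[:, 400:450] = 3
y = np.eye(5, dtype='float32')[labels]  # one-hot: (2, 1000, 5)

# Temporal weights: one weight per timestep, larger for the meaningful classes
sample_weight = np.where(labels == 0, 1.0, 5.0)  # (2, 1000)
model.fit(X, y, sample_weight=sample_weight, epochs=1, verbose=0)
print(model.predict(X, verbose=0).shape)  # (2, 1000, 5)
```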
I am trying to build an LSTM model, working off the documentation example at https://keras.io/layers/recurrent/
from keras.models import Sequential
from keras.layers import LSTM
The following three lines of code (plus comment) are taken directly from the documentation link above:
model = Sequential()
model.add(LSTM(32, input_dim=64, input_length=10))
# for subsequent layers, no need to specify the input size:
model.add(LSTM(16))
ValueError: Input 0 is incompatible with layer lstm_2: expected ndim=3, found ndim=2
I get that error above after executing the second model.add() statement, but before exposing the model to my data, or even compiling it.
What am I doing wrong here? I'm using Keras 1.2.1.
Edit
Just upgraded to current 1.2.2, still having same issue.
Thanks to patyork for answering this on GitHub:
the second LSTM layer is not getting the 3D input it expects (with a shape of (batch_size, timesteps, features)). This is because the first LSTM layer has (by default) return_sequences=False, meaning it only outputs the last feature set, at the final timestep, which has shape (batch_size, 32): 2 dimensions that don't include time.
So to offer a code example of how to use a stacked LSTM to achieve many-to-one (return_sequences=False) sequence classification, just make sure to use return_sequences=True on the intermediate layers like this:
model = Sequential()
model.add(LSTM(32, input_dim=64, input_length=10, return_sequences=True))
model.add(LSTM(24, return_sequences=True))
model.add(LSTM(16, return_sequences=True))
model.add(LSTM(1, return_sequences=False))
model.compile(optimizer = 'RMSprop', loss = 'categorical_crossentropy')
(no errors)
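The shape difference described above can be printed directly. This sketch assumes tf.keras and uses Input(shape=(10, 64)) in place of the Keras 1 arguments input_dim=64, input_length=10; the layer sizes follow the example above.

```python
import numpy as np
from tensorflow.keras.layers import Input, LSTM
from tensorflow.keras.models import Sequential

x = np.zeros((3, 10, 64), dtype='float32')  # (batch_size, timesteps, features)

# Default return_sequences=False: only the last output, no time axis
last_only = Sequential([Input(shape=(10, 64)), LSTM(32)])
print(last_only.predict(x, verbose=0).shape)  # (3, 32): 2D, cannot feed another LSTM

# return_sequences=True: one output per timestep, still 3D
full_seq = Sequential([Input(shape=(10, 64)), LSTM(32, return_sequences=True)])
print(full_seq.predict(x, verbose=0).shape)  # (3, 10, 32)

# Stacked version: intermediate layers keep the time axis, the last one drops it
stacked = Sequential([
    Input(shape=(10, 64)),
    LSTM(32, return_sequences=True),
    LSTM(24, return_sequences=True),
    LSTM(16, return_sequences=True),
    LSTM(1, return_sequences=False),
])
print(stacked.predict(x, verbose=0).shape)  # (3, 1)
```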