I'm trying to build an RNN for text generation, and I'm stuck at building my LSTM cell. The data is shaped like this: X is the input sparse matrix of dimension (90809, 2700) and Y is the output matrix of dimension (90809, 27). The following is my code for defining the LSTM cell:
from keras.models import Sequential
from keras.layers import LSTM, Dropout, Dense

model = Sequential()
model.add(LSTM(128, input_shape=(X.shape[0], X.shape[1])))
model.add(Dropout(0.2))
model.add(Dense(Y.shape[1], activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam')
My understanding is that the input_shape should be the dimension of the input matrix, and the dense layer should be the size of the output for each observation, i.e. 27 in this case. However, I get the following error:
Exception: Error when checking model input: expected lstm_input_3 to have 3 dimensions, but got array with shape (90809, 2700)
I'm not able to figure out what is going wrong. Can anyone please help me figure out why the lstm_input is expecting 3 dimensions?
I tried the following as well:
X= np.reshape(np.asarray(dataX), (n_patterns, n_vocab*seq_length,1))
Y=np.reshape(np.asarray(dataY), (n_patterns, n_vocab,1))
This gave me the following error:
Exception: Error when checking model input: expected lstm_input_7 to have shape (None, 90809, 2700) but got array with shape (90809, 2700, 1)
Any help will be appreciated. Thanks!
You should read about the difference between input_shape, batch_input_shape and input_dim here.
For input_shape, we don't need to define the batch size. This is how your LSTM layer should look:
model.add(LSTM(128, input_shape=(X.shape[1], 1)))
or
model.add(LSTM(128, batch_input_shape=(X.shape[0], X.shape[1], 1)))
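Putting it together, here is a minimal sketch of the reshape plus model, assuming each of the 2700 columns is treated as one timestep with a single feature and that the sparse X fits in memory once densified:
import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Dropout, Dense

# Assumption: densify the sparse matrix and add a trailing feature axis,
# so each sample becomes a sequence of 2700 timesteps with 1 feature each.
X3d = np.asarray(X.todense()).reshape(X.shape[0], X.shape[1], 1)  # (90809, 2700, 1)

model = Sequential()
model.add(LSTM(128, input_shape=(X3d.shape[1], 1)))  # (timesteps, features); batch size omitted
model.add(Dropout(0.2))
model.add(Dense(Y.shape[1], activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam')
model.fit(X3d, Y)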
My Y_train is a one-hot encoded label matrix.
The shape of my Y_train is (10, 1000, 3) because I have three different categories.
My model is defined as:
model = Sequential()
model.add(LSTM(100, input_shape=(1000, 38), return_sequences=False))
model.add(Dense(3, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['acc'])
When I train my model, I get the following error:
Error when checking target: expected dense_83 to have 2 dimensions, but got array with shape (8, 1000, 3)
This occurs because my Y_train is a 3D matrix instead of a 2D matrix. The only way I've been able to solve this is by setting return_sequences=True, but I'm not sure whether that will affect my LSTM's output.
Is this the correct way to deal with categorical labels? By setting return_sequences=True as a parameter of LSTM?
In other words, is it okay to return_sequences before a Softmax layer?
Thank you!
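For reference, here is a quick sketch of how return_sequences changes the shape the model expects for its targets (assuming a Keras version where Dense applied to a 3D input acts per timestep):
from keras.models import Sequential
from keras.layers import LSTM, Dense

# return_sequences=False: one prediction per sequence -> 2D targets (samples, 3)
m1 = Sequential()
m1.add(LSTM(100, input_shape=(1000, 38), return_sequences=False))
m1.add(Dense(3, activation='softmax'))
print(m1.output_shape)  # (None, 3)

# return_sequences=True: one prediction per timestep -> 3D targets (samples, 1000, 3)
m2 = Sequential()
m2.add(LSTM(100, input_shape=(1000, 38), return_sequences=True))
m2.add(Dense(3, activation='softmax'))
print(m2.output_shape)  # (None, 1000, 3)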
I have working code that uses a GRU, where I build the input manually as a 3D array of shape (None, 10, 64). The code is:
model = Sequential()
model.add(GRU(300, return_sequences=False, input_shape=(None, 64)))
model.add(Dropout(0.8))
model.add(Dense(64, input_dim=300))
model.add(Activation("linear"))
This returns the predicted embedding given the input window. Now I want to use the Keras Embedding layer in front of the GRU. My idea is to feed in a 2D array (None, 10) and use the embedding layer to convert each sample to the corresponding embedding vectors.
So now I have this:
model = Sequential()
model.add(Embedding(vocab_size, 64, weights=[embedding_matrix], input_length=10, trainable=False))
model.add(GRU(300, return_sequences=False))
model.add(Dropout(0.8))
model.add(Dense(64))
model.add(Activation("linear"))
I see from the summary that the output of the embedding layer is:
embedding_2 (Embedding) (None, 10, 64)
which is what I expected. But when I try to fit the model I get this error:
expected activation_2 to have shape (64,) but got array with shape (1,)
If I comment out the other layers and leave only the Embedding and GRU, I get:
expected gru_5 to have shape (300,) but got array with shape (1,)
So my question is: what is the difference between fitting a manually constructed 3D array and one generated by an Embedding layer?
Your model reflects the desired computation; however, the problem is the Y you are passing to the model. You are passing a scalar target instead of an array of size (64,). To clarify: your inputs should be sequences of integers, but your targets still need to be vectors.
Also, Dense by default has linear activation, so you don't need the Activation('linear') after Dense(64).
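For illustration, a minimal sketch of arrays whose shapes match this model (the sample count and values are hypothetical; only the shapes matter):
import numpy as np

n_samples = 1000  # hypothetical
X_fit = np.random.randint(0, vocab_size, size=(n_samples, 10))  # integer token ids for the Embedding layer
y_fit = np.random.rand(n_samples, 64)                           # one 64-dimensional target vector per sample

model.fit(X_fit, y_fit, batch_size=32)
# Passing targets of shape (n_samples,) or (n_samples, 1) instead reproduces the
# "expected ... to have shape (64,) but got array with shape (1,)" error.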
In my binary multilabel sequence classification problem, I have 22 timesteps in each input sentence. I have added a 200-dimensional word embedding to each timestep, so my current input shape is (*number of input sentences*, 22, 200). My output shape would be (*number of input sentences*, 4), e.g. [1, 0, 0, 1].
My first question is, how to build the Keras LSTM model to accept 3D input and output 2D results. The following code outputs the error:
ValueError: Error when checking target: expected dense_41 to have 3 dimensions, but got array with shape (7339, 4)
My second question is: when I add a TimeDistributed layer, should I set the number of units in the Dense layer to the number of features in the input, in my case 200?
import datetime
from sklearn.model_selection import train_test_split
from keras.models import Sequential
from keras.layers import Dense, LSTM, Embedding, TimeDistributed

X_train, X_test, y_train, y_test = train_test_split(padded_docs2, new_y, test_size=0.33, random_state=42)
start = datetime.datetime.now()
print(start)
# define the model
model = Sequential()
e = Embedding(input_dim=vocab_size2, input_length=22, output_dim=200, weights=[embedding_matrix2], trainable=False)
model.add(e)
model.add(LSTM(128, input_shape=(X_train.shape[1],200),dropout=0.2, recurrent_dropout=0.1, return_sequences=True))
model.add(TimeDistributed(Dense(200)))
model.add(Dense(y_train.shape[1],activation='sigmoid'))
# compile the model
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['acc'])
# summarize the model
print(model.summary())
# fit the model
model.fit(X_train, y_train, epochs=300, verbose=0)
end = datetime.datetime.now()
print(end)
print('Time taken to build the model: ', end-start)
Please let me know if I have missed out any information, thanks.
Your model's LSTM layer gets a 3D sequence and produces a 3D output. The same goes for the TimeDistributed layer. If you want the LSTM to return a 2D tensor, the argument return_sequences should be False; then you don't need the TimeDistributed wrapper at all. With this setup your model would be:
model = Sequential()
e = Embedding(input_dim=vocab_size2, input_length=22, output_dim=200, weights=[embedding_matrix2], trainable=False)
model.add(e)
model.add(LSTM(128, input_shape=(X_train.shape[1],200),dropout=0.2, recurrent_dropout=0.1, return_sequences=False))
model.add(Dense(200))
model.add(Dense(y_train.shape[1],activation='sigmoid'))
Edit:
TimeDistributed applies a given layer to each temporal slice of the inputs. In your case, for example, the temporal dimension is X_train.shape[1]. Let's assume X_train.shape[1] == 10 and consider the following line.
model.add(TimeDistributed(Dense(200)))
Here the TimeDistributed wrapper applies the same dense layer (Dense(200)) to each temporal slice (10 slices in total). So for each temporal slice you get an output of shape (batch_size, 200), and the final output tensor has shape (batch_size, 10, 200). But you said you want a 2D output, so TimeDistributed won't get you from 3D inputs to a 2D result.
The other case is if you remove the TimeDistributed wrapper and use only Dense, like this:
model.add(Dense(200))
Then the dense layer effectively flattens the input to shape (batch_size * 10, 200), computes the fully connected dot product, and reshapes the output back so it keeps the leading dimensions of the input. In your case that gives (batch_size, 10, 200), which is still a 3D tensor.
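A quick way to check this is to inspect the output shape (a minimal sketch using the assumed 10 timesteps):
from keras.models import Sequential
from keras.layers import LSTM, Dense

m = Sequential()
m.add(LSTM(128, input_shape=(10, 200), return_sequences=True))
m.add(Dense(200))      # applied per timestep, same output shape as TimeDistributed(Dense(200))
print(m.output_shape)  # (None, 10, 200) -- still a 3D tensor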
But if you don't want to change the first LSTM layer, you can replace the TimeDistributed layer with another LSTM layer with return_sequences set to False. Now your model would look like this:
model = Sequential()
e = Embedding(input_dim=vocab_size2, input_length=22, output_dim=200, weights=[embedding_matrix2], trainable=False)
model.add(e)
model.add(LSTM(128, input_shape=(X_train.shape[1],200),dropout=0.2, recurrent_dropout=0.1, return_sequences=True))
model.add(LSTM(200, input_shape=(X_train.shape[1],200),dropout=0.2, recurrent_dropout=0.1, return_sequences=False))
model.add(Dense(y_train.shape[1],activation='sigmoid'))
I am trying to make predictions using a trained model saved as an .h5 file.
The model was trained as follows:
model =Sequential()
model.add(Dense(12, input_dim=3, activation='relu'))
model.add(Dense(8, activation='relu'))
model.add(Dense(4, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer = 'adam', metrics = ['accuracy'])
And I built the input as follows:
x = np.array([[band1_input[input_cols_loop][input_rows_loop]],[band2_input[input_cols_loop][input_rows_loop]],[band3_input[input_cols_loop][input_rows_loop]]])
prediction_prob = model.predict(x)
I thought the shape was correct, but the following error occurred.
ValueError: Error when checking : expected dense_1_input to have shape (3,) but got array with shape (1,)
The shape of x is obviously (3, 1), but the above error doesn't go away (the data comes from a CSV file in the form (value 1, value 2, value 3, class)).
How can I solve this problem?
The shape of x is obviously (3,1), but the above error continues.
You are right, but that's not what Keras expects. It expects shape (1, 3): by convention, axis 0 denotes the batch size and axis 1 denotes the features. The first Dense layer accepts 3 features, which is why it complains when it sees just one.
The solution is simply to transpose x.
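For example, either build the sample as a single row of 3 features or transpose the existing column vector (a sketch reusing the variable names from the question):
import numpy as np

# One row, three feature columns: shape (1, 3)
x = np.array([[band1_input[input_cols_loop][input_rows_loop],
               band2_input[input_cols_loop][input_rows_loop],
               band3_input[input_cols_loop][input_rows_loop]]])

# Equivalently, fix the original (3, 1) array with:
# x = x.T              # (3, 1) -> (1, 3)
# x = x.reshape(1, 3)

prediction_prob = model.predict(x)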
I am trying to build an LSTM model, working off the documentation example at https://keras.io/layers/recurrent/
from keras.models import Sequential
from keras.layers import LSTM
The following three lines of code (plus comment) are taken directly from the documentation link above:
model = Sequential()
model.add(LSTM(32, input_dim=64, input_length=10))
# for subsequent layers, no need to specify the input size:
model.add(LSTM(16))
ValueError: Input 0 is incompatible with layer lstm_2: expected ndim=3, found ndim=2
I get that error above after executing the second model.add() statement, but before exposing the model to my data, or even compiling it.
What am I doing wrong here? I'm using Keras 1.2.1.
Edit
Just upgraded to current 1.2.2, still having same issue.
Thanks to patyork for answering this on GitHub:
The second LSTM layer is not getting the 3D input that it expects (with a shape of (batch_size, timesteps, features)). This is because the first LSTM layer has (by virtue of the default values) return_sequences=False, meaning it only outputs the last feature set at time t-1, which has shape (batch_size, 32), i.e. 2 dimensions that don't include time.
So to offer a code example of how to use a stacked LSTM to achieve many-to-one (return_sequences=False) sequence classification, just make sure to use return_sequences=True on the intermediate layers like this:
model = Sequential()
model.add(LSTM(32, input_dim=64, input_length=10, return_sequences=True))
model.add(LSTM(24, return_sequences=True))
model.add(LSTM(16, return_sequences=True))
model.add(LSTM(1, return_sequences=False))
model.compile(optimizer = 'RMSprop', loss = 'categorical_crossentropy')
(no errors)