I am having a hard time getting the dimensions of an LSTM network right.
So I have the following data:
train_data.shape
(25391, 3) # to be read as 25391 timesteps and 3 features
train_labels.shape
(25391, 1) # to be read as 25391 timesteps and 1 feature
So I thought my input shape should be (1, len(train_data), train_data.shape[1]), since I plan to submit 1 batch. But I get the following error:
Error when checking target: expected lstm_10 to have 2 dimensions, but got array with shape (1, 25391, 1)
Here is the model code:
model = Sequential()
model.add(LSTM(1,  # predict one feature and one timestep
               batch_input_shape=(1, len(train_data), train_data.shape[1]),
               activation='tanh',
               return_sequences=False))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
print(model.summary())
# as 1 sample with len(train_data) time steps and train_data.shape[1] features.
model.fit(x=train_data.values.reshape(1, len(train_data), train_data.shape[1]),
          y=train_labels.values.reshape(1, len(train_labels), train_labels.shape[1]),
          epochs=1,
          verbose=1,
          validation_split=0.8,
          validation_data=None,
          shuffle=False)
What should the input dimensions look like?
The problem is in the shape of the target (i.e. the labels) you provide; note that the message says "Error when checking target". The output of the LSTM layer in your model, which is also the output of the model, has shape (None, 1), because you asked for only the final output to be returned (return_sequences=False). To get an output for every timestep, set return_sequences=True. The output shape of the LSTM layer then becomes (None, num_timesteps, num_units), which is consistent with the shape of the labels array you provide.
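As a rough illustration of the shape rule described above, here is a small Keras-free sketch. The helper lstm_output_shape is hypothetical (it is not part of Keras); it only mirrors how the output shape of an LSTM layer depends on return_sequences, using the shapes from the question.

```python
def lstm_output_shape(input_shape, units, return_sequences):
    """Mirror the output-shape rule of a Keras LSTM layer.

    input_shape is (batch, timesteps, features). This helper is
    illustrative only; it is not a Keras API.
    """
    batch, timesteps, _features = input_shape
    if return_sequences:
        # One output vector per timestep.
        return (batch, timesteps, units)
    # Only the output of the last timestep.
    return (batch, units)


# Shapes from the question: 1 sample, 25391 timesteps, 3 features, 1 unit.
x_shape = (1, 25391, 3)
y_shape = (1, 25391, 1)

last_only = lstm_output_shape(x_shape, units=1, return_sequences=False)
per_step = lstm_output_shape(x_shape, units=1, return_sequences=True)

print(last_only)            # (1, 1)
print(per_step)             # (1, 25391, 1)
print(per_step == y_shape)  # True
```

Only the per-timestep variant lines up with labels reshaped to (1, 25391, 1).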
I have quite a bit of trouble understanding the expected shape of the input/output for an LSTM problem.
Specifically, for this example I have 386 sequences of length 100, each containing 14 features. For each such sequence, I only need to predict whether it is in the 0 or 1 class. The respective shapes and model are
X_test.shape,y_test.shape
((358, 100, 14), (358, 1))
model = Sequential()
model.add(LSTM(64,return_sequences=True,input_shape=(None,14)))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy' , metrics=['accuracy'])
Now if (after fitting) I want to predict the output of the model, the shape of the prediction is inconsistent with y_test!
y_pred = model.predict_classes(X_test)
y_pred.shape
(358, 100, 1)
Here I'd expect the shape to match y_test, and be (358,1) instead of the output given by predict_classes()
I am clearly misunderstanding something here. What am I missing here? Is there a different way to tackle this problem altogether?
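With return_sequences=True, the LSTM returns its output for every timestep, so the input to the final sigmoid layer is 3D. The Dense layer is then applied along the last dimension, giving one prediction per timestep instead of one per sequence.
To get a single prediction per sequence, do the following: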
model.add(LSTM(64,return_sequences=False,input_shape=(None,14)))
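To see why the prediction shape changes, the following Keras-free sketch traces shapes through the two layers. The function trace_shapes is a hypothetical helper written for this illustration; it assumes Dense acts only on the last axis, which is how Keras treats inputs with more than two dimensions.

```python
def trace_shapes(n_samples, timesteps, lstm_units, dense_units, return_sequences):
    """Return the output shape of LSTM(lstm_units) followed by Dense(dense_units).

    Illustrative only: it reproduces the shape bookkeeping, not the maths.
    """
    if return_sequences:
        lstm_out = (n_samples, timesteps, lstm_units)
    else:
        lstm_out = (n_samples, lstm_units)
    # Dense replaces the size of the last axis and keeps the others.
    return lstm_out[:-1] + (dense_units,)


# Shapes from the question: X_test is (358, 100, 14), y_test is (358, 1).
print(trace_shapes(358, 100, 64, 1, return_sequences=True))   # (358, 100, 1)
print(trace_shapes(358, 100, 64, 1, return_sequences=False))  # (358, 1)
```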
I'm trying to implement the sliding window approach and use a DNN for the forecasting part. The window length is 24.
What I did:
I have x (input) and y (output) in the data set. I kept the "y" values as they are (a single array). For the x values, I did the following:
def generate_input(data, sequence_length=1):
    x_data = []
    for i in range(len(data)-sequence_length+1):
        a = data[i:(i+sequence_length)]
        x_data.append(a)
    return np.array(x_data)
sequence_length = 24
x_train = generate_input(train, sequence_length)
#Shape of X train: (201389, 24)
#Shape of y train: (201412,)
model = Sequential()
model.add(Dense(30,input_shape= (x_train.shape[1],)))
model.add(Dense(20))
model.add(Dropout(0.2))
model.compile(loss="mse", optimizer='rmsprop')
model.summary()
model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs,
          validation_split=0.1)
The error message I'm receiving:
Error when checking target: expected dropout_5 to have shape (20,) but got
array with shape (1,)
One more question, how can I use the same approach for multivariate time series? I want to use sequences as input to predict y.
I changed the slicing part to:
x_data.append(data[i:i+sequence_length])
But I received an error:
cannot copy sequence with size 24 to array axis with dimension 4
model.summary() should show you that the output layer in your model is the Dropout layer, with a shape of (None, 20). That is probably not what you want. It seems that you are trying to predict a single value, so you need to add a Dense(1) layer after it. It is also highly unusual to have dropout as the output layer.
Also, x_train and y_train must have the same shape[0]. Here they have 201389 and 201412 rows respectively: the windowing drops sequence_length - 1 = 23 samples, so y_train has to be trimmed by the same amount.
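One way to get matching first dimensions is to build the windows and the targets together. The sketch below is a minimal, dependency-free illustration; make_windows is a hypothetical helper, and it assumes each window should be paired with the y value at the window's last position (pairing it with the following value instead would be an equally valid forecasting setup).

```python
def make_windows(series, targets, sequence_length):
    """Pair each window of `series` with the target at the window's last index.

    Assumption for this sketch: series and targets have the same length
    and are aligned index by index.
    """
    if len(series) != len(targets):
        raise ValueError("series and targets must have the same length")
    x_data, y_data = [], []
    for i in range(len(series) - sequence_length + 1):
        x_data.append(series[i:i + sequence_length])
        y_data.append(targets[i + sequence_length - 1])
    return x_data, y_data


series = list(range(10))             # toy input values
targets = [v * 2 for v in series]    # toy targets, same length as series

x_windows, y_windows = make_windows(series, targets, sequence_length=3)
print(len(x_windows), len(y_windows))  # 8 8
print(x_windows[0], y_windows[0])      # [0, 1, 2] 4
```

With the sizes in the question, 201412 values and a window of 24 give 201412 - 24 + 1 = 201389 windows, so the first 23 targets have no complete window and are dropped.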
I've got a 2D numpy matrix (from a DataFrame) of already condensed word vectors (I used a max pooling technique and am trying to compare a logistic regression to a bi-LSTM approach), and I'm not sure how to prepare it for use in a Keras model.
I'm aware of the need of a 3D tensor for the Bi-LSTM model, and have tried googling solutions, but couldn't find a solution that worked.
This is what I have right now:
# Set model parameters
epochs = 4
batch_size = 32
input_shape = (1, 10235, 3072)
# Create the model
model = Sequential()
model.add(Bidirectional(LSTM(64, return_sequences = True, input_shape = input_shape)))
model.add(Dropout(0.5))
model.add(Dense(1, activation = 'sigmoid'))
# Try using different optimizers and different optimizer configs
model.compile('adam', 'binary_crossentropy', metrics = ['accuracy'])
# Fit the training set over the model and correct on the validation set
model.fit(inputs['X_train'], inputs['y_train'],
          batch_size = batch_size,
          epochs = epochs,
          validation_data = [inputs['X_validation'], inputs['y_validation']])
# Get score over the test set
return model.evaluate(inputs['X_test'], inputs['y_test'])
I currently get the following error:
ValueError: Input 0 is incompatible with layer bidirectional_23: expected ndim=3, found ndim=2
The shape of my training data (inputs['X_train']) is (10235, 3072).
Thanks so much!
I made it work, following the suggestion in the reply, by doing the following:
Remove return_sequences = True;
Apply the following transformation to the X sets: np.reshape(inputs[dataset], (inputs[dataset].shape[0], inputs[dataset].shape[1], 1));
Change the input shape of the LSTM layer to match X_train, which is now (10235, 3072, 1). Note that Keras's input_shape argument leaves out the samples axis, so the per-sample shape is (3072, 1).
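The reshape step adds a trailing axis of size 1, so that each value in a row becomes one timestep with a single feature. Here is a tiny dependency-free sketch of that idea on nested lists; add_feature_axis and shape_of are hypothetical helpers for illustration, and with NumPy the np.reshape call quoted above does the same job.

```python
def shape_of(nested):
    """Return the shape of a regular, non-empty nested list as a tuple."""
    shape = []
    while isinstance(nested, list):
        shape.append(len(nested))
        nested = nested[0]
    return tuple(shape)


def add_feature_axis(matrix):
    """Turn (samples, values) into (samples, values, 1)."""
    return [[[value] for value in row] for row in matrix]


X = [[0.1, 0.2, 0.3],
     [0.4, 0.5, 0.6]]    # toy data: 2 samples, 3 values each

X3d = add_feature_axis(X)
print(shape_of(X))    # (2, 3)
print(shape_of(X3d))  # (2, 3, 1)
```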
In my binary multilabel sequence classification problem, I have 22 timesteps in each input sentence. I have added a 200-dimensional word embedding to each timestep, so my current input shape is (*number of input sentence*,22,200). My output shape would be (*number of input sentence*,4), e.g. [1,0,0,1].
My first question is, how to build the Keras LSTM model to accept 3D input and output 2D results. The following code outputs the error:
ValueError: Error when checking target: expected dense_41 to have 3 dimensions, but got array with shape (7339, 4)
My second question is, when I add a TimeDistributed layer, should I set the number of units in the Dense layer to the number of features in the input, which in my case is 200?
X_train, X_test, y_train, y_test = train_test_split(padded_docs2, new_y, test_size=0.33, random_state=42)
start = datetime.datetime.now()
print(start)
# define the model
model = Sequential()
e = Embedding(input_dim=vocab_size2, input_length=22, output_dim=200, weights=[embedding_matrix2], trainable=False)
model.add(e)
model.add(LSTM(128, input_shape=(X_train.shape[1],200),dropout=0.2, recurrent_dropout=0.1, return_sequences=True))
model.add(TimeDistributed(Dense(200)))
model.add(Dense(y_train.shape[1],activation='sigmoid'))
# compile the model
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['acc'])
# summarize the model
print(model.summary())
# fit the model
model.fit(X_train, y_train, epochs=300, verbose=0)
end = datetime.datetime.now()
print(end)
print('Time taken to build the model: ', end-start)
Please let me know if I have missed out any information, thanks.
Your model's LSTM layer receives a 3D sequence and, with return_sequences=True, also produces a 3D output. The same goes for the TimeDistributed layer. If you want the LSTM to return a 2D tensor, set return_sequences to False; you then no longer need the TimeDistributed wrapper. With this setup your model would be
model = Sequential()
e = Embedding(input_dim=vocab_size2, input_length=22, output_dim=200, weights=[embedding_matrix2], trainable=False)
model.add(e)
model.add(LSTM(128, input_shape=(X_train.shape[1],200),dropout=0.2, recurrent_dropout=0.1, return_sequences=False))
model.add(Dense(200))
model.add(Dense(y_train.shape[1],activation='sigmoid'))
Edit:
TimeDistributed applies a given layer to each temporal slice of the input. In your case, for example, the temporal dimension is X_train.shape[1]. Let's assume X_train.shape[1] == 10 and consider the following line.
model.add(TimeDistributed(Dense(200)))
Here the TimeDistributed wrapper applies the same Dense(200) layer, with shared weights, to each of the 10 temporal slices. Each slice yields an output of shape (batch_size, 200), so the final output tensor has shape (batch_size, 10, 200). But you said you want 2D output, so TimeDistributed cannot turn the 3D input into a 2D output.
The other case is if you remove TimeDistributed wrapper and use only dense, like this.
model.add(Dense(200))
Then the Dense layer is applied along the last axis of its 3D input. Conceptually, the input is flattened to (batch_size * 10, 128), since the LSTM outputs 128 features, multiplied by the layer's weights, and the result is reshaped to restore the batch and time axes. In your case the output is (batch_size, 10, 200), which is still a 3D tensor.
But if you don't want to change the first LSTM layer, you can replace the TimeDistributed layer with another LSTM layer with return_sequences set to False. Your model would then look like this.
model = Sequential()
e = Embedding(input_dim=vocab_size2, input_length=22, output_dim=200, weights=[embedding_matrix2], trainable=False)
model.add(e)
model.add(LSTM(128, input_shape=(X_train.shape[1],200),dropout=0.2, recurrent_dropout=0.1, return_sequences=True))
model.add(LSTM(200, input_shape=(X_train.shape[1],200),dropout=0.2, recurrent_dropout=0.1, return_sequences=False))
model.add(Dense(y_train.shape[1],activation='sigmoid'))
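To make the earlier Dense-versus-TimeDistributed point concrete, here is a small pure-Python sketch of a dense transform applied along the last axis of a 3D input. It is an illustration under simplifying assumptions (no activation, hand-picked weights, hypothetical function names), not Keras's implementation; it shows that the same weights are reused at every timestep and that the time axis survives.

```python
def dense_on_vector(vector, weights, bias):
    """Compute vector @ weights + bias for one feature vector."""
    n_out = len(bias)
    return [
        sum(vector[i] * weights[i][j] for i in range(len(vector))) + bias[j]
        for j in range(n_out)
    ]


def dense_on_sequences(batch, weights, bias):
    """Apply the same dense transform to every timestep of every sample."""
    return [[dense_on_vector(step, weights, bias) for step in sample]
            for sample in batch]


# Toy input: 1 sample, 2 timesteps, 3 features.
batch = [[[1.0, 2.0, 3.0],
          [0.0, 1.0, 0.0]]]
weights = [[1.0, 0.0],
           [0.0, 1.0],
           [1.0, 1.0]]      # maps 3 features to 2 units
bias = [0.5, -0.5]

out = dense_on_sequences(batch, weights, bias)
print(out)  # [[[4.5, 4.5], [0.5, 0.5]]]
```

The output still has one entry per timestep, i.e. shape (1, 2, 2) here, which is why a recurrent layer with return_sequences=False (or some other reduction over time) is needed to reach a 2D result.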
I don't quite understand how to create a simple Sequential model for my data.
The data has the following dimensions:
X_train.shape
(2369, 12)
y_train.shape
(2369,)
X_test.shape
(592, 12)
y_test.shape
(592,)
This is how I create the model:
batch_size = 128
nb_epoch = 20
in_out_neurons = X_train.shape[1]
dimof_middle = 100
model = Sequential()
model.add(Dense(batch_size, batch_input_shape=(None, in_out_neurons)))
model.add(Activation('relu'))
model.add(Dropout(0.2))
model.add(Dense(batch_size))
model.add(Activation('relu'))
model.add(Dropout(0.2))
model.add(Dense(in_out_neurons))
model.add(Activation('linear'))
# I am solving the regression problem, not the classification one
model.compile(loss="mean_squared_error", optimizer="rmsprop")
history = model.fit(X_train, y_train,
                    batch_size=batch_size, nb_epoch=nb_epoch,
                    verbose=1, validation_data=(X_test, y_test))
The error message:
Exception: Error when checking model input: expected dense_input_14 to
have shape (None, 1) but got array with shape (2369, 12)
The error is:
Error when checking model target: expected activation_42 to have shape
(None, 12) but got array with shape (2369, 1)
This error occurs at line:
model.add(Dense(in_out_neurons))
How should I change Dense to make it work?
Another question is how to add a simple autoencoder in order to initialize weights of ANN?
One of your problems is that you seem to misunderstand what a batch is.
The batch size is the number of training samples processed at a time, so instead of computing one training sample from X_train at a time you use, for example, 100 at a time. The important bit here is that this has nothing to do with the architecture of your model.
So when you write
model.add(Dense(batch_size, batch_input_shape=(None, in_out_neurons)))
then you create a fully connected layer whose number of output units equals the batch size (128). That does not make a lot of sense.
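A short sketch of what batch_size actually controls: how the training samples are sliced during one epoch, independently of any layer size. iter_batches is a hypothetical helper for illustration; Keras handles this internally inside fit().

```python
import math


def iter_batches(n_samples, batch_size):
    """Yield (start, stop) index pairs covering n_samples in order."""
    for start in range(0, n_samples, batch_size):
        yield start, min(start + batch_size, n_samples)


n_samples, batch_size = 2369, 128    # sizes from the question
batches = list(iter_batches(n_samples, batch_size))

print(len(batches))                        # 19
print(math.ceil(n_samples / batch_size))   # 19
print(batches[-1][1] - batches[-1][0])     # 65 samples in the last batch
```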
Another problem is that your model's output is 12 neurons while your Y is only one value/neuron. Your model looks like this:
|
v
[128]
[128]
[ 12]
|
v
What fit() then does is feed a matrix of shape (128, 12), i.e. (batch size, X_train.shape[1]), into the model and attempt to compare the output of shape (128, 12) from the last layer to the corresponding Y values of the batch, which have shape (128, 1).
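The mismatch can be checked with simple shape bookkeeping. The sketch below is Keras-free and uses a hypothetical helper, dense_stack_output; it assumes the fix is a final Dense layer with a single unit, since the target holds one value per sample.

```python
def dense_stack_output(batch_size, layer_units):
    """Output shape of a stack of Dense layers on 2D input.

    Only the number of units in the last layer determines the final width.
    """
    return (batch_size, layer_units[-1])


batch_size = 128
target_shape = (batch_size, 1)    # one regression value per sample

original = dense_stack_output(batch_size, [128, 128, 12])
fixed = dense_stack_output(batch_size, [128, 128, 1])

print(original, original == target_shape)  # (128, 12) False
print(fixed, fixed == target_shape)        # (128, 1) True
```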