Keras LSTM different input output shape - python

In my binary multilabel sequence classification problem, each input sentence has 22 timesteps. I have added a 200-dimensional word embedding to each timestep, so my current input shape is (*number of input sentences*, 22, 200). My output shape should be (*number of input sentences*, 4), e.g. [1,0,0,1].
My first question is how to build a Keras LSTM model that accepts 3D input and produces 2D output. The following code raises this error:
ValueError: Error when checking target: expected dense_41 to have 3 dimensions, but got array with shape (7339, 4)
My second question: when I add a TimeDistributed layer, should I set the number of units in the Dense layer to the number of features in the input, which is 200 in my case?
X_train, X_test, y_train, y_test = train_test_split(padded_docs2, new_y, test_size=0.33, random_state=42)
start = datetime.datetime.now()
print(start)
# define the model
model = Sequential()
e = Embedding(input_dim=vocab_size2, input_length=22, output_dim=200, weights=[embedding_matrix2], trainable=False)
model.add(e)
model.add(LSTM(128, input_shape=(X_train.shape[1],200),dropout=0.2, recurrent_dropout=0.1, return_sequences=True))
model.add(TimeDistributed(Dense(200)))
model.add(Dense(y_train.shape[1],activation='sigmoid'))
# compile the model
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['acc'])
# summarize the model
print(model.summary())
# fit the model
model.fit(X_train, y_train, epochs=300, verbose=0)
end = datetime.datetime.now()
print(end)
print('Time taken to build the model: ', end-start)
Please let me know if I have missed out any information, thanks.

The LSTM layer in your model receives a 3D sequence and, because return_sequences=True, also outputs a 3D tensor. The same goes for the TimeDistributed layer. If you want the LSTM to return a 2D tensor, set return_sequences to False; you then no longer need the TimeDistributed wrapper. With this setup your model would be:
model = Sequential()
e = Embedding(input_dim=vocab_size2, input_length=22, output_dim=200, weights=[embedding_matrix2], trainable=False)
model.add(e)
model.add(LSTM(128, input_shape=(X_train.shape[1],200),dropout=0.2, recurrent_dropout=0.1, return_sequences=False))
model.add(Dense(200))
model.add(Dense(y_train.shape[1],activation='sigmoid'))
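To see how return_sequences changes the rank of the output, here is a small pure-Python sketch of the shape bookkeeping. The helpers lstm_output_shape and dense_output_shape are hypothetical names made up for this illustration; they only mimic the shape rules described above and are not part of Keras.

```python
def lstm_output_shape(input_shape, units, return_sequences):
    """Shape rule for an LSTM layer fed (batch, timesteps, features) input."""
    batch, timesteps, _features = input_shape
    if return_sequences:
        return (batch, timesteps, units)  # one output vector per timestep
    return (batch, units)                 # only the last timestep's output


def dense_output_shape(input_shape, units):
    """Dense only replaces the last axis, whatever the rank of its input."""
    return input_shape[:-1] + (units,)


embedded = (None, 22, 200)  # output of the Embedding layer; None = batch size

# Model from the question: return_sequences=True keeps the time axis.
seq = lstm_output_shape(embedded, 128, return_sequences=True)
original_out = dense_output_shape(dense_output_shape(seq, 200), 4)

# Model suggested above: return_sequences=False drops the time axis.
last = lstm_output_shape(embedded, 128, return_sequences=False)
fixed_out = dense_output_shape(dense_output_shape(last, 200), 4)

print(original_out)  # (None, 22, 4)
print(fixed_out)     # (None, 4)
```

Only the second shape matches a target array of shape (7339, 4), which is why the error goes away.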
Edit:
TimeDistributed applies a given layer to each temporal slice of its input. In your case the temporal dimension is X_train.shape[1]. Let's assume X_train.shape[1] == 10 and consider the following line.
model.add(TimeDistributed(Dense(200)))
Here the TimeDistributed wrapper applies the same Dense(200) layer, with shared weights, to each of the 10 temporal slices. Each slice yields an output of shape (batch_size, 200), so the final output tensor has shape (batch_size, 10, 200). But you said you want 2D output, so TimeDistributed cannot turn your 3D input into a 2D result.
The other case is to remove the TimeDistributed wrapper and use only Dense, like this.
model.add(Dense(200))
The Dense layer then acts on the last axis only: conceptually it flattens the LSTM output from (batch_size, 10, 128) to (batch_size * 10, 128), computes the dot product with its weights, and reshapes the result back to (batch_size, 10, 200). That is still a 3D tensor.
If you don't want to change the first LSTM layer, you can instead replace the TimeDistributed layer with another LSTM layer that has return_sequences set to False. Your model would then look like this.
model = Sequential()
e = Embedding(input_dim=vocab_size2, input_length=22, output_dim=200, weights=[embedding_matrix2], trainable=False)
model.add(e)
model.add(LSTM(128, input_shape=(X_train.shape[1],200),dropout=0.2, recurrent_dropout=0.1, return_sequences=True))
model.add(LSTM(200, input_shape=(X_train.shape[1],200),dropout=0.2, recurrent_dropout=0.1, return_sequences=False))
model.add(Dense(y_train.shape[1],activation='sigmoid'))
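The point about Dense and TimeDistributed(Dense) on a 3D tensor can be checked with plain NumPy. This is only a sketch of the arithmetic, not Keras itself; the sizes, the random weights W and b, and the variable names are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
batch_size, timesteps, lstm_units, dense_units = 4, 10, 128, 200

# Stand-in for the output of LSTM(128, return_sequences=True).
x = rng.standard_normal((batch_size, timesteps, lstm_units))

# One weight matrix and one bias, shared by every timestep.
W = rng.standard_normal((lstm_units, dense_units))
b = rng.standard_normal(dense_units)

# Dense on a 3D tensor: a matrix product over the last axis only.
dense_out = x @ W + b

# TimeDistributed view: apply the same (W, b) to each temporal slice, then restack.
time_distributed_out = np.stack(
    [x[:, t, :] @ W + b for t in range(timesteps)], axis=1
)

print(dense_out.shape)                               # (4, 10, 200)
print(np.allclose(dense_out, time_distributed_out))  # True
```

Both routes keep the time axis, so neither produces the 2D output the question asks for.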

Related

TF 2.0 sequential CNN into LSTM for regression "Negative dimension size" error

I'm trying to build a model that predicts the price of a certain commodity based on current market conditions. My data are shaped similar to
num_samples = 100
sample_dimension = 10
XXX = np.random.random((num_samples,sample_dimension)).reshape(-1,1,sample_dimension)
YYY = np.random.random(num_samples).reshape(-1,1)
so I've got 100 ordered samples of X data, each consisting of 10 variables. My model looks like the following
model = keras.Sequential()
model.add(tf.keras.layers.Conv1D(4,
kernel_size = (2),
activation='sigmoid',
input_shape=(None, sample_dimension),
batch_input_shape = [1,1,sample_dimension]))
model.add(tf.keras.layers.AveragePooling1D(pool_size=2))
model.add(tf.keras.layers.Reshape((1, sample_dimension)))
model.add(tf.keras.layers.LSTM(100,
stateful = True,
return_sequences=False,
activation='sigmoid'))
model.add(keras.layers.Dense(1))
model.compile(optimizer='adam',
loss='mean_squared_error',
metrics=['accuracy'])
so it's a 1D convolution, a pooling, a reshape (so it plays nice with the lstm) and then casting down to a prediction
but when I try to run it, I get the following error
Negative dimension size caused by subtracting 2 from 1 for 'conv1d/conv1d' (op: 'Conv2D') with input shapes: [1,1,1,10], [1,2,10,4].
I've tried a few different values for the kernel size, pool size, and batch_input_shape (have to batch my inputs because my actual data are spread across several large files, so I want to read one at a time and kick it into training the model), but nothing seems to work.
What am I doing wrong? How can I track/predict the shape of my data as it goes through this model? What are the data/variables supposed to look like?
I ended up looking through tutorials for conv2D, and then converting stuff to conv1D (please edit as you feel appropriate)
conv2D solution
model = keras.Sequential()
model.add(tf.keras.layers.Conv2D(4,
kernel_size = (1,2),
activation = 'sigmoid',
input_shape = (1,sample_dimension,1),
batch_input_shape = [None,1,sample_dimension,1]))
model.add(tf.keras.layers.AveragePooling2D(pool_size=(1,2)))
#model.add(tf.keras.layers.Reshape((1,sample_dimension)))
model.add(tf.keras.layers.Flatten())
model.add(keras.layers.Dense(1))
Then I converted it to conv1D by taking out a dimension from each of the necessary arguments (the 1 that precedes 2 in kernel_size and precedes sample_dimension in input_shape and batch_input_shape)
model = keras.Sequential()
model.add(tf.keras.layers.Conv1D(4,
kernel_size = 2,
activation = 'sigmoid',
input_shape = (sample_dimension,1),
batch_input_shape = [None,sample_dimension,1]))
model.add(tf.keras.layers.AveragePooling1D(pool_size=2))
#model.add(tf.keras.layers.Reshape((1,sample_dimension)))
model.add(tf.keras.layers.Flatten())
model.add(keras.layers.Dense(1))
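I guess the key takeaway is that Conv1D expects input of shape (batch, steps, channels), so the last dimension has to be the number of channels. Here each sample is sample_dimension steps with 1 channel (a single number per step). My original shape (1, sample_dimension) had only 1 step, so a kernel of size 2 could not fit, which is what caused the negative dimension error.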
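On the question of how to predict the shape of the data as it goes through the model: with the default 'valid' padding, the length after a convolution or pooling window can be computed by hand. Below is a minimal pure-Python sketch; conv1d_valid_length is a hypothetical helper written for this illustration, not a TensorFlow function.

```python
def conv1d_valid_length(steps, kernel_size, stride=1):
    """Output length of a 1D conv or pooling window with 'valid' padding."""
    if kernel_size > steps:
        raise ValueError(
            f"window of size {kernel_size} does not fit into {steps} step(s)"
        )
    return (steps - kernel_size) // stride + 1


sample_dimension = 10

# Working layout: (steps, channels) = (sample_dimension, 1)
steps = conv1d_valid_length(sample_dimension, kernel_size=2)  # Conv1D(4, kernel_size=2)
conv_shape = (steps, 4)                                       # 4 filters -> 4 channels
steps = conv1d_valid_length(steps, kernel_size=2, stride=2)   # AveragePooling1D(pool_size=2)
pool_shape = (steps, 4)
flat_size = pool_shape[0] * pool_shape[1]                     # Flatten

print(conv_shape, pool_shape, flat_size)  # (9, 4) (4, 4) 16

# Original layout: (steps, channels) = (1, sample_dimension)
try:
    conv1d_valid_length(1, kernel_size=2)
except ValueError as err:
    print("original layout fails:", err)
```

Calling model.summary() on a model that builds successfully reports the same per-layer shapes.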

How to get a 2D shape ready for a Bi-LSTM in Keras

I've got a 2D numpy matrix (from a DataFrame) of already condensed word vectors (I used a max pooling technique, am trying to compare a logres to a bi-LSTM approach), and I'm not sure how to prepare it to use it in a keras model.
I'm aware of the need of a 3D tensor for the Bi-LSTM model, and have tried googling solutions, but couldn't find a solution that worked.
This is what I have right now:
# Set model parameters
epochs = 4
batch_size = 32
input_shape = (1, 10235, 3072)
# Create the model
model = Sequential()
model.add(Bidirectional(LSTM(64, return_sequences = True, input_shape = input_shape)))
model.add(Dropout(0.5))
model.add(Dense(1, activation = 'sigmoid'))
# Try using different optimizers and different optimizer configs
model.compile('adam', 'binary_crossentropy', metrics = ['accuracy'])
# Fit the training set over the model and correct on the validation set
model.fit(inputs['X_train'], inputs['y_train'],
batch_size = batch_size,
epochs = epochs,
validation_data = [inputs['X_validation'], inputs['y_validation']])
# Get score over the test set
return model.evaluate(inputs['X_test'], inputs['y_test'])
I currently get the following error:
ValueError: Input 0 is incompatible with layer bidirectional_23: expected ndim=3, found ndim=2
The shape of my training data (inputs['X_train']) is (10235, 3072).
Thanks so much!
I've made it work with the suggestion from the reply by doing the following:
Remove return_sequences = True;
Apply the following transformation to each of the X sets: np.reshape(inputs[dataset], (inputs[dataset].shape[0], inputs[dataset].shape[1], 1));
Change the input shape of the LSTM layer to match the reshaped data. X_train now has shape (10235, 3072, 1); since Keras' input_shape excludes the batch dimension, the per-sample shape is (3072, 1).
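The reshape step can be tried on a small stand-in array first. This sketch assumes a tiny zero-filled matrix in place of the real (10235, 3072) data; the second layout is an alternative that is not mentioned in the post above and is shown only for comparison.

```python
import numpy as np

# Small stand-in for X_train, which is (10235, 3072) in the question.
n_samples, n_features = 8, 12
X = np.zeros((n_samples, n_features), dtype="float32")

# Layout used above: each feature becomes one timestep holding a single value.
X_steps = np.reshape(X, (X.shape[0], X.shape[1], 1))

# Alternative (an assumption, not from the post): one timestep per sample
# that carries the whole feature vector.
X_single_step = np.expand_dims(X, axis=1)

print(X_steps.shape)        # (8, 12, 1)
print(X_single_step.shape)  # (8, 1, 12)
```

Either array is 3D, which is what the "expected ndim=3, found ndim=2" error asks for; which layout makes sense depends on what the columns mean.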

How to get only last output of sequence model in Keras?

I trained a Many-to-Many sequence model in Keras with return_sequences=True and TimeDistributed wrapper on the last Dense layer:
model = Sequential()
model.add(Embedding(input_dim=vocab_size, output_dim=50))
model.add(LSTM(100, return_sequences=True))
model.add(TimeDistributed(Dense(vocab_size, activation='softmax')))
# train...
model.save_weights("weights.h5")
So during training the loss is calculated over all hidden states (at every timestep). But for inference I only need the output at the last timestep. So I load the weights into a Many-to-One sequence model for inference, without the TimeDistributed wrapper, and set return_sequences=False to get only the last output of the LSTM layer:
inference_model = Sequential()
inference_model.add(Embedding(input_dim=vocab_size, output_dim=50))
inference_model.add(LSTM(100, return_sequences=False))
inference_model.add(Dense(vocab_size, activation='softmax'))
inference_model.load_weights("weights.h5")
When I test my inference model on a sequence of length 20, I expect to get a prediction with shape (vocab_size), but inference_model.predict(...) still returns predictions for every timestep: a tensor of shape (20, vocab_size).
If, for whatever reason, you need only the last timestep during inference, you can build a new model which applies the trained model on the input and returns the last timestep as its output using the Lambda layer:
from keras.models import Model
from keras.layers import Input, Lambda
inp = Input(shape=put_the_input_shape_here)
x = model(inp) # apply trained model on the input
out = Lambda(lambda x: x[:,-1])(x)
inference_model = Model(inp, out)
Side Note: As already stated in this answer, TimeDistributed(Dense(...)) and Dense(...) are equivalent, since the Dense layer is applied on the last dimension of its input tensor. That's why you get the same output shape.
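What the Lambda slice does to the prediction tensor can be sketched in NumPy. The random arrays below are stand-ins for the LSTM hidden states and the trained Dense weights (assumptions for illustration); the sketch also shows that slicing the last timestep after the softmax gives the same values as applying the Dense layer to the last hidden state only.

```python
import numpy as np

rng = np.random.default_rng(0)
batch_size, timesteps, hidden, vocab_size = 2, 20, 100, 50


def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


# Stand-ins for the LSTM hidden states and the Dense weights.
h = rng.standard_normal((batch_size, timesteps, hidden))
W = rng.standard_normal((hidden, vocab_size)) * 0.1
b = np.zeros(vocab_size)

all_steps = softmax(h @ W + b)           # prediction for every timestep
last_from_all = all_steps[:, -1]         # same slice as Lambda(lambda x: x[:, -1])
last_direct = softmax(h[:, -1] @ W + b)  # Dense applied to the last state only

print(all_steps.shape)                          # (2, 20, 50)
print(last_from_all.shape)                      # (2, 50)
print(np.allclose(last_from_all, last_direct))  # True
```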

Getting dimensions right for a single layer keras LSTM

I'm having a hard time getting the dimensions of an LSTM network right.
So I have the following data:
train_data.shape
(25391, 3) # to be read as 25391 timesteps and 3 features
train_labels.shape
(25391, 1) # to be read as 25391 timesteps and 1 feature
So I have thought my input dimension is (1, len(train_data), train_data.shape[1]) as I plan to submit 1 batch. But I get the following error:
Error when checking target: expected lstm_10 to have 2 dimensions, but got array with shape (1, 25391, 1)
Here is the model code:
model = Sequential()
model.add(LSTM(1, # predict one feature and one timestep
batch_input_shape=(1, len(train_data), train_data.shape[1]),
activation='tanh',
return_sequences=False))
model.compile(loss = 'categorical_crossentropy', optimizer='adam', metrics = ['accuracy'])
print(model.summary())
# as 1 sample with len(train_data) time steps and train_data.shape[1] features.
model.fit(x=train_data.values.reshape(1, len(train_data), train_data.shape[1]),
y=train_labels.values.reshape(1, len(train_labels), train_labels.shape[1]),
epochs=1,
verbose=1,
validation_split=0.8,
validation_data=None,
shuffle=False)
What should the input dimensions look like?
The problem is in the shape of the target (i.e. labels) you provide, as the message "Error when checking target" indicates. The output of the LSTM layer in your model, which is also the output of the model, has shape (None, 1), since you specified that only the final output be returned (i.e. return_sequences=False). To get the output of every timestep you need to set return_sequences=True. The output shape of the LSTM layer would then be (None, num_timesteps, num_units), which is consistent with the shape of the labels array you provide.
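The two target layouts can be compared with NumPy alone. This sketch assumes the labels are already a NumPy array (the question uses train_labels.values from a DataFrame), and picking the last row as the single label for the return_sequences=False case is just an illustrative choice.

```python
import numpy as np

# Stand-in for train_labels.values: 25391 timesteps, 1 feature.
train_labels = np.zeros((25391, 1))

# return_sequences=True: the model outputs (batch, timesteps, units),
# so one sequence of labels is a valid target.
y_all_steps = train_labels.reshape(1, len(train_labels), train_labels.shape[1])

# return_sequences=False: the model outputs (batch, units), so the target
# must hold a single label vector per sequence (here: the last row).
y_last_step = train_labels[-1].reshape(1, train_labels.shape[1])

print(y_all_steps.shape)  # (1, 25391, 1)
print(y_last_step.shape)  # (1, 1)
```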

Activation function error in a 1D CNN in Keras

I'm creating a model to classify whether the input waveform contains a rising edge of the SDA line of an I2C bus.
Each input has 20000 data points, and I have 100 training samples.
I initially found an answer regarding the input shape here: Keras 1D CNN: How to specify dimension correctly?
However, I'm getting an error in the activation function:
ValueError: Error when checking target: expected activation_1 to have 3 dimensions, but got array with shape (100, 1)
My model is:
model.add(Conv1D(filters=n_filter,
kernel_size=input_filter_length,
strides=1,
activation='relu',
input_shape=(20000,1)))
model.add(BatchNormalization())
model.add(MaxPooling1D(pool_size=4, strides=None))
model.add(Dense(1))
model.add(Activation("sigmoid"))
adam = Adam(lr=learning_rate)
model.compile(optimizer= adam, loss='binary_crossentropy', metrics=['accuracy'])
model.fit(train_data, train_label,
nb_epoch=10,
batch_size=batch_size, shuffle=True)
score = np.asarray(model.evaluate(test_new_data, test_label, batch_size=batch_size))*100.0
I can't determine the problem here, or why the activation function expects a 3D tensor.
The problem lies in the fact that starting from keras 2.0, a Dense layer applied to a sequence will apply the layer to each time step - so given a sequence it will produce a sequence. So your Dense is actually producing a sequence of 1-element vectors and this causes your problem (as your target is not a sequence).
There are several ways to reduce a sequence to a vector and then apply a Dense layer to it:
GlobalPooling:
You may use global pooling layers like GlobalAveragePooling1D or GlobalMaxPooling1D, e.g.:
model.add(Conv1D(filters=n_filter,
kernel_size=input_filter_length,
strides=1,
activation='relu',
input_shape=(20000,1)))
model.add(BatchNormalization())
model.add(GlobalMaxPooling1D())  # global pooling takes no pool_size or strides
model.add(Dense(1))
model.add(Activation("sigmoid"))
Flattening:
You might collapse the whole sequence into a single vector using a Flatten layer:
model.add(Conv1D(filters=n_filter,
kernel_size=input_filter_length,
strides=1,
activation='relu',
input_shape=(20000,1)))
model.add(BatchNormalization())
model.add(MaxPooling1D(pool_size=4, strides=None))
model.add(Flatten())
model.add(Dense(1))
model.add(Activation("sigmoid"))
RNN Postprocessing:
You could also add a recurrent layer on top of your sequence and make it return only the last output:
model.add(Conv1D(filters=n_filter,
kernel_size=input_filter_length,
strides=1,
activation='relu',
input_shape=(20000,1)))
model.add(BatchNormalization())
model.add(MaxPooling1D(pool_size=4, strides=None))
model.add(SimpleRNN(10, return_sequences=False))
model.add(Dense(1))
model.add(Activation("sigmoid"))
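The effect of the three reductions on the tensor shape can be sketched with NumPy. The feature map size (100, 50, 8) and the random recurrent weights are assumptions picked to keep the example small; the real sizes depend on n_filter and input_filter_length, which the question does not state.

```python
import numpy as np

rng = np.random.default_rng(0)
batch_size, length, filters = 100, 50, 8

# Stand-in for the 3D output of the Conv1D / pooling stack.
feature_map = rng.standard_normal((batch_size, length, filters))

# Global max pooling: keep the maximum over the length axis.
global_max = feature_map.max(axis=1)

# Flatten: merge the length and filter axes into one vector per sample.
flattened = feature_map.reshape(batch_size, -1)

# Simple RNN with 10 units, keeping only the final hidden state.
units = 10
Wx = rng.standard_normal((filters, units)) * 0.1
Wh = rng.standard_normal((units, units)) * 0.1
h = np.zeros((batch_size, units))
for t in range(length):
    h = np.tanh(feature_map[:, t, :] @ Wx + h @ Wh)

print(global_max.shape)  # (100, 8)
print(flattened.shape)   # (100, 400)
print(h.shape)           # (100, 10)
```

In all three cases the result is 2D, so a following Dense(1) yields shape (100, 1), matching the target in the error message.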
Conv1D's output has 3 dimensions, and it stays that way through the following layers, including Dense.
Conv output: (BatchSize, Length, Filters)
For the Dense layer to output only one result per sample, you need to add a Flatten() or Reshape((shape)) layer before it, to make its input 2D: (BatchSize, Length * Filters).
If you call model.summary(), you will see exactly what shape each layer is outputting. You have to adjust the output to be exactly the same shape as the array you pass as the correct results. The None that appears in those shapes is the batch size and may be ignored.
About your model: I think you need more convolution layers, reducing the number of filters gradually, because condensing so much data in a single Dense layer does not usually bring good results.
About dimensions: Keras layers tutorial and samples
