Using embedding layer with GRU - python

I have working code that uses a GRU, with the input built manually as a 3D array of shape (None, 10, 64). The code is:
model = Sequential()
model.add(GRU(300, return_sequences=False, input_shape=(None, 64)))
model.add(Dropout(0.8))
model.add(Dense(64, input_dim=300))
model.add(Activation("linear"))
This returns the predicted embedding given the input window. Now I want to put a Keras Embedding layer in front of the GRU. My idea is to input a 2D array (None, 10) and use the Embedding layer to convert each sample to the corresponding embedding vectors.
So now I have this:
model = Sequential()
model.add(Embedding(vocab_size, 64, weights=[embedding_matrix], input_length=10, trainable=False))
model.add(GRU(300, return_sequences=False))
model.add(Dropout(0.8))
model.add(Dense(64))
model.add(Activation("linear"))
I see from the summary that the output of the embedding layer is:
embedding_2 (Embedding) (None, 10, 64)
which is what I expected. But when I try to fit the model I get this error:
expected activation_2 to have shape (64,) but got array with shape (1,)
If I comment out the other layers and leave only the Embedding and GRU layers, I get:
expected gru_5 to have shape (300,) but got array with shape (1,)
So my question is: what is the difference between fitting a manually constructed 3D array and one generated by an Embedding layer?

Your model reflects the desired computation; the error is in the Y you are passing to the model. You are passing a scalar target for each sample instead of an array of shape (64,). To clarify: your inputs should be sequences of integers, but your targets still need to be vectors.
Also, Dense by default has linear activation, so you don't need the Activation('linear') after Dense(64).
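A minimal NumPy sketch of the shapes involved. It assumes a hypothetical vocabulary of 50 tokens and assumes each target is the embedding of the token that follows the window; the names next_token_ids, Y_wrong and Y are illustrative and not from the question:

```python
import numpy as np

rng = np.random.default_rng(0)

vocab_size, embed_dim, window = 50, 64, 10   # hypothetical sizes
n_samples = 8

# Stand-in for the pretrained embedding matrix from the question.
embedding_matrix = rng.normal(size=(vocab_size, embed_dim))

# Inputs for the Embedding layer: integer token ids, shape (n_samples, 10).
X = rng.integers(0, vocab_size, size=(n_samples, window))

# Assumption: the id of the token that follows each window.
next_token_ids = rng.integers(0, vocab_size, size=(n_samples,))

# Wrong target: one integer per sample, which Keras reports as shape (1,).
Y_wrong = next_token_ids.reshape(-1, 1)

# Right target: the 64-dimensional embedding vector of that token.
Y = embedding_matrix[next_token_ids]

print(X.shape, Y_wrong.shape, Y.shape)   # (8, 10) (8, 1) (8, 64)
```

With targets shaped like Y, the final Dense(64) output and the target agree on the last dimension.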

Related

In Keras, how to get 3D input and 3D output for LSTM layers

In my original setting, I have arrays with these shapes:
X1 = (1200,40,1)
y1 = (1200,10)
With these, my code works perfectly:
model = Sequential()
model.add(LSTM(12, input_shape=(40, 1), return_sequences=True))
model.add(LSTM(12, return_sequences=True))
model.add(LSTM(6, return_sequences=False))
model.add((Dense(10)))
Now I have a second time series with the same shapes as X1 and y1, i.e.:
X2 = (1200,40,1)
y2 = (1200,10)
Now, I stack X1, X2 and y1, y2 as 3D arrays:
X_stack = (1200,40,2)
y_stack = (1200,10,2)
Then I tried to modify my Keras code like this:
model = Sequential()
model.add(LSTM(12, input_shape=(40, 2), return_sequences=True))
model.add(LSTM(12, return_sequences=True))
model.add(LSTM(6, return_sequences=False))
model.add((Dense((10,2))))
I want my code to work directly with the 3D arrays X_stack and y_stack without reshaping them into 2D arrays. Could you give me a hand with how to modify the settings? Thank you.
I am assuming that there is an error somewhere in the shapes that you reported for your arrays. I'm guessing y_stack.shape == (1200, 10, 2), is that correct?
However, here is one possibility to do what you describe:
model = Sequential()
model.add(LSTM(12, input_shape=(40, 2), return_sequences=True))
model.add(LSTM(12, return_sequences=True))
model.add(LSTM(6, return_sequences=False))
model.add(Dense(10 * 2))
model.add(Reshape((10, 2)))
The output of the network is created as a 2D tensor by the Dense layer, and then reshaped to a 3D tensor by the Reshape.
From an input-output perspective, this should behave like you specified.
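To make the Dense-then-Reshape step concrete, here is a small NumPy sketch of what the reshape does to each sample. It assumes row-major ordering (which is what a plain reshape uses) and a made-up batch of 3 samples:

```python
import numpy as np

batch, steps, series = 3, 10, 2   # small hypothetical batch

# Stand-in for the output of Dense(10 * 2): one flat vector of 20 values per sample.
flat = np.arange(batch * steps * series).reshape(batch, steps * series)

# What reshaping each sample to (10, 2) does, in row-major order.
out = flat.reshape(batch, steps, series)

# Element [b, t, c] of the 3D output is unit t * 2 + c of the flat vector.
b, t, c = 1, 4, 1
print(out.shape)                                   # (3, 10, 2)
print(out[b, t, c] == flat[b, t * series + c])     # True
```

So the network is free to learn any mapping from Dense units to (timestep, series) positions; the reshape only fixes which unit lands where.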
I cannot give a short answer to this question; however, I think some basic LSTM concepts (one-to-one, one-to-many, ...) need clarification.
As a class, RNNs (including LSTMs) are sequential: they are built to find time-like correlations, while CNNs are spatial and are built to find space-like correlations.
LSTM setups are further divided into one-to-one, one-to-many, many-to-one and many-to-many, as shown in "Many to one and many to many LSTM examples in Keras".
The network type wanted here is point 5 in "Many to one and many to many LSTM examples in Keras", which says:
Many-to-many when number of steps differ from input/output length: this is freaky hard in Keras. There are no easy code snippets to code that.
It is type 5 because the input shape is X_stack = (1200,40,2)
and the output shape is y_stack = (1200,10,2), so the number of timesteps differs (40 in the input, 10 in the output).
If you can arrange for an equal number of input and output timesteps, you can reshape the input and output data (numpy.reshape) as in "keras LSTM feeding input with the right shape" (note the arrangement of the [ and ] in the arrays). This does not mean reshaping to 2D (i.e. flattening).
https://machinelearningmastery.com/timedistributed-layer-for-long-short-term-memory-networks-in-python/ has a complete example of building a many-to-many LSTM with equal input and output timesteps using the TimeDistributed layer.
For completeness only: for spatio-temporal data there are also CNN-LSTMs, but they do not apply here because two stacked time series have no explicit spatial correlation:
If you have a 3D quantity, i.e. a distribution in a volume that changes over time, and want to learn it, then you have to use a CNN-LSTM network. This approach preserves both the 3D information and the temporal information; that is, the spatial information is not discarded. In purely time-like learners such as an LSTM, the spatial information is often discarded, e.g. by flattening an image before processing it.
A complete tutorial on how to build a (spatio-temporal) CNN-LSTM in Keras is at https://machinelearningmastery.com/cnn-long-short-term-memory-networks/
You can use the tuple in X_stack.shape (an attribute, not a method):
model = Sequential()
model.add(LSTM(12, input_shape=(X_stack.shape[1], X_stack.shape[2]),return_sequences=True))
model.add(LSTM(12, return_sequences=True))
model.add(LSTM(6, return_sequences=False))
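# Dense needs an integer number of units, so emit 10 * 2 values and reshape them to (10, 2)
model.add(Dense(10 * 2))
model.add(Reshape((10, 2)))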
I am assuming that you will need to share the parameters for each array that you stack.
If you were stacking entirely new features, then there wouldn't be an associated target with each one.
If you were stacking completely different examples, then you would not be using 3D arrays, and would just be appending them to the end like normal.
Solution
To solve this problem, I would leverage the TimeDistributed wrapper from Keras.
LSTM layers expect a shape (j, k) where j is the number of time steps, and k is the number of features. Since you want to keep your array as 3D for the input and output, you will want to stack on a different dimension than the feature dimension.
Quick side note:
I think it’s important to note the difference between the approaches. Stacking on the feature dimension gives you multiple features for the same time steps. In that case you would want to use the same LSTM layers and not go this route. Because you want a 3D input and a 3D output, I am proposing that you create a new dimension to stack on which will allow you to apply the same LSTM layers independently.
TimeDistributed:
This wrapper applies a layer to every slice along axis 1 of its input (the first axis after the batch axis).
By stacking your X1 and X2 arrays on axis 1 and using the TimeDistributed wrapper, you apply the same LSTM layers independently to each array that you stack. Notice below that the original and updated model summaries have exactly the same number of parameters.
Implementation Steps:
The first step is to reshape the input of (40, 2) into (2, 40, 1). This gives you the equivalent of 2 x (40, 1) array inputs. You can do this either in the model, as I’ve done, or when building your dataset, in which case you update the input shape accordingly.
By adding the extra dimension (..., 1) to the end, we are keeping the data in a format that the LSTM would understand if it was just looking at one of the arrays that we stacked at a time. Notice how your original input_shape is (40, 1) for instance.
Then wrap each layer in the TimeDistributed wrapper.
And finally, reshape the y output to match your data by swapping (2, 10) to (10, 2).
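One thing worth checking before relying on the reshape steps: a plain reshape does not swap axes, it only re-reads the same values in row-major order. The NumPy sketch below uses made-up data (series 0 counts from 0, series 1 counts from 100) to show the difference. In Keras, the analogous fix would presumably be a Permute((2, 1)) layer before the Reshape; that is an assumption about the intended behaviour, not something stated in this answer.

```python
import numpy as np

steps = 40
# Hypothetical sample: series 0 is 0..39, series 1 is 100..139.
x1 = np.arange(steps)
x2 = np.arange(steps) + 100
sample = np.stack([x1, x2], axis=-1)        # shape (40, 2), like one row of X_stack

# A plain reshape keeps row-major order, so the two series end up interleaved.
reshaped = sample.reshape(2, steps, 1)

# Swapping the axes first puts each series in its own slice.
transposed = sample.T.reshape(2, steps, 1)

print(reshaped[0, :4, 0].tolist())      # [0, 100, 1, 101]
print(transposed[0, :4, 0].tolist())    # [0, 1, 2, 3]
```

The same applies to the last step: going from (2, 10) to (10, 2) by reshape is not the same as transposing.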
Code
# Public Keras API; tensorflow.python.* is a private namespace
from tensorflow.keras import Sequential
from tensorflow.keras.layers import LSTM, Dense, TimeDistributed, InputLayer, Reshape
# Original Model
model = Sequential()
model.add(LSTM(12, input_shape=(40, 1), return_sequences=True))
model.add(LSTM(12, return_sequences=True))
model.add(LSTM(6, return_sequences=False))
model.add((Dense(10)))
model.summary()
Original Model Summary
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
lstm (LSTM) (None, 40, 12) 672
_________________________________________________________________
lstm_1 (LSTM) (None, 40, 12) 1200
_________________________________________________________________
lstm_2 (LSTM) (None, 6) 456
_________________________________________________________________
dense (Dense) (None, 10) 70
=================================================================
Total params: 2,398
Trainable params: 2,398
Non-trainable params: 0
_________________________________________________________________
Apply TimeDistributed Wrapper
model = Sequential()
model.add(InputLayer(input_shape=(40, 2)))
model.add(Reshape(target_shape=(2, 40, 1)))
model.add(TimeDistributed(LSTM(12, return_sequences=True)))
model.add(TimeDistributed(LSTM(12, return_sequences=True)))
model.add(TimeDistributed(LSTM(6, return_sequences=False)))
model.add(TimeDistributed(Dense(10)))
model.add(Reshape(target_shape=(10, 2)))
model.summary()
Updated Model Summary
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
reshape (Reshape) (None, 2, 40, 1) 0
_________________________________________________________________
time_distributed (TimeDistri (None, 2, 40, 12) 672
_________________________________________________________________
time_distributed_1 (TimeDist (None, 2, 40, 12) 1200
_________________________________________________________________
time_distributed_2 (TimeDist (None, 2, 6) 456
_________________________________________________________________
time_distributed_3 (TimeDist (None, 2, 10) 70
_________________________________________________________________
reshape_1 (Reshape) (None, 10, 2) 0
=================================================================
Total params: 2,398
Trainable params: 2,398
Non-trainable params: 0
_________________________________________________________________
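The parameter counts in both summaries can be reproduced by hand. A minimal sketch, assuming the standard LSTM layout of four gates, each with an input kernel, a recurrent kernel and a bias (the helper names are made up for illustration):

```python
def lstm_params(input_dim, units):
    # 4 gates, each with an input kernel, a recurrent kernel and a bias vector.
    return 4 * (units * (input_dim + units) + units)


def dense_params(input_dim, units):
    # One weight per input/unit pair, plus one bias per unit.
    return input_dim * units + units


counts = [
    lstm_params(1, 12),    # first LSTM: 1 input feature -> 12 units
    lstm_params(12, 12),   # second LSTM: 12 -> 12
    lstm_params(12, 6),    # third LSTM: 12 -> 6
    dense_params(6, 10),   # Dense: 6 -> 10
]
print(counts, sum(counts))   # [672, 1200, 456, 70] 2398
```

None of these formulas depends on the number of timesteps or on how many stacked arrays the wrapper iterates over, which is why wrapping the layers in TimeDistributed leaves the totals unchanged.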

Keras LSTM different input output shape

In my binary multilabel sequence classification problem, I have 22 timesteps in each input sentence. I have added a 200-dimensional word embedding to each timestep, so my current input shape is (*number of input sentences*,22,200). My output shape would be (*number of input sentences*,4), e.g. [1,0,0,1].
My first question is: how do I build the Keras LSTM model to accept 3D input and output 2D results? The following code produces the error:
ValueError: Error when checking target: expected dense_41 to have 3 dimensions, but got array with shape (7339, 4)
My second question is: when I add a TimeDistributed layer, should I set the number of units in the Dense layer to the number of input features, which in my case is 200?
X_train, X_test, y_train, y_test = train_test_split(padded_docs2, new_y, test_size=0.33, random_state=42)
start = datetime.datetime.now()
print(start)
# define the model
model = Sequential()
e = Embedding(input_dim=vocab_size2, input_length=22, output_dim=200, weights=[embedding_matrix2], trainable=False)
model.add(e)
model.add(LSTM(128, input_shape=(X_train.shape[1],200),dropout=0.2, recurrent_dropout=0.1, return_sequences=True))
model.add(TimeDistributed(Dense(200)))
model.add(Dense(y_train.shape[1],activation='sigmoid'))
# compile the model
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['acc'])
# summarize the model
print(model.summary())
# fit the model
model.fit(X_train, y_train, epochs=300, verbose=0)
end = datetime.datetime.now()
print(end)
print('Time taken to build the model: ', end-start)
Please let me know if I have missed out any information, thanks.
Your model's LSTM layer receives a 3D sequence and produces a 3D output. The same goes for the TimeDistributed layer. If you want the LSTM to return a 2D tensor, the return_sequences argument should be False. You then no longer need the TimeDistributed wrapper. With this setup your model would be:
model = Sequential()
e = Embedding(input_dim=vocab_size2, input_length=22, output_dim=200, weights=[embedding_matrix2], trainable=False)
model.add(e)
model.add(LSTM(128, input_shape=(X_train.shape[1],200),dropout=0.2, recurrent_dropout=0.1, return_sequences=False))
model.add(Dense(200))
model.add(Dense(y_train.shape[1],activation='sigmoid'))
Edit:
TimeDistributed applies a given layer to each temporal slice of its input. In your case, for example, the temporal dimension is X_train.shape[1]. Let's assume X_train.shape[1] == 10 and consider the following line.
model.add(TimeDistributed(Dense(200)))
Here the TimeDistributed wrapper applies the same Dense(200) layer, with shared weights, to each of the 10 temporal slices. For each slice you get an output of shape (batch_size, 200), and the final output tensor has shape (batch_size, 10, 200). But you said you want a 2D output, so TimeDistributed will not get you from a 3D input to a 2D output.
The other case is to remove the TimeDistributed wrapper and use only Dense, like this.
model.add(Dense(200))
Dense acts on the last axis only. Conceptually, it flattens the input to shape (batch_size * 10, 128), applies the fully connected layer, and reshapes the result back to three dimensions. In your case the output is (batch_size, 10, 200), which is still a 3D tensor.
If you don't want to change the first LSTM layer, you can instead replace the TimeDistributed layer with another LSTM layer that has return_sequences set to False. Your model would then look like this.
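The equivalence between applying one Dense layer per time step and applying it to the last axis can be checked numerically. A NumPy sketch with random stand-in weights, assuming 10 timesteps and the 128 LSTM units and 200 Dense units from the model above (the batch size of 4 is made up):

```python
import numpy as np

rng = np.random.default_rng(0)
batch, steps, in_dim, out_dim = 4, 10, 128, 200

x = rng.normal(size=(batch, steps, in_dim))   # stand-in for the LSTM output
W = rng.normal(size=(in_dim, out_dim))        # one kernel, shared by all time steps
b = rng.normal(size=(out_dim,))

# Per-step view: the same W and b applied to every time step.
per_step = np.stack([x[:, t, :] @ W + b for t in range(steps)], axis=1)

# Last-axis view: a dense layer applied directly to the 3D input.
last_axis = x @ W + b

# Flatten, apply, reshape back.
flat = (x.reshape(batch * steps, in_dim) @ W + b).reshape(batch, steps, out_dim)

print(per_step.shape)                      # (4, 10, 200)
print(np.allclose(per_step, last_axis))    # True
print(np.allclose(per_step, flat))         # True
```

All three views give a 3D result, so neither variant alone reduces the output to (batch_size, 4).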
model = Sequential()
e = Embedding(input_dim=vocab_size2, input_length=22, output_dim=200, weights=[embedding_matrix2], trainable=False)
model.add(e)
model.add(LSTM(128, input_shape=(X_train.shape[1],200),dropout=0.2, recurrent_dropout=0.1, return_sequences=True))
model.add(LSTM(200, input_shape=(X_train.shape[1],200),dropout=0.2, recurrent_dropout=0.1, return_sequences=False))
model.add(Dense(y_train.shape[1],activation='sigmoid'))

expected conv2d_1_input to have 4 dimensions, but got array with shape (15936, 64)

Hello I'm new to Deep Learning and Keras and I was doing a project in order to learn Deep Learning and Keras. Here I've made a model.
Model = Sequential()
Model.add(Conv2D(32, (3, 3) , input_shape = (100,64,64,), padding = 'same',
activation='relu'))
Model.add(Conv2D(32, (3, 3), activation='relu', padding='same'))
Model.add(MaxPooling2D(pool_size=(2, 2)))
Model.add(Flatten()) #Conversion to Neurons
Model.add(Dense(512, activation='relu'))
Model.add(Dense(1, activation='softmax'))
For training and fitting.
X = signalBuffer.transpose()
Y = np.ones([19920, 1], dtype = int)
x_train, x_test, y_train, y_test = train_test_split(X, Y,
test_size=0.20,shuffle=True)
Model.fit(x_train, y_train,batch_size=100,epochs=epochs,validation_data=
(x_test, y_test),shuffle=True)
Here the X has 19920 rows and 64 columns and Y has 19920 rows and 1 column.
The training and testing splitting is executing without errors. The error is coming in the last line when I try to fit in the model.
The error is
ValueError: Error when checking input: expected conv2d_1_input to have 4 dimensions, but got array with shape (15936, 64)
Pardon me if this is a silly question or the answer is very easy, but I'm trying to understand the model and I've tried a few solutions, but it's still giving errors. Any help is appreciated.
From the keras documentation:
2D convolution layer (e.g. spatial convolution over images).
This layer creates a convolution kernel that is convolved with the layer input to produce a tensor of outputs. If use_bias is True, a bias vector is created and added to the outputs. Finally, if activation is not None, it is applied to the outputs as well.
When using this layer as the first layer in a model, provide the keyword argument input_shape (tuple of integers, does not include the sample axis), e.g. input_shape=(128, 128, 3) for 128x128 RGB pictures in data_format="channels_last".
What you have is 19920 samples with 64 features each. Assuming that is correct, you should probably use a 1D convolutional layer instead. A 1D convolutional layer takes a 2D input per sample (steps, channels); it is the kernel that slides along one dimension.
You will probably also need to make some changes to get your data into the correct format, as Conv1D expects the following:
Input shape
3D tensor with shape: (batch, steps, channels)
Convolutions exploit the fact that spatial locality matters, e.g. neighbouring pixels are important to each other for finding edges or classifying something. This is seldom the case with tabular data. If you still want to use a neural network for this, you might want to use an MLP, i.e. Dense layers only. Then all you have to do is remove the convolutional part.
If your data is spatially connected, then you might want to use Conv1D layers. As a previous post explains, the input shape is a 3D tensor (batch, steps, channels). The number of parameters created depends on the kernel size and on the numbers of input and output channels, and is independent of the number of steps. E.g. reshaping your data to (19920, 1, 64) with a kernel size of 1 makes the layer equivalent to a Dense layer. The other extreme is (19920, 64, 1), where the number of parameters depends solely on the kernel size and the number of output channel maps.
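The two extremes can be compared with a little arithmetic. This sketch assumes a hypothetical 32 filters and uses made-up helper names; the parameter formula is the usual one for a 1D convolution with bias:

```python
import numpy as np


def conv1d_params(kernel_size, in_channels, filters):
    # One kernel of shape (kernel_size, in_channels) per filter, plus one bias each.
    return kernel_size * in_channels * filters + filters


def dense_params(in_features, units):
    return in_features * units + units


X = np.zeros((19920, 64))              # same shape as the question's data

as_channels = X.reshape(-1, 1, 64)     # 1 step, 64 channels
as_steps = X.reshape(-1, 64, 1)        # 64 steps, 1 channel
print(as_channels.shape, as_steps.shape)   # (19920, 1, 64) (19920, 64, 1)

filters = 32                           # hypothetical
# With a single step only kernel_size=1 fits, and the count matches a Dense layer.
print(conv1d_params(1, 64, filters), dense_params(64, filters))   # 2080 2080
# With 64 steps and 1 channel, only kernel size and filter count matter.
print(conv1d_params(3, 1, filters))                                # 128
```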

Activation function error in a 1D CNN in Keras

I'm creating a model to classify whether the input waveform contains a rising edge of SDA on an I2C line.
Each input has 20000 data points, and I have 100 training samples.
I initially found an answer about the input shape here: Keras 1D CNN: How to specify dimension correctly?
However, I'm getting an error in the activation function:
ValueError: Error when checking target: expected activation_1 to have 3 dimensions, but got array with shape (100, 1)
My model is:
model.add(Conv1D(filters=n_filter,
kernel_size=input_filter_length,
strides=1,
activation='relu',
input_shape=(20000,1)))
model.add(BatchNormalization())
model.add(MaxPooling1D(pool_size=4, strides=None))
model.add(Dense(1))
model.add(Activation("sigmoid"))
adam = Adam(lr=learning_rate)
model.compile(optimizer= adam, loss='binary_crossentropy', metrics=['accuracy'])
model.fit(train_data, train_label,
nb_epoch=10,
batch_size=batch_size, shuffle=True)
score = np.asarray(model.evaluate(test_new_data, test_label, batch_size=batch_size))*100.0
I can't determine what the problem is here, or why the activation function expects a 3D tensor.
The problem lies in the fact that, starting from Keras 2.0, a Dense layer applied to a sequence is applied to each time step, so given a sequence it produces a sequence. Your Dense is therefore producing a sequence of 1-element vectors, and this causes your problem (as your target is not a sequence).
There are several ways to reduce a sequence to a vector and then apply a Dense layer to it:
GlobalPooling:
You may use GlobalPooling layers like GlobalAveragePooling1D or GlobalMaxPooling1D, eg.:
model.add(Conv1D(filters=n_filter,
kernel_size=input_filter_length,
strides=1,
activation='relu',
input_shape=(20000,1)))
model.add(BatchNormalization())
model.add(GlobalMaxPooling1D())  # global pooling takes no pool_size or strides
model.add(Dense(1))
model.add(Activation("sigmoid"))
Flattening:
You might collapse the whole sequence to a single vector using a Flatten layer:
model.add(Conv1D(filters=n_filter,
kernel_size=input_filter_length,
strides=1,
activation='relu',
input_shape=(20000,1)))
model.add(BatchNormalization())
model.add(MaxPooling1D(pool_size=4, strides=None))
model.add(Flatten())
model.add(Dense(1))
model.add(Activation("sigmoid"))
RNN Postprocessing:
You could also add a recurrent layer on top of your sequence and make it return only the last output:
model.add(Conv1D(filters=n_filter,
kernel_size=input_filter_length,
strides=1,
activation='relu',
input_shape=(20000,1)))
model.add(BatchNormalization())
model.add(MaxPooling1D(pool_size=4, strides=None))
model.add(SimpleRNN(10, return_sequences=False))
model.add(Dense(1))
model.add(Activation("sigmoid"))
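In array terms, the pooling and flattening options reduce a (batch, steps, filters) tensor to 2D as follows. This is a NumPy sketch with made-up sizes, not the Keras implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
batch, steps, filters = 100, 50, 8     # hypothetical, kept small
features = rng.normal(size=(batch, steps, filters))   # stand-in for the conv/pool output

global_max = features.max(axis=1)          # like GlobalMaxPooling1D
global_avg = features.mean(axis=1)         # like GlobalAveragePooling1D
flattened = features.reshape(batch, -1)    # like Flatten

print(global_max.shape, global_avg.shape, flattened.shape)   # (100, 8) (100, 8) (100, 400)
```

Global pooling keeps one value per filter regardless of sequence length, while flattening keeps every step, so the following Dense layer grows with the length of the input.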
Conv1D produces a 3D output (and it stays 3D all the way to the Dense layer).
Conv output: (BatchSize, Length, Filters)
For the Dense layer to output only one result per sample, you need to add a Flatten() or Reshape((shape)) layer first, so that its input is 2D: (BatchSize, Length * Filters).
If you call model.summary(), you will see exactly what shape each layer is outputting. You have to adjust the output to be exactly the same shape as the array you pass as the correct results. The None that appears in those shapes is the batch size and may be ignored.
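The shapes that model.summary() would report for the question's model can also be estimated by hand. The sketch below assumes the default 'valid' padding and hypothetical values n_filter=16 and input_filter_length=5, since the question does not give them:

```python
def conv1d_valid_length(length, kernel_size, stride=1):
    # Output steps of a 1D convolution with padding='valid'.
    return (length - kernel_size) // stride + 1


def maxpool1d_valid_length(length, pool_size, stride=None):
    # Output steps of 1D max pooling with padding='valid'; stride defaults to pool_size.
    stride = pool_size if stride is None else stride
    return (length - pool_size) // stride + 1


n_filter, input_filter_length = 16, 5    # hypothetical values

conv_steps = conv1d_valid_length(20000, input_filter_length)
pool_steps = maxpool1d_valid_length(conv_steps, 4)

after_conv = (None, conv_steps, n_filter)
after_pool = (None, pool_steps, n_filter)
after_dense = (None, pool_steps, 1)      # Dense(1) applied per step: still 3D

print(after_conv, after_pool, after_dense)
# (None, 19996, 16) (None, 4999, 16) (None, 4999, 1)
```

The last shape is what the error message complains about: the model ends in a 3D tensor, while the target has shape (100, 1).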
About your model: I think you need more convolution layers, reducing the number of filters gradually, because condensing so much data in a single Dense layer does not usually bring good results.
About dimensions: Keras layers tutorial and samples

Building a LSTM Cell using Keras

I'm trying to build an RNN for text generation. I'm stuck at building my LSTM cell. The data is shaped like this: X is the input sparse matrix of shape (90809, 2700) and Y is the output matrix of shape (90809, 27). The following is my code for defining the LSTM cell:
model = Sequential()
model.add(LSTM(128, input_shape=(X.shape[0], X.shape[1])))
model.add(Dropout(0.2))
model.add(Dense(Y.shape[1], activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam')
My understanding is that the input_shape should be the dimension of the input matrix, and the dense layer should be the size of the output for each observation, i.e 27 in this case. However, I get the following error-
Exception: Error when checking model input: expected lstm_input_3 to have 3 dimensions, but got array with shape (90809, 2700)
I'm not able to figure out what is going wrong. Can anyone please help me figure out why the lstm_input is expecting 3 dimensions?
I tried the following as well-
X= np.reshape(np.asarray(dataX), (n_patterns, n_vocab*seq_length,1))
Y=np.reshape(np.asarray(dataY), (n_patterns, n_vocab,1))
This gave me the following error-
Exception: Error when checking model input: expected lstm_input_7 to have shape (None, 90809, 2700) but got array with shape (90809, 2700, 1)
Any help will be appreciated. Thanks!
You should read about the difference between input_shape, batch_input_shape and input_dim here.
For input_shape, we don't need to specify the batch size. This is how your LSTM layer should look:
model.add(LSTM(128, input_shape=(X.shape[1], 1)))
or
model.add(LSTM(128, batch_input_shape=(X.shape[0], X.shape[1], 1)))
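The data itself also has to be 3D to match. A NumPy sketch, assuming X has already been converted to a dense array and using only 5 patterns to keep it small; seq_length=100 is inferred from 2700 / 27 and is an assumption:

```python
import numpy as np

n_patterns, seq_length, n_vocab = 5, 100, 27    # 100 * 27 = 2700 columns

X = np.zeros((n_patterns, seq_length * n_vocab))    # 2D, like the question's X

# Matches input_shape=(X.shape[1], 1): 2700 steps with 1 feature each.
X_steps = X[..., np.newaxis]
print(X_steps.shape)      # (5, 2700, 1)

# Alternative, only valid if each row stores one one-hot vector per time step
# in order: 100 steps with 27 features each, i.e. input_shape=(100, 27).
X_onehot = X.reshape(n_patterns, seq_length, n_vocab)
print(X_onehot.shape)     # (5, 100, 27)
```

Y can stay 2D with shape (n_patterns, 27), since the LSTM returns only its last output by default.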
