Is this a valid seq2seq lstm model? - python

Hello, I am trying to build a seq2seq model to generate some music. I really don't know much about it, though.
On the internet I found this model:
def createSeq2Seq():
    # seq2seq model
    # encoder
    model = Sequential()
    model.add(LSTM(input_shape=(None, input_dim), units=num_units, activation='tanh', return_sequences=True))
    model.add(BatchNormalization())
    model.add(Dropout(0.3))
    model.add(LSTM(num_units, activation='tanh'))
    # decoder
    model.add(RepeatVector(y_seq_length))
    num_layers = 2
    for _ in range(num_layers):
        model.add(LSTM(num_units, activation='tanh', return_sequences=True))
        model.add(BatchNormalization())
        model.add(Dropout(0.3))
    model.add(TimeDistributed(Dense(output_dim, activation='softmax')))
    return model
My data is a list of piano rolls. A piano roll is a matrix whose columns are a one-hot encoding of the possible pitches (49 in my case) and whose rows each represent one time step (0.02 s in my case), so the piano-roll matrix contains only ones and zeros.
I have prepared my training data by reshaping my piano-roll songs (putting them all one after the other) into
shape = (something, batch_size, 49). So my input data are all the songs, one after the other, split into blocks of size batch_size. My target data is then the same as the input but delayed by one block.
x_seq_length and y_seq_length are equal to batch_size, and input_dim = 49.
My input and output sequences have the same dimension.
Have I made any mistake in my reasoning? Is the seq2seq model I've found correct? What does RepeatVector do?

This is not a seq2seq model. RepeatVector takes the last state of the last encoder LSTM and makes one copy per output token. Then you feed these copies into a "decoder" LSTM, which thus has the same input in every time step.
A proper autoregressive decoder takes its previous outputs as input, i.e., at training time the input of the decoder is the same as its output, but shifted by one position (teacher forcing). This also means that your model is missing an input path (e.g. an embedding or projection layer) for the decoder inputs.
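For reference, here is a minimal sketch of what a teacher-forced encoder-decoder could look like for this kind of one-hot piano-roll data. It uses the Keras functional API; num_units, input_dim and output_dim are the names from the question, the layer sizes are illustrative, and the decoder input is simply the target sequence shifted right by one time step (no embedding is strictly needed, since a piano-roll step is already a one-hot vector):
from keras.models import Model
from keras.layers import Input, LSTM, Dense

num_units, input_dim, output_dim = 256, 49, 49  # illustrative sizes

# Encoder: keep only its final states, not its per-step outputs
encoder_inputs = Input(shape=(None, input_dim))
_, state_h, state_c = LSTM(num_units, return_state=True)(encoder_inputs)

# Decoder: at training time it receives the target sequence shifted right
# by one step (teacher forcing) and starts from the encoder states
decoder_inputs = Input(shape=(None, output_dim))
decoder_seq = LSTM(num_units, return_sequences=True)(decoder_inputs,
                                                     initial_state=[state_h, state_c])
outputs = Dense(output_dim, activation='softmax')(decoder_seq)

model = Model([encoder_inputs, decoder_inputs], outputs)
model.compile(optimizer='adam', loss='categorical_crossentropy')
At inference time you would run the decoder step by step, feeding back its own previous prediction instead of the shifted target.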

Related

Keras Flatten and GRU input_shape difference receiving same inputs

I'm looking at an example from a book. The input is of shape (samples=128, timesteps=24, features=13). When defining two different networks that both receive the same input, they have different input_shape arguments on the Flatten and GRU layers.
model 1:
model = Sequential()
model.add(layers.Flatten(input_shape=(24, 13)))
model.add(layers.Dense(32, activation='relu'))
model.add(layers.Dense(1))
model 2:
model = Sequential()
model.add(layers.GRU(32, input_shape=(None, 13)))
model.add(layers.Dense(1))
I understand that input_shape represents the shape of a single input (not considering the batch size), so to my understanding the input_shape in both cases should be (24, 13).
Why are the input_shapes different between model 1 and model 2?
GRU is a recurrent unit (RNN), which takes a sequence of data as input. The expected input shape for GRU is (batch size, sequence length, feature size). In your case the sequence length is 24 and feature size is 13.
As usual, you don't need to specify a batch size for input_shape argument. Additionally, for recurrent units like GRU or LSTM you can use "None" instead of sequence length, so that it can accept sequences of any length. This is why "input_shape=(None, 13)" is allowed here.
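As a quick illustration (random toy data, not from the book), the same GRU model can be called on batches with different sequence lengths precisely because the time dimension was declared as None, while the Flatten model is tied to exactly 24 timesteps:
import numpy as np
from keras.models import Sequential
from keras import layers

model = Sequential()
model.add(layers.GRU(32, input_shape=(None, 13)))
model.add(layers.Dense(1))

print(model.predict(np.random.random((2, 24, 13))).shape)  # (2, 1)
print(model.predict(np.random.random((2, 50, 13))).shape)  # also (2, 1)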

TF 2.0 sequential CNN into LSTM for regression "Negative dimension size" error

I'm trying to build a model that predicts the price of a certain commodity based on current market conditions. My data are shaped similar to:
num_samples = 100
sample_dimension = 10
XXX = np.random.random((num_samples,sample_dimension)).reshape(-1,1,sample_dimension)
YYY = np.random.random(num_samples).reshape(-1,1)
so I've got 100 ordered samples of X data, each consisting of 10 variables. My model looks like the following
model = keras.Sequential()
model.add(tf.keras.layers.Conv1D(4,
                                 kernel_size=(2),
                                 activation='sigmoid',
                                 input_shape=(None, sample_dimension),
                                 batch_input_shape=[1, 1, sample_dimension]))
model.add(tf.keras.layers.AveragePooling1D(pool_size=2))
model.add(tf.keras.layers.Reshape((1, sample_dimension)))
model.add(tf.keras.layers.LSTM(100,
                               stateful=True,
                               return_sequences=False,
                               activation='sigmoid'))
model.add(keras.layers.Dense(1))
model.compile(optimizer='adam',
              loss='mean_squared_error',
              metrics=['accuracy'])
So it's a 1D convolution, a pooling, a reshape (so it plays nicely with the LSTM), and then casting down to a prediction.
but when I try to run it, I get the following error
Negative dimension size caused by subtracting 2 from 1 for 'conv1d/conv1d' (op: 'Conv2D') with input shapes: [1,1,1,10], [1,2,10,4].
I've tried a few different values for the kernel size, pool size, and batch_input_shape (have to batch my inputs because my actual data are spread across several large files, so I want to read one at a time and kick it into training the model), but nothing seems to work.
What am I doing wrong? How can I track/predict the shape of my data as it goes through this model? What are the data/variables supposed to look like?
I ended up looking through tutorials for Conv2D and then converting things to Conv1D (please edit as you feel appropriate).
Conv2D solution:
model = keras.Sequential()
model.add(tf.keras.layers.Conv2D(4,
                                 kernel_size=(1, 2),
                                 activation='sigmoid',
                                 input_shape=(1, sample_dimension, 1),
                                 batch_input_shape=[None, 1, sample_dimension, 1]))
model.add(tf.keras.layers.AveragePooling2D(pool_size=(1, 2)))
# model.add(tf.keras.layers.Reshape((1, sample_dimension)))
model.add(tf.keras.layers.Flatten())
model.add(keras.layers.Dense(1))
Then I converted it to Conv1D by taking out a dimension from each of the necessary arguments (the extra 1s in the code above):
model = keras.Sequential()
model.add(tf.keras.layers.Conv1D(4,
                                 kernel_size=2,
                                 activation='sigmoid',
                                 input_shape=(sample_dimension, 1),
                                 batch_input_shape=[None, sample_dimension, 1]))
model.add(tf.keras.layers.AveragePooling1D(pool_size=2))
# model.add(tf.keras.layers.Reshape((1, sample_dimension)))
model.add(tf.keras.layers.Flatten())
model.add(keras.layers.Dense(1))
I guess the key takeaway is that TensorFlow's convolution layers aren't designed to deal with plain vectors or even matrices; the last dimension has to be an explicit feature (channel) dimension of the tensor. In this case each feature is just a single number, held across sample_dimension positions, so the channel dimension is 1.
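A convenient way to track the shapes through the model is model.summary(); here is a small sketch with the toy data from the question (shapes only, untrained) showing how the input is reshaped to match the (sample_dimension, 1) layout:
import numpy as np
import tensorflow as tf
from tensorflow import keras

num_samples, sample_dimension = 100, 10

model = keras.Sequential([
    tf.keras.layers.Conv1D(4, kernel_size=2, activation='sigmoid',
                           input_shape=(sample_dimension, 1)),
    tf.keras.layers.AveragePooling1D(pool_size=2),
    tf.keras.layers.Flatten(),
    keras.layers.Dense(1),
])
model.summary()  # prints the output shape of every layer

# each sample becomes a length-10 sequence of 1-dimensional features
XXX = np.random.random((num_samples, sample_dimension)).reshape(-1, sample_dimension, 1)
YYY = np.random.random(num_samples).reshape(-1, 1)
model.compile(optimizer='adam', loss='mean_squared_error')
model.fit(XXX, YYY, epochs=1, verbose=0)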

How to get only last output of sequence model in Keras?

I trained a Many-to-Many sequence model in Keras with return_sequences=True and TimeDistributed wrapper on the last Dense layer:
model = Sequential()
model.add(Embedding(input_dim=vocab_size, output_dim=50))
model.add(LSTM(100, return_sequences=True))
model.add(TimeDistributed(Dense(vocab_size, activation='softmax')))
# train...
model.save_weights("weights.h5")
So during training the loss is calculated over all hidden states (at every timestep). But for inference I only need to get the output at the last timestep. So I load the weights into a many-to-one sequence model for inference, without the TimeDistributed wrapper, and set return_sequences=False to get only the last output of the LSTM layer:
inference_model = Sequential()
inference_model.add(Embedding(input_dim=vocab_size, output_dim=50))
inference_model.add(LSTM(100, return_sequences=False))
inference_model.add(Dense(vocab_size, activation='softmax'))
inference_model.load_weights("weights.h5")
When I test my inference model on a sequence of length 20, I expect to get a prediction of shape (vocab_size), but inference_model.predict(...) still returns predictions for every timestep: a tensor of shape (20, vocab_size).
If, for whatever reason, you need only the last timestep during inference, you can build a new model which applies the trained model on the input and returns the last timestep as its output using the Lambda layer:
from keras.models import Model
from keras.layers import Input, Lambda
inp = Input(shape=put_the_input_shape_here)
x = model(inp) # apply trained model on the input
out = Lambda(lambda x: x[:,-1])(x)
inference_model = Model(inp, out)
Side Note: As already stated in this answer, TimeDistributed(Dense(...)) and Dense(...) are equivalent, since the Dense layer is applied on the last dimension of its input tensor. That's why you get the same output shape.
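For completeness, a short usage sketch with the shapes from the question filled in (sequence length 20, integer token inputs; vocab_size and the trained model are assumed to be defined as above):
import numpy as np
from keras.models import Model
from keras.layers import Input, Lambda

inp = Input(shape=(20,))
out = Lambda(lambda t: t[:, -1])(model(inp))   # keep only the last timestep
inference_model = Model(inp, out)

pred = inference_model.predict(np.random.randint(0, vocab_size, size=(1, 20)))
print(pred.shape)  # (1, vocab_size)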

Keras LSTM different input output shape

In my binary multilabel sequence classification problem, I have 22 timesteps in each input sentence. Now that I have added 200-dimensional word embeddings to each timestep, my current input shape is (*number of input sentences*, 22, 200). My output shape would be (*number of input sentences*, 4), e.g. [1,0,0,1].
My first question is how to build a Keras LSTM model that accepts 3D input and outputs 2D results. The following code outputs this error:
ValueError: Error when checking target: expected dense_41 to have 3 dimensions, but got array with shape (7339, 4)
My second question is, when I add a TimeDistributed layer, should I set the number of units in the Dense layer to the number of features in the input, which in my case is 200?
X_train, X_test, y_train, y_test = train_test_split(padded_docs2, new_y, test_size=0.33, random_state=42)
start = datetime.datetime.now()
print(start)
# define the model
model = Sequential()
e = Embedding(input_dim=vocab_size2, input_length=22, output_dim=200, weights=[embedding_matrix2], trainable=False)
model.add(e)
model.add(LSTM(128, input_shape=(X_train.shape[1],200),dropout=0.2, recurrent_dropout=0.1, return_sequences=True))
model.add(TimeDistributed(Dense(200)))
model.add(Dense(y_train.shape[1],activation='sigmoid'))
# compile the model
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['acc'])
# summarize the model
print(model.summary())
# fit the model
model.fit(X_train, y_train, epochs=300, verbose=0)
end = datetime.datetime.now()
print(end)
print('Time taken to build the model: ', end-start)
Please let me know if I have missed out any information, thanks.
Your model's LSTM layer gets a 3D sequence and produces a 3D output. The same goes for the TimeDistributed layer. If you want the LSTM to return a 2D tensor, the argument return_sequences should be False, and then you don't need the TimeDistributed wrapper. With this setup your model would be:
model = Sequential()
e = Embedding(input_dim=vocab_size2, input_length=22, output_dim=200, weights=[embedding_matrix2], trainable=False)
model.add(e)
model.add(LSTM(128, input_shape=(X_train.shape[1],200),dropout=0.2, recurrent_dropout=0.1, return_sequences=False))
model.add(Dense(200))
model.add(Dense(y_train.shape[1],activation='sigmoid'))
Edit:
TimeDistributed applies a given layer to each temporal slice of its inputs. In your case, the temporal dimension is X_train.shape[1]. Let's assume X_train.shape[1] == 10 and consider the following line.
model.add(TimeDistributed(Dense(200)))
Here the TimeDistributed wrapper applies the same Dense(200) layer to each of the 10 temporal slices (the weights are shared across timesteps). So for each temporal slice you get an output of shape (batch_size, 200), and the final output tensor has shape (batch_size, 10, 200). But you said you want 2D output, so TimeDistributed won't get you 2D from 3D inputs.
The other case is if you remove the TimeDistributed wrapper and use only a Dense layer, like this:
model.add(Dense(200))
Then the dense layer is applied to the last axis of the 3D input: conceptually, the input is flattened to shape (batch_size * 10, 200), the fully connected dot product is computed, and the output is reshaped back to the same leading dimensions, giving (batch_size, 10, 200) in your case, which is still a 3D tensor.
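You can verify both cases with a tiny sketch (toy shapes, assuming X_train.shape[1] == 10): TimeDistributed(Dense(200)) and a plain Dense(200) on a 3D input both produce a 3D output of the same shape.
from keras.models import Sequential
from keras.layers import LSTM, Dense, TimeDistributed

for use_time_distributed in (True, False):
    m = Sequential()
    m.add(LSTM(128, input_shape=(10, 200), return_sequences=True))
    m.add(TimeDistributed(Dense(200)) if use_time_distributed else Dense(200))
    print(m.output_shape)  # (None, 10, 200) in both cases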
But if you don't want to change the LSTM layer, you can replace the TimeDistributed layer with another LSTM layer with return_sequences set to False. Now your model would look like this:
model = Sequential()
e = Embedding(input_dim=vocab_size2, input_length=22, output_dim=200, weights=[embedding_matrix2], trainable=False)
model.add(e)
model.add(LSTM(128, input_shape=(X_train.shape[1],200),dropout=0.2, recurrent_dropout=0.1, return_sequences=True))
model.add(LSTM(200, input_shape=(X_train.shape[1],200),dropout=0.2, recurrent_dropout=0.1, return_sequences=False))
model.add(Dense(y_train.shape[1],activation='sigmoid'))

Getting state of predictions in LSTMs

I am attempting to generate Shakespeare text using the following model:
model = Sequential()
model.add(Embedding(len_vocab, 64))
model.add(LSTM(256, return_sequences=True))
model.add(TimeDistributed(Dense(len_vocab, activation='softmax')))
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam')
model.summary()
The training set consists of characters converted to numbers, where x has shape (num_sentences, sentence_len) and y has the same shape; y is simply x offset by one character. In this case sentence_len = 40.
However, when I predict I predict one character at a time. See below for how I fit and predict using the model:
for i in range(2):
    model.fit(x, y, batch_size=128, epochs=1)
    sentence = []
    letter = np.random.choice(len_vocab, 1).reshape((1, 1))  # choose a random letter
    for i in range(100):
        sentence.append(val2chr(letter))
        # Predict ONE letter at a time
        p = model.predict(letter)
        letter = np.random.choice(27, 1, p=p[0][0])
    print(''.join(sentence))
However, regardless of how many epochs I train, all I get is gibberish for the output. One of the possible reasons is that I do not get the cell memory from the previous prediction.
So the question is how do I make sure that the state is sent off to the next cell before I predict?
Full jupyter notebook example is here:
Edit 1:
I just realised that I would need to pass in the previous LSTM's hidden state and not just the cell memory.
I have since tried to redo the model as:
batch_size = 64
model = Sequential()
model.add(Embedding(len_vocab, 64, batch_size=batch_size))
model.add(LSTM(256, return_sequences=True, stateful=True))
model.add(TimeDistributed(Dense(len_vocab, activation='softmax')))
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam')
model.summary()
However, now I cannot predict one letter at a time, as the model expects batch_size inputs at once.
The standard way to train a char-rnn with Keras can be found in the official example: lstm_text_generation.py.
model = Sequential()
model.add(LSTM(128, input_shape=(maxlen, len(chars))))
model.add(Dense(len(chars)))
model.add(Activation('softmax'))
This model is trained based on sequences of maxlen characters.
While training this network, LSTM states are reset after each sequence (stateful=False by default).
Once such a network is trained, you may want to feed and predict one character at a time. The simplest way to do that (that I know of), is to build another Keras model with the same structure, initialize it with the weights of the first one, but with RNN layers in Keras "stateful" mode:
model = Sequential()
model.add(LSTM(128, stateful=True, batch_input_shape=(1, 1, len(chars))))
model.add(Dense(len(chars)))
model.add(Activation('softmax'))
In this mode, Keras has to know the complete shape of a batch (see the doc here).
Since you want to feed the network only one sample of one step of characters, the shape of a batch is (1, 1, len(chars)).
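A minimal sketch of the whole round trip could look like this (assuming model is the trained network above and chars is the character list from the Keras example; the sampling loop itself is illustrative):
import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Dense, Activation

inference_model = Sequential()
inference_model.add(LSTM(128, stateful=True, batch_input_shape=(1, 1, len(chars))))
inference_model.add(Dense(len(chars)))
inference_model.add(Activation('softmax'))
inference_model.set_weights(model.get_weights())  # copy the trained weights

inference_model.reset_states()           # clear the LSTM state before a new text
idx = np.random.randint(len(chars))      # start from a random character
generated = [chars[idx]]
for _ in range(100):
    x = np.zeros((1, 1, len(chars)))
    x[0, 0, idx] = 1.0                   # one-hot encode the previous character
    p = inference_model.predict(x)[0]    # the state carries over between calls
    p = p.astype('float64') / p.sum()    # guard against floating-point drift
    idx = np.random.choice(len(chars), p=p)
    generated.append(chars[idx])
print(''.join(generated))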
As @j-c-doe pointed out, you can use the stateful option with a batch of one and transfer the weights. The other method that I found was to keep unrolling the LSTM over the growing sequence and predicting, as below:
for i in range(150):
    sentence.append(int2char[letter[-1]])
    p = model.predict(np.array(letter)[None, :])
    letter.append(np.random.choice(len(char2int), 1, p=p[0][-1])[0])
NOTE: The dimensionality of the prediction is really important! np.array(letter)[None,:] gives a (1,i+1) shape. This way no modification to the model is required.
And most importantly, it keeps passing on the cell state memory and hidden state. I'm not entirely sure whether stateful=True passes the hidden state as well, or only the cell state.
