I want to use char sequences and word sequences as inputs. Each of them will be embedded with its own vocabulary, and the resulting embeddings will be concatenated. I wrote the following code to concatenate the two embeddings:
from keras.models import Sequential
from keras.layers import Embedding, Concatenate, Dropout, Conv1D, GlobalMaxPooling1D, Dense, Activation

char_model = Sequential()
char_model.add(Embedding(vocab_size, char_embedding_dim, input_length=char_size, embeddings_initializer='random_uniform', trainable=False, input_shape=(char_size,)))
word_model = Sequential()
word_model.add(Embedding(word_vocab_size, word_embedding_dim, weights=[embedding_matrix], input_length=max_length, trainable=False, input_shape=(max_length,)))
model = Sequential()
model.add(Concatenate([char_model, word_model]))
model.add(Dropout(drop_prob))
model.add(Conv1D(filters=250, kernel_size=3, padding='valid', activation='relu', strides=1))
model.add(GlobalMaxPooling1D())
model.add(Dense(hidden_dims)) # fully connected layer
model.add(Dropout(drop_prob))
model.add(Activation('relu'))
model.add(Dense(num_classes))
model.add(Activation('softmax'))
print(model.summary())
When I execute the code, I get the following error:
ValueError: This model has not yet been built. Build the model first by calling build() or calling fit() with some data. Or specify input_shape or batch_input_shape in the first layer for automatic build.
I defined input_shape for each embedding, but I still get the same error. How can I concatenate two Sequential models?
The problem is in this line:
model.add(Concatenate([char_model, word_model]))
Aside from the fact that you are calling the Concatenate layer incorrectly, you can't have a concatenation layer in a Sequential model, since it would no longer be sequential by definition. Instead, use the Keras Functional API to define such a model.
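For example, here is a minimal sketch of the same architecture in the functional API, assuming the variables from the question (vocab_size, char_embedding_dim, embedding_matrix, etc.) are already defined. Note that concatenating along the time axis (axis=1) requires the two embedding dimensions to match; concatenate along the feature axis (axis=-1) instead if the sequence lengths match:

from keras.models import Model
from keras.layers import (Input, Embedding, Concatenate, Dropout, Conv1D,
                          GlobalMaxPooling1D, Dense, Activation)

# two separate inputs: char indices and word indices
char_input = Input(shape=(char_size,))
word_input = Input(shape=(max_length,))

# embed each input with its own vocabulary
char_emb = Embedding(vocab_size, char_embedding_dim,
                     embeddings_initializer='random_uniform',
                     trainable=False)(char_input)
word_emb = Embedding(word_vocab_size, word_embedding_dim,
                     weights=[embedding_matrix],
                     trainable=False)(word_input)

# merge the two embedded sequences along the time axis
merged = Concatenate(axis=1)([char_emb, word_emb])

x = Dropout(drop_prob)(merged)
x = Conv1D(filters=250, kernel_size=3, padding='valid',
           activation='relu', strides=1)(x)
x = GlobalMaxPooling1D()(x)
x = Dense(hidden_dims)(x)
x = Dropout(drop_prob)(x)
x = Activation('relu')(x)
x = Dense(num_classes)(x)
out = Activation('softmax')(x)

model = Model(inputs=[char_input, word_input], outputs=out)
model.summary()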
Hello, I am trying to build a seq2seq model to generate some music.
I really don't know much about it though.
On the internet I found this model:
from keras.models import Sequential
from keras.layers import LSTM, BatchNormalization, Dropout, RepeatVector, TimeDistributed, Dense

def createSeq2Seq():
    # seq2seq model
    # encoder
    model = Sequential()
    model.add(LSTM(input_shape=(None, input_dim), units=num_units, activation='tanh', return_sequences=True))
    model.add(BatchNormalization())
    model.add(Dropout(0.3))
    model.add(LSTM(num_units, activation='tanh'))
    # decoder
    model.add(RepeatVector(y_seq_length))
    num_layers = 2
    for _ in range(num_layers):
        model.add(LSTM(num_units, activation='tanh', return_sequences=True))
        model.add(BatchNormalization())
        model.add(Dropout(0.3))
    model.add(TimeDistributed(Dense(output_dim, activation='softmax')))
    return model
My data is a list of piano rolls. A piano roll is a matrix whose columns are a one-hot encoding of the possible pitches (49 in my case), each column representing one time step (0.02 s in my case). The piano roll matrix therefore contains only ones and zeros.
I have prepared my training data by reshaping my piano roll songs (putting them all one after the other) into shape = (something, batch_size, 49). So my input data is all the songs, one after the other, split into blocks of size batch_size. My target data is then the same as the input, but delayed by one block.
The x_seq_length and y_seq_length are equal to the batch_size, and input_dim = 49.
My input and output sequences have the same dimension.
Have I made any mistake in my reasoning? Is the seq2seq model I've found correct? What does RepeatVector do?
This is not a seq2seq model. RepeatVector takes the last state of the last encoder LSTM and makes one copy per output token. These copies are then fed into the "decoder" LSTM, which thus receives the same input at every time step.
A proper autoregressive decoder takes its previous outputs as input: at training time, the input of the decoder is the same as its output, shifted by one position (teacher forcing). This also means that your model is missing the embedding layer for the decoder inputs.
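For reference, here is a minimal sketch of a proper encoder-decoder with teacher forcing, assuming num_units and input_dim (49 pitches) as in the question; the decoder input is the target sequence shifted right by one step:

from keras.models import Model
from keras.layers import Input, LSTM, Dense

# encoder: consume the input sequence and keep only its final states
encoder_inputs = Input(shape=(None, input_dim))
_, state_h, state_c = LSTM(num_units, return_state=True)(encoder_inputs)

# decoder: at training time it receives the target sequence shifted
# right by one step (teacher forcing), initialised with the encoder states
decoder_inputs = Input(shape=(None, input_dim))
decoder_seq = LSTM(num_units, return_sequences=True)(
    decoder_inputs, initial_state=[state_h, state_c])
decoder_outputs = Dense(input_dim, activation='softmax')(decoder_seq)

model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
model.compile(optimizer='adam', loss='categorical_crossentropy')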
I trained a Many-to-Many sequence model in Keras with return_sequences=True and a TimeDistributed wrapper on the last Dense layer:
model = Sequential()
model.add(Embedding(input_dim=vocab_size, output_dim=50))
model.add(LSTM(100, return_sequences=True))
model.add(TimeDistributed(Dense(vocab_size, activation='softmax')))
# train...
model.save_weights("weights.h5")
So during training, the loss is calculated over the outputs at every time step. But for inference I only need the output at the last time step. So I load the weights into a Many-to-One sequence model for inference, without the TimeDistributed wrapper, and I set return_sequences=False to get only the last output of the LSTM layer:
inference_model = Sequential()
inference_model.add(Embedding(input_dim=vocab_size, output_dim=50))
inference_model.add(LSTM(100, return_sequences=False))
inference_model.add(Dense(vocab_size, activation='softmax'))
inference_model.load_weights("weights.h5")
When I test my inference model on a sequence of length 20, I expect to get a prediction of shape (vocab_size,), but inference_model.predict(...) still returns predictions for every time step: a tensor of shape (20, vocab_size).
If, for whatever reason, you need only the last timestep during inference, you can build a new model which applies the trained model to the input and returns the last timestep as its output, using a Lambda layer:
from keras.models import Model
from keras.layers import Input, Lambda
inp = Input(shape=put_the_input_shape_here)
x = model(inp) # apply trained model on the input
out = Lambda(lambda x: x[:,-1])(x)
inference_model = Model(inp, out)
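As a quick sanity check (hypothetical input, assuming vocab_size is defined), the wrapped model now returns one distribution per sequence:

import numpy as np

# a batch containing one integer-encoded sequence of length 20
dummy = np.random.randint(0, vocab_size, size=(1, 20))
print(inference_model.predict(dummy).shape)  # expected: (1, vocab_size)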
Side note: as already stated in this answer, TimeDistributed(Dense(...)) and Dense(...) are equivalent, since the Dense layer is applied to the last dimension of its input tensor. That's why you get the same output shape.
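You can verify this equivalence with a small shape check; the layer sizes below are arbitrary:

from keras.models import Model
from keras.layers import Input, Dense, TimeDistributed

seq = Input(shape=(20, 100))
m1 = Model(seq, Dense(5)(seq))                   # Dense applied to last dim
m2 = Model(seq, TimeDistributed(Dense(5))(seq))  # explicit per-timestep wrapper
print(m1.output_shape)  # (None, 20, 5)
print(m2.output_shape)  # (None, 20, 5)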
I am following a tutorial on building a simple deep neural network in Keras, and the code provided was:
# create model
model = Sequential()
model.add(Dense(12, input_dim=8, activation='relu'))
model.add(Dense(8, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
Does the first model.add line define the first hidden layer, with 8 inputs in the input layer? Is there thus no need to specify the input layer, other than via input_dim=8?
You're right.
When you're creating a Sequential model, the input "layer"* is defined by input_dim or by input_shape, or by batch_input_shape.
* - The input layer is not really a layer, but just a "container" for receiving data in a specific format.
Later you might find it very useful to use functional API models instead of Sequential models. In that case, you will define the input tensor with:
inputs = Input((8,))
And pass this tensor through the layers:
outputs = Dense(12, activation='relu')(inputs)
outputs = Dense(8, activation='relu')(outputs)
outputs = Dense(1, activation='sigmoid')(outputs)
To create the model:
model = Model(inputs,outputs)
It may seem like too much trouble at first, but soon you will feel the need to create branches, join models, split models, etc.
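As a taste of that, here is a toy branch-and-merge topology that a Sequential model cannot express (layer sizes are arbitrary):

from keras.models import Model
from keras.layers import Input, Dense, Concatenate

inputs = Input((8,))
branch_a = Dense(4, activation='relu')(inputs)   # one branch
branch_b = Dense(4, activation='tanh')(inputs)   # a parallel branch
merged = Concatenate()([branch_a, branch_b])     # join the branches
outputs = Dense(1, activation='sigmoid')(merged)
model = Model(inputs, outputs)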
I'm creating a model to classify whether the input waveform contains a rising edge of the SDA line of an I2C bus.
My input has 20000 datapoints per example, and I have 100 training examples.
I initially found an answer regarding the input here: Keras 1D CNN: How to specify dimension correctly?
However, I'm getting an error in the activation function:
ValueError: Error when checking target: expected activation_1 to have 3 dimensions, but got array with shape (100, 1)
My model is:
model.add(Conv1D(filters=n_filter,
kernel_size=input_filter_length,
strides=1,
activation='relu',
input_shape=(20000,1)))
model.add(BatchNormalization())
model.add(MaxPooling1D(pool_size=4, strides=None))
model.add(Dense(1))
model.add(Activation("sigmoid"))
adam = Adam(lr=learning_rate)
model.compile(optimizer= adam, loss='binary_crossentropy', metrics=['accuracy'])
model.fit(train_data, train_label,
nb_epoch=10,
batch_size=batch_size, shuffle=True)
score = np.asarray(model.evaluate(test_new_data, test_label, batch_size=batch_size))*100.0
I can't figure out the problem here, or why the activation function expects a 3D tensor.
The problem lies in the fact that, starting from Keras 2.0, a Dense layer applied to a sequence is applied to each time step individually, so given a sequence it produces a sequence. Your Dense layer is therefore producing a sequence of 1-element vectors, and that causes your problem (as your target is not a sequence).
There are several ways to reduce a sequence to a vector and then apply a Dense layer to it:
GlobalPooling:
You may use global pooling layers like GlobalAveragePooling1D or GlobalMaxPooling1D, e.g.:
model.add(Conv1D(filters=n_filter,
                 kernel_size=input_filter_length,
                 strides=1,
                 activation='relu',
                 input_shape=(20000, 1)))
model.add(BatchNormalization())
model.add(GlobalMaxPooling1D())  # global pooling takes no pool_size or strides
model.add(Dense(1))
model.add(Activation("sigmoid"))
Flattening:
You might collapse the whole sequence to a single vector using a Flatten layer:
model.add(Conv1D(filters=n_filter,
                 kernel_size=input_filter_length,
                 strides=1,
                 activation='relu',
                 input_shape=(20000, 1)))
model.add(BatchNormalization())
model.add(MaxPooling1D(pool_size=4, strides=None))
model.add(Flatten())
model.add(Dense(1))
model.add(Activation("sigmoid"))
RNN Postprocessing:
You could also add a recurrent layer on top of your sequence and make it return only the last output:
model.add(Conv1D(filters=n_filter,
                 kernel_size=input_filter_length,
                 strides=1,
                 activation='relu',
                 input_shape=(20000, 1)))
model.add(BatchNormalization())
model.add(MaxPooling1D(pool_size=4, strides=None))
model.add(SimpleRNN(10, return_sequences=False))
model.add(Dense(1))
model.add(Activation("sigmoid"))
Conv1D produces a 3-dimensional output (and it stays 3-dimensional until the Dense layer).
Conv output: (BatchSize, Length, Filters)
For the Dense layer to output only one result, you need to add a Flatten() or Reshape((shape)) layer first, to make the tensor 2D: (BatchSize, Length × Filters).
If you call model.summary(), you will see exactly what shape each layer is outputting. You have to adjust the output to be exactly the same shape as the array you pass as the correct results. The None that appears in those shapes is the batch size and may be ignored.
About your model: I think you need more convolution layers, reducing the number of filters gradually, because condensing so much data in a single Dense layer does not usually bring good results.
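For instance, a rough sketch of such a deeper stack; the filter counts and kernel sizes here are illustrative, not tuned:

model = Sequential()
model.add(Conv1D(64, 7, strides=2, activation='relu', input_shape=(20000, 1)))
model.add(MaxPooling1D(4))
model.add(Conv1D(32, 5, activation='relu'))
model.add(MaxPooling1D(4))
model.add(Conv1D(16, 3, activation='relu'))
model.add(GlobalMaxPooling1D())  # collapses (None, steps, 16) to (None, 16)
model.add(Dense(1, activation='sigmoid'))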
About dimensions: see the Keras layers tutorial and samples.
I am trying to build an LSTM model, working off the documentation example at https://keras.io/layers/recurrent/
from keras.models import Sequential
from keras.layers import LSTM
The following three lines of code (plus comment) are taken directly from the documentation link above:
model = Sequential()
model.add(LSTM(32, input_dim=64, input_length=10))
# for subsequent layers, no need to specify the input size:
model.add(LSTM(16))
ValueError: Input 0 is incompatible with layer lstm_2: expected ndim=3, found ndim=2
I get the error above after executing the second model.add() statement, but before exposing the model to my data, or even compiling it.
What am I doing wrong here? I'm using Keras 1.2.1.
Edit
Just upgraded to the current 1.2.2; still having the same issue.
Thanks to patyork for answering this on Github:
the second LSTM layer is not getting the 3D input that it expects (with a shape of (batch_size, timesteps, features)). This is because the first LSTM layer has (by fortune of default values) return_sequences=False, meaning it only outputs the last feature set, which is of shape (batch_size, 32): 2 dimensions that don't include time.
So to offer a code example of how to use a stacked LSTM to achieve many-to-one (return_sequences=False) sequence classification, just make sure to use return_sequences=True on the intermediate layers like this:
model = Sequential()
model.add(LSTM(32, input_dim=64, input_length=10, return_sequences=True))
model.add(LSTM(24, return_sequences=True))
model.add(LSTM(16, return_sequences=True))
model.add(LSTM(1, return_sequences=False))
model.compile(optimizer='RMSprop', loss='categorical_crossentropy')
(no errors)