I am following a tutorial on building a simple deep neural network in Keras, and the code provided was:
# create model
model = Sequential()
model.add(Dense(12, input_dim=8, activation='relu'))
model.add(Dense(8, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
Does the first model.add line define the first hidden layer, with 8 inputs in the input layer? Is there then no need to specify the input layer, other than through input_dim=8?
You're right.
When you're creating a Sequential model, the input "layer"* is defined by input_dim or by input_shape, or by batch_input_shape.
* - The input layer is not really a layer, but just a "container" for receiving data in a specific format.
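One way to see what input_dim controls is to count the weights it implies. The following is a plain-Python sketch, not Keras code: the helper name dense_param_count is made up for illustration, and it assumes ordinary fully connected layers with one bias per unit.

```python
def dense_param_count(n_inputs, n_units):
    """Weights plus biases of a fully connected layer (assumes a bias per unit)."""
    return n_inputs * n_units + n_units

# Layer widths of the tutorial model: 8 inputs -> 12 -> 8 -> 1
layer_sizes = [8, 12, 8, 1]
counts = [dense_param_count(n_in, n_out)
          for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
total = sum(counts)
print(counts, total)
```

Under these assumptions the three layers hold 108, 104 and 9 parameters (221 in total). The input "layer" contributes none; input_dim=8 only fixes the size of the first weight matrix.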
Later you might find it very useful to use functional API models instead of sequential models. In that case, you define the input tensor with:
inputs = Input((8,))
And pass this tensor through the layers:
outputs = Dense(12, activation='relu')(inputs)
outputs = Dense(8, activation='relu')(outputs)
outputs = Dense(1, activation='sigmoid')(outputs)
To create the model:
model = Model(inputs,outputs)
It seems like too much trouble at first, but soon you will feel the need to create branches, join models, split models, etc.
Related
Hello, I am trying to build a seq2seq model to generate some music.
I really don't know much about it, though.
On the internet I found this model:
def createSeq2Seq():
    #seq2seq model
    #encoder
    model = Sequential()
    model.add(LSTM(input_shape = (None, input_dim), units = num_units, activation= 'tanh', return_sequences = True ))
    model.add(BatchNormalization())
    model.add(Dropout(0.3))
    model.add(LSTM(num_units, activation= 'tanh'))
    #decoder
    model.add(RepeatVector(y_seq_length))
    num_layers= 2
    for _ in range(num_layers):
        model.add(LSTM(num_units, activation= 'tanh', return_sequences = True))
        model.add(BatchNormalization())
        model.add(Dropout(0.3))
    model.add(TimeDistributed(Dense(output_dim, activation= 'softmax')))
    return model
My data is a list of piano rolls. A piano roll is a matrix in which each column is a one-hot encoding of the possible pitches (49 in my case) at one time step (0.02 s in my case). The piano roll matrix therefore contains only ones and zeros.
I have prepared my training data by reshaping my piano roll songs (concatenated one after the other) into
shape = (something, batchsize, 49). So my input data is all the songs, one after the other, separated into blocks whose size is the batch size. My training targets are then the same input, delayed by one batch.
x_seq_length and y_seq_length are both equal to batch_size, and input_dim = 49.
My input and output sequences have the same dimension.
Have I made any mistakes in my reasoning? Is the seq2seq model I've found correct? What does RepeatVector do?
This is not a seq2seq model. RepeatVector takes the last state of the last encoder LSTM and makes one copy per output token. Then you feed these copies into a "decoder" LSTM, which thus has the same input in every time step.
A proper autoregressive decoder takes its previous outputs as input; that is, at training time, the input of the decoder is the same as its target output, but shifted by one position. This also means that your model is missing the embedding layer for the decoder inputs.
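The difference between the two kinds of decoder input can be sketched in plain Python. This is not Keras code: repeat_vector and shift_right are made-up helper names, and the state values, note names and start token are hypothetical.

```python
def repeat_vector(vec, n):
    """n copies of the same vector, one per output time step
    (what RepeatVector produces for a single sample)."""
    return [list(vec) for _ in range(n)]

def shift_right(target, start_token):
    """Decoder input for teacher forcing: the target delayed by one position."""
    return [start_token] + list(target[:-1])

encoder_state = [0.1, -0.4, 0.7]      # hypothetical final encoder output
decoder_inputs_repeat = repeat_vector(encoder_state, 4)

target = ["C4", "E4", "G4", "C5"]     # hypothetical target sequence
decoder_inputs_shifted = shift_right(target, "<start>")

print(decoder_inputs_repeat)
print(decoder_inputs_shifted)
```

In the first case every decoder step sees the identical vector; in the second, each step sees the previous target token, which is what lets the decoder condition on what it has already produced.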
I want to use char sequences and word sequences as inputs. Each of them will be embedded using its own vocabulary, and the resulting embeddings will then be concatenated. I wrote the following code to concatenate the two embeddings:
char_model = Sequential()
char_model.add(Embedding(vocab_size, char_emnedding_dim,input_length=char_size,embeddings_initializer='random_uniform',trainable=False, input_shape=(char_size, )))
word_model = Sequential()
word_model.add(Embedding(word_vocab_size,word_embedding_dim, weights=[embedding_matrix], input_length=max_length, trainable=False,input_shape=(max_length, )))
model = Sequential()
model.add(Concatenate([char_model, word_model]))
model.add(Dropout(drop_prob))
model.add(Conv1D(filters=250, kernel_size=3, padding='valid', activation='relu', strides = 1))
model.add(GlobalMaxPooling1D())
model.add(Dense(hidden_dims)) # fully connected layer
model.add(Dropout(drop_prob))
model.add(Activation('relu'))
model.add(Dense(num_classes))
model.add(Activation('softmax'))
print(model.summary())
When I execute the code, I get the following error:
ValueError: This model has not yet been built. Build the model first by calling build() or calling fit() with some data. Or specify input_shape or batch_input_shape in the first layer for automatic build.
I defined input_shape for each embedding, but I still get the same error. How can I concatenate two Sequential models?
The problem is in this line:
model.add(Concatenate([char_model, word_model]))
Apart from the fact that you are calling the Concatenate layer incorrectly, you can't have a concatenation layer in a Sequential model, since it would then no longer be a Sequential model by definition. Instead, use the Keras Functional API to define such a model.
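Whichever API is used, the two embedding outputs can only be joined along the feature axis if their other axes agree. The following plain-Python sketch of that shape bookkeeping is not Keras code: concat_shape is a made-up helper, and the sizes are hypothetical and assume both branches were padded to the same number of time steps, which the question does not state.

```python
def concat_shape(shapes, axis=-1):
    """Shape after concatenating tensors along `axis`; all other axes must agree."""
    ndim = len(shapes[0])
    axis = axis % ndim
    for shape in shapes[1:]:
        if len(shape) != ndim:
            raise ValueError("all inputs must have the same rank")
        for i in range(ndim):
            if i != axis and shape[i] != shapes[0][i]:
                raise ValueError(f"axis {i} differs: {shapes[0][i]} vs {shape[i]}")
    out = list(shapes[0])
    out[axis] = sum(shape[axis] for shape in shapes)
    return tuple(out)

# Hypothetical (time steps, embedding dim) per sample, batch axis omitted
char_emb = (50, 30)
word_emb = (50, 100)
merged = concat_shape([char_emb, word_emb])
print(merged)
```

If char_size and max_length differ, the helper raises a ValueError; in a real model the branches would first have to be brought to a common length (for example by pooling or padding) before they can be concatenated.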
I am trying to create a neural network based on the iris dataset. I have an input of four dimensions: X = dataset[:,0:4].astype(float). Then I create a neural network with four nodes.
model = Sequential()
model.add(Dense(4, input_dim=4, init='normal', activation='relu'))
model.add(Dense(3, init='normal', activation='sigmoid'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
As I understand it, I pass each dimension to a separate node: four dimensions, four nodes. When I create a neural network with 8 input nodes, how does it work? Performance is still the same as with 4 nodes.
model = Sequential()
model.add(Dense(8, input_dim=4, init='normal', activation='relu'))
model.add(Dense(3, init='normal', activation='sigmoid'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
You have an error on your last activation. Use softmax instead of sigmoid and run again.
replace
model.add(Dense(3, init='normal', activation='sigmoid'))
with
model.add(Dense(3, init='normal', activation='softmax'))
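To see why softmax suits a three-class output trained with categorical_crossentropy, compare what the two activations do to the same values. This is a minimal plain-Python sketch, and the logits are hypothetical.

```python
import math

def sigmoid(z):
    """Squash each value independently into (0, 1)."""
    return [1.0 / (1.0 + math.exp(-v)) for v in z]

def softmax(z):
    """Turn the values into a distribution over classes."""
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]  # hypothetical pre-activation outputs for 3 classes
print(sum(sigmoid(logits)))
print(sum(softmax(logits)))
```

The softmax outputs sum to 1 and can be read as class probabilities, whereas the sigmoid outputs are computed independently of each other and generally do not sum to 1.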
To answer your main question of "How does this work?":
From a conceptual standpoint, you are initially creating a fully-connected, or Dense, neural network with 3 layers: an input layer with 4 nodes, a hidden layer with 4 nodes, and an output layer with 3 nodes. Each node in the input layer has a connection to every node in the hidden layer, and same with the hidden to the output layer.
In your second example, you just increased the number of nodes in the hidden layer from 4 to 8. A larger network can be good, as it can be trained to "look" for more things in your data. But too large of a layer and you may overfit; this means the network remembers too much of the training data, when it really just needs a general idea of the training data so it can still recognize slightly different data, which is your testing data.
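The structure described above can be sketched as a forward pass in plain Python. This is not Keras code: the helper names are made up, the weights are random rather than trained, and the sample values are only iris-like.

```python
import random

def dense(x, weights, biases):
    """Fully connected layer: every output node sums over every input."""
    return [sum(xi * w for xi, w in zip(x, node_weights)) + b
            for node_weights, b in zip(weights, biases)]

def relu(values):
    return [max(0.0, v) for v in values]

def random_layer(n_in, n_out, rng):
    """One row of n_in weights per output node, plus zero biases."""
    weights = [[rng.gauss(0, 0.05) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

rng = random.Random(0)
x = [5.1, 3.5, 1.4, 0.2]  # one iris-like sample with 4 features
for hidden in (4, 8):
    w1, b1 = random_layer(4, hidden, rng)
    w2, b2 = random_layer(hidden, 3, rng)
    h = relu(dense(x, w1, b1))
    out = dense(h, w2, b2)
    print(hidden, len(h), len(out))
```

Changing the first Dense layer from 4 to 8 units only widens the hidden layer. The input still has 4 features and the output still has 3 values, and every hidden node still receives all four inputs.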
The reason you may not have seen an increase in performance is likely either overfitting or your activation function. Try a function other than relu in your hidden layer. If you don't see any improvement after trying a few different combinations of functions, you are likely overfitting.
Hope this helps.
I have a list of training data that I am using to train. However, when I predict, the prediction will be done online, with a single example at a time.
I declare my model with an input like the following:
model = Sequential()
model.add(Dense(64, batch_input_shape=(100, 5, 1), activation='tanh'))
model.add(LSTM(32, stateful=True))
model.add(Dense(1, activation='linear'))
optimizer = SGD(lr=0.0005)
model.compile(loss='mean_squared_error', optimizer=optimizer)
When I go to predict with a single example of shape (1, 5, 1), it gives the following error.
ValueError: Shape mismatch: x has 100 rows but z has 1 rows
The solution I came up with was to just train my model iteratively using a batch_input_shape of (1,5,1) and calling fit for each single example. This is incredibly slow.
Is there not a way to train on a large batch size, but predict with a single example using LSTM?
Thanks for the help.
Try something like this:
model2 = Sequential()
model2.add(Dense(64, batch_input_shape=(1, 5, 1), activation='tanh'))
model2.add(LSTM(32, stateful=True))
model2.add(Dense(1, activation='linear'))
optimizer2 = SGD(lr=0.0005)
model2.compile(loss='mean_squared_error', optimizer=optimizer2)
for nb, layer in enumerate(model.layers):
    model2.layers[nb].set_weights(layer.get_weights())
You are simply copying the weights from one model to the other.
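Copying works because none of the weight arrays has a batch dimension. The plain-Python sketch below lists the weight shapes for the model in the question. It is not Keras code: the helper names are made up, and the LSTM layout (kernel, recurrent kernel, bias, with the four gates stacked along the last axis) is assumed to be the Keras 2 one.

```python
def dense_weight_shapes(n_in, units):
    return [(n_in, units), (units,)]  # kernel, bias

def lstm_weight_shapes(n_in, units):
    # kernel, recurrent kernel, bias; the factor 4 covers the four LSTM gates
    return [(n_in, 4 * units), (units, 4 * units), (4 * units,)]

def model_weight_shapes(batch_input_shape):
    # batch_size and timesteps are unpacked but never needed below
    batch_size, timesteps, features = batch_input_shape
    return (dense_weight_shapes(features, 64)
            + lstm_weight_shapes(64, 32)
            + dense_weight_shapes(32, 1))

train_shapes = model_weight_shapes((100, 5, 1))
predict_shapes = model_weight_shapes((1, 5, 1))
print(train_shapes)
print(train_shapes == predict_shapes)
```

Only the stored LSTM states depend on the batch size, and those are not part of what get_weights() and set_weights() transfer.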
You have defined the input shape, including the batch size, in the first layer. Therefore, sending an input whose shape does not match the preset batch_input_shape is invalid.
There are two ways to handle this:
You can modify your model by changing
batch_input_shape=(100, 5, 1)
to
input_shape=(5, 1) to avoid a preset batch size. You can then set batch_size=100 in model.fit().
Edit: Method 2
Define a second model, model2, that is identical to the first except for the batch size. Then call model2.set_weights(model1.get_weights()).
If you want to use stateful=True, you actually want to use the hidden states from the last batch as the initial states for the next batch. Therefore every batch must have the same size. Otherwise, you can just remove stateful=True.
I am trying to build an LSTM model, working off the documentation example at https://keras.io/layers/recurrent/
from keras.models import Sequential
from keras.layers import LSTM
The following three lines of code (plus comment) are taken directly from the documentation link above:
model = Sequential()
model.add(LSTM(32, input_dim=64, input_length=10))
# for subsequent layers, not need to specify the input size:
model.add(LSTM(16))
ValueError: Input 0 is incompatible with layer lstm_2: expected ndim=3, found ndim=2
I get that error above after executing the second model.add() statement, but before exposing the model to my data, or even compiling it.
What am I doing wrong here? I'm using Keras 1.2.1.
Edit
Just upgraded to current 1.2.2, still having same issue.
Thanks to patyork for answering this on Github:
The second LSTM layer is not getting the 3D input that it expects, with a shape of (batch_size, timesteps, features). This is because the first LSTM layer has (by virtue of its default values) return_sequences=False, meaning it only outputs the result for the last time step, which has shape (batch_size, 32): 2 dimensions, with no time axis.
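The shape reasoning above can be traced with a small plain-Python sketch. This is not Keras code: lstm_output_shape is a made-up helper that only mimics how return_sequences changes the output rank.

```python
def lstm_output_shape(input_shape, units, return_sequences=False):
    """Output shape of an LSTM layer given a (batch, timesteps, features) input."""
    if len(input_shape) != 3:
        raise ValueError(f"expected ndim=3, found ndim={len(input_shape)}")
    batch, timesteps, _ = input_shape
    return (batch, timesteps, units) if return_sequences else (batch, units)

inp = (None, 10, 64)                  # unknown batch size, 10 steps, 64 features
first = lstm_output_shape(inp, 32)    # return_sequences defaults to False
stacked_ok = lstm_output_shape(
    lstm_output_shape(inp, 32, return_sequences=True), 16)

try:
    lstm_output_shape(first, 16)      # the second layer receives a 2D input
    error = None
except ValueError as exc:
    error = str(exc)

print(first, stacked_ok, error)
```

With the default setting, the first layer drops the time axis and the second layer rejects its input. With return_sequences=True on the first layer, the second layer receives the 3D input it needs.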
So to offer a code example of how to use a stacked LSTM to achieve many-to-one (return_sequences=False) sequence classification, just make sure to use return_sequences=True on the intermediate layers like this:
model = Sequential()
model.add(LSTM(32, input_dim=64, input_length=10, return_sequences=True))
model.add(LSTM(24, return_sequences=True))
model.add(LSTM(16, return_sequences=True))
model.add(LSTM(1, return_sequences=False))
model.compile(optimizer = 'RMSprop', loss = 'categorical_crossentropy')
(no errors)