Getting state of predictions in LSTMs - python

I am attempting to generate Shakespeare-like text using the following model:
model = Sequential()
model.add(Embedding(len_vocab, 64))
model.add(LSTM(256, return_sequences=True))
model.add(TimeDistributed(Dense(len_vocab, activation='softmax')))
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam')
model.summary()
The training set consists of characters converted to numbers, where x has shape (num_sentences, sentence_len) and y has the same shape; y is simply x offset by one character. In this case sentence_len=40.
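For context, here is a minimal sketch (with hypothetical names: text is the raw corpus string and char2val maps a character to its integer id) of how such offset x/y pairs can be built:

import numpy as np

ids = np.array([char2val(c) for c in text])  # text -> integer ids
sentence_len = 40
num_sentences = (len(ids) - 1) // sentence_len
x = ids[:num_sentences * sentence_len].reshape(num_sentences, sentence_len)
y = ids[1:num_sentences * sentence_len + 1].reshape(num_sentences, sentence_len)  # x shifted by one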
However, at inference time I predict one character at a time. See below for how I fit and predict with the model:
for i in range(2):
    model.fit(x, y, batch_size=128, epochs=1)
    sentence = []
    letter = np.random.choice(len_vocab, 1).reshape((1, 1))  # choose a random letter
    for _ in range(100):
        sentence.append(val2chr(letter))
        # Predict ONE letter at a time
        p = model.predict(letter)
        letter = np.random.choice(27, 1, p=p[0][0])
    print(''.join(sentence))
However, regardless of how many epochs I train, all I get is gibberish for the output. One possible reason is that I do not get the cell memory from the previous prediction.
So the question is: how do I make sure that the state is passed on to the next prediction?
A full Jupyter notebook example is here:
Edit 1:
I just realised that I would need to pass in the previous LSTM's hidden state and not just the cell memory.
I have since tried to redo the model as:
batch_size = 64
model = Sequential()
model.add(Embedding(len_vocab, 64, batch_size=batch_size))
model.add(LSTM(256, return_sequences=True, stateful=True))
model.add(TimeDistributed(Dense(len_vocab, activation='softmax')))
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam')
model.summary()
However, now I cannot predict one letter at a time, because the model expects a whole batch_size of inputs.

The standard way to train a char-rnn with Keras can be found in the official example: lstm_text_generation.py.
model = Sequential()
model.add(LSTM(128, input_shape=(maxlen, len(chars))))
model.add(Dense(len(chars)))
model.add(Activation('softmax'))
This model is trained based on sequences of maxlen characters.
While training this network, LSTM states are reset after each sequence (stateful=False by default).
Once such a network is trained, you may want to feed it one character at a time and predict the next. The simplest way to do that (that I know of) is to build another Keras model with the same structure, initialize it with the weights of the first one, and set its RNN layers to Keras "stateful" mode:
model = Sequential()
model.add(LSTM(128, stateful=True, batch_input_shape=(1, 1, len(chars))))
model.add(Dense(len(chars)))
model.add(Activation('softmax'))
In this mode, Keras has to know the complete shape of a batch (see the doc here).
Since you want to feed the network only one sample of one step of characters, the shape of a batch is (1, 1, len(chars)).
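A minimal sketch of the weight transfer and the character-by-character generation loop (trained_model, inference_model and seed_char are names assumed here; chars, char_indices and indices_char are the lookup structures from the official example):

import numpy as np

# Both models have identical layer structure, so the weights can be copied over.
inference_model.set_weights(trained_model.get_weights())

inference_model.reset_states()  # clear h and c before generating a new text
char = seed_char                # any starting character
generated = [char]
for _ in range(400):
    x = np.zeros((1, 1, len(chars)))        # batch of 1, sequence of 1
    x[0, 0, char_indices[char]] = 1.0       # one-hot encode the current character
    probs = inference_model.predict(x)[0]   # shape (len(chars),)
    char = indices_char[np.argmax(probs)]   # greedy; you could sample instead
    generated.append(char)
print(''.join(generated))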

As @j-c-doe pointed out, you can use the stateful option with a batch size of one and transfer the weights. The other method that I found was to keep unrolling the LSTM and predicting as below:
for i in range(150):
    sentence.append(int2char[letter[-1]])
    p = model.predict(np.array(letter)[None, :])
    letter.append(np.random.choice(len(char2int), 1, p=p[0][-1])[0])
NOTE: The dimensionality of the prediction is really important! np.array(letter)[None,:] gives a (1, i+1) shape. This way no modification to the model is required.
And most importantly, it keeps passing on the cell state memory and hidden state. I'm not entirely sure whether stateful=True passes the hidden state on as well, or only the cell state. (For reference, Keras's stateful mode preserves both the hidden state and the cell state between batches.)

Related

Multi-class image classification using CNN

I want to classify images belonging to five classes using a CNN, but with every model I try the training accuracy will not rise above 20%. Please can someone help me overcome this? Mostly the model finishes training within 3 epochs, and increasing the number of epochs brings no improvement in accuracy. Can anyone suggest a solution or a model, or point out what the problem could be?
Below is one of the models I have used:
#defining training and test sets
x_train, x_val, y_train, y_val = train_test_split(x, y, test_size=0.2, random_state=42)
print('Training data and target sizes: \n{}, {}'.format(x_train.shape, y_train.shape))
print('Test data and target sizes: \n{}, {}'.format(x_val.shape, y_val.shape))

which prints:

Training data and target sizes:
(2398, 224, 224, 3), (2398,)
Test data and target sizes:
(600, 224, 224, 3), (600,)
img_rows, img_cols, img_channel = 224, 224, 3
base_model = applications.inception_v3.InceptionV3(include_top=False, weights='imagenet',pooling='avg', input_shape=(img_rows, img_cols, img_channel))
print(base_model.summary())
#Adding custom Layers
add_model = Sequential()
add_model.add(Dense(1024, activation='relu',input_shape=base_model.output_shape[1:]))
add_model.add(Dropout(0.60))
add_model.add(Dense(1, activation='sigmoid'))
print(add_model.summary())
# creating the final model
model = Model(inputs=base_model.input, outputs=add_model(base_model.output))
# compile the model
opt = optimizers.Adam(lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-08, decay=0.0)
reduce_lr = ReduceLROnPlateau(monitor='val_acc',
                              patience=5,
                              verbose=1,
                              factor=0.1,
                              cooldown=10,
                              min_lr=0.00001)
model.compile(
    loss='categorical_crossentropy',
    metrics=['acc'],
    optimizer='adam'
)
print(model.summary())
n_fold = 5
kf = model_selection.KFold(n_splits = n_fold, shuffle = True)
eval_fun = metrics.roc_auc_score
model.fit(x_train,y_train,epochs=50,batch_size=50,validation_data=(x_val,y_val))
Could you share the part of the code where you're fitting the model? It's not available in the post.
And since the output is not reproducible due to lack of data, I suggest you go through this link: https://www.kaggle.com/kenconstable/alzheimer-s-multi-class-classification
It's really well explained, and it gives best practices for multi-class classification based on transfer learning as well as training from scratch. If you don't find it helpful, it would be useful to share your training script, including the model.fit() code.
Okay, so here's the issue:
In your code, you are creating a base model with InceptionV3, but you are not actually adding that base model to your add_model variable. Your add_model variable is essentially a dense network, not a CNN. Another thing, although it's not a big deal, is that you're creating your own optimizer opt but not using it in model.compile.
Can you please try this code out and let me know if it works:
# function to build the model
def build_transfer_model(conv_base, dropout, dense_node, learn_rate, metric):
    """
    Build and compile a transfer learning model.
    Input: a base model, dropout rate, the number of units in the dense node,
           the learning rate and performance metrics.
    Output: a compiled CNN model.
    """
    # clear previous run
    backend.clear_session()
    # build the model
    model = Sequential()
    model.add(conv_base)
    model.add(Dropout(dropout))
    model.add(BatchNormalization())
    model.add(Flatten())
    model.add(Dense(dense_node, activation='relu'))
    # five classes, so five softmax units (a single sigmoid unit cannot
    # represent a 5-way classification)
    model.add(Dense(5, activation='softmax'))
    # compile the model; sparse_categorical_crossentropy matches the
    # integer labels of shape (n,)
    model.compile(
        optimizer=tensorflow.keras.optimizers.Adam(lr=learn_rate),
        loss='sparse_categorical_crossentropy',
        metrics=metric)
    model.summary()
    return model
img_rows, img_cols, img_channel = 224, 224, 3
base_model = applications.inception_v3.InceptionV3(include_top=False, weights='imagenet',pooling='avg', input_shape=(img_rows, img_cols, img_channel))
model = build_transfer_model(conv_base=base_model,dropout=0.6,dense_node =1024,learn_rate=0.001,metric=['acc'])
print(model.summary())
model.fit(x_train,y_train,epochs=50,batch_size=50,validation_data=(x_val,y_val))
If you pay attention to the function, the first thing we add to the Sequential() instance is the base layer (InceptionV3 in your case). But you were adding a dense layer directly. Although it may get the weights from the output layer of the base InceptionV3, it will be a dense network, not a CNN. So please check this out.
I may have changed the variable names, although I have tried not to. And please change the order of the layers in the build_transfer_model function according to your requirements.
If it doesn't work, let me know.
Thanks.
You have to use model.fit() to actually train the model after compiling. Right now, it has randomly initialized weights, and is therefore making random predictions. Since you have five classes, the accuracy is approximately 1/5 = 20%. Training your model may take time depending on model size and amount of data you have.
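As a quick sanity check of that 1/5 figure, here is a tiny self-contained NumPy sketch (not from the original post) showing that random guessing over five balanced classes lands at roughly 20% accuracy:

import numpy as np

rng = np.random.default_rng(0)
y_true = rng.integers(0, 5, size=10000)  # 5 balanced classes
y_pred = rng.integers(0, 5, size=10000)  # an untrained model guesses at random
print((y_true == y_pred).mean())         # ~0.2, i.e. about 20% accuracy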

Is this a valid seq2seq LSTM model?

Hello, I am trying to build a seq2seq model to generate some music. I really don't know much about it though. On the internet I have found this model:
def createSeq2Seq():
    # seq2seq model
    # encoder
    model = Sequential()
    model.add(LSTM(input_shape=(None, input_dim), units=num_units, activation='tanh', return_sequences=True))
    model.add(BatchNormalization())
    model.add(Dropout(0.3))
    model.add(LSTM(num_units, activation='tanh'))
    # decoder
    model.add(RepeatVector(y_seq_length))
    num_layers = 2
    for _ in range(num_layers):
        model.add(LSTM(num_units, activation='tanh', return_sequences=True))
        model.add(BatchNormalization())
        model.add(Dropout(0.3))
    model.add(TimeDistributed(Dense(output_dim, activation='softmax')))
    return model
My data is a list of piano rolls. A piano roll is a matrix whose columns are a one-hot encoding of the different possible pitches (49 in my case), with each row representing a time step (0.02 s in my case). The piano roll matrix then contains only ones and zeros.
I have prepared my training data by reshaping my piano roll songs (putting them all one after the other) into shape = (something, batchsize, 49). So my input data is all the songs one after the other, separated into blocks of size batchsize. My target data is then the same as the input but delayed by one block.
The x_seq_length and y_seq_length are equal to the batch_size. input_dim = 49.
My input and output sequences have the same dimension.
Have I made any mistake in my reasoning? Is the seq2seq model I've found correct? What does RepeatVector do?
This is not a seq2seq model. RepeatVector takes the last state of the last encoder LSTM and makes one copy per output token. Then you feed these copies into a "decoder" LSTM, which thus has the same input in every time step.
A proper autoregressive decoder takes its previous outputs as input, i.e., at training time, the input of the decoder is the same as its output, but shifted by one position. This also means that your model misses the embedding layer for the decoder inputs.
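For reference, a minimal sketch of a teacher-forced encoder-decoder in the Keras functional API, reusing the question's input_dim, output_dim and num_units (encoder_input_data, decoder_input_data and decoder_target_data are assumed to be prepared by you, with the decoder targets one step ahead of the decoder inputs):

from keras.models import Model
from keras.layers import Input, LSTM, Dense

# Encoder: keep only the final hidden and cell states.
encoder_inputs = Input(shape=(None, input_dim))
_, state_h, state_c = LSTM(num_units, return_state=True)(encoder_inputs)

# Decoder: fed the target sequence shifted right by one step (teacher forcing),
# starting from the encoder's final states.
decoder_inputs = Input(shape=(None, output_dim))
decoder_seq, _, _ = LSTM(num_units, return_sequences=True,
                         return_state=True)(decoder_inputs,
                                            initial_state=[state_h, state_c])
decoder_outputs = Dense(output_dim, activation='softmax')(decoder_seq)

model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
model.compile(optimizer='adam', loss='categorical_crossentropy')
# model.fit([encoder_input_data, decoder_input_data], decoder_target_data, ...)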

How to get only last output of sequence model in Keras?

I trained a Many-to-Many sequence model in Keras with return_sequences=True and a TimeDistributed wrapper on the last Dense layer:
model = Sequential()
model.add(Embedding(input_dim=vocab_size, output_dim=50))
model.add(LSTM(100, return_sequences=True))
model.add(TimeDistributed(Dense(vocab_size, activation='softmax')))
# train...
model.save_weights("weights.h5")
So during training the loss is calculated over all hidden states (at every timestep). But for inference I only need the output at the last timestep, so I load the weights into a Many-to-One sequence model without the TimeDistributed wrapper and set return_sequences=False to get only the last output of the LSTM layer:
inference_model = Sequential()
inference_model.add(Embedding(input_dim=vocab_size, output_dim=50))
inference_model.add(LSTM(100, return_sequences=False))
inference_model.add(Dense(vocab_size, activation='softmax'))
inference_model.load_weights("weights.h5")
When I test my inference model on a sequence of length 20, I expect to get a prediction with shape (vocab_size), but inference_model.predict(...) still returns predictions for every timestep - a tensor of shape (20, vocab_size).
If, for whatever reason, you need only the last timestep during inference, you can build a new model which applies the trained model on the input and returns the last timestep as its output using the Lambda layer:
from keras.models import Model
from keras.layers import Input, Lambda
inp = Input(shape=put_the_input_shape_here)
x = model(inp) # apply trained model on the input
out = Lambda(lambda x: x[:,-1])(x)
inference_model = Model(inp, out)
Side note: as already stated in this answer, TimeDistributed(Dense(...)) and Dense(...) are equivalent, since the Dense layer is applied on the last dimension of its input tensor; that's why you get the same output shape.
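Alternatively (a minimal sketch, assuming x_test is a batch of tokenized sequences), you can keep the trained many-to-many model as-is and simply slice the last timestep out of its predictions:

preds = model.predict(x_test)  # shape (batch, timesteps, vocab_size)
last_step = preds[:, -1]       # shape (batch, vocab_size)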

Keras: Derivatives of output wrt each input

I am using a very simple MLP with just 1 hidden layer to estimate option prices.
In addition to the actual output of the neural network, I would also like to know the partial derivative of the output value (for each line of the data sample) with regard to one of the 6 input parameters, so that the resulting value can be interpreted as the percentage change of the output for a change in that input parameter.
As I am pretty new to Keras and Neural Networks in general I was not able to come up with a solution for the problem myself.
# Create Model
model = Sequential()
model.add(Dense(6, input_dim=6))          # input layer
model.add(Dense(10, activation='relu'))   # hidden layer
model.add(Dense(1, activation='linear'))  # output layer

# Compile Model
model.compile(loss='mse', optimizer='adam', metrics=['mae'])

# Train model
model.fit(X_train, Y_train, epochs=50, batch_size=10, verbose=2, validation_split=0.2)

# Predict Values
Y_pred = model.predict(X_test, batch_size=10)
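For what it's worth, a sketch of one common approach with the old-style (graph-mode) Keras backend used in the question's code; grad_values then holds one row of six partial derivatives per sample:

from keras import backend as K

# Symbolic d(output)/d(input); model.output has shape (batch, 1).
grads = K.gradients(model.output, model.input)[0]  # shape (batch, 6)
get_grads = K.function([model.input], [grads])

grad_values = get_grads([X_test])[0]  # one row of 6 partial derivatives per sample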

How to train using batch inputs with Keras, but predicting with single example with an LSTM?

I have a list of training data that I am using to train with. However, at prediction time the prediction will be done online, with a single example at a time.
If I declare my model with input like the following
model = Sequential()
model.add(Dense(64, batch_input_shape=(100, 5, 1), activation='tanh'))
model.add(LSTM(32, stateful=True))
model.add(Dense(1, activation='linear'))
optimizer = SGD(lr=0.0005)
model.compile(loss='mean_squared_error', optimizer=optimizer)
When I go to predict with a single example of shape (1, 5, 1), it gives the following error.
ValueError: Shape mismatch: x has 100 rows but z has 1 rows
The solution I came up with was to train my model iteratively with a batch_input_shape of (1, 5, 1), calling fit for each single example. This is incredibly slow.
Is there not a way to train on a large batch size, but predict with a single example using LSTM?
Thanks for the help.
Try something like this:
model2 = Sequential()
model2.add(Dense(64, batch_input_shape=(1, 5, 1), activation='tanh'))
model2.add(LSTM(32, stateful=True))
model2.add(Dense(1, activation='linear'))
optimizer2 = SGD(lr=0.0005)
model2.compile(loss='mean_squared_error', optimizer=optimizer2)

for nb, layer in enumerate(model.layers):
    model2.layers[nb].set_weights(layer.get_weights())
You are simply copying the weights from one model to the other.
You have defined the input shape in the first layer, so sending a shape that does not match the preset input shape is invalid. There are two ways to achieve what you want:
You can modify your model by changing
batch_input_shape=(100, 5, 1)
to
input_shape=(5, 1) to avoid a preset batch size; you can then set batch_size=100 in model.fit(). A sketch of this is shown below.
Edit: Method 2
You define the exact same model as model2, then call model2.set_weights(model1.get_weights()).
If you want to use stateful=True, you actually want to use the hidden states from the last batch as the initial states for the next batch, so every batch size must match. Otherwise, you can just remove stateful=True.
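A minimal sketch of the first method (X_train/y_train are hypothetical names; stateful=True is dropped so the batch size can differ between fit and predict):

from keras.models import Sequential
from keras.layers import Dense, LSTM
from keras.optimizers import SGD
import numpy as np

model = Sequential()
model.add(Dense(64, input_shape=(5, 1), activation='tanh'))  # no fixed batch size
model.add(LSTM(32))
model.add(Dense(1, activation='linear'))
model.compile(loss='mean_squared_error', optimizer=SGD(lr=0.0005))

model.fit(X_train, y_train, batch_size=100, epochs=10)  # batch size chosen at fit time

single = np.zeros((1, 5, 1))        # one online example
prediction = model.predict(single)  # now works: output shape (1, 1)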
