Load saved checkpoint and predict not producing same results as in training

Load saved checkpoint and predict not producing same results as in training - python

I'm training based on a sample code I found on the Internet. The accuracy in testing is at 92% and the checkpoints are saved in a directory. In parallel (the training is running for 3 days now) I want to create my prediction code so I can learn more instead of just waiting.
This is my third day of deep learning so I probably don't know what I'm doing. Here's how I'm trying to predict:
Instantiate the model using the same code as in training
Load the last checkpoint
Try to predict
The code works but the results are nowhere near 90%.
Here's how I create the model:
INPUT_LAYERS = 2
OUTPUT_LAYERS = 2
AMOUNT_OF_DROPOUT = 0.3
HIDDEN_SIZE = 700
INITIALIZATION = "he_normal" # : Gaussian initialization scaled by fan_in (He et al., 2014)
CHARS = list("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ .")
def generate_model(output_len, chars=None):
"""Generate the model"""
print('Build model...')
chars = chars or CHARS
model = Sequential()
# "Encode" the input sequence using an RNN, producing an output of HIDDEN_SIZE
# note: in a situation where your input sequences have a variable length,
# use input_shape=(None, nb_feature).
for layer_number in range(INPUT_LAYERS):
model.add(recurrent.LSTM(HIDDEN_SIZE, input_shape=(None, len(chars)), init=INITIALIZATION,
return_sequences=layer_number + 1 < INPUT_LAYERS))
model.add(Dropout(AMOUNT_OF_DROPOUT))
# For the decoder's input, we repeat the encoded input for each time step
model.add(RepeatVector(output_len))
# The decoder RNN could be multiple layers stacked or a single layer
for _ in range(OUTPUT_LAYERS):
model.add(recurrent.LSTM(HIDDEN_SIZE, return_sequences=True, init=INITIALIZATION))
model.add(Dropout(AMOUNT_OF_DROPOUT))
# For each of step of the output sequence, decide which character should be chosen
model.add(TimeDistributed(Dense(len(chars), init=INITIALIZATION)))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
return model
In a separate file predict.py I import this method to create my model and try to predict:
...import code
model = generate_model(len(question), dataset['chars'])
model.load_weights('models/weights.204-0.20.hdf5')
def decode(pred):
return character_table.decode(pred, calc_argmax=False)
x = np.zeros((1, len(question), len(dataset['chars'])))
for t, char in enumerate(question):
x[0, t, character_table.char_indices[char]] = 1.
preds = model.predict_classes([x], verbose=0)[0]
print("======================================")
print(decode(preds))
I don't know what the problem is. I have about 90 checkpoints in my directory and I'm loading the last one based on accuracy. All of them saved by a ModelCheckpoint:
checkpoint = ModelCheckpoint(MODEL_CHECKPOINT_DIRECTORYNAME + '/' + MODEL_CHECKPOINT_FILENAME,
save_best_only=True)
I'm stuck. What am I doing wrong?

In the repo you provided, the training and validation sentences are inverted before being fed into the model (as commonly done in seq2seq learning).
dataset = DataSet(DATASET_FILENAME)
As you can see, the default value for inverted is True, and the questions are inverted.
class DataSet(object):
def __init__(self, dataset_filename, test_set_fraction=0.1, inverted=True):
self.inverted = inverted
...
question = question[::-1] if self.inverted else question
questions.append(question)
You can try to invert the sentences during prediction. Specifically,
x = np.zeros((1, len(question), len(dataset['chars'])))
for t, char in enumerate(question):
x[0, len(question) - t - 1, character_table.char_indices[char]] = 1.

When you generate model in your predict.py file:
model = generate_model(len(question), dataset['chars'])
is your first parameter the same as in your training file? Or is the question length dynamic? If so, you are generating different model, thus your saved checkpoint doesn't work.

It can be the dimentionality of arrays/df passed not matching what is expected by the functions you call. When a single dimention is expected by the called method, try ravel on what you expect to be a single dimention

Related

LSTM autoencoder for anomaly detection

I'm testing out different implementation of LSTM autoencoder on anomaly detection on 2D input.
My question is not about the code itself but about understanding the underlying behavior of each network.
Both implementation have the same number of units (16). Model 2 is a "typical" seq to seq autoencoder with the last sequence of the encoder repeated "n" time to match the input of the decoder.
I'd like to understand why Model 1 seem to easily over-perform Model 2 and why Model 2 isn't able to do better than the mean ?
Model 1:
class LSTM_Detector(Model):
def __init__(self, flight_len, param_len, hidden_state=16):
super(LSTM_Detector, self).__init__()
self.input_dim = (flight_len, param_len)
self.units = hidden_state
self.encoder = layers.LSTM(self.units,
return_state=True,
return_sequences=True,
activation="tanh",
name='encoder',
input_shape=self.input_dim)
self.decoder = layers.LSTM(self.units,
return_sequences=True,
activation="tanh",
name="decoder",
input_shape=(self.input_dim[0],self.units))
self.dense = layers.TimeDistributed(layers.Dense(self.input_dim[1]))
def call(self, x):
output, hs, cs = self.encoder(x)
encoded_state = [hs, cs] # see https://www.tensorflow.org/guide/keras/rnn
decoded = self.decoder(output, initial_state=encoded_state)
output_decoder = self.dense(decoded)
return output_decoder
Model 2:
class Seq2Seq_Detector(Model):
def __init__(self, flight_len, param_len, hidden_state=16):
super(Seq2Seq_Detector, self).__init__()
self.input_dim = (flight_len, param_len)
self.units = hidden_state
self.encoder = layers.LSTM(self.units,
return_state=True,
return_sequences=False,
activation="tanh",
name='encoder',
input_shape=self.input_dim)
self.repeat = layers.RepeatVector(self.input_dim[0])
self.decoder = layers.LSTM(self.units,
return_sequences=True,
activation="tanh",
name="decoder",
input_shape=(self.input_dim[0],self.units))
self.dense = layers.TimeDistributed(layers.Dense(self.input_dim[1]))
def call(self, x):
output, hs, cs = self.encoder(x)
encoded_state = [hs, cs] # see https://www.tensorflow.org/guide/keras/rnn
repeated_vec = self.repeat(output)
decoded = self.decoder(repeated_vec, initial_state=encoded_state)
output_decoder = self.dense(decoded)
return output_decoder
I fitted this 2 models for 200 Epochs on a sample of data (89, 1500, 77) each input being a 2D aray of (1500, 77). And the test data (10,1500,77). Both model had only 16 units.
Here or the results of the autoencoder on one features of the test data.
Results Model 1: (black line is truth, red in reconstructed image)
Results Model 2:
I understand the second one is more restrictive since all the information from the input sequence is compressed into one step, but I'm still surprise that it's barely able to do better than predict the average.
On the other hand, I feel Model 1 tends to be more "influenced" by new data without giving back the input. see example below of Model 1 having a flat line as input :
PS : I know it's not a lot of data for that kind of model, I have much more available but at this stage I'm just experimenting and trying to build my understanding.
PS 2 : Neither models overfitted their data and the training and validation curve are almost text book like.
Is anyone able to explain why there is such a gap in term of behavior ?
Thank you

In model 1, each point of 77 features is compressed and decompressed this way: 77->16->16->77 plus some info from the previous steps. It seems that replacing LSTMs with just TimeDistributed(Dense(...)) may also work in this case, but cannot say for sure as I don't know the data. The third image may become better.
What predicts model 2 usually happens when there is no useful signal in the input and the best thing model can do (well, optimize to do) is just to predict the mean target value of the training set.
In model 2 you have:
...
self.encoder = layers.LSTM(self.units,
return_state=True,
return_sequences=False,
...
and then
self.repeat = layers.RepeatVector(self.input_dim[0])
So, in fact, when it does
repeated_vec = self.repeat(output)
decoded = self.decoder(repeated_vec, initial_state=encoded_state)
it just takes only one last output from the encoder (which in this case represents the last step of 1500), copies it 1500 times (input_dim[0]), and tries to predict all 1500 values from the information about a couple of last ones. Here is where the model loses most of the useful signal. It does not have enough/any information about the input, and the best thing it can learn in order to minimize the loss function (which I suppose in this case is MSE or MAE) is to predict the mean value for each of the features.
Also, a seq to seq model usually passes a prediction of a decoder step as an input to the next decoder step, in the current case, it is always the same value.
TL;DR 1) seq-to-seq is not the best model for this case; 2) due to the bottleneck it cannot really learn to do anything better than just to predict the mean value for each feature.

Correctly structuring text data for text generation with Tensorflow model

I am trying to train my model to generate sentences no longer that 210 characters. From what I have read I have only seen training on 'continuous' text. Like a book. However I am trying to train my model on single sentences.
I'm pretty new to tensorflow and ML so right now I am able to train my model but it generates garbage, seemingly random text. I have 10,000 sentences so I think I have sufficient data.
Overview of my data
Structure [['SENTENCE'], ['SENTENCE2']...]
Data Prep
tokenizer = keras.preprocessing.text.Tokenizer(num_words=209, lower=False, char_level=True, filters='#$%&()*+-<=>#[\\]^_`{|}~\t\n')
tokenizer.fit_on_texts(df['title'].values)
df['encoded_with_keras'] = tokenizer.texts_to_sequences(df['title'].values)
dataset = df['encoded_with_keras'].values
dataset = tf.keras.preprocessing.sequence.pad_sequences(dataset, padding='post')
dataset = dataset.flatten()
dataset = tf.data.Dataset.from_tensor_slices(dataset)
sequences = dataset.batch(seq_len+1, drop_remainder=True)
def create_seq_targets(seq):
input_txt = seq[:-1]
target_txt = seq[1:]
return input_txt, target_txt
dataset = sequences.map(create_seq_targets)
dataset = dataset.shuffle(buffer_size).batch(batch_size, drop_remainder=True)
Model
def create_model(vocab_size, embed_dim, rnn_neurons, batch_size):
model = Sequential()
model.add(Embedding(vocab_size, embed_dim, batch_input_shape=[batch_size, None],input_length=209, mask_zero=True))
model.add(LSTM(rnn_neurons, return_sequences=True, stateful=True,))
model.add(Dropout(0.2))
model.add(Dense(258, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(vocab_size, activation='softmax'))
model.compile(optimizer='adam', loss="sparse_categorical_crossentropy")
return model
When I give the model a sequence to start from I get back absolute nonsense and eventually the model predicts a 0 which is not in the char_index mapping.
Edit
Text Generation
epochs = 2
# Directory where the checkpoints will be saved
checkpoint_dir = './training_checkpoints'
# Name of the checkpoint files
checkpoint_prefix = os.path.join(checkpoint_dir, "ckpt_{epoch}")
checkpoint_callback=tf.keras.callbacks.ModelCheckpoint(
filepath=checkpoint_prefix,
save_weights_only=True)
model = create_model(vocab_size = vocab_size,
embed_dim=embed_dim,
rnn_neurons=rnn_neurons,
batch_size=1)
model.load_weights(tf.train.latest_checkpoint(checkpoint_dir))
model.build(tf.TensorShape([1, None]))
def generate_text(model, start_string):
num_generate = 200
input_eval = [char_2_index[s] for s in start_string]
input_eval = tf.expand_dims(input_eval, 0)
text_generated = []
temperature = 1
# model.reset_states()
for i in range(num_generate):
print(text_generated)
predictions = model(input_eval)
predictions = tf.squeeze(predictions, 0)
predictions = predictions / temperature
predicted_id = tf.random.categorical(predictions, num_samples=1)[-1,0].numpy()
print(predicted_id)
input_eval = tf.expand_dims([predicted_id], 0)
text_generated.append(index_2_char[predicted_id])
return (start_string + ''.join(text_generated))

There are a few things that must be changed on the first sight.
Tokenizer must have num_words = vocab_size
At first (didn't analyse it deeply), I can't imagine why you're flattening your dataset and getting slices if it's probably correctly structured
You cannot use stateful=True if you don't want that "batch 2 is a sequel of batch 1", you have individual sentences, so stateful=False. (Unless you are training correctly with manual training loops and resetting states for each batch, which is unnecessary trouble in the training phase)
What you need to check visually:
Input data must have format like:
[
[1,2,3,6,10,4,10, ...up to sentence length - 1...],
[5,6,3,6,7,3,11,... up to sentence length - 1...],
.... up to number of sentences ...
]
Output data must then be:
[
[2,3,6,10,4,10,15 ...], #equal to input data, shifted by 1
[6,3,6,7,3,11,13, ...],
...
]
Print a few rows of them to check if they're correctly preprocessed as intended.
Training will then be easy:
model.fit(input_data, output_data, epochs=....)
Yes, your model will predict zeros, as you have zeros in your data, that's not weird: you did a pad_sequences.
You can interpret a zero as a "sentence end" in this case, since you did a 'post' pading. When your model gives you a zero, it decided that the sentence it's generating should end at that point - if it was well trained, it will probably continue outputting zeros for that sentence from this point on.
Generating new senteces
This part is more complex and you need to rewrite the model, now being stative=True, and transfer the weights from the trained model to this new model.
Before anything, call model.reset_states().
You will need to manually feed a batch with shape (number_of_sentences=batch_size, 1). This will be the "first character" of each of the sentences it will generate. The output will be the "second character" of each sentence.
Get this output and feed the model with it. It will generate the "third character" of each sentence. And so on.
When all outputs are zero, all sentences are fully generated and you can stop the loop.
Call model.reset_states() again before trying to generate a new batch of sentences.
You can find examples of this kind of predicting here: https://stackoverflow.com/a/50235563/2097240

Performing one hot encoding inside a Lambda layer in Keras to avoid memory issues: problem

I've made a sequential model in keras, for generating musical sequences. Something very simple, with LSTM and dense softmax. I have 333 possible musical events
I know that model.fit() needs all training data in memory, which is a problem if it is one hot encoded. So I give the model an integer as input, transform this to one hot encoding in a Lambda layer, and then use sparse categorical cross entropy for loss. Because each batch would be transformed to one hot encoding on the fly, I thought that this would sort out my memory issues. But instead, it hangs at the beginning of fitting, and fills up my memory, even with small batch size. Evidently, I'm not understanding something about how keras works, which is not surprising, given that I'm new to it (and on that note, please point out anything too naive in my code).
1) what is happening behind the scenes? What is it about keras that I'm not understanding? It seems like keras is going ahead and running the Lambda layer on all of my training examples before doing any training.
2) How can I solve this, and make keras do it truly on the fly? Can I solve it with model.fit(), which I'm currently using, or do I need model.fit_generator(), which to me looks like it could solve this rather easily?
Here is some of my code:
def musicmodel(Tx, n_a, n_values):
"""
Arguments:
Tx -- length of a sequence in the corpus
n_a -- the number of activations used in our model (for the LSTM)
n_values -- number of unique values in the music data
Returns:
model -- a keras model
"""
# Define the input with a shape
X = Input(shape=(Tx,))
# Define s0, initial hidden state for the decoder LSTM
a0 = Input(shape=(n_a,), name='a0')
c0 = Input(shape=(n_a,), name='c0')
a = a0
c = c0
# Create empty list to append the outputs to while iterating
outputs = []
# Step 2: Loop
for t in range(Tx):
# select the "t"th time step from X.
x = Lambda(lambda x: x[:,t])(X)
# We need the class represented in one hot fashion:
x = Lambda(lambda x: tf.one_hot(K.cast(x, dtype='int32'), n_values))(x)
# We then reshape x to be (1, n_values)
x = reshapor(x)
# Perform one step of the LSTM_cell
a, _, c = LSTM_cell(x, initial_state=[a, c])
# Apply densor to the hidden state output of LSTM_Cell
out = densor(a)
# Add the output to "outputs"
outputs.append(out)
# Step 3: Create model instance
model = Model(inputs=[X,a0,c0],outputs=outputs)
return model
I then fit my model:
model = musicmodel(Tx, n_a, n_values)
opt = Adam(lr=0.01, beta_1=0.9, beta_2=0.999, decay=0.01)
model.compile(optimizer=opt, loss='sparse_categorical_crossentropy', metrics=['accuracy'])
a0 = np.zeros((m, n_a))
c0 = np.zeros((m, n_a))
model.fit([X, a0, c0], list(Y), validation_split=0.25, epochs=600, verbose=2, batch_size=4)

different prediction after load a model in keras

I have a Sequential Model built in Keras and after trained it give me good prediction but when i save and then load the model i don't obtain the same prediction on the same dataset. Why?
Note that I checked the weight of the model and they are the same as well as the architecture of the model, checked with model.summary() and model.getWeights(). This is very strange in my opinion and I have no idea how to deal with this problem.
I don't have any error but the prediction are different
I tried to use model.save() and load_model()
I tried to use model.save_weights() and after that re-built the model and then load the model
I have the same problem with both options.
def Classifier(input_shape, word_to_vec_map, word_to_index, emb_dim, num_activation):
sentence_indices = Input(shape=input_shape, dtype=np.int32)
emb_dim = 300 # embedding di 300 parole in italiano
embedding_layer = pretrained_embedding_layer(word_to_vec_map, word_to_index, emb_dim)
embeddings = embedding_layer(sentence_indices)
X = LSTM(256, return_sequences=True)(embeddings)
X = Dropout(0.15)(X)
X = LSTM(128)(X)
X = Dropout(0.15)(X)
X = Dense(num_activation, activation='softmax')(X)
model = Model(sentence_indices, X)
sequentialModel = Sequential(model.layers)
return sequentialModel
model = Classifier((maxLen,), word_to_vec_map, word_to_index, maxLen, num_activation)
...
model.fit(Y_train_indices, Z_train_oh, epochs=30, batch_size=32, shuffle=True)
# attempt 1
model.save('classificationTest.h5', True, True)
modelRNN = load_model(r'C:\Users\Alessio\classificationTest.h5')
# attempt 2
model.save_weights("myWeight.h5")
model = Classifier((maxLen,), word_to_vec_map, word_to_index, maxLen, num_activation)
model.load_weights(r'C:\Users\Alessio\myWeight.h5')
# PREDICTION TEST
code_train, category_train, category_code_train, text_train = read_csv_for_email(r'C:\Users\Alessio\Desktop\6Febbraio\2test.csv')
categories, code_categories = get_categories(r'C:\Users\Alessio\Desktop\6Febbraio\2test.csv')
X_my_sentences = text_train
Y_my_labels = category_code_train
X_test_indices = sentences_to_indices(X_my_sentences, word_to_index, maxLen)
pred = model.predict(X_test_indices)
def codeToCategory(categories, code_categories, current_code):
i = 0;
for code in code_categories:
if code == current_code:
return categories[i]
i = i + 1
return "no_one_find"
# result
for i in range(len(Y_my_labels)):
num = np.argmax(pred[i])
# Pretrained embedding layer
def pretrained_embedding_layer(word_to_vec_map, word_to_index, emb_dim):
"""
Creates a Keras Embedding() layer and loads in pre-trained GloVe 50-dimensional vectors.
Arguments:
word_to_vec_map -- dictionary mapping words to their GloVe vector representation.
word_to_index -- dictionary mapping from words to their indices in the vocabulary (400,001 words)
Returns:
embedding_layer -- pretrained layer Keras instance
"""
vocab_len = len(word_to_index) + 1 # adding 1 to fit Keras embedding (requirement)
### START CODE HERE ###
# Initialize the embedding matrix as a numpy array of zeros of shape (vocab_len, dimensions of word vectors = emb_dim)
emb_matrix = np.zeros((vocab_len, emb_dim))
# Set each row "index" of the embedding matrix to be the word vector representation of the "index"th word of the vocabulary
for word, index in word_to_index.items():
emb_matrix[index, :] = word_to_vec_map[word]
# Define Keras embedding layer with the correct output/input sizes, make it trainable. Use Embedding(...). Make sure to set trainable=False.
embedding_layer = Embedding(vocab_len, emb_dim)
### END CODE HERE ###
# Build the embedding layer, it is required before setting the weights of the embedding layer. Do not modify the "None".
embedding_layer.build((None,))
# Set the weights of the embedding layer to the embedding matrix. Your layer is now pretrained.
embedding_layer.set_weights([emb_matrix])
return embedding_layer
Do you have any kind of suggestion?
Thanks in Advance.
Edit1: if use the code of saving and loading in the same "page" (I'm using notebook jupyter) it works fine. If I change "page" it doesn't work. Could it be that there is something related with the tensorflow session?
Edit2: my final goal is to load a model, trained in Keras, with Deeplearning4J in java. So if you know a solution for "transforming" the keras model in something else readable in DL4J it will help anyway.
Edit3: add function pretrained_embedding_layer()
Edit4: dictionaries from word2Vec model read with gensim
from gensim.models import Word2Vec
model = Word2Vec.load('C:/Users/Alessio/Desktop/emoji_ita/embedding/glove_WIKI')
def getMyModels (model):
word_to_index = dict({})
index_to_word = dict({})
word_to_vec_map = dict({})
for idx, key in enumerate(model.wv.vocab):
word_to_index[key] = idx
index_to_word[idx] = key
word_to_vec_map[key] = model.wv[key]
return word_to_index, index_to_word, word_to_vec_map

Are you pre-processing your data in the same way when you load your model ?
And if yes, did you set the seed of your pre-processing functions ?
If you build a dictionnary with keras, are the sentences coming in the same order ?

I had the same problem before, so here is how you solve it. After making sure that the weights and summary are the same, try to print your random seed and check. If its value is changing from a session to another and if you tried tensorflow's seed, it means you need to disable the PYTHONHASHSEED environment variable. You can read more about it here:
https://docs.python.org/3/using/cmdline.html#envvar-PYTHONHASHSEED
To disable it, go to your system environment variables and add PYTHONHASHSEED as a new variable if it doesn't exist. Then, set its value to 0 to disable it. Please note that it was done in this way because it has to be disabled before running the interpreter.

Obtain output at each timestep in Keras LSTM

What I want to do is give the output of the LSTM at the current timestep as input to the LSTM in the next timestep. So I want the LSTM to predict the word at at the current time step and give this word as input to the next timestep. Is this can be done then:
How can the input and target data be specified during training i.e. in the model.fit() function?

You cannot do it directly in keras but you could do it using a for loop and stateful network. This works like this (assuming that you have your sentences stored as sequences of integers with size=vocabulary_size:
Define a stateful network which accepts one word and returns the following one:
model = Input(batch_size=(batch_size, 1, 1))
model = Embedding(size_of_vocabulary, output_size, input_length=1)(input)
model = LSTM(lstm_1st_layer_size, return_sequences=True, stateful=True)(model)
....
model = LSTM(lstm_nth_layer_size, return_sequences=True, stateful=True)(model)
model = Dense(vocabulary_size, activation="softmax")
model.compile(loss="sparse_categorical_crossentropy", optimizer="rmsprop")
Assuming that you have a numpy array_of_samples of examples (preceding_word, next_word) you could fit it by:
model.fit(array_of_samples[:,0], array_of_samples[:,1])
Now you could try to predict stuff in a following manner:
sentence = [starting_word]
for i in range(len_of_sequence - 1):
sentence.append(model.predict(numpy.array([[sentence[i]]).argmax())
Now sentence stores your newly created sentence

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.