What I want to do is feed the output of the LSTM at the current timestep back in as the input to the LSTM at the next timestep. In other words, I want the LSTM to predict the word at the current timestep and use that word as the input for the next timestep. If this can be done:
How can the input and target data be specified during training, i.e. in the model.fit() function?
You cannot do it directly in Keras, but you can do it with a for loop and a stateful network. It works like this (assuming that your sentences are stored as sequences of integers drawn from a vocabulary of size vocabulary_size):
Define a stateful network which accepts one word and returns the following one:
input_layer = Input(batch_shape=(batch_size, 1))
x = Embedding(size_of_vocabulary, output_size, input_length=1)(input_layer)
x = LSTM(lstm_1st_layer_size, return_sequences=True, stateful=True)(x)
....
x = LSTM(lstm_nth_layer_size, return_sequences=True, stateful=True)(x)
output = Dense(vocabulary_size, activation="softmax")(x)
model = Model(input_layer, output)
model.compile(loss="sparse_categorical_crossentropy", optimizer="rmsprop")
Assuming that you have a numpy array array_of_samples of (preceding_word, next_word) examples, you can fit the model with:
model.fit(array_of_samples[:,0], array_of_samples[:,1])
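Since the network is stateful, you will usually also want to fix the batch size and disable shuffling when fitting (a general Keras note, not part of the original answer):
model.fit(array_of_samples[:, 0], array_of_samples[:, 1],
          batch_size=batch_size, shuffle=False)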
Now you can try to predict in the following manner:
sentence = [starting_word]
for i in range(len_of_sequence - 1):
sentence.append(model.predict(numpy.array([[sentence[i]]])).argmax())
Now sentence stores your newly created sentence.
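If you go on to generate another sentence with this stateful network, it is usually a good idea to clear the LSTM state first (a standard Keras call, not part of the original snippet):
model.reset_states()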
I am trying to create a Keras model that passes an input value to another, non-Keras word-embedding model and gets the embedding of the word back into the main Keras model.
The problem is that the Keras model runs in graph mode, and when the input is passed to the non-Keras model its value has the form
Tensor("IteratorGetNext:9", shape=(None,), dtype=string)
<class 'tensorflow.python.framework.ops.Tensor'>
But the word-embedding model (FastText) only accepts plain strings as input. So how can I get the string values out of Tensor("IteratorGetNext:9", shape=(None,), dtype=string), pass them to the FastText word-embedding model, and convert the resulting numpy embeddings back into a tensor for use in the Keras model?
One way is to set run_eagerly=True in model.compile(), since the model then runs in eager mode instead of graph mode and all values arrive in a numpy-accessible form instead of symbolic tensors:
tf.Tensor([list of strings], shape=(batch_size,), dtype=string)
<class 'tensorflow.python.framework.ops.EagerTensor'>
But this makes model training considerably slower.
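For reference, a minimal sketch of that compile call (the optimizer and loss here are placeholders, not taken from the question):
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              run_eagerly=True)  # eager mode: string inputs arrive as EagerTensors with .numpy()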
Code :
class MovieModel(tf.keras.Model):
def __init__(self):
super().__init__()
max_tokens = 10_000
self.title_embedding = tf.keras.Sequential([
tf.keras.layers.StringLookup(
vocabulary=unique_movie_titles,mask_token=None),
tf.keras.layers.Embedding(len(unique_movie_titles) + 1, 32)
])
def call(self, inputs):
return tf.concat([
self.title_embedding(inputs["movie_title"]),
#Word embeddings of "movie_title"
], axis=1)
Now I want to give "movie_title" as an input to the FastText model. inputs["movie_title"] is a batch of movie titles, so I want to process each string in this batch, split it into words, call the FastText model to get the embedding of every word, and take their average. I then want to pass this averaged embedding along with self.title_embedding inside call(). But since Keras trains the model in graph mode, I am not able to access the string values in the batch.
I can access the embedding of a single word using fasttext_model.wv[word].
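Purely as an illustration of the averaging described above (not a fix for the graph-mode issue), assuming fasttext_model is a trained gensim FastText model and title is a plain Python string:
import numpy as np

def average_title_embedding(title, fasttext_model):
    # split the title into words and average their FastText word vectors
    words = title.split()
    vectors = [fasttext_model.wv[w] for w in words]
    return np.mean(vectors, axis=0)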
I am using this as reference to build my model.
I am building a Keras sequential model in which the last output layer is an UpSampling2D layer, and I need to feed the input image to that output layer to perform a simple operation and return the output. Any ideas?
EDIT :
The model mentioned above is the generator of a GAN; I need to add the input image to the generator's output before feeding it to the discriminator.
1. Define a backbone model using the inputs of the pre-trained model and the outputs of the last layer before the pre-trained model's output layer.
2. Based on that backbone model, define a new model that adds the new skip connection and has the same output layer as the pre-trained model.
3. Set the weights of the output layer in the new model equal to the weights of the output layer in the pre-trained model, using: new_model.layers[-1].set_weights(pre_model.layers[-1].get_weights())
Here is one good article about Adding Layers to the middle of a pre-trained network without invalidating the weights.
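A rough sketch of those three steps, assuming pre_model is a Keras model whose last layer is a Dense output layer and whose penultimate layer already produces a flat feature vector (all names here are hypothetical):
from tensorflow import keras
from tensorflow.keras import layers

# 1. Backbone: the pre-trained model up to, but not including, its output layer
backbone = keras.Model(pre_model.input, pre_model.layers[-2].output)

# 2. New model with an extra skip connection; Add() keeps the shape fed to the
#    output layer unchanged so the pre-trained output weights still fit
inp = keras.Input(shape=pre_model.input_shape[1:])
features = backbone(inp)
skip = layers.Dense(features.shape[-1])(layers.Flatten()(inp))  # hypothetical projection of the input
merged = layers.Add()([features, skip])
out = layers.Dense(pre_model.layers[-1].units,
                   activation=pre_model.layers[-1].activation)(merged)
new_model = keras.Model(inp, out)

# 3. Reuse the pre-trained output-layer weights
new_model.layers[-1].set_weights(pre_model.layers[-1].get_weights())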
For future reference, I solved it by using Lambda layers as follows:
# z is the input I needed to use later on with the generator output to perform a certain function
generated_image = self.generator(z)
generated_image_modified=tf.keras.layers.Concatenate()([generated_image,z])
# with x[...,variable_you_need_range] you can access the input we just concatenated in your train loop
lambd = tf.keras.layers.Lambda(lambda x: your_function(x[...,2:5],x[...,:2]))(generated_image_modified)
full_model = self.discriminator(lambd)
self.combined = Model(z,outputs = full_model)
I am trying to train my model to generate sentences no longer than 210 characters. From what I have read, I have only seen training on 'continuous' text, like a book. However, I am trying to train my model on individual sentences.
I'm pretty new to tensorflow and ML so right now I am able to train my model but it generates garbage, seemingly random text. I have 10,000 sentences so I think I have sufficient data.
Overview of my data
Structure [['SENTENCE'], ['SENTENCE2']...]
Data Prep
tokenizer = keras.preprocessing.text.Tokenizer(num_words=209, lower=False, char_level=True, filters='#$%&()*+-<=>#[\\]^_`{|}~\t\n')
tokenizer.fit_on_texts(df['title'].values)
df['encoded_with_keras'] = tokenizer.texts_to_sequences(df['title'].values)
dataset = df['encoded_with_keras'].values
dataset = tf.keras.preprocessing.sequence.pad_sequences(dataset, padding='post')
dataset = dataset.flatten()
dataset = tf.data.Dataset.from_tensor_slices(dataset)
sequences = dataset.batch(seq_len+1, drop_remainder=True)
def create_seq_targets(seq):
input_txt = seq[:-1]
target_txt = seq[1:]
return input_txt, target_txt
dataset = sequences.map(create_seq_targets)
dataset = dataset.shuffle(buffer_size).batch(batch_size, drop_remainder=True)
Model
def create_model(vocab_size, embed_dim, rnn_neurons, batch_size):
model = Sequential()
model.add(Embedding(vocab_size, embed_dim, batch_input_shape=[batch_size, None],input_length=209, mask_zero=True))
model.add(LSTM(rnn_neurons, return_sequences=True, stateful=True,))
model.add(Dropout(0.2))
model.add(Dense(258, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(vocab_size, activation='softmax'))
model.compile(optimizer='adam', loss="sparse_categorical_crossentropy")
return model
When I give the model a sequence to start from I get back absolute nonsense and eventually the model predicts a 0 which is not in the char_index mapping.
Edit
Text Generation
epochs = 2
# Directory where the checkpoints will be saved
checkpoint_dir = './training_checkpoints'
# Name of the checkpoint files
checkpoint_prefix = os.path.join(checkpoint_dir, "ckpt_{epoch}")
checkpoint_callback=tf.keras.callbacks.ModelCheckpoint(
filepath=checkpoint_prefix,
save_weights_only=True)
model = create_model(vocab_size = vocab_size,
embed_dim=embed_dim,
rnn_neurons=rnn_neurons,
batch_size=1)
model.load_weights(tf.train.latest_checkpoint(checkpoint_dir))
model.build(tf.TensorShape([1, None]))
def generate_text(model, start_string):
num_generate = 200
input_eval = [char_2_index[s] for s in start_string]
input_eval = tf.expand_dims(input_eval, 0)
text_generated = []
temperature = 1
# model.reset_states()
for i in range(num_generate):
print(text_generated)
predictions = model(input_eval)
predictions = tf.squeeze(predictions, 0)
predictions = predictions / temperature
predicted_id = tf.random.categorical(predictions, num_samples=1)[-1,0].numpy()
print(predicted_id)
input_eval = tf.expand_dims([predicted_id], 0)
text_generated.append(index_2_char[predicted_id])
return (start_string + ''.join(text_generated))
There are a few things that need to be changed at first sight.
Tokenizer must have num_words = vocab_size
At first glance (I didn't analyse it deeply), I can't see why you're flattening your dataset and taking slices when it's probably already correctly structured per sentence
You cannot use stateful=True unless you want "batch 2 to be a sequel of batch 1"; you have individual sentences, so use stateful=False. (Unless you are training correctly with manual training loops and resetting states for each batch, which is unnecessary trouble in the training phase.)
What you need to check visually:
Input data must have format like:
[
[1,2,3,6,10,4,10, ...up to sentence length - 1...],
[5,6,3,6,7,3,11,... up to sentence length - 1...],
.... up to number of sentences ...
]
Output data must then be:
[
[2,3,6,10,4,10,15 ...], #equal to input data, shifted by 1
[6,3,6,7,3,11,13, ...],
...
]
Print a few rows of them to check if they're correctly preprocessed as intended.
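For illustration, one possible way to build such arrays directly from the question's own preprocessing (without flattening), assuming df['title'] and vocab_size are defined as in the question:
from tensorflow import keras

tokenizer = keras.preprocessing.text.Tokenizer(num_words=vocab_size, lower=False, char_level=True)
tokenizer.fit_on_texts(df['title'].values)
encoded = tokenizer.texts_to_sequences(df['title'].values)
padded = keras.preprocessing.sequence.pad_sequences(encoded, padding='post')

input_data = padded[:, :-1]   # each sentence minus its last character
output_data = padded[:, 1:]   # the same sentence shifted left by one character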
Training will then be easy:
model.fit(input_data, output_data, epochs=....)
Yes, your model will predict zeros, because you have zeros in your data; that's not strange, you used pad_sequences.
You can interpret a zero as a "sentence end" in this case, since you used 'post' padding. When your model gives you a zero, it has decided that the sentence it's generating should end at that point; if it was trained well, it will probably keep outputting zeros for that sentence from then on.
Generating new sentences
This part is more complex: you need to rebuild the model, now with stateful=True, and transfer the weights from the trained model to this new model.
Before anything, call model.reset_states().
You will need to manually feed a batch with shape (number_of_sentences=batch_size, 1). This will be the "first character" of each of the sentences it will generate. The output will be the "second character" of each sentence.
Get this output and feed the model with it. It will generate the "third character" of each sentence. And so on.
When all outputs are zero, all sentences are fully generated and you can stop the loop.
Call model.reset_states() again before trying to generate a new batch of sentences.
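A rough sketch of that generation loop, assuming gen_model is the stateful copy (batch_input_shape=(batch_size, 1)) that has already received the trained weights, and first_chars holds the first character id of each sentence (all names here are hypothetical):
import numpy as np

gen_model.reset_states()
current = np.array(first_chars).reshape((batch_size, 1))   # "first character" of every sentence
sentences = [[c] for c in first_chars]

for _ in range(max_len - 1):
    probs = gen_model.predict(current, batch_size=batch_size)  # shape (batch_size, 1, vocab_size)
    next_chars = probs[:, -1, :].argmax(axis=-1)               # greedy choice per sentence
    if (next_chars == 0).all():                                # every sentence has ended (padding id)
        break
    for sentence, char_id in zip(sentences, next_chars):
        sentence.append(int(char_id))
    current = next_chars.reshape((batch_size, 1))

gen_model.reset_states()   # reset again before generating the next batch of sentences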
You can find examples of this kind of predicting here: https://stackoverflow.com/a/50235563/2097240
I want to interpret an RNN by looking at the sequence-by-sequence values. It is possible to output these values with return_sequences. However, those values are then used as inputs into the next layer (e.g., a dense activation layer). I would like to output only the last value but record all values over the full sequence for interpretation. What's the easiest way to do this?
Create two models that share the same layers, but in one of them you feed the Dense layer only with the last step of the RNN:
inputs = Input(inputShape)
outs = RNN(..., return_sequences=True)(inputs)
modelSequence = Model(inputs,outs)
#take only the last step
outs = Lambda(lambda x: x[:,-1])(outs)
outs = Dense(...)(outs)
modelSingle = Model(inputs,outs)
Use modelSingle.fit(x_data, y_data) to train as you have been doing normally.
Use modelSequence.predict(x_data) to see the results of the RNN without training.
I'm training based on sample code I found on the Internet. The accuracy in testing is 92% and the checkpoints are saved in a directory. In parallel (the training has been running for 3 days now), I want to create my prediction code so I can learn more instead of just waiting.
This is my third day of deep learning so I probably don't know what I'm doing. Here's how I'm trying to predict:
1. Instantiate the model using the same code as in training
2. Load the last checkpoint
3. Try to predict
The code works but the results are nowhere near 90%.
Here's how I create the model:
INPUT_LAYERS = 2
OUTPUT_LAYERS = 2
AMOUNT_OF_DROPOUT = 0.3
HIDDEN_SIZE = 700
INITIALIZATION = "he_normal" # : Gaussian initialization scaled by fan_in (He et al., 2014)
CHARS = list("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ .")
def generate_model(output_len, chars=None):
"""Generate the model"""
print('Build model...')
chars = chars or CHARS
model = Sequential()
# "Encode" the input sequence using an RNN, producing an output of HIDDEN_SIZE
# note: in a situation where your input sequences have a variable length,
# use input_shape=(None, nb_feature).
for layer_number in range(INPUT_LAYERS):
model.add(recurrent.LSTM(HIDDEN_SIZE, input_shape=(None, len(chars)), init=INITIALIZATION,
return_sequences=layer_number + 1 < INPUT_LAYERS))
model.add(Dropout(AMOUNT_OF_DROPOUT))
# For the decoder's input, we repeat the encoded input for each time step
model.add(RepeatVector(output_len))
# The decoder RNN could be multiple layers stacked or a single layer
for _ in range(OUTPUT_LAYERS):
model.add(recurrent.LSTM(HIDDEN_SIZE, return_sequences=True, init=INITIALIZATION))
model.add(Dropout(AMOUNT_OF_DROPOUT))
# For each of step of the output sequence, decide which character should be chosen
model.add(TimeDistributed(Dense(len(chars), init=INITIALIZATION)))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
return model
In a separate file predict.py I import this method to create my model and try to predict:
...import code
model = generate_model(len(question), dataset['chars'])
model.load_weights('models/weights.204-0.20.hdf5')
def decode(pred):
return character_table.decode(pred, calc_argmax=False)
x = np.zeros((1, len(question), len(dataset['chars'])))
for t, char in enumerate(question):
x[0, t, character_table.char_indices[char]] = 1.
preds = model.predict_classes([x], verbose=0)[0]
print("======================================")
print(decode(preds))
I don't know what the problem is. I have about 90 checkpoints in my directory and I'm loading the last one based on accuracy. All of them were saved by a ModelCheckpoint:
checkpoint = ModelCheckpoint(MODEL_CHECKPOINT_DIRECTORYNAME + '/' + MODEL_CHECKPOINT_FILENAME,
save_best_only=True)
I'm stuck. What am I doing wrong?
In the repo you provided, the training and validation sentences are inverted before being fed into the model (as commonly done in seq2seq learning).
dataset = DataSet(DATASET_FILENAME)
As you can see, the default value for inverted is True, and the questions are inverted.
class DataSet(object):
def __init__(self, dataset_filename, test_set_fraction=0.1, inverted=True):
self.inverted = inverted
...
question = question[::-1] if self.inverted else question
questions.append(question)
You can try to invert the sentences during prediction. Specifically,
x = np.zeros((1, len(question), len(dataset['chars'])))
for t, char in enumerate(question):
x[0, len(question) - t - 1, character_table.char_indices[char]] = 1.
When you generate the model in your predict.py file:
model = generate_model(len(question), dataset['chars'])
is the first parameter the same as in your training file? Or is the question length dynamic? If it differs, you are generating a different model, so your saved checkpoint doesn't work.
It can be that the dimensionality of the arrays/DataFrames you pass does not match what the functions you call expect. When the called method expects a single dimension, try ravel() on what you expect to be one-dimensional.
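For example (a trivial illustration, not tied to any specific call in the questions above):
import numpy as np

col = np.zeros((100, 1))   # shape (100, 1): a single column
flat = col.ravel()         # shape (100,): the single dimension many methods expect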