TensorFlow Extended | Trainer Not Warm Starting With GenericExecutor & Keras Model

TensorFlow Extended | Trainer Not Warm Starting With GenericExecutor & Keras Model - python

I'm presently trying to get a Trainer component of a TFX pipeline to warm-start from a previous run of the same pipeline. The use case is:
Run the pipeline once, produce a model.
As new data comes in, train the existing model with the new data.
I am aware the ResolverNode component is designed for this purpose, so you can see how I utilize it below:
# detect the previously trained model
latest_model_resolver = ResolverNode(
instance_name='latest_model_resolver',
resolver_class=latest_artifacts_resolver.LatestArtifactsResolver,
latest_model=Channel(type=Model))
context.run(latest_model_resolver)
# set prior model as base_model
train_file = 'tfx_modules/recommender_train.py'
trainer = Trainer(
module_file=os.path.abspath(train_file),
custom_executor_spec=executor_spec.ExecutorClassSpec(GenericExecutor),
transformed_examples=transform.outputs['transformed_examples'],
transform_graph=transform.outputs['transform_graph'],
schema=schema_gen.outputs['schema'],
train_args=trainer_pb2.TrainArgs(num_steps=10000),
eval_args=trainer_pb2.EvalArgs(num_steps=5000),
base_model=latest_model_resolver.outputs['latest_model'])
The components above run successfully, and the ResolverNode is able to detect the latest model from prior pipeline runs. No error is thrown - however, when running context.run(trainer), the model loss basically begins where it started the first time. After the model's first run, it finishes training loss ~0.1, however, upon the second run (with the supposed warm-start), it restarts ~18.2.
This leads me to believe all weights were re-initialized, which I don't believe should have occurred. Below are the relevant model construction functions:
def build_keras_model():
"""build keras model"""
embedding_max_values = load(open(os.path.abspath('tfx-example/user_artifacts/embedding_max_dict.pkl'), 'rb'))
embedding_dimensions = dict([(key, 20) for key in embedding_max_values.keys()])
embedding_pairs = [recommender.EmbeddingPair(embedding_name=feature,
embedding_dimension=embedding_dimensions[feature],
embedding_max_val=embedding_max_values[feature])
for feature in recommender_constants.univalent_features]
numeric_inputs = []
for num_feature in recommender_constants.numeric_features:
numeric_inputs.append(keras.Input(shape=(1,), name=num_feature))
input_layers = numeric_inputs + [elem for pair in embedding_pairs for elem in pair.input_layers]
pre_concat_layers = numeric_inputs + [elem for pair in embedding_pairs for elem in pair.embedding_layers]
concat = keras.layers.Concatenate()(pre_concat_layers) if len(pre_concat_layers) > 1 else pre_concat_layers[0]
layer_1 = keras.layers.Dense(64, activation='relu', name='layer1')(concat)
output = keras.layers.Dense(1, kernel_initializer='lecun_uniform', name='out')(layer_1)
model = keras.models.Model(input_layers, outputs=output)
model.compile(optimizer='adam', loss='mean_squared_error')
return model
def run_fn(fn_args: TrainerFnArgs):
"""function for the Trainer component"""
tf_transform_output = tft.TFTransformOutput(fn_args.transform_output)
train_dataset = _input_fn(fn_args.train_files, fn_args.data_accessor,
tf_transform_output, 40)
eval_dataset = _input_fn(fn_args.eval_files, fn_args.data_accessor,
tf_transform_output, 40)
model = build_keras_model()
tensorboard_callback = tf.keras.callbacks.TensorBoard(
log_dir=fn_args.model_run_dir, update_freq='epoch', histogram_freq=1,
write_images=True)
model.fit(train_dataset, steps_per_epoch=fn_args.train_steps, validation_data=eval_dataset,
validation_steps=fn_args.eval_steps, callbacks=[tensorboard_callback],
epochs=5)
signatures = {
'serving_default':
_get_serve_tf_examples_fn(model, tf_transform_output).get_concrete_function(tf.TensorSpec(
shape=[None],
dtype=tf.string,
name='examples')
)
}
model.save(fn_args.serving_model_dir, save_format='tf', signatures=signatures)
To research the problem, I have perused:
Warm Start Example From TFX
https://github.com/tensorflow/tfx/blob/master/tfx/examples/chicago_taxi_pipeline/taxi_pipeline_warmstart.py
However, this guide uses the Estimator component instead of the Keras components. That component has a warm_start_from initialization parameter which I couldn't find for the Keras equivalent.
I suspect:
Either the warm-start functionality is only available for Estimator components and won't take effect even if base_model is set for Keras components.
I am somehow telling the model to re-initialize weights even after successfully loading the prior model - in that case I would love a pointer as to where that's happening.
Any assistance would be great! Much thanks.

With Keras models you have to load the model first using the base model path, then you can continue training from there instead of building a new model.
Your Trainer component looks correct, but in run_fn do the following instead:
def run_fn(fn_args: FnArgs):
model = tf.keras.models.load_model(fn_args.base_model)
model.fit(train_dataset, steps_per_epoch=fn_args.train_steps, validation_data=eval_dataset,
validation_steps=fn_args.eval_steps, callbacks=[tensorboard_callback],
epochs=5)

Related

How to know the trained model is correct?

I use PyTorch Lightning for model training, during which I use ModelCheckpoint to save loading points. Finally, I would like to know whether the model is loaded correctly. Let me know if you require further information?
checkpoint_callback = ModelCheckpoint(
filename='tb1000_{epoch: 02d}-{step}',
monitor='val/acc#1',
save_top_k=5,
mode='max')
wandb_logger = pl.loggers.wandb.WandbLogger(
name=run_name,
project=args.project,
entity=args.entity,
offline=args.offline,
log_model='all')
model = BYOL(**args.__dict__, num_classes=dm.num_classes)
trainer = pl.Trainer.from_argparse_args(args,
logger=wandb_logger, callbacks=[checkpoint_callback])
trainer.fit(model, dm)
# Loading and testing
model_test = BYOL(**args.__dict__, num_classes=dm.num_classes)
path = "/tb100_epoch= 819-step=39359.ckpt"
model_test.load_from_checkpoint(path)

load_from_checkpoint() will return a model with trained weights, so you need to assign it to a new variable.
model_test = model_test.load_from_checkpoint(path)
or
model_test = BYOL.load_from_checkpoint(path)

Loading a converted pytorch model in huggingface transformers properly

I converted a pre-trained tf model to pytorch using the following function.
def convert_tf_checkpoint_to_pytorch(*, tf_checkpoint_path, albert_config_file, pytorch_dump_path):
# Initialise PyTorch model
config = AlbertConfig.from_json_file(albert_config_file)
print("Building PyTorch model from configuration: {}".format(str(config)))
model = AlbertForPreTraining(config)
# Load weights from tf checkpoint
load_tf_weights_in_albert(model, config, tf_checkpoint_path)
# Save pytorch-model
print("Save PyTorch model to {}".format(pytorch_dump_path))
torch.save(model.state_dict(), pytorch_dump_path)
I am loading the converted model and encoding sentences in the following way:
def vectorize_sentence(text):
albert_tokenizer = AlbertTokenizer.from_pretrained("albert-base-v2")
config = AlbertConfig.from_pretrained(config_path, output_hidden_states=True)
model = TFAlbertModel.from_pretrained(pytorch_dir, config=config, from_pt=True)
e = albert_tokenizer.encode(text, max_length=512)
model_input = tf.constant(e)[None, :] # Batch size 1
output = model(model_input)
v = [0] * 768
# generate sentence vectors by averaging the word vectors
for i in range(1, len(model_input[0]) - 1):
v = v + output[0][0][i].numpy()
vector = v/len(model_input[0])
return vector
However while loading the model, a warning comes up:
Some weights or buffers of the PyTorch model TFAlbertModel were not
initialized from the TF 2.0 model and are newly initialized:
['predictions.LayerNorm.bias', 'predictions.dense.weight',
'predictions.LayerNorm.weight', 'sop_classifier.classifier.bias',
'predictions.dense.bias', 'sop_classifier.classifier.weight',
'predictions.decoder.bias', 'predictions.bias',
'predictions.decoder.weight'] You should probably TRAIN this model on
a down-stream task to be able to use it for predictions and inference.
Can anyone tell me if I am doing anything wrong? What does the warning mean? I saw issue #5588 on the github repo of Transformers. Don't know if my issue is the same as this.

I think you could try using
model = AlbertModel.from_pretrained
instead of
model = TFAlbertModel.from_pretrained
in the VectorizeSentence definition.
AlbertModel is the name of the class for the pytorch format model, and TFAlbertModel is the name of the class for the tensorflow format model.
I'm not sure exactly what load_tf_weights_in_albert() does, but I think that once you have done that your model is in pytorch format.

different prediction after load a model in keras

I have a Sequential Model built in Keras and after trained it give me good prediction but when i save and then load the model i don't obtain the same prediction on the same dataset. Why?
Note that I checked the weight of the model and they are the same as well as the architecture of the model, checked with model.summary() and model.getWeights(). This is very strange in my opinion and I have no idea how to deal with this problem.
I don't have any error but the prediction are different
I tried to use model.save() and load_model()
I tried to use model.save_weights() and after that re-built the model and then load the model
I have the same problem with both options.
def Classifier(input_shape, word_to_vec_map, word_to_index, emb_dim, num_activation):
sentence_indices = Input(shape=input_shape, dtype=np.int32)
emb_dim = 300 # embedding di 300 parole in italiano
embedding_layer = pretrained_embedding_layer(word_to_vec_map, word_to_index, emb_dim)
embeddings = embedding_layer(sentence_indices)
X = LSTM(256, return_sequences=True)(embeddings)
X = Dropout(0.15)(X)
X = LSTM(128)(X)
X = Dropout(0.15)(X)
X = Dense(num_activation, activation='softmax')(X)
model = Model(sentence_indices, X)
sequentialModel = Sequential(model.layers)
return sequentialModel
model = Classifier((maxLen,), word_to_vec_map, word_to_index, maxLen, num_activation)
...
model.fit(Y_train_indices, Z_train_oh, epochs=30, batch_size=32, shuffle=True)
# attempt 1
model.save('classificationTest.h5', True, True)
modelRNN = load_model(r'C:\Users\Alessio\classificationTest.h5')
# attempt 2
model.save_weights("myWeight.h5")
model = Classifier((maxLen,), word_to_vec_map, word_to_index, maxLen, num_activation)
model.load_weights(r'C:\Users\Alessio\myWeight.h5')
# PREDICTION TEST
code_train, category_train, category_code_train, text_train = read_csv_for_email(r'C:\Users\Alessio\Desktop\6Febbraio\2test.csv')
categories, code_categories = get_categories(r'C:\Users\Alessio\Desktop\6Febbraio\2test.csv')
X_my_sentences = text_train
Y_my_labels = category_code_train
X_test_indices = sentences_to_indices(X_my_sentences, word_to_index, maxLen)
pred = model.predict(X_test_indices)
def codeToCategory(categories, code_categories, current_code):
i = 0;
for code in code_categories:
if code == current_code:
return categories[i]
i = i + 1
return "no_one_find"
# result
for i in range(len(Y_my_labels)):
num = np.argmax(pred[i])
# Pretrained embedding layer
def pretrained_embedding_layer(word_to_vec_map, word_to_index, emb_dim):
"""
Creates a Keras Embedding() layer and loads in pre-trained GloVe 50-dimensional vectors.
Arguments:
word_to_vec_map -- dictionary mapping words to their GloVe vector representation.
word_to_index -- dictionary mapping from words to their indices in the vocabulary (400,001 words)
Returns:
embedding_layer -- pretrained layer Keras instance
"""
vocab_len = len(word_to_index) + 1 # adding 1 to fit Keras embedding (requirement)
### START CODE HERE ###
# Initialize the embedding matrix as a numpy array of zeros of shape (vocab_len, dimensions of word vectors = emb_dim)
emb_matrix = np.zeros((vocab_len, emb_dim))
# Set each row "index" of the embedding matrix to be the word vector representation of the "index"th word of the vocabulary
for word, index in word_to_index.items():
emb_matrix[index, :] = word_to_vec_map[word]
# Define Keras embedding layer with the correct output/input sizes, make it trainable. Use Embedding(...). Make sure to set trainable=False.
embedding_layer = Embedding(vocab_len, emb_dim)
### END CODE HERE ###
# Build the embedding layer, it is required before setting the weights of the embedding layer. Do not modify the "None".
embedding_layer.build((None,))
# Set the weights of the embedding layer to the embedding matrix. Your layer is now pretrained.
embedding_layer.set_weights([emb_matrix])
return embedding_layer
Do you have any kind of suggestion?
Thanks in Advance.
Edit1: if use the code of saving and loading in the same "page" (I'm using notebook jupyter) it works fine. If I change "page" it doesn't work. Could it be that there is something related with the tensorflow session?
Edit2: my final goal is to load a model, trained in Keras, with Deeplearning4J in java. So if you know a solution for "transforming" the keras model in something else readable in DL4J it will help anyway.
Edit3: add function pretrained_embedding_layer()
Edit4: dictionaries from word2Vec model read with gensim
from gensim.models import Word2Vec
model = Word2Vec.load('C:/Users/Alessio/Desktop/emoji_ita/embedding/glove_WIKI')
def getMyModels (model):
word_to_index = dict({})
index_to_word = dict({})
word_to_vec_map = dict({})
for idx, key in enumerate(model.wv.vocab):
word_to_index[key] = idx
index_to_word[idx] = key
word_to_vec_map[key] = model.wv[key]
return word_to_index, index_to_word, word_to_vec_map

Are you pre-processing your data in the same way when you load your model ?
And if yes, did you set the seed of your pre-processing functions ?
If you build a dictionnary with keras, are the sentences coming in the same order ?

I had the same problem before, so here is how you solve it. After making sure that the weights and summary are the same, try to print your random seed and check. If its value is changing from a session to another and if you tried tensorflow's seed, it means you need to disable the PYTHONHASHSEED environment variable. You can read more about it here:
https://docs.python.org/3/using/cmdline.html#envvar-PYTHONHASHSEED
To disable it, go to your system environment variables and add PYTHONHASHSEED as a new variable if it doesn't exist. Then, set its value to 0 to disable it. Please note that it was done in this way because it has to be disabled before running the interpreter.

Adding Dropout to testing/inference phase

I've trained the following model for some timeseries in Keras:
input_layer = Input(batch_shape=(56, 3864))
first_layer = Dense(24, input_dim=28, activation='relu',
activity_regularizer=None,
kernel_regularizer=None)(input_layer)
first_layer = Dropout(0.3)(first_layer)
second_layer = Dense(12, activation='relu')(first_layer)
second_layer = Dropout(0.3)(second_layer)
out = Dense(56)(second_layer)
model_1 = Model(input_layer, out)
Then I defined a new model with the trained layers of model_1 and added dropout layers with a different rate, drp, to it:
input_2 = Input(batch_shape=(56, 3864))
first_dense_layer = model_1.layers[1](input_2)
first_dropout_layer = model_1.layers[2](first_dense_layer)
new_dropout = Dropout(drp)(first_dropout_layer)
snd_dense_layer = model_1.layers[3](new_dropout)
snd_dropout_layer = model_1.layers[4](snd_dense_layer)
new_dropout_2 = Dropout(drp)(snd_dropout_layer)
output = model_1.layers[5](new_dropout_2)
model_2 = Model(input_2, output)
Then I'm getting the prediction results of these two models as follow:
result_1 = model_1.predict(test_data, batch_size=56)
result_2 = model_2.predict(test_data, batch_size=56)
I was expecting to get completely different results because the second model has new dropout layers and theses two models are different (IMO), but that's not the case. Both are generating the same result. Why is that happening?

As I mentioned in the comments, the Dropout layer is turned off in inference phase (i.e. test mode), so when you use model.predict() the Dropout layers are not active. However, if you would like to have a model that uses Dropout both in training and inference phase, you can pass training argument when calling it, as suggested by François Chollet:
# ...
new_dropout = Dropout(drp)(first_dropout_layer, training=True)
# ...
Alternatively, If you have already trained your model and now want to use it in inference mode and keep the Dropout layers (and possibly other layers which have different behavior in training/inference phase such as BatchNormalization) active, you can define a backend function that takes the model's inputs as well as Keras learning phase:
from keras import backend as K
func = K.function(model.inputs + [K.learning_phase()], model.outputs)
# to use it pass 1 to set the learning phase to training mode
outputs = func([input_arrays] + [1.])

your question has a simple solution in the latest version of Tensorflow. you can set the training argument of the call method to true.
you can run a code like the below code:
model(input,training=True)
by using training=True TensorFlow automatically applies the Dropout layer in inference mode.

As there are already some working code solutions above, I will simply add a few more details regarding dropout during inference to prevent confusion.
Based on the original paper, Dropout layers play the role of turning off (setting gradients to zero) the neuron nodes during training to reduce overfitting. However, once we finish off with training and start testing the model, we do not 'touch' any neurons, thus, all the units are considered to make the decision when inferencing. This causes previously 'dead' neuron weights to be large than expected due to the usage of Dropout. To prevent this, a scaling factor is applied to balance the network node. To be more precise, if a unit is retained with probability p during training, the outgoing weights of that unit are multiplied by p during the prediction stage.

Tensorboard loss plots go back in time (Keras)

I have a similar problem to the one described here, only I am using Keras with Tensorflow backend rather then Tensorflow.
I have trained 18 MLP models for time series forecasting with different meta parameters, all created with the same basic architecture. The 3 meta parameters I am scanning are the lookahead that the model uses for prediction, the depth of the model, and whether I am using L2 regularization.
model = Sequential()
# input_shape should be a 3D tensor with shape (batch_size, timesteps ,input_dim)
model.add(Flatten())
# hidden layer sizes should drop gradually from 256 to 2*lookahead
hidden_layer_sizes = [int(256 - i * (256 - 2 * lookahead) / depth) for i in range(depth)]
for hidden_layer_size in hidden_layer_sizes:
if regularization:
model.add(Dense(hidden_layer_size, kernel_initializer="he_normal",
kernel_regularizer = regularizers.l2(0.01), activation=activations))
else:
model.add(Dense(hidden_layer_size, kernel_initializer="he_normal",
activation=activations))
model.add(Dense(2 * lookahead))
loss = losses.mean_squared_error
model.compile(loss=loss, optimizer=self.kwargs["optimizer"], metrics=['mae'])
Each model's tensorboard data is saved in a seperate folder through the relevant Keras callback
callback_tensorboard = TensorBoard(log_dir=log_dir,
histogram_freq=5,
write_graph=False,
write_grads=True,
write_images=False)
but for some reason, 3 out of the 18 models have two tensorboard files saved instead of one, and the resulting graphs show this weird phenomena of progressing backwards through time
Why does this happen? and other then deleting the second tensorboard file, what can I do to prevent this?

It happens because every time you call mode.fit() you reinitialize the internal epoch counter in keras. However you could set it up manually when you compile the model like:
model.compile(optimizer=tf.train.AdamOptimizer(),
loss=tf.losses.mean_squared_error)
init_epoch = 0
And then when ypu call fit, you can pass additional argument to keras, which describes current epoch:
epoch_count = 200
init_epoch += epoch_count
history = model.fit(x_train, y_train,
batch_size=256,
epochs=init_epoch,
validation_data=(x_test, y_test),
verbose=1,
initial_epoch=init_epoch-epoch_count,
)

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

TensorFlow Extended | Trainer Not Warm Starting With GenericExecutor & Keras Model - python

Related

How to know the trained model is correct?

Loading a converted pytorch model in huggingface transformers properly

different prediction after load a model in keras

Adding Dropout to testing/inference phase

Tensorboard loss plots go back in time (Keras)

Categories

Resources