I converted a pre-trained TF model to PyTorch using the following function.
import torch
from transformers import AlbertConfig, AlbertForPreTraining, load_tf_weights_in_albert

def convert_tf_checkpoint_to_pytorch(*, tf_checkpoint_path, albert_config_file, pytorch_dump_path):
    # Initialise PyTorch model
    config = AlbertConfig.from_json_file(albert_config_file)
    print("Building PyTorch model from configuration: {}".format(str(config)))
    model = AlbertForPreTraining(config)

    # Load weights from tf checkpoint
    load_tf_weights_in_albert(model, config, tf_checkpoint_path)

    # Save pytorch-model
    print("Save PyTorch model to {}".format(pytorch_dump_path))
    torch.save(model.state_dict(), pytorch_dump_path)
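For reference, this is how I invoke the function (the paths here are placeholders for my actual checkpoint, config, and output locations):

# Hypothetical paths; substitute your own checkpoint, config, and output files.
convert_tf_checkpoint_to_pytorch(
    tf_checkpoint_path="albert_base/model.ckpt-best",
    albert_config_file="albert_base/albert_config.json",
    pytorch_dump_path="albert_base_pytorch/pytorch_model.bin",
)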
I am loading the converted model and encoding sentences in the following way:
import tensorflow as tf
from transformers import AlbertTokenizer, AlbertConfig, TFAlbertModel

def vectorize_sentence(text):
    albert_tokenizer = AlbertTokenizer.from_pretrained("albert-base-v2")
    config = AlbertConfig.from_pretrained(config_path, output_hidden_states=True)
    model = TFAlbertModel.from_pretrained(pytorch_dir, config=config, from_pt=True)

    e = albert_tokenizer.encode(text, max_length=512)
    model_input = tf.constant(e)[None, :]  # Batch size 1
    output = model(model_input)

    # generate sentence vectors by averaging the word vectors
    v = [0] * 768
    for i in range(1, len(model_input[0]) - 1):
        v = v + output[0][0][i].numpy()
    vector = v / len(model_input[0])
    return vector
However, while loading the model, a warning comes up:
Some weights or buffers of the PyTorch model TFAlbertModel were not
initialized from the TF 2.0 model and are newly initialized:
['predictions.LayerNorm.bias', 'predictions.dense.weight',
'predictions.LayerNorm.weight', 'sop_classifier.classifier.bias',
'predictions.dense.bias', 'sop_classifier.classifier.weight',
'predictions.decoder.bias', 'predictions.bias',
'predictions.decoder.weight'] You should probably TRAIN this model on
a down-stream task to be able to use it for predictions and inference.
Can anyone tell me if I am doing anything wrong? What does the warning mean? I saw issue #5588 on the GitHub repo of Transformers, but I don't know whether my issue is the same as that one.
I think you could try using
model = AlbertModel.from_pretrained
instead of
model = TFAlbertModel.from_pretrained
in the vectorize_sentence definition.
AlbertModel is the class for the PyTorch-format model, and TFAlbertModel is the class for the TensorFlow-format model.
I'm not sure exactly what load_tf_weights_in_albert() does, but I think that once you have run it your model is in PyTorch format.
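A minimal sketch of that change, assuming the converted weights were saved with torch.save as in the conversion function above (config_path and pytorch_dir are the same variables used in the question):

import torch
from transformers import AlbertConfig, AlbertModel, AlbertTokenizer

albert_tokenizer = AlbertTokenizer.from_pretrained("albert-base-v2")
config = AlbertConfig.from_pretrained(config_path, output_hidden_states=True)

# Same directory as before, but loaded with the PyTorch class (no from_pt needed).
model = AlbertModel.from_pretrained(pytorch_dir, config=config)

e = albert_tokenizer.encode(text, max_length=512)
model_input = torch.tensor(e)[None, :]  # batch size 1
with torch.no_grad():
    output = model(model_input)
# output[0] is the last hidden state, shaped (1, seq_len, 768)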
I train a model with PyTorch Lightning and use ModelCheckpoint to save checkpoints during training. Afterwards, I want to load the model and know whether it was loaded correctly. Let me know if you require further information.
import pytorch_lightning as pl
from pytorch_lightning.callbacks import ModelCheckpoint

checkpoint_callback = ModelCheckpoint(
    filename='tb1000_{epoch: 02d}-{step}',
    monitor='val/acc#1',
    save_top_k=5,
    mode='max')

wandb_logger = pl.loggers.wandb.WandbLogger(
    name=run_name,
    project=args.project,
    entity=args.entity,
    offline=args.offline,
    log_model='all')

model = BYOL(**args.__dict__, num_classes=dm.num_classes)
trainer = pl.Trainer.from_argparse_args(args,
    logger=wandb_logger, callbacks=[checkpoint_callback])
trainer.fit(model, dm)

# Loading and testing
model_test = BYOL(**args.__dict__, num_classes=dm.num_classes)
path = "/tb100_epoch= 819-step=39359.ckpt"
model_test.load_from_checkpoint(path)
load_from_checkpoint() is a classmethod: it returns a new model with the trained weights rather than loading them in place, so you need to assign its return value.
model_test = model_test.load_from_checkpoint(path)
or
model_test = BYOL.load_from_checkpoint(path)
I fine-tuned a model starting from the 'distilgpt2' checkpoint. I fit the model with the model.fit() method and saved the resulting model with the .save_pretrained() method.
When I use this model to generate text:
import transformers
from transformers import TFAutoModelForCausalLM, AutoTokenizer

original_model = 'distilgpt2'
path2model = 'clm_model_save'
path2tok = 'clm_tokenizer_save'

tuned_model = TFAutoModelForCausalLM.from_pretrained(path2model, from_pt=False)
tuned_tokenizer = AutoTokenizer.from_pretrained(path2tok)

input_context = 'The dog'
input_ids = tuned_tokenizer.encode(input_context, return_tensors='tf')  # encode input context
outputs = tuned_model.generate(input_ids=input_ids,
                               max_length=40,
                               temperature=0.7,
                               num_return_sequences=3,
                               do_sample=True)  # generate 3 candidates using sampling

for i in range(3):  # 3 output sequences were generated
    print(f'Generated {i}: {tuned_tokenizer.decode(outputs[i], skip_special_tokens=True)}')
The model returns the output:
>>>All model checkpoint layers were used when initializing TFGPT2LMHeadModel.
>>>All the layers of TFGPT2LMHeadModel were initialized from the model checkpoint at clm_model_save.
>>>If your task is similar to the task the model of the checkpoint was trained on, you can already use TFGPT2LMHeadModel for predictions without further training.
>>>Setting `pad_token_id` to 50256 (first `eos_token_id`) to generate sequence
>>>Generated 0: The dog!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
>>>Generated 1: The dog!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
>>>Generated 2: The dog!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
When I use the original checkpoint, distilgpt2, the model generates text just fine. Is this a sign of some sort of misconfiguration? Or is this simply a sign of a poorly trained model?
I've tried using the original checkpoint tokenizer, manually setting the pad_token_id, using a much longer input context, and changing several parameters for the .generate() method. Same results each time.
Also, I added special tokens to my tuned_tokenizer:
tuned_tokenizer.special_tokens_map
>>>{'bos_token': '<|startoftext|>',
>>> 'eos_token': '<|endoftext|>',
>>> 'unk_token': '<|endoftext|>',
>>> 'pad_token': '<|PAD|>'}
Compared to the original tokenizer:
tokenizer.special_tokens_map
>>> {'bos_token': '<|endoftext|>',
>>> 'eos_token': '<|endoftext|>',
>>> 'unk_token': '<|endoftext|>'}
I'm presently trying to get a Trainer component of a TFX pipeline to warm-start from a previous run of the same pipeline. The use case is:
Run the pipeline once, produce a model.
As new data comes in, train the existing model with the new data.
I am aware that the ResolverNode component is designed for this purpose; you can see how I use it below:
# detect the previously trained model
latest_model_resolver = ResolverNode(
    instance_name='latest_model_resolver',
    resolver_class=latest_artifacts_resolver.LatestArtifactsResolver,
    latest_model=Channel(type=Model))
context.run(latest_model_resolver)

# set prior model as base_model
train_file = 'tfx_modules/recommender_train.py'
trainer = Trainer(
    module_file=os.path.abspath(train_file),
    custom_executor_spec=executor_spec.ExecutorClassSpec(GenericExecutor),
    transformed_examples=transform.outputs['transformed_examples'],
    transform_graph=transform.outputs['transform_graph'],
    schema=schema_gen.outputs['schema'],
    train_args=trainer_pb2.TrainArgs(num_steps=10000),
    eval_args=trainer_pb2.EvalArgs(num_steps=5000),
    base_model=latest_model_resolver.outputs['latest_model'])
The components above run successfully, and the ResolverNode is able to detect the latest model from prior pipeline runs. No error is thrown. However, when running context.run(trainer), the model loss basically begins where it started the first time: after the first run, training loss finishes at ~0.1, but on the second run (with the supposed warm start) it restarts at ~18.2.
This leads me to believe all weights were re-initialized, which I don't believe should have occurred. Below are the relevant model construction functions:
def build_keras_model():
    """build keras model"""
    embedding_max_values = load(open(os.path.abspath('tfx-example/user_artifacts/embedding_max_dict.pkl'), 'rb'))
    embedding_dimensions = dict([(key, 20) for key in embedding_max_values.keys()])
    embedding_pairs = [recommender.EmbeddingPair(embedding_name=feature,
                                                 embedding_dimension=embedding_dimensions[feature],
                                                 embedding_max_val=embedding_max_values[feature])
                       for feature in recommender_constants.univalent_features]

    numeric_inputs = []
    for num_feature in recommender_constants.numeric_features:
        numeric_inputs.append(keras.Input(shape=(1,), name=num_feature))

    input_layers = numeric_inputs + [elem for pair in embedding_pairs for elem in pair.input_layers]
    pre_concat_layers = numeric_inputs + [elem for pair in embedding_pairs for elem in pair.embedding_layers]

    concat = keras.layers.Concatenate()(pre_concat_layers) if len(pre_concat_layers) > 1 else pre_concat_layers[0]
    layer_1 = keras.layers.Dense(64, activation='relu', name='layer1')(concat)
    output = keras.layers.Dense(1, kernel_initializer='lecun_uniform', name='out')(layer_1)

    model = keras.models.Model(input_layers, outputs=output)
    model.compile(optimizer='adam', loss='mean_squared_error')
    return model
def run_fn(fn_args: TrainerFnArgs):
    """function for the Trainer component"""
    tf_transform_output = tft.TFTransformOutput(fn_args.transform_output)
    train_dataset = _input_fn(fn_args.train_files, fn_args.data_accessor,
                              tf_transform_output, 40)
    eval_dataset = _input_fn(fn_args.eval_files, fn_args.data_accessor,
                             tf_transform_output, 40)

    model = build_keras_model()

    tensorboard_callback = tf.keras.callbacks.TensorBoard(
        log_dir=fn_args.model_run_dir, update_freq='epoch', histogram_freq=1,
        write_images=True)

    model.fit(train_dataset, steps_per_epoch=fn_args.train_steps, validation_data=eval_dataset,
              validation_steps=fn_args.eval_steps, callbacks=[tensorboard_callback],
              epochs=5)

    signatures = {
        'serving_default':
            _get_serve_tf_examples_fn(model, tf_transform_output).get_concrete_function(tf.TensorSpec(
                shape=[None],
                dtype=tf.string,
                name='examples')
            )
    }

    model.save(fn_args.serving_model_dir, save_format='tf', signatures=signatures)
To research the problem, I have perused:
Warm Start Example From TFX
https://github.com/tensorflow/tfx/blob/master/tfx/examples/chicago_taxi_pipeline/taxi_pipeline_warmstart.py
However, this example uses the Estimator API instead of Keras. The Estimator has a warm_start_from initialization parameter, and I couldn't find an equivalent for Keras.
I suspect one of two things:
Either the warm-start functionality is only available for Estimator-based Trainers and has no effect when base_model is set for a Keras model,
or I am somehow telling the model to re-initialize its weights even after successfully loading the prior model - in that case I would love a pointer as to where that's happening.
Any assistance would be great! Much thanks.
With Keras models you have to load the model first using the base model path, then you can continue training from there instead of building a new model.
Your Trainer component looks correct, but in run_fn do the following instead:
def run_fn(fn_args: FnArgs):
    ...
    # Warm-start from the resolved base model when one exists;
    # fall back to building a fresh model on the first pipeline run.
    if fn_args.base_model is not None:
        model = tf.keras.models.load_model(fn_args.base_model)
    else:
        model = build_keras_model()

    model.fit(train_dataset, steps_per_epoch=fn_args.train_steps, validation_data=eval_dataset,
              validation_steps=fn_args.eval_steps, callbacks=[tensorboard_callback],
              epochs=5)
I have a Keras model for which I would like to save the normalization values in the model object itself for easier portability.
I'm using sklearn's StandardScaler() to normalize my data, so I simply want to save the mean_ and var_ attributes from the scaler to the model, save the model, and when I reload the model have access to these attributes.
Currently, when I reload the model, the attributes I added are not there. What is the correct way of doing this?
Code:
# Normalize data
scaler = StandardScaler()
scaler.fit(X_train)
...
# Create model
model = Sequential(...)
# Compile and train
...
# Save model with normalization mean and var
model.normalization_mean = scaler.mean_
model.normalization_var = scaler.var_
keras.models.save_model(model = model,
filepath = ...)
# Reload model
model = keras.models.load_model(filepath = ...)
hasattr(model, 'normalization_mean') # False
hasattr(model, 'normalization_var') # False
This is one possibility: you can subclass the model as shown below and attach the external values as non-trainable variables.
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Dense

X = np.random.uniform(0, 1, (100, 10))
y = np.random.uniform(0, 1, 100)

class MyModel(tf.keras.Model):

    def __init__(self):
        super(MyModel, self).__init__()
        self.dense1 = Dense(32)
        self.dense2 = Dense(1)

    def call(self, inputs):
        x = self.dense1(inputs)
        return self.dense2(x)

model = MyModel()
model.compile('adam', 'mse')
model.fit(X, y)

# attach the normalization statistics as non-trainable variables
model._normalization_mean = tf.Variable([111.], trainable=False)
model._normalization_var = tf.Variable([222.], trainable=False)

model.save('abc.tf', save_format='tf')
model = tf.keras.models.load_model(filepath='abc.tf')
After loading the model you can call
model._normalization_mean.numpy()
# array([111.], dtype=float32)
Here is the running notebook.
To save and load a subclassed model, you can refer to this.
I just came across Keras preprocessing layers, whose purpose seems to be exactly what you're describing.
The Keras preprocessing layers API allows developers to build Keras-native input processing pipelines. These input processing pipelines can be used as independent preprocessing code in non-Keras workflows, combined directly with Keras models, and exported as part of a Keras SavedModel.
With Keras preprocessing layers, you can build and export models that are truly end-to-end: models that accept raw images or raw structured data as input; models that handle feature normalization or feature value indexing on their own.
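As a rough sketch of that idea (assuming a recent TF 2.x where tf.keras.layers.Normalization is available; in older releases it lives under tf.keras.layers.experimental.preprocessing), the normalization statistics become part of the model and are saved and reloaded with it:

import numpy as np
import tensorflow as tf

X_train = np.random.uniform(0, 1, (100, 10)).astype('float32')
y_train = np.random.uniform(0, 1, 100).astype('float32')

# The layer learns the feature means/variances from the data and stores them as model weights.
norm_layer = tf.keras.layers.Normalization()
norm_layer.adapt(X_train)

model = tf.keras.Sequential([
    tf.keras.Input(shape=(10,)),
    norm_layer,
    tf.keras.layers.Dense(32, activation='relu'),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer='adam', loss='mse')
model.fit(X_train, y_train, epochs=1)

model.save('model_with_norm.tf', save_format='tf')
reloaded = tf.keras.models.load_model('model_with_norm.tf')
# The reloaded model applies the same normalization automatically.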
I have a Sequential model built in Keras, and after training it gives me good predictions, but when I save and then load the model I don't obtain the same predictions on the same dataset. Why?
Note that I checked the weights of the model and they are the same, as is the architecture, checked with model.summary() and model.get_weights(). This is very strange in my opinion and I have no idea how to deal with this problem.
I don't get any error, but the predictions are different.
I tried to use model.save() and load_model()
I tried to use model.save_weights(), and after that re-built the model and then loaded the weights
I have the same problem with both options.
def Classifier(input_shape, word_to_vec_map, word_to_index, emb_dim, num_activation):
    sentence_indices = Input(shape=input_shape, dtype=np.int32)
    emb_dim = 300  # 300-dimensional embeddings for Italian words
    embedding_layer = pretrained_embedding_layer(word_to_vec_map, word_to_index, emb_dim)
    embeddings = embedding_layer(sentence_indices)

    X = LSTM(256, return_sequences=True)(embeddings)
    X = Dropout(0.15)(X)
    X = LSTM(128)(X)
    X = Dropout(0.15)(X)
    X = Dense(num_activation, activation='softmax')(X)

    model = Model(sentence_indices, X)
    sequentialModel = Sequential(model.layers)
    return sequentialModel

model = Classifier((maxLen,), word_to_vec_map, word_to_index, maxLen, num_activation)
...
model.fit(Y_train_indices, Z_train_oh, epochs=30, batch_size=32, shuffle=True)
model = Classifier((maxLen,), word_to_vec_map, word_to_index, maxLen, num_activation)
...
model.fit(Y_train_indices, Z_train_oh, epochs=30, batch_size=32, shuffle=True)
# attempt 1
model.save('classificationTest.h5', True, True)
modelRNN = load_model(r'C:\Users\Alessio\classificationTest.h5')
# attempt 2
model.save_weights("myWeight.h5")
model = Classifier((maxLen,), word_to_vec_map, word_to_index, maxLen, num_activation)
model.load_weights(r'C:\Users\Alessio\myWeight.h5')
# PREDICTION TEST
code_train, category_train, category_code_train, text_train = read_csv_for_email(r'C:\Users\Alessio\Desktop\6Febbraio\2test.csv')
categories, code_categories = get_categories(r'C:\Users\Alessio\Desktop\6Febbraio\2test.csv')
X_my_sentences = text_train
Y_my_labels = category_code_train
X_test_indices = sentences_to_indices(X_my_sentences, word_to_index, maxLen)
pred = model.predict(X_test_indices)
def codeToCategory(categories, code_categories, current_code):
    i = 0
    for code in code_categories:
        if code == current_code:
            return categories[i]
        i = i + 1
    return "no_one_find"

# result
for i in range(len(Y_my_labels)):
    num = np.argmax(pred[i])
# Pretrained embedding layer
def pretrained_embedding_layer(word_to_vec_map, word_to_index, emb_dim):
    """
    Creates a Keras Embedding() layer and loads in the pre-trained word vectors.

    Arguments:
    word_to_vec_map -- dictionary mapping words to their vector representation.
    word_to_index -- dictionary mapping from words to their indices in the vocabulary

    Returns:
    embedding_layer -- pretrained layer Keras instance
    """
    vocab_len = len(word_to_index) + 1  # adding 1 to fit Keras embedding (requirement)

    # Initialize the embedding matrix as a numpy array of zeros of shape (vocab_len, emb_dim)
    emb_matrix = np.zeros((vocab_len, emb_dim))

    # Set each row "index" of the embedding matrix to be the word vector representation of the "index"th word of the vocabulary
    for word, index in word_to_index.items():
        emb_matrix[index, :] = word_to_vec_map[word]

    # Define the Keras embedding layer with the correct input/output sizes and make it non-trainable
    embedding_layer = Embedding(vocab_len, emb_dim, trainable=False)

    # Build the embedding layer; this is required before setting its weights. Do not modify the "None".
    embedding_layer.build((None,))

    # Set the weights of the embedding layer to the embedding matrix. The layer is now pretrained.
    embedding_layer.set_weights([emb_matrix])

    return embedding_layer
Do you have any kind of suggestion?
Thanks in Advance.
Edit 1: if I use the saving and loading code in the same "page" (I'm using a Jupyter notebook) it works fine. If I change "page" it doesn't work. Could it be something related to the TensorFlow session?
Edit 2: my final goal is to load a model trained in Keras with Deeplearning4J in Java, so if you know a way of "transforming" the Keras model into something else readable by DL4J, that would help as well.
Edit 3: added the function pretrained_embedding_layer().
Edit 4: dictionaries from the Word2Vec model, read with gensim:
from gensim.models import Word2Vec

model = Word2Vec.load('C:/Users/Alessio/Desktop/emoji_ita/embedding/glove_WIKI')

def getMyModels(model):
    word_to_index = dict({})
    index_to_word = dict({})
    word_to_vec_map = dict({})
    for idx, key in enumerate(model.wv.vocab):
        word_to_index[key] = idx
        index_to_word[idx] = key
        word_to_vec_map[key] = model.wv[key]
    return word_to_index, index_to_word, word_to_vec_map
Are you pre-processing your data in the same way when you load your model?
And if yes, did you set the seed of your pre-processing functions?
If you build a dictionary with Keras, are the sentences coming in the same order?
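One way to rule out ordering effects, given the getMyModels() function in the question, is to make the word-index mapping deterministic by sorting the vocabulary before enumerating it (a sketch, not necessarily the root cause):

# Sketch: make the word-to-index mapping independent of dict iteration order
# between sessions by sorting the vocabulary keys first.
def getMyModelsDeterministic(model):
    word_to_index = {}
    index_to_word = {}
    word_to_vec_map = {}
    for idx, key in enumerate(sorted(model.wv.vocab)):
        word_to_index[key] = idx
        index_to_word[idx] = key
        word_to_vec_map[key] = model.wv[key]
    return word_to_index, index_to_word, word_to_vec_map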
I had the same problem before; here is how I solved it. After making sure that the weights and the summary are the same, print your random seed and check it. If its value changes from one session to another even though you set TensorFlow's seed, you need to disable Python's hash randomization via the PYTHONHASHSEED environment variable. You can read more about it here:
https://docs.python.org/3/using/cmdline.html#envvar-PYTHONHASHSEED
To disable it, go to your system environment variables and add PYTHONHASHSEED as a new variable if it doesn't exist, then set its value to 0. Note that it has to be set this way because hash randomization must be disabled before the interpreter starts; setting the variable from inside the running program has no effect.
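A quick way to check the setting from inside Python (a minimal sketch; any fixed value works, 0 simply disables randomization):

import os

# Prints the value the interpreter was started with (None if it was not set).
print(os.environ.get('PYTHONHASHSEED'))

# With PYTHONHASHSEED fixed, string hashes are reproducible across runs,
# so anything that depends on hash ordering behaves the same way each session.
print(hash('some_word'))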