I have a Keras model for which I would like to save the normalization values in the model object itself for easier portability.
I'm using sklearn's StandardScaler() to normalize my data, so I simply want to save the mean_ and var_ attributes from the scaler to the model, save the model, and when I reload the model have access to these attributes.
Currently, when I reload the model, the attributes I added are not there. What is the correct way of doing this?
Code:
# Normalize data
scaler = StandardScaler()
scaler.fit(X_train)
...
# Create model
model = Sequential(...)
# Compile and train
...
# Save model with normalization mean and var
model.normalization_mean = scaler.mean_
model.normalization_var = scaler.var_
keras.models.save_model(model = model,
filepath = ...)
# Reload model
model = keras.models.load_model(filepath = ...)
hasattr(model, 'normalization_mean') # False
hasattr(model, 'normalization_var') # False
One possibility is to subclass the model and attach the external values as non-trainable variables:
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Dense
X = np.random.uniform(0, 1, (100, 10))
y = np.random.uniform(0, 1, 100)
class MyModel(tf.keras.Model):
    def __init__(self):
        super(MyModel, self).__init__()
        self.dense1 = Dense(32)
        self.dense2 = Dense(1)
    def call(self, inputs):
        x = self.dense1(inputs)
        return self.dense2(x)
model = MyModel()
model.compile('adam','mse')
model.fit(X,y)
model._normalization_mean = tf.Variable([111.], trainable=False)
model._normalization_var = tf.Variable([222.], trainable=False)
model.save('abc.tf', save_format='tf')
model = tf.keras.models.load_model(filepath = 'abc.tf')
After loading the model, you can call:
model._normalization_mean.numpy()
# array([111.], dtype=float32)
Here is the running notebook.
For saving and loading a subclassed model, you can refer to this.
I just came across Keras preprocessing layers, whose purpose seems to be exactly what you're describing.
The Keras preprocessing layers API allows developers to build Keras-native input processing pipelines. These input processing pipelines can be used as independent preprocessing code in non-Keras workflows, combined directly with Keras models, and exported as part of a Keras SavedModel.
With Keras preprocessing layers, you can build and export models that are truly end-to-end: models that accept raw images or raw structured data as input; models that handle feature normalization or feature value indexing on their own.
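As a rough illustration of what a normalization step stored with the model has to do, here is a plain-Python sketch (no Keras involved). It keeps the per-feature mean and variance, the same quantities StandardScaler exposes as mean_ and var_, serializes them, and applies (x - mean) / sqrt(var) after reloading. The class name StoredNormalizer and the JSON layout are assumptions made for this example and are not part of any library.

```python
import json
import math
import statistics


class StoredNormalizer:
    """Hypothetical helper: holds per-feature mean and variance."""

    def __init__(self, mean, var):
        self.mean = list(mean)
        self.var = list(var)

    @classmethod
    def fit(cls, rows):
        # Column-wise population statistics (ddof=0), which is what
        # StandardScaler stores in mean_ and var_.
        columns = list(zip(*rows))
        mean = [statistics.fmean(col) for col in columns]
        var = [statistics.pvariance(col) for col in columns]
        return cls(mean, var)

    def transform(self, rows):
        # Standardize every value with the stored statistics.
        return [
            [(x - m) / math.sqrt(v) for x, m, v in zip(row, self.mean, self.var)]
            for row in rows
        ]

    def to_json(self):
        return json.dumps({"mean": self.mean, "var": self.var})

    @classmethod
    def from_json(cls, text):
        data = json.loads(text)
        return cls(data["mean"], data["var"])


X_train = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
normalizer = StoredNormalizer.fit(X_train)

# Round-trip through a string, standing in for "save and reload".
restored = StoredNormalizer.from_json(normalizer.to_json())
scaled = restored.transform(X_train)
```

This sketch does not guard against a feature with zero variance, which would cause a division by zero; a real implementation would need to handle that case.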
Related
I'm presently trying to get a Trainer component of a TFX pipeline to warm-start from a previous run of the same pipeline. The use case is:
Run the pipeline once, produce a model.
As new data comes in, train the existing model with the new data.
I am aware the ResolverNode component is designed for this purpose, so you can see how I utilize it below:
# detect the previously trained model
latest_model_resolver = ResolverNode(
    instance_name='latest_model_resolver',
    resolver_class=latest_artifacts_resolver.LatestArtifactsResolver,
    latest_model=Channel(type=Model))
context.run(latest_model_resolver)
# set prior model as base_model
train_file = 'tfx_modules/recommender_train.py'
trainer = Trainer(
    module_file=os.path.abspath(train_file),
    custom_executor_spec=executor_spec.ExecutorClassSpec(GenericExecutor),
    transformed_examples=transform.outputs['transformed_examples'],
    transform_graph=transform.outputs['transform_graph'],
    schema=schema_gen.outputs['schema'],
    train_args=trainer_pb2.TrainArgs(num_steps=10000),
    eval_args=trainer_pb2.EvalArgs(num_steps=5000),
    base_model=latest_model_resolver.outputs['latest_model'])
The components above run successfully, and the ResolverNode detects the latest model from prior pipeline runs. No error is thrown. However, when running context.run(trainer), the model loss essentially begins where it started the first time: the first run finishes with a training loss of ~0.1, but the second run (with the supposed warm start) starts again at ~18.2.
This leads me to believe all weights were re-initialized, which I don't believe should have occurred. Below are the relevant model construction functions:
def build_keras_model():
    """build keras model"""
    embedding_max_values = load(open(os.path.abspath('tfx-example/user_artifacts/embedding_max_dict.pkl'), 'rb'))
    embedding_dimensions = dict([(key, 20) for key in embedding_max_values.keys()])
    embedding_pairs = [recommender.EmbeddingPair(embedding_name=feature,
                                                 embedding_dimension=embedding_dimensions[feature],
                                                 embedding_max_val=embedding_max_values[feature])
                       for feature in recommender_constants.univalent_features]
    numeric_inputs = []
    for num_feature in recommender_constants.numeric_features:
        numeric_inputs.append(keras.Input(shape=(1,), name=num_feature))
    input_layers = numeric_inputs + [elem for pair in embedding_pairs for elem in pair.input_layers]
    pre_concat_layers = numeric_inputs + [elem for pair in embedding_pairs for elem in pair.embedding_layers]
    concat = keras.layers.Concatenate()(pre_concat_layers) if len(pre_concat_layers) > 1 else pre_concat_layers[0]
    layer_1 = keras.layers.Dense(64, activation='relu', name='layer1')(concat)
    output = keras.layers.Dense(1, kernel_initializer='lecun_uniform', name='out')(layer_1)
    model = keras.models.Model(input_layers, outputs=output)
    model.compile(optimizer='adam', loss='mean_squared_error')
    return model
def run_fn(fn_args: TrainerFnArgs):
    """function for the Trainer component"""
    tf_transform_output = tft.TFTransformOutput(fn_args.transform_output)
    train_dataset = _input_fn(fn_args.train_files, fn_args.data_accessor,
                              tf_transform_output, 40)
    eval_dataset = _input_fn(fn_args.eval_files, fn_args.data_accessor,
                             tf_transform_output, 40)
    model = build_keras_model()
    tensorboard_callback = tf.keras.callbacks.TensorBoard(
        log_dir=fn_args.model_run_dir, update_freq='epoch', histogram_freq=1,
        write_images=True)
    model.fit(train_dataset, steps_per_epoch=fn_args.train_steps, validation_data=eval_dataset,
              validation_steps=fn_args.eval_steps, callbacks=[tensorboard_callback],
              epochs=5)
    signatures = {
        'serving_default':
            _get_serve_tf_examples_fn(model, tf_transform_output).get_concrete_function(tf.TensorSpec(
                shape=[None],
                dtype=tf.string,
                name='examples')
            )
    }
    model.save(fn_args.serving_model_dir, save_format='tf', signatures=signatures)
To research the problem, I have perused:
Warm Start Example From TFX
https://github.com/tensorflow/tfx/blob/master/tfx/examples/chicago_taxi_pipeline/taxi_pipeline_warmstart.py
However, this guide uses the Estimator component instead of the Keras components. That component has a warm_start_from initialization parameter which I couldn't find for the Keras equivalent.
I suspect one of two things:
Either the warm-start functionality is only available for Estimator components and won't take effect for Keras components even if base_model is set,
or I am somehow telling the model to re-initialize its weights even after successfully loading the prior model. In that case, I would love a pointer as to where that's happening.
Any assistance would be great! Much thanks.
With Keras models you have to load the model first using the base model path, then you can continue training from there instead of building a new model.
Your Trainer component looks correct, but in run_fn do the following instead:
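def run_fn(fn_args: FnArgs):
    # Build train_dataset, eval_dataset and tensorboard_callback as before.
    # Load the previous model instead of calling build_keras_model().
    model = tf.keras.models.load_model(fn_args.base_model)
    model.fit(train_dataset, steps_per_epoch=fn_args.train_steps, validation_data=eval_dataset,
              validation_steps=fn_args.eval_steps, callbacks=[tensorboard_callback],
              epochs=5)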
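One thing to watch for: on the very first pipeline run there is no previous model to load. Below is a minimal sketch of the "load if available, otherwise build" decision. The helper name get_initial_model and the stand-in loader and builder functions are hypothetical; in the Trainer module they would correspond to tf.keras.models.load_model and build_keras_model. Whether fn_args.base_model is empty or None on a first run is an assumption worth checking against your TFX version.

```python
def get_initial_model(base_model_path, load_fn, build_fn):
    """Return (model, warm_started) for a training run.

    Hypothetical helper: load_fn and build_fn are passed in so the
    decision logic can be shown without importing TensorFlow.
    """
    if base_model_path:
        # A previous model was resolved: continue from its weights.
        return load_fn(base_model_path), True
    # No previous model (e.g. first run): start from fresh weights.
    return build_fn(), False


# Stand-ins for tf.keras.models.load_model and build_keras_model.
def fake_load(path):
    return {"source": "loaded", "path": path}


def fake_build():
    return {"source": "built"}


first_run = get_initial_model(None, fake_load, fake_build)
second_run = get_initial_model("path/to/previous/model", fake_load, fake_build)
```

Logging the returned warm_started flag at the start of training makes it easy to confirm which branch a given pipeline run actually took.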
I want to add a dense layer on top of the bare BERT Model transformer outputting raw hidden-states, and then fine tune the resulting model. Specifically, I am using this base model. This is what the model should do:
Encode the sentence (a vector with 768 elements for each token of the sentence)
Keep only the first vector (related to the first token)
Add a dense layer on top of this vector, to get the desired transformation
So far, I have successfully encoded the sentences:
from sklearn.neural_network import MLPRegressor
import torch
from transformers import AutoModel, AutoTokenizer
# List of strings
sentences = [...]
# List of numbers
labels = [...]
tokenizer = AutoTokenizer.from_pretrained("dbmdz/bert-base-italian-xxl-cased")
model = AutoModel.from_pretrained("dbmdz/bert-base-italian-xxl-cased")
# 2D array, one line per sentence containing the embedding of the first token
encoded_sentences = torch.stack([model(**tokenizer(s, return_tensors='pt'))[0][0][0]
for s in sentences]).detach().numpy()
regr = MLPRegressor()
regr.fit(encoded_sentences, labels)
In this way I can train a neural network by feeding it with the encoded sentences. However, this approach clearly does not fine tune the base BERT model. Can anybody help me? How can I build a model (possibly in pytorch or using the Huggingface library) that can be entirely fine tuned?
There are two ways to do it. Since you are looking to fine-tune the model for a downstream task similar to classification, you can directly use the BertForSequenceClassification class.
It fine-tunes a logistic regression layer on top of the 768-dimensional output.
Alternatively, you can define a custom module that creates a BERT model from the pre-trained weights and adds layers on top of it.
import torch
from torch import nn
from transformers import AutoTokenizer, BertModel
class CustomBERTModel(nn.Module):
    def __init__(self):
        super(CustomBERTModel, self).__init__()
        self.bert = BertModel.from_pretrained("dbmdz/bert-base-italian-xxl-cased")
        ### New layers:
        self.linear1 = nn.Linear(768, 256)
        self.linear2 = nn.Linear(256, 3) ## 3 is the number of classes in this example
    def forward(self, ids, mask):
        # Index 0 is the last hidden state; indexing works whether the model
        # returns a plain tuple or an output object.
        sequence_output = self.bert(
            ids,
            attention_mask=mask)[0]
        # sequence_output has the following shape: (batch_size, sequence_length, 768)
        linear1_output = self.linear1(sequence_output[:,0,:].view(-1,768)) ## extract the 1st token's embeddings
        linear2_output = self.linear2(linear1_output)
        return linear2_output
tokenizer = AutoTokenizer.from_pretrained("dbmdz/bert-base-italian-xxl-cased")
model = CustomBERTModel() # You can pass the parameters if required to have more flexible model
model.to(torch.device("cpu")) ## can be gpu
criterion = nn.CrossEntropyLoss() ## If required define your own criterion
optimizer = torch.optim.Adam(filter(lambda p: p.requires_grad, model.parameters()))
for epoch in range(num_epochs): ## num_epochs: number of passes over the data, chosen by you
    for batch in data_loader: ## If you have a DataLoader() object to get the data.
        data = batch[0]
        targets = batch[1] ## assuming that data loader returns a tuple of data and its targets
        optimizer.zero_grad()
        encoding = tokenizer.batch_encode_plus(data, return_tensors='pt', padding=True, truncation=True,max_length=50, add_special_tokens = True)
        input_ids = encoding['input_ids']
        attention_mask = encoding['attention_mask']
        outputs = model(input_ids, mask=attention_mask)
        ## nn.CrossEntropyLoss applies log-softmax internally, so pass the raw outputs
        loss = criterion(outputs, targets)
        loss.backward()
        optimizer.step()
If you want to tune the BERT model itself, you will need to update the parameters of the model. To do this you will most likely want to do your work with PyTorch. Here is some rough pseudocode to illustrate:
from torch.optim import SGD
model = ... # whatever model you are using
parameters = model.parameters() # or some more specific set of parameters
optimizer = SGD(parameters, lr=.01) # or whatever optimizer you want
optimizer.zero_grad() # boilerplate PyTorch call that clears old gradients
inputs = ... # whatever the appropriate input for your task is
labels = ... # whatever the appropriate label for your task is
loss = model(**inputs, labels=labels)[0] # usually the loss is the first item returned
loss.backward() # calculates gradient
optimizer.step() # runs optimization algorithm
I've left out all the relevant details because they are quite tedious and specific to your task. Huggingface has a nice article walking through this in more detail here, and you will definitely want to refer to the PyTorch documentation as you use any PyTorch functionality. I highly recommend the PyTorch blitz before trying to do anything serious with it.
For anyone using TensorFlow/Keras, the equivalent of Ashwin's answer would be:
from tensorflow import keras
from transformers import AutoTokenizer, TFAutoModel
class CustomBERTModel(keras.Model):
    def __init__(self):
        super(CustomBERTModel, self).__init__()
        self.bert = TFAutoModel.from_pretrained("dbmdz/bert-base-italian-xxl-cased")
        ### New layers:
        self.linear1 = keras.layers.Dense(256)
        self.linear2 = keras.layers.Dense(3) ## 3 is the number of classes in this example
    def call(self, inputs, training=False):
        # call expects only one positional argument, so you have to pass in a tuple and unpack. The next parameter is a special reserved training parameter.
        ids, mask = inputs
        sequence_output = self.bert(ids, mask, training=training).last_hidden_state
        # sequence_output has the following shape: (batch_size, sequence_length, 768)
        linear1_output = self.linear1(sequence_output[:,0,:]) ## extract the 1st token's embeddings
        linear2_output = self.linear2(linear1_output)
        return linear2_output
model = CustomBERTModel()
tokenizer = AutoTokenizer.from_pretrained("dbmdz/bert-base-italian-xxl-cased")
ipts = tokenizer("Some input sequence", return_tensors="tf")
test = model((ipts["input_ids"], ipts["attention_mask"]))
Then to train the model you can make a custom training loop using GradientTape.
You can verify that the additional layers are also trainable with model.trainable_weights. You can access the weights of individual layers by index; e.g. model.trainable_weights[-1].numpy() returns the last layer's bias vector. [Note that the Dense layers will only appear after the call method has been executed for the first time.]
In the examples here it mentions that one can subclass the class tf.keras.Model as follows:
class MyModel(tf.keras.Model):
    def __init__(self):
        super(MyModel, self).__init__()
        self.dense1 = tf.keras.layers.Dense(4, activation=tf.nn.relu)
        self.dense2 = tf.keras.layers.Dense(5, activation=tf.nn.softmax)
However, what happens if I want to have a variable number of layers and also variable type of layers? How do I store my layer objects in my class object?
From what I have understood the name that I give to the attributes (dense1, dense2) in the example above is significant because that will be used to refer to those layers and their variables when saving to a checkpoint, etc.? Is that correct?
My question is basically: How do I store my layers in my tf.keras.Model subclass if I don't know how many of them I have available? And then how do I save and restore the weights of those layers?
My first thought was to have lists of layer objects but then it is not obvious to me how those layer weights will be saved and restored since they will not correspond to distinct attribute names.
The short answer is: just do what you would normally do; TensorFlow takes care of the rest.
The answer is hidden in the docstring of the save_weights method for tf.keras.Model (emphasis added):
When saving in TensorFlow format, all objects referenced by the network are
saved in the same format as tf.train.Checkpoint, including any Layer
instances or Optimizer instances assigned to object attributes. For
networks constructed from inputs and outputs using tf.keras.Model(inputs,
outputs), Layer instances used by the network are tracked/saved
automatically. For user-defined classes which inherit from tf.keras.Model,
Layer instances must be assigned to object attributes, typically in the
constructor.
The easiest way to accomplish your goal is to assign the layers to a Python object. In the following example, I'm using a dictionary to preserve the original names.
class MyModel(tf.keras.Model):
    def __init__(self):
        super(MyModel, self).__init__()
        self.my_weight_dict = {}
        self.my_weight_dict["dense1"] = tf.keras.layers.Dense(6, activation=tf.nn.relu)
        self.my_weight_dict["dense2"] = tf.keras.layers.Dense(3, activation=tf.nn.softmax) # changed to fit the dataset
    def call(self,inputs):
        x = self.my_weight_dict["dense1"](inputs)
        return self.my_weight_dict["dense2"](x)
This allows you to programmatically specify attributes that will change the property of your model - e.g. useful for automated hyperparameters tuning.
Here's a fully reproducible example that uses the class defined above:
import tensorflow as tf
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import normalize
# load the data and split it into train and test
iris_dataset = load_iris()
X = iris_dataset.data
y = iris_dataset.target
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.3,stratify=y)
# normalize the features
X_train = normalize(X_train, axis=0,norm='max')
X_test = normalize(X_test, axis=0,norm='max')
# create, compile, and fit the model
model = MyModel()
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.05, momentum=0.9),
              loss="sparse_categorical_crossentropy", #tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])
model.fit(X_train, y_train, epochs=50, verbose = 2, batch_size=128,
          validation_data = (X_test, y_test))
# just call the save_weights
model.save_weights(filepath="path/to/your/weights/file")
# create a new model with the same structure
model_2 = MyModel()
model_2.load_weights("path/to/your/weights/file")
model_2.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.05, momentum=0.9),
                loss="sparse_categorical_crossentropy", #tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
                metrics=['accuracy'])
model_2.evaluate(X_test,y_test)
I have an already trained neural network consisting of the files NNbiases_b1.csv, NNbiases_out.csv, NNweights_h1.csv and NNweights_out.csv. The input and output layer sizes are known too.
Now I'm looking for a Python script that uses this neural network, i.e. one that produces outputs from input data and the trained network.
But whenever I search for a related script, I only find how-tos and explanations about training a network!
So my question: given an already trained network with the files above, how can I use it?
Thanks!
I think you need to reconstruct your model's architecture and then manually set the weights of each layer, with something like this:
all_weights = []
NNweights_h1 = [...] #load your csv of weights
NNbiases_b1 = [...] #load your csv of biases
all_weights.append(NNweights_h1)
all_weights.append(NNbiases_b1)
model.layers[i].set_weights(all_weights)
And do that for all your layers.
Update after clarification
To use your model (dummy example):
Reconstruct the architecture:
from keras.layers import Dense, Input
from keras.models import Model
def build_model(model_input):
    x = Dense(12, input_dim=8, activation='relu')(model_input)
    x = Dense(1, activation='sigmoid')(x)
    return Model(model_input, x, name='Your_model')
Instantiate it:
X_test = [...] #load your data
input_shape = [...] #your test data shape
model_input = Input(shape=input_shape)
model = build_model(model_input)
Manually set the weights using the code at the beginning of the answer.
Use this model to predict your data:
prediction = model.predict(X_test) #get the predictions of your model
I hope this helps!
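If you would rather not depend on Keras at all, the forward pass of a small fully connected network can also be written directly. The sketch below assumes several things the question does not state: one hidden layer, weight CSVs laid out with one row per input unit and one column per output unit, one bias value per unit, and sigmoid activations. The inline CSV strings are made-up stand-ins for the contents of NNweights_h1.csv, NNbiases_b1.csv, NNweights_out.csv and NNbiases_out.csv.

```python
import csv
import io
import math


def read_matrix(text):
    """Parse CSV text into a list of rows of floats."""
    return [[float(v) for v in row] for row in csv.reader(io.StringIO(text)) if row]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(inputs . weights + biases)."""
    return [
        activation(sum(x * weights[i][j] for i, x in enumerate(inputs)) + biases[j])
        for j in range(len(biases))
    ]


# Made-up file contents: 2 inputs -> 2 hidden units -> 1 output.
weights_h1 = read_matrix("0.5,-0.5\n0.25,0.75\n")
biases_b1 = read_matrix("0.0,0.1\n")[0]
weights_out = read_matrix("1.0\n-1.0\n")
biases_out = read_matrix("0.0\n")[0]


def predict(x):
    # Hidden layer followed by the output layer.
    hidden = dense(x, weights_h1, biases_b1, sigmoid)
    return dense(hidden, weights_out, biases_out, sigmoid)


prediction = predict([1.0, 2.0])
```

With real files, the text passed to read_matrix would come from reading each CSV file. If the predictions look wrong, the layout assumption (rows as inputs versus rows as outputs) and the activation functions are the first things to check against however the network was trained.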
I am using a custom-defined transformer for selecting important features, with the following code:
class fs(TransformerMixin, BaseEstimator):
    def __init__(self, n_estimators=10):
        self.ss=None
        self.n_estimators = n_estimators
        self.x_new = None
    def fit(self, X, y):
        m = ExtraTreesClassifier(n_estimators=self.n_estimators)
        m.fit(X,y)
        self.ss = SelectFromModel(m, prefit=True)
        return self
    def transform(self, X):
        self.x_new=self.ss.transform(X)
        return self.x_new
Here x_new are the new features selected by my custom transformer fs.
Then I am defining my Neural Network as a classifier in my pipeline as follows.
def create_model(dropout_rate=0.1):
    my_model=fs()
    x_new=my_model.x_new
    n_x_new=x_new.shape[1]
    np.random.seed(6000)
    model_new = Sequential()
    model_new.add(Dense(n_x_new,input_dim=n_x_new ,kernel_initializer='glorot_uniform', activation='sigmoid'))
...............................rest of the code................................
Using sklearn Pipeline with the following code.
clf=KerasClassifier(build_fn=create_model, epochs=10, batch_size=1000, verbose=0)
model = Pipeline([('fs', fs()),('clf', clf)])
grid = GridSearchCV(estimator=model,param_grid={"clf__dropout_rate": [0.1, 0.2]},scoring='roc_auc', n_jobs=1)
grid_result = grid.fit(train_cv_x, train_cv_y)
I am getting the following error. How will my Keras model know the shape of x_new selected for each hyperparameter setting?
'NoneType' object has no attribute 'shape'
I solved the problem with a very simple trick, without writing a custom estimator wrapper for my Keras model.
I first hard-coded the input_dim value to check whether the selected important features were actually being passed to my Keras model:
model_new.add(Dense(n_x_new,input_dim=200 ,kernel_initializer='glorot_uniform', activation='sigmoid'))
The error I then got was something like this:
expected dense_22_input to have shape (None, 200) but got array with shape (10328, 428)
The error shows that my Keras model does receive the important features selected by fs(), but since I had set input_dim=200, this did not match the number of features coming from fs().
The only remaining issue was to find out the number of features selected for each hyperparameter setting and pass that number to input_dim of the Keras model.
I declared a global variable and stored the self.x_new value from my fs() in it.
That global variable is accessible in the Keras model.
There might be more robust solutions, such as writing a custom wrapper for the Keras classifier.
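A sketch of the wrapper idea: instead of a global variable, the wrapper can wait until fit is called and read the number of columns from the data it actually receives from the previous pipeline step. The class name LazyInputDimClassifier and the build_fn(input_dim, ...) signature are hypothetical, and the stand-in build function returns a dict rather than a Keras model so that the example runs without Keras. A real wrapper would also need predict and the scikit-learn estimator interface (get_params/set_params) to work inside GridSearchCV.

```python
class LazyInputDimClassifier:
    """Hypothetical wrapper that builds its model on first fit."""

    def __init__(self, build_fn, **build_params):
        self.build_fn = build_fn
        self.build_params = build_params
        self.model_ = None

    def fit(self, X, y):
        # X is whatever the previous pipeline step produced, so its
        # width is the number of features that step selected.
        input_dim = len(X[0])
        self.model_ = self.build_fn(input_dim=input_dim, **self.build_params)
        # A real implementation would call self.model_.fit(X, y) here.
        return self


def fake_create_model(input_dim, dropout_rate=0.1):
    # Stand-in for a function that builds a Keras Sequential model
    # whose first Dense layer uses input_dim.
    return {"input_dim": input_dim, "dropout_rate": dropout_rate}


selected_features = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]  # 2 samples, 3 features
clf = LazyInputDimClassifier(fake_create_model, dropout_rate=0.2)
clf.fit(selected_features, [0, 1])
```

Because the model is built inside fit, each grid-search candidate gets an input size that matches the features selected for that particular fit, with no shared global state.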