I have a Sequential model built in Keras. After training it gives me good predictions, but when I save and then load the model I don't obtain the same predictions on the same dataset. Why?
Note that I checked the weights of the model and they are the same, as is the architecture, verified with model.summary() and model.get_weights(). This is very strange in my opinion and I have no idea how to deal with this problem.
I don't get any error, but the predictions are different.
I tried using model.save() and load_model().
I tried using model.save_weights(), then re-building the model and loading the weights.
I have the same problem with both options.
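This is roughly how I verified that the weights match (a minimal sketch using the model/modelRNN variables from the code below):

import numpy as np

# Compare the original and the re-loaded model tensor by tensor
for w_orig, w_loaded in zip(model.get_weights(), modelRNN.get_weights()):
    assert np.allclose(w_orig, w_loaded), "weights differ after loading"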
def Classifier(input_shape, word_to_vec_map, word_to_index, emb_dim, num_activation):
    sentence_indices = Input(shape=input_shape, dtype=np.int32)
    emb_dim = 300  # 300-dimensional Italian word embeddings
    embedding_layer = pretrained_embedding_layer(word_to_vec_map, word_to_index, emb_dim)
    embeddings = embedding_layer(sentence_indices)
    X = LSTM(256, return_sequences=True)(embeddings)
    X = Dropout(0.15)(X)
    X = LSTM(128)(X)
    X = Dropout(0.15)(X)
    X = Dense(num_activation, activation='softmax')(X)
    model = Model(sentence_indices, X)
    sequentialModel = Sequential(model.layers)
    return sequentialModel
model = Classifier((maxLen,), word_to_vec_map, word_to_index, maxLen, num_activation)
...
model.fit(Y_train_indices, Z_train_oh, epochs=30, batch_size=32, shuffle=True)
# attempt 1
model.save('classificationTest.h5', True, True)
modelRNN = load_model(r'C:\Users\Alessio\classificationTest.h5')
# attempt 2
model.save_weights("myWeight.h5")
model = Classifier((maxLen,), word_to_vec_map, word_to_index, maxLen, num_activation)
model.load_weights(r'C:\Users\Alessio\myWeight.h5')
# PREDICTION TEST
code_train, category_train, category_code_train, text_train = read_csv_for_email(r'C:\Users\Alessio\Desktop\6Febbraio\2test.csv')
categories, code_categories = get_categories(r'C:\Users\Alessio\Desktop\6Febbraio\2test.csv')
X_my_sentences = text_train
Y_my_labels = category_code_train
X_test_indices = sentences_to_indices(X_my_sentences, word_to_index, maxLen)
pred = model.predict(X_test_indices)
def codeToCategory(categories, code_categories, current_code):
    i = 0
    for code in code_categories:
        if code == current_code:
            return categories[i]
        i = i + 1
    return "no_one_find"

# result
for i in range(len(Y_my_labels)):
    num = np.argmax(pred[i])
# Pretrained embedding layer
def pretrained_embedding_layer(word_to_vec_map, word_to_index, emb_dim):
    """
    Creates a Keras Embedding() layer and loads in pre-trained word vectors.
    Arguments:
    word_to_vec_map -- dictionary mapping words to their vector representation
    word_to_index -- dictionary mapping words to their indices in the vocabulary
    Returns:
    embedding_layer -- pretrained Keras Embedding layer instance
    """
    vocab_len = len(word_to_index) + 1  # adding 1 to fit Keras embedding (requirement)
    ### START CODE HERE ###
    # Initialize the embedding matrix as a numpy array of zeros of shape (vocab_len, emb_dim)
    emb_matrix = np.zeros((vocab_len, emb_dim))
    # Set each row "index" of the embedding matrix to the word vector of the "index"th word of the vocabulary
    for word, index in word_to_index.items():
        emb_matrix[index, :] = word_to_vec_map[word]
    # Define the Keras embedding layer with the correct input/output sizes and make it non-trainable
    embedding_layer = Embedding(vocab_len, emb_dim, trainable=False)
    ### END CODE HERE ###
    # Build the embedding layer; this is required before setting its weights. Do not modify the "None".
    embedding_layer.build((None,))
    # Set the weights of the embedding layer to the embedding matrix. The layer is now pretrained.
    embedding_layer.set_weights([emb_matrix])
    return embedding_layer
Do you have any suggestions?
Thanks in advance.
Edit 1: if I use the saving and loading code on the same "page" (I'm using a Jupyter notebook) it works fine. If I change "page" it doesn't work. Could it be something related to the TensorFlow session?
Edit 2: my final goal is to load a model trained in Keras with Deeplearning4J in Java. So a solution for "transforming" the Keras model into something else readable by DL4J would help as well.
Edit 3: added the function pretrained_embedding_layer().
Edit 4: dictionaries from the word2vec model read with gensim:
from gensim.models import Word2Vec
model = Word2Vec.load('C:/Users/Alessio/Desktop/emoji_ita/embedding/glove_WIKI')

def getMyModels(model):
    word_to_index = {}
    index_to_word = {}
    word_to_vec_map = {}
    for idx, key in enumerate(model.wv.vocab):
        word_to_index[key] = idx
        index_to_word[idx] = key
        word_to_vec_map[key] = model.wv[key]
    return word_to_index, index_to_word, word_to_vec_map
Are you pre-processing your data in the same way when you load your model?
And if so, did you set the seed of your pre-processing functions?
If you build a dictionary with Keras, are the sentences coming in the same order?
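For example, you could persist the mappings once and reload them in the other notebook instead of rebuilding them, since iteration order over the gensim vocabulary may differ between interpreter sessions when hash randomization is on. A minimal sketch (the file name is illustrative):

import pickle

# Save the vocabulary mappings right after building them
with open('vocab_maps.pkl', 'wb') as f:
    pickle.dump((word_to_index, index_to_word, word_to_vec_map), f)

# In the other notebook, load the exact same mappings instead of rebuilding them
with open('vocab_maps.pkl', 'rb') as f:
    word_to_index, index_to_word, word_to_vec_map = pickle.load(f)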
I had the same problem before, so here is how you solve it. After making sure that the weights and summary are the same, try to print your random seed and check. If its value changes from one session to another, and you have already tried setting TensorFlow's seed, you need to disable the PYTHONHASHSEED environment variable. You can read more about it here:
https://docs.python.org/3/using/cmdline.html#envvar-PYTHONHASHSEED
To disable hash randomization, go to your system environment variables and add PYTHONHASHSEED as a new variable if it doesn't exist, then set its value to 0. Note that it has to be done this way because the variable must be set before the interpreter starts.
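For completeness, a minimal sketch of pinning the usual sources of randomness in one place (assigning PYTHONHASHSEED from inside a running script only affects child processes; for the current interpreter it must be set in the environment beforehand, as described above):

import os
os.environ['PYTHONHASHSEED'] = '0'  # only effective for subprocesses; set it system-wide otherwise

import random
import numpy as np
import tensorflow as tf

random.seed(42)
np.random.seed(42)
tf.set_random_seed(42)  # on TensorFlow 2.x use tf.random.set_seed(42)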
Related
I am trying to solve a problem for a deep learning class, and the block of code I have to modify looks like this:
def alpaca_model(image_shape=IMG_SIZE, data_augmentation=data_augmenter()):
    """ Define a tf.keras model for binary classification out of the MobileNetV2 model
    Arguments:
        image_shape -- Image width and height
        data_augmentation -- data augmentation function
    Returns:
        tf.keras.model
    """
    input_shape = image_shape + (3,)
    # START CODE HERE
    base_model = tf.keras.applications.MobileNetV2(input_shape=input_shape, include_top=False, weights="imagenet")
    # Freeze the base model by making it non trainable
    base_model.trainable = None
    # create the input layer (Same as the imageNetv2 input size)
    inputs = tf.keras.Input(shape=None)
    # apply data augmentation to the inputs
    x = None
    # data preprocessing using the same weights the model was trained on
    x = preprocess_input(None)
    # set training to False to avoid keeping track of statistics in the batch norm layer
    x = base_model(None, training=None)
    # Add the new Binary classification layers
    # use global avg pooling to summarize the info in each channel
    x = None()(x)
    # include dropout with probability of 0.2 to avoid overfitting
    x = None(None)(x)
    # create a prediction layer with one neuron (as a classifier only needs one)
    prediction_layer = None
    # END CODE HERE
    outputs = prediction_layer(x)
    model = tf.keras.Model(inputs, outputs)
    return model

IMG_SIZE = (160, 160)

def data_augmentation():
    data = tl.keras.Sequential()
    data.add(RandomFlip("horizontal")
    data.add(RandomRotation(0.2)
    return data
I tried three times starting from that template, following the directions, with a lot of trial and error. I don't know what I am missing. I have gotten to the point where it trains a model and I can get its summary, but the summary is not correct.
Please help, I am going crazy trying to figure this out. I know it is super simple, but it's the simple problems that trip me up.
You might have to use the code below to run your algorithm:
input_shape = image_shape + (3,)
### START CODE HERE
base_model = tf.keras.applications.MobileNetV2(input_shape=input_shape,
                                               include_top=False,  # <== Important!!!!
                                               weights='imagenet')  # From imageNet
# Freeze the base model by making it non trainable
base_model.trainable = False
# create the input layer (Same as the imageNetv2 input size)
inputs = tf.keras.Input(shape=input_shape)
# apply data augmentation to the inputs
x = data_augmentation(inputs)
# data preprocessing using the same weights the model was trained on
x = preprocess_input(x)
# set training to False to avoid keeping track of statistics in the batch norm layer
x = base_model(x, training=False)
# Add the new Binary classification layers
# use global avg pooling to summarize the info in each channel
x = tf.keras.layers.GlobalAveragePooling2D()(x)
# include dropout with probability of 0.2 to avoid overfitting
x = tf.keras.layers.Dropout(0.2)(x)
# create a prediction layer with one neuron (as a classifier only needs one)
prediction_layer = tf.keras.layers.Dense(1, activation='linear')(x)
### END CODE HERE
outputs = prediction_layer
model = tf.keras.Model(inputs, outputs)
I had the same issue, but my mistake was calling the dense layer with (x) before the end. Here is the code that worked for me:
def alpaca_model(image_shape=IMG_SIZE, data_augmentation=data_augmenter()):
    ''' Define a tf.keras model for binary classification out of the MobileNetV2 model
    Arguments:
        image_shape -- Image width and height
        data_augmentation -- data augmentation function
    Returns:
        tf.keras.model
    '''
    input_shape = image_shape + (3,)
    ### START CODE HERE
    base_model = tf.keras.applications.MobileNetV2(input_shape=input_shape,
                                                   include_top=False,  # <== Important!!!!
                                                   weights='imagenet')  # From imageNet
    # Freeze the base model by making it non trainable
    base_model.trainable = False
    # create the input layer (Same as the imageNetv2 input size)
    inputs = tf.keras.Input(shape=input_shape)
    # apply data augmentation to the inputs
    x = data_augmentation(inputs)
    # data preprocessing using the same weights the model was trained on
    x = preprocess_input(x)
    # set training to False to avoid keeping track of statistics in the batch norm layer
    x = base_model(x, training=False)
    # Add the new Binary classification layers
    # use global avg pooling to summarize the info in each channel
    x = tfl.GlobalAveragePooling2D()(x)
    # include dropout with probability of 0.2 to avoid overfitting
    x = tfl.Dropout(0.2)(x)
    # create a prediction layer with one neuron (as a classifier only needs one)
    prediction_layer = tfl.Dense(1, activation='linear')
    ### END CODE HERE
    outputs = prediction_layer(x)
    model = tf.keras.Model(inputs, outputs)
    return model
In your data_augmentation function, the brackets are not properly closed.
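For reference, a corrected sketch of that helper (the tf typo fixed and the parentheses closed; depending on your TensorFlow version, RandomFlip and RandomRotation live in tf.keras.layers.experimental.preprocessing or directly in tf.keras.layers):

import tensorflow as tf
from tensorflow.keras.layers.experimental.preprocessing import RandomFlip, RandomRotation

def data_augmentation():
    data = tf.keras.Sequential()
    data.add(RandomFlip("horizontal"))
    data.add(RandomRotation(0.2))
    return data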
I want to add a dense layer on top of the bare BERT model transformer that outputs raw hidden states, and then fine-tune the resulting model. Specifically, I am using this base model. This is what the model should do:
Encode the sentence (a vector with 768 elements for each token of the sentence)
Keep only the first vector (related to the first token)
Add a dense layer on top of this vector, to get the desired transformation
So far, I have successfully encoded the sentences:
from sklearn.neural_network import MLPRegressor
import torch
from transformers import AutoModel, AutoTokenizer
# List of strings
sentences = [...]
# List of numbers
labels = [...]
tokenizer = AutoTokenizer.from_pretrained("dbmdz/bert-base-italian-xxl-cased")
model = AutoModel.from_pretrained("dbmdz/bert-base-italian-xxl-cased")
# 2D array, one line per sentence containing the embedding of the first token
encoded_sentences = torch.stack([model(**tokenizer(s, return_tensors='pt'))[0][0][0]
for s in sentences]).detach().numpy()
regr = MLPRegressor()
regr.fit(encoded_sentences, labels)
In this way I can train a neural network by feeding it the encoded sentences. However, this approach clearly does not fine-tune the base BERT model. Can anybody help me? How can I build a model (possibly in PyTorch or using the Huggingface library) that can be entirely fine-tuned?
There are two ways to do it. Since you are looking to fine-tune the model for a downstream task similar to classification, you can directly use:
the BertForSequenceClassification class, which performs fine-tuning of a logistic regression layer on top of the 768-dimensional output.
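A minimal sketch of this first option (num_labels is illustrative; the classification head is randomly initialized and fine-tuned together with the encoder):

from transformers import BertForSequenceClassification

# Pre-trained encoder plus a freshly initialized classification head
model = BertForSequenceClassification.from_pretrained(
    "dbmdz/bert-base-italian-xxl-cased", num_labels=3)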
Alternatively, you can define a custom module that creates a BERT model based on the pre-trained weights and adds layers on top of it:
import torch
import torch.nn as nn
from transformers import BertModel

class CustomBERTModel(nn.Module):
    def __init__(self):
        super(CustomBERTModel, self).__init__()
        self.bert = BertModel.from_pretrained("dbmdz/bert-base-italian-xxl-cased")
        ### New layers:
        self.linear1 = nn.Linear(768, 256)
        self.linear2 = nn.Linear(256, 3)  ## 3 is the number of classes in this example

    def forward(self, ids, mask):
        sequence_output, pooled_output = self.bert(
            ids,
            attention_mask=mask)
        # sequence_output has the following shape: (batch_size, sequence_length, 768)
        linear1_output = self.linear1(sequence_output[:, 0, :].view(-1, 768))  ## extract the 1st token's embeddings
        linear2_output = self.linear2(linear1_output)
        return linear2_output
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("dbmdz/bert-base-italian-xxl-cased")
model = CustomBERTModel()  # You can pass parameters if required to have a more flexible model
model.to(torch.device("cpu"))  ## can be gpu
criterion = nn.CrossEntropyLoss()  ## If required define your own criterion
optimizer = torch.optim.Adam(filter(lambda p: p.requires_grad, model.parameters()))

for epoch in range(epochs):
    for batch in data_loader:  ## assuming a DataLoader() object that returns (data, targets) tuples
        data = batch[0]
        targets = batch[1]
        optimizer.zero_grad()
        encoding = tokenizer.batch_encode_plus(data, return_tensors='pt', padding=True,
                                               truncation=True, max_length=50, add_special_tokens=True)
        input_ids = encoding['input_ids']
        attention_mask = encoding['attention_mask']
        # nn.CrossEntropyLoss applies log-softmax internally, so the raw logits are passed in directly
        outputs = model(input_ids, attention_mask)
        loss = criterion(outputs, targets)
        loss.backward()
        optimizer.step()
If you want to tune the BERT model itself you will need to modify its parameters. To do this you will most likely want to do your work with PyTorch. Here is some rough pseudo-code to illustrate:
from torch.optim import SGD

model = ...  # whatever model you are using
parameters = model.parameters()  # or some more specific set of parameters
optimizer = SGD(parameters, lr=.01)  # or whatever optimizer you want
optimizer.zero_grad()  # boiler-platy pytorch function

input = ...  # whatever the appropriate input for your task is
label = ...  # whatever the appropriate label for your task is
loss = model(**input, label)  # usually loss is the first item returned
loss.backward()  # calculates gradients
optimizer.step()  # runs the optimization algorithm
I've left out all the relevant details because they are quite tedious and specific to your particular task. Huggingface has a nice article walking through this in more detail here, and you will definitely want to refer to the PyTorch documentation as you use any PyTorch features. I highly recommend the PyTorch blitz before trying to do anything serious with it.
For anyone using TensorFlow/Keras, the equivalent of Ashwin's answer would be:
from tensorflow import keras
from transformers import AutoTokenizer, TFAutoModel
class CustomBERTModel(keras.Model):
    def __init__(self):
        super(CustomBERTModel, self).__init__()
        self.bert = TFAutoModel.from_pretrained("dbmdz/bert-base-italian-xxl-cased")
        ### New layers:
        self.linear1 = keras.layers.Dense(256)
        self.linear2 = keras.layers.Dense(3)  ## 3 is the number of classes in this example

    def call(self, inputs, training=False):
        # call expects only one positional argument, so pass a tuple and unpack it here;
        # training is a special reserved keyword argument.
        ids, mask = inputs
        sequence_output = self.bert(ids, mask, training=training).last_hidden_state
        # sequence_output has the following shape: (batch_size, sequence_length, 768)
        linear1_output = self.linear1(sequence_output[:, 0, :])  ## extract the 1st token's embeddings
        linear2_output = self.linear2(linear1_output)
        return linear2_output
model = CustomBERTModel()
tokenizer = AutoTokenizer.from_pretrained("dbmdz/bert-base-italian-xxl-cased")
ipts = tokenizer("Some input sequence", return_tensors="tf")
test = model((ipts["input_ids"], ipts["attention_mask"]))
Then to train the model you can make a custom training loop using GradientTape.
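A bare-bones sketch of such a loop (assuming a train_dataset that yields ((input_ids, attention_mask), labels) batches; the names are illustrative):

import tensorflow as tf

loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
optimizer = tf.keras.optimizers.Adam(learning_rate=3e-5)

for (ids, mask), labels in train_dataset:
    with tf.GradientTape() as tape:
        logits = model((ids, mask), training=True)
        loss = loss_fn(labels, logits)
    # Backpropagate through both the new layers and the BERT encoder
    grads = tape.gradient(loss, model.trainable_weights)
    optimizer.apply_gradients(zip(grads, model.trainable_weights))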
You can verify that the additional layers are also trainable with model.trainable_weights. You can also access the weights of individual layers; e.g. model.trainable_weights[-1].numpy() would get the last layer's bias vector. (Note that the Dense layers will only appear in the list after the call method has been executed for the first time.)
I converted a pre-trained TF model to PyTorch using the following function:
def convert_tf_checkpoint_to_pytorch(*, tf_checkpoint_path, albert_config_file, pytorch_dump_path):
    # Initialise PyTorch model
    config = AlbertConfig.from_json_file(albert_config_file)
    print("Building PyTorch model from configuration: {}".format(str(config)))
    model = AlbertForPreTraining(config)
    # Load weights from tf checkpoint
    load_tf_weights_in_albert(model, config, tf_checkpoint_path)
    # Save pytorch-model
    print("Save PyTorch model to {}".format(pytorch_dump_path))
    torch.save(model.state_dict(), pytorch_dump_path)
I am loading the converted model and encoding sentences in the following way:
def vectorize_sentence(text):
    albert_tokenizer = AlbertTokenizer.from_pretrained("albert-base-v2")
    config = AlbertConfig.from_pretrained(config_path, output_hidden_states=True)
    model = TFAlbertModel.from_pretrained(pytorch_dir, config=config, from_pt=True)
    e = albert_tokenizer.encode(text, max_length=512)
    model_input = tf.constant(e)[None, :]  # Batch size 1
    output = model(model_input)
    v = [0] * 768
    # generate sentence vectors by averaging the word vectors
    for i in range(1, len(model_input[0]) - 1):
        v = v + output[0][0][i].numpy()
    vector = v / len(model_input[0])
    return vector
However while loading the model, a warning comes up:
Some weights or buffers of the PyTorch model TFAlbertModel were not
initialized from the TF 2.0 model and are newly initialized:
['predictions.LayerNorm.bias', 'predictions.dense.weight',
'predictions.LayerNorm.weight', 'sop_classifier.classifier.bias',
'predictions.dense.bias', 'sop_classifier.classifier.weight',
'predictions.decoder.bias', 'predictions.bias',
'predictions.decoder.weight'] You should probably TRAIN this model on
a down-stream task to be able to use it for predictions and inference.
Can anyone tell me if I am doing anything wrong? What does the warning mean? I saw issue #5588 on the github repo of Transformers. Don't know if my issue is the same as this.
I think you could try using
model = AlbertModel.from_pretrained
instead of
model = TFAlbertModel.from_pretrained
in the vectorize_sentence definition.
AlbertModel is the name of the class for the PyTorch-format model, and TFAlbertModel is the name of the class for the TensorFlow-format model.
I'm not sure exactly what load_tf_weights_in_albert() does, but I think that once you have run it your model is in PyTorch format.
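In other words, something along these lines (a sketch, reusing config_path and pytorch_dir from your code):

from transformers import AlbertConfig, AlbertModel

config = AlbertConfig.from_pretrained(config_path, output_hidden_states=True)
# Load the converted checkpoint with the PyTorch class; no from_pt flag needed
model = AlbertModel.from_pretrained(pytorch_dir, config=config)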
I have an already trained neural network consisting of the files NNbiases_b1.csv, NNbiases_out.csv, NNweights_h1.csv and NNweights_out.csv. The input and output layer sizes are known too.
Now I'm looking for a Python script that uses this neural network, i.e. outputs data depending on the input data and the trained network.
But whenever I google for a related script, I only find how-tos and explanations about training a network!
So my question: when I have an already trained network with the data/files above, how can I use this neural network?
Thanks!
I think you need to reconstruct your model's architecture and then manually set the weights of each layer, with something like this:

all_weights = []
NNweights_h1 = [...]  # load your csv of weights
NNbiases_b1 = [...]   # load your csv of biases

all_weights.append(NNweights_h1)
all_weights.append(NNbiases_b1)

model.layers[i].set_weights(all_weights)
And do that for all your layers.
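For instance, a loop over the layers could look like this (a sketch assuming each CSV holds one layer's weight matrix and bias vector with shapes matching the reconstructed architecture; transpose the matrix if your CSV stores it the other way around):

import numpy as np

# (weights_file, biases_file) per trainable layer, in model order
layer_files = [("NNweights_h1.csv", "NNbiases_b1.csv"),
               ("NNweights_out.csv", "NNbiases_out.csv")]

for layer, (w_file, b_file) in zip(model.layers, layer_files):
    weights = np.loadtxt(w_file, delimiter=",")
    biases = np.loadtxt(b_file, delimiter=",")
    layer.set_weights([weights, biases])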
Update after clarifications
In order to use your model (dummy example):
Reconstruct the architecture:

def model(model_input):
    x = Dense(12, input_dim=8, activation='relu')(model_input)
    x = Dense(1, activation='sigmoid')(x)
    model = Model(model_input, x, name='Your_model')
    return model
Instantiate it:

X_test = [...]  # load your data
input_shape = [...]  # your test data shape
model_input = Input(shape=input_shape)
model = model(model_input)
Manually set the weights using the code at the beginning of the answer.
Use this model to predict your data:

prediction = model.predict(X_test)  # get the predictions of your model

I hope this helps!
I'm training based on sample code I found on the Internet. Test accuracy is at 92% and the checkpoints are saved in a directory. In parallel (the training has been running for 3 days now) I want to create my prediction code so I can learn more instead of just waiting.
This is my third day of deep learning, so I probably don't know what I'm doing. Here's how I'm trying to predict:
Instantiate the model using the same code as in training
Load the last checkpoint
Try to predict
The code works, but the results are nowhere near 90%.
Here's how I create the model:
INPUT_LAYERS = 2
OUTPUT_LAYERS = 2
AMOUNT_OF_DROPOUT = 0.3
HIDDEN_SIZE = 700
INITIALIZATION = "he_normal"  # Gaussian initialization scaled by fan_in (He et al., 2014)
CHARS = list("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ .")

def generate_model(output_len, chars=None):
    """Generate the model"""
    print('Build model...')
    chars = chars or CHARS
    model = Sequential()
    # "Encode" the input sequence using an RNN, producing an output of HIDDEN_SIZE.
    # Note: in a situation where your input sequences have a variable length,
    # use input_shape=(None, nb_feature).
    for layer_number in range(INPUT_LAYERS):
        model.add(recurrent.LSTM(HIDDEN_SIZE, input_shape=(None, len(chars)), init=INITIALIZATION,
                                 return_sequences=layer_number + 1 < INPUT_LAYERS))
        model.add(Dropout(AMOUNT_OF_DROPOUT))
    # For the decoder's input, we repeat the encoded input for each time step
    model.add(RepeatVector(output_len))
    # The decoder RNN could be multiple layers stacked or a single layer
    for _ in range(OUTPUT_LAYERS):
        model.add(recurrent.LSTM(HIDDEN_SIZE, return_sequences=True, init=INITIALIZATION))
        model.add(Dropout(AMOUNT_OF_DROPOUT))
    # For each step of the output sequence, decide which character should be chosen
    model.add(TimeDistributed(Dense(len(chars), init=INITIALIZATION)))
    model.add(Activation('softmax'))
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model
In a separate file predict.py I import this method to create my model and try to predict:
...import code
model = generate_model(len(question), dataset['chars'])
model.load_weights('models/weights.204-0.20.hdf5')

def decode(pred):
    return character_table.decode(pred, calc_argmax=False)

x = np.zeros((1, len(question), len(dataset['chars'])))
for t, char in enumerate(question):
    x[0, t, character_table.char_indices[char]] = 1.

preds = model.predict_classes([x], verbose=0)[0]
print("======================================")
print(decode(preds))
I don't know what the problem is. I have about 90 checkpoints in my directory and I'm loading the last one based on accuracy. All of them were saved by a ModelCheckpoint:

checkpoint = ModelCheckpoint(MODEL_CHECKPOINT_DIRECTORYNAME + '/' + MODEL_CHECKPOINT_FILENAME,
                             save_best_only=True)
I'm stuck. What am I doing wrong?
In the repo you provided, the training and validation sentences are inverted before being fed into the model (as commonly done in seq2seq learning).
dataset = DataSet(DATASET_FILENAME)
As you can see, the default value for inverted is True, and the questions are inverted.
class DataSet(object):
    def __init__(self, dataset_filename, test_set_fraction=0.1, inverted=True):
        self.inverted = inverted
        ...
        question = question[::-1] if self.inverted else question
        questions.append(question)

You can try to invert the sentences during prediction. Specifically:

x = np.zeros((1, len(question), len(dataset['chars'])))
for t, char in enumerate(question):
    x[0, len(question) - t - 1, character_table.char_indices[char]] = 1.
When you generate the model in your predict.py file:

model = generate_model(len(question), dataset['chars'])

is your first parameter the same as in your training file, or is the question length dynamic? If so, you are generating a different model, and your saved checkpoint doesn't fit it.
It can be that the dimensionality of the arrays/DataFrames you pass does not match what the functions you call expect. When the called method expects a single dimension, try ravel() on what you expect to be one-dimensional.
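For example, a hypothetical shape mismatch and its fix:

import numpy as np

y = np.array([[1], [0], [1]])  # shape (3, 1): a column vector, e.g. from df[['label']]
y_flat = y.ravel()             # shape (3,): the 1-D array many estimators expect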