I am following this tutorial on how to train a Siamese BERT network:
https://keras.io/examples/nlp/semantic_similarity_with_bert/
Everything works, but I am not sure what the best way is to save the model after training it and then load it again.
Any suggestions?
I was trying:
model.save('models/bert_siamese_v1')
which creates a folder containing saved_model.pb, keras_metadata.pb and two subfolders (variables and assets).
Then I try to load it with:
model.load_weights('models/bert_siamese_v1/')
and it gives me this error:
2022-03-08 14:11:52.567762: W tensorflow/core/util/tensor_slice_reader.cc:95] Could not open models/bert_siamese_v1/: Failed precondition: models/bert_siamese_v1; Is a directory: perhaps your file is in a different file format and you need to use a different restore operator?
What is the best way to proceed?
Try using tf.saved_model.save to save your model:
tf.saved_model.save(model, 'models/bert_siamese_v1')
model = tf.saved_model.load('models/bert_siamese_v1')
The warning you get during saving can apparently be ignored. After loading the model, you can run inference through its serving signature, f(test_data):
f = model.signatures["serving_default"]
x1 = tf.random.uniform((1, 128), maxval=100, dtype=tf.int32)
x2 = tf.random.uniform((1, 128), maxval=100, dtype=tf.int32)
x3 = tf.random.uniform((1, 128), maxval=100, dtype=tf.int32)
print(f)
print(f(attention_masks = x1, input_ids = x2, token_type_ids = x3))
ConcreteFunction signature_wrapper(*, token_type_ids, attention_masks, input_ids)
Args:
attention_masks: int32 Tensor, shape=(None, 128)
input_ids: int32 Tensor, shape=(None, 128)
token_type_ids: int32 Tensor, shape=(None, 128)
Returns:
{'dense': <1>}
<1>: float32 Tensor, shape=(None, 3)
{'dense': <tf.Tensor: shape=(1, 3), dtype=float32, numpy=array([[0.40711606, 0.13456087, 0.45832306]], dtype=float32)>}
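To make the serving-signature mechanics above concrete, here is a minimal sketch that uses a plain tf.Module instead of the Siamese BERT model. The class TinyScorer, its input name input_ids and the output key score are hypothetical. A tf.function with a fixed input_signature is exported as serving_default; after loading, the signature is called with keyword arguments and returns a dict of named outputs, just like the {'dense': ...} result shown above.

```python
import tempfile

import tensorflow as tf


class TinyScorer(tf.Module):
    """Hypothetical stand-in for a trained model: sums the token ids of each row."""

    @tf.function(input_signature=[
        tf.TensorSpec(shape=(None, 128), dtype=tf.int32, name="input_ids")
    ])
    def serve(self, input_ids):
        # Returning a dict gives the signature a named output ("score").
        return {"score": tf.reduce_sum(tf.cast(input_ids, tf.float32), axis=1)}


export_dir = tempfile.mkdtemp()
module = TinyScorer()
tf.saved_model.save(module, export_dir, signatures={"serving_default": module.serve})

loaded = tf.saved_model.load(export_dir)
f = loaded.signatures["serving_default"]
# Signature functions must be called with keyword arguments.
result = f(input_ids=tf.ones((2, 128), dtype=tf.int32))
print(result["score"])  # one score per input row
```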
It seems you have two options:
Save only the weights manually:
model.save_weights('./checkpoints/my_checkpoint')
model = create_model()
model.load_weights('./checkpoints/my_checkpoint')
Save the entire model:
Call model.save to save a model's architecture, weights, and training configuration in a single file/folder. This allows you to export a model so it can be used without access to the original Python code (custom layers or subclassed models need extra care). Since the optimizer state is recovered, you can resume training from exactly where you left off.
Save the model:
# Create and train a new model instance.
model = create_model()
model.fit(train_images, train_labels, epochs=5)
# Save the entire model as a SavedModel.
!mkdir -p saved_model
model.save('saved_model/my_model')
Load the model:
new_model = tf.keras.models.load_model('saved_model/my_model')
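The error comes from mixing the two approaches: you save the entire model with model.save but then try to restore it with load_weights. Either pair save_weights with load_weights, or pair model.save with tf.keras.models.load_model.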
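A minimal sketch of the two consistent save/load pairings, assuming TensorFlow 2.x and a small hypothetical create_model() standing in for the Siamese BERT model. The file names are illustrative; the .keras and .weights.h5 suffixes are what recent Keras versions expect, and they should also work with older tf.keras releases.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf


def create_model():
    # Hypothetical small model standing in for the Siamese BERT model.
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(4,)),
        tf.keras.layers.Dense(8, activation="relu"),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")
    return model


workdir = tempfile.mkdtemp()
x = np.random.rand(16, 4).astype("float32")
y = np.random.rand(16, 1).astype("float32")

model = create_model()
model.fit(x, y, epochs=1, verbose=0)
expected = model.predict(x, verbose=0)

# Option 1: weights only. Rebuild the same architecture in code, then load.
weights_path = os.path.join(workdir, "my_checkpoint.weights.h5")
model.save_weights(weights_path)
restored = create_model()
restored.load_weights(weights_path)

# Option 2: whole model (architecture, weights and optimizer state).
model_path = os.path.join(workdir, "my_model.keras")
model.save(model_path)
reloaded = tf.keras.models.load_model(model_path)

print(np.allclose(restored.predict(x, verbose=0), expected))
print(np.allclose(reloaded.predict(x, verbose=0), expected))
```

The point of the design is symmetry: whatever wrote the file is what should read it back.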
I am a novice in TensorFlow.
I am trying to use BERT embeddings in an LSTM model.
This is my model function:
def bert_tweets_model():
    Bertmodel = TFAutoModel.from_pretrained(model_name,output_hidden_states=True)
    input_word_ids = tf.keras.Input(shape=(max_length,), dtype=tf.int32, name="input_ids")
    input_masks_in = tf.keras.Input(shape=(max_length,), name='masked_token', dtype='int32')
    # torch.no_grad() is a PyTorch context manager; it has no effect on TensorFlow ops
    with torch.no_grad():
        last_hidden_states = Bertmodel(input_word_ids, attention_mask=input_masks_in)[0]
    x = tf.keras.layers.LSTM(100, dropout=0.1, activation='relu',recurrent_dropout=0.3,return_sequences = True)(last_hidden_states)
    x = tf.keras.layers.LSTM(50, dropout=0.1,activation='relu', recurrent_dropout=0.3,return_sequences = True)(x)
    x=tf.keras.layers.Flatten()(x)
    output = tf.keras.layers.Dense(units = 2, activation='sigmoid')(x)
    model = tf.keras.Model(inputs=[input_word_ids, input_masks_in], outputs = output)
    return model
with strategy.scope():
    model = bert_tweets_model()
    adam_optimizer = tf.keras.optimizers.Adam(learning_rate=1e-5)
    model.compile(loss='binary_crossentropy',optimizer=adam_optimizer,metrics=['accuracy'])
    model.summary()
validation_data=[dev_encoded, y_val]
train2=[input_id, attention_mask]
history = model.fit(
    x=train2, y=y_train, batch_size=batch_size,
    epochs=3,
    validation_data=validation_data,
    verbose=2)
I received this error from the fit function when I passed in the data:
"ValueError: Layer "model_1" expects 2 input(s), but it received 1 input tensors. Inputs received: [<tf.Tensor 'IteratorGetNext:0' shape=(None, 512) dtype=int32>]"
I also received these warning messages, and I do not know what they mean:
WARNING:tensorflow:Layer lstm_2 will not use cuDNN kernels since it doesn't meet the criteria. It will use a generic GPU kernel as fallback when running on GPU.
WARNING:tensorflow:Layer lstm_3 will not use cuDNN kernels since it doesn't meet the criteria. It will use a generic GPU kernel as fallback when running on GPU.
Can someone help me? Thanks in advance.
Reproducing your error:
_input1 = tf.random.uniform((1,100), 0 , 10)
_input2 = tf.random.uniform((1,100), 0 , 10)
model(_input1, _input2)
Running this code gives the same error:
Layer "model" expects 2 input(s), but it received 1 input tensors. Inputs received: [<tf.Tensor: shape=(1, 100), ...
The problem is that the inputs have to be wrapped in a tuple (or list) and passed to the model as a single argument, like this:
model((_input1, _input2))
<tf.Tensor: shape=(1, 2), dtype=float32, numpy=array([[0.5324366, 0.3743334]], dtype=float32)>
Likewise, if you are using tf.data.Dataset, wrap the inputs in a tuple when you create the dataset:
tf.data.Dataset.from_tensor_slices((words_id, words_mask))
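For training, the dataset also has to carry the labels, so each element becomes ((ids, mask), label). Below is a minimal sketch with a hypothetical two-input model (a Concatenate layer instead of BERT, random data, sequence length 16) that uses the same input names as the question. The same nesting applies to validation data, for example validation_data=((dev_ids, dev_mask), y_val).

```python
import numpy as np
import tensorflow as tf

max_length = 16  # assumed sequence length; the question uses 512

# Hypothetical two-input model with the same input names as in the question.
ids_in = tf.keras.Input(shape=(max_length,), name="input_ids")
mask_in = tf.keras.Input(shape=(max_length,), name="masked_token")
merged = tf.keras.layers.Concatenate()([ids_in, mask_in])
output = tf.keras.layers.Dense(2, activation="sigmoid")(merged)
model = tf.keras.Model(inputs=[ids_in, mask_in], outputs=output)
model.compile(loss="binary_crossentropy", optimizer="adam")

# Random stand-in data.
ids = np.random.randint(0, 100, size=(32, max_length)).astype("float32")
mask = np.ones((32, max_length), dtype="float32")
labels = np.random.randint(0, 2, size=(32, 2)).astype("float32")

# Inputs are nested in their own tuple, then paired with the labels.
dataset = tf.data.Dataset.from_tensor_slices(((ids, mask), labels)).batch(8)
history = model.fit(dataset, epochs=1, verbose=0)

# Calling the model directly also takes the inputs as one tuple.
print(model((ids[:1], mask[:1])).shape)  # (1, 2)
```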
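As for the second problem you asked about:
The warning appears because an LSTM layer only uses the fast cuDNN kernel when it meets certain criteria, including activation='tanh', recurrent_activation='sigmoid', recurrent_dropout=0, unroll=False and use_bias=True. Your layers use activation='relu' and recurrent_dropout=0.3, so TensorFlow falls back to a generic GPU kernel. The LSTM still runs on the GPU, just more slowly than with cuDNN; keeping the default values for those arguments makes the warning go away.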
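A minimal sketch of the two LSTM layers from the question configured to stay eligible for the cuDNN kernel, assuming the BERT hidden states have shape (max_length, 768); the sequence length of 32 is only chosen to keep the example small. Input dropout (the dropout argument) is not on the list of cuDNN restrictions, so it is kept, while activation and recurrent_dropout are left at their defaults.

```python
import tensorflow as tf

max_length, hidden = 32, 768  # assumed shape of the BERT hidden states

inputs = tf.keras.Input(shape=(max_length, hidden))
# Defaults: activation="tanh", recurrent_activation="sigmoid", recurrent_dropout=0.
x = tf.keras.layers.LSTM(100, dropout=0.1, return_sequences=True)(inputs)
x = tf.keras.layers.LSTM(50, dropout=0.1, return_sequences=True)(x)
x = tf.keras.layers.Flatten()(x)
outputs = tf.keras.layers.Dense(2, activation="sigmoid")(x)
model = tf.keras.Model(inputs, outputs)

print(model.output_shape)  # (None, 2)
```

Whether relu and recurrent dropout actually help this model is a separate question; the trade-off is slower training if you keep them.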
I have created a model that takes an input of shape (None, 512). Below is the summary of my model
Shape of the training features:
train_ids.shape
(10, 512)
Shape of the training response variable:
indus_cat_train.shape
(10, 49)
My model runs perfectly if I use
history = model.fit(
train_ids, indus_cat_train, epochs=2, validation_data=(
valid_ids, indus_cat_valid))
However, my actual dataset is very large, and feeding the complete dataset at once consumes so much RAM that the process gets killed.
I want to feed the data in batches (or one example at a time), so I tried the tf.data.Dataset.from_tensor_slices function:
# training data
tf_train_data = tf.data.Dataset.from_tensor_slices((train_ids, indus_cat_train))
# validation data
tf_valid_data = tf.data.Dataset.from_tensor_slices((valid_ids, indus_cat_valid))
The above code runs fine, and on inspection it yields the expected shapes:
for elem in tf_train_data:
    print(elem[0].shape) # for features
    print(elem[1].shape) # for response
Output:
(512,) # for features
(49,) # for response variable
# terminating all other output to save space
However, when I call model.fit on tf_train_data, I get the following:
bert_history = model.fit(
tf_train_data, epochs=2, validation_data=tf_valid_data)
WARNING:tensorflow:Model was constructed with shape (None, 512) for input Tensor("input_ids_1:0", shape=(None, 512), dtype=int32), but it was called on an input with incompatible shape (512, 1).
Here is the model code for further context, as requested by Prateek:
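import os
import tensorflow as tf
from tensorflow import keras
# assumes the bert-for-tf2 package provides the BERT helpers below
from bert import BertModelLayer
from bert.loader import StockBertConfig, map_stock_config_to_params, load_stock_weights
from bert.tokenization.bert_tokenization import FullTokenizer
# training data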
tf_train_data = tf.data.Dataset.from_tensor_slices((train_ids, indus_cat_train))
# validation data
tf_valid_data = tf.data.Dataset.from_tensor_slices((valid_ids, indus_cat_valid))
# model downloaded from bert
bert_model_name = "uncased_L-12_H-768_A-12"
bert_ckpt_dir = "bert_model"
bert_ckpt_file = os.path.join(bert_ckpt_dir, "bert_model.ckpt")
bert_config_file = os.path.join(bert_ckpt_dir, "bert_config.json")
# creating tokenizer
tokenizer = FullTokenizer(vocab_file=os.path.join(bert_ckpt_dir, "vocab.txt"))
# create function for model
def create_model(max_seq_len, bert_ckpt_file, n_classes):
    with tf.io.gfile.GFile(bert_config_file, "r") as reader:
        # get bert configurations
        bert_configurations = StockBertConfig.from_json_string(reader.read())
        bert_params = map_stock_config_to_params(bert_configurations)
        bert_params.adapter_size = None
        bert = BertModelLayer.from_params(bert_params, name="bert")
    input_ids = keras.layers.Input(shape=(max_seq_len,), dtype="int32",
                                   name="input_ids")
    bert_output = bert(input_ids)
    print("bert shape", bert_output.shape)
    cls_out = keras.layers.Lambda(lambda seq: seq[:, 0, :])(bert_output)
    cls_out = keras.layers.Dropout(0.5)(cls_out)
    logits = keras.layers.Dense(units=765, activation="tanh")(cls_out)
    logits = keras.layers.Dropout(0.5)(logits)
    logits = keras.layers.Dense(
        units=n_classes, activation="softmax")(logits)
    model = keras.Model(inputs=input_ids, outputs=logits)
    model.build(input_shape=(None, max_seq_len))
    load_stock_weights(bert, bert_ckpt_file)
    return model
n_cats = 49 #number of output categories
model = create_model(max_seq_len=512, bert_ckpt_file=bert_ckpt_file,
n_classes=n_cats)
model.summary()
optimizer = tf.keras.optimizers.Adam( learning_rate=learning_rate, epsilon=1e-08)
loss = tf.keras.losses.CategoricalCrossentropy()
metric = tf.keras.metrics.CategoricalCrossentropy(name='categorical_crossentropy')
model.compile(optimizer=optimizer, loss=loss, metrics=[metric])
bert_history = model.fit( tf_train_data, epochs=2, validation_data=tf_valid_data)
I solved it using dataset.batch. The tf.data.Dataset was never batched, so each element had shape (512,) for the features and (49,) for the labels instead of (batch_size, 512) and (batch_size, 49). Keras then treated the 512 token ids as a batch of 512 one-element examples, which is why the warning reports an input of shape (512, 1).
batch_size = 2
tf_train_data = tf.data.Dataset.from_tensor_slices((train_ids,
indus_cat_train)).batch(batch_size)
tf_valid_data = tf.data.Dataset.from_tensor_slices((valid_ids,
indus_cat_valid)).batch(batch_size)
bert_history = model.fit(
tf_train_data, epochs=2, validation_data=tf_valid_data)
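The effect of .batch() can be checked without training by inspecting element_spec. This minimal sketch uses zero-filled hypothetical arrays with the shapes from the question; only the batched dataset has the leading batch axis (None) that matches the model's (None, 512) input.

```python
import numpy as np
import tensorflow as tf

# Hypothetical stand-ins with the shapes from the question.
train_ids = np.zeros((10, 512), dtype=np.int32)
indus_cat_train = np.zeros((10, 49), dtype=np.float32)

unbatched = tf.data.Dataset.from_tensor_slices((train_ids, indus_cat_train))
batched = unbatched.batch(2)

print(unbatched.element_spec[0].shape)  # (512,): one example, no batch axis
print(batched.element_spec[0].shape)    # (None, 512): matches the model input
print(batched.element_spec[1].shape)    # (None, 49)
```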
I'm following the TensorFlow Keras tutorial for text generation. The training part works perfectly, but when I try to predict the next token, I get an error.
Here's all the important code:
Making the vocabulary and dataset.
vocab = sorted(set(text))
char2index = { c:i for i, c in enumerate(vocab) }
index2char = np.array(vocab)
chars_to_int = np.array([char2index[c] for c in text])
char_dataset = tf.data.Dataset.from_tensor_slices(chars_to_int)
sequences = char_dataset.batch(seq_length + 1, drop_remainder=True)
def split_input_and_target(sequence):
    input_ = sequence[:-1]
    target_ = sequence[1:]
    return input_, target_
dataset = sequences.map(split_input_and_target)
dataset = dataset.shuffle(BUFFER_SIZE).batch(BATCH_SIZE, drop_remainder=True)
Building the model (the important part here is that BATCH_SIZE = 64):
model = tf.keras.Sequential()
model.add(tf.keras.layers.Embedding(len(vocab), EMBEDDING_DIM,
batch_input_shape=[BATCH_SIZE, None]))
# here are a few more layers
model.compile(loss="sparse_categorical_crossentropy", optimizer="adam")
model.fit(dataset, epochs=EPOCHS)
Actually trying to generate text (this one was copied almost directly from the tutorial after I started getting desperate):
num_tokens = 100
seed = "some text"
input_eval = [char2index[c] for c in seed]
input_eval = tf.expand_dims(input_eval, 0)
text_generated = []
model.reset_states()
for i in range(num_tokens):
    predictions = model(input_eval)
    predictions = tf.squeeze(predictions, 0)
    # more stuff
Then, I first get a warning:
WARNING:tensorflow:Model was constructed with shape (64, None) for input Tensor("embedding_14_input:0", shape=(64, None), dtype=float32), but it was called on an input with incompatible shape (1, 9).
Then it gives me an error:
---->3 predictions = model(input_eval)
...
ValueError: Tensor's shape (9, 64, 256) is not compatible with supplied shape [9, 1, 256]
The second number, 64, is my batch size. If I change BATCH_SIZE to 1, everything works and all is fine, but this is obviously not the solution I am hoping for.
(I somehow managed to miss a step in the tutorial despite reading it several times over the past few hours.)
Here's the relevant passage:
To keep this prediction step simple, use a batch size of 1.
Because of the way the RNN state is passed from timestep to timestep, the model only accepts a fixed batch size once built.
To run the model with a different batch_size, we need to rebuild the model and restore the weights from the checkpoint.
tf.train.latest_checkpoint(checkpoint_dir)
model = build_model(vocab_size, embedding_dim, rnn_units, batch_size=1)
model.load_weights(tf.train.latest_checkpoint(checkpoint_dir))
model.build(tf.TensorShape([1, None]))
I hope my silly mistake will help somebody to remember to reload the model in the future!
I am trying to update my code to work with TF 2.0. As a start, I have used a pre-made Keras model:
def train_input_fn(batch_size=1):
    """An input function for training"""
    print("train_input_fn: start function")
    train_dataset = tf.data.experimental.make_csv_dataset(CSV_PATH_TRAIN, batch_size=batch_size,label_name='label',
                                                          select_columns=["sample","label"])
    print('train_input_fn: finished make_csv_dataset')
    train_dataset = train_dataset.map(parse_features_vector)
    print("train_input_fn: finished the map with pars_features_vector")
    train_dataset = train_dataset.repeat().batch(batch_size)
    print("train_input_fn: finished batch size. train_dataset is %s ", train_dataset)
    return train_dataset
IMG_SHAPE = (160,160,3)
base_model = tf.keras.applications.MobileNetV2(input_shape=IMG_SHAPE,
include_top = False,
weights = 'imagenet')
base_model.trainable = False
model.compile(optimizer=tf.keras.optimizers.RMSprop(lr=0.0001),
loss='binary_crossentropy',
metrics=['accuracy'])
estimator = tf.keras.estimator.model_to_estimator(keras_model = model, model_dir = './date')
# train_input_fn read a CSV of images, resize them and returns dataset batch
train_spec = tf.estimator.TrainSpec(input_fn=train_input_fn, max_steps=20)
# eval_input_fn read a CSV of images, resize them and returns dataset batch of one sample
eval_spec = tf.estimator.EvalSpec(eval_input_fn)
tf.estimator.train_and_evaluate(estimator, train_spec=train_spec, eval_spec=eval_spec)
Logs:
train_input_fn: finished batch size. train_dataset is %s <BatchDataset shapes: ({mobilenetv2_1.00_160_input: (None, 1, 160, 160, 3)}, (None, 1)), types: ({mobilenetv2_1.00_160_input: tf.float32}, tf.int32)>
ERROR:
ValueError: Input 0 of layer Conv1_pad is incompatible with the layer: expected ndim=4, found ndim=5. Full shape received: [None, 1, 160, 160, 3]
What is the right way to combine tf.keras with the Dataset API? Is this the issue, or is it something else?
Thanks,
eilalan
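You don't need this line:
train_dataset = train_dataset.repeat().batch(batch_size)
The function you're using to create the dataset, tf.data.experimental.make_csv_dataset, already batches it, so calling .batch() again adds a second batch dimension (hence the 5-D shape [None, 1, 160, 160, 3]). You can still use repeat, i.e. train_dataset = train_dataset.repeat().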
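A minimal sketch of the double-batching effect, using a from_tensor_slices dataset batched once as a hypothetical stand-in for the output of make_csv_dataset (which is already batched); the zero-filled images and labels only mimic the question's shapes. Batching a second time adds another leading axis, which is exactly what MobileNetV2's first layer rejects with "expected ndim=4, found ndim=5".

```python
import numpy as np
import tensorflow as tf

# Hypothetical images/labels shaped like the question's data.
images = np.zeros((4, 160, 160, 3), dtype=np.float32)
labels = np.zeros((4, 1), dtype=np.int32)

batch_size = 1
# Stands in for make_csv_dataset(..., batch_size=batch_size), which already batches.
ds = tf.data.Dataset.from_tensor_slices((images, labels)).batch(batch_size)

double_batched = ds.repeat().batch(batch_size)  # extra batch axis: 5-D
fixed = ds.repeat()                              # keeps 4-D batches

print(double_batched.element_spec[0].shape)  # (None, None, 160, 160, 3)
print(fixed.element_spec[0].shape)           # (None, 160, 160, 3)
```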
I've successfully trained and saved my model (an image classifier) using TensorFlow, so I now have the .meta, .index and checkpoint files.
I want to feed my model an image for testing, so I created another .py file and restored my model:
with tf.Session() as sess:
    saver = tf.train.import_meta_graph('model-24900.meta')
    saver.restore(sess,"model-24900")
    graph = tf.get_default_graph()
After that I tried to feed an image:
Prediction=sess.run([output],feed_dict={input_img : testImage,})
The problem is that output and input_img are defined in another file (where I constructed and trained the model), so they are undefined in the file where I want to test the model.
This is what I wrote in my training file:
with tf.name_scope("Input") as scope:
    input_img = tf.placeholder(dtype='float', shape=[None, 128, 128, 1], name="input")
with tf.name_scope("Target") as scope:
    target_labels = tf.placeholder(dtype='float', shape=[None, 2], name="Targets")
nb = NetworkBuilder()
with tf.name_scope("ModelV2") as scope:
    model = input_img
    model = nb.attach_conv_layer(model, 32)
    model = nb.attach_relu_layer(model)
    model = nb.attach_conv_layer(model, 32)
    model = nb.attach_relu_layer(model)
    model = nb.attach_pooling_layer(model)
    model = nb.attach_conv_layer(model, 64)
    model = nb.attach_relu_layer(model)
    model = nb.attach_conv_layer(model, 64)
    model = nb.attach_relu_layer(model)
    model = nb.attach_pooling_layer(model)
    model = nb.attach_conv_layer(model, 128)
    model = nb.attach_relu_layer(model)
    model = nb.attach_conv_layer(model, 128)
    model = nb.attach_relu_layer(model)
    model = nb.attach_pooling_layer(model)
    model = nb.flatten(model)
    model = nb.attach_dense_layer(model, 200)
    model = nb.attach_sigmoid_layer(model)
    model = nb.attach_dense_layer(model, 32)
    model = nb.attach_sigmoid_layer(model)
    model = nb.attach_dense_layer(model, 2)
    output = nb.attach_softmax_layer(model)
with tf.name_scope("Optimization") as scope:
    global_step = tf.Variable(0, name='global_step', trainable=False)
    cost = tf.nn.softmax_cross_entropy_with_logits(logits=model, labels=target_labels)
    cost = tf.reduce_mean(cost)
    tf.summary.scalar("cost", cost)
    optimizer = tf.train.AdamOptimizer().minimize(cost,global_step=global_step)
with tf.name_scope('accuracy') as scope:
    correct_pred = tf.equal(tf.argmax(output, 1), tf.argmax(target_labels, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
So my question is: how can I define output and input_img (which I used in the training file) in my test file, so that I can feed an image to my CNN model?
1] Testing by .ckpt file and recreating the whole model:
Redefine the whole model in your test .py file, with the same input and output tensors you defined in the training file.
Restore the checkpoint into it, then run a forward pass: feed the test image to the input tensor and read the predictions from the output tensor. Do not run the training op.
The model you define in the test .py file must have the same structure (and the same variable names) as the one used in training.
2] Testing by .ckpt file and using the name of the tensors:
Give the tensors you need explicit names in the training script (for example name="input").
In the test .py file, after import_meta_graph and restore, retrieve the input and output tensors by those names with graph.get_tensor_by_name("example:0").
Then use these tensors in sess.run: feed data to the input tensor and fetch the prediction from the output tensor.
3] Testing by .pb frozen graph and using the name of the tensors:
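With the two methods above the model can still be trained further, and the checkpoint files are larger than a .pb file.
A frozen .pb file stores the graph with its variables converted to constants, so it cannot be trained further.
You can load this file and import the frozen graph (for example with tf.import_graph_def).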
Now you can get the input and output tensors by .get_tensor_by_name("example:0") function and make predictions
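A minimal sketch of method 2, written against tf.compat.v1 so it also runs under TensorFlow 2. The tiny graph, the variable w and the tensor name "output" are hypothetical stand-ins for the CNN. With the question's code the input would be "Input/input:0" (placeholder "input" inside name scope "Input"); the softmax's name depends on NetworkBuilder, so giving it an explicit name in the training script (e.g. tf.identity(output, name="output")) makes it easy to look up.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf

tf1 = tf.compat.v1
ckpt_prefix = os.path.join(tempfile.mkdtemp(), "model")

# --- "training" script: build a tiny graph with explicitly named tensors ---
with tf.Graph().as_default():
    with tf.name_scope("Input"):
        input_img = tf1.placeholder(tf.float32, shape=[None, 4], name="input")
    w = tf1.get_variable("w", shape=[4, 2], initializer=tf1.ones_initializer())
    # Give the prediction an explicit, predictable name.
    output = tf.nn.softmax(tf.matmul(input_img, w), name="output")
    saver = tf1.train.Saver()
    with tf1.Session() as sess:
        sess.run(tf1.global_variables_initializer())
        saver.save(sess, ckpt_prefix)  # writes model.meta, model.index, ...

# --- "test" script: no Python variables from training are needed ---
with tf.Graph().as_default() as graph:
    with tf1.Session() as sess:
        restorer = tf1.train.import_meta_graph(ckpt_prefix + ".meta")
        restorer.restore(sess, ckpt_prefix)
        x = graph.get_tensor_by_name("Input/input:0")
        y = graph.get_tensor_by_name("output:0")
        probs = sess.run(y, feed_dict={x: np.ones((1, 4), np.float32)})

print(probs)  # softmax of two equal logits
```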
=========================UPDATED=========================
There are two ways you can know the name of the tensor:
1] Using Tensor-Board:
Save your model after training
Open a terminal and run tensorboard --logdir="path_where_you_have_stored_ckpt_file" (TensorBoard reads the event files written by your summary FileWriter, so point it at that directory)
Open http://0.0.0.0:6006/ in your web browser
Go to the Graphs tab and find the name of a tensor by clicking on its node
2] In the code:
Every tensor has a name property.
To list the names of all nodes in the graph, you can do something like this:
for node in graph.as_graph_def().node:
    print(node.name)
If you want the name of a particular tensor, you can do something like this:
x = tf.placeholder(tf.float32, [None, 784])
print(x.name)
Once you have the name, you can retrieve the tensor with graph.get_tensor_by_name(x.name).