I have 2 different neural network models, trained and saved using TFLearn. When I run each script, the saved models are loaded properly. I need a system where, the second model should be called after the output of the first model.
But when I try to load the second model after the first model has been loaded, it gives me the following error:
NotFoundError (see above for traceback): Key val_loss_2 not found in checkpoint
[[Node: save_6/RestoreV2_42 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](_arg_save_6/Const_0_0, save_6/RestoreV2_42/tensor_names, save_6/RestoreV2_42/shape_and_slices)]]
The second model is properly loaded if I comment out the loading of the first model, or if I run the 2 scripts separately. Any idea why this error is happening?
The code structure is something like ..
from second_model_file import check_second_model
def run_first_model(input):
features = convert_to_features(input)
model = tflearn.DNN(get_model())
model.load("model1_path/model1") # relative path
pred = model.predict(features)
...
if pred == certain_value:
check_second_model()
The second_model_file.py is something similar:
def check_second_model():
input_var = get_input_var()
model2 = tflearn.DNN(regression_model())
model2.load("model2_path/model2") # relative path
pred = model2.predict(input_var)
#other stuff ......
The models have been saved in different folders and so each have their own checkpoint file
Well, okay I found the solution. It was hidden in the discussion on this thread .
I used tf.reset_default_graph() before building the second network and model and it worked. Hope this helps someone else too.
New code:
import tensorflow as tf
def check_second_model():
input_var = get_input_var()
tf.reset_default_graph()
model2 = tflearn.DNN(regression_model())
model2.load("model2_path/model2") # relative path
pred = model2.predict(input_var)
Though I intuitively understand why this solution works, I would be happy if someone can explain me better why it is designed such.
Related
I am trying to use #tf.function(jit_compile=True) to create a TF graph; below is the pseudocode for it. I'm not able to provide a functioning code since it contains a lot of dependencies.
model = tf.keras.models.load_model()
#tf.function(jit_compile=True)
def myfunction(inputs, model):
out1 = function1(inputs)
out2 = model(out1)
out3 = function2(out2)
return out3
When I run the above code, I get the error RuntimeError: Cannot get session inside Tensorflow graph function. Everything works fine if I change the code to load the model inside function2. function2 contains a while loop (tf.while_loop), would that be a problem? I am not sure why taking the model as an input is not working in this case. Any help is appreciated.
I am unable to load saved pytorch model from the outputs folder in my other scripts.
I am using following lines of code to save the model:
os.makedirs("./outputs/model", exist_ok=True)
torch.save({
'model_state_dict': copy.deepcopy(model.state_dict()),
'optimizer_state_dict': optimizer.state_dict()
}, './outputs/model/best-model.pth')
new_run.upload_file("outputs/model/best-model.pth", "outputs/model/best-model.pth")
saved_model = new_run.register_model(model_name='pytorch-model', model_path='outputs/model/best-model.pth')
and using the following code to access it:
global model
best_model_path = 'outputs/model/best-model.pth'
model_checkpoint = torch.load(best_model_path)
model.load_state_dict(model_checkpoint['model_state_dict'], strict = False)
but when I run the above mentioned code, I get this error: No such file or directory: './outputs/model/best-model.pth'
Also I want to know is there a way to get the saved model from Azure Models? I have tried to get it by using following lines of code:
from azureml.core.model import Model
model = Model(ws, "Pytorch-model")
but it returns Model type object which returns error on model.eval() (error: Model has no such attribute eval()).
There is no global output folder. If you want to use a Model in a new script you need to give the script the model as an input or register the model and download the model from the new script.
The Model object form from azureml.core.model import Model is not your pytorch Model. 1
You can use model.register(...) to register your model. And model.download(...) to download you model. Than you can use pytorch to load you model. 2
I have problem with fastai library. My code below:
import fastai
from fastai.text import *
import os
import pandas as pd
import fastai
from fastai import *
lab = df.columns[0]
data_lm = TextLMDataBunch.from_csv(r'/AWD', 'data.csv', label_cols = lab, text_cols = ['text'])
data_clas = TextClasDataBunch.from_csv(r'/AWD', 'data.csv', vocab = data_lm.train_ds.vocab, bs = 256,label_cols = lab, text_cols=['text'])
data_lm.save('data_lm_export.pkl')
data_clas.save('data_clas.pkl')
learn = language_model_learner(data_lm,AWD_LSTM,drop_mult = 0.3)
learn.lr_find()
learn.recorder.plot(skip_end=10)
learn.fit_one_cycle(10,1e-2,moms=(0.8,0.7))
learn.save('fit_head')
learn.load('fit_head')
My data is quite big, so each epoch in fit_one_cycle lasts about 6h. My resources enables me only to train model in SLURM JOB 70h, so my whole script will be cancelled. I wanted to divide my script into pieces and the first longest part has to learn and save fit_head. Everything was ok, and after that I wanted to load my model to train it again, but i got this error:
**RuntimeError: Error(s) in loading state_dict for SequentialRNN:
size mismatch for 0.encoder.weight: copying a param with shape torch.Size([54376, 400]) from checkpoint, the shape in current model is torch.Size([54720, 400]).
**
I have checked similar problems on github/stack posts and I tried those solutions like this below, but i cannot find anything usefull.
data_clas.vocab.stoi = data_lm.vocab.stoi
data_clas.vocab.itos = data_lm.vocab.itos
Is there any possibility to load trained model without having this issue ?
When you do learner.save() only the model weights are saved on your disk and not the model state dict which contains the model architecture information.
To train the model in a different session you must first define the model itself. Remember to use the same code to define your new model. Since your data is quite heavy as you mentioned you can use a very small subset (~16 records) of your data to create this new model and then do learn.load(model_path) and you should be able to resume training.
you can modify the training data with learn.data.train_dl = new_dl
thanks for your atention, I'm developing an automatic speaker recognition system using SincNet.
Ravanelli, M., & Bengio, Y. (2018, December). Speaker recognition from raw waveform with sincnet. In 2018 IEEE Spoken Language Technology Workshop (SLT) (pp. 1021-1028). IEEE.
Since the network is coded in Pytorch I searched and found a Keras implementation here https://github.com/grausof/keras-sincnet. I adapted the train.py code to train a Sincnet with my own data in Tensorflow 2.0, and worked fine, I saved only the weights of my trained network, my training data has shape 128,3200,1 for inputs and 128 for labels per batch
#Creates a Sincnet model with input_size=3200 (wlen), num_classes=40, fs=16000
redsinc = create_model(wlen,num_classes,fs)
#Saves only weights and stopearly callback
checkpointer = ModelCheckpoint(filepath='checkpoints/SincNetBiomex3.hdf5',verbose=1,
save_best_only=True, monitor='val_accuracy',save_weights_only=True)
stopearly = EarlyStopping(monitor='val_accuracy',patience=3,verbose=1)
callbacks = [checkpointer,stopearly]
# optimizer = RMSprop(lr=learnrate, rho=0.9, epsilon=1e-8)
optimizer = Adam(learning_rate=learnrate)
# Creates generator of training batches
train_generator = batchGenerator(batch_size,train_inputs,train_labels,wlen)
validinputs, validlabels = create_batches_rnd(validation_labels.shape[0],
validation_inputs,validation_labels,wlen)
#Compiling model and train with function fit_generator
redsinc.compile(loss='sparse_categorical_crossentropy', optimizer=optimizer, metrics=['accuracy'])
history = redsinc.fit_generator(train_generator, steps_per_epoch=N_batches, epochs = epochs,
verbose = 1, callbacks=callbacks, validation_data=(validinputs,validlabels))
The problem came when I tried to evaluate the network, I didn't use the code found in test.py, I only loaded the weights I previously saved and use the function evaluate, my test data had the shape 1200,3200,1 for the inputs and 1200 for labels.
# Create a Sincnet model and load previously saved weights
redsinc = create_model(wlen,num_clases,fs)
redsinc.load_weights('checkpoints/SincNetBiomex3.hdf5')
test_loss, test_accuracy = redsinc.evaluate(x=eval_in,y=eval_lab)
RuntimeError: You must compile your model before training/testing. Use `model.compile(optimizer,
loss)`.
Then I added the same compile code I used for training:
optimizer = Adam(learning_rate=0.001)
redsinc.compile(loss='sparse_categorical_crossentropy', optimizer=optimizer, metrics=['accuracy'])
Then rerun the test code and got this:
WARNING:tensorflow:From C:\Users\atenc\Anaconda3\envs\py3.7-tf2.0gpu\lib\site-
packages\tensorflow_core\python\ops\resource_variable_ops.py:1781: calling
BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is
deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
ValueError: A tf.Variable created inside your tf.function has been garbage-collected. Your code needs to keep Python references to variables created inside `tf.function`s.
A common way to raise this error is to create and return a variable only referenced inside your function:
#tf.function
def f():
v = tf.Variable(1.0)
return v
v = f() # Crashes with this error message!
The reason this crashes is that #tf.function annotated function returns a **`tf.Tensor`** with the **value** of the variable when the function is called rather than the variable instance itself. As such there is no code holding a reference to the `v` created inside the function and Python garbage collects it.
The simplest way to fix this issue is to create variables outside the function and capture them:
v = tf.Variable(1.0)
#tf.function
def f():
return v
f() # <tf.Tensor: ... numpy=1.>
v.assign_add(1.)
f() # <tf.Tensor: ... numpy=2.>
I don't understand the error since I've evaluated other networks with the same function and never got any problems. Then I decided to use predict function to match predicted labels with correct labels and obtain all metrics with my own code but I got another error.
# Create a Sincnet model and load previously saved weights
redsinc = create_model(wlen,num_clases,fs)
redsinc.load_weights('checkpoints/SincNetBiomex3.hdf5')
print('Model loaded')
#Predict labels with test data
predict_labels = redsinc.predict(eval_in)
Error while reading resource variable _AnonymousVar212 from Container: localhost. This could mean that the variable was uninitialized. Not found: Resource localhost/_AnonymousVar212/class tensorflow::Var does not exist.
[[node sinc_conv1d/concat_104/ReadVariableOp (defined at \Users\atenc\Anaconda3\envs\py3.7-tf2.0gpu\lib\site-packages\tensorflow_core\python\framework\ops.py:1751) ]] [Op:__inference_keras_scratch_graph_13649]
Function call stack:
keras_scratch_graph
I hope someone can tell me what these errors mean and how to solve them, I've searched for solutions to them but most of the solutions I've found don't seem related to my problem so I can't apply those solutions. I'm guessing the errors are caused by the Sincnet layer code, because it is a custom coded layer. The code for Sincnet layer can be found in the github repository in the file sincnet.py.
I appreciate all help I can get, again thank you for your atention.
You should downgrade your tf and keras version, it works to me when I faced the same problem.
Try this keras==2.1.6; tensorflow-gpu==1.13.1
I am using theano, sklearn and numpy in Python. I found this code for saving my trained network and predict on my new dataset in this link https://github.com/lzhbrian/RBM-DBN-theano-DL4J/blob/master/src/theano/code/logistic_sgd.py. the part of the code I am using is this :
"""
An example of how to load a trained model and use it
to predict labels.
"""
def predict():
# load the saved model
classifier = pickle.load(open('best_model.pkl'))
# compile a predictor function
predict_model = theano.function(
inputs=[classifier.input],
outputs=classifier.y_pred)
# We can test it on some examples from test test
dataset='mnist.pkl.gz'
datasets = load_data(dataset)
test_set_x, test_set_y = datasets[2]
test_set_x = test_set_x.get_value()
predicted_values = predict_model(test_set_x[:10])
print("Predicted values for the first 10 examples in test set:")
print(predicted_values)
if __name__ == '__main__':
sgd_optimization_mnist()
The code for the neural network model I want to save and load and predict with is https://github.com/aseveryn/deep-qa. I could save and load the model with cPickle but I continuously get errors in # compile a predictor function part:
predict_model = theano.function(inputs=[classifier.input],outputs=classifier.y_pred)
Actually I am not certain what I need to put in the inputs according to my code. Which one is right?
inputs=[main.predict_prob_batch.batch_iterator], outputs=test_nnet.layers[-1].
y_pred)
inputs=[predict_prob_batch.batch_iterator],
outputs=test_nnet.layers[-1].y_pred)
inputs=[MiniBatchIteratorConstantBatchSize.dataset],
outputs=test_nnet.layers[-1].y_pred)
inputs=[
sgd_trainer.MiniBatchIteratorConstantBatchSize.dataset],
outputs=test_nnet.layers[-1].y_pred)
or none of them???
Each of them I tried I got the errors:
ImportError: No module named MiniBatchIteratorConstantBatchSize
or
NameError: global name 'predict_prob_batch' is not defined
I would really appreciate if you could help me.
I also used these commands for running the code but still the errors.
python -c 'from run_nnet import predict; from sgd_trainer import MiniBatchIteratorConstantBatchSize; from MiniBatchIteratorConstantBatchSize import dataset; print predict()'
python -c 'from run_nnet import predict; from sgd_trainer import *; from MiniBatchIteratorConstantBatchSize import dataset; print predict()'
Thank you and let me know please if you know a better way to predict for new dataset on the loaded trained model.