I am trying to use @tf.function(jit_compile=True) to create a TF graph; below is the pseudocode for it. I'm not able to provide functioning code since it has a lot of dependencies.
model = tf.keras.models.load_model()
@tf.function(jit_compile=True)
def myfunction(inputs, model):
    out1 = function1(inputs)
    out2 = model(out1)
    out3 = function2(out2)
    return out3
When I run the above code, I get the error RuntimeError: Cannot get session inside Tensorflow graph function. Everything works fine if I change the code to load the model inside function2. function2 contains a while loop (tf.while_loop), would that be a problem? I am not sure why taking the model as an input is not working in this case. Any help is appreciated.
I'm following this tutorial to perform time series classifications using Transformers with Keras and TensorFlow. I'm using Windows 10 and the PyDev Eclipse plugin. Unfortunately, my program stops and the console output is completely blank every time I run the following code:
n_classes = len(np.unique(y_train))
input_shape = np.array(x_trainScaled).shape[0:]
model = build_model(n_classes,input_shape,head_size=256,num_heads=4,ff_dim=4,num_transformer_blocks=4,mlp_units=[128],mlp_dropout=0.4,dropout=0.25)
model.compile(loss="sparse_categorical_crossentropy",optimizer=keras.optimizers.Adam(learning_rate=1e-4),metrics=["sparse_categorical_accuracy"])
print(model.summary())
callbacks = [keras.callbacks.EarlyStopping(patience=100, restore_best_weights=True)]
model.fit(x_trainScaled,y_train,validation_split=0.2,epochs=200,batch_size=64,callbacks=callbacks)
pathToModel = 'my/path/to/model/'
model.save(pathToModel)
Even previous warnings or print statements are completely erased and I have no idea what's going on. If I comment the model.fit(...) statement out, the program terminates and crashes with an error message resulting from a model.predict(...) call.
Any help is highly appreciated.
The solution was to transform the input data and labels to numpy arrays first. Thus, calling the fit function as follows:
model.fit(np.array(x_trainScaled),np.array(y_train),validation_split=0.2,epochs=200,batch_size=64,callbacks=callbacks)
worked perfectly fine for me, as opposed to:
model.fit(x_trainScaled,y_train,validation_split=0.2,epochs=200,batch_size=64,callbacks=callbacks)
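A minimal sketch of the conversion described above, assuming the training data starts out as nested Python lists (the post does not say what type x_trainScaled and y_train actually were). The helper name to_fit_arrays and the toy values are hypothetical; np.asarray is the only library call involved.

```python
import numpy as np

def to_fit_arrays(x, y):
    """Convert features and labels to NumPy arrays before passing them to model.fit."""
    x_arr = np.asarray(x, dtype=np.float32)
    y_arr = np.asarray(y)
    # Fail early with a readable message instead of a silent crash later on.
    if x_arr.shape[0] != y_arr.shape[0]:
        raise ValueError("x and y must contain the same number of samples")
    return x_arr, y_arr

# Hypothetical toy data: 3 samples, 4 time steps, 1 feature per step.
x_list = [[[0.1], [0.2], [0.3], [0.4]],
          [[0.5], [0.6], [0.7], [0.8]],
          [[0.9], [1.0], [1.1], [1.2]]]
y_list = [0, 1, 0]

x_arr, y_arr = to_fit_arrays(x_list, y_list)
print(x_arr.shape, y_arr.shape)  # (3, 4, 1) (3,)
```

The resulting arrays can then be passed to model.fit in place of the original lists.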
Thanks for your attention. I'm developing an automatic speaker recognition system using SincNet.
Ravanelli, M., & Bengio, Y. (2018, December). Speaker recognition from raw waveform with sincnet. In 2018 IEEE Spoken Language Technology Workshop (SLT) (pp. 1021-1028). IEEE.
Since the network is coded in PyTorch, I searched and found a Keras implementation here: https://github.com/grausof/keras-sincnet. I adapted the train.py code to train a SincNet with my own data in TensorFlow 2.0, and it worked fine. I saved only the weights of my trained network. Per batch, my training data has shape 128,3200,1 for the inputs and 128 for the labels.
#Creates a Sincnet model with input_size=3200 (wlen), num_classes=40, fs=16000
redsinc = create_model(wlen,num_classes,fs)
#Saves only weights and stopearly callback
checkpointer = ModelCheckpoint(filepath='checkpoints/SincNetBiomex3.hdf5',verbose=1,
                               save_best_only=True, monitor='val_accuracy',save_weights_only=True)
stopearly = EarlyStopping(monitor='val_accuracy',patience=3,verbose=1)
callbacks = [checkpointer,stopearly]
# optimizer = RMSprop(lr=learnrate, rho=0.9, epsilon=1e-8)
optimizer = Adam(learning_rate=learnrate)
# Creates generator of training batches
train_generator = batchGenerator(batch_size,train_inputs,train_labels,wlen)
validinputs, validlabels = create_batches_rnd(validation_labels.shape[0],
                                              validation_inputs,validation_labels,wlen)
#Compiling model and train with function fit_generator
redsinc.compile(loss='sparse_categorical_crossentropy', optimizer=optimizer, metrics=['accuracy'])
history = redsinc.fit_generator(train_generator, steps_per_epoch=N_batches, epochs = epochs,
                                verbose = 1, callbacks=callbacks, validation_data=(validinputs,validlabels))
The problem came when I tried to evaluate the network. I didn't use the code found in test.py; I only loaded the weights I had previously saved and used the evaluate function. My test data had shape 1200,3200,1 for the inputs and 1200 for the labels.
# Create a Sincnet model and load previously saved weights
redsinc = create_model(wlen,num_clases,fs)
redsinc.load_weights('checkpoints/SincNetBiomex3.hdf5')
test_loss, test_accuracy = redsinc.evaluate(x=eval_in,y=eval_lab)
RuntimeError: You must compile your model before training/testing. Use `model.compile(optimizer, loss)`.
Then I added the same compile code I used for training:
optimizer = Adam(learning_rate=0.001)
redsinc.compile(loss='sparse_categorical_crossentropy', optimizer=optimizer, metrics=['accuracy'])
Then I reran the test code and got this:
WARNING:tensorflow:From C:\Users\atenc\Anaconda3\envs\py3.7-tf2.0gpu\lib\site-packages\tensorflow_core\python\ops\resource_variable_ops.py:1781: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
ValueError: A tf.Variable created inside your tf.function has been garbage-collected. Your code needs to keep Python references to variables created inside `tf.function`s.
A common way to raise this error is to create and return a variable only referenced inside your function:
@tf.function
def f():
    v = tf.Variable(1.0)
    return v
v = f()  # Crashes with this error message!
The reason this crashes is that @tf.function annotated function returns a **`tf.Tensor`** with the **value** of the variable when the function is called rather than the variable instance itself. As such there is no code holding a reference to the `v` created inside the function and Python garbage collects it.
The simplest way to fix this issue is to create variables outside the function and capture them:
v = tf.Variable(1.0)
@tf.function
def f():
    return v
f()  # <tf.Tensor: ... numpy=1.>
v.assign_add(1.)
f()  # <tf.Tensor: ... numpy=2.>
I don't understand the error, since I've evaluated other networks with the same function and never had any problems. Then I decided to use the predict function to match the predicted labels against the correct labels and compute all metrics with my own code, but I got another error.
# Create a Sincnet model and load previously saved weights
redsinc = create_model(wlen,num_clases,fs)
redsinc.load_weights('checkpoints/SincNetBiomex3.hdf5')
print('Model loaded')
#Predict labels with test data
predict_labels = redsinc.predict(eval_in)
Error while reading resource variable _AnonymousVar212 from Container: localhost. This could mean that the variable was uninitialized. Not found: Resource localhost/_AnonymousVar212/class tensorflow::Var does not exist.
[[node sinc_conv1d/concat_104/ReadVariableOp (defined at \Users\atenc\Anaconda3\envs\py3.7-tf2.0gpu\lib\site-packages\tensorflow_core\python\framework\ops.py:1751) ]] [Op:__inference_keras_scratch_graph_13649]
Function call stack:
keras_scratch_graph
I hope someone can tell me what these errors mean and how to solve them. I've searched for solutions, but most of what I've found doesn't seem related to my problem, so I can't apply it. I'm guessing the errors are caused by the SincNet layer code, because it is a custom layer. The code for the SincNet layer can be found in the GitHub repository, in the file sincnet.py.
I appreciate all the help I can get. Again, thank you for your attention.
You should downgrade your TensorFlow and Keras versions; that worked for me when I faced the same problem.
Try keras==2.1.6 and tensorflow-gpu==1.13.1.
I'm trying to run a training session, and I get this error when I run my algorithm (when I use tf.train.get_global_step()):
ValueError: global_step is required for exponential_decay.
For some reason, tf.train.get_or_create_global_step() doesn't exist for me. I'm not sure if that's because it's a removed method or something else. I updated TensorFlow, and as far as I can tell everything is up to date.
I've dug around the documentation and there's nothing about it. To run I'm using tf.app.run() with a main function.
Is there another way to initialize the global step variable?
Although tf.train.get_or_create_global_step() is perfectly fine, here is another solution:
g_step = tf.get_variable('global_step', trainable=False, initializer=0)
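# decay_steps and decay_rate are required arguments; the values here are illustrative
learning_rate = tf.train.exponential_decay(0.1, g_step, decay_steps=1000, decay_rate=0.96)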
tf.train.AdamOptimizer(learning_rate).minimize(loss=loss, global_step=g_step)
This creates a non-trainable variable initialized to zero and passes it to the optimizer.
If you need global_step later use tf.train.global_step():
sess = tf.Session()
# Initialize the variable
sess.run(g_step.initializer)
print('global_step: %s' % tf.train.global_step(sess, g_step))
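To see why the schedule needs a global step at all, the documented formula behind tf.train.exponential_decay, decayed = learning_rate * decay_rate ** (global_step / decay_steps), can be reproduced in plain Python. This is only a sketch of that formula, not TensorFlow's code; the function below and the values chosen for decay_steps and decay_rate are illustrative assumptions.

```python
def exponential_decay(learning_rate, global_step, decay_steps, decay_rate, staircase=False):
    """Plain-Python version of the exponential decay schedule."""
    if staircase:
        # Integer division makes the rate drop in discrete steps.
        exponent = global_step // decay_steps
    else:
        exponent = global_step / decay_steps
    return learning_rate * decay_rate ** exponent

# The learning rate is a function of the step counter, which is why the
# optimizer has to increment global_step on every minimize() call.
for step in (0, 1000, 2000):
    print(step, exponential_decay(0.1, step, decay_steps=1000, decay_rate=0.5))
```

With these illustrative values the rate halves every 1000 steps; if global_step were never incremented, the rate would stay at its initial value.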
So, the reason this function wasn't showing up was because I actually hadn't been on the newest version of TensorFlow even though it was telling me I was completely up to date.
So all I did to fix it was uninstall TensorFlow and then install it from the actual link. I don't have the link anymore, but a quick Google search should suffice.
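One way to confirm which version the running interpreter actually sees, rather than trusting what the package manager reports, is to query the installed distribution metadata. This is a sketch using only the standard library (Python 3.8+); the helper name installed_version is my own, and it only reports what is installed in the current environment.

```python
from importlib import metadata

def installed_version(package_name):
    """Return the version of an installed distribution, or None if it is not installed."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None

# Check the distributions that commonly provide the `tensorflow` module.
for name in ("tensorflow", "tensorflow-gpu"):
    print(name, installed_version(name))
```

If the version printed here is older than expected, the interpreter is probably using a different environment than the one that was updated.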
I am using an LSTM predictor for time series prediction.
regressor = skflow.Estimator(model_fn=lstm_model(TIMESTEPS, RNN_LAYERS, DENSE_LAYERS))
validation_monitor = learn.monitors.ValidationMonitor(X['val'], y['val'],
                                                      every_n_steps=PRINT_STEPS,
                                                      early_stopping_rounds=1000)
regressor.fit(X['train'], y['train'], monitors=[validation_monitor])
But when calling regressor.fit, I get the error shown in the title. I need help with this.
I understand that your code imports the lstm_model from the file lstm_predictor.py when initializing your estimator. If so, the problem is caused by the following line:
x_ = learn.ops.split_squeeze(1, time_steps, X)
As the README.md of that repo says, the TensorFlow API has changed significantly. The function split_squeeze also seems to have been removed from the module tensorflow.contrib.learn.python.ops. This issue has been discussed in that repository, but no changes have been made there for 2 years!
Yet you can simply replace that function with tf.unstack. Change the line to:
x_ = tf.unstack(X, num=time_steps, axis=1)
With this I was able to get past the problem.
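To illustrate why unstacking works as a replacement, here is a NumPy sketch of the two operations: splitting a (batch, time_steps, features) array along axis 1 and squeezing that axis away, versus taking one slice per time step. NumPy stands in for TensorFlow so that the snippet runs without TF 1.x; the helper names and the array shape are made up for illustration and are not the removed function's real signature.

```python
import numpy as np

def split_then_squeeze(x, num_split, axis):
    """Split x into num_split pieces along axis, then drop that axis from each piece."""
    return [np.squeeze(part, axis=axis) for part in np.split(x, num_split, axis=axis)]

def unstack(x, num, axis):
    """Return num slices of x taken along axis (what unstacking does)."""
    return [np.take(x, i, axis=axis) for i in range(num)]

batch, time_steps, features = 2, 3, 4
x = np.arange(batch * time_steps * features, dtype=np.float32)
x = x.reshape(batch, time_steps, features)

a = split_then_squeeze(x, time_steps, axis=1)
b = unstack(x, time_steps, axis=1)

print(len(b), b[0].shape)  # 3 (2, 4)
print(all(np.array_equal(u, v) for u, v in zip(a, b)))  # True
```

Both give a list with one (batch, features) array per time step, which is the input format an unrolled RNN expects.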
I have 2 different neural network models, trained and saved using TFLearn. When I run each script on its own, the saved models load properly. I need a system in which the second model is called after the first model produces its output.
But when I try to load the second model after the first model has been loaded, it gives me the following error:
NotFoundError (see above for traceback): Key val_loss_2 not found in checkpoint
[[Node: save_6/RestoreV2_42 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](_arg_save_6/Const_0_0, save_6/RestoreV2_42/tensor_names, save_6/RestoreV2_42/shape_and_slices)]]
The second model is properly loaded if I comment out the loading of the first model, or if I run the 2 scripts separately. Any idea why this error is happening?
The code structure is something like this:
from second_model_file import check_second_model
def run_first_model(input):
    features = convert_to_features(input)
    model = tflearn.DNN(get_model())
    model.load("model1_path/model1")  # relative path
    pred = model.predict(features)
    ...
    if pred == certain_value:
        check_second_model()
The second_model_file.py is something similar:
def check_second_model():
    input_var = get_input_var()
    model2 = tflearn.DNN(regression_model())
    model2.load("model2_path/model2")  # relative path
    pred = model2.predict(input_var)
    # other stuff ......
The models have been saved in different folders, so each has its own checkpoint file.
Well, okay, I found the solution. It was hidden in the discussion on this thread.
I used tf.reset_default_graph() before building the second network and model and it worked. Hope this helps someone else too.
New code:
import tensorflow as tf
def check_second_model():
    input_var = get_input_var()
    tf.reset_default_graph()
    model2 = tflearn.DNN(regression_model())
    model2.load("model2_path/model2")  # relative path
    pred = model2.predict(input_var)
Though I intuitively understand why this solution works, I would be happy if someone could explain better why it is designed this way.
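One plausible explanation, sketched as a toy model rather than TensorFlow's actual code: in TF 1.x every new op and variable goes into the same default graph, and a name that is already taken gets a numeric suffix (which is presumably where val_loss_2 in the error comes from). A checkpoint saved from a fresh graph only contains the unsuffixed names, so restoring by name fails. ToyGraph and build_model below are hypothetical stand-ins that only mimic the naming behaviour.

```python
class ToyGraph:
    """Toy stand-in for a TF1 default graph: it only hands out unique names."""

    def __init__(self):
        self._counts = {}

    def unique_name(self, name):
        # The first use keeps the plain name; later uses get _1, _2, ...
        count = self._counts.get(name, 0)
        self._counts[name] = count + 1
        return name if count == 0 else "%s_%d" % (name, count)

def build_model(graph):
    # Every model build registers a variable called "val_loss".
    return graph.unique_name("val_loss")

# Each checkpoint was written from a fresh graph, so it holds the plain name.
checkpoint_keys = {"val_loss"}

graph = ToyGraph()
first = build_model(graph)
second = build_model(graph)  # built in the same graph, so the name gets a suffix
print(first in checkpoint_keys, second in checkpoint_keys)  # True False

graph = ToyGraph()  # analogous to calling tf.reset_default_graph()
third = build_model(graph)
print(third in checkpoint_keys)  # True
```

Resetting the default graph discards the earlier names, so the second model's variables get the same names they had when its checkpoint was written.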