Transfer Learning Trainable Model Throws Errors On saving

I have downloaded a pretrained model and I'm trying to use it for transfer learning.
To do that, I load the model, which is saved as 'xray_model.h5', and set it as untrainable:
model = tf.keras.models.load_model('xray_model.h5')
model.trainable = False
Later I take the input layer and the output of the "flatten" layer and build my own layers on top of it:
base_input = model.layers[0].input
base_output = model.get_layer(name="flatten").output
base_output = build_model()(base_output)
new_model = keras.Model(inputs=base_input, outputs=base_output)
Since I want to train my layers (and after some experimenting, I realized that I might need to train the old layers too), I want to set the whole model as trainable:
for i in range(len(new_model.layers)):
    new_model._layers[i].trainable = True
But when I start training it with these callbacks:
METRICS = ['accuracy',
           tf.keras.metrics.Precision(name='precision'),
           tf.keras.metrics.Recall(name='recall'),
           lr_metric]
reduce_lr = tf.keras.callbacks.ReduceLROnPlateau(monitor='val_loss', factor=0.2, patience=2, min_lr=0.00001, verbose=1)
save_callback = tf.keras.callbacks.ModelCheckpoint("new_xray_model.h5",
                                                   save_best_only=True,
                                                   monitor='accuracy')
history = new_model.fit(train_generator,
                        verbose=1,
                        steps_per_epoch=BATCH_SIZE,
                        epochs=EPOCHS,
                        validation_data=test_generator,
                        callbacks=[save_callback, reduce_lr])
I get the following error:
File "C:\Users\jm10o\AppData\Local\Programs\Python\Python38\lib\site-packages\h5py\_hl\group.py", line 373, in __setitem__
h5o.link(obj.id, self.id, name, lcpl=lcpl, lapl=self._lapl)
File "h5py\_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
File "h5py\_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
File "h5py\h5o.pyx", line 202, in h5py.h5o.link
OSError: Unable to create link (name already exists)
Process finished with exit code 1
I noticed that it only happens when I try to further train the model that I loaded.
I couldn't find any solution for it.

The problem comes from the ModelCheckpoint callback: on every epoch, you save the model under the same name.
Use a filename that includes the epoch number instead:
ModelCheckpoint('your_model_name{epoch:0d}.h5',
                monitor='accuracy')
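In tf.keras, ModelCheckpoint fills the placeholders in filepath with str.format, passing the 1-based epoch number (and the logged metrics) as keyword arguments. A minimal stdlib sketch of that expansion, assuming this behaviour, shows that each epoch gets its own file; the helper name checkpoint_path is hypothetical and not part of Keras.

```python
def checkpoint_path(template, epoch_index, logs=None):
    """Expand a ModelCheckpoint-style filepath template.

    Assumes Keras-like behaviour: the 0-based epoch index is shown 1-based,
    and logged metrics are available as extra format fields.
    """
    return template.format(epoch=epoch_index + 1, **(logs or {}))


template = "your_model_name{epoch:0d}.h5"
paths = [checkpoint_path(template, i) for i in range(3)]
print(paths)  # ['your_model_name1.h5', 'your_model_name2.h5', 'your_model_name3.h5']

# Zero-padding and metric values can be formatted the same way.
print(checkpoint_path("model_{epoch:02d}_{accuracy:.3f}.h5", 4, {"accuracy": 0.91234}))
# model_05_0.912.h5
```

A format such as {epoch:02d} keeps the files sorted correctly in a directory listing once you pass epoch 9.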

Related

TensorFlow 2.x: Cannot save trained model in h5 format (OSError: Unable to create link (name already exists))

My model uses preprocessed data to predict whether a customer is a private or non-private customer. The preprocessing step uses functions like feature_column.bucketized_column(…), feature_column.embedding_column(…) and so on.
After training, I try to save the model, but I get the following error:
File "h5py\_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
File "h5py\_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
File "h5py\h5o.pyx", line 202, in h5py.h5o.link
OSError: Unable to create link (name already exists)
I have tried the following to solve my problem:
I tried to exclude the optimizer as mentioned here: https://github.com/tensorflow/tensorflow/issues/27688.
I tried different versions of TensorFlow like 2.2 and 2.3.
I tried to reinstall h5py as mentioned here: RuntimeError: Unable to create link (name already exists) when I append hdf5 file?.
None of this worked!
Here is the relevant code of the Model:
(feature_columns, train_ds, val_ds, test_ds) = preprocessing.getPreProcessedDatasets(args.data, args.zip, args.batchSize)
feature_layer = tf.keras.layers.DenseFeatures(feature_columns, trainable=False)
model = tf.keras.models.Sequential([
    feature_layer,
    tf.keras.layers.Dense(1, activation=tf.nn.sigmoid)
])
model.compile(optimizer='sgd',
              loss='binary_crossentropy',
              metrics=['accuracy'])
paramString = "Arg-e{}-b{}-z{}".format(args.epoch, args.batchSize, bucketSizeGEO)
...
model.fit(train_ds,
          validation_data=val_ds,
          epochs=args.epoch,
          callbacks=[tensorboard_callback])
model.summary()
loss, accuracy = model.evaluate(test_ds)
print("Accuracy", accuracy)
paramString = paramString + "-a{:.4f}".format(accuracy)
outputName = "logReg" + datetime.datetime.now().strftime("%Y%m%d-%H%M%S") + paramString
if args.saveModel:
    filepath = "./saved_models/" + outputName + ".h5"
    model.save(filepath, save_format='h5')
The called function in the preprocessing module:
def getPreProcessedDatasets(filepath, zippath, batch_size, bucketSizeGEO):
    print("start preprocessing...")
    path = filepath
    data = pd.read_csv(path, dtype={
        "NAME1": np.str_,
        "NAME2": np.str_,
        "EMAIL1": np.str_,
        "ZIP": np.str_,
        "STREET": np.str_,
        "LONGITUDE": np.floating,
        "LATITUDE": np.floating,
        "RECEIVERTYPE": np.int64})
    feature_columns = []
    data = data.fillna("NaN")
    data = __preProcessName(data)
    data = __preProcessStreet(data)
    train, test = train_test_split(data, test_size=0.2, random_state=0)
    train, val = train_test_split(train, test_size=0.2, random_state=0)
    train_ds = __df_to_dataset(train, batch_size=batch_size)
    val_ds = __df_to_dataset(val, shuffle=False, batch_size=batch_size)
    test_ds = __df_to_dataset(test, shuffle=False, batch_size=batch_size)
    __buildFeatureColums(feature_columns, data, zippath, bucketSizeGEO, True)
    print("preprocessing completed")
    return (feature_columns, train_ds, val_ds, test_ds)
The function that calls the preprocessing functions for the individual features:
def __buildFeatureColums(feature_columns, data, zippath, bucketSizeGEO, addCrossedFeatures):
    feature_columns.append(__getFutureColumnLon(bucketSizeGEO))
    feature_columns.append(__getFutureColumnLat(bucketSizeGEO))
    (namew1_one_hot, namew2_one_hot) = __getFutureColumnsName(__getNumberOfWords(data, 'NAME1PRO'))
    feature_columns.append(namew1_one_hot)
    feature_columns.append(namew2_one_hot)
    feature_columns.append(__getFutureColumnStreet(__getNumberOfWords(data, 'STREETPRO')))
    feature_columns.append(__getFutureColumnZIP(2223, zippath))
    if addCrossedFeatures:
        feature_columns.append(__getFutureColumnCrossedNames(100))
        feature_columns.append(__getFutureColumnCrossedZIPStreet(100, 2223, zippath))
Functions related to the embeddings:
def __getFutureColumnsName(name_num_words):
    vocabulary_list = np.arange(0, name_num_words + 1, 1).tolist()
    namew1_voc = tf.feature_column.categorical_column_with_vocabulary_list(
        key='NAME1W1', vocabulary_list=vocabulary_list, dtype=tf.dtypes.int64)
    namew2_voc = tf.feature_column.categorical_column_with_vocabulary_list(
        key='NAME1W2', vocabulary_list=vocabulary_list, dtype=tf.dtypes.int64)
    dim = __getNumberOfDimensions(name_num_words)
    namew1_embedding = feature_column.embedding_column(namew1_voc, dimension=dim)
    namew2_embedding = feature_column.embedding_column(namew2_voc, dimension=dim)
    return (namew1_embedding, namew2_embedding)

def __getFutureColumnStreet(street_num_words):
    vocabulary_list = np.arange(0, street_num_words + 1, 1).tolist()
    street_voc = tf.feature_column.categorical_column_with_vocabulary_list(
        key='STREETW', vocabulary_list=vocabulary_list, dtype=tf.dtypes.int64)
    dim = __getNumberOfDimensions(street_num_words)
    street_embedding = feature_column.embedding_column(street_voc, dimension=dim)
    return street_embedding

def __getFutureColumnZIP(zip_num_words, zippath):
    zip_voc = feature_column.categorical_column_with_vocabulary_file(
        key='ZIP', vocabulary_file=zippath, vocabulary_size=zip_num_words,
        default_value=0)
    dim = __getNumberOfDimensions(zip_num_words)
    zip_embedding = feature_column.embedding_column(zip_voc, dimension=dim)
    return zip_embedding

The error OSError: Unable to create link (name already exists) when saving a model in h5 format is caused by duplicate variable names. Printing the names with for i, w in enumerate(model.weights): print(i, w.name) showed that the duplicates were the embedding_weights variables.
Normally, when feature columns are built, the distinct key passed to each feature column is used to build a distinct variable name. This worked correctly in TF 2.1 but broke in TF 2.2 and 2.3, and is reportedly fixed in the TF 2.4 nightly builds.

My workaround for TF 2.3 is based on @SajanGohil's comment, but my issue was with weight names (not layer names):
for i in range(len(model.weights)):
    model.weights[i]._handle_name = model.weights[i].name + "_" + str(i)
The same caveats apply: this approach manipulates TF internals and thus is not future-proof.
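Both the diagnosis and the workaround only operate on the weight name strings, so they can be sketched without TensorFlow. The stdlib sketch below finds names that occur more than once and applies the same "_<index>" suffix scheme as the loop above; the weight names in it are hypothetical (in a real model they would come from [w.name for w in model.weights]), and it does not touch any TensorFlow internals.

```python
from collections import Counter


def duplicate_names(names):
    """Return the names that occur more than once (these break HDF5 saving)."""
    counts = Counter(names)
    return sorted(name for name, n in counts.items() if n > 1)


def suffixed_names(names):
    """Same scheme as the workaround: append "_<index>" to every name."""
    return [name + "_" + str(i) for i, name in enumerate(names)]


# Hypothetical weight names; real ones come from [w.name for w in model.weights].
weight_names = [
    "dense_features/embedding_weights:0",
    "dense_features/embedding_weights:0",
    "dense/kernel:0",
    "dense/bias:0",
]
print(duplicate_names(weight_names))  # ['dense_features/embedding_weights:0']

fixed = suffixed_names(weight_names)
print(duplicate_names(fixed))  # []
```

The suffixed names are always unique: the text after the last underscore is the index, and no two weights share an index.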

I found that this error also occurs when I load a model from a ModelCheckpoint file, call model.compile on it with the same optimizer, metrics, and loss function, and train it.
But if I skip compiling it again with the same parameters, the error no longer shows up.

Tensorflow TypeError: Can not convert a NoneType into a Tensor or Operation

I am learning TensorFlow and was going through this step-by-step guide. The code below is exactly the same as on the website. However, when I run it, I get an error when trying to fit the model. The full traceback is as follows:
Traceback (most recent call last):
File "C:\users\name\desktop\python ml tutorial\embedding.py", line 49, in <module>
model.fit(x=padded_docs, y=labels, epochs=50, verbose=0)
File "C:\Users\name\AppData\Local\Programs\Python\Python37\lib\site-packages\keras\engine\training.py", line 1213, in fit
self._make_train_function()
File "C:\Users\name\AppData\Local\Programs\Python\Python37\lib\site-packages\keras\engine\training.py", line 316, in _make_train_function
loss=self.total_loss)
File "C:\Users\name\AppData\Local\Programs\Python\Python37\lib\site-packages\tensorflow_core\python\keras\optimizer_v2\optimizer_v2.py", line 506, in get_updates
return [self.apply_gradients(grads_and_vars)]
File "C:\Users\name\AppData\Local\Programs\Python\Python37\lib\site-packages\tensorflow_core\python\keras\optimizer_v2\optimizer_v2.py", line 441, in apply_gradients
kwargs={"name": name})
File "C:\Users\name\AppData\Local\Programs\Python\Python37\lib\site-packages\tensorflow_core\python\distribute\distribute_lib.py", line 1917, in merge_call
return self._merge_call(merge_fn, args, kwargs)
File "C:\Users\name\AppData\Local\Programs\Python\Python37\lib\site-packages\tensorflow_core\python\distribute\distribute_lib.py", line 1924, in _merge_call
return merge_fn(self._strategy, *args, **kwargs)
File "C:\Users\name\AppData\Local\Programs\Python\Python37\lib\site-packages\tensorflow_core\python\keras\optimizer_v2\optimizer_v2.py", line 494, in _distributed_apply
with ops.control_dependencies(update_ops):
File "C:\Users\name\AppData\Local\Programs\Python\Python37\lib\site-packages\tensorflow_core\python\framework\ops.py", line 5257, in control_dependencies
return get_default_graph().control_dependencies(control_inputs)
File "C:\Users\name\AppData\Local\Programs\Python\Python37\lib\site-packages\tensorflow_core\python\framework\func_graph.py", line 356, in control_dependencies
return super(FuncGraph, self).control_dependencies(filtered_control_inputs)
File "C:\Users\name\AppData\Local\Programs\Python\Python37\lib\site-packages\tensorflow_core\python\framework\ops.py", line 4691, in control_dependencies
c = self.as_graph_element(c)
File "C:\Users\name\AppData\Local\Programs\Python\Python37\lib\site-packages\tensorflow_core\python\framework\ops.py", line 3610, in as_graph_element
return self._as_graph_element_locked(obj, allow_tensor, allow_operation)
File "C:\Users\name\AppData\Local\Programs\Python\Python37\lib\site-packages\tensorflow_core\python\framework\ops.py", line 3699, in _as_graph_element_locked
(type(obj).__name__, types_str))
TypeError: Can not convert a NoneType into a Tensor or Operation.
And the full code is below:
from numpy import array
from keras.preprocessing.text import one_hot
from keras.preprocessing.sequence import pad_sequences
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Flatten
from keras.layers.embeddings import Embedding
# define document
docs = ['Well done!',
'Good work',
'Great effort',
'nice work',
'Excellent!',
'Weak',
'Poor effort!',
'not good',
'poor work',
'Could have done better.']
# define class labels
labels = array([1,1,1,1,1,0,0,0,0,0])
# integer-encode the documents
vocab_size = 50
encoded_docs = [one_hot(d, vocab_size) for d in docs]
print(encoded_docs)
# padding
max_length = 4
padded_docs = pad_sequences(encoded_docs, maxlen = max_length, padding = 'post')
print(padded_docs)
# define model
model = Sequential()
model.add(Embedding(vocab_size, 8, input_length=max_length))
model.add(Flatten())
model.add(Dense(1, activation='sigmoid'))
# compile the model
model.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics=['accuracy'])
# summarize
print(model.summary())
# fit the model
model.fit(x=padded_docs, y=labels, epochs=50, verbose=0)
# evaluate the model
loss, accuracy = model.evaluate(x=padded_docs, y=labels, verbose=0)
print("Accuracy: {}".format(accuracy))
What's going on here? The article was originally written in 2017, but it was last revised just a week ago. I imagine TensorFlow is being tweaked constantly, since it is still state-of-the-art and still needs a lot of improvement.
Any ideas on how to circumvent this?
Edit:
I started trying to figure out where the script goes wrong. I'll list what I find here; hopefully it will help us spot something:
In ops.py's control_dependencies() function, the control_inputs parameter has the following values: Tensor("Adam/gradients/gradients/loss/dense_1_loss/binary_crossentropy/logistic_loss_grad/Reshape_1:0", shape=(None, 1), dtype=float32), Tensor("Adam/gradients/gradients/loss/dense_1_loss/binary_crossentropy/logistic_loss/Log1p_grad/mul:0", shape=(None, 1), dtype=float32), and None. When it reaches None, the program crashes.
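For reference, the padding step in the code above turns each integer-encoded document into a row of exactly max_length entries, appending zeros at the end because of padding='post'. The pure-Python sketch below mirrors that behaviour under the assumption that Keras's default truncating='pre' applies (overlong sequences lose tokens from the front); pad_post is a hypothetical helper, not the Keras implementation, and the encodings are made up because one_hot() hashes words, so real indices may differ between runs.

```python
def pad_post(sequences, maxlen, value=0):
    """Pad with `value` at the end; truncate overlong sequences from the front.

    Sketch of pad_sequences(..., padding='post') with the default
    truncating='pre'; not the Keras implementation itself.
    """
    padded = []
    for seq in sequences:
        # Keep only the last `maxlen` tokens (truncating='pre').
        seq = list(seq)[-maxlen:] if maxlen > 0 else []
        # Append padding values after the tokens (padding='post').
        padded.append(seq + [value] * (maxlen - len(seq)))
    return padded


# Hypothetical encodings for 'Well done!', 'Excellent!' and 'Could have done better.'
encoded_docs = [[6, 16], [18], [49, 46, 16, 34]]
for row in pad_post(encoded_docs, maxlen=4):
    print(row)
# [6, 16, 0, 0]
# [18, 0, 0, 0]
# [49, 46, 16, 34]
```

Every row has the same length, which is what lets the Embedding layer use input_length=max_length.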

"Unknown RProp Optimizer" in Keras

I am trying to train a model using the RProp optimizer as detailed in this question and this question as well.
I downloaded the rprop.py script from this Github repository and added it to my Keras/tf codebase, at C:\mini\envs\aiml3\Lib\site-packages\tensorflow_core\python\keras\optimizer_v2.
In my R script (running in RStudio), I run the following to create my model:
model <- keras_model_sequential() %>%
  layer_dense(units = 2, activation = "sigmoid", input_shape = c(2)) %>% #logistic, input
  layer_dense(units = 1, activation = "sigmoid") #output
model %>% compile(
  optimizer = "rprop",
  loss = "binary_crossentropy",
  metrics = c("accuracy")
)
but it throws an error with the following traceback:
Error in py_call_impl(callable, dots$args, dots$keywords) :
ValueError: Unknown optimizer: rprop
Detailed traceback:
File "C:\mini\envs\aiml3\lib\site-packages\tensorflow_core\python\training\tracking\base.py", line 457, in _method_wrapper
result = method(self, *args, **kwargs)
File "C:\mini\envs\aiml3\lib\site-packages\tensorflow_core\python\keras\engine\training.py", line 251, in compile
self._set_optimizer(optimizer)
File "C:\mini\envs\aiml3\lib\site-packages\tensorflow_core\python\keras\engine\training.py", line 1454, in _set_optimizer
self.optimizer = optimizers.get(optimizer)
File "C:\mini\envs\aiml3\lib\site-packages\tensorflow_core\python\keras\optimizers.py", line 848, in get
return deserialize(config)
File "C:\mini\envs\aiml3\lib\site-packages\tensorflow_core\python\keras\optimizers.py", line 817, in deserialize
printable_module_name='optimizer')
File "C:\mini\envs\aiml3\lib\site-packages\tensorflow_core\python\keras\utils\generic_utils.py", line 180, in deserialize_keras_object
config,
It looks like my script isn't recognizing the optimizer, and I am not sure how to instantiate it in my model.

Basically the problem is that there is no 'rprop' optimizer in Keras. You can find a list of available optimizers here: https://keras.io/api/optimizers/
As a concrete workaround, you can use rmsprop instead. Check out the docs for a usage example.
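The "Unknown optimizer: rprop" message comes from name resolution: when compile() receives a string, Keras looks it up among the optimizers it already knows and raises a ValueError for anything else, so copying rprop.py into site-packages does not make the name "rprop" resolvable. The toy registry below sketches that lookup; the classes and the get_optimizer function are hypothetical stand-ins, not Keras's real code. Passing an optimizer object instead of a string bypasses the lookup entirely.

```python
# Toy stand-ins for optimizer classes; not the real Keras classes.
class SGD:
    pass


class RMSprop:
    pass


class Adam:
    pass


# In this toy, names are matched case-insensitively against a fixed table.
KNOWN_OPTIMIZERS = {cls.__name__.lower(): cls for cls in (SGD, RMSprop, Adam)}


def get_optimizer(identifier):
    """Resolve a string name, or pass an optimizer object through unchanged."""
    if isinstance(identifier, str):
        try:
            return KNOWN_OPTIMIZERS[identifier.lower()]()
        except KeyError:
            raise ValueError("Unknown optimizer: " + identifier) from None
    return identifier  # already an optimizer instance, no lookup needed


print(type(get_optimizer("rmsprop")).__name__)  # RMSprop
try:
    get_optimizer("rprop")
except ValueError as err:
    print(err)  # Unknown optimizer: rprop
```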

Loading Weights - NoneType Found

I am working on an LSTM for a final project. I've been following TensorFlow's tutorial here: https://www.tensorflow.org/tutorials/sequences/text_generation for most of it, especially for how to save and load the models. However, it's coming up with this error:
Traceback (most recent call last):
File "D:\xxx\Documents\Class Coding\Artificial Intelligence\Shelley\Writerbot.py", line 187, in <module>
restore_progress()
File "D:\xxx\Documents\Class Coding\Artificial Intelligence\Shelley\Writerbot.py", line 141, in restore_progress
shelley.load_weights(weights)
File "C:\Users\xxx\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\keras\engine\network.py", line 1508, in load_weights
if _is_hdf5_filepath(filepath):
File "C:\Users\xxx\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\keras\engine\network.py", line 1648, in _is_hdf5_filepath
return filepath.endswith('.h5') or filepath.endswith('.keras')
AttributeError: 'NoneType' object has no attribute 'endswith'
Here is my code for saving and restoring weights. As far as I can tell, the rest of the error comes from Keras:
def create_shelley(vocab, embedding, numunits, batch):
    """This is what actually creates a neural network."""
    shelley = tf.keras.Sequential([
        tf.keras.layers.Embedding(vocab, embedding,
                                  batch_input_shape=[batch, None]),
        lstm(numunits,
             return_sequences=True,
             recurrent_initializer='glorot_uniform',
             stateful=True),
        tf.keras.layers.Dense(vocab)
    ])
    return shelley

def train():
    """We create weight checkpoints as we train our neural network on files fed into it."""
    checkpoints = 'D:\\xxx\\Documents\\Class Coding\\Artificial Intelligence\\Shelley\\trainingcheckpoints'
    prefix = os.path.join(checkpoints, "ckpt_{epoch}")
    callback = tf.keras.callbacks.ModelCheckpoint(
        filepath=prefix,
        save_weights_only=True)
    print(epochsteps)
    history = shelley.fit(botfeed.repeat(), epochs=epochs, steps_per_epoch=epochsteps, callbacks=[callback])

def restore_progress():
    """Load the most recent weight checkpoint."""
    trainingcheckpoints = "D:\\Robin Pegau\\Documents\\Class Coding\\Artificial Intelligence\\Shelley\\trainingcheckpoints\\checkpoint"
    weights = tf.train.latest_checkpoint(trainingcheckpoints)
    shelley = create_shelley(vocab, embed, totalunits, batch = 1)
    shelley.load_weights(weights)
    shelley.build(tf.TensorShape([1, None]))

restore_progress()
There is a "checkpoint" file that has no file extension. There are also files named like "ckpt_[x].index" and "ckpt_[x].data-00000-of-00001".
Thank you all for your help in advance.
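tf.train.latest_checkpoint takes the checkpoint directory as its argument and returns None when it finds no checkpoint there; load_weights then fails on that None, as the 'NoneType' object has no attribute 'endswith' line shows. As a stdlib-only sketch, the hypothetical helper latest_ckpt_prefix below picks the highest-numbered ckpt_N prefix from the .index files described above. It assumes the "ckpt_{epoch}" naming used in train() and returns None explicitly, so the caller can check the result before loading.

```python
import os
import re
import tempfile


def latest_ckpt_prefix(checkpoint_dir):
    """Return the path prefix of the highest-numbered ckpt_N checkpoint, or None."""
    pattern = re.compile(r"^ckpt_(\d+)\.index$")
    best = None
    for name in os.listdir(checkpoint_dir):
        match = pattern.match(name)
        # Compare epoch numbers as integers, so ckpt_10 beats ckpt_2.
        if match and (best is None or int(match.group(1)) > best):
            best = int(match.group(1))
    if best is None:
        return None
    # load_weights() expects the prefix, without ".index" or ".data-..." suffixes.
    return os.path.join(checkpoint_dir, "ckpt_{}".format(best))


# Usage with empty placeholder files laid out like the directory described above.
with tempfile.TemporaryDirectory() as d:
    for name in ["checkpoint", "ckpt_2.index", "ckpt_2.data-00000-of-00001",
                 "ckpt_10.index", "ckpt_10.data-00000-of-00001"]:
        open(os.path.join(d, name), "w").close()
    print(os.path.basename(latest_ckpt_prefix(d)))  # ckpt_10
```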

Tensorflow: Global step must be from the same graph as loss

I'm trying to use TensorFlow to do some classification with the tf.contrib.layers package, and I've run into a problem I can't quite figure out. As far as I can tell from examples (e.g. this and its tutorial), everything related to the graph is handled by the API. I can download and run the same code in my environment perfectly well.
However, when I run my code, I get an error saying that my global step is not from the same graph as my loss, which seems bizarre: ValueError: Tensor("global_step:0", shape=(), dtype=int64_ref) must be from the same graph as Tensor("softmax_cross_entropy_loss/value:0", shape=(), dtype=float32). The error occurs during the construction of the train_op.
Here's my tensorflow code (I do have some other code for handling the loading of the data, but it doesn't use anything from tensorflow). Sorry that the code is sort of messy right now: I've been tearing it apart trying to figure this error out.
import numpy as np
import tensorflow as tf
from tensorflow.contrib.learn.python.learn.estimators import model_fn as model_fn_lib
import data  # my data loading module

def train(training_file, vocab_path, hidden_units=[10, 20, 10], estimator=tf.contrib.learn.DNNClassifier):
    """
    Given a training CSV file, train a Tensorflow neural network
    """
    training_set = data.load(training_file)
    vocab = tf.contrib.learn.preprocessing.VocabularyProcessor(data.DOC_LENGTH)
    vocab = vocab.restore(vocab_path)
    training_data = tf.one_hot(training_set.data, len(vocab.vocabulary_._mapping), dtype=tf.float32)
    training_targets = tf.constant(np.array(training_set.targets, dtype=np.int32))
    classifier = tf.contrib.learn.Estimator(model_fn=lambda features, targets, mode, params: model_fn(features, targets, mode, params, hidden_units))
    classifier.fit(input_fn=lambda: (training_data, training_targets), steps=2000)
    return classifier

def model_fn(features, targets, mode, params, hidden_units):
    if len(hidden_units) <= 0:
        raise ValueError("Hidden units must be an iterable of ints of length >= 1")
    # Define the network
    network = tf.contrib.layers.relu(features, hidden_units[0])
    for i in range(1, len(hidden_units)):
        network = tf.contrib.layers.relu(network, hidden_units[i])
    # Flatten the network
    network = tf.reshape(network, [-1, hidden_units[-1] * data.DOC_LENGTH])
    # Add dropout to enhance feature use
    network = tf.layers.dropout(inputs=network, rate=0.5, training=mode == tf.contrib.learn.ModeKeys.TRAIN)
    # Calculate the logits
    logits = tf.contrib.layers.fully_connected(network, 15)
    loss = None
    train_op = None
    if mode != tf.contrib.learn.ModeKeys.INFER:
        targets = tf.cast(tf.one_hot(targets, 15, 1, 0), dtype=tf.float32)
        loss = tf.losses.softmax_cross_entropy(logits=logits, onehot_labels=targets)
    if mode == tf.contrib.learn.ModeKeys.TRAIN:
        # This train_op causes the error
        train_op = tf.contrib.layers.optimize_loss(
            loss=loss,
            global_step=tf.train.get_global_step(),
            optimizer='Adam',
            learning_rate=0.01)
    predictions = {
        "classes": tf.argmax(input=logits, axis=1),
        "probabilities": tf.nn.softmax(logits, name="softmax_tensor")
    }
    return model_fn_lib.ModelFnOps(mode=mode, predictions=predictions, loss=loss, train_op=train_op)

def main(unusedargv):
    # ... parses arguments
    classifier = train(args.train_data, args.vocab)
    print(evaluate(classifier, args.train_data))
    print(evaluate(classifier, args.test_data))

if __name__ == "__main__":
    tf.app.run()
Here's the full stack trace:
File "categorize.py", line 126, in main
classifier = train(args.train_data, args.vocab)
File "categorize.py", line 39, in train
classifier.fit(input_fn=lambda: (training_data, training_targets), steps=2000)
File "/usr/local/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 280, in new_func
return func(*args, **kwargs)
File "/usr/local/lib/python3.6/site-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 426, in fit
loss = self._train_model(input_fn=input_fn, hooks=hooks)
File "/usr/local/lib/python3.6/site-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 934, in _train_model
model_fn_ops = self._call_legacy_get_train_ops(features, labels)
File "/usr/local/lib/python3.6/site-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 1003, in _call_legacy_get_train_ops
train_ops = self._get_train_ops(features, labels)
File "/usr/local/lib/python3.6/site-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 1162, in _get_train_ops
return self._call_model_fn(features, labels, model_fn_lib.ModeKeys.TRAIN)
File "/usr/local/lib/python3.6/site-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 1133, in _call_model_fn
model_fn_results = self._model_fn(features, labels, **kwargs)
File "categorize.py", line 37, in <lambda>
classifier = tf.contrib.learn.Estimator(model_fn=lambda features, targets, mode, params: model_fn(features, targets, mode, params, hidden_units))
File "categorize.py", line 73, in model_fn
learning_rate=0.01)
File "/usr/local/lib/python3.6/site-packages/tensorflow/contrib/layers/python/layers/optimizers.py", line 152, in optimize_loss
with vs.variable_scope(name, "OptimizeLoss", [loss, global_step]):
File "/usr/local/Cellar/python3/3.6.0_1/Frameworks/Python.framework/Versions/3.6/lib/python3.6/contextlib.py", line 82, in __enter__
return next(self.gen)
File "/usr/local/lib/python3.6/site-packages/tensorflow/python/ops/variable_scope.py", line 1410, in variable_scope
g = ops._get_graph_from_inputs(values) # pylint: disable=protected-access
File "/usr/local/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3968, in _get_graph_from_inputs
_assert_same_graph(original_graph_element, graph_element)
File "/usr/local/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3907, in _assert_same_graph
"%s must be from the same graph as %s." % (item, original_item))
ValueError: Tensor("global_step:0", shape=(), dtype=int64_ref) must be from the same graph as Tensor("softmax_cross_entropy_loss/value:0", shape=(), dtype=float32).

The two functions run in different graph contexts, so you need to use tf.Graph() in the calling function to set the default graph, as follows:
def train(...):
    with tf.Graph().as_default():
        ...
        ...
        training_data = tf.one_hot(training_set.data, len(vocab.vocabulary_._mapping), dtype=tf.float32)
        training_targets = tf.constant(np.array(training_set.targets, dtype=np.int32))
        classifier = tf.contrib.learn.Estimator(model_fn=lambda features, targets, mode, params: model_fn(features, targets, mode, params, hidden_units))
        classifier.fit(input_fn=lambda: (training_data, training_targets), steps=2000)
        return classifier

I figured out the problem! This may be specific to the Estimator interface, but basically I needed to move my TensorFlow variable definitions into the Estimator. I ended up writing a method to do this, but it also worked when I defined the variables in the lambda:
def train(training_file, vocab_path, hidden_units=[10, 20, 10]):
    """
    Given a training CSV file, train a Tensorflow neural network
    """
    training_set = data.load(training_file)
    vocab = tf.contrib.learn.preprocessing.VocabularyProcessor(data.DOC_LENGTH)
    vocab = vocab.restore(vocab_path)
    # Note not defining the variables here: keep plain NumPy data outside the Estimator
    training_data = training_set.data
    training_targets = np.array(training_set.targets, dtype=np.int32)
    classifier = tf.contrib.learn.Estimator(model_fn=lambda features, targets, mode, params: model_fn(features, targets, mode, params, hidden_units))
    # Note the variable definition here: the tensors are now created inside input_fn
    classifier.fit(
        input_fn=lambda: (
            tf.one_hot(training_data, len(vocab.vocabulary_._mapping), dtype=tf.float32),
            tf.constant(training_targets)),
        steps=2000)
    return classifier
