I've built a NN model with Keras in an Anaconda environment (I'm using Jupyter).
I want to open the log files I'm writing with TensorBoard so I can see the accuracy and loss graphs.
However, when I try to open the logs from the terminal, this error occurs: AttributeError: module 'tensorboard.util' has no attribute 'PersistentOpEvaluator'
Can anyone help me write these graphs and view them in TensorBoard?
This is my code:
hidden_size = 256
sl_model = keras.models.Sequential()
[...]
sl_model.add(keras.layers.Dense(max_length, activation='softmax'))
optimizer = keras.optimizers.Adam()
sl_model.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=['acc'])
batch_size = 128
epochs = 3
# Let's print a summary of the model
sl_model.summary()
# I'd like to access this log directory
cbk = keras.callbacks.TensorBoard("logging/keras_model")
print("\nStarting training...")
sl_model.fit(x_train, y_train, epochs=epochs, batch_size=batch_size,
             shuffle=True, validation_data=(x_dev, y_dev), callbacks=[cbk])
How can I fix this? Thank you!
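For reference, here is a minimal, self-contained sketch of the same logging setup. The toy random data and the tiny model are hypothetical stand-ins for `sl_model`, `x_train` and `x_dev`, and the log directory is a temporary folder instead of `logging/keras_model`. The `TensorBoard` callback typically writes event files into `train/` and `validation/` subfolders of `log_dir`, and the `tensorboard` CLI is then pointed at the parent directory.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf

# Hypothetical toy data standing in for x_train/y_train and x_dev/y_dev.
rng = np.random.default_rng(0)
x_train = rng.random((64, 10)).astype("float32")
y_train = rng.integers(0, 2, size=(64, 1)).astype("float32")
x_dev = rng.random((16, 10)).astype("float32")
y_dev = rng.integers(0, 2, size=(16, 1)).astype("float32")

model = tf.keras.Sequential([
    tf.keras.Input(shape=(10,)),
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])

# Logs go to <log_root>/keras_model; TensorBoard is launched on log_root.
log_root = tempfile.mkdtemp()
log_dir = os.path.join(log_root, "keras_model")
tensorboard_cb = tf.keras.callbacks.TensorBoard(log_dir=log_dir)

model.fit(x_train, y_train, epochs=3, batch_size=16,
          validation_data=(x_dev, y_dev), callbacks=[tensorboard_cb], verbose=0)

# Collect the event files the callback wrote.
event_files = [
    os.path.join(dirpath, name)
    for dirpath, _, names in os.walk(log_dir)
    for name in names
    if name.startswith("events.out.tfevents")
]
print(f"Run: tensorboard --logdir {log_root}")
```

With the question's paths, that would be `tensorboard --logdir logging` in a terminal, or `%load_ext tensorboard` followed by `%tensorboard --logdir logging` inside Jupyter. Either way it has to run in the same environment as the notebook, and once TensorBoard starts it should list `epoch_loss` and `epoch_accuracy` scalars.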
You need to delete the tensorboard directory in site-packages and then run pip install tensorboard --upgrade, assuming your TensorFlow version is up to date.
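An error like this usually points to a `tensorboard` package that does not match the installed TensorFlow. Before deleting anything, it may help to check which versions the environment actually sees, from both the terminal and the Jupyter kernel, since they can be different environments. This is a small sketch using only the standard library (Python 3.8+). The helper name `installed_versions` is hypothetical.

```python
from importlib import metadata


def installed_versions(packages=("tensorflow", "tensorboard", "keras")):
    """Return {package: version string, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return versions


for pkg, ver in installed_versions().items():
    print(f"{pkg:12s} {ver or 'not installed'}")
```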
Related
I'm new to Python and I'm a bit lost trying to set up saving a Keras model and its weights. Could someone with more experience help me, please?
I'm following this guide to learn how prediction systems work and how they could give advice for lottery games:
https://medium.com/#polanitzer/predicting-the-israeli-lottery-results-for-the-november-29-2022-game-using-an-artificial-191489eb2c10
On my Jurassic computer, training freezes at random epochs above 900. Reading the TensorFlow docs, I saw that it is possible to save the weights every epoch and continue from the previous one if training fails or the computer freezes.
For the checkpoint I did:
checkpoint_filepath="/home/ubuntu/Downloads/Lottery/checkpoints/lottery/"
model_checkpoint_callback = ModelCheckpoint(
    filepath=os.path.join(checkpoint_filepath,"weights-improvement.hd5"),
    monitor='val_accuracy',
    verbose=1,
    save_best_only=True,
    save_weights_only=True,
    save_freq='epoch',
    mode='max')
es = EarlyStopping(monitor='val_accuracy', patience=5)
callbacks_list = [model_checkpoint_callback, es]
And to load the model:
model.load_weights("/home/ubuntu/Downloads/Lottery/checkpoints/lottery/weights-improvement.hd5")
loss, acc = model.evaluate(train_samples, train_labels, verbose=2)
print("Restored model, accuracy: {:5.2f}%".format(100 * acc))
load_model('/home/ubuntu/Downloads/Lottery/lottery/')
I tried using both train_samples / train_labels and x_train / y_train, with no luck restoring. When I run all the code in the Jupyter notebook, training starts from the beginning every time (0.08% accuracy, even though the previous run had reached 60% before it froze).
And to train the model I did:
model.fit(x=x_train, y=y_train, batch_size=32, epochs=1200, verbose=2, callbacks=[model_checkpoint_callback], validation_split=0.22)
model.save('lottery')
I was reading the docs here:
https://www.tensorflow.org/tutorials/keras/save_and_load
What am I doing wrong?
Thanks in advance to anyone who can help!
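A few things in the code above may explain why training restarts every time:
- `fit` only receives `model_checkpoint_callback`, so `es` never runs.
- `val_accuracy` only exists if the model is compiled with `metrics=['accuracy']` and gets validation data.
- The weights are only restored if the same architecture is built and `load_weights` is called before `fit` runs again.
- The return value of `load_model(...)` is discarded.
- `model.save('lottery')` writes relative to the notebook's working directory.
- `.hd5` is not a standard extension. Recent Keras versions require `.weights.h5` when `save_weights_only=True`.

The sketch below puts those together on hypothetical toy data and a tiny stand-in model, not the guide's network. The hard-coded `initial_epoch = 3` is an assumption, standing in for whatever epoch you know finished last.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf

# Hypothetical toy data standing in for the guide's x_train / y_train.
rng = np.random.default_rng(1)
x_train = rng.random((100, 7)).astype("float32")
y_train = rng.integers(0, 2, size=(100, 1)).astype("float32")


def build_model():
    # Hypothetical architecture; it must match the one the weights came from.
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(7,)),
        tf.keras.layers.Dense(16, activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    # Without metrics=["accuracy"] there is no val_accuracy to monitor.
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model


checkpoint_dir = tempfile.mkdtemp()  # e.g. .../Lottery/checkpoints/lottery
best_path = os.path.join(checkpoint_dir, "weights-improvement.weights.h5")
last_path = os.path.join(checkpoint_dir, "last.weights.h5")
model_path = os.path.join(checkpoint_dir, "lottery.keras")


def make_callbacks():
    return [
        # Best weights so far, judged on validation accuracy.
        tf.keras.callbacks.ModelCheckpoint(best_path, monitor="val_accuracy", mode="max",
                                           save_best_only=True, save_weights_only=True),
        # Latest weights after every epoch, so a freeze loses at most one epoch.
        tf.keras.callbacks.ModelCheckpoint(last_path, save_weights_only=True),
        tf.keras.callbacks.EarlyStopping(monitor="val_accuracy", patience=5),
    ]


# First session: train and checkpoint as we go (all callbacks passed to fit).
model = build_model()
model.fit(x_train, y_train, batch_size=32, epochs=3, verbose=0,
          validation_split=0.22, callbacks=make_callbacks())

# Next session (e.g. after the freeze): rebuild, restore weights BEFORE fit().
resumed = build_model()
initial_epoch = 0
if os.path.exists(last_path):
    resumed.load_weights(last_path)
    initial_epoch = 3  # hypothetical: the last epoch you know finished
history = resumed.fit(x_train, y_train, batch_size=32, epochs=5, verbose=0,
                      validation_split=0.22, initial_epoch=initial_epoch,
                      callbacks=make_callbacks())

resumed.save(model_path)  # explicit path instead of a bare 'lottery'
restored = tf.keras.models.load_model(model_path)  # keep the return value
```

Weights alone do not carry the Adam optimizer's state. Saving the full model with `model.save` keeps it. Also, a fresh `save_best_only` callback forgets the previous best score. `ModelCheckpoint`'s `initial_value_threshold` argument, where available, can carry it over.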
Hi, I have tried to load my checkpoints, but I get the following error:
" W tensorflow/core/util/tensor_slice_reader.cc:95] Could not open ../codeOutputs/3DNewArchitectureWithRotation: Data loss: not an sstable (bad magic number): perhaps your file is in a different file format and you need to use a different restore operator?"
This is the code I have used:
checkpoint_filepath = '../codeOutputs/3DNewArchitectureWithRotation'
model_checkpoint_callback = tf.keras.callbacks.ModelCheckpoint(
    filepath=checkpoint_filepath,
    monitor='val_loss',
    verbose=0,
    save_best_only=False,
    save_weights_only=False,
    mode='auto',
    save_freq='epoch',
    options=None,
    initial_value_threshold=None,
)
Model.load_weights(checkpoint_filepath)
BestRegressor = Model.fit(aaaiTrainImages, afTrainPorosity, validation_data = (aaaiValidationImages, afValidationPorosity), epochs=Epochs, callbacks =[EarlyStop,model_checkpoint_callback], verbose=2)
It seems the checkpoints have been saved with the file type HDF document (application/x-hdf).
I would appreciate any help: I spent many days training my model before it suddenly crashed, so it would be really helpful if I could resume from the saved state instead of retraining it from scratch.
I was faced with the same issue. As others have pointed out, the issue derives from the argument save_weights_only=False, which saves the whole model (a directory of files) rather than just the weights. You can still call model.load_weights() and restore the model, but you get that unpleasant error. One approach I took was to use the following to restore the model without any errors or warnings:
import tensorflow as tf
m = tf.keras.models.load_model('/path/to/checkpoint/dir')
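The sketch below shows the same idea end to end. With `save_weights_only=False`, `ModelCheckpoint` writes a complete model (architecture, weights and optimizer state), so it is read back with `load_model` rather than `load_weights`, and the restored model can keep training. The toy images, targets and tiny Conv2D model are hypothetical stand-ins for `aaaiTrainImages`, `afTrainPorosity` and the real network. The `.keras` file name is an assumption: recent Keras versions require `.keras` (or `.h5`) for full-model checkpoints.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf

rng = np.random.default_rng(2)
x = rng.random((64, 8, 8, 1)).astype("float32")  # hypothetical stand-in images
y = rng.random((64, 1)).astype("float32")        # hypothetical porosity targets

model = tf.keras.Sequential([
    tf.keras.Input(shape=(8, 8, 1)),
    tf.keras.layers.Conv2D(4, 3, activation="relu"),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

checkpoint_filepath = os.path.join(tempfile.mkdtemp(), "checkpoint.keras")
checkpoint_cb = tf.keras.callbacks.ModelCheckpoint(
    filepath=checkpoint_filepath,
    monitor="val_loss",
    save_best_only=False,
    save_weights_only=False,  # full model, not a weights-only checkpoint
    save_freq="epoch",
)
model.fit(x, y, validation_split=0.25, epochs=2, verbose=0, callbacks=[checkpoint_cb])

# Restore the whole model (not load_weights) from the last checkpoint.
restored = tf.keras.models.load_model(checkpoint_filepath)
matches = np.allclose(model.predict(x, verbose=0), restored.predict(x, verbose=0), atol=1e-6)

# Continue training from where the checkpoint left off.
restored.fit(x, y, validation_split=0.25, epochs=1, verbose=0)
```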
I've been using Colab to train my models, but it's quite infuriating that so far I have only been able to save the weights to my Google Drive, not the whole model, or even model checkpoints.
I mounted Google Drive with:
from google.colab import drive
drive.mount('/content/gdrive')
And I know that I can read files from the Drive as this code works:
import numpy as np
with np.load("/content/gdrive/MyDrive/trainingData.npz") as f:
    dataX = f["dataX"]
    dataY = f["dataY"]
And I set up the TPU using the following:
%tensorflow_version 2.x
import tensorflow as tf
print("Tensorflow version " + tf.__version__)
try:
    tpu = tf.distribute.cluster_resolver.TPUClusterResolver()  # TPU detection
    print('Running on TPU ', tpu.cluster_spec().as_dict()['worker'])
except ValueError:
    raise BaseException('ERROR: Not connected to a TPU runtime; please see the previous cell in this notebook for instructions!')
tf.config.experimental_connect_to_cluster(tpu)
tf.tpu.experimental.initialize_tpu_system(tpu)
tpu_strategy = tf.distribute.experimental.TPUStrategy(tpu)
But when I run the following code, no model checkpoints get saved:
with tpu_strategy.scope():
    model = Sequential()
    model.add(LSTM(256, input_shape=(dataX.shape[1], dataX.shape[2])))
    model.add(Dropout(0.2))
    model.add(Dense(dataY.shape[1], activation="softmax"))
    model.compile(loss='categorical_crossentropy', optimizer='adam')
filepath="/content/gdrive/MyDrive/weights-improvement-{epoch:02d}-{loss:.4f}.hdf5"
checkpoint = ModelCheckpoint(filepath, monitor='loss', verbose=1, save_best_only=True, mode='min')
callbacks_list = [checkpoint]
model.fit(dataX, dataY, epochs=50, batch_size=128)
I can't even just save the model normally: model.save("/content/gdrive/MyDrive/model") gives:
UnimplementedError: File system scheme '[local]' not implemented (file: 'model/variables/variables_temp/part-00000-of-00001')
Encountered when executing an operation using EagerExecutor. This error cancels all future operations and poisons their output tensors.
The interesting thing is that I can still save model weights, via model.save_weights("/content/gdrive/MyDrive/model.h5")
However, as I want to be able to save the whole model for future training, just saving the weights is not satisfactory.
What errors have I made and how can I save my model?
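One likely reason no checkpoints appear is that `callbacks_list` is built but never passed to `model.fit`, so `ModelCheckpoint` never runs. The sketch below fixes that. It uses the default strategy from `tf.distribute.get_strategy()` as a stand-in for `tpu_strategy` so it runs on CPU or GPU, toy random sequences instead of `dataX`/`dataY`, and a temporary folder instead of `/content/gdrive/MyDrive`. It also uses a `.keras` extension, because recent Keras versions reject `.hdf5` for full-model checkpoints.

The TPU-specific save error is a separate issue. The TPU workers cannot write to the Colab VM's local/Drive paths themselves. In TF 2.x `tf.keras`, the TensorFlow TPU guide's approach is to route saving through the local host with `model.save(path, options=tf.saved_model.SaveOptions(experimental_io_device='/job:localhost'))`. That is not exercised here because it needs a TPU runtime.

```python
import glob
import os
import tempfile

import numpy as np
import tensorflow as tf

# Hypothetical toy sequences standing in for dataX / dataY.
rng = np.random.default_rng(3)
dataX = rng.random((64, 10, 1)).astype("float32")
dataY = tf.keras.utils.to_categorical(rng.integers(0, 5, size=64), num_classes=5)

# Stand-in for tpu_strategy: the default (no-op) strategy.
strategy = tf.distribute.get_strategy()
with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(dataX.shape[1], dataX.shape[2])),
        tf.keras.layers.LSTM(16),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Dense(dataY.shape[1], activation="softmax"),
    ])
    model.compile(loss="categorical_crossentropy", optimizer="adam")

out_dir = tempfile.mkdtemp()  # e.g. /content/gdrive/MyDrive on Colab
filepath = os.path.join(out_dir, "weights-improvement-{epoch:02d}-{loss:.4f}.keras")
checkpoint = tf.keras.callbacks.ModelCheckpoint(
    filepath, monitor="loss", verbose=1, save_best_only=True, mode="min")
callbacks_list = [checkpoint]

# The missing piece: hand the callbacks to fit().
model.fit(dataX, dataY, epochs=3, batch_size=16, callbacks=callbacks_list, verbose=0)

saved = sorted(glob.glob(os.path.join(out_dir, "weights-improvement-*.keras")))
print(saved)
```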
When using
model.compile(optimizer = tf.train.AdamOptimizer(),
              loss = 'sparse_categorical_crossentropy',
              metrics=['accuracy'])
in my Jupyter Notebook the following Error pops up:
module 'tensorflow._api.v2.train' has no attribute 'AdamOptimizer'
Tensorflow Version: 2.0.0-alpha0
Do you think the only possibility is to downgrade the TF version?
tf.train.AdamOptimizer() => tf.optimizers.Adam()
From https://www.tensorflow.org/versions/r2.0/api_docs/python/tf/optimizers
model.compile(optimizer = tf.keras.optimizers.Adam(),
              loss = 'sparse_categorical_crossentropy',
              metrics=['accuracy'])
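A minimal runnable version of that answer might look like the sketch below. The 28x28 input shape, the tiny model and the random data are hypothetical, chosen to match a typical `sparse_categorical_crossentropy` image-classification setup. The optimizer's default hyperparameters are spelled out explicitly.

```python
import numpy as np
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation="softmax"),
])

# TF 2.x replacement for tf.train.AdamOptimizer(); defaults shown explicitly.
optimizer = tf.keras.optimizers.Adam(learning_rate=0.001, beta_1=0.9, beta_2=0.999)
model.compile(optimizer=optimizer,
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Smoke test on random 28x28 inputs with integer labels (not real data).
x = np.random.rand(32, 28, 28).astype("float32")
y = np.random.randint(0, 10, size=(32,))
history = model.fit(x, y, epochs=1, verbose=0)
```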
I haven't tried 2.0 yet, but from what I've seen in the Dev Summit videos, you can use
model.compile(optimizer = 'adam',
              loss = 'sparse_categorical_crossentropy',
              metrics=['accuracy'])
tf.optimizers.Adam()
Is the way to go. No reason to downgrade.
There are lots of changes in tf 2.0 compared to 1.14.
Note that the parameter names of Adam have changed, too, e.g. beta1 is now beta_1; check the documentation in Meixu Song's link.
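To make the renames concrete, here is a small sketch that translates TF 1.x `tf.train.AdamOptimizer` keyword arguments into their `tf.keras.optimizers.Adam` names. The helper `adam_from_tf1_kwargs` and its mapping table are hypothetical, written for illustration only. `use_locking` has no Keras counterpart, so it is rejected. The default `epsilon` also differs (1e-8 in TF 1.x, 1e-7 in Keras), so pass it explicitly if exact parity matters.

```python
import tensorflow as tf

# TF 1.x AdamOptimizer keyword -> tf.keras.optimizers.Adam keyword.
_TF1_TO_TF2_ADAM = {
    "learning_rate": "learning_rate",
    "beta1": "beta_1",
    "beta2": "beta_2",
    "epsilon": "epsilon",
    "name": "name",
}


def adam_from_tf1_kwargs(**tf1_kwargs):
    """Hypothetical helper: build a TF2 Adam from TF1-style keyword arguments."""
    tf2_kwargs = {}
    for key, value in tf1_kwargs.items():
        if key not in _TF1_TO_TF2_ADAM:
            raise ValueError(f"no tf.keras Adam equivalent for {key!r}")
        tf2_kwargs[_TF1_TO_TF2_ADAM[key]] = value
    return tf.keras.optimizers.Adam(**tf2_kwargs)


opt = adam_from_tf1_kwargs(learning_rate=1e-4, beta1=0.8, beta2=0.99)
```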
I had the same error. I removed
tf.train.AdamOptimizer()
And I wrote
tf.optimizers.Adam()
Instead.
It should be:
tf.compat.v1.train.AdamOptimizer()
It's a minor change in the upgraded version.
Please use:
model.compile(optimizer=tf.optimizers.Adam(), loss="sparse_categorical_crossentropy")
Thanks!
I'm currently struggling with importing my exported Keras model into TensorFlow. The code worked fine with a sequential model: I was able to train the model in Python and then import it into my C++ application. Since I needed more resources, I decided to distribute the model onto several GPUs. Afterwards I was no longer able to import the model.
This is how I created my model before:
input_img = Input(shape=(imgDim, imgDim, 1))
# add several layers to net
model = Model(input_img, net)
model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])
model.fit(x_train, y_train,
          epochs=100,
          batch_size=100,
          shuffle=True,
          validation_data=(x_test, y_test))
saveKerasModelAsProtobuf(model, outpath)
This is how I export my model:
def saveKerasModelAsProtobuf(model, outputPath):
    signature = tf.saved_model.signature_def_utils.predict_signature_def(
        inputs={'image': model.input}, outputs={'scores': model.output})
    builder = tf.saved_model.builder.SavedModelBuilder(outputPath)
    builder.add_meta_graph_and_variables(
        sess=keras.backend.get_session(),
        tags=[tf.saved_model.tag_constants.SERVING],
        signature_def_map={
            tf.saved_model.signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY:
                signature
        }
    )
    builder.save()
    return
This is how I changed the code to run on multiple GPUs:
input_img = Input(shape=(imgDim, imgDim, 1))
# add several layers to net
model = Model(input_img, net)
parallel_model = multi_gpu_model(model, gpus=4)
parallel_model.compile(optimizer='adam',
                       loss='binary_crossentropy',
                       metrics=['accuracy'])
parallel_model.fit(x_train, y_train,
                   epochs=100,
                   batch_size=100,
                   shuffle=True,
                   validation_data=(x_test, y_test))
# export model rather than parallel_model:
saveKerasModelAsProtobuf(model, outpath)
When I try to import the model in C++ on a single GPU machine I get the following error, indicating that it's not actually the sequential model (as I would expect) but the parallel_model:
Cannot assign a device for operation 'replica_3/lambda_4/Shape': Operation was explicitly assigned to /device:GPU:3 but available devices are [ /job:localhost/replica:0/task:0/device:CPU:0 ]. Make sure the device specification refers to a valid device.
[[Node: replica_3/lambda_4/Shape = Shape[T=DT_FLOAT, _output_shapes=[[4]], out_type=DT_INT32, _device="/device:GPU:3"](input_1)]]
From what I read, they should share the same weights, but not the internal structure. What am I doing wrong? Is there a better/more generic way to export the model?
Thanks!
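The code above uses the TF 1.x session API. There, the exported session graph still contains the replica ops that `multi_gpu_model` added, even when `model` rather than `parallel_model` is passed in. `multi_gpu_model` was later deprecated in favor of `tf.distribute.MirroredStrategy`.

One way to get a clean, single-device model for export is the following. Train under the strategy, rebuild the same architecture outside it, and copy the weights across. The sketch below illustrates that under TF 2.x. `build_model`, `IMG_DIM` and the layers are hypothetical stand-ins for the question's `# add several layers to net`, and the data is random. For the C++ side you would still need a SavedModel: depending on the version, `export_model.save(some_dir)` in TF 2.x `tf.keras` or `export_model.export(some_dir)` in Keras 3. That step is not shown here.

```python
import numpy as np
import tensorflow as tf

IMG_DIM = 16  # hypothetical stand-in for imgDim


def build_model(img_dim):
    # Hypothetical layers standing in for "# add several layers to net".
    input_img = tf.keras.Input(shape=(img_dim, img_dim, 1))
    net = tf.keras.layers.Conv2D(8, 3, activation="relu")(input_img)
    net = tf.keras.layers.GlobalAveragePooling2D()(net)
    net = tf.keras.layers.Dense(1, activation="sigmoid")(net)
    return tf.keras.Model(input_img, net)


rng = np.random.default_rng(4)
x_train = rng.random((32, IMG_DIM, IMG_DIM, 1)).astype("float32")
y_train = rng.integers(0, 2, size=(32, 1)).astype("float32")

# Data-parallel training: one replica per visible GPU (falls back to CPU).
strategy = tf.distribute.MirroredStrategy()
with strategy.scope():
    trained = build_model(IMG_DIM)
    trained.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
trained.fit(x_train, y_train, epochs=1, batch_size=8, verbose=0)

# Rebuild the same architecture outside the strategy scope (intended to keep
# replica/device placement out of the exported model), then copy the weights.
export_model = build_model(IMG_DIM)
export_model.set_weights(trained.get_weights())
```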