I am sorry if this question seems pretty straightforward, but after reading the Keras save and restore help page:
https://www.tensorflow.org/beta/tutorials/keras/save_and_restore_models
I do not understand how to use "ModelCheckpoint" for saving during training. The help page says it should produce 3 files, but I see only one, MODEL.ckpt.
Here is my code:
checkpoint_dir = FolderName + "/tmp/model.ckpt"
cp_callback = k.callbacks.ModelCheckpoint(checkpoint_dir, verbose=1, save_weights_only=True)
parallel_model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate), loss=my_cost_MSE, metrics=['accuracy'])
parallel_model.fit(image, annotation, epochs=epoch,
                   batch_size=batch_size, steps_per_epoch=10,
                   validation_data=(image_val, annotation_val), validation_steps=num_batch_val, callbacks=[cp_callback])
Also, when I want to load the weights after training with:
model = k.models.load_model(file_checkpoint)
I get the error:
"raise ValueError('Unknown ' + printable_module_name + ':' + object_name)
ValueError: Unknown loss function:my_cost_MSE"
my_cost_MSE is the custom cost function I use in training.
First of all, it looks like you are using the tf.keras (from TensorFlow) implementation rather than keras (from the keras-team/keras repo). In this case, as stated in the tf.keras guide:
When saving a model's weights, tf.keras defaults to the checkpoint
format. Pass save_format='h5' to use HDF5.
On the other hand, note that adding the ModelCheckpoint callback is usually roughly equivalent to calling model.save(...) at the end of each epoch, which is why you would expect three files to be saved (according to the checkpoint format).
The reason it's not doing so is that, with save_weights_only=True, you are saving just the weights. This is roughly equivalent to replacing the call to model.save with model.save_weights at the end of each epoch. Hence, the only file being saved is the one with the weights.
From here, you can proceed in two different ways:
Storing just the weights
You need to build your model (its structure, so to speak) beforehand and then call model.load_weights instead of keras.models.load_model:
model = MyModel(...) # Your model definition as used in training
model.load_weights(file_checkpoint)
Note that in this case, you won't have problems with custom definitions (my_cost_MSE) since you are just loading model weights.
Storing the whole model
Another way to proceed is to store the whole model and load it accordingly:
cp_callback = k.callbacks.ModelCheckpoint(
    checkpoint_dir, verbose=1,
    save_weights_only=False
)
parallel_model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate),
    loss=my_cost_MSE,
    metrics=['accuracy']
)
parallel_model.fit(..., callbacks=[cp_callback])
Then you could load it by:
model = k.models.load_model(file_checkpoint, custom_objects={"my_cost_MSE": my_cost_MSE})
Note that in this latter case, you need to specify custom_objects since its definition is needed to deserialize the model.
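Below is a minimal, self-contained sketch of the weights-only route on TensorFlow 2.x. The tiny Sequential model, the random data and the temporary path are placeholders standing in for parallel_model and your real dataset, and my_cost_MSE here is a hypothetical re-implementation of your loss. The checkpoint name ends in ".weights.h5": as far as I know, Keras 3 requires that suffix when save_weights_only=True, and tf.keras 2.x writes HDF5 for any ".h5" name. The architecture is then rebuilt and the weights restored with load_weights, so no custom_objects are needed.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf


def my_cost_MSE(y_true, y_pred):
    # Hypothetical stand-in for the custom loss from the question.
    return tf.reduce_mean(tf.square(y_true - y_pred))


def build_model():
    # The architecture must be rebuilt in code: a weights-only file does not store it.
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(4,)),
        tf.keras.layers.Dense(8, activation="relu"),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
                  loss=my_cost_MSE)
    return model


ckpt_path = os.path.join(tempfile.mkdtemp(), "model.weights.h5")
cp_callback = tf.keras.callbacks.ModelCheckpoint(ckpt_path, save_weights_only=True, verbose=0)

# Random placeholder data instead of image/annotation.
x = np.random.rand(32, 4).astype("float32")
y = np.random.rand(32, 1).astype("float32")

model = build_model()
model.fit(x, y, epochs=2, batch_size=8, verbose=0, callbacks=[cp_callback])

# Rebuild the same architecture and restore the weights saved after the last epoch.
restored = build_model()
restored.load_weights(ckpt_path)

same = all(np.allclose(a, b) for a, b in zip(model.get_weights(), restored.get_weights()))
print(os.path.exists(ckpt_path), same)
```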
Keras has a save command. It saves all the details needed to rebuild the model.
(from the Keras docs)
from keras.models import load_model
model.save('my_model.h5') # creates a HDF5 file 'my_model.h5'
del model # deletes the existing model
# returns an identical compiled model
model = load_model('my_model.h5')
Related
I have tried tons of methods like onnx2keras, pytorch2keras and so on. But there would always be something wrong...
Since my model is not really complicated (just a ResNet18 encoder + decoder with some skip connections), I'm considering simply transferring the weights layer by layer from PyTorch to Keras.
Before I try, I'd like to ask if anyone has similar experience. I know there's a set_weights method, but that's for Keras-to-Keras, so nothing special. However, Keras is an object-based model, so how can I assign name-based weights, e.g. 'encoder.bn1.bias', 'encoder.bn1.running_mean', 'encoder.bn1.running_var', to a BN layer? I don't want TF1.x solutions because all of my work is on TF2.x.
So in my opinion, it would be something like:
# 1. Save weights and names from the PyTorch model
weights_dict = torch_model.state_dict()
# 2. Construct the Keras model
keras_model = tf.keras.models.Model(...)
# 3. Now load weights for each layer in the Keras model
for var_name, weight in weights_dict.items():
    # Assign conv with weight 'encoder.conv1.weight'
    # Assign BN with 'encoder.bn1.weight', 'encoder.bn1.bias', 'encoder.bn1.running_mean', 'encoder.bn1.running_var', 'encoder.bn1.num_batches_tracked'
    pass  # this is the part I don't know how to write
But I don't know how... Looking forward to your opinions!
Could you try pt2keras and see if it works?
link: https://github.com/JWLee89/pt2keras/
To install pt2keras, type the following in the terminal:
pip install -U pt2keras
Below is a simple example for converting resnet18.
import tensorflow as tf
from torchvision.models.resnet import resnet18
from pt2keras import Pt2Keras

if __name__ == '__main__':
    input_shape = (1, 3, 224, 224)
    # Grab model
    model = resnet18(pretrained=False).eval()
    # Create pt2keras object
    converter = Pt2Keras()
    # Convert model
    keras_model: tf.keras.Model = converter.convert(model, input_shape, strict=True)
    # Save the model
    keras_model.save('output_model.h5')
    # Do whatever else you want afterwards ...
I have attached the converted keras model visualized using netron:
Unfortunately, as far as I know, you cannot attach name-based weights to individual parameters in Keras as you can in PyTorch, since Keras is layer-based. However, you can name the batch-norm layer, which I am guessing is not very useful to you.
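That said, once the Keras layers are named to mirror the PyTorch modules, the name-based PyTorch entries can be regrouped per layer and handed to each layer's set_weights. Below is a NumPy-only sketch of that regrouping; the helper names and the Keras layer names are hypothetical, and the state dict is filled with fake arrays. It relies on two layout facts: PyTorch Conv2d kernels are (out, in, kh, kw) while Keras Conv2D kernels are (kh, kw, in, out), and Keras BatchNormalization expects [gamma, beta, moving_mean, moving_variance]. Also note that PyTorch BatchNorm defaults to eps=1e-5 while Keras uses epsilon=1e-3, so you would set epsilon=1e-5 on the Keras layers.

```python
import numpy as np


def torch_conv2d_to_keras(state, prefix):
    """Map PyTorch Conv2d weights to the list Keras Conv2D.set_weights() expects.

    PyTorch kernel layout: (out, in, kh, kw); Keras kernel layout: (kh, kw, in, out).
    ResNet convolutions have no bias, so the Keras layer needs use_bias=False.
    """
    kernel = np.transpose(state[prefix + ".weight"], (2, 3, 1, 0))
    bias = state.get(prefix + ".bias")
    return [kernel] if bias is None else [kernel, bias]


def torch_bn_to_keras(state, prefix):
    """Map PyTorch BatchNorm2d entries to Keras order [gamma, beta, moving_mean, moving_variance].

    num_batches_tracked has no Keras counterpart and is simply ignored.
    """
    suffixes = (".weight", ".bias", ".running_mean", ".running_var")
    return [state[prefix + s] for s in suffixes]


# Fake arrays standing in for {k: v.cpu().numpy() for k, v in torch_model.state_dict().items()}
state = {
    "encoder.conv1.weight": np.random.rand(64, 3, 7, 7).astype("float32"),
    "encoder.bn1.weight": np.ones(64, dtype="float32"),
    "encoder.bn1.bias": np.zeros(64, dtype="float32"),
    "encoder.bn1.running_mean": np.zeros(64, dtype="float32"),
    "encoder.bn1.running_var": np.ones(64, dtype="float32"),
    "encoder.bn1.num_batches_tracked": np.array(0),
}

conv_weights = torch_conv2d_to_keras(state, "encoder.conv1")
bn_weights = torch_bn_to_keras(state, "encoder.bn1")
print(conv_weights[0].shape, len(bn_weights))  # (7, 7, 3, 64) 4

# With Keras layers named to mirror the PyTorch modules (hypothetical names):
#   keras_model.get_layer("encoder_conv1").set_weights(conv_weights)
#   keras_model.get_layer("encoder_bn1").set_weights(bn_weights)
```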
According to keras.io:
Once the model is created, you can config the model with losses and
metrics with model.compile().
But this explanation does not provide enough information about what exactly compiling model does.
The documentation describes it simply as: "Configures the model for training."
Personally, I wouldn't call it compile, because what it does has nothing to do with compilation in the computer-science sense, and it is confusing to think about machine learning and compilation at the same time.
It's just a method that does configuration:
It just stores the arguments you pass it: optimizer, loss function, metrics, and whether to run eagerly. You can call it multiple times; each call simply overwrites the settings from the previous one.
My suggestion to the TensorFlow developers would be to rename it to configure in the short term and, perhaps in the future (not that important), to move to one setter (or a factory/builder pattern) for each configuration argument.
Here's the body of the method from the tf.keras source:
base_layer.keras_api_gauge.get_cell('compile').set(True)
with self.distribute_strategy.scope():
  if 'experimental_steps_per_execution' in kwargs:
    logging.warn('The argument `steps_per_execution` is no longer '
                 'experimental. Pass `steps_per_execution` instead of '
                 '`experimental_steps_per_execution`.')
    if not steps_per_execution:
      steps_per_execution = kwargs.pop('experimental_steps_per_execution')

  self._validate_compile(optimizer, metrics, **kwargs)
  self._run_eagerly = run_eagerly

  self.optimizer = self._get_optimizer(optimizer)
  self.compiled_loss = compile_utils.LossesContainer(
      loss, loss_weights, output_names=self.output_names)
  self.compiled_metrics = compile_utils.MetricsContainer(
      metrics, weighted_metrics, output_names=self.output_names)

  self._configure_steps_per_execution(steps_per_execution or 1)

  # Initializes attrs that are reset each time `compile` is called.
  self._reset_compile_cache()
  self._is_compiled = True

  self.loss = loss or {}  # Backwards compat.
model.compile is related to training your model. Your weights need to be optimized, and the optimizer you pass to compile is what updates them during fit(), so that the loss decreases (and, typically, the accuracy increases). The optimizer is just one of the input parameters, alongside the loss and the metrics:
model.compile(
optimizer='rmsprop', loss='sparse_categorical_crossentropy', metrics='acc'
)
These are the main inputs. You can find more details in the TensorFlow documentation at the link below:
https://www.tensorflow.org/api_docs/python/tf/keras/Model#compile
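A small sketch, assuming TensorFlow 2.x, that illustrates the "it only configures" point: compiling twice just replaces the stored optimizer, and neither call touches the model's weights. The toy model and both optimizer choices are arbitrary placeholders.

```python
import numpy as np
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(3,)),
    tf.keras.layers.Dense(2),
])
weights_before = [w.copy() for w in model.get_weights()]

# First configuration.
adam = tf.keras.optimizers.Adam(learning_rate=1e-3)
model.compile(optimizer=adam, loss="mse", metrics=["mae"])

# Calling compile again simply overwrites the previous configuration.
sgd = tf.keras.optimizers.SGD(learning_rate=1e-2)
model.compile(optimizer=sgd, loss="mae")

# No training has happened, so the weights are exactly as initialised.
weights_unchanged = all(
    np.array_equal(a, b) for a, b in zip(weights_before, model.get_weights())
)
print(model.optimizer is sgd, weights_unchanged)
```

The weights only change once fit() (or a custom training loop) actually applies the optimizer.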
I want to save the best model and then load it during the test. So I used the following method:
def train():
    best_acc = 0.0
    best_state = None
    # training steps …
    if acc > best_acc:
        best_state = model.state_dict()
        best_acc = acc
    return best_state
Then, in the main function I used:
model.load_state_dict(best_state)
to resume the model.
However, I found that best_state is always the same as the last state during training, not the best state. Does anyone know the reason and how to avoid it?
By the way, I know I can use torch.save(the_model.state_dict(), PATH) and then load the model by
the_model.load_state_dict(torch.load(PATH)).
However, I don’t want to save the parameters to file as train and test functions are in one file.
model.state_dict() returns an OrderedDict (from collections import OrderedDict) whose values reference the model's live parameter tensors, so they keep changing as training continues.
To fix the problem, use:
import copy
Instead of:
best_state = model.state_dict()
you should use:
best_state = copy.deepcopy(model.state_dict())
A deep (not shallow) copy clones the tensors themselves, so later training steps no longer mutate best_state.
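To see the aliasing without PyTorch, here is a framework-free toy. ToyModel is a hypothetical stand-in for an nn.Module whose state_dict() hands back references to its live parameters; a Python list plays the role of a tensor that optimizer.step() updates in place.

```python
import copy
from collections import OrderedDict


class ToyModel:
    """Hypothetical stand-in for nn.Module: state_dict() returns live references."""

    def __init__(self):
        self.weight = [0.0]

    def state_dict(self):
        # Same list object as self.weight, not a copy (like state_dict() tensors).
        return OrderedDict(weight=self.weight)

    def step(self, value):
        # In-place update, like optimizer.step() modifying parameters.
        self.weight[0] = value


model = ToyModel()
model.step(1.0)                               # the "best" epoch
aliased = model.state_dict()                  # shares storage with the model
snapshot = copy.deepcopy(model.state_dict())  # independent copy
model.step(2.0)                               # a later, worse epoch

print(aliased["weight"], snapshot["weight"])  # [2.0] [1.0]
```

With real tensors, {k: v.detach().clone() for k, v in model.state_dict().items()} is another common way to take the same kind of independent snapshot.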
You may check my other answer on saving the state dict in PyTorch.
When you are saving the state of the model you should save the following things in the network
1) Optimizer state and
2) Model's state dict
You can define a method in your model class as follows:
import torch
import torch.nn as nn

class Model(nn.Module):
    ...  # layers and forward() as usual

    def save_state(self, state, filename='model.pth.tar'):
        torch.save(state, filename)

# When you are saving the state, do as follows:
model = Model()  # for example
model.save_state({'state_dict': model.state_dict(), 'optimizer': optimizer.state_dict()})
The saved model will be stored as model.pth.tar (for example).
Then, when loading, do the following:
checkpoint = torch.load('model.pth.tar')
model.load_state_dict(checkpoint['state_dict'])
optimizer.load_state_dict(checkpoint['optimizer'])
Hope this will help you.
How would one best add a preprocessing layer (e.g., subtract mean and divide by std) to a Keras (v2.0.5) model such that the model becomes fully self-contained for deployment (possibly in a C++ environment)? I tried:
def getmodel():
    model = Sequential()
    mean_tensor = K.placeholder(shape=(1, 1, 3), name="mean_tensor")
    std_tensor = K.placeholder(shape=(1, 1, 3), name="std_tensor")
    preproc_layer = Lambda(lambda x: (x - mean_tensor) / (std_tensor + K.epsilon()),
                           input_shape=im_shape)
    model.add(preproc_layer)
    # Build the remaining model, perhaps set weights,
    ...
    return model
Then, somewhere else set the mean/std on the model. I found the set_value function so tried the following:
m = getmodel()
mean, std = get_mean_std(..)
graph = K.get_session().graph
mean_tensor = graph.get_tensor_by_name("mean_tensor:0")
std_tensor = graph.get_tensor_by_name("std_tensor:0")
K.set_value(mean_tensor, mean)
K.set_value(std_tensor, std)
However the set_value fails with
AttributeError: 'Tensor' object has no attribute 'assign'
So set_value does not work as (the limited) docs would suggest. What would the proper way be to do this? Get the TF session, wrap all the training code in a with (session) and use feed_dict? I would have thought there would be a native keras way to set tensor values.
Instead of using a placeholder I tried setting the mean/std on model construction using either K.variable or K.constant:
mean_tensor = K.variable(mean, name="mean_tensor")
std_tensor = K.variable(std, name="std_tensor")
This avoids any set_value problems. However, I notice that if I try to train that model (which I know is not particularly efficient, as the normalisation is redone for every image), it works, but at the end of the first epoch the ModelCheckpoint handler fails with a very deep stack trace:
...
File "/Users/dgorissen/Library/Python/2.7/lib/python/site-packages/keras/models.py", line 102, in save_model
'config': model.get_config()
File "/Users/dgorissen/Library/Python/2.7/lib/python/site-packages/keras/models.py", line 1193, in get_config
return copy.deepcopy(config)
File "/usr/local/Cellar/python/2.7.12_2/Frameworks/Python.framework/Versions/2.7/lib/python2.7/copy.py", line 163, in deepcopy
y = copier(x, memo)
...
File "/usr/local/Cellar/python/2.7.12_2/Frameworks/Python.framework/Versions/2.7/lib/python2.7/copy.py", line 190, in deepcopy
y = _reconstruct(x, rv, 1, memo)
File "/usr/local/Cellar/python/2.7.12_2/Frameworks/Python.framework/Versions/2.7/lib/python2.7/copy.py", line 343, in _reconstruct
y.__dict__.update(state)
AttributeError: 'NoneType' object has no attribute 'update'
Update 1:
I also tried a different approach. Train a model as normal, then just prepend a second model that does the preprocessing:
# Regular model, trained as usual
model = ...
# Preprocessing model
preproc_model = Sequential()
mean_tensor = K.constant(mean, name="mean_tensor")
std_tensor = K.constant(std, name="std_tensor")
preproc_layer = Lambda(lambda x: (x - mean_tensor) / (std_tensor + K.epsilon()),
input_shape=im_shape, name="normalisation")
preproc_model.add(preproc_layer)
# Prepend the preprocessing model to the regular model
full_model = Model(inputs=[preproc_model.input],
outputs=[model(preproc_model.output)])
# Save the complete model to disk
full_model.save('full_model.hdf5')
This seems to work until the save() call, which fails with the same deep stack trace as above.
Perhaps the Lambda layer is the problem, but judging from this issue it seems it should serialise properly.
So overall, how do I append a normalisation layer to a Keras model without compromising the ability to serialise it (and export to pb)?
I'm sure you can get it working by dropping down to TF directly (e.g. this thread, or using tf.Transform), but I would have thought it would be possible in Keras directly.
Update 2:
So I found that the deep stack trace could be avoided by doing
def foo(x):
    bar = K.variable(baz, name="baz")
    return x - bar
So defining bar inside the function instead of capturing from the outside scope.
I then found I could save to disk but could not load from disk. There is a suite of GitHub issues around this. I used the workaround specified in #5396 to pass all variables in as arguments, which then allowed me to save and load.
Thinking I was almost there I continued with my approach from Update 1 above of stacking a pre-processing model in front of a trained model.
This then led to "Model is not compiled" errors. I worked around those, but in the end I never managed to get the following to work:
Build and train a model
Save it to disk
Load it, prepend a preprocessing model
Export the stacked model to disk as a frozen pb file
Load the frozen pb from disk
Apply it on some unseen data
I got it to the point where there were no errors, but could not get the normalisation tensors to propagate through to the frozen pb. Having spent too much time on this I then gave up and switched to the somewhat less elegant approach of:
Build a model with the preprocessing operations in the model from the start but set to a no-op (mean=0, std=1)
Train the model, build an identical model but this time with the proper values for mean/std.
Transfer the weights
Export and freeze the model to pb
All this now fully works as expected. Small overhead on training but negligible for me.
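A rough in-memory sketch of that final workaround on TensorFlow 2.x tf.keras (not the Keras 2.0.5 and frozen-pb pipeline from the question): build_model is a hypothetical builder that bakes mean and std into a Lambda layer, the data and statistics are random placeholders, and the freeze/export step is left out. Because the Lambda layer owns no weights, the trained weights transfer one-to-one into the model built with the real statistics.

```python
import numpy as np
import tensorflow as tf


def build_model(mean, std):
    # mean/std are constants baked into the graph; zeros/ones make it a no-op.
    inputs = tf.keras.Input(shape=(3,))
    x = tf.keras.layers.Lambda(lambda t: (t - mean) / std)(inputs)
    x = tf.keras.layers.Dense(4, activation="relu")(x)
    outputs = tf.keras.layers.Dense(1)(x)
    return tf.keras.Model(inputs, outputs)


rng = np.random.default_rng(0)
mean = rng.random(3).astype("float32")
std = (rng.random(3) + 0.5).astype("float32")
raw = rng.random((16, 3)).astype("float32")
targets = rng.random((16, 1)).astype("float32")

# 1. Train with a no-op normalisation (mean=0, std=1) on data normalised outside the model.
train_model = build_model(np.zeros(3, "float32"), np.ones(3, "float32"))
train_model.compile(optimizer="adam", loss="mse")
train_model.fit((raw - mean) / std, targets, epochs=1, verbose=0)

# 2. Identical architecture with the real statistics; transfer the trained weights.
deploy_model = build_model(mean, std)
deploy_model.set_weights(train_model.get_weights())

# The deploy model now accepts raw inputs directly.
same = np.allclose(deploy_model.predict(raw, verbose=0),
                   train_model.predict((raw - mean) / std, verbose=0), atol=1e-5)
print(same)
```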
I still haven't figured out how to set the value of a tensor variable in Keras (without raising the assign exception), but I can do without it for now.
I will accept Daniel's answer, as it got me going in the right direction.
Related question:
Add Tensorflow pre-processing to existing Keras model (for use in Tensorflow Serving)
When creating a variable, you must give it the "value", not the shape:
mean_tensor = K.variable(mean, name="mean_tensor")
std_tensor = K.variable(std, name="std_tensor")
Now, in Keras, you don't have to deal with session, graph and things like that. You work only with layers, and inside Lambda layers (or loss functions) you may work with tensors.
For our Lambda layer, we need a more complex function, because shapes must match before you do a calculation. Since I don't know im_shape, I assumed it has 3 dimensions:
def myFunc(x):
    # Reshape x so it is compatible with the mean and std tensors:
    x = K.reshape(x, (-1, 1, 1, 3))
    # -1 is like a wildcard: it becomes whatever value matches the rest of the given shape.
    # I chose (1, 1, 3) because it's the same shape as mean_tensor and std_tensor.
    result = (x - mean_tensor) / (std_tensor + K.epsilon())
    # Now reshape it back to the shape it had before (which I don't know).
    # -1 is still necessary: it's the batch size.
    return K.reshape(result, (-1, im_shape[0], im_shape[1], im_shape[2]))
Now we create the Lambda layer. It also needs an output shape, because with a custom operation the system does not necessarily know the output shape:
model.add(Lambda(myFunc,input_shape=im_shape, output_shape=im_shape))
After this, just compile the model and train it. (Often with model.compile(...) and model.fit(...))
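As a side check of the arithmetic in myFunc, here is a NumPy sketch (NumPy broadcasting follows the same rules as TensorFlow's element-wise ops). The image shape and the statistics are hypothetical values. It shows that the reshape round-trip gives the same result as letting broadcasting align the trailing channel axis directly, so the reshape is harmless but, as far as broadcasting goes, not strictly required.

```python
import numpy as np

im_shape = (4, 5, 3)  # hypothetical (height, width, channels)
rng = np.random.default_rng(0)
mean = rng.random((1, 1, 3)).astype("float32")
std = (rng.random((1, 1, 3)) + 0.5).astype("float32")
eps = 1e-7  # K.epsilon() defaults to 1e-7
x = rng.random((2, *im_shape)).astype("float32")  # a batch of 2 images

# The reshape round-trip used in myFunc.
flat = x.reshape(-1, 1, 1, 3)
via_reshape = ((flat - mean) / (std + eps)).reshape(-1, *im_shape)

# Broadcasting aligns trailing axes: (2, 4, 5, 3) combines with (1, 1, 3) directly.
via_broadcast = (x - mean) / (std + eps)

print(np.allclose(via_reshape, via_broadcast))  # True
```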
If you want to compute everything inside the function, including the mean and std, that works too:
def myFunc(x):
    # Assuming inputs of shape (size, width, height, channels).
    # Note: these are statistics of the current batch, not of the whole training set.
    mean_tensor = K.mean(x, axis=[0, 1, 2])
    std_tensor = K.std(x, axis=[0, 1, 2])
    x = K.reshape(x, (-1, 3))  # mean and std have shape (3,) here
    result = (x - mean_tensor) / (std_tensor + K.epsilon())
    return K.reshape(result, (-1, width, height, 3))
Now, all this is extra calculation in your model and will consume processing.
It's better to just do everything outside the model. Create the preprocessed data first and store it, then create the model without this preprocessing layer. This way you get a faster model. (It can be important if your data or your model is too big).
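If you still want the normalisation inside the model on a recent TensorFlow 2.x release (roughly 2.6 or later; this layer did not exist in the Keras 2.0.5 used in the question), a swapped-in alternative to the Lambda approach is the built-in tf.keras.layers.Normalization layer, which keeps mean and variance in the layer itself instead of in tensors captured from an outer scope. A minimal sketch; the per-channel statistics, the input size and the Conv2D placeholder are all hypothetical.

```python
import numpy as np
import tensorflow as tf

# Hypothetical per-channel statistics computed on the training set.
mean = np.array([0.485, 0.456, 0.406], dtype="float32")
std = np.array([0.229, 0.224, 0.225], dtype="float32")

# Normalization computes (x - mean) / sqrt(variance), so pass variance = std ** 2.
norm = tf.keras.layers.Normalization(axis=-1, mean=mean, variance=std ** 2)
model = tf.keras.Sequential([
    tf.keras.Input(shape=(4, 4, 3)),
    norm,
    tf.keras.layers.Conv2D(8, 3, padding="same"),  # placeholder for the real network
])

# Check only the normalisation step against the manual formula.
images = np.random.rand(2, 4, 4, 3).astype("float32")
normalised = np.asarray(norm(images))
print(np.allclose(normalised, (images - mean) / std, atol=1e-5))
```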
I have a tensorflow contrib.learn.DNNRegressor that I have trained as part of the following code snippet:
regressor = tf.contrib.learn.DNNRegressor(feature_columns=fc,
hidden_units=hu_array,
optimizer=tf.train.AdamOptimizer(
learning_rate=0.001,
),
enable_centered_bias=False,
activation_fn=tf.tanh,
model_dir="./models/my_model/",
)
regressor.fit(x=training_features, y=training_labels, steps=10000)
The trained network performs quite well, and I'd like to use it as a part of some other code, on another machine. I have tried copying over the models/my_model directory, and constructing a new DNNRegressor pointing just at the model_dir, but it requires that I supply feature_columns and hidden_units definitions. Shouldn't that information be available via the snapshots stored in model_dir? Is there a better way to save/recover a trained model which is performing well, to be used as a predictor, without having to separately save the feature_columns and hidden_units?
I came up with something workable: not ideal, but it gets the job done. If anyone has a better idea, I am all ears.
I converted my kwargs for DNNRegressor into a dict, and used the ** operator. Then I was able to pickle the kwargs dict, and reconstruct the DNNRegressor from that. E.g:
import pickle

reg_args = {'feature_columns': fc, 'hidden_units': hu_array}  # ... plus the other kwargs
regressor = tf.contrib.learn.DNNRegressor(**reg_args)
with open('reg_args.pkl', 'wb') as f:
    pickle.dump(reg_args, f)
Later on, I reconstruct via:
with open('reg_args.pkl', 'rb') as f:
    reg_args = pickle.load(f)
# On another machine, so my model dir path changed:
reg_args['model_dir'] = NEW_MODEL_DIR
regressor = tf.contrib.learn.DNNRegressor(**reg_args)
It worked well. I'm sure there must be a better way but for now if someone is trying to figure out a workaround for tf.contrib.learn, this is a solution.
When training
You call DNNRegressor(..., model_dir) and then call the fit() and evaluate() methods.
When testing
You call DNNRegressor(..., model_dir) and then can call predict() methods. Your model will find a trained model in the model_dir and will load the trained model params.
Reference
Issue #3340 of TF