I am training an image classifier using .fit_generator() or .fit() and passing a dictionary as the class_weight= argument.
I never got errors in TF1.x but in 2.1 I get the following output when starting training:
WARNING:tensorflow:sample_weight modes were coerced from
...
to
['...']
What does it mean to coerce something from ... to ['...']?
The source of this warning in TensorFlow's repo is here; the accompanying comment reads:
Attempt to coerce sample_weight_modes to the target structure. This implicitly depends on the fact that Model flattens outputs for its internal representation.
This seems like a bogus message. I get the same warning message after upgrading to TensorFlow 2.1, but I do not use any class weights or sample weights at all. I do use a generator that returns a tuple like this:
return inputs, targets
And now I just changed it to the following to make the warning go away:
return inputs, targets, [None]
I don't know if this is relevant, but my model uses 3 inputs, so my inputs variable is actually a list of 3 numpy arrays. targets is just a single numpy array.
In any case, it's just a warning. The training works fine either way.
Edit for TensorFlow 2.2:
This bug seems to have been fixed in TensorFlow 2.2, which is great. However the fix above will fail in TF 2.2, because it will try to get the shape of the sample weights, which will obviously fail with AttributeError: 'NoneType' object has no attribute 'shape'. So undo the above fix when upgrading to 2.2.
I believe this is a bug with tensorflow that will happen when you call model.compile() with default parameter sample_weight_mode=None and then call model.fit() with specified sample_weight or class_weight.
From the tensorflow repos:
fit() eventually calls _process_training_inputs()
_process_training_inputs() sets sample_weight_modes = [None] based on model.sample_weight_mode = None and then creates a DataAdapter with sample_weight_modes = [None]
the DataAdapter calls broadcast_sample_weight_modes() with sample_weight_modes = [None] during initialization
broadcast_sample_weight_modes() seems to expect sample_weight_modes = None but receives [None]
it asserts that [None] is a different structure from sample_weight / class_weight, overwrites it back to None by fitting to the structure of sample_weight / class_weight and outputs a warning
Warning aside, this has no effect on fit(), because sample_weight_modes in the DataAdapter is set back to None.
Note that tensorflow documentation states that sample_weight must be a numpy-array. If you call fit() with sample_weight.tolist() instead, you will not get a warning but sample_weight is silently overwritten to None when _process_numpy_inputs() is called in preprocessing and receives an input of length greater than one.
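On the original question of what "coerced from ... to ['...']" means: one plausible reading, assuming the message is built by replacing every leaf of the two structures with the placeholder string '...', is that a single bare array prints as ... and a one-element list prints as ['...']. The helper describe_structure below is made up for illustration and is not TensorFlow code; it only mimics that assumed formatting in plain Python.

```python
def describe_structure(structure):
    """Replace every leaf with '...' while keeping lists, tuples and dicts.

    Hypothetical helper for illustration; not part of TensorFlow.
    """
    if isinstance(structure, dict):
        return {key: describe_structure(value) for key, value in structure.items()}
    if isinstance(structure, (list, tuple)):
        return type(structure)(describe_structure(item) for item in structure)
    return "..."


single_target = object()  # stands in for one NumPy array of targets
modes = [None]            # the sample_weight_modes list from the steps above

print(describe_structure(single_target))  # ...
print(describe_structure(modes))          # ['...']
```

Read this way, the warning only reports that a one-element list was reshaped to match a single, unwrapped target.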
I took your Gist and installed TensorFlow 2.0 instead of TFA, and it ran without any such warning.
Here is the Gist of the complete code. The command for installing TensorFlow is shown below:
!pip install tensorflow==2.0
Screenshot of the successful execution is shown below:
Update: This bug is fixed in Tensorflow Version 2.2.
Instead of providing a dictionary
weights = {'0': 42.0, '1': 1.0}
I tried a list
weights = [42.0, 1.0]
and the warning disappeared.
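For reference, the Keras documentation describes class_weight as a dictionary mapping class indices (integers) to weights, so integer keys such as 0 and 1 are the documented form rather than strings. Conceptually, a class-weight dictionary assigns each sample the weight of its label. A minimal plain-Python sketch of that mapping (the labels list is made-up example data):

```python
# Class-weight dictionary keyed by integer class index.
class_weight = {0: 42.0, 1: 1.0}

# Made-up labels for six samples.
labels = [0, 1, 1, 0, 1, 1]

# Each sample receives the weight of its class.
sample_weight = [class_weight[label] for label in labels]

print(sample_weight)  # [42.0, 1.0, 1.0, 42.0, 1.0, 1.0]
```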
I have this line of code (for Tensorflow 1.0):
tf.placeholder(tf.float32, [None, n, p])
n and p are just random numbers.
How do I translate this line of code into tf.keras.Input for TensorFlow 2.0?
Thanks a lot in advance!
From comments:
The issue was resolved after creating two separate virtual environments for TF 2.x and 1.x. For more information, you can refer to the detailed answer by Denver here.
(paraphrased from lindo)
From Tensorflow documentation
In TF 2.x, you can just pass tensors directly into ops and layers. If you want to explicitly set up your inputs, you can see Keras functional API on how to use tf.keras.Input to replace tf.compat.v1.placeholder. tf.function arguments also do the job of tf.compat.v1.placeholder. For more details please read Better performance with tf.function.
Input produces a symbolic tensor-like object (i.e. a placeholder). This can be used with lower-level TensorFlow ops that take tensors as inputs as shown below
import tensorflow as tf
from tensorflow.keras import Input, Model

x = Input(shape=(32,))
y = tf.square(x)  # This op will be treated like a layer
model = Model(x, y)
For more details, you can refer to the guide "Migrate your TensorFlow 1 code to TensorFlow 2".
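Applied to the placeholder from the question, a minimal sketch could look like the following. It assumes example values n = 5 and p = 3; the batch dimension (the leading None in the placeholder shape) is implied by tf.keras.Input and is not part of shape.

```python
import tensorflow as tf

n, p = 5, 3  # example sizes; substitute your own

# TF 1.x: tf.placeholder(tf.float32, [None, n, p])
# TF 2.x: the batch dimension is added automatically.
x = tf.keras.Input(shape=(n, p), dtype="float32")

print(list(x.shape))  # [None, 5, 3]
```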
When I'm training my model with TensorFlow 2.3, I want to visualize some intermediate tensors calculated using the weight in the computation graph of my customized tf.keras.layers.Layer.
So I use tf.summary.image() to record these tensors and visualize them as images like this:
class CustomizedLayer(tf.keras.layers.Layer):
    def call(self, inputs, training=None):
        # ... some code ...
        tf.summary.image(name="some_weight_map", data=some_weight_map)
        # ... some code ...
But in TensorBoard, no matter how many steps have passed, only one image is shown, at step 0.
I then tried setting the step parameter of tf.summary.image() to the value obtained from tf.summary.experimental.get_step():
tf.summary.image(name="weight_map", data=weight_map, step=tf.summary.experimental.get_step())
I update the step by calling tf.summary.experimental.set_step from a customized Callback, using a tf.Variable as in the code below:
class SummaryCallback(tf.keras.callbacks.Callback):
    def __init__(self, step_per_epoch):
        super().__init__()
        self.global_step = tf.Variable(initial_value=0, trainable=False, name="global_step")
        self.global_epoch = 0
        self.step_per_epoch = step_per_epoch
        tf.summary.experimental.set_step(self.global_step)

    def on_batch_end(self, batch, logs=None):
        self.global_step = batch + self.step_per_epoch * self.global_epoch
        tf.summary.experimental.set_step(self.global_step)
        # whether or not the line above is commented out, calling tf.summary.experimental.get_step() in computation graph code always returns 0.
        # tf.print(self.global_step)

    def on_epoch_end(self, epoch, logs=None):
        self.global_epoch += 1
An instance of this Callback is passed to model.fit() through the callbacks argument.
But the value returned by tf.summary.experimental.get_step() is still 0.
The TensorFlow document of "tf.summary.experimental.set_step()" says:
when using this with @tf.functions, the step value will be captured at the time the function is traced, so changes to the step outside the function will not be reflected inside the function unless using a tf.Variable step.
According to the documentation, I am already using a Variable to store the step, but its changes are still not reflected inside the function (or keras.Model).
Note: My code produced the expected results in TensorFlow 1.x with just a single tf.summary.image() line, before I migrated it to TensorFlow 2.
So I want to know whether my approach is wrong in TensorFlow 2.
In TF2, how can I get the training step inside the computation graph?
Or is there another way to summarize tensors (as scalars, images, etc.) inside a model in TensorFlow 2?
I found that this issue has been reported in the TensorFlow GitHub repository: https://github.com/tensorflow/tensorflow/issues/43568
It is caused by using tf.summary inside the model while the tf.keras.callbacks.TensorBoard callback is also enabled; the step is then always zero. The issue reporter gives a temporary workaround.
To fix it, subclass tf.keras.callbacks.TensorBoard and override the on_train_begin and on_test_begin methods like this:
class TensorBoardFix(tf.keras.callbacks.TensorBoard):
    """
    This fixes incorrect step values when using the TensorBoard callback with custom summary ops
    """

    def on_train_begin(self, *args, **kwargs):
        super(TensorBoardFix, self).on_train_begin(*args, **kwargs)
        tf.summary.experimental.set_step(self._train_step)

    def on_test_begin(self, *args, **kwargs):
        super(TensorBoardFix, self).on_test_begin(*args, **kwargs)
        tf.summary.experimental.set_step(self._val_step)
And use this fixed callback class in model.fit():
tensorboard_callback = TensorBoardFix(log_dir=log_dir, histogram_freq=1, write_graph=True, update_freq=1)
model.fit(dataset, epochs=200, callbacks=[tensorboard_callback])
This solved my problem, and I can now get the proper step inside my model by calling tf.summary.experimental.get_step().
(This issue may be fixed in later versions of TensorFlow.)
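A side note on the callback in the question, independent of the TensorBoard issue: the line self.global_step = batch + ... rebinds the attribute to a new value instead of updating the tf.Variable created in __init__, so anything that captured the original variable would not see the change. With a real tf.Variable, the in-place update would be global_step.assign(new_value). The plain-Python analogy below involves no TensorFlow, and the Counter class is a stand-in invented for this sketch; it only shows the difference between rebinding a name and updating a captured object in place.

```python
class Counter:
    """Stand-in for a mutable step holder such as a tf.Variable."""

    def __init__(self, value=0):
        self.value = value

    def assign(self, value):
        self.value = value  # in-place update; the object itself stays the same


def make_reader(step):
    # The reader captures the object passed in, much like a traced function
    # captures the step object that was set before tracing.
    return lambda: step.value


step = Counter(0)
read_step = make_reader(step)
step = Counter(7)       # rebinding: read_step still sees the old object
print(read_step())      # 0

captured = Counter(0)
read_captured = make_reader(captured)
captured.assign(7)      # in-place update: visible through the capture
print(read_captured())  # 7
```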
I am sub-classing tensorflow.keras.Model to implement a certain model. Expected behavior:
Training (fitting) time: returns a list of tensors including the final output and auxiliary output;
Inferring (predicting) time: returns a single output tensor.
And the code is:
class SomeModel(tensorflow.keras.Model):
    # ......
    def call(self, x, training=True):
        # ......
        return [aux1, aux2, net] if training else net
This is how I use it:
model = SomeModel(...)
model.compile(...,
              loss=keras.losses.SparseCategoricalCrossentropy(),
              loss_weights=[0.4, 0.4, 1], ...)
# ......
model.fit(data, [labels, labels, labels])
And got:
AssertionError: in converted code:
ipython-input-33-862e679ab098:140 call *
`return [aux1, aux2, net] if training else net`
...\tensorflow_core\python\autograph\operators\control_flow.py:918 if_stmt
So the problem is that the if statement is converted into part of the computation graph, which of course causes the failure. The full stack trace is long and unhelpful, so it is not included here.
So, is there any way to make TensorFlow generate a different graph depending on whether the model is training?
Which TensorFlow version are you using? In TensorFlow 2.2 you can override the behaviour of the .fit, .predict and .evaluate methods, which would generate different graphs for these methods (I assume) and potentially work for your use case.
The problem with earlier versions is that subclassed models are created by tracing the call method. This means Python conditionals become TensorFlow conditionals and face several limitations during graph creation and execution.
First, both branches (if and else) have to be defined, and for Python collections (e.g. lists) the branches must have the same structure (e.g. number of elements). You can read about the limitations and effects of AutoGraph here and here.
(Also, a conditional may not be evaluated on every run if the condition is based on a Python variable and not a tensor.)
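One way to sidestep the structure mismatch, sketched here under assumptions, is to return the same list in both modes and pick the final output after the call at inference time. The layer sizes and attribute names (aux1_head, aux2_head, final_head) are invented for the example; the model in the question is not shown, so this is not its actual architecture.

```python
import numpy as np
import tensorflow as tf


class SomeModel(tf.keras.Model):
    def __init__(self):
        super().__init__()
        # Hypothetical heads standing in for aux1, aux2 and net.
        self.aux1_head = tf.keras.layers.Dense(3)
        self.aux2_head = tf.keras.layers.Dense(3)
        self.final_head = tf.keras.layers.Dense(3)

    def call(self, x, training=False):
        # Same structure in every mode: no branch on `training` in the return.
        return [self.aux1_head(x), self.aux2_head(x), self.final_head(x)]


model = SomeModel()
x = np.zeros((2, 4), dtype="float32")

outputs = model(x, training=False)
final = outputs[-1]  # keep only the main output at inference time

print(len(outputs), tuple(final.shape))  # 3 (2, 3)
```

Training can keep using three targets and loss_weights as in the question, while inference code simply ignores the auxiliary outputs.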
I am trying to generate image summaries to be displayed in TensorBoard. This worked in an eager execution environment.
Now I am trying to use eval_metric_ops, returning a dict of operations that compute metrics during execution of the computation graph. For this, I rely on tf.py_func to do my metric computations and plots. Its signature is:
tf.py_func(
    func,
    inp,
    Tout,
    stateful=True,
    name=None
)
Here Tout is the return type of the function. I managed to make it work for simple metrics (float values). As far as I understand, I need to declare a string return type for my summaries, which will be parsed afterwards to rebuild my images.
Here is the blocking point.
I build my Summary with:
summ = tf.Summary(value=[
    tf.Summary.Value(
        tag=metric_name,
        image=tf.Summary.Image(
            encoded_image_string=encode_image_array_as_png_str(
                self._last_metrics[metric_name])))])
Returning it as is, I get: W tensorflow/core/framework/op_kernel.cc:1306] Unimplemented: Unsupported object type Summary
Returning str(summ) gives: WARNING:tensorflow:Skipping summary for ..., cannot parse string to Summary.
I also tried to build it with:
tf.summary.image(
    name,
    tensor,
    max_outputs=3,
    collections=None,
    family=None
)
But this gives: W tensorflow/core/framework/op_kernel.cc:1306] Unimplemented: Unsupported object type Tensor
Do you know how to serialize a Summary to a string, a bytes iterable, or anything else that can be interpreted as a string Tensor, in such a way that it can be parsed back into an image Summary afterwards?
Thanks.
Shame on me.
Like many other classes in TensorFlow, Summary is defined by a Protocol Buffer message and thus implements SerializeToString().
Hence, just returning summ.SerializeToString() works!
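A minimal round-trip sketch, using a scalar simple_value instead of an image to keep it short. It assumes the TF 1.x Summary proto is reached through tf.compat.v1.Summary (in TF 1.x itself this is tf.Summary); the tag name is made up.

```python
import tensorflow as tf

Summary = tf.compat.v1.Summary  # tf.Summary in TF 1.x

summ = Summary(value=[Summary.Value(tag="example_metric", simple_value=0.5)])

# Protocol Buffer messages serialize to bytes, which a string tensor can carry.
serialized = summ.SerializeToString()

# Parsing the bytes gives back an equivalent Summary message.
restored = Summary.FromString(serialized)
print(restored.value[0].tag)  # example_metric
```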
I was following this tensorflow tutorial for gradient clipping while working with a multilayer perceptron.
grads_and_vars = optimizer.compute_gradients(cross_entropy_loss, trainable_variable)
capped_grads_and_vars = [(tf.clip_by_global_norm(gv[0],5), gv[1]) for gv in grads_and_vars]
optimizer.apply_gradients(capped_grads_and_vars)
TensorFlow shows the following error:
in clip_by_global_norm raise TypeError("t_list should be a sequence")
trainable_variable is a list that I built while creating the model. Whenever I have a trainable variable (tf.Variable), I add it to the trainable_variable list with the following command:
trainable_variable.append(var)  # where var is a trainable variable in TensorFlow
The key point with this type of problem is that the trainable_variable list may contain tensors that are not initialized or not used in the graph. Make sure every tensor in the trainable_variable list actually takes part in the graph. Such variables can also yield NaN during gradient computation, and abnormal values can trigger this kind of error as well.
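For context, tf.clip_by_global_norm takes the whole list of gradients and returns a pair (clipped_list, global_norm), so the error message above is what one would expect when a single gradient tensor such as gv[0] is passed instead of a list. The usual pattern is to unzip grads_and_vars, clip the gradient list once, and zip the result back with the variables. The plain-Python sketch below illustrates only the arithmetic and is not TensorFlow's implementation: every gradient is scaled by clip_norm / max(global_norm, clip_norm), where global_norm is computed over all gradients together. Gradients are represented as flat lists of floats, and None entries, standing in for variables the loss does not depend on, are passed through unchanged.

```python
import math


def clip_by_global_norm(grads, clip_norm):
    """Scale all gradients by one shared factor based on their joint norm."""
    squared_sum = sum(g * g for grad in grads if grad is not None for g in grad)
    global_norm = math.sqrt(squared_sum)
    scale = clip_norm / max(global_norm, clip_norm)
    clipped = [None if grad is None else [g * scale for g in grad] for grad in grads]
    return clipped, global_norm


grads = [[3.0, 4.0], None, [12.0]]  # joint norm: sqrt(9 + 16 + 144) = 13
clipped, norm = clip_by_global_norm(grads, clip_norm=6.5)

print(norm)     # 13.0
print(clipped)  # [[1.5, 2.0], None, [6.0]]
```

Because the scale factor depends on all gradients at once, clipping each gradient separately inside a list comprehension cannot reproduce it.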