Is there a way to reshape a TF tensor inside of a custom Keras loss function? I'm defining this custom loss function for a convolutional neural network?
def custom_loss(x, x_hat):
"""
Custom loss function for training background extraction networks (autoencoders)
"""
#flatten x, x_hat before computing mean, median
shape = x_hat.get_shape().as_list()
batch_size = shape[0]
image_size = np.prod(shape[1:])
x = tf.reshape(x, [batch_size, image_size])
x_hat = tf.reshape(x_hat, [batch_size, image_size])
B0 = reduce_median(tf.transpose(x_hat))
# I divide by sigma in the next step. So I add a small float32 to F0
# so as to prevent sigma from becoming 0 or Nan.
F0 = tf.abs(x_hat - B0) + 1e-10
sigma = tf.reduce_mean(tf.sqrt(F0 / 0.5), axis=0)
background_term = tf.reduce_mean(F0 / sigma, axis=-1)
bce = binary_crossentropy(x, x_hat)
loss = bce + background_term
return loss
In addition to computing the standard binary_crossentropy an additional background_term is added into the loss. This term incentives the network to predict images close the median of a batch. Since the outputs of the CNN are 2d and reduce_median works better with 1d arrays I have to reshape the images into 1d arrays. When I try to train this network I get the error
Traceback (most recent call last):
File "stackoverflow.py", line 162, in <module>
autoencoder = build_conv_autoencoder(lambda_W, input_shape, num_filters, optimizer, custom_loss)
File "stackoverflow.py", line 136, in build_conv_autoencoder
autoencoder.compile(optimizer, loss, metrics=[mean_squared_error])
File "/usr/local/lib/python3.5/dist-packages/keras/models.py", line 594, in compile
**kwargs)
File "/usr/local/lib/python3.5/dist-packages/keras/engine/training.py", line 667, in compile
sample_weight, mask)
File "/usr/local/lib/python3.5/dist-packages/keras/engine/training.py", line 318, in weighted
score_array = fn(y_true, y_pred)
File "stackoverflow.py", line 26, in custom_loss
x = tf.reshape(x, [batch_size, image_size])
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/gen_array_ops.py", line 2448, in reshape
name=name)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/op_def_library.py", line 494, in apply_op
raise err
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/op_def_library.py", line 491, in apply_op
preferred_dtype=default_dtype)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 710, in internal_convert_to_tensor
ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/constant_op.py", line 176, in _constant_tensor_conversion_function
return constant(v, dtype=dtype, name=name)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/constant_op.py", line 165, in constant
tensor_util.make_tensor_proto(value, dtype=dtype, shape=shape, verify_shape=verify_shape))
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/tensor_util.py", line 441, in make_tensor_proto
tensor_proto.string_val.extend([compat.as_bytes(x) for x in proto_values])
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/tensor_util.py", line 441, in <listcomp>
tensor_proto.string_val.extend([compat.as_bytes(x) for x in proto_values])
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/util/compat.py", line 65, in as_bytes
(bytes_or_text,))
TypeError: Expected binary or unicode string, got None
It seems like Keras is calling custom_loss before the TensorFlow graph is instantiated. This makes batch_size None instead of the actual value. Is there a proper way to reshape tensors inside loss functions to this error is avoided? You can look at the full code here .
Is there a proper way to reshape tensors...
If you are using Keras you should use the K.reshape(x,shape) method, which is a wrapper for tf.reshape(x,shape) as we can see in the docs.
I also notice you are using get_shape() to obtain your tensor shape, when on Keras you can do this with K.int_shape(x) as also mentioned in the docs, like this:
shape = K.int_shape(x_hat)
Besides that there are several other operations you do directly calling your Tensorflow import, instead of the Keras Backend (like tf.abs(), tf.reduce_mean(), tf.transpose(), etc.). You should consider using its corresponding wrappers in the keras backend to have uniform notation and guarantee a more regular behaviour. Also, by using the Keras backend you are giving your program compatibility with both Theano and Tensorflow, so it is a big plus you should consider.
Additionally, some TypeError may appear when working with tensors with undefined dimension(s). Please take a look at this question where they explain about reshaping tensors with undefined dimensions. Also, for its equivalent in Keras, check this other question, where in an answer I explain how to achieve that using Keras with Tensorflow as backend.
...Now regarding your code. Basically, as you have some undefined dimensions, you can pass the value -1 to have it infer the shape no matter what size it could be (it is explained in the first linked question, but can also be seen in the docs). Something like:
x = tf.reshape(x, [-1, image_size])
Or using Keras backend:
x = K.reshape(x, [-1, image_size])
Related
When using custom estimators in Tensorflow 2, when the model contains BatchNorm or Dropout layers, tf fails while building the graph with the following error. It works just fine when I comment out the Dropout and BatchNorm layers.
The model I use is a simple CNN model with two conv blocks and dense layer at the end:
def build_conv_block(x: Model, filter_map_count: int, name: str):
x = Conv2D(filter_map_count, (3, 3), name=f'{name}_conv_2d')(x)
x = BatchNormalization(name=f'{name}_bn')(x) <------- Error when not commented out
x = ReLU(name=f'{name}_relu')(x)
x = MaxPool2D((2, 2), name=f'{name}_max_pool_2d')(x)
x = Dropout(0.25, name=f'{name}_dropout')(x) <------- Error when not commented out
return x
def get_model(params):
input_image = Input(shape=params.input_shape)
x = build_conv_block(input_image, filter_map_count=64, name='layer_1')
x = build_conv_block(x, filter_map_count=128, name='layer_2')
x = Flatten(name='flatten_conv')(x)
output_pred = Dense(10, activation='softmax', name='output')(x)
model = Model(inputs=input_image, outputs=output_pred)
model.optimizer = Adam(learning_rate=params.learning_rate)
return model
I have a standard train_op in the model_fn that takes mnist images and labels as input and the class as output:
# Calculate gradients
with tf.GradientTape() as tape:
y_pred = model(features, training=training)
loss = tf.losses.categorical_crossentropy(labels, y_pred)
if mode == tf.estimator.ModeKeys.TRAIN:
gradients = tape.gradient(loss, model.trainable_variables)
train_op = model.optimizer.apply_gradients(zip(gradients, model.trainable_variables))
return tf.estimator.EstimatorSpec(mode, loss=loss, train_op=train_op)
Here's the traceback of the error I get:
Traceback (most recent call last):
File "F:/Projects/python/my_project/train.py", line 38, in <module>
tf.estimator.train_and_evaluate(estimator, train_spec=train_spec, eval_spec=eval_spec)
File "F:\Python\envs\tf2\lib\site-packages\tensorflow_estimator\python\estimator\training.py", line 473, in train_and_evaluate
return executor.run()
File "F:\Python\envs\tf2\lib\site-packages\tensorflow_estimator\python\estimator\training.py", line 613, in run
return self.run_local()
File "F:\Python\envs\tf2\lib\site-packages\tensorflow_estimator\python\estimator\training.py", line 714, in run_local
saving_listeners=saving_listeners)
File "F:\Python\envs\tf2\lib\site-packages\tensorflow_estimator\python\estimator\estimator.py", line 370, in train
loss = self._train_model(input_fn, hooks, saving_listeners)
File "F:\Python\envs\tf2\lib\site-packages\tensorflow_estimator\python\estimator\estimator.py", line 1160, in _train_model
return self._train_model_default(input_fn, hooks, saving_listeners)
File "F:\Python\envs\tf2\lib\site-packages\tensorflow_estimator\python\estimator\estimator.py", line 1190, in _train_model_default
features, labels, ModeKeys.TRAIN, self.config)
File "F:\Python\envs\tf2\lib\site-packages\tensorflow_estimator\python\estimator\estimator.py", line 1148, in _call_model_fn
model_fn_results = self._model_fn(features=features, **kwargs)
File "F:\Projects\python\my_project\model.py", line 62, in model_fn
gradients = tape.gradient(loss, model.trainable_variables)
File "F:\Python\envs\tf2\lib\site-packages\tensorflow_core\python\eager\backprop.py", line 1014, in gradient
unconnected_gradients=unconnected_gradients)
File "F:\Python\envs\tf2\lib\site-packages\tensorflow_core\python\eager\imperative_grad.py", line 76, in imperative_grad
compat.as_str(unconnected_gradients.value))
File "F:\Python\envs\tf2\lib\site-packages\tensorflow_core\python\eager\backprop.py", line 138, in _gradient_function
return grad_fn(mock_op, *out_grads)
File "F:\Python\envs\tf2\lib\site-packages\tensorflow_core\python\ops\cond_v2.py", line 120, in _IfGrad
true_graph, grads, util.unique_grad_fn_name(true_graph.name))
File "F:\Python\envs\tf2\lib\site-packages\tensorflow_core\python\ops\cond_v2.py", line 395, in _create_grad_func
func_graph=_CondGradFuncGraph(name, func_graph))
File "F:\Python\envs\tf2\lib\site-packages\tensorflow_core\python\framework\func_graph.py", line 915, in func_graph_from_py_func
func_outputs = python_func(*func_args, **func_kwargs)
File "F:\Python\envs\tf2\lib\site-packages\tensorflow_core\python\ops\cond_v2.py", line 394, in <lambda>
lambda: _grad_fn(func_graph, grads), [], {},
File "F:\Python\envs\tf2\lib\site-packages\tensorflow_core\python\ops\cond_v2.py", line 373, in _grad_fn
src_graph=func_graph)
File "F:\Python\envs\tf2\lib\site-packages\tensorflow_core\python\ops\gradients_util.py", line 550, in _GradientsHelper
gradient_uid)
File "F:\Python\envs\tf2\lib\site-packages\tensorflow_core\python\ops\gradients_util.py", line 175, in _DefaultGradYs
constant_op.constant(1, dtype=y.dtype, name="grad_ys_%d" % i)))
File "F:\Python\envs\tf2\lib\site-packages\tensorflow_core\python\framework\constant_op.py", line 227, in constant
allow_broadcast=True)
File "F:\Python\envs\tf2\lib\site-packages\tensorflow_core\python\framework\constant_op.py", line 265, in _constant_impl
allow_broadcast=allow_broadcast))
File "F:\Python\envs\tf2\lib\site-packages\tensorflow_core\python\framework\tensor_util.py", line 484, in make_tensor_proto
(dtype, nparray.dtype, values))
TypeError: Incompatible types: <dtype: 'variant'> vs. int32. Value is 1
It looks similar to the error mentioned in TF Issue #31894, but it doesn't seem to solve this problem. The TypeError does not tell much about where and why the error is happening and directly googling it does not help.
Although it may not be too obvious from the TypeError variant vs int32, if we carefully check the logs, we can see that the error occurs when finding gradients:
File "F:\Projects\python\my_project\model.py", line 62, in model_fn
gradients = tape.gradient(loss, model.trainable_variables)
Also, it should be noted that we get the same error even if one of them is present. So, if we try and analyze the common attributes in BatchNormalization and Dropout layer, both may seem to not come under the core layers, but when we look carefully, only those two layers in the model have a different train/test phase i.e. dropout doesn't zero out the values in test phase and batch norm uses a moving mean and variance during test phase.
Now the problem is narrowed down to using any layer that has a different train/test phase. This happens because tensorflow identifies if training mode is on or not using training parameter passed to the model.
This problem can be solved by using
y_pred = model(features, training=True)
when finding the gradients i.e. for the training phase and by using
y_pred = model(features, training=False)
otherwise i.e. for predict and eval phases.
Linked: Errors where moving mean is not updating is also reported, which can be solved by adding the same attribute.
I am attempting to write a CNN in python using tensorflow 1.13.1. For some reason, even after simplifying my model to only a single affine layer I am getting a dimensions error. Here is the relevant code:
tf.reset_default_graph()
X = tf.placeholder(tf.float32, [None, X_SHAPE[1], X_SHAPE[2], 1]) # X_SHAPE is the shape of the input image types I am workig with
y = tf.placeholder(tf.int64, [None])
is_training = tf.placeholder(tf.bool)
def my_model(X, y, is_training):
output = X
output = tf.reshape(output, [-1, output.shape[1] * output.shape[2] *
output.shape[3]])
output = tf.layers.dense(output, 2) # makes the error
output = tf.contrib.layers.batch_norm(output)
return output
y_out = my_model(X, y, is_training)
total_loss = tf.losses.softmax_cross_entropy(tf.one_hot(y, 2), logits=y_out)
mean_loss = tf.reduce_mean(total_loss)
optimizer = tf.train.RMSPropOptimizer(1e-3)
# batch normalization in tensorflow requires this extra dependency
extra_update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(extra_update_ops):
train_step = optimizer.minimize(mean_loss)
sess = tf.Session()
sess.run(tf.global_variables_initializer())
print('Training')
run_model(sess,y_out,mean_loss,X_train,y_train,8,64,100,train_step,True)
print('Validation')
run_model(sess,y_out,mean_loss,X_val,y_val,1,64)
The error I get is the following:
Traceback (most recent call last):
File "C:/Users/t8484200/Documents/fanta/dicom_snippet.py", line 189, in <module>
y_out = my_model(X, y, is_training)
File "C:/Users/t8484200/Documents/fanta/dicom_snippet.py", line 181, in my_model
output = tf.layers.dense(output, 2)
File "C:\Users\t8484200\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\util\deprecation.py", line 324, in new_func
return func(*args, **kwargs)
File "C:\Users\t8484200\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\layers\core.py", line 188, in dense
return layer.apply(inputs)
File "C:\Users\t8484200\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\keras\engine\base_layer.py", line 1227, in apply
return self.__call__(inputs, *args, **kwargs)
File "C:\Users\t8484200\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\layers\base.py", line 530, in __call__
outputs = super(Layer, self).__call__(inputs, *args, **kwargs)
File "C:\Users\t8484200\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\keras\engine\base_layer.py", line 554, in __call__
outputs = self.call(inputs, *args, **kwargs)
File "C:\Users\t8484200\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\keras\layers\core.py", line 975, in call
outputs = gen_math_ops.mat_mul(inputs, self.kernel)
File "C:\Users\t8484200\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\ops\gen_math_ops.py", line 5629, in mat_mul
name=name)
File "C:\Users\t8484200\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 350, in _apply_op_helper
g = ops._get_graph_from_inputs(_Flatten(keywords.values()))
File "C:\Users\t8484200\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\framework\ops.py", line 5713, in _get_graph_from_inputs
_assert_same_graph(original_graph_element, graph_element)
File "C:\Users\t8484200\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\framework\ops.py", line 5649, in _assert_same_graph
original_item))
ValueError: Tensor("dense/kernel:0", shape=(262144, 2), dtype=float32_ref) must be from the same graph as Tensor("Reshape:0", shape=(?, 262144), dtype=float32).
But dimensions seem to be fine, so I would really appreciate your help on this one!
You are not getting a dimension error. The dimensions are just mentioned as part of the information about the two relevant tensors. I ran this code with tf 1.13.1 and it works for me.
I was able to get the same error for example when I replace the first 4 lines with: (same lines in different order)
X = tf.placeholder(tf.float32, [None, X_SHAPE[1], X_SHAPE[2], 1]) # X_SHAPE is the shape of the input image types I am workig with
y = tf.placeholder(tf.int64, [None])
is_training = tf.placeholder(tf.bool)
tf.reset_default_graph()
The reason is that X is created in the existing graph, and then a new graph is created by the reset command. Then the dense tensor is created in the new graph, but using X from the old graph which is not allowed. (It looks like the reshape command is in the old graph, maybe because it does not create new variables.)
So you need to check where you are resetting the graph compared to where you are defining your placeholders.
I have stored a Tensorflow model with the files .meta, .index, checkpoint, and .data-0001. I restore my graph and model using:
model = tf.train.import_meta_graph("models/model.meta")
model.restore(sess, tf.train.latest_checkpoint("models/"))
I restored some variables like weights and bias but I also need to restore the loss function. My model is using nce_loss.
Essentially, I want to get the gradient for my loss function given a certain input where I don't have to redefine the loss variables just call the operation from the restored version. So:
loss = graph.get_operation_by_name("loss")
grads = tf.gradients(loss,loss.inputs)
And here I get the following error message:
File "/tmp/fgsm.py", line 114, in main
grads = tf.gradients(loss,loss.inputs)
File "/tmp/venv/lib/python3.6/site-packages/tensorflow/python/ops/gradients_impl.py", line 630, in gradients
gate_gradients, aggregation_method, stop_gradients)
File "/tmp/venv/lib/python3.6/site-packages/tensorflow/python/ops/gradients_impl.py", line 675, in _GradientsHelper
ys = ops.convert_n_to_tensor_or_indexed_slices(ys, name="y")
File "/tmp/venv/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1377, in convert_n_to_tensor_or_indexed_slices
values=values, dtype=dtype, name=name, as_ref=False)
File "/tmp/venv/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1348, in internal_convert_n_to_tensor_or_indexed_slices
value, dtype=dtype, name=n, as_ref=as_ref))
File "/tmp/venv/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1307, in internal_convert_to_tensor_or_indexed_slices
value, dtype=dtype, name=name, as_ref=as_ref)
File "/tmp/venv/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1146, in internal_convert_to_tensor
ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
File "/tmp/venv/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 6168, in _operation_conversion_error
name, as_ref))
TypeError: Can't convert Operation 'loss' to Tensor (target dtype=None, name='y_0', as_ref=False)
What am I doing wrong here?
Edit:
so by switching to
loss = graph.get_tensor_by_name("loss:0")
I can successfully get my loss tensor. Now how do I get the gradient for the input given the restored loss function?
nce_loss has an "input" parameter and I want to calculate the gradient given the loss function and the input parameter. How can I use tf.gradients for this? When I do tf.gradients(loss,loss.inputs) I get an error
AttributeError: 'Tensor' object has no attribute 'inputs'
When you are retrieving tensors from tensorflow, you must index them. In your code:
loss = graph.get_operation_by_name("loss")
grads = tf.gradients(loss,loss.inputs)
As the error states you are retrieving the operation of loss not its output tensor. To retrieve its tensor you can index the operation like so:
loss = graph.get_operation_by_name("loss:0")
grads = tf.gradients(loss,loss.inputs)
I am trying to implement a custom loss function in Keras with TF backend based on the Laplacian of two images.
def blur_loss(y_true, y_pred):
#weighting of blur loss
alpha = 1
mae = losses.mean_absolute_error(y_true, y_pred)
lapKernel = K.constant([0, 1, 0, 1, -4, 1, 0, 1, 0],shape = [3, 3])
trueLap = K.conv2d(y_true, lapKernel)
predLap = K.conv2d(y_pred, lapKernel)
trueBlur = K.var(trueLap)
predBlur = K.var(predLap)
blurLoss = alpha * K.abs(trueBlur - predBlur)
loss = (1-alpha) * mae + alpha * blurLoss
return loss
When I try to compile the model I get this error
Traceback (most recent call last):
File "kitti_train.py", line 65, in <module>
model.compile(loss='mean_absolute_error', optimizer='adam', metrics=[blur_loss])
File "/home/ubuntu/.virtualenvs/dl4cv/lib/python3.5/site-packages/keras/engine/training.py", line 924, in compile
handle_metrics(output_metrics)
File "/home/ubuntu/.virtualenvs/dl4cv/lib/python3.5/site-packages/keras/engine/training.py", line 921, in handle_metrics
mask=masks[i])
File "/home/ubuntu/.virtualenvs/dl4cv/lib/python3.5/site-packages/keras/engine/training.py", line 450, in weighted
score_array = fn(y_true, y_pred)
File "/home/ubuntu/prednet/blur_loss.py", line 14, in blur_loss
trueLap = K.conv2d(y_true, lapKernel)
File "/home/ubuntu/.virtualenvs/dl4cv/lib/python3.5/site-packages/keras/backend/tensorflow_backend.py", line 3164, in conv2d
data_format='NHWC')
File "/home/ubuntu/.virtualenvs/dl4cv/lib/python3.5/site-packages/tensorflow/python/ops/nn_ops.py", line 655, in convolution
num_spatial_dims, strides, dilation_rate)
File "/home/ubuntu/.virtualenvs/dl4cv/lib/python3.5/site-packages/tensorflow/python/ops/nn_ops.py", line 483, in _get_strides_and_dilation_rate
(len(dilation_rate), num_spatial_dims))
ValueError: len(dilation_rate)=2 but should be 0
After reading other questions, my understanding is that this problem stems from the compilation using placeholder tensors for y_true and y_pred. I've tried checking if the inputs are placeholders and replacing them with zero tensors, but this gives me other errors.
How do I use a convolution (the image processing function, not a layer) in my loss function without getting these errors?
The problem here was a misunderstanding of the conv2d function which is not simply a 2-dimensional convolution. It is a batched 2-d convolution of multiple channels. So while you might expect a *2d function to accept 2-dimensional tensors, the input should actually 4 dimensions (batch_size, height, width, channels) and the filter should also be 4 dimensions (filter_height, filter_width, input_channels, output_channels). Details can be found in the TF docs
I'm attempting to load 3D images and their labels from a numpy array to TensorFlow records, then read them from a queue while training my network. The code for conversion is based on the conversion for TensorFlow's Inception model.
Each image has a different height, width, and depth value, so when reshaping the array I need to know these values. However, I'm getting an error when I try to use set_shape, as somewhere down the line int() is being used, and it doesn't accept Tensor values.
reader = tf.TFRecordReader()
_, value = reader.read(filename_queue)
# Features in Example proto
feature_map = {
'height': tf.VarLenFeature(dtype=tf.int64),
'width': tf.VarLenFeature(dtype=tf.int64),
'depth': tf.VarLenFeature(dtype=tf.int64),
'label': tf.VarLenFeature(dtype=tf.int64),
'image_raw': tf.VarLenFeature(dtype=tf.string)
}
features = tf.parse_single_example(value, feature_map)
result.label = tf.cast(features['label'].values[0], dtype=tf.int32)
result.height = tf.cast(features['height'].values[0], dtype=tf.int32)
result.width = tf.cast(features['width'].values[0], dtype=tf.int32)
result.depth = tf.cast(features['depth'].values[0], dtype=tf.int32)
image = tf.decode_raw(features['image_raw'].values[0], tf.int16)
image = tf.reshape(image, [result.depth, result.height, result.width])
image = tf.cast(tf.transpose(image, [1, 2, 0]), tf.float32)
result.image = tf.expand_dims(image, 3)
result.image.set_shape([result.height, result.width, result.depth, 1])
result.label = tf.expand_dims(result.label, 0)
result.label.set_shape([1])
Error trace:
Traceback (most recent call last):
File "dsb17_multi_gpu_train.py", line 227, in <module>
tf.app.run()
File "/usr/local/lib/python3.4/dist-packages/tensorflow/python/platform/app.py", line 44, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "dsb17_multi_gpu_train.py", line 223, in main
train()
File "dsb17_multi_gpu_train.py", line 129, in train
loss = tower_loss(scope)
File "dsb17_multi_gpu_train.py", line 34, in tower_loss
images, labels = dsb17.inputs(False)
File "/home/ubuntu/dsb17/model/dsb17.py", line 104, in inputs
batch_size=FLAGS.batch_size)
File "/home/ubuntu/dsb17/model/dsb17_input.py", line 161, in inputs
read_input = read_data(filename_queue)
File "/home/ubuntu/dsb17/model/dsb17_input.py", line 62, in read_data
result.image.set_shape([result.height, result.width, result.depth, 1])
File "/usr/local/lib/python3.4/dist-packages/tensorflow/python/framework/ops.py", line 425, in set_shape
self._shape = self._shape.merge_with(shape)
File "/usr/local/lib/python3.4/dist-packages/tensorflow/python/framework/tensor_shape.py", line 573, in merge_with
other = as_shape(other)
File "/usr/local/lib/python3.4/dist-packages/tensorflow/python/framework/tensor_shape.py", line 821, in as_shape
return TensorShape(shape)
File "/usr/local/lib/python3.4/dist-packages/tensorflow/python/framework/tensor_shape.py", line 457, in __init__
self._dims = [as_dimension(d) for d in dims_iter]
File "/usr/local/lib/python3.4/dist-packages/tensorflow/python/framework/tensor_shape.py", line 457, in <listcomp>
self._dims = [as_dimension(d) for d in dims_iter]
File "/usr/local/lib/python3.4/dist-packages/tensorflow/python/framework/tensor_shape.py", line 378, in as_dimension
return Dimension(value)
File "/usr/local/lib/python3.4/dist-packages/tensorflow/python/framework/tensor_shape.py", line 33, in __init__
self._value = int(value)
TypeError: int() argument must be a string, a bytes-like object or a number, not 'Tensor'
I originally thought this was because the Tensor did not have a value until it was evaluated in a session, but loss is being evaluated in a sess.run(), which is what requires the call to tower_loss(). My training code is identical in structure to cifar10_multi_gpu_train.py, and the overall file structure is also very similar.
The question then is: Is it actually being evaluated in a session, or is the graph not built yet? Do I need to somehow extract a value from the zero-dimensional Tensor? More generally, what am I misunderstanding about Tensors and sessions that is making my code not work as I expect it to?
According to TensorFlow's tf.cast docs, tf.cast returns a Tensor.
Your error says that when using set_shape(), you cannot have a Tensor as an argument, but rather an int.
You may try to force Tensorflow to evaluate the cast. This simple example works for me:
a = tf.constant(2.0)
b = tf.constant([1.0,2.0])
b.set_shape(a.eval())
Without the call to eval(), I get the same error as you.
In general you cannot do this using tf.Tensor.set_shape(), because that method expects a static shape. The tensors result.height, result.width, result.depth represent values read from a file, and at runtime they could evaluate to many different integers (depending on what is in your file), so there is no single int that you can pass for them. In that case, the best you can currently do is represent those dimensions as being statically unknown, using None for the unknown dimensions:
result.image.set_shape([None, None, None, 1])
Note that this statement shouldn't change anything, because TensorFlow should already be able to infer that the shape is 4-D with size 1 in the last dimension.
For more details about static and dynamic shapes, see this answer.
Actually you can pass the image shape to the reshape function but you need one more step. Just change the line:
image = tf.reshape(image, [result.depth, result.height, result.width])
to:
image_shape = tf.stack([result.depth, result.height, result.width])
image = tf.reshape(image, image_shape)