Softmax Output Layer. Which dimension? - python

I have a question regarding neural networks used for image segmentation. I am using a 3D implementation of DeepLab that can be found here
I am using softmax, so the output layer is the following:
elif self.last_activation.lower() == 'softmax':
    output = nn.Softmax()(output)
No dimension is defined, so I want to define it manually. But I am not sure which dimension I need to set. The shape of the output tensor is the following:
[batch_size, num_classes, width, height, depth]
So I would think that dim=1 would be correct. Is that correct?

Indeed, it should be 1, as this is the class axis and you want the values along it to sum to 1.
Be careful if you train your network with nn.CrossEntropyLoss: it already applies a (log-)softmax internally, so it expects the raw logits rather than the softmax output.
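To make the "summed to 1" statement concrete, here is a minimal sketch in NumPy rather than PyTorch, with hypothetical sizes for the [batch_size, num_classes, width, height, depth] tensor. It only illustrates what a softmax over axis 1 computes; in PyTorch the corresponding call would be nn.Softmax(dim=1).

```python
import numpy as np

def softmax(x, axis):
    """Numerically stable softmax along one axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

# Hypothetical sizes: batch_size=2, num_classes=4, width=3, height=3, depth=3
rng = np.random.default_rng(0)
logits = rng.normal(size=(2, 4, 3, 3, 3))

probs = softmax(logits, axis=1)

# Summing over the class axis gives 1 for every voxel of every sample.
class_sums = probs.sum(axis=1)  # shape (2, 3, 3, 3)
```

Any other axis would normalize across samples or across spatial positions, which is not what a per-voxel class distribution needs.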


shape of an output tensor after convolutional filter on a colour image

I find it difficult to understand a notion about tensors.
For VGG, we start from a batch of colour images (None, 224, 224, 3) and apply 64 2D convolutional filters.
At the output we obtain a tensor of (none,224,224,64), we can see this by making a summary of the model.
However, a filter must treat all 3 colours and my intuition tells me that I should have an output tensor of (none,224,224,3,64).
Could someone explain to me why my reasoning is wrong?
Thank you for your explanations.
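All filters have shape
(kernel_height, kernel_width, input_channels)
so each filter already covers all 3 colour channels: at every position it multiplies the whole patch, channels included, and sums the products into a single number. When one filter slides over your input data with 'SAME' padding (and stride 1), the resulting output shape is therefore
(input_height, input_width)
Stacking the outputs of all filters gives
(input_height, input_width, n_filters)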
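The shape argument can be checked with a naive NumPy sketch. This is not how Keras or TensorFlow implement convolution; the function name conv2d_same and the small 5x5 image are assumptions made for illustration, and the padding logic assumes odd kernel sizes and stride 1.

```python
import numpy as np

def conv2d_same(image, filters):
    """Naive stride-1 'SAME' convolution (cross-correlation) of one image.

    image:   (height, width, in_channels)
    filters: (kernel_h, kernel_w, in_channels, n_filters)
    returns: (height, width, n_filters)
    """
    h, w, _ = image.shape
    kh, kw, _, n_filters = filters.shape
    # Zero-pad so the output keeps the input height and width (odd kernels).
    padded = np.pad(image, ((kh // 2, kh // 2), (kw // 2, kw // 2), (0, 0)))
    out = np.zeros((h, w, n_filters))
    for i in range(h):
        for j in range(w):
            patch = padded[i:i + kh, j:j + kw, :]  # (kh, kw, in_channels)
            # Each filter multiplies the whole patch, channels included,
            # and sums everything into a single number.
            out[i, j, :] = np.tensordot(patch, filters, axes=3)
    return out

image = np.ones((5, 5, 3))         # small hypothetical RGB image
filters = np.ones((3, 3, 3, 64))   # 64 filters, each spanning all 3 channels
out = conv2d_same(image, filters)  # shape (5, 5, 64)
```

The channel axis disappears inside the sum, which is why there is no separate dimension of size 3 in the output.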

2D convolution with reparameterization using just tf.nn.convolution

I want to do something like what tfp.layers.Conv2DReparameterization does but simpler - no priors etc.
Given an augmented input x of shape [num_particles, batch, in_height, in_width, in_channels] and a filter of mean f_mean and standard deviation f_std shape [filter_height, filter_width, in_channels, out_channels] which are trainable variables, I use the reparameterization trick to get filter samples:
filter_samples = f_mean + f_std * tf.random_normal(([num_particles] + f_mean.shape))
Thus, filter_samples is of shape [num_particles, filter_height, filter_width, in_channels, out_channels].
Then, I want to do:
output = tf.nn.conv2d(x, filter_samples, padding='SAME') # or VALID
where output should be of shape [num_particles] + standard convolution output shape.
For dense layers, it works to just do a tf.matmul(x, filter_samples), but for conv2d I'm not sure about the results and I can't find the implementation code to check it. Implementing it myself would end up slower than TF code, so I want to avoid it.
For SAME padding, the resulting shape seems okay, for VALID the batch dim is changed making me believe it doesn't work as I expect.
Just to make it clear, I need the output to have the num_particles dim. Code is TF1.x
Any ideas on how to get that?
I think there is some code to do something similar in tfp.experimental.nn. We can follow up in the GitHub issues you filed/responded to.
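For reference, the intended semantics (particle p of the input convolved with filter sample p) can be pinned down with a small NumPy sketch. This is neither the tfp.experimental.nn code nor a TF 1.x solution; the function name conv2d_per_particle and all sizes are assumptions, and only VALID padding with stride 1 is covered. It can serve as a ground truth when checking the output of a TensorFlow implementation.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv2d_per_particle(x, filter_samples):
    """VALID, stride-1 convolution where particle p of x uses filter sample p.

    x:              (num_particles, batch, in_height, in_width, in_channels)
    filter_samples: (num_particles, filter_height, filter_width, in_channels, out_channels)
    returns:        (num_particles, batch, out_height, out_width, out_channels)
    """
    kh, kw = filter_samples.shape[1:3]
    # windows: (num_particles, batch, out_height, out_width, in_channels, kh, kw)
    windows = sliding_window_view(x, (kh, kw), axis=(2, 3))
    # Contract window rows (i), window columns (j) and input channels (c),
    # keeping the particle axis (p) paired between inputs and filters.
    return np.einsum('pbhwcij,pijco->pbhwo', windows, filter_samples)

rng = np.random.default_rng(0)
num_particles, batch, height, width, c_in, c_out = 3, 2, 6, 6, 4, 5
f_mean = rng.normal(size=(3, 3, c_in, c_out))
f_std = np.full(f_mean.shape, 0.1)

# Reparameterization trick: one noise draw per particle.
eps = rng.normal(size=(num_particles,) + f_mean.shape)
filter_samples = f_mean + f_std * eps

x = rng.normal(size=(num_particles, batch, height, width, c_in))
output = conv2d_per_particle(x, filter_samples)  # shape (3, 2, 4, 4, 5)
```

In TF 1.x the same per-particle pairing could presumably be expressed by mapping a standard 2D convolution over the leading axis (for example with tf.map_fn), at some cost in speed.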

Is it possible to see the output after Conv2D layer in Keras

I am trying to understand each layer of Keras while implementing CNN.
As I understand it, a Conv2D layer produces different convolution outputs depending on the values of its feature maps (filters).
Now, my questions are:
Can I see the different feature map matrices that are applied to the input image to get the convolution layer output?
Can I see the values of the matrix that is generated after the Conv2D step has completed?
Thanks in advance
You can get the output of a certain convolutional layer in this way:
import keras.backend as K
# K.function takes a list of input tensors and a list of output tensors
func = K.function([model.get_layer('input').input], [model.get_layer('conv').output])
conv_output = func([numpy_input])[0] # numpy array
where 'input' and 'conv' denote the names of your input layer and convolutional layer. And you can get the weights of a certain layer like this:
conv_weights = model.get_layer('conv').get_weights() # list of numpy arrays (kernel and, if used, bias)

Tensorflow: Different activation values for same image

I'm trying to retrain (read finetune) a MobileNet image Classifier.
The script for retraining given by tensorflow here (from the tutorial), updates only the weights of the newly added fully connected layer. I modified this script to update weights of all the layers of the pre-trained model. I'm using MobileNet architecture with depth multiplier of 0.25 and input size of 128.
However, while retraining I observed a strange thing: if I give a particular image as input for inference in a batch with some other images, the activation values after some layers are different from those obtained when the image is passed alone. Also, the activation values for the same image in different batches are different. Example, for two batches:
batch_1 : [img1, img2, img3]; batch_2 : [img1, img4, img5]. The activations for img1 differ between the two batches.
Here is the code I use for inference:
with tf.Session(graph=tf.get_default_graph()) as sess:
    graph = sess.graph
    image_path = '/tmp/images/10dsf00003.jpg'
    id_ = gfile.FastGFile(image_path, 'rb').read()
    # The line below loads the JPEG using tf.decode_jpeg and does some preprocessing
    id = sess.run(..., {jpeg_data_tensor: id_})
    input_image_tensor = graph.get_tensor_by_name('input:0')
    layerXname = 'MobilenetV1/MobilenetV1/Conv2d_1_depthwise/Relu:0'  # Name of the layer whose activations to inspect.
    layerX = graph.get_tensor_by_name(layerXname)
    layerxactivations = sess.run(layerX, {input_image_tensor: id})
The above code is executed once as it is and once with the following change in the last line:
layerxactivations_batch = sess.run(layerX, {input_image_tensor: np.asarray([np.squeeze(id), np.squeeze(id), np.squeeze(id)])})
Following are some nodes in the graph :
[u'input', u'MobilenetV1/Conv2d_0/weights', u'MobilenetV1/Conv2d_0/weights/read', u'MobilenetV1/MobilenetV1/Conv2d_0/convolution', u'MobilenetV1/Conv2d_0/BatchNorm/beta', u'MobilenetV1/Conv2d_0/BatchNorm/beta/read', u'MobilenetV1/Conv2d_0/BatchNorm/gamma', u'MobilenetV1/Conv2d_0/BatchNorm/gamma/read', u'MobilenetV1/Conv2d_0/BatchNorm/moving_mean', u'MobilenetV1/Conv2d_0/BatchNorm/moving_mean/read', u'MobilenetV1/Conv2d_0/BatchNorm/moving_variance', u'MobilenetV1/Conv2d_0/BatchNorm/moving_variance/read', u'MobilenetV1/MobilenetV1/Conv2d_0/BatchNorm/batchnorm/add/y', u'MobilenetV1/MobilenetV1/Conv2d_0/BatchNorm/batchnorm/add', u'MobilenetV1/MobilenetV1/Conv2d_0/BatchNorm/batchnorm/Rsqrt', u'MobilenetV1/MobilenetV1/Conv2d_0/BatchNorm/batchnorm/mul', u'MobilenetV1/MobilenetV1/Conv2d_0/BatchNorm/batchnorm/mul_1', u'MobilenetV1/MobilenetV1/Conv2d_0/BatchNorm/batchnorm/mul_2', u'MobilenetV1/MobilenetV1/Conv2d_0/BatchNorm/batchnorm/sub', u'MobilenetV1/MobilenetV1/Conv2d_0/BatchNorm/batchnorm/add_1', u'MobilenetV1/MobilenetV1/Conv2d_0/Relu6', u'MobilenetV1/Conv2d_1_depthwise/depthwise_weights', u'MobilenetV1/Conv2d_1_depthwise/depthwise_weights/read', ... ...]
Now, when layerXname = 'MobilenetV1/MobilenetV1/Conv2d_0/convolution', the activations are the same in both of the above cases (i.e. layerxactivations and layerxactivations_batch[0] are the same).
But after this layer, all layers have different activation values. I suspect that the BatchNorm operations after the 'MobilenetV1/MobilenetV1/Conv2d_0/convolution' layer behave differently for batch inputs and a single image. Or is the issue caused by something else?
Any help/pointers would be appreciated.
When you build the MobileNet there is a parameter called is_training. If you don't set it to False, the dropout layer and the batch normalization layers will give you different results in different iterations. Batch normalization will probably change the values only slightly, but dropout will change them a lot, as it drops some input values.
Take a look at the signature of mobilenet_v1:
def mobilenet_v1(inputs,
                 ...):
  """Mobilenet v1 model for classification.

  Args:
    inputs: a tensor of shape [batch_size, height, width, channels].
    num_classes: number of predicted classes.
    dropout_keep_prob: the percentage of activation values that are retained.
    is_training: whether is training or not.
    min_depth: Minimum depth value (number of channels) for all convolution ops.
      Enforced when depth_multiplier < 1, and not an active constraint when
      depth_multiplier >= 1.
    depth_multiplier: Float multiplier for the depth (number of channels)
      for all convolution ops. The value must be greater than zero. Typical
      usage will be to set this value in (0, 1) to reduce the number of
      parameters or computation cost of the model.
    conv_defs: A list of ConvDef namedtuples specifying the net architecture.
    prediction_fn: a function to get predictions out of logits.
    spatial_squeeze: if True, logits is of shape is [B, C], if false logits is
      of shape [B, 1, 1, C], where B is batch_size and C is number of classes.
    reuse: whether or not the network and its variables should be reused. To be
      able to reuse 'scope' must be given.
    scope: Optional variable_scope.

  Returns:
    logits: the pre-softmax activations, a tensor of size
      [batch_size, num_classes]
    end_points: a dictionary from components of the network to the corresponding

  Raises:
    ValueError: Input rank is invalid.
  """
This is due to batch normalization.
How are you running inference? Are you loading the model from checkpoint files, or are you using a frozen protobuf model? If you use a frozen model, you can expect similar results for different formats of inputs.
Check this out. A similar issue for a different application is raised here.
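The batch-dependence described in the answers can be reproduced with a simplified NumPy sketch of batch normalization. This is an illustration under stated assumptions, not the TensorFlow implementation: the function batch_norm, the feature vectors, and the stored statistics are all hypothetical, and the normalization runs over the batch axis only (a convolutional layer would also average over height and width).

```python
import numpy as np

def batch_norm(x, gamma, beta, moving_mean, moving_var, is_training, eps=1e-3):
    """Batch normalization for inputs of shape (batch, features)."""
    if is_training:
        mean, var = x.mean(axis=0), x.var(axis=0)  # statistics of the current batch
    else:
        mean, var = moving_mean, moving_var        # fixed, stored statistics
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

rng = np.random.default_rng(0)
img1, img2, img3, img4, img5 = rng.normal(size=(5, 4))  # five hypothetical feature vectors
batch_1 = np.stack([img1, img2, img3])
batch_2 = np.stack([img1, img4, img5])
gamma, beta = np.ones(4), np.zeros(4)
moving_mean, moving_var = np.zeros(4), np.ones(4)

# Training mode: the result for img1 depends on the other images in the batch.
train_1 = batch_norm(batch_1, gamma, beta, moving_mean, moving_var, is_training=True)[0]
train_2 = batch_norm(batch_2, gamma, beta, moving_mean, moving_var, is_training=True)[0]

# Inference mode: the result for img1 is the same in both batches.
infer_1 = batch_norm(batch_1, gamma, beta, moving_mean, moving_var, is_training=False)[0]
infer_2 = batch_norm(batch_2, gamma, beta, moving_mean, moving_var, is_training=False)[0]
```

This matches the observation in the question: the raw convolution output is identical in both cases, and the values start to differ only after the first batch normalization.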

Output of Tensorflow LSTM-Cell

I've got a question on Tensorflow LSTM-Implementation. There are currently several implementations in TF, but I use:
cell = tf.contrib.rnn.BasicLSTMCell(n_units)
where n_units is the amount of 'parallel' LSTM-Cells.
Then to get my output I call:
rnn_outputs, rnn_states = tf.nn.dynamic_rnn(cell, x,
                                            initial_state=initial_state, time_major=False)
where (as time_major=False) x is of shape (batch_size, time_steps, input_length)
where batch_size is my batch_size
where time_steps is the amount of timesteps my RNN will go through
where input_length is the length of one of my input vectors (vector fed into the network on one specific timestep on one specific batch)
I expect rnn_outputs to be of shape (batch_size, time_steps, n_units, input_length) as I have not specified another output size.
The documentation of tf.nn.dynamic_rnn tells me that the output is of shape (batch_size, max_time, cell.output_size).
The documentation of tf.contrib.rnn.BasicLSTMCell lists a property output_size, which defaults to n_units (the number of LSTM cells I use).
So does each LSTM-Cell only output a scalar for every given timestep? I would expect it to output a vector of the length of the input vector. This seems not to be the case from how I understand it right now, so I am confused. Can you tell me whether that's the case or how I could change it to output a vector of size of the input vector per single lstm-cell maybe?
I think the primary confusion is about the terminology of the LSTM cell's argument num_units. Unfortunately it doesn't mean, as the name suggests, "the number of LSTM cells" (which would be equal to your time steps). It actually corresponds to the dimensionality of the cell's internal state: both the cell state and the hidden state are vectors of size num_units.
The call to dynamic_rnn() returns a tensor of shape [batch_size, time_steps, output_size], where (please note this):
output_size = num_units if num_proj is None in the LSTM cell,
output_size = num_proj if it is defined.
(num_proj is an argument of tf.contrib.rnn.LSTMCell; BasicLSTMCell has no projection, so its output_size is always num_units.)
Now, typically, you will extract the last time step's result and project it to the size of the output dimensions manually with a matmul + bias operation, or use the num_proj argument of the LSTM cell.
I went through the same confusion and had to dig deep to clear it up. I hope this answer clears some of it up.
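The shapes can be verified with a minimal NumPy sketch of a basic LSTM unrolled over time. It is not the TensorFlow implementation: lstm_forward is a hypothetical helper, the weights are random, and details such as the forget bias are left out. The point is only that each time step emits one vector of size num_units per sample.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_forward(x, num_units, rng):
    """Run a basic LSTM over x of shape (batch_size, time_steps, input_length).

    Returns outputs of shape (batch_size, time_steps, num_units) and the
    final (c, h) state, each of shape (batch_size, num_units).
    """
    batch_size, time_steps, input_length = x.shape
    # One weight matrix for the four gates, applied to [input, previous h].
    W = rng.normal(scale=0.1, size=(input_length + num_units, 4 * num_units))
    b = np.zeros(4 * num_units)
    c = np.zeros((batch_size, num_units))
    h = np.zeros((batch_size, num_units))
    outputs = []
    for t in range(time_steps):
        gates = np.concatenate([x[:, t, :], h], axis=1) @ W + b
        i, j, f, o = np.split(gates, 4, axis=1)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(j)  # new cell state
        h = sigmoid(o) * np.tanh(c)                   # new hidden state = output
        outputs.append(h)
    return np.stack(outputs, axis=1), (c, h)

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 5, 3))  # batch_size=2, time_steps=5, input_length=3
outputs, (c, h) = lstm_forward(x, num_units=7, rng=rng)
last_output = outputs[:, -1, :]  # shape (2, 7), equal to the final hidden state h
```

The input length only affects the size of the weight matrix; it never appears in the output shape, so getting vectors of the input size requires an extra projection of the outputs.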

