Why the negative reshape (-1) in MNIST tutorial?


Reading the TensorFlow MNIST tutorial, I stumbled over the line
x_image = tf.reshape(x, [-1,28,28,1])
The two 28s come from the width and height, and the 1 comes from the number of channels. But why -1?
I guess this is related to mini-batch training, but I wondered why -1 and not 1 (which seems to give the same result in NumPy).
(Probably related: why does NumPy's reshape give the same results for -1, -2 and 1?)

-1 means that the length in that dimension is inferred. This is done based on the constraint that the number of elements in an ndarray or Tensor when reshaped must remain the same. In the tutorial, each image is a row vector (784 elements) and there are lots of such rows (let it be n, so there are 784n elements). So, when you write
x_image = tf.reshape(x, [-1, 28, 28, 1])
TensorFlow can infer that -1 is n.
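A small NumPy sketch of the same inference (NumPy's reshape follows the same -1 rule as tf.reshape; the batch of 5 random "images" is made up for illustration). It also shows why 1 only appears to work: it succeeds when n happens to be 1 and fails for any larger batch.

```python
import numpy as np

n = 5                                  # hypothetical batch size
x = np.random.rand(n, 784)             # n flattened 28x28 images, one per row

x_image = x.reshape(-1, 28, 28, 1)     # -1 is inferred as 784*n / (28*28*1) = n
print(x_image.shape)                   # (5, 28, 28, 1)

# With 1 instead of -1 the element counts no longer match (784 vs 784*n),
# so the reshape fails for any batch larger than a single image.
try:
    x.reshape(1, 28, 28, 1)
except ValueError as err:
    print("reshape with 1 failed:", err)

# For a single image (n == 1) both versions agree, which is why 1 can look equivalent.
single = np.random.rand(1, 784)
assert single.reshape(-1, 28, 28, 1).shape == single.reshape(1, 28, 28, 1).shape
```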

In the MNIST tutorial that you are reading, the desired shape for your input layer is [batch_size, 28, 28, 1]:
x_image = tf.reshape(x, [-1,28,28,1])
Here, -1 for input x specifies that this dimension should be computed dynamically from the number of values in x, holding the sizes of all other dimensions constant. Because batch_size (the dimension given as -1) is not fixed when the graph is built, it can be treated as a hyperparameter that we can tune.

-1 indicates that the length of the current axis is deduced automatically, according to the rule that the total number of elements in the tensor stays unchanged.
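The rule can be written out as a small function. This is a hypothetical helper (infer_shape is not a TensorFlow or NumPy API) that only mirrors the stated rule: at most one -1, and the product of the known sizes must divide the total element count.

```python
from math import prod

def infer_shape(total_elements, shape):
    """Resolve a single -1 in `shape` so that the element count is preserved.

    Hypothetical helper for illustration; it mirrors the rule, not TensorFlow's code.
    """
    if shape.count(-1) > 1:
        raise ValueError("only one dimension can be inferred")
    known = prod(d for d in shape if d != -1)   # product of the fixed sizes
    if -1 not in shape:
        if known != total_elements:
            raise ValueError("element count mismatch")
        return list(shape)
    if known == 0 or total_elements % known != 0:
        raise ValueError(f"cannot reshape {total_elements} elements into {shape}")
    # The unknown size is whatever makes the totals match
    return [total_elements // known if d == -1 else d for d in shape]

print(infer_shape(784 * 50, [-1, 28, 28, 1]))   # [50, 28, 28, 1]
```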

Related

Tensorflow: Diagonal of matrix of matrices / Diagonal of 4D tensor

Given a 4D tensor x of shape (batch_size, batch_size, seq_len, feature_dim), I want to be able to retrieve the matrices along the diagonal entries, i.e. I need a way to fetch all x[diag_entry, diag_entry, :, :] slices for the values range(batch_size) producing a tensor of shape (batch_size, seq_len, feature_dim). However, I cannot explicitly loop over range(batch_size) as batch_size may vary since I work in Keras. Does Tensorflow have functionality supporting such an operation?
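No answer to this question is included here; as a hedged sketch, the operation itself can be expressed in NumPy with two identical index arrays, which select x[i, i] for every i without a Python loop over the batch. The sizes are arbitrary. In TensorFlow the analogous pattern would presumably be r = tf.range(tf.shape(x)[0]) followed by tf.gather_nd(x, tf.stack([r, r], axis=1)), which uses the run-time batch size; that translation is an assumption and is not exercised below.

```python
import numpy as np

batch_size, seq_len, feature_dim = 4, 7, 3   # arbitrary example sizes
x = np.random.rand(batch_size, batch_size, seq_len, feature_dim)

# Pair each batch index with itself: picks x[0, 0], x[1, 1], ..., x[b-1, b-1]
idx = np.arange(x.shape[0])
diag = x[idx, idx]                           # shape (batch_size, seq_len, feature_dim)

assert diag.shape == (batch_size, seq_len, feature_dim)
assert all(np.array_equal(diag[i], x[i, i]) for i in range(batch_size))
```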

How to achieve elementwise convolution for two tensors using tensorflow?

In my problem, I want to convolve two tensors in my neural network model.
The shape of two tensors is [None, 2, 1], [None, 3, 1] respectively. The axis with dimension None means the batch size of the input tensor. For each sample in batch, I want to convolve the two tensors with shape [2, 1] and [3, 1].
However, tf.nn.conv1d in TensorFlow can only convolve the input with a fixed kernel. Is there any function that supports convolving two tensors sample by sample along the batch axis, similar to tf.multiply, which multiplies two tensors for each sample (i.e., elementwise)?
The code I ran can be simplified as follows:
input_signal = Input(shape=(L, M), name='input_signal')
input_h = Input(shape=(N), name='input_h')
faded= Lambda(lambda x: tf.nn.conv1d(input, x))(input_h)
What I want is for each sample of input_signal to be convolved with the sample of input_h that has the same index. However, the code above only sketches my idea; it cannot run as written. My question is how I can modify it so that the input tensor is convolved with another input tensor for every sample in the batch.
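To pin down the operation being asked for, here is a NumPy reference for "per-sample" convolution, with the trailing channel axis of [None, 2, 1] and [None, 3, 1] dropped and an arbitrary batch of 4. Note that np.convolve flips the kernel (true convolution), whereas tf.nn.conv1d computes cross-correlation. In TensorFlow, a per-sample loop could presumably be written with tf.map_fn over (signal, kernel) pairs; that is an assumption not covered in this thread.

```python
import numpy as np

batch = 4                                     # example batch size
signals = np.random.rand(batch, 2)            # one length-2 signal per sample
kernels = np.random.rand(batch, 3)            # one length-3 kernel per sample

# Convolve sample i's signal with sample i's kernel only (no mixing across the batch)
out = np.stack([np.convolve(s, k, mode="full") for s, k in zip(signals, kernels)])
print(out.shape)   # (4, 4): full convolution length is 2 + 3 - 1
```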

According to the documentation of the kernel_size argument for the Conv1D layer (and any other convolution layer), you cannot put multiple filters with different kernel sizes or strides in one layer.
Also, convolutions with kernels of different sizes produce outputs of different heights and widths.
The general formula for the output size, assuming a symmetric kernel, is
O = floor((X - K + 2P) / S) + 1
where X is the input height/width, K is the kernel size, P is the zero padding, and S is the stride.
So, keeping zero padding and stride the same, you cannot have multiple kernels with different sizes in one Conv1D layer.
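A quick numeric check of the formula for the two kernel sizes used in the example code of this answer. The input length of 100 is an assumed example value; P = 0 corresponds to Keras' default padding="valid" and S = 1 to the default stride. The helper name conv_output_size is hypothetical.

```python
def conv_output_size(x, k, p=0, s=1):
    """Output length along one spatial axis: floor((X - K + 2P) / S) + 1."""
    return (x - k + 2 * p) // s + 1

n_timesteps = 100                           # example input length
print(conv_output_size(n_timesteps, k=2))   # 99
print(conv_output_size(n_timesteps, k=3))   # 98
```

The outputs differ by one step, which is why the two branches must be cropped, padded or pooled before they can be stacked.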
You can, however, use the Keras functional API (tf.keras.Model) to apply Conv1D several times to the same input, or separate Conv1D layers to different inputs with their respective kernel sizes, and then max-pool, crop or zero-pad to match the dimensions of the different outputs before stacking them.
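Example:
import tensorflow as tf
n_timesteps, n_features = 100, 8  # example sizes (assumed; not given in the question)
inputs = tf.keras.Input(shape=(n_timesteps, n_features))
x1 = tf.keras.layers.Conv1D(filters=32, kernel_size=2)(inputs)  # length n_timesteps - 1
x2 = tf.keras.layers.Conv1D(filters=16, kernel_size=3)(inputs)  # length n_timesteps - 2
# Match the time dimension of x1 to x2 by cropping one step from the end of x1
x1 = tf.keras.layers.Cropping1D(cropping=(0, 1))(x1)
x3 = tf.keras.layers.Concatenate(axis=-1)([x1, x2])  # shape (None, n_timesteps - 2, 48)
model = tf.keras.Model(inputs=inputs, outputs=x3)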
You can use ZeroPadding1D, Cropping1D or MaxPooling1D to match the dimensions.

Tensorflow: combining two tensors with dimension X into one tensor with dimension X+1

I am doing some sentiment analysis with Tensorflow, but there is a problem I can't solve:
I have one tensor (input) shaped as [?, 38] [batch_size, max_word_length] and one (prediction) shaped as [?, 3] [batch_size, predicted_label].
My goal is to combine both tensors into a single tensor with the shape of [?, 38, 3].
This tensor is used as the input of my second stage.
Seems easy, but I can't find a way of doing it.
Can (and will) you tell me how to do this?

This is impossible. You have one tensor that contains batch_size * max_word_length elements and another that contains batch_size * predicted_label elements. Hence there are
batch_size * (max_word_length + predicted_label)
elements in total. The new tensor [batch_size, max_word_length, predicted_label] you want to create would have
batch_size * max_word_length * predicted_label
elements. You don't have enough elements for this.
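A NumPy check of the counting argument, using the shapes from the question and an arbitrary batch size of 8: concatenating the two tensors gives 41 values per example, while the requested shape needs 38 * 3 = 114, so no reshape can produce it.

```python
import numpy as np

batch_size, max_word_length, n_labels = 8, 38, 3   # 8 is an arbitrary batch size
inputs = np.zeros((batch_size, max_word_length))
predictions = np.zeros((batch_size, n_labels))

available = inputs.size + predictions.size         # 8 * (38 + 3) = 328
needed = batch_size * max_word_length * n_labels    # 8 * 38 * 3 = 912
print(available, needed)                            # 328 912

combined = np.concatenate([inputs, predictions], axis=1)  # shape (8, 41)
try:
    combined.reshape(batch_size, max_word_length, n_labels)
except ValueError:
    print("cannot reshape 328 elements into (8, 38, 3)")
```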

Get last output of dynamic_rnn in tensorflow?

I am using dynamic_rnn to process MNIST data:
# LSTM Cell
lstm = rnn_cell.LSTMCell(num_units=200,
                         forget_bias=1.0,
                         initializer=tf.random_normal)
# Initial state
istate = lstm.zero_state(batch_size, "float")
# Get lstm cell output
output, states = rnn.dynamic_rnn(lstm, X, initial_state=istate)
# Output at last time point T
output_at_T = output[:, 27, :]
Full code: http://pastebin.com/bhf9MgMe
The input to the LSTM is (batch_size, sequence_length, input_size).
As a result, the dimensions of output are (batch_size, sequence_length, num_units), where num_units=200, and output_at_T is (batch_size, num_units).
I need to get the last output along the sequence_length dimension. In the code above, this is hardcoded as 27. However, I do not know the sequence_length in advance as it can change from batch to batch in my application.
I tried:
output_at_T = output[:, -1, :]
but it says negative indexing is not implemented yet, and I tried using a placeholder variable as well as a constant (into which I could ideally feed the sequence_length for a particular batch); neither worked.
Any way to implement something like this in tensorflow atm?

Have you noticed that there are two outputs from dynamic_rnn?
Output 1, let's call it h, has the outputs at all time steps (i.e. h_1, h_2, etc.).
Output 2, final_state, has two elements: the cell state and the last output for each element of the batch (as long as you pass the sequence lengths to dynamic_rnn).
So from:
h, final_state = tf.nn.dynamic_rnn(..., sequence_length=seq_lengths, ...)  # seq_lengths: one length per batch element
the last state for each element in the batch is:
final_state.h
Note that this includes the case when the length of the sequence is different for each element of the batch, as we are using the sequence_length argument.

This is what gather_nd is for!
import tensorflow as tf
def extract_axis_1(data, ind):
    """
    Get specified elements along axis 1 of a tensor.
    :param data: TensorFlow tensor that will be subsetted.
    :param ind: Indices to take (one for each element along axis 0 of data);
                must be int32 to match tf.range.
    :return: Subsetted tensor.
    """
    batch_range = tf.range(tf.shape(data)[0])        # [0, 1, ..., batch_size - 1]
    indices = tf.stack([batch_range, ind], axis=1)   # (batch index, time index) pairs
    res = tf.gather_nd(data, indices)
    return res
In your case (assuming sequence_length is a 1-D tensor with the length of each axis 0 element):
output = extract_axis_1(output, sequence_length - 1)
Now output is a tensor of dimension [batch_size, num_units].
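A NumPy analogue of extract_axis_1, assuming outputs past each sequence's length are zero-filled as dynamic_rnn does (the sizes and lengths below are made up). Indexing with a pair of index arrays plays the role of tf.stack plus tf.gather_nd. The last lines show why output[:, -1, :] alone does not give the last valid output when lengths differ.

```python
import numpy as np

batch_size, max_len, num_units = 3, 5, 2
rng = np.random.default_rng(0)
output = rng.random((batch_size, max_len, num_units)) + 0.1   # strictly positive values
sequence_length = np.array([5, 2, 4])                          # hypothetical per-example lengths

# Mimic dynamic_rnn: outputs past each sequence's length are zero
for i, n in enumerate(sequence_length):
    output[i, n:] = 0.0

# Same idea as extract_axis_1: pair each batch index with its own time index
last = output[np.arange(batch_size), sequence_length - 1]      # shape (batch_size, num_units)

# Naive output[:, -1] picks the zero padding for the shorter sequences
naive = output[:, -1]
```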

output[:, -1, :]
works in TensorFlow 1.x now!

Most answers cover it thoroughly, but this code snippet might help you understand what the dynamic_rnn layer really returns: a tuple of (outputs, final_output_state).
For an input with a maximum sequence length of T time steps, outputs has shape [batch_size, T, num_units] (given time_major=False, the default) and contains the output state at each time step, h1, h2, ..., hT.
final_output_state holds the final cell state cT and output state hT of each sequence in the batch, each of shape [batch_size, num_units].
But since dynamic_rnn is being used, my guess is that your sequence lengths vary from batch to batch.
import tensorflow as tf  # TensorFlow 1.x API
import numpy as np
tf.reset_default_graph()
# Create input data
X = np.random.randn(2, 10, 8)
# The second example is of length 6
X[1, 6:] = 0
X_lengths = [10, 6]
cell = tf.nn.rnn_cell.LSTMCell(num_units=64, state_is_tuple=True)
outputs, states = tf.nn.dynamic_rnn(cell=cell,
                                    dtype=tf.float64,
                                    sequence_length=X_lengths,
                                    inputs=X)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    out, st = sess.run([outputs, states])
assert out.shape == (2, 10, 64)
print(out.shape)
print(st.h.shape)
# The final output and the returned state h must be equal for each sequence
assert (out[0][-1] == st.h[0]).all()
assert (out[-1][5] == st.h[-1]).all()
# Expected to fail: the 2nd sequence ends at index 5, so out[-1][-1] is zero padding
assert (out[-1][-1] == st.h[-1]).all()
The final assertion will fail because the final state for the second sequence is at the 6th time step (index 5), and the outputs at time steps 6 to 9 of the second sequence are all zeros.

I am new to Stack Overflow and cannot comment yet, so I am writing this new answer. @VM_AI, the last index is tf.shape(output)[1] - 1.
So, reusing your answer:
# Let's first fetch the last index of seq length
# last_index would have a scalar value
last_index = tf.shape(output)[1] - 1
# Then let's transpose the output to [sequence_length, batch_size, num_units]
# for convenience
output_rs = tf.transpose(output,[1,0,2])
# Last state of all batches
last_state = tf.nn.embedding_lookup(output_rs,last_index)
This works for me.
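A NumPy check (example sizes, not from the thread) that the transpose-then-lookup trick picks the same slice as output[:, -1, :]. Plain integer indexing on axis 0 stands in for tf.nn.embedding_lookup with a scalar id, which gathers a row along axis 0. It also shows why the index needs the - 1.

```python
import numpy as np

output = np.random.rand(4, 6, 3)              # (batch_size, seq_len, num_units), example sizes
last_index = output.shape[1] - 1              # NumPy counterpart of tf.shape(output)[1] - 1

output_rs = np.transpose(output, (1, 0, 2))   # (seq_len, batch_size, num_units)
last_state = output_rs[last_index]            # row lookup along axis 0

assert np.array_equal(last_state, output[:, -1, :])

try:
    output_rs[output.shape[1]]                # seq_len itself is one past the end
except IndexError:
    print("index seq_len is out of range; subtract 1")
```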

You should be able to access the shape of your output tensor using tf.shape(output). The tf.shape() function returns a 1-D tensor containing the sizes of the output tensor's dimensions. In your example, this would be (batch_size, sequence_length, num_units).
You should then be able to extract the value of output_at_T as output[:, tf.shape(output)[1] - 1, :]

TensorFlow has a function, tf.shape, that gives you the shape as a symbolic tensor, rather than the None returned by output._shape[1] for an unknown dimension. After fetching the last index, you can look it up with tf.nn.embedding_lookup, which is recommended especially when a lot of data is fetched, as it does parallel lookups (32 by default).
# Let's first fetch the last index of seq length
# last_index would have a scalar value
last_index = tf.shape(output)[1] - 1
# Then let's transpose the output to [sequence_length, batch_size, num_units]
# for convenience
output_rs = tf.transpose(output,[1,0,2])
# Last state of all batches
last_state = tf.nn.embedding_lookup(output_rs,last_index)
This should work.
Just to clarify what @Benoit Steiner said: his solution would not work, because tf.shape returns a symbolic value for the shape, which cannot be used for slicing tensors, i.e., for direct indexing.

Using Sparse Tensors to feed a placeholder for a softmax layer in TensorFlow

Has anyone successfully used sparse tensors for text analysis with TensorFlow? Everything is ready, and I manage to feed feed_dict in tf.Session for a softmax layer with NumPy arrays, but I am unable to feed the dictionary with SparseTensorValues.
I have not found any documentation about using sparse matrices to train a model (softmax, for example) with TensorFlow either, which is strange, as the SparseTensor and SparseTensorValue classes and the tf.sparse_to_dense method are ready for it, but there is no documentation about how to feed the feed_dict dictionary of values in the session.run(fetches, feed_dict=None) method.
Thanks a lot,

I have found a way of putting sparse images into tensorflow including batch processing if that is of any help.
I create a 4-D sparse matrix in a dictionary whose dimensions are batchSize, xLen, yLen, zLen (where zLen is 3 for colour, for example). The following pseudocode is for a batch of 50 32x96-pixel, 3-colour images. Values are the intensity of each pixel. In the snippet below I show the first two pixels of the first image in the batch being initialised...
shape = [50, 32, 96, 3]
indices = [[0, 20, 31, 0],[0, 22, 33, 1], etc...]
values = [12, 24, etc...]
batch = {"indices": indices, "values": values, "shape": shape}
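To make the (indices, values, shape) format concrete, here is a NumPy sketch of the densification that tf.sparse_tensor_to_dense performs conceptually, using only the two entries shown above (the real batch would include the elided "etc..." entries). sparse_to_dense below is a hypothetical helper, not a TensorFlow function.

```python
import numpy as np

def sparse_to_dense(indices, values, shape):
    """Scatter `values` into a zero array of `shape` at the given `indices` (COO format)."""
    dense = np.zeros(shape, dtype=np.float32)
    idx = np.asarray(indices)          # shape (num_entries, rank)
    dense[tuple(idx.T)] = values       # one index array per dimension
    return dense

shape = [50, 32, 96, 3]
indices = [[0, 20, 31, 0], [0, 22, 33, 1]]   # only the two entries shown above
values = [12, 24]
dense = sparse_to_dense(indices, values, shape)
```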
When setting up the computational graph I create a sparse-placeholder of the correct dimensions
images = tf.sparse_placeholder(tf.float32, shape=[None, 32, 96, 3])
'None' is used so I can vary the batch size.
When I first want to use the images, e.g. to feed into a batch convolution, I convert the sparse placeholder back to a dense tensor:
dense_images = tf.sparse_tensor_to_dense(images)
Then when I am ready to run a session, e.g. for training, I pass the 3 components of the batch into the dictionary so that they will be picked up by the sparse_placeholder:
train_dict = {images: (batch['indices'], batch['values'], batch['shape']), etc...}
sess.run(train_step, feed_dict=train_dict)
If you don't need batch processing, just leave off the first dimension and remove None from the placeholder shape.
I couldn't find any way of passing the images across in batch as an array of sparse matrices. It only worked if I created the 4th dimension. I'd be interested to know of alternatives.
Whilst this doesn't give an exact answer to your question I hope it is of use as I have been struggling with similar issues.
