Output of Tensorflow LSTM-Cell - python

I've got a question on Tensorflow LSTM-Implementation. There are currently several implementations in TF, but I use:
cell = tf.contrib.rnn.BasicLSTMCell(n_units)
where n_units is the number of 'parallel' LSTM cells.
Then to get my output I call:
rnn_outputs, rnn_states = tf.nn.dynamic_rnn(cell, x,
                                            initial_state=initial_state,
                                            time_major=False)
where (as time_major=False) x is of shape (batch_size, time_steps, input_length)
where batch_size is my batch size,
where time_steps is the number of timesteps my RNN will go through,
where input_length is the length of one of my input vectors (the vector fed into the network at one specific timestep in one specific batch).
I expect rnn_outputs to be of shape (batch_size, time_steps, n_units, input_length) as I have not specified another output size.
The documentation of tf.nn.dynamic_rnn tells me that the output is of shape (batch_size, time_steps, cell.output_size).
The documentation of tf.contrib.rnn.BasicLSTMCell shows a property output_size, which defaults to n_units (the number of LSTM cells I use).
So does each LSTM cell only output a scalar for every given timestep? I would expect it to output a vector of the length of the input vector, but that does not seem to be the case from how I understand it so far, so I am confused. Can you tell me whether that's the case, or how I could change it so that each LSTM cell outputs a vector of the size of the input vector?

I think the primary confusion is about the terminology of the LSTM cell's argument num_units. Unfortunately it does not mean, as the name suggests, "the number of LSTM cells" (which would then have to equal your number of time steps). num_units is actually the number of dimensions of the hidden state (and of the cell state), i.e. the size of the vectors h_t and c_t.
The call to dynamic_rnn() returns a tensor of shape [batch_size, time_steps, output_size], where (please note this):
output_size = num_units, if num_proj is None in the LSTM cell;
output_size = num_proj, if it is defined.
Now, typically, you will extract the last time step's output and project it to your desired output size using a manual matmul + bias operation, or use the num_proj argument of the LSTM cell.
I have been through the same confusion and had to look really deep to get it cleared. Hope this answer clears some of it.
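As an illustration, here is a minimal sketch of the manual projection (TF 1.x-style API, as in the question; all sizes and variable names are hypothetical):
import tensorflow as tf

batch_size, time_steps, input_length = None, 20, 30  # hypothetical sizes
n_units, n_classes = 128, 10                         # hypothetical sizes

x = tf.placeholder(tf.float32, [batch_size, time_steps, input_length])
cell = tf.contrib.rnn.BasicLSTMCell(n_units)
rnn_outputs, rnn_states = tf.nn.dynamic_rnn(cell, x, dtype=tf.float32,
                                            time_major=False)

# rnn_outputs has shape (batch_size, time_steps, n_units); keep the last step
last_output = rnn_outputs[:, -1, :]                  # (batch_size, n_units)

# manual projection: matmul + bias to the desired output size
W = tf.get_variable("W_out", [n_units, n_classes])
b = tf.get_variable("b_out", [n_classes])
logits = tf.matmul(last_output, W) + b               # (batch_size, n_classes)
Alternatively, tf.contrib.rnn.LSTMCell(n_units, num_proj=n_classes) performs the projection inside the cell, making output_size equal to num_proj.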

Related

Softmax Output Layer. Which dimension?

I have a question regarding neural nets used for image segmentation. I am using a 3D implementation of DeepLab that can be found here.
I am using softmax, so the output layer is the following:
elif self.last_activation.lower() == 'softmax':
    output = nn.Softmax()(output)
No dimension is defined, so I want to define it manually. But I am not sure which dimension I need to set. The output tensor has the following dimensions:
[batch_size, num_classes, width, height, depth]
So I would think that dim=1 would be correct. Is that correct?
Thanks!
Indeed it should be dim=1, as you want the values along that axis (the class axis) to sum to 1.
Be careful if you need to train your network with a cross-entropy loss, as the latter already includes a softmax (so you should feed it raw logits, without the extra Softmax layer).
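A minimal sketch (PyTorch, with hypothetical sizes) of both points:
import torch
import torch.nn as nn

batch_size, num_classes, w, h, d = 2, 4, 8, 8, 8  # hypothetical sizes
logits = torch.randn(batch_size, num_classes, w, h, d)

# dim=1 normalizes over the class axis: probabilities sum to 1 per voxel
probs = nn.Softmax(dim=1)(logits)
print(probs.sum(dim=1))  # all ones, shape (batch_size, w, h, d)

# nn.CrossEntropyLoss applies log-softmax internally, so pass raw logits
targets = torch.randint(0, num_classes, (batch_size, w, h, d))
loss = nn.CrossEntropyLoss()(logits, targets)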

What is the input shape of the InputLayer in keras Tensorflow?

I have this data
X_regression = tf.range(0, 1000, 5)
y_regression = X_regression + 100
X_reg_train, X_reg_test = X_regression[:150], X_regression[150:]
y_reg_train, y_reg_test = y_regression[:150], y_regression[150:]
I inspect the input data:
X_reg_train[0], X_reg_train[0].shape, X_reg_train[0].ndim
and it returns:
(<tf.Tensor: shape=(), dtype=int32, numpy=0>, TensorShape([]), 0)
I build a model:
# Set the random seed
tf.random.set_seed(42)
# Create the model
model_reg = tf.keras.models.Sequential()
# Add Input layer
model_reg.add(tf.keras.layers.InputLayer(input_shape=[1]))
# Add Hidden layers
model_reg.add(tf.keras.layers.Dense(units=10, activation=tf.keras.activations.relu))
# Add last layer
model_reg.add(tf.keras.layers.Dense(units=1))
# Compile the model
model_reg.compile(optimizer=tf.keras.optimizers.Adam(),
                  loss=tf.keras.losses.mae,
                  metrics=[tf.keras.metrics.mae])
# Fit the model
model_reg.fit(X_reg_train, y_reg_train, epochs=10)
The model works.
However, I am confused about input_shape
Why is it [1] in this situation? Why is it sometimes a tuple?
Would appreciate an explanation of different formats of input_shape in different situations.
InputLayer is actually just the same as specifying the parameter input_shape on a Dense layer. Behind the scenes, Keras actually uses an InputLayer when you use method 2.
# Method 1
model_reg.add(tf.keras.layers.InputLayer(input_shape=(1,)))
model_reg.add(tf.keras.layers.Dense(units=10, activation=tf.keras.activations.relu))
# Method 2
model_reg.add(tf.keras.layers.Dense(units=10, input_shape=(1,), activation=tf.keras.activations.relu))
The parameter input_shape is actually supposed to be a tuple. If you noticed, I set the input_shape in your example to (1,), which is a tuple with a single element. As your data is 1D, you pass in a single element at a time, therefore the input shape is (1,).
If your input data was a 2D input for example when trying to predict the price of a house based on multiple variables, you would have multiple rows and multiple columns of data. In this case, you pass in the input shape of the last dimension of the X_reg_train which is the number of inputs. If X_reg_train was (1000,10) then we use the input_shape of (10,).
model_reg.add(tf.keras.layers.Dense(units=10, input_shape=(X_reg_train.shape[1],), activation=tf.keras.activations.relu))
Ignoring the batch_size for a moment, with this we are actually just sending a single row of the data to predict a single house price. The batch_size is just there to chunk multiple rows of data together so that we do not have to load the entire dataset into memory, which is computationally expensive; we send small chunks instead, with the default value being 32. When running the training you would have noticed that under each epoch it says 5/5, which stands for the 5 batches of data you have: since the training size is 150, 150 / 32 = 5 (rounded up).
For 3D input, the Dense layer actually just operates on the last axis: internally the input is flattened from (batch_size, sequence_length, dim) to (batch_size * sequence_length, dim) and the result reshaped back to (batch_size, sequence_length, hidden_units). This is the same as using a Conv1D layer with a kernel size of 1, so I wouldn't even use the Dense layer in this case.
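If helpful, a quick sketch (hypothetical shapes) showing that Dense on 3D input and a Conv1D with kernel size 1 give the same output shape:
import tensorflow as tf

x = tf.random.normal((32, 20, 8))  # (batch_size, sequence_length, dim)

dense_out = tf.keras.layers.Dense(16)(x)                 # acts on the last axis
conv_out = tf.keras.layers.Conv1D(16, kernel_size=1)(x)  # pointwise convolution
print(dense_out.shape, conv_out.shape)                   # both (32, 20, 16)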
In Keras (when using the functional API), the input layer itself is not a layer, but a tensor. It's the starting tensor you send to the first hidden layer, and it must have the same shape as your training data.
Example: if you have 30 images of 50x50 pixels in RGB (3 channels), the shape of your input data is (30,50,50,3). Then your input layer tensor, must have this shape (see details in the "shapes in keras" section).
Each type of layer requires the input with a certain number of dimensions:
Dense layers require inputs as (batch_size, input_size) or (batch_size, optional,...,optional, input_size) or in your case just (input_size)
2D convolutional layers need inputs as:
if using channels_last: (batch_size, imageside1, imageside2, channels)
if using channels_first: (batch_size, channels, imageside1, imageside2)
1D convolutions and recurrent layers use (batch_size, sequence_length, features)
Here are some helpful links: Keras input explanation: input_shape, units, batch_size, dim, etc. and https://keras.io/api/layers/core_layers/input/
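To make those conventions concrete, here is a small sketch (channels_last assumed); note that the shape argument always excludes the batch dimension:
import tensorflow as tf

dense_in = tf.keras.Input(shape=(20,))         # Dense: (batch_size, input_size)
conv2d_in = tf.keras.Input(shape=(50, 50, 3))  # Conv2D: (batch_size, side1, side2, channels)
seq_in = tf.keras.Input(shape=(100, 64))       # Conv1D/RNN: (batch_size, sequence_length, features)
print(dense_in.shape, conv2d_in.shape, seq_in.shape)  # batch dimension shows as None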

Variable sentence length for LSTM using word2vec as inputs on tensorflow

I am building an LSTM Model using word2vec as an input. I am using the tensorflow framework. I have finished word embedding part, but I am stuck with LSTM part.
The issue here is that I have different sentence lengths, which means that I have to either do padding or use dynamic_rnn with specified sequence length. I am struggling with both of them.
Padding.
The confusing part about padding is where I do the padding. My model goes like this:
word_matrix = model.wv.syn0
X = tf.placeholder(tf.int32, shape)
data = tf.placeholder(tf.float32, shape)
data = tf.nn.embedding_lookup(word_matrix, X)
Then, I am feeding sequences of word indices for word_matrix into X. I am worried that if I pad zeros onto the sequences fed into X, then I would incorrectly keep feeding unnecessary input (word_matrix[0] in this case).
So, I am wondering what is the correct way of 0 padding. It would be great if you let me know how to implement it with tensorflow.
dynamic_rnn
For this, I have declared a list containing all the lengths of the sentences and feed those along with X and y at the end. In this case, I cannot feed the inputs as a batch though. Then, I encountered this error (ValueError: as_list() is not defined on an unknown TensorShape.), which seems to me to say that the sequence_length argument only accepts a list? (My thoughts might be entirely incorrect though.)
The following is my code for this.
X = tf.placeholder(tf.int32)
labels = tf.placeholder(tf.int32, [None, numClasses])
length = tf.placeholder(tf.int32)
data = tf.placeholder(tf.float32, [None, None, numDimensions])
data = tf.nn.embedding_lookup(word_matrix, X)
lstmCell = tf.contrib.rnn.BasicLSTMCell(lstmUnits, state_is_tuple=True)
lstmCell = tf.contrib.rnn.DropoutWrapper(cell=lstmCell, output_keep_prob=0.25)
initial_state = lstmCell.zero_state(batchSize, tf.float32)
value, _ = tf.nn.dynamic_rnn(lstmCell, data, sequence_length=length,
                             initial_state=initial_state, dtype=tf.float32)
I am really struggling with this part, so any help would be very much appreciated.
Thank you in advance.
Tensorflow does not support variable-length Tensors. So when you declare a Tensor, the list/numpy array should have a uniform shape.
From your first part, what I understand is that you are already able to pad zeros into the last time steps of each sequence, which is what the ideal situation should be. Here is how it should look for a batch size of 4, a max sequence length of 10, and 50 hidden units:
[4, 10, 50] would be the size of your whole batch, but internally it may be shaped like this when you try to visualize the paddings:
[[5+5pad, 50], [10, 50], [8+2pad, 50], [9+1pad, 50]]
Each pad would represent one time step holding a tensor of hidden-state size 50, filled with nothing but zeros. Look at this question and this one to know more about how to pad manually.
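For reference, a minimal numpy sketch of manual zero-padding (pad index 0 and the example lengths [5, 10, 8, 9] assumed):
import numpy as np

sequences = [[4, 7, 1, 9, 2],        # length 5 -> 5 pads
             list(range(1, 11)),    # length 10 -> no pad
             [3] * 8,               # length 8 -> 2 pads
             [5] * 9]               # length 9 -> 1 pad

max_len = max(len(s) for s in sequences)
padded = np.zeros((len(sequences), max_len), dtype=np.int32)
for i, seq in enumerate(sequences):
    padded[i, :len(seq)] = seq      # left-aligned, zeros at the end

seq_lengths = np.array([len(s) for s in sequences])  # [5, 10, 8, 9]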
You will use dynamic_rnn for the exact reason that you do not want to compute over the padded time steps; the tf.nn.dynamic_rnn API ensures that via the sequence_length argument.
For the above example, that argument will be [5, 10, 8, 9]. You can compute it by counting the non-zero time steps for each batch component. A simple way to compute that would be:
# data has shape (batch_size, max_time, numDimensions); a time step counts
# as real (non-padded) if any of its features is non-zero
data_mask = tf.reduce_any(tf.cast(data, tf.bool), axis=2)
data_len = tf.reduce_sum(tf.cast(data_mask, tf.int32), axis=1)
and pass it in the tf.nn.dynamic_rnn api:
tf.nn.dynamic_rnn(lstmCell, data, sequence_length=data_len, initial_state=initial_state)

How does tensorflow scale RNNCell weight tensors when changing their dimensions?

I'm trying to understand how the weights are scaled in an RNNCell when going from training to inference in tensorflow.
Consider the following placeholders defined as:
data = tf.placeholder(tf.int32, [None, max_seq_len])
targets = tf.placeholder(tf.int32, [None, max_seq_len])
During training the batch_size is set to 10, e.g. both tensors have shape [10,max_seq_len]. However, during inference only one example is used, not a batch of ten, so the tensors have shape [1,max_seq_len].
Tensorflow handles this dimension change seamlessly, however, I'm uncertain of how it does this?
My hypothesis is that the weight tensors in the RNNCell are actually of shape [1, hidden_dim], and that scaling to larger batch sizes is achieved by broadcasting, but I'm unable to find anything that reflects this in the source. I've read through the rnn source and the rnn cell source. Any help with understanding this would be much appreciated.
You have defined your data tensor as data = tf.placeholder(tf.int32, [None, max_seq_len]), which means that the first dimension changes according to the input while the second dimension always remains max_seq_len.
So if max_seq_len = 5, your feed shape can be [1, 5], [2, 5], [3, 5], and so on: you can change the first dimension but not the second one.
If you change the second dimension to a number other than 5, it will throw a shape-mismatch error (or similar).
Your input's first dimension, the batch_size, won't affect the weight matrix of any of the neurons in your network. The weights depend only on the input and hidden dimensions (for a BasicLSTMCell, a kernel of shape [input_dim + hidden_dim, 4 * hidden_dim]), and the matmul with a [batch_size, input_dim + hidden_dim] activation works for any batch size, so no broadcasting of the weights is needed.
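A small numpy sketch (hypothetical sizes) of why the batch dimension never touches the weights; only the activations carry it:
import numpy as np

input_dim, hidden_dim = 8, 16                 # hypothetical sizes
# kernel shaped like BasicLSTMCell's concatenated four-gate weight matrix
kernel = np.random.randn(input_dim + hidden_dim, 4 * hidden_dim)

for batch_size in (10, 1):                    # training vs. inference
    x_and_h = np.random.randn(batch_size, input_dim + hidden_dim)
    gates = x_and_h @ kernel                  # (batch_size, 4 * hidden_dim)
    print(batch_size, gates.shape)            # kernel is untouched either way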

LSTM example with MNIST

I am currently trying to understand the meaning of outputs and states of the tf.nn.rnn function in tensorflow:
outputs, states = tf.nn.rnn(lstm_cell, x, dtype=tf.float32)
through the LSTM MNIST tutorial.
Indeed, with respect to the following tutorial, Understanding LSTM, I am wondering what these variables correspond to.
In my opinion, outputs correspond to the hidden state (denoted as h_t in the previous link) but I am not sure.
Thus, I understand that outputs is a list of time_steps tensors of shape (batch_size, n_hidden). But why is states a list of 2 tensors of shape (batch_size, n_hidden)? Is it just the cell state for the last time step?
This seems like an old question. Sorry if you figured it out already.
The outputs are the values coming out of the "top" of your LSTM units in the diagram. That is the output of your LSTM layer in total.
As you suggest in your question, the hidden states are actually the temporal hidden states that move from one time step to the next. They are the C_t and h_t values that travel along the time axis as output/input of the unrolled LSTM layer. That is why you get 2 tensors, each of size (batch_size, n_hidden).
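For illustration, a hedged sketch in the TF 1.x-era API of the question (tf.nn.rnn was later renamed tf.nn.static_rnn), assuming lstm_cell and x are defined as in the MNIST tutorial:
# assumes lstm_cell (a BasicLSTMCell) and x (a list of time_steps tensors
# of shape (batch_size, n_input)) are defined as in the tutorial
outputs, states = tf.nn.static_rnn(lstm_cell, x, dtype=tf.float32)

# outputs: list of time_steps tensors, each (batch_size, n_hidden) -- the h_t's
# states: final LSTMStateTuple (c, h) for the last time step
c_final, h_final = states            # each of shape (batch_size, n_hidden)
# with no sequence_length masking, h_final equals outputs[-1]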
