I'm following this tutorial (https://theneuralperspective.com/2016/11/20/recurrent-neural-networks-rnn-part-3-encoder-decoder/) to implement an RNN using TensorFlow.
I have prepared the input data for both the encoder and the decoder as described, but I'm having trouble understanding what input the TensorFlow dynamic_rnn function expects for this particular task, so I'll describe the data I have in the hope that someone has an idea.
I have 20 sentences in English and in French. As described in the link above, each sentence is padded to the length of the longest sentence in its language, and GO and EOS tokens are added where expected.
Since not all of the code is available at the link, I tried writing some of the missing code myself. This is what my encoder code looks like:
with tf.variable_scope('encoder') as scope:
    # Encoder RNN cell
    self.encoder_lstm_cell = tf.nn.rnn_cell.BasicLSTMCell(
        hidden_units, forget_bias=0.0, state_is_tuple=True)
    self.encoder_cell = tf.nn.rnn_cell.MultiRNNCell(
        [self.encoder_lstm_cell] * num_layers, state_is_tuple=True)
    # Embed encoder RNN inputs
    with tf.device("/cpu:0"):
        embedding = tf.get_variable(
            "embedding", [self.vocab_size_en, hidden_units], dtype=data_type)
        self.embedded_encoder_inputs = tf.nn.embedding_lookup(embedding, self.encoder_inputs)
    # Outputs from encoder RNN
    self.encoder_outputs, self.encoder_state = tf.nn.dynamic_rnn(
        cell=self.encoder_cell,
        inputs=self.embedded_encoder_inputs,
        sequence_length=self.seq_lens_en, time_major=False, dtype=tf.float32)
My embedded inputs are of shape [20, 60, 60], but I think this may be incorrect. According to the documentation for dynamic_rnn (https://www.tensorflow.org/api_docs/python/tf/nn/dynamic_rnn), the inputs are expected to be of size [batch, max_time, ...]. According to this example (https://medium.com/@erikhallstrm/using-the-dynamicrnn-api-in-tensorflow-7237aba7f7ea#.4hrf7z1pd), the inputs are supposed to be [batch, truncated_backprop_length, state_size].
The maximum sentence length for the encoder is 60, and this is where I am confused about what exactly the last two dimensions of the inputs should be. Should max_time also be the maximum length of my sentence sequences, or is it some other quantity? And should the last dimension also be the maximum (padded) sentence length, or the length of the current input without the padding?
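For reference, here is a tiny numpy sanity check of how I currently read the shapes (assuming hidden_units doubles as the embedding size, as in my code above):

import numpy as np

batch_size, max_time, embed_dim = 20, 60, 60   # 20 sentences, padded length 60, 60-dim embeddings

encoder_inputs = np.zeros((batch_size, max_time), dtype=np.int32)         # padded word ids
seq_lens_en = np.full(batch_size, max_time, dtype=np.int32)               # real (unpadded) lengths go here
embedded = np.zeros((batch_size, max_time, embed_dim), dtype=np.float32)  # what dynamic_rnn would receive

print(embedded.shape)  # (20, 60, 60) -> [batch, max_time, input depth]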
I am building a 1D CNN model using Keras for text classification where the input is a sequence of words generated by tokenizer.texts_to_sequences. Is there a way to also feed in a sequence of numerical features (e.g. a score) for each word in the sequence? For example, for sentence 1 the input would be ['the', 'dog', 'barked'] and each word in this particular sequence has the scores [0.9, 0.75, 0.6]. The scores are not word specific, but sentence specific scores of the words (if that makes a difference for how to format the input). Would an LSTM be more appropriate in this case?
Many thanks in advance!
Yes, just use 2 channels in the input tensor.
In other words, if your input previously had shape (batch_size, seq_len),
now it could have shape (batch_size, seq_len, 2).
If you look at the Keras documentation, you see that with the parameter data_format you pass a string, one of channels_last (default) or channels_first. In this case the default is fine, because the 2 (the number of channels) comes last.
You can just stack the 2 input arrays into a tensor with this shape.
Now, if you use a word embedding, the number of channels will probably not be 2 but embedding_dim + 1, so the final input shape would be (batch_size, seq_len, embedding_dim + 1).
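For example, a minimal sketch of the single-input variant (the sizes and layer choices here are just illustrative, not prescriptive):

import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

batch_size, seq_len, embedding_dim = 32, 50, 100

# word vectors already looked up: (batch_size, seq_len, embedding_dim)
word_vecs = np.random.rand(batch_size, seq_len, embedding_dim).astype("float32")
# one score per word: (batch_size, seq_len, 1)
scores = np.random.rand(batch_size, seq_len, 1).astype("float32")

# stack along the channel axis -> (batch_size, seq_len, embedding_dim + 1)
x = np.concatenate([word_vecs, scores], axis=-1)

model = keras.Sequential([
    layers.Conv1D(64, 3, activation="relu", input_shape=(seq_len, embedding_dim + 1)),
    layers.GlobalMaxPooling1D(),
    layers.Dense(1, activation="sigmoid"),
])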
In general you can also refer to this other Stack Overflow question.
In any case, both a 1D CNN and an LSTM could be good models, but that is something you need to discover yourself depending on your task, data and model constraints.
As a final remark, you could even consider a model with multiple inputs: one for the word sequence and the other for the scores, as in the sketch below. See this documentation page or this random tutorial I found on the internet. You can also refer again to the same SO question.
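A rough sketch of such a two-input model with the Keras functional API (layer sizes and input names here are made up):

from tensorflow import keras
from tensorflow.keras import layers

vocab_size, seq_len, embedding_dim = 10000, 50, 100

word_ids = keras.Input(shape=(seq_len,), name="word_ids")      # from texts_to_sequences
word_scores = keras.Input(shape=(seq_len, 1), name="scores")   # one score per word

embedded = layers.Embedding(vocab_size, embedding_dim)(word_ids)
x = layers.Concatenate(axis=-1)([embedded, word_scores])        # (batch, seq_len, embedding_dim + 1)

x = layers.Conv1D(64, 3, activation="relu")(x)
x = layers.GlobalMaxPooling1D()(x)
out = layers.Dense(1, activation="sigmoid")(x)

model = keras.Model(inputs=[word_ids, word_scores], outputs=out)
model.compile(optimizer="adam", loss="binary_crossentropy")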
I've been trying to teach myself the basics of RNNs with a personal project in PyTorch. I want to produce a simple network that is able to predict the next character in a sequence (the idea comes mainly from this article http://karpathy.github.io/2015/05/21/rnn-effectiveness/ but I wanted to do most of the work myself).
My idea is this: I take a batch of B input sequences of size n (numpy arrays of n integers), one-hot encode them, and pass them through my network, which is composed of several LSTM layers, one fully connected layer and one softmax unit.
I then compare the output to the target sequences which are the input sequences shifted one step ahead.
My issue is that when I include the softmax layer, the output is the same every single epoch for every single batch. When I don't include it, the network seems to learn appropriately. I can't figure out what's wrong.
My implementation is the following:
class Model(nn.Module):
    def __init__(self, one_hot_length, dropout_prob, num_units, num_layers):
        super().__init__()
        self.num_units = num_units
        self.LSTM = nn.LSTM(one_hot_length, num_units, num_layers,
                            batch_first=True, dropout=dropout_prob)
        self.dropout = nn.Dropout(dropout_prob)
        self.fully_connected = nn.Linear(num_units, one_hot_length)
        self.softmax = nn.Softmax(dim=1)
        # dim = 1 as the tensor is of shape (batch_size*seq_length, one_hot_length)
        # when entering the softmax unit

    def forward_pass(self, input_seq, hc_states):
        output, hc_states = self.LSTM(input_seq, hc_states)
        output = output.view(-1, self.num_units)
        output = self.fully_connected(output)
        # I simply comment out the next line when I run the network without the softmax layer
        output = self.softmax(output)
        return output, hc_states
one_hot_length is the size of my character dictionary (~200, also the size of a one-hot encoded vector),
num_units is the number of hidden units in an LSTM cell, and num_layers is the number of LSTM layers in the network.
The inside of the training loop (simplified) goes as follows:
input, target = next_batches(data, batch_pointer)
input = nn.functional.one_hot(input, num_classes=one_hot_length).float()
for state in hc_states:
    state.detach_()
optimizer.zero_grad()
# carry the states over to the next batch (they are detached above)
output, hc_states = net.forward_pass(input, hc_states)
# flatten the targets to match the (batch_size*seq_length, one_hot_length) output
loss = nn.CrossEntropyLoss()(output, target.view(-1))
loss.backward()
nn.utils.clip_grad_norm_(net.parameters(), MaxGradNorm)
optimizer.step()
Here hc_states is a tuple with the hidden-state tensor and the cell-state tensor, input is a tensor of size (B, n, one_hot_length), and target is of size (B, n).
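For reference, here is a quick shape check I can run with the class above (the sizes here are made up, not my real hyperparameters):

import torch

B, n, one_hot_length, num_units, num_layers = 4, 10, 200, 128, 2

net = Model(one_hot_length, dropout_prob=0.3, num_units=num_units, num_layers=num_layers)
# hidden and cell states: (num_layers, batch_size, num_units) each
hc_states = (torch.zeros(num_layers, B, num_units),
             torch.zeros(num_layers, B, num_units))

dummy = torch.nn.functional.one_hot(
    torch.randint(0, one_hot_length, (B, n)), num_classes=one_hot_length).float()
out, hc_states = net.forward_pass(dummy, hc_states)
print(out.shape)  # expected: (B * n, one_hot_length)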
I'm training on a really small dataset (sentences in a .txt file of ~400 KB) just to tune my code. I did 4 different runs with different parameters, and each time the outcome was the same: the network doesn't learn at all when it has the softmax layer, and trains somewhat appropriately without it.
I don't think it is an issue with tensor shapes, as I'm almost sure I checked everything.
My understanding of my problem is that I'm trying to do classification, and that the usual approach is to put a softmax unit at the end to get "probabilities" for each character to appear, but clearly this isn't right.
Any ideas to help me?
I'm also fairly new to PyTorch and RNNs, so I apologize in advance if my architecture/implementation is some kind of monstrosity to a knowledgeable person. Feel free to correct me, and thanks in advance.
So currently I'm sitting on a text-classification problem, but I can't even set up my model in TensorFlow. I have a batch of sentences of length 70 (using padding) and I'm using an embedding_lookup with an embedding size of 300. Here is the code for the embedding:
embedding = tf.constant(embedding_matrix, name="embedding")
inputs = tf.nn.embedding_lookup(embedding, input_.input_data)
So now inputs should be of shape [batch_size, sentence_length, embedding_size], which is not surprising. Sadly, I'm getting a ValueError for my LSTMCell since it is expecting ndim=2 and obviously inputs is of ndim=3. I have not found a way to change the expected input shape of the LSTM layer. Here is the code for my LSTMCell init:
cells = []
for i in range(num_layers):
    cells.append(LSTMCell(num_units, forget_bias, state_is_tuple,
                          reuse=reuse, name='lstm_{}'.format(i)))
cell = tf.contrib.rnn.MultiRNNCell(cells, state_is_tuple=True)
The error is triggered in the call function of the cell, which looks like this:
for step in range(self.num_steps):
    if step > 0: tf.get_variable_scope().reuse_variables()
    (cell_output, state) = cell(inputs[:, step, :], state)
Similar question but not helping: Understanding Tensorflow LSTM Input shape
I was able to solve the problem myself. It seems the LSTMCell implementation is more hands-on and low-level with respect to how an LSTM actually works. The Keras LSTM layers take care of things that I need to handle myself when using TensorFlow. The example I'm using is from the following official TensorFlow example:
https://github.com/tensorflow/models/tree/master/tutorials/rnn/ptb
Since we want to feed our LSTM layers a sequence, we need to feed the cell one word after another. Because the call to the cell produces two outputs (cell output and cell state), we use a loop over all words in all sentences to feed the cell and reuse our cell states. This way we create the outputs for our layers, which we can then use for further operations. The code for this looks like this:
self._initial_state = cell.zero_state(config.batch_size, data_type())
state = self._initial_state
outputs = []
with tf.variable_scope("RNN"):
    for time_step in range(self.num_steps):
        if time_step > 0: tf.get_variable_scope().reuse_variables()
        (cell_output, state) = cell(inputs[:, time_step, :], state)
        outputs.append(cell_output)
output = tf.reshape(tf.concat(outputs, 1), [-1, config.hidden_size])
num_steps represents the number of words per sentence that we are going to use.
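For completeness, I believe roughly the same thing can be achieved without the explicit time loop by handing the whole [batch_size, num_steps, embedding_size] tensor to tf.nn.dynamic_rnn (a sketch, not something the PTB example above does):

outputs, state = tf.nn.dynamic_rnn(
    cell, inputs, initial_state=self._initial_state, dtype=data_type())
# outputs: [batch_size, num_steps, hidden_size] -> flatten the time dimension
output = tf.reshape(outputs, [-1, config.hidden_size])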
I am building an LSTM model using word2vec as input, with the TensorFlow framework. I have finished the word embedding part, but I am stuck on the LSTM part.
The issue here is that I have different sentence lengths, which means that I have to either do padding or use dynamic_rnn with a specified sequence length. I am struggling with both of them.
Padding.
The confusing part is what happens when I do the padding. My model goes like this:
word_matrix=model.wv.syn0
X = tf.placeholder(tf.int32, shape)
data = tf.placeholder(tf.float32, shape)
data = tf.nn.embedding_lookup(word_matrix, X)
Then, I am feeding sequences of word indices for word_matrix into X. I am worried that if I pad zeros onto the sequences fed into X, then I would incorrectly keep feeding unnecessary input (word_matrix[0] in this case).
So, I am wondering what the correct way of zero padding is. It would be great if you could let me know how to implement it with TensorFlow.
dynamic_rnn
For this, I have declared a list containing the lengths of all sentences and feed those along with X and y at the end. In this case, I cannot feed the inputs as a batch though. Then I encountered this error (ValueError: as_list() is not defined on an unknown TensorShape.), which makes it seem to me that the sequence_length argument only accepts a list? (My thoughts might be entirely incorrect though.)
The following is my code for this.
X = tf.placeholder(tf.int32)
labels = tf.placeholder(tf.int32, [None, numClasses])
length = tf.placeholder(tf.int32)
data = tf.placeholder(tf.float32, [None, None, numDimensions])
data = tf.nn.embedding_lookup(word_matrix, X)
lstmCell = tf.contrib.rnn.BasicLSTMCell(lstmUnits, state_is_tuple=True)
lstmCell = tf.contrib.rnn.DropoutWrapper(cell=lstmCell, output_keep_prob=0.25)
initial_state = lstmCell.zero_state(batchSize, tf.float32)
value, _ = tf.nn.dynamic_rnn(lstmCell, data, sequence_length=length,
                             initial_state=initial_state, dtype=tf.float32)
I am really struggling with this part, so any help would be very much appreciated.
Thank you in advance.
TensorFlow does not support variable-length Tensors, so when you declare a Tensor, the list/numpy array should have a uniform shape.
From your first part, what I understand is that you are already able to pad zeros onto the last time steps of each sequence, which is the ideal situation. Here is how it should look for a batch size of 4, max sequence length 10 and 50 hidden units:
[4, 10, 50] would be the size of your whole batch, but internally it may be shaped like this when you try to visualize the paddings:
`[[5+5 pad, 50], [10, 50], [8+2 pad, 50], [9+1 pad, 50]]`
Each pad represents one time step of the sequence with a hidden-state-size-50 Tensor, filled with nothing but zeros. Look at this question and this one to learn more about how to pad manually.
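For illustration, a minimal numpy sketch of that padding (the lengths and hidden size just mirror the example above):

import numpy as np

hidden = 50
# four variable-length sequences of shape (length, hidden)
seqs = [np.random.rand(n, hidden).astype(np.float32) for n in (5, 10, 8, 9)]
max_len = max(s.shape[0] for s in seqs)   # 10

# zero-initialized batch of shape [4, 10, 50]; copy each sequence in, the rest stays padded
batch = np.zeros((len(seqs), max_len, hidden), dtype=np.float32)
for i, s in enumerate(seqs):
    batch[i, :s.shape[0], :] = s

seq_lens = [s.shape[0] for s in seqs]     # [5, 10, 8, 9] -> the sequence_length argument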
You will use dynamic_rnn for the exact reason that you do not want to compute over the padded steps. The tf.nn.dynamic_rnn API ensures that when you pass the sequence_length argument.
For the above example, that argument would be [5, 10, 8, 9]. You can compute it by counting the non-zero time steps for each batch component. A simple way to compute that would be:
# mask of time steps that contain at least one non-zero feature: [batch, max_time]
data_mask = tf.reduce_any(tf.not_equal(data, 0.0), axis=2)
# number of real (unpadded) time steps per batch element: [batch]
data_len = tf.reduce_sum(tf.cast(data_mask, tf.int32), axis=1)
and pass it to the tf.nn.dynamic_rnn API:
tf.nn.dynamic_rnn(lstmCell, data, sequence_length=data_len, initial_state=initial_state)
I'm trying to feed sentences in which each word has a word2vec representation.
How can I do this with TensorFlow seq2seq models?
Suppose I have the variable
enc_inp = [tf.placeholder(tf.int32, shape=(None, 10), name="inp%i" % t)
           for t in range(seq_length)]
which has dimensions [num_of_observations or batch_size x word_vec_representation x sentence_length].
When I pass it to embedding_rnn_seq2seq
decode_outputs, decode_state = seq2seq.embedding_rnn_seq2seq(
    enc_inp, dec_inp, stacked_lstm,
    seq_length, seq_length, embedding_dim)
this error occurs:
ValueError: Linear is expecting 2D arguments: [[None, 10, 50], [None, 50]]
There is also a more general problem:
How can I pass a vector, not a scalar, as input to the first cell of my RNN?
Right now it looks like this (for any given sequence):
get the first value of the sequence (a scalar)
compute the first RNN layer's embedding cell output
compute the second RNN layer's embedding cell output
etc.
But this is what I need:
get the first value of the sequence (a vector)
compute the first RNN layer's cell output (as in an ordinary simple perceptron, where the input is a vector)
compute the second RNN layer's cell output (as in an ordinary simple perceptron, where the input is a vector)
The main point is that the seq2seq models do the word embedding internally themselves.
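So the inputs should be plain integer word ids, one scalar id per time step, and the model looks up the embeddings internally. A rough sketch of what I mean (the vocabulary-size variables here are just placeholder names I made up):

# one placeholder per time step, each holding a batch of scalar word ids
enc_inp = [tf.placeholder(tf.int32, shape=(None,), name="inp%i" % t)
           for t in range(seq_length)]
dec_inp = [tf.placeholder(tf.int32, shape=(None,), name="dec%i" % t)
           for t in range(seq_length)]

decode_outputs, decode_state = seq2seq.embedding_rnn_seq2seq(
    enc_inp, dec_inp, stacked_lstm,
    num_encoder_symbols=source_vocab_size,   # placeholder name for the source vocabulary size
    num_decoder_symbols=target_vocab_size,   # placeholder name for the target vocabulary size
    embedding_size=embedding_dim)            # the embedding lookup happens inside the model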
Here is a reddit question and answer.
Also, if somebody wants to use a pretrained Word2Vec model, there are ways to do it;
see:
stackoverflow 1
stackoverflow 2
So this can be used not only for word embedding.