Trying to overfit a character-level GRU RNN - python

I have a GRU network, built by hand (i.e. no nn.GRU), with 2 stacked layers, a 128-dimensional hidden state, and a sequence length of 64 characters.
I'm trying to overfit it to a small corpus taken out of Shakespeare:
I ran training for 500+ epochs, and after every 25 epochs I generate a sample, at very low temperature, by giving only the first letter "B". At first it's gibberish, but by the end it does get close to the actual text. The first version was:
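(By "very low temperature" I mean the usual logit scaling before sampling a character; a minimal sketch, not my exact generation code:)

import torch

def sample_next(logits, temperature=0.2):
    """Sample the next character index; low temperature is near-greedy."""
    probs = torch.softmax(logits / temperature, dim=-1)
    return torch.multinomial(probs, num_samples=1).item()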
This was without passing the hidden state between batches. I thought maybe this was my problem, so I started passing the hidden state, detached, between batches. But I still don't get a perfect overfit, and it actually seems to make things worse:
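Concretely, by "passing the hidden state, detached" I mean this pattern (a runnable sketch with dummy stand-ins for my actual model and data):

import torch
import torch.nn as nn

# Dummy stand-ins so the pattern is runnable; my real model and corpus differ.
vocab_size, h_dim, seq_len, batch_size = 65, 128, 64, 8
model = nn.GRU(vocab_size, h_dim, num_layers=2, batch_first=True)
head = nn.Linear(h_dim, vocab_size)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(list(model.parameters()) + list(head.parameters()))

hidden = None  # zero state on the first batch
for _ in range(10):  # stand-in for iterating over consecutive batches
    x = torch.randn(batch_size, seq_len, vocab_size)
    y = torch.randint(vocab_size, (batch_size, seq_len))
    optimizer.zero_grad()
    out, hidden = model(x, hidden)
    hidden = hidden.detach()  # keep the values, cut the graph between batches
    loss = criterion(head(out).reshape(-1, vocab_size), y.reshape(-1))
    loss.backward()
    optimizer.step()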
Here's how I implemented the GRU:
for i in range(self.n_layers):
    layer_input = layer_middle
    layer_middle = torch.zeros((batch_size, seq_len, self.h_dim))
    params = self.layer_params[i]
    for j in range(seq_len):
        x = layer_input[:, j, :].to(torch.device('cuda'))
        z = F.sigmoid(params['W1'](x) + params['W2'](layer_states[i]))   # update gate
        r = F.sigmoid(params['W3'](x) + params['W4'](layer_states[i]))   # reset gate
        g = F.tanh(params['W5'](x) + params['W6'](r * layer_states[i]))  # candidate state
        layer_states[i] = z * layer_states[i] + (1 - z) * g
        layer_middle[:, j, :] = layer_states[i]
    layer_middle = nn.Dropout(self.dropout)(layer_middle)

layer_output = torch.zeros((batch_size, seq_len, self.out_dim))
for j in range(seq_len):
    x = layer_middle[:, j, :].to(torch.device('cuda'))
    layer_output[:, j, :] = self.layer_params[-1]['W7'](x)
The params are nn.Linear modules with the correct shapes.
Any idea what might prevent me from overfitting to this tiny corpus?

Related

PyTorch - Handling an in-place operation in a sequence-to-sequence multi-dimensional LSTM layer

I am trying to implement a sequence-to-sequence LSTM layer in PyTorch. Here a sequence can have more than one dimension (an image is a sequence; it just has two indexes that increase instead of one). More information is in the following paper.
The forward() function of my network is:
def forward(self, x):
    """ Note: x is of shape (d1, ..., dn, batch_size, input_size). """
    dimensions = x.shape[:-2]
    batch_size = x.shape[-2]
    f = torch.empty(self.dim_in, *dimensions, batch_size, self.size_out)
    i = torch.empty(*dimensions, batch_size, self.size_out)
    o = torch.empty(*dimensions, batch_size, self.size_out)
    c = torch.empty(*dimensions, batch_size, self.size_out)
    s = torch.empty(*dimensions, batch_size, self.size_out)
    h = torch.empty(*dimensions, batch_size, self.size_out)
    for idx in self.iter_idx(dimensions):
        # 1/ Forget, input, output and cell activation gates.
        for l in range(self.dim_in):
            f[l][idx] = torch.sigmoid(self.biasf[l] + torch.mm(x[idx], self.wf[l])
                                      + sum(torch.mul(h[prev(idx, k)], self.uf[l][k])
                                            for k in np.nonzero(idx)[0]))
        i[idx] = torch.sigmoid(self.biasi + torch.mm(x[idx], self.wi)
                               + sum(torch.mul(h[prev(idx, k)], self.ui[k])
                                     for k in np.nonzero(idx)[0]))
        o[idx] = torch.sigmoid(self.biaso + torch.mm(x[idx], self.wo)
                               + sum(torch.mul(h[prev(idx, k)], self.uo[k])
                                     for k in np.nonzero(idx)[0]))
        c[idx] = torch.sigmoid(self.biasc + torch.mm(x[idx], self.wc)
                               + sum(torch.mul(h[prev(idx, k)], self.uc[k])
                                     for k in np.nonzero(idx)[0]))
        # 2/ Cell state
        s[idx] = torch.tanh(torch.mul(i[idx], c[idx])
                            + sum(torch.mul(f[k][idx], s[prev(idx, k)])
                                  for k in np.nonzero(idx)[0]))
        # 3/ Final output
        h[idx] = torch.mul(o[idx], s[idx])
    return h
When I try to run it and the backward() method of the loss is called, the following error is thrown:
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [32, 3]], which is output 0 of SelectBackward, is at version 100; expected version 99 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!
The backtrace points to this specific line:
s[idx] = torch.tanh(torch.mul(i[idx], c[idx]) + sum(torch.mul(f[k][idx], s[prev(idx,k)]) for k in np.nonzero(idx)[0]))
From what I understand, PyTorch is angry at me because I modified the tensor s in place. Looking at the error message, I see version 100 where version 99 was expected, which implies the error is thrown when computing the gradient for the last iteration of the loop, and PyTorch expected to find s at the version it had one loop iteration prior.
To me this shouldn't be an issue, because this loop merely "initializes" s (and the other tensors too). But I understand that the autograd engine may have a hard time determining this statically.
So I have 3 questions:
1. Why s specifically? It is neither the first nor the last tensor to be modified in place during the loop.
2. Is there a way to tell PyTorch that I am just filling the arrays? Or to tell it to only look at the last version for gradient computation?
3. Is there a way to rewrite this code without in-place operations? I tried a version where f, i, o, etc. are lists that grow with the loop, but I get the same error on lines like s = torch.cat((s, s_tmp), axis=0). (A sketch of the append-then-stack variant I mean follows below.)
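For concreteness, here is the append-then-stack pattern on a plain 1-D recurrence (illustrative names, not my actual model). The difference from my torch.cat version is that the output tensor is assembled once, after the loop, so no existing tensor is ever modified:

import torch

T, batch, size = 5, 32, 3
w = torch.randn(size, size, requires_grad=True)
x = torch.randn(T, batch, size)

steps = []                        # grows with the loop; never modified in place
h = torch.zeros(batch, size)
for t in range(T):
    h = torch.tanh(x[t] + h @ w)  # each step builds a *new* tensor
    steps.append(h)
s = torch.stack(steps)            # assembled once, after the loop
s.sum().backward()                # backward now succeeds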
Thank you in advance.

Always same output for tensorflow autoencoder

At the moment I am trying to build an autoencoder for time-series data in TensorFlow. I have nearly 500 days of data, where each day has 24 data points. Since this is my first try, my architecture is very simple. After my input of size 24, the hidden layers have sizes 10, 3, and 10, with an output of again 24. I normalized the data (the data points are in the range [-0.5, 0.5]), and I use the sigmoid activation function and the RMSPropOptimizer.
After training (loss curve in the picture), the output is the same for every time series I feed into the network. Does someone know the reason for that? Is it possible that my dataset is the issue (code below)?
class TimeDataset:
    def __init__(self, data):
        self._index_in_epoch = 0
        self._epochs_completed = 0
        self._data = data
        self._num_examples = data.shape[0]

    @property
    def data(self):
        return self._data

    def next_batch(self, batch_size, shuffle=True):
        start = self._index_in_epoch
        # first call
        if start == 0 and self._epochs_completed == 0:
            idx = np.arange(0, self._num_examples)  # get all possible indexes
            np.random.shuffle(idx)                  # shuffle indexes
            self._data = self.data[idx]             # get list of `num` random samples

        if start + batch_size > self._num_examples:
            # not enough samples left -> go to the next epoch
            self._epochs_completed += 1
            rest_num_examples = self._num_examples - start
            data_rest_part = self.data[start:self._num_examples]
            idx0 = np.arange(0, self._num_examples)  # get all possible indexes
            np.random.shuffle(idx0)                  # shuffle indexes
            self._data = self.data[idx0]             # get list of `num` random samples
            start = 0
            # avoid the case where #samples != integer multiple of batch_size
            self._index_in_epoch = batch_size - rest_num_examples
            end = self._index_in_epoch
            data_new_part = self._data[start:end]
            return np.concatenate((data_rest_part, data_new_part), axis=0)
        else:
            # get next batch
            self._index_in_epoch += batch_size
            end = self._index_in_epoch
            return self._data[start:end]
*edit: here are some examples of the output (red original, blue reconstructed):
**edit: I just saw an autoencoder example with a more complicated loss function than mine. Does someone know if the loss function self.loss = tf.reduce_mean(tf.pow(self.X - self.decoded, 2)) is sufficient?
***edit: some more code to describe my training
This is my Autoencoder Class:
class AutoEncoder():
    def __init__(self):
        # Training Parameters
        self.learning_rate = 0.005
        self.alpha = 0.5

        # Network Parameters
        self.num_input = 24     # one day as input
        self.num_hidden_1 = 10  # 1st layer num features
        self.num_hidden_2 = 3   # 2nd layer num features (the latent dim)

        self.X = tf.placeholder("float", [None, self.num_input])

        self.weights = {
            'encoder_h1': tf.Variable(tf.random_normal([self.num_input, self.num_hidden_1])),
            'encoder_h2': tf.Variable(tf.random_normal([self.num_hidden_1, self.num_hidden_2])),
            'decoder_h1': tf.Variable(tf.random_normal([self.num_hidden_2, self.num_hidden_1])),
            'decoder_h2': tf.Variable(tf.random_normal([self.num_hidden_1, self.num_input])),
        }
        self.biases = {
            'encoder_b1': tf.Variable(tf.random_normal([self.num_hidden_1])),
            'encoder_b2': tf.Variable(tf.random_normal([self.num_hidden_2])),
            'decoder_b1': tf.Variable(tf.random_normal([self.num_hidden_1])),
            'decoder_b2': tf.Variable(tf.random_normal([self.num_input])),
        }

        self.encoded = self.encoder(self.X)
        self.decoded = self.decoder(self.encoded)

        # Define loss and optimizer, minimize the squared error
        self.loss = tf.reduce_mean(tf.pow(self.X - self.decoded, 2))
        self.optimizer = tf.train.RMSPropOptimizer(self.learning_rate).minimize(self.loss)

    def encoder(self, x):
        # sigmoid, tanh, relu
        en_layer_1 = tf.nn.sigmoid(tf.add(tf.matmul(x, self.weights['encoder_h1']),
                                          self.biases['encoder_b1']))
        en_layer_2 = tf.nn.sigmoid(tf.add(tf.matmul(en_layer_1, self.weights['encoder_h2']),
                                          self.biases['encoder_b2']))
        return en_layer_2

    def decoder(self, x):
        de_layer_1 = tf.nn.sigmoid(tf.add(tf.matmul(x, self.weights['decoder_h1']),
                                          self.biases['decoder_b1']))
        de_layer_2 = tf.nn.sigmoid(tf.add(tf.matmul(de_layer_1, self.weights['decoder_h2']),
                                          self.biases['decoder_b2']))
        return de_layer_2
and this is how I train my network (the input data has shape (number_days, 24)):
model = autoencoder.AutoEncoder()

num_epochs = 3
batch_size = 50
num_batches = 300
display_batch = 50
examples_to_show = 16

loss_values = []

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())

    # training
    for e in range(1, num_epochs + 1):
        print('starting epoch {}'.format(e))
        for b in range(num_batches):
            # get next batch of data
            batch_x = dataset.next_batch(batch_size)
            # Run optimization op (backprop) and cost op (to get loss value)
            l = sess.run([model.loss], feed_dict={model.X: batch_x})
            sess.run(model.optimizer, feed_dict={model.X: batch_x})
            # Display logs
            if b % display_batch == 0:
                print('Epoch {}: Batch ({}) Loss: {}'.format(e, b, l))
            loss_values.append(l)

    # testing
    test_data = dataset.next_batch(batch_size)
    decoded_test_data = sess.run(model.decoded, feed_dict={model.X: test_data})
Just a suggestion: I have had some issues with autoencoders using the sigmoid function. I switched to tanh or relu, and those improved the results.
With an autoencoder, the network is basically learning to recreate the output from the input by encoding and decoding. If you mean the output is the same as the input, then you are getting what you want: it has learned the data set.
Ultimately you can compare by reviewing the mean squared error between the input and output and seeing whether it is exactly the same. If you mean that the output is exactly the same regardless of the input, that isn't something I've run into. I guess if your input doesn't vary much from day to day, I could imagine that would have some impact. Are you looking for anomalies?
Also, if you are training on a time series, I wouldn't shuffle the data in this particular case. If the temporal order is significant, shuffling can introduce data leakage (basically letting future data into the training set), depending on what you are trying to achieve.
Ah, I didn't initially see your post with the graph results; thanks for adding it.
The sigmoid output is floored at 0, so it cannot reproduce the parts of your data that are below 0.
If you want to use a sigmoid output, rescale your data so it lies strictly between 0 and 1 (0 and 1 excluded).
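A minimal sketch of such a rescaling (the eps margin is an arbitrary choice):

import numpy as np

def rescale_open_unit(data, eps=1e-3):
    """Linearly map data into [eps, 1 - eps], i.e. strictly inside (0, 1)."""
    lo, hi = data.min(), data.max()
    return eps + (1.0 - 2.0 * eps) * (data - lo) / (hi - lo)

# e.g. for the (number_days, 24) array from the question:
# scaled = rescale_open_unit(raw_days)   # raw_days is a placeholder name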
I know this is a very old post, so this is just an attempt to help whoever wanders here with the same problem. If the autoencoder is converging to the same encoding for all the different instances, there may be a problem in the loss function. Check the size and shape of the value your loss function returns, as it may be evaluating the wrong tensors (i.e. you may need to transpose something somewhere). Basically, assuming you are using the autoencoder to encode M features of N training instances, your loss function should return N values: the size of your loss tensor should be the number of instances in your training set. I found that out the hard way.
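For the model above, that check could look like the following sketch (to slot into AutoEncoder.__init__ in place of the existing loss line):

# Per-instance reconstruction error: reduce over the feature axis only.
per_example_loss = tf.reduce_mean(tf.pow(self.X - self.decoded, 2), axis=1)
# per_example_loss should have shape [N] (one value per instance in the
# batch), not [24]; if it comes out as [24], an operand is transposed somewhere.
self.loss = tf.reduce_mean(per_example_loss)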

How does batching work in a seq2seq model in pytorch?

I am trying to implement a seq2seq model in PyTorch, and I am having some problems with the batching.
For example I have a batch of data whose dimensions are
[batch_size, sequence_lengths, encoding_dimension]
where the sequence lengths are different for each example in the batch.
Now, I managed to do the encoding part by padding each element in the batch to the length of the longest sequence.
This way, if I give my net a batch of that shape as input, I get the following outputs:
output, of shape [batch_size, sequence_lengths, hidden_layer_dimension]
hidden state, of shape [batch_size, hidden_layer_dimension]
cell state, of shape [batch_size, hidden_layer_dimension]
Now, from the output, I take for each sequence the last relevant element, that is, the element along the sequence_lengths dimension corresponding to the last non-padded element of the sequence. Thus the final output I get has shape [batch_size, hidden_layer_dimension].
But now I have the problem of decoding from this vector. How do I handle the decoding of sequences of different lengths in the same batch? I tried to google it and found this, but they don't seem to address the problem. I thought of decoding element by element for the whole batch, but then I have the problem of passing the initial hidden states, given that the ones from the encoder will be of shape [batch_size, hidden_layer_dimension], while the ones for the decoder will be of shape [1, hidden_layer_dimension].
Am I missing something? Thanks for the help!
You are not missing anything. I can help you, since I have worked on several sequence-to-sequence applications using PyTorch. I am giving you a simple example below.
class Seq2Seq(nn.Module):
    """A Seq2seq network trained on predicting the next query."""

    def __init__(self, dictionary, embedding_index, args):
        super(Seq2Seq, self).__init__()
        self.config = args
        self.num_directions = 2 if self.config.bidirection else 1

        self.embedding = EmbeddingLayer(len(dictionary), self.config)
        self.embedding.init_embedding_weights(dictionary, embedding_index, self.config.emsize)
        self.encoder = Encoder(self.config.emsize, self.config.nhid_enc, self.config.bidirection, self.config)
        self.decoder = Decoder(self.config.emsize, self.config.nhid_enc * self.num_directions,
                               len(dictionary), self.config)

    @staticmethod
    def compute_decoding_loss(logits, target, seq_idx, length):
        losses = -torch.gather(logits, dim=1, index=target.unsqueeze(1)).squeeze()
        mask = helper.mask(length, seq_idx)  # mask: batch x 1
        losses = losses * mask.float()
        num_non_zero_elem = torch.nonzero(mask.data).size()
        if not num_non_zero_elem:
            return losses.sum(), 0
        else:
            return losses.sum(), num_non_zero_elem[0]

    def forward(self, q1_var, q1_len, q2_var, q2_len):
        # encode the query
        embedded_q1 = self.embedding(q1_var)
        encoded_q1, hidden = self.encoder(embedded_q1, q1_len)

        if self.config.bidirection:
            if self.config.model == 'LSTM':
                h_t, c_t = hidden[0][-2:], hidden[1][-2:]
                decoder_hidden = torch.cat((h_t[0].unsqueeze(0), h_t[1].unsqueeze(0)), 2), torch.cat(
                    (c_t[0].unsqueeze(0), c_t[1].unsqueeze(0)), 2)
            else:
                h_t = hidden[0][-2:]
                decoder_hidden = torch.cat((h_t[0].unsqueeze(0), h_t[1].unsqueeze(0)), 2)
        else:
            if self.config.model == 'LSTM':
                decoder_hidden = hidden[0][-1], hidden[1][-1]
            else:
                decoder_hidden = hidden[-1]

        decoding_loss, total_local_decoding_loss_element = 0, 0
        for idx in range(q2_var.size(1) - 1):
            input_variable = q2_var[:, idx]
            embedded_decoder_input = self.embedding(input_variable).unsqueeze(1)
            decoder_output, decoder_hidden = self.decoder(embedded_decoder_input, decoder_hidden)
            local_loss, num_local_loss = self.compute_decoding_loss(decoder_output, q2_var[:, idx + 1], idx, q2_len)
            decoding_loss += local_loss
            total_local_decoding_loss_element += num_local_loss

        if total_local_decoding_loss_element > 0:
            decoding_loss = decoding_loss / total_local_decoding_loss_element

        return decoding_loss
You can see the complete source code here. This application is about predicting users' next web-search query given the current web-search query.
The answer to your question:
How do I handle a decoding of sequences of different lengths in the same batch?
You have padded sequences, so you can treat all the sequences as having the same length. But when you compute the loss, you need to ignore the loss for the padded terms, using masking.
I have used a masking technique to achieve the same in the above example.
Also, you are absolutely correct: you need to decode element by element for the mini-batches. The initial decoder state [batch_size, hidden_layer_dimension] is also fine; you just need to unsqueeze it at dimension 0 to make it [1, batch_size, hidden_layer_dimension].
Please note, you do not need to loop over each example in the batch; you can process the whole batch at once, but you do need to loop over the elements of the sequences, as in the sketch below.
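For example, a minimal sketch of such a decoding loop (illustrative names and a plain GRU, not code from the repository above):

import torch

batch_size, hidden_dim, max_len = 4, 32, 10
encoder_final = torch.randn(batch_size, hidden_dim)      # [batch, hidden]
decoder = torch.nn.GRU(hidden_dim, hidden_dim, batch_first=True)

hidden = encoder_final.unsqueeze(0)                      # -> [1, batch, hidden]
step_input = torch.zeros(batch_size, 1, hidden_dim)      # e.g. an embedded <sos>
outputs = []
for t in range(max_len):                       # loop over sequence positions...
    out, hidden = decoder(step_input, hidden)  # ...whole batch at once
    outputs.append(out)
    step_input = out                           # or embed the predicted token instead
outputs = torch.cat(outputs, dim=1)            # [batch, max_len, hidden]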

How to get results from all of TensorFlow's while_loop() iterations as a list of unspecified length?

I want to get results from all while_loop() iterations, so that tf.shape(result) == [batch_size, input_length].
batch_size and input_length may differ from mini-batch to mini-batch, so I evaluate their shapes dynamically.
I created this code:
batch_size = tf.shape(x)[0]
input_length = tf.shape(x)[1]
i = tf.constant(0)  # loop counter (assumed; missing from my snippet)
result = []

lstm = BasicLSTMCell(hidden_units)
cell = MultiRNNCell([lstm])
init_state = cell.zero_state(batch_size, dtype=tf.float32)

def body(i, x, result, state):
    h, state = cell(x[:, i, :], state)
    result.append(h)
    i += 1
    return i, x, result, state

def cond(i, *_):
    return tf.less(i, input_length - 1)

_, _, result, state = tf.while_loop(cond, body, (i, x, result, init_state))
But TensorFlow implicitly flattens all the loop_vars I pass to it, so after 2 iterations my while_loop() input has a size of 4 while its output has a size of 5 (due to the 2 elements in the result variable).
I tried to use shape_invariants, but it doesn't work, probably because of the regular Python list. And I can't provide the exact size of that list (and if I did, it would be a hack): it matches the number of iterations, and stopping dynamically is the key feature behind using while_loop() in the first place!
Needless to say, it works fine if I just reassign result like this instead of appending h to the result list:
h, state = cell(x[:, i, :], state)
result = h
So this seems a bit unreasonable to me: in most cases while_loop() is used as an accumulator, not an iterator! And if you do want to use it as an iterator, it almost defends itself against letting you do that.
Does anyone know how to deal with this? I would be extremely grateful.
I'm using Tensorflow 1.5
EDIT:
From line 749 of rnn.py in the TF GitHub repository, I understand that at least one of batch_size and input_length must be constant; they can't both be None.
So, if you want to compute an RNN dynamically, you should provide at least a max_input_length parameter for the results, but then your loop will be stopped after batch_size steps (when you reshape the input x to [input_length, batch_size, vocab_size]). Is that correct?
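One pattern that seems built for exactly this (my guess at a fix, reusing cell, x, input_length, and init_state from the snippet above) is tf.TensorArray, which while_loop treats as a single loop variable no matter how many elements have been written:

i0 = tf.constant(0)
result_ta = tf.TensorArray(dtype=tf.float32, size=0, dynamic_size=True)

def body(i, ta, state):
    h, state = cell(x[:, i, :], state)
    ta = ta.write(i, h)  # functional write: returns a new TensorArray
    return i + 1, ta, state

def cond(i, *_):
    return tf.less(i, input_length - 1)

_, result_ta, state = tf.while_loop(cond, body, (i0, result_ta, init_state))
# stack() gives [time, batch, hidden]; transpose to [batch, time, hidden]
result = tf.transpose(result_ta.stack(), [1, 0, 2])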

How to feed back RNN output to input in tensorflow

Suppose I have a trained RNN (e.g. a language model) and I want to see what it would generate on its own. How should I feed its output back into its input?
I read the following related questions:
TensorFlow using LSTMs for generating text
TensorFlow LSTM Generative Model
Theoretically it is clear to me that in TensorFlow we use truncated backpropagation, so we have to define the maximum number of steps we would like to "trace". We also reserve a dimension for batches; therefore, if I'd like to train on a sine wave, I have to feed inputs of shape [None, num_steps, 1].
The following code works:
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt

tf.reset_default_graph()

n_samples = 100
state_size = 5

lstm_cell = tf.nn.rnn_cell.BasicLSTMCell(state_size, forget_bias=1.)

def_x = np.sin(np.linspace(0, 10, n_samples))[None, :, None]
zero_x = np.zeros(n_samples)[None, :, None]
X = tf.placeholder_with_default(zero_x, [None, n_samples, 1])

output, last_states = tf.nn.dynamic_rnn(inputs=X, cell=lstm_cell, dtype=tf.float64)
pred = tf.contrib.layers.fully_connected(output, 1, activation_fn=tf.tanh)

Y = np.roll(def_x, 1)
loss = tf.reduce_sum(tf.pow(pred - Y, 2)) / (2 * n_samples)
opt = tf.train.AdamOptimizer().minimize(loss)

sess = tf.InteractiveSession()
tf.global_variables_initializer().run()

# Initial state run
plt.show(plt.plot(output.eval()[0]))
plt.plot(def_x.squeeze())
plt.show(plt.plot(pred.eval().squeeze()))

steps = 1001
for i in range(steps):
    p, l, _ = sess.run([pred, loss, opt])
The state size of the LSTM can be varied; I also experimented with feeding either the sine wave or zeros into the network, and in both cases it converged in ~500 iterations. So far I have understood that in this case the graph consists of n_samples LSTM cells sharing their parameters, and it is only up to me to feed them the input as a time series. However, when generating samples, the network explicitly depends on its previous output, which means I cannot feed the unrolled model all at once. I tried to compute the state and output at every step:
with tf.variable_scope('sine', reuse=True):
    X_test = tf.placeholder(tf.float64)
    X_reshaped = tf.reshape(X_test, [1, -1, 1])
    output, last_states = tf.nn.dynamic_rnn(lstm_cell, X_reshaped, dtype=tf.float64)
    pred = tf.contrib.layers.fully_connected(output, 1, activation_fn=tf.tanh)

test_vals = [0.]
for i in range(1000):
    val = pred.eval({X_test: np.array(test_vals)[None, :, None]})
    test_vals.append(val)
However in this model it seems that there is no continuity between the LSTM cells. What is going on here?
Do I have to initialize a zero array with i.e. 100 time steps, and assign each run's result into the array? Like feeding the network with this:
run 0: input_feed = [0, 0, 0 ... 0]; res1 = result
run 1: input_feed = [res1, 0, 0 ... 0]; res2 = result
run 2: input_feed = [res1, res2, 0 ... 0]; res3 = result
etc...
What to do if I want to use this trained network to use its own output as its input in the following time step?
If I understood you correctly, you want to find a way to feed the output of time step t as input to time step t+1, right? To do so, there is a relatively easy workaround that you can use at test time:
Make sure your input placeholders can accept a dynamic sequence length, i.e. the size of the time dimension is None.
Make sure you are using tf.nn.dynamic_rnn (which you do in the posted example).
Pass the initial state into dynamic_rnn.
Then, at test time, you can loop through your sequence and feed each time step individually (i.e. max sequence length is 1). Additionally, you just have to carry over the internal state of the RNN. See pseudo code below (the variable names refer to your code snippet).
I.e., change the definition of the model to something like this:
lstm_cell = tf.nn.rnn_cell.BasicLSTMCell(state_size, forget_bias=1.)

def_x = np.sin(np.linspace(0, 10, n_samples))[None, :, None]
zero_x = np.zeros(n_samples)[None, :, None]

X = tf.placeholder_with_default(zero_x, [None, None, 1])  # [batch_size, seq_length, dimension of input]
batch_size = tf.shape(X)[0]
initial_state = lstm_cell.zero_state(batch_size, dtype=tf.float64)

output, last_states = tf.nn.dynamic_rnn(inputs=X, cell=lstm_cell, dtype=tf.float64,
                                        initial_state=initial_state)
pred = tf.contrib.layers.fully_connected(output, 1, activation_fn=tf.tanh)
pred = tf.contrib.layers.fully_connected(output, 1, activation_fn=tf.tanh)
Then you can perform inference like so:
fetches = {'final_state': last_states,
           'prediction': pred}

toy_initial_input = np.array([[[1]]])  # put suitable data here
seq_length = 20  # put whatever is reasonable here for you

# get the output for the first time step
feed_dict = {X: toy_initial_input}
eval_out = sess.run(fetches, feed_dict)
outputs = [eval_out['prediction']]
next_state = eval_out['final_state']

for i in range(1, seq_length):
    feed_dict = {X: outputs[-1],
                 initial_state: next_state}
    eval_out = sess.run(fetches, feed_dict)
    outputs.append(eval_out['prediction'])
    next_state = eval_out['final_state']

# outputs now contains the sequence you want
Note that this can also work for batches; however, it can be a bit more complicated if you have sequences of different lengths in the same batch.
If you want to perform this kind of prediction not only at test time but also at training time, that is also possible, just a bit more complicated to implement.
You can use the network's own output (last state) as the next-step input (initial state).
One way to do this is to:
use zero-initialized variables as the input state at every time step
each time you complete a truncated sequence and get some output state, update the state variables with the output state you just got.
The second step can be done by either:
fetching the states to Python and feeding them back next time, as done in the ptb example in tensorflow/models
building an update op in the graph and adding a dependency, as done in the ptb example in tensorpack (a minimal sketch of this variant follows below).
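A minimal sketch of the second variant (hypothetical names and sizes; see the tensorpack ptb example for the real implementation):

import tensorflow as tf

batch_size, state_size = 32, 5  # must be static, since variables need fixed shapes
cell = tf.nn.rnn_cell.BasicLSTMCell(state_size)
X = tf.placeholder(tf.float32, [batch_size, None, 1])

# Non-trainable variables that persist the RNN state across session runs.
c_var = tf.Variable(tf.zeros([batch_size, state_size]), trainable=False)
h_var = tf.Variable(tf.zeros([batch_size, state_size]), trainable=False)
init_state = tf.nn.rnn_cell.LSTMStateTuple(c_var, h_var)

output, last_state = tf.nn.dynamic_rnn(cell, X, initial_state=init_state)

# After each truncated sequence, copy the final state back into the variables.
update_state = tf.group(tf.assign(c_var, last_state.c),
                        tf.assign(h_var, last_state.h))
with tf.control_dependencies([update_state]):
    output = tf.identity(output)  # fetching output now also updates the state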
I know I'm a bit late to the party, but I think this gist could be useful:
https://gist.github.com/CharlieCodex/f494b27698157ec9a802bc231d8dcf31
It lets you auto-feed the input through a filter and back into the network as input. To make shapes match up, the processing can be set as a tf.layers.Dense layer.
Please ask any questions!
Edit:
In your particular case, create a lambda which performs the processing of the dynamic_rnn outputs into your character vector space. For example:
# if you have:
W = tf.Variable(...)
B = tf.Variable(...)
Yo, Ho = tf.nn.dynamic_rnn(cell, inputs, state)
logits = tf.matmul(W, Yo) + B
...
# use self_feeding_rnn as
process_yo = lambda Yo: tf.matmul(W, Yo) + B
Yo, Ho = self_feeding_rnn(cell, seed, initial_state, processing=process_yo)
