How to slice a tensor with None dimension in Tensorflow - python

I want to slice a tensor along a "None" dimension.
For example,
tensor = tf.placeholder(tf.float32, shape=[None, None, 10], name="seq_holder")
sliced_tensor = tensor[:,1:,:] # it works well!
but
# Assume that tensor's shape will be [3,10, 10]
tensor = tf.placeholder(tf.float32, shape=[None, None, 10], name="seq_holder")
sliced_seq = tf.slice(tensor, [0,1,0],[3, 9, 10]) # it doesn't work!
I get the same message when I use another placeholder to feed the size parameter of tf.slice().
The second method gave me the error message "Input size (depth of inputs) must be accessible via shape inference".
I'd like to know what the difference between the two methods is and which is the more TensorFlow-ish way.
[Edited]
Whole code is below
import tensorflow as tf
import numpy as np
print("Tensorflow for tests!")
vec_dim = 5
num_hidden = 10
# method 1
input_seq1 = np.random.random([3,7,vec_dim])
# method 2
input_seq2 = np.random.random([5,10,vec_dim])
shape_seq2 = [5,9,vec_dim]
# seq: [batch, seq_len, vec_dim]
seq = tf.placeholder(tf.float32, shape=[None, None, vec_dim], name="seq_holder")
# Method 1
sliced_seq = seq[:,1:,:]
# Method 2
seq_shape = tf.placeholder(tf.int32, shape=[3])
sliced_seq = tf.slice(seq,[0,0,0], seq_shape)
cell = tf.contrib.rnn.GRUCell(num_units=num_hidden)
init_state = cell.zero_state(tf.shape(seq)[0], tf.float32)
outputs, last_state = tf.nn.dynamic_rnn(cell, sliced_seq, initial_state=init_state)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # method 1
    # states = sess.run([sliced_seq], feed_dict={seq:input_seq1})
    # print(states[0].shape)
    # method 2
    states = sess.run([sliced_seq], feed_dict={seq:input_seq2, seq_shape:shape_seq2})
    print(states[0].shape)

Your problem is exactly described by issue #4590
The problem is that tf.nn.dynamic_rnn needs to know the size of the last dimension in the input (the "depth"). Unfortunately, as the issue points out, currently tf.slice cannot infer any output size if any of the slice ranges are not fully known at graph construction time; therefore, sliced_seq ends up having a shape (?, ?, ?).
In your case, the first issue is that you are using a placeholder of three elements to determine the size of the slice; this is not the best approach, since the last dimension should never change (even if you later pass vec_dim, it could cause errors). The easiest solution would be to turn seq_shape into a placeholder of size 2 (or even two separate placeholders), and then do the slicing like:
sliced_seq = seq[:seq_shape[0], :seq_shape[1], :]
For some reason, the NumPy-style indexing seems to have better shape inference capabilities, and this will preserve the size of the last dimension in sliced_seq.
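To check what shape inference keeps, the static shape of the sliced tensor can be inspected directly. The following is only a minimal sketch written against the TensorFlow 2 API (tf.function with an input signature instead of placeholders); the names slice_prefix, sizes and VEC_DIM are made up for illustration and are not from the question.

```python
import tensorflow as tf

VEC_DIM = 5  # depth of the input, fixed at graph-construction time

@tf.function(input_signature=[
    tf.TensorSpec([None, None, VEC_DIM], tf.float32),  # [batch, seq_len, depth]
    tf.TensorSpec([2], tf.int32),                      # sizes to keep: [batch, seq_len]
])
def slice_prefix(seq, sizes):
    # Only the two dynamic dimensions are sliced; the trailing ":" leaves
    # the depth dimension untouched.
    return seq[:sizes[0], :sizes[1], :]

# Static (graph-time) shape of the result, as seen by shape inference.
static_shape = slice_prefix.get_concrete_function().outputs[0].shape

# Run-time shape for one concrete input.
result = slice_prefix(tf.zeros([5, 10, VEC_DIM]), tf.constant([5, 9]))
print(static_shape, result.shape)
```

If the last entry of static_shape is still known at graph time, a layer that needs the input depth (such as an RNN) can be built on top of the slice.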

Related

Variable batch_size in call function

I am trying to implement an attention network with TensorFlow 2. Thus, for every image, I want to take only some glimpses, i.e. a small part from the image. For this I have implemented a subclass from tensorflow.keras.models.Model, here is a snippet out of it.
class RecurrentAttentionModel(models.Model):
    # ...
    def call(self, inputs):
        l = tf.random.uniform((40,2,), minval=0, maxval=1)
        for _ in range(0, self.glimpses):
            glimpse = tf.image.extract_glimpse(inputs, size=(self.retina_size, self.retina_size), offsets=l, centered=False, normalized=True)
            # some other code...
            # update l to take a glimpse somewhere else
        return result
Now, the code above works and trains perfectly, but my issue is, that I have the hardcoded 40 in it, the batch_size which I have defined in my dataset. I am not able to read/get the batch_size in the call method since the variable "inputs" is of the form Tensor("input_1_77:0", shape=(None, 250, 500, 1), dtype=float32) where the None for the batch_size seems to be expected behavior.
When I just initialize l with the following code (without the batch_size)
l = tf.random.uniform((2,), minval=0, maxval=1)
it throws this error
ValueError: Shape must be rank 2 but is rank 1 for 'recurrent_attention_model_86/ExtractGlimpse' (op: 'ExtractGlimpse') with input shapes: [?,250,500,1], [2], [2]
which I totally understand, but I have no idea how to create the initial values according to the batch_size.
You can extract the batch size dimension dynamically by using tf.shape.
l = tf.random.uniform(tf.stack([tf.shape(inputs)[0], 2]), minval=0, maxval=1)
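Here is a small self-contained sketch of the same idea, assuming TensorFlow 2. The function first_glimpse and the constant RETINA_SIZE are hypothetical stand-ins for the body of call and self.retina_size; the point is only that the offsets get one row per example even though the static batch dimension is None.

```python
import tensorflow as tf

RETINA_SIZE = 8  # hypothetical glimpse size, standing in for self.retina_size

@tf.function(input_signature=[tf.TensorSpec([None, 250, 500, 1], tf.float32)])
def first_glimpse(inputs):
    # tf.shape returns the run-time shape, so this works with a None batch dim.
    batch_size = tf.shape(inputs)[0]
    offsets = tf.random.uniform(tf.stack([batch_size, 2]), minval=0.0, maxval=1.0)
    glimpse = tf.image.extract_glimpse(
        inputs, size=(RETINA_SIZE, RETINA_SIZE), offsets=offsets,
        centered=False, normalized=True)
    return offsets, glimpse

# The same traced function handles any batch size; here a batch of 4.
offsets, glimpse = first_glimpse(tf.zeros([4, 250, 500, 1]))
print(offsets.shape, glimpse.shape)
```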

Reverse every other row in TensorFlow

Given a tensor input of undefined shape H x W, I would like to reverse every other row.
In numpy, I would simply do
input[1::2, :] = input[1::2, ::-1]
but this is apparently not possible in TensorFlow.
Note that the input shape is only partially-known, i.e., input.shape == (None, None).
Any ideas?
You can achieve the same using placeholder
input = tf.placeholder(shape=(None, None), dtype=tf.int32)
# define axis to reverse
axis_to_reverse=1
input_reversed = tf.reverse(input, [axis_to_reverse])
sess = tf.Session()
_input_reversed = sess.run(input_reversed, {input: your_array})  # your_array: the 2-D integer array to feed
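Note that tf.reverse along axis 1 reverses every row, not every other row. One possible way to restrict it to the odd rows, sketched here under the assumption of TensorFlow 2 (where tf.where broadcasts its condition), is to build a row-parity mask from the run-time shape and select between the reversed and the original tensor. The function name reverse_odd_rows is made up for this example.

```python
import tensorflow as tf

@tf.function(input_signature=[tf.TensorSpec([None, None], tf.int32)])
def reverse_odd_rows(x):
    # Row indices 0..H-1, computed from the run-time shape.
    row_is_odd = tf.equal(tf.range(tf.shape(x)[0]) % 2, 1)
    # Broadcast the [H, 1] mask over the columns: odd rows come from the
    # reversed tensor, even rows from the original one.
    return tf.where(row_is_odd[:, tf.newaxis], tf.reverse(x, axis=[1]), x)

example = tf.constant([[1, 2, 3],
                       [4, 5, 6],
                       [7, 8, 9],
                       [10, 11, 12]])
result = reverse_odd_rows(example)
print(result.numpy())
```

This computes the full reversal and then discards half of it, which is simple but not the cheapest option for very wide inputs.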

Range of size of tensor's dimension - tf.range

I'm trying to define an operation for a NN I'm implementing, but to do so I need to iterate over the dimension of a tensor. I have a small working example below.
X = tf.placeholder(tf.float32, shape=[None, 10])
idx = [[i] for i in tf.range(X.get_shape()[0])]
This produces an error stating
ValueError: Cannot convert an unknown Dimension to a Tensor: ?
When using the same code but using tf.shape instead, resulting in the code being
X = tf.placeholder(tf.float32, shape=[None, 10])
idx = [[i] for i in tf.range(tf.shape(X)[0])]
Gives the following error
TypeError: 'Tensor' object is not iterable.
The way that I'm implementing this NN, the batch_size isn't defined until the training function, which is at the end of the code. This is just where I'm building the graph itself, so the batch_size isn't known by this point, and it can't be fixed as the training batch_size and the test set batch_sizes are different.
What is the best way to fix this? This is the last thing keeping my code from running, as I got it to run with a fixed batch_size, though those results aren't useful. I've been poring over the TensorFlow API documentation and Stack Overflow for weeks to no avail.
I've also tried to feed in a placeholder into the range, so when I'm running the test/training set the code would be the following
X = tf.placeholder(tf.float32, shape=[None, 10])
bs = tf.placeholder(tf.int32)
def My_Function(X):
    # Do some stuff to X
    idx = [[i] for i in tf.range(bs)]
    # return some tensor
A = tf.nn.relu(My_Function(X))
However, this gives the same error as above
TypeError: 'Tensor' object is not iterable.
I think you should use the tf.shape(x) instead.
x = tf.placeholder(..., shape=[None, ...])
batch_size = tf.shape(x)[0] # Returns a scalar `tf.Tensor`
print(x.get_shape()[0]) # ==> "?"
# You can use `batch_size` as an argument to other operators.
some_other_tensor = ...
some_other_tensor_reshaped = tf.reshape(some_other_tensor, [batch_size, 32, 32])
# To get the value, however, you need to call `Session.run()`.
sess = tf.Session()
x_val = np.random.rand(37, 100, 100)
batch_size_val = sess.run(batch_size, {x: x_val})
print(batch_size_val) # ==> "37"
See : get the size of a variable batch dimension
You can't operate on tensors that way. You need to use tf.map_fn as user1735003 mentioned.
Here is an example where I used tf.map_fn in order to pass the output of an LSTM at each timestep into a linear layer, defined by weights['out'] and biases['out'].
x = tf.placeholder("float", [features_dimension, None, n_timesteps])
weights = {'out': tf.Variable(tf.zeros([N_HIDDEN_LSTM, labels_dimension]))}
biases = {'out': tf.Variable(tf.zeros([labels_dimension]))}
def LSTM_model(x, weights, biases):
    lstm_cell = rnn.LSTMCell(N_HIDDEN_LSTM)
    # outputs is a Tensor of shape (n_timesteps, n_observations, N_HIDDEN_LSTM)
    outputs, states = tf.nn.dynamic_rnn(lstm_cell, x, dtype=tf.float32, time_major=True)
    # Linear activation
    def pred_fn(current_output):
        return tf.matmul(current_output, weights['out']) + biases['out']
    # Use tf.map_fn to apply pred_fn to each tensor in outputs, along
    # dimension 0 (timestep dimension)
    pred = tf.map_fn(pred_fn, outputs)
    return pred
Could tf.map_fn be what you are looking for?
x = tf.placeholder(tf.float32, shape=[None, 10])
f = tf.map_fn(lambda y: y, x) # or perhaps something more useful than identity
EDIT
Now that I understand better, I think the problem is that you are trying to get the range while the graph is created, as opposed to when the graph is run.
Also, you need to use tf.shape to query the shape at run time, and tf.range (rather than Python's range) to build the range from it.
In [2]: import numpy as np
...: import tensorflow as tf
...: x = tf.placeholder(tf.float32, shape=[None, 10])
...: sess = tf.InteractiveSession()
...: sess.run(tf.range(tf.shape(x)[0]), {x: np.zeros((7,10))})
Out[2]: array([0, 1, 2, 3, 4, 5, 6])
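If the goal of the list comprehension in the question was a column of row indices ([[0], [1], ...]), the same thing can be built as a tensor without iterating in Python. This is only a sketch, assuming TensorFlow 2 and a hypothetical helper name row_indices:

```python
import tensorflow as tf

@tf.function(input_signature=[tf.TensorSpec([None, 10], tf.float32)])
def row_indices(x):
    # tf.range over the run-time batch size gives a vector of shape [batch];
    # expand_dims turns it into a [batch, 1] column, like [[i] for i in ...].
    return tf.expand_dims(tf.range(tf.shape(x)[0]), axis=1)

idx = row_indices(tf.zeros([7, 10]))
print(idx.numpy().tolist())
```

A tensor of this form can be passed to ops such as tf.gather_nd, so the batch size never has to be known while the graph is built.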
There's a small trick you can use, if you are using tensorflow >= 1.13, although not very efficient, since it uses sorting.
x=tf.placeholder(dtype=tf.float32, shape=[None])
xargs=tf.argsort(x)
range=tf.sort(xargs)

How to feed back RNN output to input in tensorflow

Suppose I have a trained RNN (e.g. a language model) and I want to see what it would generate on its own. How should I feed its output back to its input?
I read the following related questions:
TensorFlow using LSTMs for generating text
TensorFlow LSTM Generative Model
Theoretically it is clear to me, that in tensorflow we use truncated backpropagation, so we have to define the max step which we would like to "trace". Also we reserve a dimension for batches, therefore if I'd like to train a sine wave, I have to feed [None, num_step, 1] inputs.
The following code works:
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
tf.reset_default_graph()
n_samples=100
state_size=5
lstm_cell = tf.nn.rnn_cell.BasicLSTMCell(state_size, forget_bias=1.)
def_x = np.sin(np.linspace(0, 10, n_samples))[None, :, None]
zero_x = np.zeros(n_samples)[None, :, None]
X = tf.placeholder_with_default(zero_x, [None, n_samples, 1])
output, last_states = tf.nn.dynamic_rnn(inputs=X, cell=lstm_cell, dtype=tf.float64)
pred = tf.contrib.layers.fully_connected(output, 1, activation_fn=tf.tanh)
Y = np.roll(def_x, 1)
loss = tf.reduce_sum(tf.pow(pred-Y, 2))/(2*n_samples)
opt = tf.train.AdamOptimizer().minimize(loss)
sess = tf.InteractiveSession()
tf.global_variables_initializer().run()
# Initial state run
plt.show(plt.plot(output.eval()[0]))
plt.plot(def_x.squeeze())
plt.show(plt.plot(pred.eval().squeeze()))
steps = 1001
for i in range(steps):
    p, l, _= sess.run([pred, loss, opt])
The state size of the LSTM can be varied. I also experimented with feeding both a sine wave and zeros into the network, and in both cases it converged in ~500 iterations. So far I have understood that in this case the graph consists of n_samples LSTM cells sharing their parameters, and that it is up to me to feed them input as a time series. However, when generating samples the network explicitly depends on its previous output, meaning that I cannot feed the unrolled model all at once. I tried to compute the state and output at every step:
with tf.variable_scope('sine', reuse=True):
    X_test = tf.placeholder(tf.float64)
    X_reshaped = tf.reshape(X_test, [1, -1, 1])
    output, last_states = tf.nn.dynamic_rnn(lstm_cell, X_reshaped, dtype=tf.float64)
    pred = tf.contrib.layers.fully_connected(output, 1, activation_fn=tf.tanh)
test_vals = [0.]
for i in range(1000):
    val = pred.eval({X_test:np.array(test_vals)[None, :, None]})
    test_vals.append(val)
However in this model it seems that there is no continuity between the LSTM cells. What is going on here?
Do I have to initialize a zero array with i.e. 100 time steps, and assign each run's result into the array? Like feeding the network with this:
run 0: input_feed = [0, 0, 0 ... 0]; res1 = result
run 1: input_feed = [res1, 0, 0 ... 0]; res2 = result
run 2: input_feed = [res1, res2, 0 ... 0]; res3 = result
etc...
What to do if I want to use this trained network to use its own output as its input in the following time step?
If I understood you correctly, you want to find a way to feed the output of time step t as input to time step t+1, right? To do so, there is a relatively easy work around that you can use at test time:
Make sure your input placeholders can accept a dynamic sequence length, i.e. the size of the time dimension is None.
Make sure you are using tf.nn.dynamic_rnn (which you do in the posted example).
Pass the initial state into dynamic_rnn.
Then, at test time, you can loop through your sequence and feed each time step individually (i.e. max sequence length is 1). Additionally, you just have to carry over the internal state of the RNN. See pseudo code below (the variable names refer to your code snippet).
I.e., change the definition of the model to something like this:
lstm_cell = tf.nn.rnn_cell.BasicLSTMCell(state_size, forget_bias=1.)
def_x = np.sin(np.linspace(0, 10, n_samples))[None, :, None]
zero_x = np.zeros(n_samples)[None, :, None]
X = tf.placeholder_with_default(zero_x, [None, None, 1]) # [batch_size, seq_length, dimension of input]
batch_size = tf.shape(X)[0]
# the state dtype has to match the float64 inputs
initial_state = lstm_cell.zero_state(batch_size, dtype=tf.float64)
output, last_states = tf.nn.dynamic_rnn(inputs=X, cell=lstm_cell, dtype=tf.float64,
                                        initial_state=initial_state)
pred = tf.contrib.layers.fully_connected(output, 1, activation_fn=tf.tanh)
Then you can perform inference like so:
fetches = {'final_state': last_states,
           'prediction': pred}
toy_initial_input = np.array([[[1]]]) # put suitable data here
seq_length = 20 # put whatever is reasonable here for you
# get the output for the first time step
feed_dict = {X: toy_initial_input}
eval_out = sess.run(fetches, feed_dict)
outputs = [eval_out['prediction']]
next_state = eval_out['final_state']
for i in range(1, seq_length):
    # feed the previous prediction as input and carry the RNN state over
    feed_dict = {X: outputs[-1],
                 initial_state: next_state}
    eval_out = sess.run(fetches, feed_dict)
    outputs.append(eval_out['prediction'])
    next_state = eval_out['final_state']
# outputs now contains the sequence you want
Note that this can also work for batches, however it can be a bit more complicated if you have sequences of different lengths in the same batch.
If you want to perform this kind of prediction not only at test time, but also at training time, it is also possible to do, but a bit more complicated to implement.
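The structure of that inference loop (one time step per call, with the previous output as the next input and the state carried over) can be seen in isolation with a toy example. The following sketch deliberately avoids TensorFlow: toy_cell is a made-up scalar recurrence with arbitrary fixed weights that stands in for one sess.run call on the trained model, so only the loop structure is meaningful here.

```python
import math

def toy_cell(x, h, w_in=0.5, w_rec=0.9):
    """One step of a hypothetical scalar RNN; returns (output, new_state)."""
    new_h = math.tanh(w_in * x + w_rec * h)
    return new_h, new_h

def generate(first_input, initial_state, seq_length):
    """Run the cell step by step, feeding each output back in as the next input."""
    outputs = []
    x, h = first_input, initial_state
    for _ in range(seq_length):
        y, h = toy_cell(x, h)   # one time step, starting from the carried-over state
        outputs.append(y)
        x = y                   # the output becomes the next input
    return outputs, h

outputs, final_state = generate(first_input=1.0, initial_state=0.0, seq_length=20)
print(len(outputs), final_state)
```

Without carrying h from one call to the next, every step would start from a zero state, which is the lack of continuity described in the question.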
You can use its own output (last state) as the next-step input (initial state).
One way to do this is to:
use zero-initialized variables as the input state at every time step
each time you completed a truncated sequence and got some output state, update the state variables with this output state you just got.
The second can be done by either:
fetching the states to python and feeding them back next time, as done in the ptb example in tensorflow/models
build an update op in the graph and add a dependency, as done in the ptb example in tensorpack.
I know I'm a bit late to the party but I think this gist could be useful:
https://gist.github.com/CharlieCodex/f494b27698157ec9a802bc231d8dcf31
It lets you autofeed the input through a filter and back into the network as input. To make shapes match up processing can be set as a tf.layers.Dense layer.
Please ask any questions!
Edit:
In your particular case, create a lambda which performs the processing of the dynamic_rnn outputs into your character vector space. Ex:
# if you have:
W = tf.Variable( ... )
B = tf.Variable( ... )
Yo, Ho = tf.nn.dynamic_rnn( cell , inputs , state )
logits = tf.matmul(W, Yo) + B
...
# use self_feeding_rnn as
process_yo = lambda Yo: tf.matmul(W, Yo) + B
Yo, Ho = self_feeding_rnn( cell, seed, initial_state, processing=process_yo)

using MultiRNNCell in tensorflow 0.12

With Tensorflow 0.12, there have been changes to the way that MultiRNNCell works, for starters, state_is_tuple is now set to True by default, furthermore, there is this discussion on it:
state_is_tuple: If True, accepted and returned states are n-tuples, where n = len(cells). If False, the states are all concatenated along the column axis. This latter behavior will soon be deprecated.
I'm wondering how exactly I could use a multi layer RNN with GRU cells, here is my code so far:
def _run_rnn(self, inputs):
    # embedded inputs are passed in here
    self.initial_state = tf.zeros([self._batch_size, self._hidden_size], tf.float32)
    cell = tf.nn.rnn_cell.GRUCell(self._hidden_size)
    cell = tf.nn.rnn_cell.DropoutWrapper(cell, output_keep_prob=self._dropout_placeholder)
    cell = tf.nn.rnn_cell.MultiRNNCell([cell] * self._num_layers, state_is_tuple=False)
    outputs, last_state = tf.nn.dynamic_rnn(
        cell = cell,
        inputs = inputs,
        sequence_length = self.sequence_length,
        initial_state = self.initial_state
    )
    return outputs, last_state
My inputs look up word ids and return the corresponding embedding vectors. Now, running the code above, I'm greeted by the following error:
ValueError: Dimension 1 in both shapes must be equal, but are 100 and 200 for 'rnn/while/Select_1' (op: 'Select') with input shapes: [?], [64,100], [64,200]
The only places where I have a ? are my placeholders:
def _add_placeholders(self):
    self.input_placeholder = tf.placeholder(tf.int32, shape=[None, self._max_steps])
    self.label_placeholder = tf.placeholder(tf.int32, shape=[None, self._max_steps])
    self.sequence_length = tf.placeholder(tf.int32, shape=[None])
    self._dropout_placeholder = tf.placeholder(tf.float32)
Your main issue is in the setting of the initial_state. Since your state is now a tuple (more specifically an LSTMStateTuple), you cannot directly assign it to tf.zeros. Instead use,
self.initial_state = cell.zero_state(self._batch_size, tf.float32)
Have a look at the documentation for more.
To use this in code, you will need to pass this tensor in the feed_dict. Do something like this,
state = sess.run(model.initial_state)
for batch in batches:
    # Logic to add input placeholder in `feed_dict`
    feed_dict[model.initial_state] = state
    # Note I'm re-using `state` below
    (loss, state) = sess.run([model.loss, model.final_state], feed_dict=feed_dict)
