Variable batch_size in call function

Variable batch_size in call function - python

I am trying to implement an attention network with TensorFlow 2. Thus, for every image, I want to take only some glimpses, i.e. a small part from the image. For this I have implemented a subclass from tensorflow.keras.models.Model, here is a snippet out of it.
class RecurrentAttentionModel(models.Model):
# ...
def call(self, inputs):
l = tf.random.uniform((40,2,), minval=0, maxval=1)
for _ in range(0, self.glimpses):
glimpse = tf.image.extract_glimpse(inputs, size=(self.retina_size, self.retina_size), offsets=l, centered=False, normalized=True)
# some other code...
# update l to take a glimpse somewhere else
return result
Now, the code above works and trains perfectly, but my issue is, that I have the hardcoded 40 in it, the batch_size which I have defined in my dataset. I am not able to read/get the batch_size in the call method since the variable "inputs" is of the form Tensor("input_1_77:0", shape=(None, 250, 500, 1), dtype=float32) where the None for the batch_size seems to be expected behavior.
When I just initialize l with the following code (without the batch_size)
l = tf.random.uniform((2,), minval=0, maxval=1)
it throws this error
ValueError: Shape must be rank 2 but is rank 1 for 'recurrent_attention_model_86/ExtractGlimpse' (op: 'ExtractGlimpse') with input shapes: [?,250,500,1], [2], [2]
what I totally understand but I have no idea how I could implement the initial values according to the batch_size.

You can extract the batch size dimension dynamically by using tf.shape.
l = tf.random.normal(tf.stack([tf.shape(inputs)[0], 2]), minval=0, maxval=1))

Related

Keras custom data generator giving dimension errors with multi input and multi output( functional api model)

I have written a generator function with Keras, before returning X,y from __getitem__ I have double check the shapes of the X's and Y's and they are alright, but generator is giving dimension mismatch array and warnings.
(Colab Code to reproduce: https://colab.research.google.com/drive/1bSJm44MMDCWDU8IrG2GXKBvXNHCuY70G?usp=sharing)
My training and validation generators are pretty much same as
class ValidGenerator(Sequence):
def __init__(self, df, batch_size=64):
self.batch_size = batch_size
self.df = df
self.indices = self.df.index.tolist()
self.num_classes = num_classes
self.shuffle = shuffle
self.on_epoch_end()
def __len__(self):
return int(len(self.indices) // self.batch_size)
def __getitem__(self, index):
index = self.index[index * self.batch_size:(index + 1) * self.batch_size]
batch = [self.indices[k] for k in index]
X, y = self.__get_data(batch)
return X, y
def on_epoch_end(self):
self.index = np.arange(len(self.indices))
if self.shuffle == True:
np.random.shuffle(self.index)
def __get_data(self, batch):
#some logic is written here
#hat prepares 3 X features and 3 Y outputs
X = [input_array_1,input_array_2,input_array_3]
y = [out_1,out_2,out_3]
#print(len(X))
return X, y
I am return tupple of X,y from which has 3 input features and 3 output features each, so shape of X is (3,32,10,1)
I am using functional api to build model(I have things like concatenation, multi input/output, which isnt possible with sequential) with following structure
When I try to fit the model with generator with following code
train_datagen = TrainGenerator(df=train_df, batch_size=32, num_classes=None, shuffle=True)
valid_datagen = ValidGenerator(df=train_df, batch_size=32, num_classes=None, shuffle=True)
model.fit(train_datagen, epochs=2,verbose=1,callbacks=[checkpoint,es])
I get these warnings and errors, that dont go away
Epoch 1/2
WARNING:tensorflow:Model was constructed with shape (None, 10) for input >Tensor("input_1:0", shape=(None, 10), dtype=float32), but it was called >on an input with incompatible shape (None, None, None).
WARNING:tensorflow:Model was constructed with shape (None, 10) for input
Tensor("input_2:0", shape=(None, 10), dtype=float32), but it was
called on an input with incompatible shape (None, None, None).
WARNING:tensorflow:Model was constructed with shape (None, 10) for
input Tensor("input_3:0", shape=(None, 10), dtype=float32), but it was
called on an input with incompatible shape (None, None, None).
...
...
call
return super(RNN, self).call(inputs, **kwargs)
/home/eduardo/.virtualenvs/kgpu3/lib/python3.8/site-packages/tensorflow/python/keras/engine/base_layer.py:975
call
input_spec.assert_input_compatibility(self.input_spec, inputs,
/home/eduardo/.virtualenvs/kgpu3/lib/python3.8/site-packages/tensorflow/python/keras/engine/input_spec.py:176
assert_input_compatibility
raise ValueError('Input ' + str(input_index) + ' of layer ' +
ValueError: Input 0 of layer lstm is incompatible with the layer: expected ndim=3, found ndim=4. Full shape received: [None, None, None, 88]
I have rechecked whole code and it isnt possible to have input (None,None,None) like in warning or in error, my input dimension is (3,32,10,1)
Update
I have also tried to write a generator function with python and got exactly same error.
My generator function
def generate_arrays_from_file(batchsize,df):
#print(bat)
inputs = []
targets = []
batchcount = 0
while True:
df3 = df.loc[np.arange(batchcount*batchsize,(batchcount*batchsize)+batchsize)]
#Some pre processing
X = [input_array_1,input_array_2,input_array_3]
y = [out_1,out_2,out_3]
yield X,y
batchcount = batchcount +1
It seems like it is something wrong internally wit keras (may be due to the fact I am using functional API)
Update 2
I also tried to output tuple
X = (input1_X,input2_X,input3_X)
y = (output1_y,output2_y,output3_y)
and also named input/output, but it doesnt work
X = {"input_1": input1_X, "input_2": input2_X,"input_3": input3_X}
y = {"output_1": output1_y, "output_2": output2_y,"output_3": output3_y}
Note about problem formulation:
Changing the individual X features to shape (32,10) instead of (32,10,1) might help to get rid of this error but that is not what I want, it changes my problem(I no longer have 10 time steps with one feature each)

Keras use 'None' for dynamic dimensions.
As you can see on the model.summary() chart - the model expecting shape(None, 10) for all of your inputs, which is two dimensional. With batch dimension - you should feed three dimensional data to the model.
But you are feeding four dimensional data.
I would guess that your model doesn't split your input list by three inputs. Try to change your inputs to tuple:
X = (input_array_1,input_array_2,input_array_3)

In order to resolve this error:
ValueError: Input 0 of layer lstm is incompatible with the layer: expected ndim=3, found ndim=4. Full shape received: [None, None, None, 88]
TrainGenerator should be changed in the following way.
Current code:
input1_X = np.array(df3['input1_X'].to_list()).reshape(dlen,pad_len,1)
input2_X = np.array(df3['input2_X'].to_list()).reshape(dlen,pad_len,1)
input3_X = np.array(df3['input3_X'].to_list()).reshape(dlen,pad_len,1)
Should be changed to:
input1_X = np.array(df3['input1_X'].to_list()).reshape(dlen,pad_len)
input2_X = np.array(df3['input2_X'].to_list()).reshape(dlen,pad_len)
input3_X = np.array(df3['input3_X'].to_list()).reshape(dlen,pad_len)
The reason is that each of the 3 Inputs expects a 2-dimensional array, but the generator provides a 3-dimensional one. The expected shape is (batch_size, 10).

I had a similar issue with a custom generator that just had to pass a numpy array of size 10 as input and one single output.
To solve this problem i had to trasform the shape of the 2 vectors passed to the neural network like this:
def slides_generator(integer_list):
# stuff happens
x = np_ts[np_index:np_index+10] # numpy array
y = np_ts[np_index+10] # numpy array
yield tf.convert_to_tensor(x)[np.newaxis, ...], tf.convert_to_tensor(y)[np.newaxis, ...]
doge_gen = slides_generator(integer_list) #next(doge_gen)
basically you need to pass the 2 arrays with shape (None,size),
so in my case were (None,10) and (None,1), and to achieve this i just passed 2 reshaped tensors.
you need the None dimension as the batch size.

Cannot take the length of Shape with unknown rank

I have a neural network, from a tf.data data generator and a tf.keras model, as follows (a simplified version-because it would be too long):
dataset = ...
A tf.data.Dataset object that with the next_x method calls the get_next for the x_train iterator and for the next_y method calls the get_next for the y_train iterator. Each label is a (1, 67) array in one-hot form.
Layers:
input_tensor = tf.keras.layers.Input(shape=(240, 240, 3)) # dim of x
output = tf.keras.layers.Flatten()(input_tensor)
output= tf.keras.Dense(67, activation='softmax')(output) # 67 is the number of classes
Model:
model = tf.keras.models.Model(inputs=input_tensor, outputs=prediction)
model.compile(optimizer=tf.train.AdamOptimizer(), loss=tf.losses.softmax_cross_entropy, metrics=['accuracy'])
model.fit_generator(gen(dataset.next_x(), dataset.next_y()), steps_per_epochs=100)
gen is defined like this:
def gen(x, y):
while True:
yield(x, y)
My problem is that when I try to run it, I get an error in the model.fit part:
ValueError: Cannot take the length of Shape with unknown rank.
Any ideas are appreciated!

Could you post a longer stack-trace? I think your problem might be related to this recent tensorflow issue:
https://github.com/tensorflow/tensorflow/issues/24520
There's also a simple PR that fixes it (not yet merged). Maybe try it out yourself?
EDIT
Here is the PR:
open tensorflow/python/keras/engine/training_utils.py
replace the following (line 232 at the moment):
if (x.shape is not None
and len(x.shape) == 1
with this:
if tensor_util.is_tensor(x):
x_shape_ndims = x.shape.ndims if x.shape is not None else None
else:
x_shape_ndims = len(x.shape)
if (x_shape_ndims == 1

I found out what was wrong. Actually I have to run next batch in a tf.Session before yielding it.
Here is how it works (I don't write the rest of the code, since it stays the same):
model.fit_generator(gen(), steps_per_epochs=100)
def gen():
with tf.Session() as sess:
next_x = dataset.next_x()
next_y = dataset.next_y()
while True:
x_batch = sess.run(next_x)
y_batch = sess.run(next_y)
yield x_batch, y_batch

For the issue Cannot take the length of Shape with unknown rank,
Thanks to above answer, I solved by add output_shape to from_generator according to this issue comment.
In my case, I was using Dataset.from_generator for dataset pipeline.
Before:
Dataset.from_generator(_generator_factory,
output_types=(tf.float32, tf.int8))
Working code for me:
Dataset.from_generator(_generator_factory,
output_types = (tf.float32, tf.int8),
output_shapes = (
tf.TensorShape([2, 224, 224, 3]),
tf.TensorShape([1,])
))
Also found this dataset official guide from tensorflow indicates that:
...
The output_shapes argument is not required but is highly recomended as many tensorflow operations do not support tensors with unknown rank. If the length of a particular axis is unknown or variable, set it as None in the output_shapes.
...

How does batching work in a seq2seq model in pytorch?

I am trying to implement a seq2seq model in Pytorch and I am having some problem with the batching.
For example I have a batch of data whose dimensions are
[batch_size, sequence_lengths, encoding_dimension]
where the sequence lengths are different for each example in the batch.
Now, I managed to do the encoding part by padding each element in the batch to the length of the longest sequence.
This way if I give as input to my net a batch with the same shape as said, I get the following outputs:
output, of shape [batch_size, sequence_lengths, hidden_layer_dimension]
hidden state, of shape [batch_size, hidden_layer_dimension]
cell state, of shape [batch_size, hidden_layer_dimension]
Now, from the output, I take for each sequence the last relevant element, that is the element along the sequence_lengths dimension corresponding to the last non padded element of the sequence. Thus the final output I get is of shape [batch_size, hidden_layer_dimension].
But now I have the problem of decoding it from this vector. How do I handle a decoding of sequences of different lengths in the same batch? I tried to google it and found this, but they don't seem to address the problem. I thought of doing element by element for the whole batch, but then I have the problem to pass the initial hidden states, given that the ones from the encoder will be of shape [batch_size, hidden_layer_dimension], while the ones from the decoder will be of shape [1, hidden_layer_dimension].
Am I missing something? Thanks for the help!

You are not missing anything. I can help you since I have worked on several sequence-to-sequence application using PyTorch. I am giving you a simple example below.
class Seq2Seq(nn.Module):
"""A Seq2seq network trained on predicting the next query."""
def __init__(self, dictionary, embedding_index, args):
super(Seq2Seq, self).__init__()
self.config = args
self.num_directions = 2 if self.config.bidirection else 1
self.embedding = EmbeddingLayer(len(dictionary), self.config)
self.embedding.init_embedding_weights(dictionary, embedding_index, self.config.emsize)
self.encoder = Encoder(self.config.emsize, self.config.nhid_enc, self.config.bidirection, self.config)
self.decoder = Decoder(self.config.emsize, self.config.nhid_enc * self.num_directions, len(dictionary),
self.config)
#staticmethod
def compute_decoding_loss(logits, target, seq_idx, length):
losses = -torch.gather(logits, dim=1, index=target.unsqueeze(1)).squeeze()
mask = helper.mask(length, seq_idx) # mask: batch x 1
losses = losses * mask.float()
num_non_zero_elem = torch.nonzero(mask.data).size()
if not num_non_zero_elem:
return losses.sum(), 0 if not num_non_zero_elem else losses.sum(), num_non_zero_elem[0]
def forward(self, q1_var, q1_len, q2_var, q2_len):
# encode the query
embedded_q1 = self.embedding(q1_var)
encoded_q1, hidden = self.encoder(embedded_q1, q1_len)
if self.config.bidirection:
if self.config.model == 'LSTM':
h_t, c_t = hidden[0][-2:], hidden[1][-2:]
decoder_hidden = torch.cat((h_t[0].unsqueeze(0), h_t[1].unsqueeze(0)), 2), torch.cat(
(c_t[0].unsqueeze(0), c_t[1].unsqueeze(0)), 2)
else:
h_t = hidden[0][-2:]
decoder_hidden = torch.cat((h_t[0].unsqueeze(0), h_t[1].unsqueeze(0)), 2)
else:
if self.config.model == 'LSTM':
decoder_hidden = hidden[0][-1], hidden[1][-1]
else:
decoder_hidden = hidden[-1]
decoding_loss, total_local_decoding_loss_element = 0, 0
for idx in range(q2_var.size(1) - 1):
input_variable = q2_var[:, idx]
embedded_decoder_input = self.embedding(input_variable).unsqueeze(1)
decoder_output, decoder_hidden = self.decoder(embedded_decoder_input, decoder_hidden)
local_loss, num_local_loss = self.compute_decoding_loss(decoder_output, q2_var[:, idx + 1], idx, q2_len)
decoding_loss += local_loss
total_local_decoding_loss_element += num_local_loss
if total_local_decoding_loss_element > 0:
decoding_loss = decoding_loss / total_local_decoding_loss_element
return decoding_loss
You can see the complete source code here. This application is about predicting users' next web-search query given the current web-search query.
The answerer to your question:
How do I handle a decoding of sequences of different lengths in the same batch?
You have padded sequences, so you can consider as all the sequences are of the same length. But when you are computing loss, you need to ignore loss for those padded terms using masking.
I have used a masking technique to achieve the same in the above example.
Also, you are absolutely correct on: you need to decode element by element for the mini-batches. The initial decoder state [batch_size, hidden_layer_dimension] is also fine. You just need to unsqueeze it at dimension 0, to make it [1, batch_size, hidden_layer_dimension].
Please note, you do not need to loop over each example in the batch, you can execute the whole batch at a time, but you need to loop over the elements of the sequences.

How to slice a tensor with None dimension in Tensorflow

I want to slice a tensor in "None" dimension.
For example,
tensor = tf.placeholder(tf.float32, shape=[None, None, 10], name="seq_holder")
sliced_tensor = tensor[:,1:,:] # it works well!
but
# Assume that tensor's shape will be [3,10, 10]
tensor = tf.placeholder(tf.float32, shape=[None, None, 10], name="seq_holder")
sliced_seq = tf.slice(tensor, [0,1,0],[3, 9, 10]) # it doens't work!
It is same that i get a message when i used another place_holder to feed size parameter for tf.slice().
The second methods gave me "Input size (depth of inputs) must be accessible via shape inference" error message.
I'd like to know what's different between two methods and what is more tensorflow-ish way.
[Edited]
Whole code is below
import tensorflow as tf
import numpy as np
print("Tensorflow for tests!")
vec_dim = 5
num_hidden = 10
# method 1
input_seq1 = np.random.random([3,7,vec_dim])
# method 2
input_seq2 = np.random.random([5,10,vec_dim])
shape_seq2 = [5,9,vec_dim]
# seq: [batch, seq_len]
seq = tf.placeholder(tf.float32, shape=[None, None, vec_dim], name="seq_holder")
# Method 1
sliced_seq = seq[:,1:,:]
# Method 2
seq_shape = tf.placeholder(tf.int32, shape=[3])
sliced_seq = tf.slice(seq,[0,0,0], seq_shape)
cell = tf.contrib.rnn.GRUCell(num_units=num_hidden)
init_state = cell.zero_state(tf.shape(seq)[0], tf.float32)
outputs, last_state = tf.nn.dynamic_rnn(cell, sliced_seq, initial_state=init_state)
with tf.Session() as sess:
sess.run(tf.global_variables_initializer())
# method 1
# states = sess.run([sliced_seq], feed_dict={seq:input_seq1})
# print(states[0].shape)
# method 2
states = sess.run([sliced_seq], feed_dict={seq:input_seq2, seq_shape:shape_seq2})
print(states[0].shape)

Your problem is exactly described by issue #4590
The problem is that tf.nn.dynamic_rnn needs to know the size of the last dimension in the input (the "depth"). Unfortunately, as the issue points out, currently tf.slice cannot infer any output size if any of the slice ranges are not fully known at graph construction time; therefore, sliced_seq ends up having a shape (?, ?, ?).
In your case, the first issue is that you are using a placeholder of three elements to determine the size of the slice; this is not the best approach, since the last dimension should never change (even if you later pass vec_dim, it could cause errors). The easiest solution would be to turn seq_shape into a placeholder of size 2 (or even two separate placeholders), and then do the slicing like:
sliced_seq = seq[:seq_shape[0], :seq_shape[1], :]
For some reason, the NumPy-style indexing seems to have better shape inference capabilities, and this will preserve the size of the last dimension in sliced_seq.

How to feed back RNN output to input in tensorflow

In case where suppose I have a trained RNN (e.g. language model), and I want to see what it would generate on its own, how should I feed its output back to its input?
I read the following related questions:
TensorFlow using LSTMs for generating text
TensorFlow LSTM Generative Model
Theoretically it is clear to me, that in tensorflow we use truncated backpropagation, so we have to define the max step which we would like to "trace". Also we reserve a dimension for batches, therefore if I'd like to train a sine wave, I have to feed [None, num_step, 1] inputs.
The following code works:
tf.reset_default_graph()
n_samples=100
state_size=5
lstm_cell = tf.nn.rnn_cell.BasicLSTMCell(state_size, forget_bias=1.)
def_x = np.sin(np.linspace(0, 10, n_samples))[None, :, None]
zero_x = np.zeros(n_samples)[None, :, None]
X = tf.placeholder_with_default(zero_x, [None, n_samples, 1])
output, last_states = tf.nn.dynamic_rnn(inputs=X, cell=lstm_cell, dtype=tf.float64)
pred = tf.contrib.layers.fully_connected(output, 1, activation_fn=tf.tanh)
Y = np.roll(def_x, 1)
loss = tf.reduce_sum(tf.pow(pred-Y, 2))/(2*n_samples)
opt = tf.train.AdamOptimizer().minimize(loss)
sess = tf.InteractiveSession()
tf.global_variables_initializer().run()
# Initial state run
plt.show(plt.plot(output.eval()[0]))
plt.plot(def_x.squeeze())
plt.show(plt.plot(pred.eval().squeeze()))
steps = 1001
for i in range(steps):
p, l, _= sess.run([pred, loss, opt])
The state size of the LSTM can be varied, also I experimented with feeding sine wave into the network and zeros, and in both cases it converged in ~500 iterations. So far I have understood that in this case the graph consists n_samples number of LSTM cells sharing their parameters, and it is only up to me that I feed input to them as a time series. However when generating samples the network is explicitly depending on its previous output - meaning that I cannot feed the unrolled model at once. I tried to compute the state and output at every step:
with tf.variable_scope('sine', reuse=True):
X_test = tf.placeholder(tf.float64)
X_reshaped = tf.reshape(X_test, [1, -1, 1])
output, last_states = tf.nn.dynamic_rnn(lstm_cell, X_reshaped, dtype=tf.float64)
pred = tf.contrib.layers.fully_connected(output, 1, activation_fn=tf.tanh)
test_vals = [0.]
for i in range(1000):
val = pred.eval({X_test:np.array(test_vals)[None, :, None]})
test_vals.append(val)
However in this model it seems that there is no continuity between the LSTM cells. What is going on here?
Do I have to initialize a zero array with i.e. 100 time steps, and assign each run's result into the array? Like feeding the network with this:
run 0: input_feed = [0, 0, 0 ... 0]; res1 = result
run 1: input_feed = [res1, 0, 0 ... 0]; res2 = result
run 1: input_feed = [res1, res2, 0 ... 0]; res3 = result
etc...
What to do if I want to use this trained network to use its own output as its input in the following time step?

If I understood you correctly, you want to find a way to feed the output of time step t as input to time step t+1, right? To do so, there is a relatively easy work around that you can use at test time:
Make sure your input placeholders can accept a dynamic sequence length, i.e. the size of the time dimension is None.
Make sure you are using tf.nn.dynamic_rnn (which you do in the posted example).
Pass the initial state into dynamic_rnn.
Then, at test time, you can loop through your sequence and feed each time step individually (i.e. max sequence length is 1). Additionally, you just have to carry over the internal state of the RNN. See pseudo code below (the variable names refer to your code snippet).
I.e., change the definition of the model to something like this:
lstm_cell = tf.nn.rnn_cell.BasicLSTMCell(state_size, forget_bias=1.)
X = tf.placeholder_with_default(zero_x, [None, None, 1]) # [batch_size, seq_length, dimension of input]
batch_size = tf.shape(self.input_)[0]
initial_state = lstm_cell.zero_state(batch_size, dtype=tf.float32)
def_x = np.sin(np.linspace(0, 10, n_samples))[None, :, None]
zero_x = np.zeros(n_samples)[None, :, None]
output, last_states = tf.nn.dynamic_rnn(inputs=X, cell=lstm_cell, dtype=tf.float64,
initial_state=initial_state)
pred = tf.contrib.layers.fully_connected(output, 1, activation_fn=tf.tanh)
Then you can perform inference like so:
fetches = {'final_state': last_state,
'prediction': pred}
toy_initial_input = np.array([[[1]]]) # put suitable data here
seq_length = 20 # put whatever is reasonable here for you
# get the output for the first time step
feed_dict = {X: toy_initial_input}
eval_out = sess.run(fetches, feed_dict)
outputs = [eval_out['prediction']]
next_state = eval_out['final_state']
for i in range(1, seq_length):
feed_dict = {X: outputs[-1],
initial_state: next_state}
eval_out = sess.run(fetches, feed_dict)
outputs.append(eval_out['prediction'])
next_state = eval_out['final_state']
# outputs now contains the sequence you want
Note that this can also work for batches, however it can be a bit more complicated if you sequences of different lengths in the same batch.
If you want to perform this kind of prediction not only at test time, but also at training time, it is also possible to do, but a bit more complicated to implement.

You can use its own output (last state) as the next-step input (initial state).
One way to do this is to:
use zero-initialized variables as the input state at every time step
each time you completed a truncated sequence and got some output state, update the state variables with this output state you just got.
The second can be done by either:
fetching the states to python and feeding them back next time, as done in the ptb example in tensorflow/models
build an update op in the graph and add a dependency, as done in the ptb example in tensorpack.

I know I'm a bit late to the party but I think this gist could be useful:
https://gist.github.com/CharlieCodex/f494b27698157ec9a802bc231d8dcf31
It lets you autofeed the input through a filter and back into the network as input. To make shapes match up processing can be set as a tf.layers.Dense layer.
Please ask any questions!
Edit:
In your particular case, create a lambda which performs the processing of the dynamic_rnn outputs into your character vector space. Ex:
# if you have:
W = tf.Variable( ... )
B = tf.Variable( ... )
Yo, Ho = tf.nn.dynamic_rnn( cell , inputs , state )
logits = tf.matmul(W, Yo) + B
...
# use self_feeding_rnn as
process_yo = lambda Yo: tf.matmul(W, Yo) + B
Yo, Ho = self_feeding_rnn( cell, seed, initial_state, processing=process_yo)

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.