Issue with PyTorch LSTM source code - python

I am using a bidirectional LSTM with batch_first=True. However, it is throwing an error regarding dimensions.
**Error:
Expected hidden[0] size (6, 5, 40), got (5, 6, 40)**
When I checked the source code, the error comes from the function below:
if is_input_packed:
    mini_batch = int(batch_sizes[0])
else:
    mini_batch = input.size(0) if self.batch_first else input.size(1)
num_directions = 2 if self.bidirectional else 1
expected_hidden_size = (self.num_layers * num_directions,
                        mini_batch, self.hidden_size)

def check_hidden_size(hx, expected_hidden_size, msg='Expected hidden size {}, got {}'):
    if tuple(hx.size()) != expected_hidden_size:
        raise RuntimeError(msg.format(expected_hidden_size, tuple(hx.size())))
By default, expected_hidden_size is written with respect to a sequence-first layout. I believe this is causing the problem. Can someone advise whether I am right and whether the issue needs to be fixed?
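For reference, the PyTorch docs specify that the initial hidden and cell states of nn.LSTM always have shape (num_layers * num_directions, batch, hidden_size); batch_first only changes the layout of the input and output tensors. A minimal sketch matching the sizes in the error above (input_size=10 and seq_len=7 are arbitrary choices):

import torch
import torch.nn as nn

# batch_first=True changes the layout of the input/output tensors only;
# h0/c0 keep the (num_layers * num_directions, batch, hidden_size) layout
lstm = nn.LSTM(input_size=10, hidden_size=40, num_layers=3,
               bidirectional=True, batch_first=True)
x = torch.randn(5, 7, 10)          # (batch=5, seq_len=7, input_size=10)
h0 = torch.zeros(6, 5, 40)         # (3 layers * 2 directions, batch, hidden)
c0 = torch.zeros(6, 5, 40)
out, (hn, cn) = lstm(x, (h0, c0))  # out: (5, 7, 80)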


Can't fix torch autograd runtime error: UNet inplace operation

I can't fix the runtime error "one of the variables needed for gradient computation has been modified by an inplace operation".
I know that if I comment out loss.backward() the code will run, but I don't understand in which order I should call the functions to avoid this error.
When I run my wrapper with Resnet50 I don't experience any problems, but with Unet the RuntimeError occurs:
for i, (x, y) in batch_iter:
    with torch.autograd.set_detect_anomaly(True):
        input, target = x.to(self.device), y.to(self.device)
        self.optimizer.zero_grad()
        if self.box_training:
            out = self.model(input)
        else:
            out = self.model(input).clamp(0, 1)
        loss = self.criterion(out, target)
        loss_value = loss.item()
        train_losses.append(loss_value)
        loss.backward()
        self.optimizer.step()
    batch_iter.set_description(f'Training: (loss {loss_value:.4f})')
self.training_loss.append(np.mean(train_losses))
self.learning_rate.append(self.optimizer.param_groups[0]['lr'])
As the comments pointed out, I should provide the model.
By looking at it, I actually found what the problem was:
model = UNet(in_channels=1,
             num_encoding_blocks=6,
             out_classes=1,
             padding=1,
             dimensions=2,
             out_channels_first_layer=32,
             normalization=None,
             pooling_type='max',
             upsampling_type='conv',
             preactivation=False,
             # residual=True,
             padding_mode='zeros',
             activation='ReLU',
             initial_dilation=None,
             dropout=0,
             monte_carlo_dropout=0)
It is residual = True, which I had commented out. I will look into the docs to see what is going on. Maybe if you have an idea, you can enlighten me.
Explanation:
It looks like the UNet library you are using includes a += (in-place tensor addition) in the residual branch of the encoder:
if self.residual:
    connection = self.conv_residual(x)
    x = self.conv1(x)
    x = self.conv2(x)
    x += connection  # <------- !!!
In-place operations like += may overwrite information that is needed for gradient computation during loss.backward(). PyTorch detects when this necessary information has been overwritten, and complains.
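A minimal, self-contained repro of the same error, independent of UNet (sigmoid is chosen because its backward pass needs the saved output):

import torch

x = torch.ones(3, requires_grad=True)
y = torch.sigmoid(x)  # autograd saves y to compute the sigmoid gradient
y += 1                # in-place add modifies the saved tensor
y.sum().backward()    # RuntimeError: one of the variables needed for gradient
                      # computation has been modified by an inplace operation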
Fix:
If you want to train this network with residual enabled, you would need to replace this += with an out-of-place add:
if self.residual:
    connection = self.conv_residual(x)
    x = self.conv1(x)
    x = self.conv2(x)
    x = x + connection  # <-------
A similar edit is needed in the decoder. If you installed this unet library via pip, you would want to download it directly from github instead so you can make these edits (and uninstall the pip version to avoid confusion).
For more information about why in-place operations can cause problems, see this blog post or this section of the PyTorch docs.

Tensorflow ValueError for ReinforceAgent

I see that Tensorflow support is pretty slim but I'll try anyway …
When running my agent:
optimizer = tf.keras.optimizers.Adam()
train_step_counter = tf.Variable(0)

tf_agent = reinforce_agent.ReinforceAgent(
    train_py_env.time_step_spec(),
    train_py_env.action_spec(),
    actor_network=actor_net,
    optimizer=optimizer,
    normalize_returns=True,
    train_step_counter=train_step_counter)
I get a ValueError from _make_gin_wrapper (line 1605). The error text is:

Exception encountered when calling layer "lambda_12" (type Lambda).

Shapes (1, 9) and (9, 9) are incompatible

Call arguments received by layer "lambda_12" (type Lambda):
  • inputs=tf.Tensor(shape=(1, 9), dtype=int32)
  • mask=None
  • training=None
In call to configurable 'ReinforceAgent' (<class 'tf_agents.agents.reinforce.reinforce_agent.ReinforceAgent'>)
So apparently there is some incompatibility between the (1,9) and (9,9) shapes. The environment is taken from https://towardsdatascience.com/creating-a-custom-environment-for-tensorflow-agent-tic-tac-toe-example-b66902f73059. It is TicTacToe on a [0,0,0,0,0,0,0,0,0] board, which has shape (9,), so I can see where the 9s come from, but I don't know which objects have the (1,9) and (9,9) shapes, or what I could do to get the agent running.

Variable batch_size in call function

I am trying to implement an attention network with TensorFlow 2. Thus, for every image, I want to take only some glimpses, i.e. a small part of the image. For this I have implemented a subclass of tensorflow.keras.models.Model; here is a snippet from it.
class RecurrentAttentionModel(models.Model):
    # ...

    def call(self, inputs):
        l = tf.random.uniform((40, 2), minval=0, maxval=1)
        for _ in range(0, self.glimpses):
            glimpse = tf.image.extract_glimpse(
                inputs, size=(self.retina_size, self.retina_size),
                offsets=l, centered=False, normalized=True)
            # some other code...
            # update l to take a glimpse somewhere else
        return result
Now, the code above works and trains perfectly, but my issue is that I have the batch_size (40), which I defined in my dataset, hardcoded in it. I am not able to read/get the batch_size in the call method, since the variable inputs is of the form Tensor("input_1_77:0", shape=(None, 250, 500, 1), dtype=float32), where the None for the batch_size seems to be expected behavior.
When I just initialize l with the following code (without the batch_size)
l = tf.random.uniform((2,), minval=0, maxval=1)
it throws this error
ValueError: Shape must be rank 2 but is rank 1 for 'recurrent_attention_model_86/ExtractGlimpse' (op: 'ExtractGlimpse') with input shapes: [?,250,500,1], [2], [2]
which I totally understand, but I have no idea how I could set the initial values according to the batch_size.
You can extract the batch size dimension dynamically by using tf.shape:

l = tf.random.uniform(tf.stack([tf.shape(inputs)[0], 2]), minval=0, maxval=1)
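A sketch of how this fits into the call method from the question (names are taken from the snippet above; everything else stays unchanged):

def call(self, inputs):
    # inputs has static shape (None, 250, 500, 1); tf.shape yields the
    # actual batch size at runtime
    batch_size = tf.shape(inputs)[0]
    l = tf.random.uniform(tf.stack([batch_size, 2]), minval=0, maxval=1)
    # ... glimpse loop as before ...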

mxnet: how to debug models with mismatched shapes

I am trying to modify a model I found online (https://github.com/apache/incubator-mxnet/tree/master/example/multivariate_time_series) as I get to know mxnet. I want to build a model that runs a CNN and an RNN in parallel and then uses the outputs of both to forecast a time series. However, I am running into this error:
RuntimeError: simple_bind error. Arguments:
data: (128, 96, 20)
softmax_label: (128, 20)
Error in operator concat1: [15:44:09] src/operator/nn/concat.cc:66:
Check failed: shape_assign(&(*in_shape)[i], dshape)
Incompatible input shape: expected [128,0], got [128,96,300]
This is the code, as I have tried to modify it:
def rnn_cnn_model(iter_train, q, filter_list, num_filter, dropout, seasonal_period, time_interval):
    # Choose cells for recurrent layers: each cell will take the output of the previous cell in the list
    rcells = [mx.rnn.GRUCell(num_hidden=args.recurrent_state_size)]
    skiprcells = [mx.rnn.LSTMCell(num_hidden=args.recurrent_state_size)]

    input_feature_shape = iter_train.provide_data[0][1]
    X = mx.symbol.Variable(iter_train.provide_data[0].name)
    Y = mx.sym.Variable(iter_train.provide_label[0].name)

    # reshape data before applying convolutional layer (takes 4D shape in case you ever work with images)
    rnn_input = mx.sym.reshape(data=X, shape=(0, q, -1))

    ###############
    # RNN Component
    ###############
    stacked_rnn_cells = mx.rnn.SequentialRNNCell()
    for i, recurrent_cell in enumerate(rcells):
        stacked_rnn_cells.add(recurrent_cell)
        stacked_rnn_cells.add(mx.rnn.DropoutCell(dropout))
    outputs, states = stacked_rnn_cells.unroll(length=q, inputs=rnn_input, merge_outputs=False)
    rnn_features = outputs[-1]  # only take value from final unrolled cell for use later

    input_feature_shape = iter_train.provide_data[0][1]
    X = mx.symbol.Variable(iter_train.provide_data[0].name)
    Y = mx.sym.Variable(iter_train.provide_label[0].name)

    # reshape data before applying convolutional layer (takes 4D shape in case you ever work with images)
    conv_input = mx.sym.reshape(data=X, shape=(0, 1, q, -1))

    ###############
    # CNN Component
    ###############
    outputs = []
    for i, filter_size in enumerate(filter_list):
        # pad input array to ensure number output rows = number input rows after applying kernel
        padi = mx.sym.pad(data=conv_input, mode="constant", constant_value=0,
                          pad_width=(0, 0, 0, 0, filter_size - 1, 0, 0, 0))
        convi = mx.sym.Convolution(data=padi, kernel=(filter_size, input_feature_shape[2]), num_filter=num_filter)
        acti = mx.sym.Activation(data=convi, act_type='relu')
        trans = mx.sym.reshape(mx.sym.transpose(data=acti, axes=(0, 2, 1, 3)), shape=(0, 0, 0))
        outputs.append(trans)
    cnn_features = mx.sym.Concat(*outputs, dim=2)
    cnn_reg_features = mx.sym.Dropout(cnn_features, p=dropout)
    c_features = mx.sym.reshape(data=cnn_reg_features, shape=(-1))
    print(type(c_features))

    ######################
    # Prediction Component
    ######################
    print(rnn_features.infer_shape())
    neural_components = mx.sym.concat(*[rnn_features, c_features], dim=1)
    neural_output = mx.sym.FullyConnected(data=neural_components, num_hidden=input_feature_shape[2])
    model_output = neural_output
    loss_grad = mx.sym.LinearRegressionOutput(data=model_output, label=Y)
    return loss_grad, [v.name for v in iter_train.provide_data], [v.name for v in iter_train.provide_label]
and I believe the crash is happening on this line of code
neural_components = mx.sym.concat(*[rnn_features, c_features], dim=1)
Here is what I have tried in an effort to get my dimensions to match up:
c_features = mx.sym.reshape(data = cnn_reg_features, shape = (-1))
c_features = cnn_reg_features[-1]
c_features = cnn_reg_features[:, -1, :]
I also tried looking at the git issues and Googling around, but all I saw was advice to use infer_shape. I tried applying it to c_features, but the output was not clear to me:
data: ()
gru_i2h_weight: ()
gru_i2h_bias: ()
Basically, I would like to know, at each stage as this graph is built, what the shape of each symbol is. I am used to this capability in TensorFlow, which makes it easier to build and debug graphs when one has gone astray with an incorrect reshape, or simply to get a sense of how a model works by looking at its dimensions. Is there no equivalent in mxnet?
Given that the data_iter is fed in when producing these symbols, I would think the inferred shapes should be available. Ultimately, my questions are: (1) how can I see the shape of a symbol when it uses the data in the iterator and should know all shapes? (2) What are general guidelines for debugging in this sort of situation?
Thank you.
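One approach that may help (a sketch, not from the original thread): mxnet's shape inference returns empty shapes when it is given no input shapes, so pass the iterator's data shape into infer_shape on whatever intermediate symbol you want to inspect.

# assumes the data field is named 'data', matching the error message above
data_shape = iter_train.provide_data[0][1]  # e.g. (128, 96, 20)
arg_shapes, out_shapes, aux_shapes = c_features.infer_shape(data=data_shape)
print(out_shapes)  # the inferred shape of c_features itself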

Cannot take the length of Shape with unknown rank

I have a neural network, built from a tf.data data generator and a tf.keras model, as follows (a simplified version, because it would otherwise be too long):
dataset = ...
A tf.data.Dataset object whose next_x method calls get_next on the x_train iterator and whose next_y method calls get_next on the y_train iterator. Each label is a (1, 67) array in one-hot form.
Layers:
input_tensor = tf.keras.layers.Input(shape=(240, 240, 3))  # dim of x
output = tf.keras.layers.Flatten()(input_tensor)
prediction = tf.keras.layers.Dense(67, activation='softmax')(output)  # 67 is the number of classes
Model:
model = tf.keras.models.Model(inputs=input_tensor, outputs=prediction)
model.compile(optimizer=tf.train.AdamOptimizer(), loss=tf.losses.softmax_cross_entropy, metrics=['accuracy'])
model.fit_generator(gen(dataset.next_x(), dataset.next_y()), steps_per_epoch=100)
gen is defined like this:
def gen(x, y):
    while True:
        yield (x, y)
My problem is that when I try to run it, I get an error in the model.fit part:
ValueError: Cannot take the length of Shape with unknown rank.
Any ideas are appreciated!
Could you post a longer stack trace? I think your problem might be related to this recent TensorFlow issue:
https://github.com/tensorflow/tensorflow/issues/24520
There's also a simple PR that fixes it (not yet merged). Maybe try it out yourself?
EDIT
Here is the PR:
Open tensorflow/python/keras/engine/training_utils.py and replace the following (line 232 at the moment):

if (x.shape is not None
        and len(x.shape) == 1

with this:

if tensor_util.is_tensor(x):
    x_shape_ndims = x.shape.ndims if x.shape is not None else None
else:
    x_shape_ndims = len(x.shape)
if (x_shape_ndims == 1
I found out what was wrong: I actually have to run the next batch in a tf.Session before yielding it.
Here is how it works (I don't write the rest of the code, since it stays the same):
model.fit_generator(gen(), steps_per_epoch=100)

def gen():
    with tf.Session() as sess:
        next_x = dataset.next_x()
        next_y = dataset.next_y()
        while True:
            x_batch = sess.run(next_x)
            y_batch = sess.run(next_y)
            yield x_batch, y_batch
For the issue Cannot take the length of Shape with unknown rank: thanks to the answer above, I solved it by adding output_shapes to from_generator, according to this issue comment. In my case, I was using Dataset.from_generator for the dataset pipeline.
Before:

Dataset.from_generator(_generator_factory,
                       output_types=(tf.float32, tf.int8))

Working code for me:

Dataset.from_generator(_generator_factory,
                       output_types=(tf.float32, tf.int8),
                       output_shapes=(
                           tf.TensorShape([2, 224, 224, 3]),
                           tf.TensorShape([1,])))
I also found that the official dataset guide from TensorFlow indicates:
...
The output_shapes argument is not required but is highly recommended, as many TensorFlow operations do not support tensors with an unknown rank. If the length of a particular axis is unknown or variable, set it as None in the output_shapes.
...
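For example, if the first axis of the batches above were variable in length, the pipeline could declare it as None (a sketch building on the hypothetical _generator_factory from the answer above):

Dataset.from_generator(_generator_factory,
                       output_types=(tf.float32, tf.int8),
                       output_shapes=(
                           # None marks the variable-length axis; the rest stay fixed
                           tf.TensorShape([None, 224, 224, 3]),
                           tf.TensorShape([1,])))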
