Is there a variant of torch._C._nn.nll_loss that accepts a CPU input? I don't have enough GPU memory to run my function, so I'm trying to run everything on the CPU.
This is my specific error (the anaconda site-packages paths in the traceback are PyTorch library files):
Traceback (most recent call last):
File "plot_parametric_pytorch.py", line 395, in <module>
val_result = validate(val_loader, model, criterion, 0)
File "plot_parametric_pytorch.py", line 228, in validate
training=False, optimizer=None)
File "plot_parametric_pytorch.py", line 169, in forward
loss = criterion(output, target_var)
File "/home/klee/anaconda3/envs/sharpenv/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
result = self.forward(*input, **kwargs)
File "/home/klee/anaconda3/envs/sharpenv/lib/python3.7/site-packages/torch/nn/modules/loss.py", line 932, in forward
ignore_index=self.ignore_index, reduction=self.reduction)
File "/home/klee/anaconda3/envs/sharpenv/lib/python3.7/site-packages/torch/nn/functional.py", line 2317, in cross_entropy
return nll_loss(log_softmax(input, 1), target, weight, None, ignore_index, None, reduction)
File "/home/klee/anaconda3/envs/sharpenv/lib/python3.7/site-packages/torch/nn/functional.py", line 2115, in nll_loss
ret = torch._C._nn.nll_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index)
RuntimeError: Expected object of device type cuda but got device type cpu for argument #1 'self' in call to _thnn_nll_loss_forward
nll_loss works for both CPU and GPU, but the input and the target need to be on the same device. Yours are on different devices: the first argument (output) is on the CPU, while the second (target_var) is on the GPU.
You need to put target_var onto the CPU:
loss = criterion(output, target_var.cpu())
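Alternatively, since the goal is to run everything on the CPU anyway, you can move the model and both tensors to the CPU before computing the loss. A minimal sketch, assuming model, criterion and target_var are the objects from the script in the traceback (input_var is only an assumed name for the input batch):

device = torch.device("cpu")
model = model.to(device)                          # all parameters now live on the CPU
output = model(input_var.to(device))              # input_var: assumed name for the batch
loss = criterion(output, target_var.to(device))   # both arguments on the same device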
I'm training a neural network with Keras and trying to use the RandomCrop layer. My dataset has varying resolutions, but I've found that this is not the cause of the issue.
When I run model.fit(), after a short while I receive the error INVALID_ARGUMENT: required broadcastable shapes. I am able to get a summary of my model, so it's not a mismatch there.
My model works fine when I remove this layer, but I need it to reduce the size of my inputs (hence RandomCrop).
full traceback + tensorflow status
2022-03-23 13:27:28.772937: W tensorflow/core/framework/op_kernel.cc:1733] INVALID_ARGUMENT: required broadcastable shapes
Traceback (most recent call last):
File "c:\Users\samue\Desktop\rcrop\main.py", line 37, in <module>
conv_model.fit(
File "C:\Users\samue\AppData\Roaming\Python\Python310\site-packages\keras\utils\traceback_utils.py", line 67, in error_handler
raise e.with_traceback(filtered_tb) from None
File "C:\Users\samue\AppData\Roaming\Python\Python310\site-packages\tensorflow\python\eager\execute.py", line 54, in quick_execute
tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.InvalidArgumentError: Graph execution error:
Detected at node 'mean_squared_error/SquaredDifference' defined at (most recent call last):
File "C:\Program Files\Python310\lib\threading.py", line 966, in _bootstrap
self._bootstrap_inner()
File "C:\Program Files\Python310\lib\threading.py", line 1009, in _bootstrap_inner
self.run()
File "C:\Users\samue\AppData\Roaming\Python\Python310\site-packages\keras\engine\training.py", line 1000, in run_step
outputs = model.train_step(data)
File "C:\Users\samue\AppData\Roaming\Python\Python310\site-packages\keras\engine\training.py", line 860, in train_step
loss = self.compute_loss(x, y, y_pred, sample_weight)
File "C:\Users\samue\AppData\Roaming\Python\Python310\site-packages\keras\engine\training.py", line 918, in compute_loss
return self.compiled_loss(
File "C:\Users\samue\AppData\Roaming\Python\Python310\site-packages\keras\engine\compile_utils.py", line 201, in __call__
loss_value = loss_obj(y_t, y_p, sample_weight=sw)
File "C:\Users\samue\AppData\Roaming\Python\Python310\site-packages\keras\losses.py", line 141, in __call__
losses = call_fn(y_true, y_pred)
File "C:\Users\samue\AppData\Roaming\Python\Python310\site-packages\keras\losses.py", line 245, in call
return ag_fn(y_true, y_pred, **self._fn_kwargs)
File "C:\Users\samue\AppData\Roaming\Python\Python310\site-packages\keras\losses.py", line 1329, in mean_squared_error
return backend.mean(tf.math.squared_difference(y_pred, y_true), axis=-1)
Node: 'mean_squared_error/SquaredDifference'
2 root error(s) found.
(0) INVALID_ARGUMENT: required broadcastable shapes
[[{{node mean_squared_error/SquaredDifference}}]]
[[div_no_nan/ReadVariableOp/_84]]
(1) INVALID_ARGUMENT: required broadcastable shapes
[[{{node mean_squared_error/SquaredDifference}}]]
0 successful operations.
0 derived errors ignored. [Op:__inference_train_function_1308]
How to reproduce
I've created a minimal reproducible example with only two images, each 10×10 pixels, saved as .png in RGB color space.
Running main.py loads these images and tries to start training (failing with the error above).
When I exclude the RandomCrop layer, it works just fine.
folder structure
/main_folder
--main.py
--/data
--001.png
--002.png
main.py
import cv2, os
import keras
import tensorflow as tf
from keras import layers

strategy = tf.distribute.MirroredStrategy()

with strategy.scope():
    input_layer = keras.Input(shape=(None, None, 3))
    cropped = layers.RandomCrop(32, 32)(input_layer)
    out = layers.Conv2D(3, (3, 3), activation='sigmoid', padding='same')(cropped)

    conv_model = keras.Model(input_layer, out)
    conv_model.compile(
        optimizer='adam',
        loss=tf.keras.losses.MeanSquaredError()
    )
    conv_model.summary()

path = "data"
data = [cv2.imread(os.path.join(path, f)) / 255 for f in os.listdir(os.path.join(path))]

def data_generator():
    for i in range(len(data)):
        yield data[i], data[i]

dataset = tf.data.Dataset.from_generator(
    data_generator,
    output_types=(tf.float32, tf.float32),
    output_shapes=((None, None, 3), (None, None, 3))
).batch(1)

conv_model.fit(
    dataset,
    epochs=1,
    validation_data=dataset
)
So, I wanted to use this for an autoencoder (as in the example). That means the same crop would have to be applied to both the input and the comparison image, otherwise the cropped prediction no longer matches the uncropped target and the loss complains about broadcastable shapes. This doesn't sound like something RandomCrop can do, but since I'm already using a custom generator, I can implement it right there:
def data_generator():
    for i in range(len(data)):
        # Custom function to determine the patch size
        x, x1, y, y1 = randomly_choose(data[i].shape)
        yield data[i][x: x1, y: y1], data[i][x: x1, y: y1]
This gives me full power over the generation process, allowing me to include image flipping, rotating and other alterations.
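For completeness, a minimal sketch of what such a randomly_choose helper could look like (the 32×32 patch size is only an assumption, mirroring the RandomCrop layer above, and it assumes each image is at least that large):

import random

def randomly_choose(shape, patch=32):
    # Pick the top-left corner of a patch x patch window that fits
    # inside an image of shape (height, width, channels).
    h, w = shape[0], shape[1]
    x = random.randint(0, h - patch)
    y = random.randint(0, w - patch)
    return x, x + patch, y, y + patch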
I have a problem with this GitHub project: https://github.com/researchmm/TTSR
If I use it on one GPU only, everything runs smoothly. Once I turn on a second GPU and use torch.nn.DataParallel, it fails with "Missing key(s) in state_dict":
[2021-08-03 09:01:00,829] - [trainer.py file line:70] - INFO: Current epoch learning rate: 1.000000e-04
Traceback (most recent call last):
File "/rwthfs/rz/cluster/home/ps815691/git/TTSR/main.py", line 53, in <module>
t.train(current_epoch=epoch, is_init=False)
File "/rwthfs/rz/cluster/home/ps815691/git/TTSR/trainer.py", line 126, in train
sr_lv1, sr_lv2, sr_lv3 = self.model(sr=sr)
File "/home/ps815691/.local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/ps815691/.local/lib/python3.9/site-packages/torch/nn/parallel/data_parallel.py", line 167, in forward
outputs = self.parallel_apply(replicas, inputs, kwargs)
File "/home/ps815691/.local/lib/python3.9/site-packages/torch/nn/parallel/data_parallel.py", line 177, in parallel_apply
return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
File "/home/ps815691/.local/lib/python3.9/site-packages/torch/nn/parallel/parallel_apply.py", line 86, in parallel_apply
output.reraise()
File "/home/ps815691/.local/lib/python3.9/site-packages/torch/_utils.py", line 429, in reraise
raise self.exc_type(msg)
RuntimeError: Caught RuntimeError in replica 0 on device 0.
Original Traceback (most recent call last):
File "/home/ps815691/.local/lib/python3.9/site-packages/torch/nn/parallel/parallel_apply.py", line 61, in _worker
output = module(*input, **kwargs)
File "/home/ps815691/.local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/rwthfs/rz/cluster/home/ps815691/git/TTSR/model/TTSR.py", line 32, in forward
self.LTE_copy.load_state_dict(self.LTE.state_dict())#, strict=False)
File "/home/ps815691/.local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1223, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for LTE:
Missing key(s) in state_dict: "slice1.0.weight", "slice1.0.bias", "slice2.2.weight", "slice2.2.bias", "slice2.5.weight", "slice2.5.bias", "slice3.7.weight", "slice3.7.bias", "slice3.10.weight", "slice3.10.bias".
I printed the state_dict keys of "LTE" and "LTE_copy" on each GPU:
LTE GPU1 odict_keys([])
LTE GPU0 odict_keys(['sub_mean.weight', 'sub_mean.bias'])
LTE_Copy GPU1 odict_keys([])
LTE_Copy GPU0 odict_keys(['slice1.0.weight', 'slice1.0.bias', 'slice2.2.weight', 'slice2.2.bias', 'slice2.5.weight', 'slice2.5.bias', 'slice3.7.weight', 'slice3.7.bias', 'slice3.10.weight', 'slice3.10.bias', 'sub_mean.weight', 'sub_mean.bias'])
I do not understand why that happens. Let me give you a quick introduction to the code:
The code starts in main.py. First, the model is initialized from model/ttsr.py. This TTSR model is composed of several submodels, two of which are "LTE" and "LTE_copy". The model is then wrapped in nn.DataParallel, the trainer (trainer.py) is initialized with it, and t.train starts the training:
_model = TTSR.TTSR(args).to(device)
_model = nn.DataParallel(_model, list(range(args.num_gpu)))
t = Trainer(args, _logger, _dataloader, _model, _loss_all)
t.train(current_epoch=epoch, is_init=True)
In the train function, after a batch has been fed through the model, the model's output is fed back into the model to obtain some parts of the loss function (trainer.py line 97). The model then executes this code in ttsr.py:
### used in transferal perceptual loss
self.LTE_copy.load_state_dict(self.LTE.state_dict())
sr_lv1, sr_lv2, sr_lv3 = self.LTE_copy((sr + 1.) / 2.)
return sr_lv1, sr_lv2, sr_lv3
Does anyone have a clue why the error above is thrown? It does not appear if I use load_state_dict(..., strict=False), but doesn't that just ignore the underlying problem? There does not seem to be any LTE state_dict in GPU 1's memory, for example.
I am executing the head2head model presented in the GitHub repo here.
When I run the code using the following command:
./scripts/train/train_on_target.sh Obama head2headDataset
where train_on_target.sh contains:
target_name=$1
dataset_name=$2
python train.py --checkpoints_dir checkpoints/$dataset_name \
--target_name $target_name \
--name head2head_$target_name \
--dataroot datasets/$dataset_name/dataset \
--serial_batches
Then I am getting the following error:
Traceback (most recent call last):
File "train.py", line 108, in <module>
flow_ref, conf_ref, t_scales, n_frames_D)
File "/home/nitin/head2head/util/util.py", line 48, in get_skipped_flows
flow_ref_skipped[s], conf_ref_skipped[s] = flowNet(real_B[s][:,1:], real_B[s][:,:-1])
File "/home/nitin/anaconda3/envs/head2head/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/home/nitin/anaconda3/envs/head2head/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 150, in forward
return self.module(*inputs[0], **kwargs[0])
File "/home/nitin/anaconda3/envs/head2head/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/home/nitin/head2head/models/flownet.py", line 38, in forward
flow, conf = self.compute_flow_and_conf(input_A, input_B)
File "/home/nitin/head2head/models/flownet.py", line 55, in compute_flow_and_conf
flow1 = self.flowNet(data1)
File "/home/nitin/anaconda3/envs/head2head/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/home/nitin/head2head/models/flownet2_pytorch/models.py", line 156, in forward
flownetfusion_flow = self.flownetfusion(concat3)
File "/home/nitin/anaconda3/envs/head2head/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/home/nitin/head2head/models/flownet2_pytorch/networks/FlowNetFusion.py", line 62, in forward
concat0 = torch.cat((out_conv0,out_deconv0,flow1_up),1)
RuntimeError: CUDA out of memory. Tried to allocate 82.00 MiB (GPU 0; 5.80 GiB total capacity; 4.77 GiB already allocated; 73.56 MiB free; 4.88 GiB reserved in total by PyTorch)
I have checked the batch size in options/base_options.py; it is already set to 1. How can I solve this exception? My system has a 6 GB NVIDIA GTX 1660 Super GPU.
Data management:
You can try reducing the dataset used for training to check whether it is a hardware limitation.
Moreover, if it is an image dataset, you can reduce the dimensions of the images, for example by lowering their resolution (see the sketch after this answer).
Model parameter management:
Another approach is to reduce the number of parameters of your model. The first suggestion would be to shrink the dense layer sizes, and then tune the other network hyperparameters.
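To illustrate the image-size suggestion with a minimal sketch (head2head has its own preprocessing pipeline, so this is not a drop-in change to the repo, and the 128×128 target size is just an assumption): downscaling frames before they reach the network shrinks every intermediate activation and therefore the GPU memory required.

from torchvision import transforms

# Resize inputs to a smaller, fixed resolution before they are fed to the model.
preprocess = transforms.Compose([
    transforms.Resize((128, 128)),  # smaller spatial size -> smaller activations
    transforms.ToTensor(),
])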
While moving a model to eager execution, I encountered an error using GradientTape for backpropagation. As far as I can tell, all operations are taking place on the GPU, but during backprop I get the following error:
File "tf_registration_continuous.py", line 128, in single_registration_step
elastic_grads = tape.gradient(loss_value, elastic_variable_list)
File "/share/software/user/open/py-tensorflow/1.8.0_py27/lib/python2.7/site-packages/tensorflow/python/eager/backprop.py", line 767, in gradient
output_gradients=output_gradients)
File "/share/software/user/open/py-tensorflow/1.8.0_py27/lib/python2.7/site-packages/tensorflow/python/eager/imperative_grad.py", line 63, in imperative_grad
tape._tape, vspace, target, sources, output_gradients) # pylint: disable=protected-access
File "/share/software/user/open/py-tensorflow/1.8.0_py27/lib/python2.7/site-packages/tensorflow/python/eager/backprop.py", line 147, in grad_fn
op_inputs, op_outputs, orig_outputs)
File "/share/software/user/open/py-tensorflow/1.8.0_py27/lib/python2.7/site-packages/tensorflow/python/eager/backprop.py", line 115, in _magic_gradient_function
return grad_fn(mock_op, *out_grads)
File "/share/software/user/open/py-tensorflow/1.8.0_py27/lib/python2.7/site-packages/tensorflow/python/ops/array_grad.py", line 427, in _GatherV2Grad
params_shape = math_ops.to_int32(params_shape)
File "/share/software/user/open/py-tensorflow/1.8.0_py27/lib/python2.7/site-packages/tensorflow/python/ops/math_ops.py", line 875, in to_int32
return cast(x, dtypes.int32, name=name)
File "/share/software/user/open/py-tensorflow/1.8.0_py27/lib/python2.7/site-packages/tensorflow/python/ops/math_ops.py", line 787, in cast
x = gen_math_ops.cast(x, base_type, name=name)
File "/share/software/user/open/py-tensorflow/1.8.0_py27/lib/python2.7/site-packages/tensorflow/python/ops/gen_math_ops.py", line 1548, in cast
_six.raise_from(_core._status_to_exception(e.code, message), None)
File "/share/software/user/open/py-scipystack/1.0_py27/lib/python2.7/site-packages/six.py", line 718, in raise_from
raise value
tensorflow.python.framework.errors_impl.InvalidArgumentError: Tensors on conflicting devices: cannot compute Cast as input #0 was expected to be on /job:localhost/replica:0/task:0/device:GPU:0 but is actually on /job:localhost/replica:0/task:0/device:CPU:0 (operation running on /job:localhost/replica:0/task:0/device:GPU:0) Tensors can be copied explicitly using .gpu() or .cpu() methods, or transparently copied by using tf.enable_eager_execution(device_policy=tfe.DEVICE_PLACEMENT_SILENT). Copying tensors between devices may slow down your model [Op:Cast] name: ToInt32/
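For reference, the silent device-placement policy that the error message itself suggests can be enabled when eager execution is turned on (TF 1.8-era API, as in the traceback; this is only the workaround the message describes, not necessarily a fix for the underlying placement):

import tensorflow as tf
import tensorflow.contrib.eager as tfe

# Let TensorFlow copy tensors between CPU and GPU silently instead of raising
# an error when an op receives inputs on different devices.
tf.enable_eager_execution(device_policy=tfe.DEVICE_PLACEMENT_SILENT)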
I'm using a CNN + LSTM + CTC network (based on https://arxiv.org/pdf/1507.05717.pdf) to do Chinese scene text recognition. With a large number of classes (3500+), the network is very hard to train. I heard that using Group LSTM (https://arxiv.org/abs/1703.10722, O. Kuchaiev and B. Ginsburg, "Factorization Tricks for LSTM Networks", ICLR 2017 workshop) can reduce the number of parameters and accelerate training, so I've tried to use it in my code.
I use a two-layer bidirectional LSTM. This is the original code, which uses tf.contrib.rnn.LSTMCell:
rnn_outputs, _, _ = tf.contrib.rnn.stack_bidirectional_dynamic_rnn(
    [tf.contrib.rnn.LSTMCell(num_units=self.num_hidden, state_is_tuple=True) for _ in range(self.num_layers)],
    [tf.contrib.rnn.LSTMCell(num_units=self.num_hidden, state_is_tuple=True) for _ in range(self.num_layers)],
    self.rnn_inputs, dtype=tf.float32, sequence_length=self.rnn_seq_len, scope='BDDLSTM')
The training is very slow: after 100 hours, the prediction accuracy on the test set is still 39%.
Now I want to use tf.contrib.rnn.GLSTMCell. When I replace the LSTMCell with GLSTMCell like this:
rnn_outputs, _, _ = tf.contrib.rnn.stack_bidirectional_dynamic_rnn(
    [tf.contrib.rnn.GLSTMCell(num_units=self.num_hidden, num_proj=self.num_proj, number_of_groups=4) for _ in range(self.num_layers)],
    [tf.contrib.rnn.GLSTMCell(num_units=self.num_hidden, num_proj=self.num_proj, number_of_groups=4) for _ in range(self.num_layers)],
    self.rnn_inputs, dtype=tf.float32, sequence_length=self.rnn_seq_len, scope='BDDLSTM')
I get the following error
/home/frisasz/miniconda2/envs/dl/bin/python "/media/frisasz/DATA/FSZ_Work/deep learning/IDOCR_/work/train.py"
Traceback (most recent call last):
File "/media/frisasz/DATA/FSZ_Work/deep learning/IDOCR_/work/train.py", line 171, in <module>
train(train_dir='/media/frisasz/Windows/40T/', val_dir='../../0000/40V/')
File "/media/frisasz/DATA/FSZ_Work/deep learning/IDOCR_/work/train.py", line 41, in train
FLAGS.momentum)
File "/media/frisasz/DATA/FSZ_Work/deep learning/IDOCR_/work/model.py", line 61, in __init__
self.logits = self.rnn_net()
File "/media/frisasz/DATA/FSZ_Work/deep learning/IDOCR_/work/model.py", line 278, in rnn_net
self.rnn_inputs, dtype=tf.float32, sequence_length=self.rnn_seq_len, scope='BDDLSTM')
File "/home/frisasz/miniconda2/envs/dl/lib/python2.7/site-packages/tensorflow/contrib/rnn/python/ops/rnn.py", line 220, in stack_bidirectional_dynamic_rnn
dtype=dtype)
File "/home/frisasz/miniconda2/envs/dl/lib/python2.7/site-packages/tensorflow/python/ops/rnn.py", line 375, in bidirectional_dynamic_rnn
time_major=time_major, scope=fw_scope)
File "/home/frisasz/miniconda2/envs/dl/lib/python2.7/site-packages/tensorflow/python/ops/rnn.py", line 574, in dynamic_rnn
dtype=dtype)
File "/home/frisasz/miniconda2/envs/dl/lib/python2.7/site-packages/tensorflow/python/ops/rnn.py", line 737, in _dynamic_rnn_loop
swap_memory=swap_memory)
File "/home/frisasz/miniconda2/envs/dl/lib/python2.7/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2770, in while_loop
result = context.BuildLoop(cond, body, loop_vars, shape_invariants)
File "/home/frisasz/miniconda2/envs/dl/lib/python2.7/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2599, in BuildLoop
pred, body, original_loop_vars, loop_vars, shape_invariants)
File "/home/frisasz/miniconda2/envs/dl/lib/python2.7/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2549, in _BuildLoop
body_result = body(*packed_vars_for_body)
File "/home/frisasz/miniconda2/envs/dl/lib/python2.7/site-packages/tensorflow/python/ops/rnn.py", line 720, in _time_step
skip_conditionals=True)
File "/home/frisasz/miniconda2/envs/dl/lib/python2.7/site-packages/tensorflow/python/ops/rnn.py", line 206, in _rnn_step
new_output, new_state = call_cell()
File "/home/frisasz/miniconda2/envs/dl/lib/python2.7/site-packages/tensorflow/python/ops/rnn.py", line 708, in <lambda>
call_cell = lambda: cell(input_t, state)
File "/home/frisasz/miniconda2/envs/dl/lib/python2.7/site-packages/tensorflow/python/ops/rnn_cell_impl.py", line 180, in __call__
return super(RNNCell, self).__call__(inputs, state)
File "/home/frisasz/miniconda2/envs/dl/lib/python2.7/site-packages/tensorflow/python/layers/base.py", line 441, in __call__
outputs = self.call(inputs, *args, **kwargs)
File "/home/frisasz/miniconda2/envs/dl/lib/python2.7/site-packages/tensorflow/contrib/rnn/python/ops/rnn_cell.py", line 2054, in call
R_k = _linear(x_g_id, 4 * self._group_shape[1], bias=False)
File "/home/frisasz/miniconda2/envs/dl/lib/python2.7/site-packages/tensorflow/python/ops/rnn_cell_impl.py", line 1005, in _linear
"but saw %s" % (shape, shape[1]))
ValueError: linear expects shape[1] to be provided for shape (?, ?), but saw ?
Process finished with exit code 1
I'm not sure whether GLSTMCell can simply replace LSTMCell in tf.contrib.rnn.stack_bidirectional_dynamic_rnn() (or other functions that help build the RNN), and I didn't find any examples of GLSTMCell in use. Does anybody know the right way to build a bidirectional RNN with GLSTMCell?
I got the exact same error trying to build a bidirectional GLSTM using bidirectional_dynamic_rnn.
In my case, the problem came from the fact that GLSTM can only be used when it is defined in a static way: when the graph is built, you can't have undefined shape parameters (such as batch_size, for instance).
So, try to define in the graph all the shapes that will end up in the GLSTM cell at some point, and it should work fine, as in the sketch below.
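For illustration, a minimal sketch of that idea with the TF 1.x contrib API from the question (BATCH_SIZE, MAX_TIME, FEATURE_DIM and the other constants are assumed values you would fix for your own pipeline): every dimension of the RNN input is given statically, so the GLSTM cells can build their per-group weight matrices.

import tensorflow as tf

# Assumed, fully static sizes -- no None anywhere in the input shape.
BATCH_SIZE, MAX_TIME, FEATURE_DIM = 32, 100, 512
NUM_HIDDEN, NUM_PROJ, NUM_LAYERS = 256, 256, 2

rnn_inputs = tf.placeholder(tf.float32, shape=[BATCH_SIZE, MAX_TIME, FEATURE_DIM])
seq_len = tf.fill([BATCH_SIZE], MAX_TIME)

rnn_outputs, _, _ = tf.contrib.rnn.stack_bidirectional_dynamic_rnn(
    [tf.contrib.rnn.GLSTMCell(num_units=NUM_HIDDEN, num_proj=NUM_PROJ,
                              number_of_groups=4) for _ in range(NUM_LAYERS)],
    [tf.contrib.rnn.GLSTMCell(num_units=NUM_HIDDEN, num_proj=NUM_PROJ,
                              number_of_groups=4) for _ in range(NUM_LAYERS)],
    rnn_inputs, dtype=tf.float32, sequence_length=seq_len, scope='BDDLSTM')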