I'm trying to understand the seq2seq models defined in seq2seq.py in TensorFlow. I use bits of code copied from the translate.py example that comes with TensorFlow. I keep getting the same error and really do not understand where it comes from.
A minimal code example to reproduce the error:
import tensorflow as tf
from tensorflow.models.rnn import rnn_cell
from tensorflow.models.rnn import seq2seq
encoder_inputs = []
decoder_inputs = []
for i in xrange(350):
    encoder_inputs.append(tf.placeholder(tf.int32, shape=[None],
                                         name="encoder{0}".format(i)))
for i in xrange(45):
    decoder_inputs.append(tf.placeholder(tf.int32, shape=[None],
                                         name="decoder{0}".format(i)))
model = seq2seq.basic_rnn_seq2seq(encoder_inputs,
                                  decoder_inputs, rnn_cell.BasicLSTMCell(512))
The error I get when evaluating the last line (I evaluated it interactively in the python interpreter):
>>> Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/tmp/py1053173el", line 12, in <module>
File "/usr/local/lib/python2.7/dist-packages/tensorflow/models/rnn/seq2seq.py", line 82, in basic_rnn_seq2seq
_, enc_states = rnn.rnn(cell, encoder_inputs, dtype=dtype)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/models/rnn/rnn.py", line 85, in rnn
output_state = cell(input_, state)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/models/rnn/rnn_cell.py", line 161, in __call__
concat = linear.linear([inputs, h], 4 * self._num_units, True)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/models/rnn/linear.py", line 32, in linear
raise ValueError("Linear is expecting 2D arguments: %s" % str(shapes))
ValueError: Linear is expecting 2D arguments: [[None], [None, 512]]
I suspect the error comes from my side :)
On a side note: the documentation and the tutorials are really great, but the example code for the sequence-to-sequence model (the English-to-French translation example) is quite dense. You also have to jump a lot between files to understand what's going on. I, at least, got lost several times in the code.
A minimal example (perhaps on some toy data) of constructing and training a basic seq2seq model would really be helpful here. Does somebody know if this already exists somewhere?
EDIT
I have fixed the code above according to @Ishamael's suggestions (meaning, no errors are returned; see below), but there are still some things that are not clear in this fixed version. My input is a sequence of real-valued vectors of length 2, and my output is a sequence of binary vectors of length 22. Should my tf.placeholder code not look like the following? (EDIT: yes)
tf.placeholder(tf.float32, shape=[None, 2], name="encoder{0}".format(i))
tf.placeholder(tf.float32, shape=[None, 22], name="decoder{0}".format(i))
I also had to change tf.int32 to tf.float32 above. Since my output is binary, should this not be tf.int32 for the tf.placeholder of my decoder? But TensorFlow complains again if I do this. I'm not sure what the reasoning behind this is.
The size of my hidden layer is 512 here.
The complete fixed code:
import tensorflow as tf
from tensorflow.models.rnn import rnn_cell
from tensorflow.models.rnn import seq2seq
encoder_inputs = []
decoder_inputs = []
for i in xrange(350):
    encoder_inputs.append(tf.placeholder(tf.float32, shape=[None, 512],
                                         name="encoder{0}".format(i)))
for i in xrange(45):
    decoder_inputs.append(tf.placeholder(tf.float32, shape=[None, 512],
                                         name="decoder{0}".format(i)))
model = seq2seq.basic_rnn_seq2seq(encoder_inputs,
                                  decoder_inputs, rnn_cell.BasicLSTMCell(512))
Most of the models (seq2seq is not an exception) expect their input to be in batches, so if the shape of your logical input is [n], then the shape of a tensor you will be using as an input to your model should be [batch_size x n]. In practice, the first dimension of the shape is usually left as None and inferred to be the batch size at runtime.
Since the logical input to seq2seq is a vector of numbers, the actual tensor shape should be [None, input_sequence_length]. So the fixed code would look along the lines of:
input_sequence_length = 2  # the length of one vector in your input sequence
for i in xrange(350):
    encoder_inputs.append(tf.placeholder(tf.int32, shape=[None, input_sequence_length],
                                         name="encoder{0}".format(i)))
(and then the same for the decoder)
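For completeness, a sketch of what the matching decoder placeholders could look like, assuming an output vector length of 22 as mentioned in the question's edit (the name output_sequence_length is mine, and the dtype simply mirrors the encoder code above; for real-valued data you may need tf.float32 instead):

output_sequence_length = 22  # assumed: the length of one vector in your output sequence
for i in xrange(45):
    decoder_inputs.append(tf.placeholder(tf.int32, shape=[None, output_sequence_length],
                                         name="decoder{0}".format(i)))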
There is a self-test method in the translate module that shows its minimal usage [here].
I just ran the self-test method using:
python translate.py --self_test 1
I am trying to build a CNN that receives multiple inputs and I am trying the following:
input = keras.Input()
classifier = keras.Model(inputs=input,output=classifier)
When I run the code, I receive the following error for line 6 though:
TypeError: ('Keyword argument not understood:', 'input').
A hint would be much appreciated, thank you!
Some parameters of your code are not specified. I have copied your example with some numbers that you can change back.
import keras
input_dim_1 = 10
input1 = keras.layers.Input(shape=(input_dim_1,1))
cnn_classifier_1 = keras.layers.Conv1D(64, 5, activation='sigmoid')(input1)
cnn_classifier_1 = keras.layers.Dropout(0.5)(cnn_classifier_1)
cnn_classifier_1 = keras.layers.Conv1D(48, 5, activation='sigmoid')(cnn_classifier_1)
cnn_classifier_1 = keras.models.Model(inputs=input1,outputs=cnn_classifier_1)
Some things to note
The imports of your layers were not right. You need to import the layers/models you want from the right places. You can check my code against yours to see this.
With the functional API of keras you do not need to specify the input shape as you have done in the first Conv1D layer. This is handled automatically.
You need to correctly specify the keywords in Model, specifically inputs and outputs. Different versions of Keras use input/output or inputs/outputs as keywords for the call of the class Model.
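Since your question mentions receiving multiple inputs, here is a minimal, hypothetical sketch (layer sizes and names are made up, not taken from your code) of how two inputs can be merged and passed to Model as a list:

import keras

# hypothetical shapes for the two inputs
input1 = keras.layers.Input(shape=(10, 1))
input2 = keras.layers.Input(shape=(20, 1))

# one small convolutional branch per input
branch1 = keras.layers.Conv1D(32, 3, activation='relu')(input1)
branch1 = keras.layers.GlobalMaxPooling1D()(branch1)
branch2 = keras.layers.Conv1D(32, 3, activation='relu')(input2)
branch2 = keras.layers.GlobalMaxPooling1D()(branch2)

# merge the branches and add a classification head
merged = keras.layers.concatenate([branch1, branch2])
output = keras.layers.Dense(1, activation='sigmoid')(merged)

# note the plural keywords and the list of inputs
model = keras.models.Model(inputs=[input1, input2], outputs=output)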
Hey, it's simple, use the following code:
classifier = keras.Model(input, classifier)
instead of calling
classifier = keras.Model(inputs = input, output = classifier)
The issue seems to come from the latest versions of the Keras implementation.
See the possible solution at the end of the post.
I am trying to fully quantize the keras-vggface model from rcmalli to run on an NPU. The model is a Keras model (not tf.keras).
When using TF 1.15 for quantization with:
print(tf.version.VERSION)
num_calibration_steps = 5
converter = tf.lite.TFLiteConverter.from_keras_model_file('path_to_model.h5')
#converter.post_training_quantize = True  # This only makes the weights int8 but does not initialize full model quantization

def representative_dataset_gen():
    for _ in range(num_calibration_steps):
        pfad = 'path_to_image(s)'
        img = cv2.imread(pfad)
        # Get sample input data as a numpy array in a method of your choosing.
        yield [img]

converter.representative_dataset = representative_dataset_gen
tflite_quant_model = converter.convert()
open("quantized_model", "wb").write(tflite_quant_model)
The model is converted but as I need full int8 quantization, I add:
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8 # or tf.uint8
converter.inference_output_type = tf.int8 # or tf.uint8
This error message appears:
ValueError: Cannot set tensor: Got value of type UINT8 but expected type FLOAT32 for input 0, name: input_1
Clearly, the input of the model still requires float32.
Questions:
Do I have to adapt the quantization method so that the input dtype is changed? Or
Do I have to change the input layer of the model to dtype int8 beforehand?
Or is that actually reporting that the model is not actually quantized?
If 1 or 2 is the answer, would you also have a best practice tip for me?
Addition:
Using:
import os
import keras

h5_path = 'my_model.h5'
model = keras.models.load_model(h5_path)
model.save(os.getcwd() + '/modelTF2')
to save the h5 as pb with TF 2.2, and then using converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir), as TF 2.x tflite takes floats and converts them to uint8s internally. I thought that could be a solution. Unfortunately, this error message appears:
tf.lite.TFLiteConverter.from_keras_model giving 'str' object has no attribute 'call'
Apparently TF2.x cannot handle pure keras models.
Using tf.compat.v1.lite.TFLiteConverter.from_keras_model_file() to solve this error just repeats the error from above, as we are back again at the "TF 1.15" level.
Addition 2
Another solution is to transfer the keras model to tf.keras manually. I will look into that if there is no other solution.
Regarding the comment of Meghna Natraj
To recreate the model (using TF 1.13.x) just:
pip install git+https://github.com/rcmalli/keras-vggface.git
and
from keras_vggface.vggface import VGGFace
pretrained_model = VGGFace(model='resnet50', include_top=False, input_shape=(224, 224, 3), pooling='avg') # pooling: None, avg or max
pretrained_model.summary()
pretrained_model.save("my_model.h5") #using h5 extension
The input layer is connected. Too bad, that looked like a good/easy fix.
Possible Solution
It seems to work using TF 1.15.3 (I used 1.15.0 beforehand). I will check if I did something else differently by accident.
A possible reason why this fails is that the model has input tensors that are not connected to the output tensor, i.e., they are probably unused.
Here is a colab notebook where I've reproduced this error. Modify the io_type at the beginning of the notebook to tf.uint8 to see an error similar to the one you got.
SOLUTION
You need to manually inspect the model to see if there are any inputs that are dangling/lost/not connected to the output, and remove them.
Post a link to the model and I can try to debug it as well.
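A rough sketch of what that manual inspection could look like, assuming the unused input can simply be dropped by rebuilding the model around the connected ones (the file paths and the index of the connected input are placeholders, not taken from your model):

import keras

model = keras.models.load_model('my_model.h5')

# List every declared input; anything that never feeds the output graph is suspect.
for t in model.inputs:
    print(t.name, t.shape)
print([t.name for t in model.outputs])

# If one input turns out to be unused, rebuild the model keeping only the connected input(s).
connected_inputs = [model.inputs[0]]  # assumption: only the first input is actually used
clean_model = keras.models.Model(inputs=connected_inputs, outputs=model.outputs)
clean_model.save('my_model_clean.h5')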
I am trying to export a trained model along with weights for inference in C++ using LibTorch. However, the output tensor results do not match.
The shape of the output tensor is the same.
model = FCN()
state_dict = torch.load('/content/gdrive/My Drive/model/trained_model.pth')
model.load_state_dict(state_dict)
example = torch.randn(1, 3, 768, 1024)
traced_script_module = torch.jit.trace(model, example)
traced_script_module.save('/content/gdrive/My Drive/model/mymodel.pt')
However, some warnings are generated which I think may be causing the incorrect results.
/usr/local/lib/python3.6/dist-packages/ipykernel_launcher.py:137:
TracerWarning: Converting a tensor to a Python index might cause the
trace to be incorrect. We can't record the data flow of Python values,
so this value will be treated as a constant in the future. This means
that the trace might not generalize to other inputs!
/usr/local/lib/python3.6/dist-packages/torch/tensor.py:435:
RuntimeWarning: Iterating over a tensor might cause the trace to be
incorrect. Passing a tensor of different shape won't change the number
of iterations executed (and might lead to errors or silently give
incorrect results).
Following is the LibTorch code to generate the output tensor
at::Tensor predict(std::shared_ptr<torch::jit::script::Module> model, at::Tensor &image_tensor) {
std::vector<torch::jit::IValue> inputs;
inputs.push_back(image_tensor);
at::Tensor result = model->forward(inputs).toTensor();
return result;
}
Has anyone tried using a trained PyTorch model in LibTorch?
Just ran into the same issue, and found a solution:
add
model.eval()
before
traced_script_module = torch.jit.trace(model, example)
and the model gives the same result in C++ as in Python.
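Putting it together, a minimal sketch of the export step with the eval call added (the FCN class and paths are taken from the question; the no_grad context is an optional extra):

import torch

model = FCN()
state_dict = torch.load('/content/gdrive/My Drive/model/trained_model.pth')
model.load_state_dict(state_dict)
model.eval()  # switch dropout/batchnorm to inference behaviour before tracing

example = torch.randn(1, 3, 768, 1024)
with torch.no_grad():
    traced_script_module = torch.jit.trace(model, example)
traced_script_module.save('/content/gdrive/My Drive/model/mymodel.pt')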
I want to implement an RNN using TensorFlow 1.13 on GPU. Following the official recommendation, I write the following code to get a stack of RNN cells:
import tensorflow.keras as tk

lstm = [tk.layers.CuDNNLSTM(128) for _ in range(2)]
cells = tk.layers.StackedRNNCells(lstm)
However, I receive an error message:
ValueError: ('All cells must have a state_size attribute. received cells:', [< tensorflow.python.keras.layers.cudnn_recurrent.CuDNNLSTM object at 0x13aa1c940>])
How can I correct it?
This may be a TensorFlow bug and I would suggest creating an issue on GitHub. However, if you want to bypass the bug, you can use:
import tensorflow as tf
import tensorflow.keras as tk
lstm = [tk.layers.CuDNNLSTM(128) for _ in range(2)]
stacked_cells = tf.nn.rnn_cell.MultiRNNCell(lstm)
This will work but it will give a deprecation warning that you can suppress.
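One possible way to silence that deprecation warning, assuming you are happy to hide all TF 1.x warning-level log output, is to raise the logging verbosity threshold:

import tensorflow as tf

# TF 1.x deprecation notices are emitted through tf.logging;
# raising the threshold to ERROR hides them.
tf.logging.set_verbosity(tf.logging.ERROR)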
Thanks @qlzh727. Here, I quote the response:
StackedRNNCells (or MultiRNNCell) only works with cells, not layers. The difference between a cell and a layer in an RNN is that a cell only processes one time step within the whole sequence, whereas the layer processes the whole sequence. You can treat an RNN layer as:
for t in whole_time_steps:
    output_t, state_t = cell(input_t, state_t-1)
If you want to stack 2 LSTM layers together with cuDNN in 1.x, you can do:
l1 = tf.keras.layers.CuDNNLSTM(128, return_sequences=True)
l2 = tf.keras.layers.CuDNNLSTM(128)
l1_output = l1(input)
l2_output = l2(l1_output)
In TF 2.x, we unify the cuDNN and normal implementations; you can just change the example above to tf.keras.layers.LSTM(128, return_sequences=True), which will use the cuDNN implementation if available.
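For reference, a short sketch of what the stacked version could look like in TF 2.x (the input shape is an assumption, not from the question):

import tensorflow as tf  # TF 2.x

inputs = tf.keras.Input(shape=(None, 64))  # assumed (time_steps, features)
x = tf.keras.layers.LSTM(128, return_sequences=True)(inputs)  # cuDNN kernel is used when eligible
outputs = tf.keras.layers.LSTM(128)(x)
model = tf.keras.Model(inputs, outputs)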
I am trying to implement, in Keras, a simplified version of the encoder-decoder model based on the one in the image below (source: https://arxiv.org/pdf/1805.07685.pdf). Note there is only one encoder and one decoder in this model; they have been unrolled in the image for clarity.
I'm only focusing on the bottom branch and not including attention and a style label s_i for now. I've been following this Keras tutorial on seq2seq models for guidance. Here is my script where I define this model.
Training runs successfully but I get the errors below during the inference step.
Traceback (most recent call last):
File "/run_model.py", line 110, in <module>
decoded_sentence = benchmark_obj.inference(test_encoded, id2word, max_sequence_length)
File "/benchmark_model.py", line 173, in inference
encoder_inference = Model(self.encoder_inputs, self.encoder_states)
File "/python3.6/site-packages/keras/legacy/interfaces.py", line 91, in wrapper
return func(*args, **kwargs)
File "/python3.6/site-packages/keras/engine/network.py", line 91, in __init__
self._init_graph_network(*args, **kwargs)
File "/python3.6/site-packages/keras/engine/network.py", line 235, in _init_graph_network
self.inputs, self.outputs)
File "/python3.6/site-packages/keras/engine/network.py", line 1489, in _map_graph_network
str(layers_with_complete_input))
ValueError: Graph disconnected: cannot obtain value for tensor Tensor("decoder_inputs_forward:0", shape=(?, 1, 13105), dtype=float32) at layer "decoder_inputs_forward". The following previous layers were accessed without issue: ['encoder_inputs']
During inference I create a new encoder and decoder, as per the tutorial, with the same weights as the ones trained. However, I don't include the backward transfer part as this was just for training the model. I am guessing this is the cause of the problem, because during training the encoder and decoder are almost circularly linked but during inference I only want to focus on the forward transfer.
I'm not sure, however, how I should go about fixing this issue. I thought maybe I should create two independent encoder/decoder pairs for the forward and backward transfer parts and have them share weights, but I'm not sure if this is sensible. I'm a beginner with Keras, so explanations without assumptions would be hugely appreciated. Thanks.
Some further context which might help:
I am attempting to transfer the style of text. I have two non-parallel corpora for styles A and B, and hence this is an unsupervised problem. This is why the decoder during forward transfer uses its output at timestep t-1 as its input at timestep t. However, during backward transfer, the decoder aims to reconstruct the original sentence and so uses the ground truth as input. Thus two decoder input layers are created.
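For context, a sketch of the forward-transfer inference loop where the decoder's prediction at t-1 becomes its input at t, following the referenced Keras seq2seq tutorial (all names here are illustrative, not from my script):

import numpy as np

# encoder_inference / decoder_inference are the separate inference models
# rebuilt from the trained layers, as in the Keras seq2seq tutorial.
states = encoder_inference.predict(input_sequence)

target = np.zeros((1, 1, num_tokens))
target[0, 0, start_token_index] = 1.0  # begin with the start-of-sequence token

decoded = []
for _ in range(max_sequence_length):
    output, h, c = decoder_inference.predict([target] + states)
    token_index = np.argmax(output[0, -1, :])
    decoded.append(token_index)
    # feed the prediction back in as the next decoder input
    target = np.zeros((1, 1, num_tokens))
    target[0, 0, token_index] = 1.0
    states = [h, c]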
Update:
I have solved this specific error. It turned out that I was overwriting the output from the encoder (self.encoder_states) during forward transfer with the output from the encoder during backward transfer. The backward transfer encoder takes input from the decoder which was not passed. Instead I was calling Model(self.encoder_inputs, self.encoder_states).
Following on from this, I am wondering if the approach I have taken is the simplest for implementing this model. Is there a better alternative?
When you are using the Keras functional API to define a model, you need to connect the layers, that is:
from keras.layers import Input, Dense
from keras.models import Model

input_tensor = Input((784,))
x = Dense(16, activation="relu")(input_tensor)       # input_tensor -> x
output_tensor = Dense(10, activation="softmax")(x)   # input_tensor -> x -> output_tensor
model = Model(inputs=input_tensor, outputs=output_tensor)
In your case, you haven't connected the layers (nodes) in the graph.
Lines 61-62:
self.encoder_inputs = Input(shape=(max_timestep, self.input_dims), name="encoder_inputs")
self.encoder = LSTM(self.latent_dims, return_state=True, dropout=dropout, name="encoder")
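A rough sketch of how those two lines could be connected so that self.encoder_states actually depends on self.encoder_inputs (the attribute names come from the snippet above; the rest is illustrative, not from the original script):

# Call the LSTM layer on the input tensor to create the graph connection.
encoder_outputs, state_h, state_c = self.encoder(self.encoder_inputs)
self.encoder_states = [state_h, state_c]

# Model(self.encoder_inputs, self.encoder_states) now has a connected graph.
encoder_inference = Model(self.encoder_inputs, self.encoder_states)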