I'd like to use pre-trained sentence embeddings in my tensorflow graph-execution model. The embeddings are available dynamically from a function call, which takes in an array of sentences and outputs an array of sentence embeddings. This function uses a pre-trained pytorch model, so it has to remain separate from the tensorflow model I'm training:
def get_pretrained_embeddings(sentences):
    return pretrained_pytorch_model.encode(sentences)
My tensorflow model looks like this:
class SentenceModel(tf.keras.Model):
    def __init__(self):
        super().__init__()

    def call(self, sentences):
        embedding_layer = tf.keras.layers.Embedding(
            10_000,
            256,
            embeddings_initializer=tf.keras.initializers.Constant(get_pretrained_embeddings(sentences)),
            trainable=False,
        )
        sentence_text_embedding = tf.keras.Sequential([
            embedding_layer,
            tf.keras.layers.GlobalAveragePooling1D(),
        ])
        return sentence_text_embedding,
But when I try to train this model using
cached_train = train.shuffle(100_000).batch(1024)
model.fit(cached_train)
my embeddings_initializer call gets the error:
OperatorNotAllowedInGraphError: iterating over `tf.Tensor` is not allowed: AutoGraph did convert this function. This might indicate you are trying to use an unsupported feature.
I assume this is because tensorflow is trying to compile the graph using symbolic data. How can I get my external function, which relies on the current training data batch, to work with tensorflow's graph training?
Tensorflow compiles models to an execution graph before performing the actual training process. The obvious side effect that clues us into this is that if we have a regular Python print() statement in e.g. our call() method, it will only get executed once, as Tensorflow runs through your code to construct the execution graph, which it will later convert to native code.
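As a quick illustration (a minimal sketch with a made-up toy model, not the asker's code), a Python print() fires only while the graph is being traced, whereas tf.print() becomes part of the graph and runs on every batch:

import tensorflow as tf

class Demo(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self.dense = tf.keras.layers.Dense(1)

    def call(self, x):
        print("tracing")        # Python print: runs only while the graph is traced
        tf.print("executing")   # tf.print: becomes a graph op and runs on every batch
        return self.dense(x)

model = Demo()
model.compile(optimizer="sgd", loss="mse")
model.fit(tf.ones((8, 4)), tf.ones((8, 1)), epochs=2, verbose=0)
# "tracing" appears only during tracing; "executing" appears once per batch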
The other side effect of this is that you cannot use anything that isn't a tensor of some description when training. By 'tensor' here, all of the following can be considered a tensor:
The input value of your call() method (obviously)
A tf.keras.Sequential
A tf.keras.Model/tf.keras.layers.Layer subclass
A SparseTensor
A tf.constant()
...and probably more I haven't listed here.
To this end, you would need to convert your PyTorch model to a Tensorflow one to be able to reference it in a subclass of tf.keras.Model/tf.keras.layers.Layer.
As a side note, if you do find you need to iterate a tensor, you should just be able to iterate it on the 1st dimension (i.e. the batch size) like so:
for part in some_tensor:
    pass
If you want to iterate on some other dimension, I recommend doing a tf.unstack(some_tensor, axis=AXIS_NUMBER_HERE) first and iterating over the result.
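For example (a small sketch with a made-up tensor), iterating over the time dimension of a (batch, time, features) tensor would look like this:

import tensorflow as tf

batch = tf.reshape(tf.range(24, dtype=tf.float32), (2, 3, 4))  # (batch, time, features)

# Unstack along axis 1 to get one tensor per time step, then iterate
for step in tf.unstack(batch, axis=1):
    print(step.shape)  # (2, 4) -- one time step across the whole batch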
Related
I am using tensorflow 2.3.0.
I implemented my own layer by inheriting from tf.keras.layers.Layer and put my own computation inside its call function, like the demo code below.
Now I want to print tensor values inside the call function. I tried tf.print, but nothing was shown. How can I print the actual tensor values during training? Thanks in advance.
class MyLayer(tf.keras.layers.Layer):
    def call(self, inputs):
        x = func1(inputs)
        tf.print(x)  # not working!
        x = func2(x)
        tf.print(x)  # not working!
I used the Estimator API to train the model.
I'm having some difficulties writing an extract_weights and an initialize function for a tf.Module model that I later convert to TFLite.
The idea is that I want to use this model for on-device training.
The project architecture is as follows:
- First, I create a transfer learning model that will later be used for training
- Then I upload this model to my Android application, where I train it using the tflite.Interpreter
- The model will be trained in a federated fashion using a Flower server
The problem that I have at the moment is that Flower needs to collect the weights from each device as ByteBuffers after each training loop, but I don't seem to understand how I could save them in my Android application.
These are the methods that I wrote:
@tf.function
def extract_weights(self):
    """
    Extracts the trainable weights of the head model as a list of numpy arrays.

    Parameters:

    Returns:
        Map of extracted weights and biases.
    """
    tmp_dict = {}
    tensor_names = [weight.name for weight in self.head_model.weights]
    tensors_to_save = [weight.read_value() for weight in self.head_model.weights]
    for index, layer in enumerate(tensors_to_save):
        tmp_dict[tensor_names[index]] = layer
    return tmp_dict
@tf.function(input_signature=[SIGNATURE_DICT])
def initialize_weights(self, weights):
    """
    Initializes the weights of the head model.

    Parameters:
        weights : Tensors used for initialization.

    Returns:
        None
    """
    tensor_names = [weight.name for weight in self.head_model.weights]
    for i, tensor in enumerate(self.head_model.weights):
        tensor.assign(weights[tensor_names[i]])
Note that when I instantiate a TransferLearningModel (my model class that implements tf.Module) and call these two functions I have no problems, but when I try to convert them to TFLite I get this error:
ValueError: Got a non-Tensor value <tf.Operation 'StatefulPartitionedCall' type=StatefulPartitionedCall> for key 'output_0' in the output of the function __inference_initialize_weights_8582 used to generate the SavedModel signature 'initialize'. Outputs for functions used as signatures must be a single Tensor, a sequence of Tensors, or a dictionary from string to Tensor.
I understand the error, but I don't get why I have to return something when simply initializing the weights of my model.
I initialized nn.Embedding with some pretrained parameters (they are 128-dim vectors); the following code demonstrates how I do this:
self.myvectors = gensim.models.KeyedVectors.load_word2vec_format(cfg.vec_dir)
self.vec_weights = torch.FloatTensor(self.myvectors.vectors)
self.embeds = torch.nn.Embedding.from_pretrained(self.vec_weights)
cfg.vec_dir (read from a JSON config file) is the path of the pretrained 128-dim vectors I used to initialize this layer.
After the model is trained, I print out this embedding layer and find that the parameters are exactly the same as when I initialized them, so clearly the parameters were not updated during training. Why is this happening? What should I do in order to update these vectors?
The torch.nn.Embedding.from_pretrained classmethod by default freezes the parameters. If you want to train the parameters, you need to set the freeze keyword argument to False. See the documentation.
So you might try this instead:
self.embeds = torch.nn.Embedding.from_pretrained(self.vec_weights, freeze=False)
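A quick way to verify this (a small sketch with a dummy weight matrix) is to check requires_grad on the embedding weight:

import torch

weights = torch.randn(10, 128)  # dummy "pretrained" vectors

frozen = torch.nn.Embedding.from_pretrained(weights)                  # freeze=True by default
trainable = torch.nn.Embedding.from_pretrained(weights, freeze=False)

print(frozen.weight.requires_grad)     # False -> excluded from gradient updates
print(trainable.weight.requires_grad)  # True  -> updated by the optimizer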
I just recently started playing around with Keras and got into making custom layers. However, I am rather confused by the many different types of layers with slightly different names but with the same functionality.
For example, there are 3 different forms of the concatenate function from https://keras.io/layers/merge/ and https://www.tensorflow.org/api_docs/python/tf/keras/backend/concatenate
keras.layers.Concatenate(axis=-1)
keras.layers.concatenate(inputs, axis=-1)
tf.keras.backend.concatenate()
I know the 2nd one is used for functional API but what is the difference between the 3? The documentation seems a bit unclear on this.
Also, for the 3rd one, I have seen code that does the following. Why must there be the ._keras_shape line after the concatenation?
# Concatenate the summed atom and bond features
atoms_bonds_features = K.concatenate([atoms, summed_bond_features], axis=-1)
# Compute fingerprint
atoms_bonds_features._keras_shape = (None, max_atoms, num_atom_features + num_bond_features)
Lastly, under keras.layers, there always seems to be 2 duplicates. For example, Add() and add(), and so on.
First, the backend: tf.keras.backend.concatenate()
Backend functions are supposed to be used "inside" layers. You'd only use this in Lambda layers, custom layers, custom loss functions, custom metrics, etc.
It works directly on "tensors".
It's not the right choice if you're not going deep into customizing. (And it was a bad choice in your example code -- see details at the end.)
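For reference, this is roughly what the "inside a layer" usage looks like (a small sketch with made-up inputs), wrapping the backend call in a Lambda layer:

import keras.backend as K
from keras.layers import Input, Lambda

a = Input(shape=(4,))
b = Input(shape=(6,))

# The backend function operates directly on the incoming tensors inside the Lambda
c = Lambda(lambda tensors: K.concatenate(tensors, axis=-1))([a, b])  # shape (None, 10)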
If you dive deep into keras code, you will notice that the Concatenate layer uses this function internally:
import keras.backend as K

class Concatenate(_Merge):
    #blablabla
    def _merge_function(self, inputs):
        return K.concatenate(inputs, axis=self.axis)
    #blablabla
Then, the Layer: keras.layers.Concatenate(axis=-1)
As with any other Keras layer, you instantiate it and call it on tensors.
Pretty straightforward:
#in a functional API model:
inputTensor1 = Input(shape) #or some tensor coming out of any other layer
inputTensor2 = Input(shape2) #or some tensor coming out of any other layer
#first parentheses are creating an instance of the layer
#second parentheses are "calling" the layer on the input tensors
outputTensor = keras.layers.Concatenate(axis=someAxis)([inputTensor1, inputTensor2])
This is not suited for sequential models, unless the previous layer outputs a list (this is possible but not common).
Finally, the concatenate function from the layers module: keras.layers.concatenate(inputs, axis=-1)
This is not a layer. This is a function that will return the tensor produced by an internal Concatenate layer.
The code is simple:
def concatenate(inputs, axis=-1, **kwargs):
    #blablabla
    return Concatenate(axis=axis, **kwargs)(inputs)
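So in a functional model the two spellings are interchangeable; a short sketch reusing the hypothetical tensors from the example above:

# Same result as: keras.layers.Concatenate(axis=someAxis)([inputTensor1, inputTensor2])
outputTensor = keras.layers.concatenate([inputTensor1, inputTensor2], axis=someAxis)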
Older functions
In Keras 1, people had functions that were meant to receive "layers" as input and return an output "layer". Their names were related to the word merge.
But since Keras 2 doesn't mention or document these, I'd probably avoid using them, and if old code is found, I'd probably update it to proper Keras 2 code.
Why the _keras_shape word?
This backend function was not supposed to be used in high-level code. The coder should have used a Concatenate layer.
atoms_bonds_features = Concatenate(axis=-1)([atoms, summed_bond_features])
#just this line is perfect
Keras layers add the _keras_shape property to all their output tensors, and Keras uses this property to infer the shapes of the entire model.
If you use any backend function "outside" a layer or loss/metric, your output tensor will lack this property and an error will appear saying that _keras_shape doesn't exist.
The coder is creating a bad workaround by adding the property manually, when it should have been added by a proper Keras layer. (This may work now, but if Keras is updated this code will break, while proper code will keep working.)
Keras historically supports two different interfaces for its layers: the new functional one and the old one, which requires model.add() calls; hence the two different functions.
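For example, the two spellings of the addition layer produce the same result in a functional model (a short sketch with made-up inputs):

from keras.layers import Input, Add, add

x1 = Input(shape=(8,))
x2 = Input(shape=(8,))

y_class = Add()([x1, x2])  # layer class: instantiate, then call on a list of tensors
y_func = add([x1, x2])     # lowercase wrapper: builds an Add layer internally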
As for TF: its concatenate() function does not do everything that is required for Keras to work, hence the additional calls to make the ._keras_shape variable correct and not upset Keras, which expects that variable to have a particular value.
This question is about coding strategy using Tensorflow. I would like to create a small classifier network made of:
1: an input
2: a simple layer fully connected (W*x+B)
3: a LSTM layer
4: a softmax layer
5: an output
In tensorflow, to use the class tf.nn.dynamic_rnn(), we need to feed a batch of sequences to the network. So far, it works perfectly (I love this library).
But as I want to apply a simple layer to each feature vector of my sequences (the 2nd layer in my description), I'm wondering:
Do I precede my LSTM layer with this simple layer and pass both to the tf.nn.dynamic_rnn() operation...
OR
Do I use the function tf.map_fn() twice (once to unpack batches, once to unpack sequences), which, if I understood well, is able to unpack my sequences and apply a layer to each feature row?
Normally, both should give the same result? If that's the case, which should I use?
Thank you for your time!
I recently encountered a similar scenario, where I wanted to chain recurrent and non-recurrent layers.
Do I precede my LSTM layer with this simple layer and pass both to the tf.nn.dynamic_rnn() operation...
This won't work. The function dynamic_rnn expects a cell as its first argument. A cell is a class that inherits from tf.nn.rnn_cell.RNNCell. Additionally, the second input argument to dynamic_rnn should be a tensor with at least 3 dimensions, where the first two dimensions are batch and time (time_major=False) or time and batch (time_major=True).
Do I use the function tf.map_fn() twice (once to unpack batches, once to unpack sequences), which, if I understood well, is able to unpack my sequences and apply a layer to each feature row?
This might work, but it doesn't appear to me to be an efficient and clean solution. Firstly, it should not be necessary to 'unpack batches', as you presumably want to perform some operation on batches of features and time steps, where each observation in a batch is independent of the others.
My solution to this particular problem was to create a sub-class of tf.nn.rnn_cell.RNNCell. In my case I wanted a simple feedforward layer that would iterate over all of the time steps and that could be used in dynamic_rnn:
import tensorflow as tf

class FeedforwardCell(tf.nn.rnn_cell.RNNCell):
    """A stateless feedforward cell that can be used with MultiRNNCell."""

    def __init__(self, num_units, activation=tf.tanh, dtype=tf.float32):
        self._num_units = num_units
        self._activation = activation
        # Store a dummy state to make dynamic_rnn happy.
        self.dummy = tf.constant([[0.0]], dtype=dtype)

    @property
    def state_size(self):
        return 1

    @property
    def output_size(self):
        return self._num_units

    def zero_state(self, batch_size, dtype):
        return self.dummy

    def __call__(self, inputs, state, scope=None):
        """Basic feedforward: output = activation(W * input)."""
        with tf.variable_scope(scope or type(self).__name__):  # "FeedforwardCell"
            output = self._activation(tf.nn.rnn_cell._linear(
                [inputs], self._num_units, True))
        return output, self.dummy
An instance of this class can be passed, in a list with "normal" RNN cells, to a tf.nn.rnn_cell.MultiRNNCell initializer. The resulting object instance can then be passed as the cell input argument to dynamic_rnn.
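For instance (a rough sketch with made-up sizes; whether it runs as-is depends on your TF 1.x version, since the cell above relies on internal APIs):

inputs = tf.placeholder(tf.float32, [None, 20, 50])  # [batch, time, features]

cell = tf.nn.rnn_cell.MultiRNNCell([
    FeedforwardCell(64),           # the non-recurrent layer defined above
    tf.nn.rnn_cell.LSTMCell(128),  # a "normal" recurrent cell
])

outputs, final_state = tf.nn.dynamic_rnn(cell, inputs, dtype=tf.float32)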
Important to note: dynamic_rnn expects that a recurrent cell returns a state when called. I therefore use dummy in FeedforwardCell as a fake state variable.
My solution might not be the smoothest or best way to chain recurrent and non-recurrent layers together. I'd be interested in hearing from other Tensorflow users about their suggestions.
Edit
If you choose to use the sequence_length input argument of dynamic_rnn, then state_size should be self._num_units and the dummy state should have shape [batch_size, self.state_size]. In other words, the state cannot be a scalar. Note that bidirectional_dynamic_rnn requires that the sequence_length argument is not None, whereas dynamic_rnn does not have this requirement. (This is weakly documented in the TF documentation.)
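Under those assumptions, the adjusted pieces of the cell might look roughly like this (a sketch, not tested against every TF 1.x version):

@property
def state_size(self):
    return self._num_units

def zero_state(self, batch_size, dtype):
    # One (unused) state vector per batch element, so the sequence_length
    # masking in dynamic_rnn has a state of the expected shape to copy through.
    return tf.zeros([batch_size, self.state_size], dtype=dtype)

def __call__(self, inputs, state, scope=None):
    with tf.variable_scope(scope or type(self).__name__):
        output = self._activation(tf.nn.rnn_cell._linear([inputs], self._num_units, True))
    return output, state  # pass the (ignored) state straight through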