I'm trying to use a subclassed model to decode binary data, thus my inputs are tensors of booleans/bits. I began by using TensorFlow's functional API to build a model and everything worked as expected. This was (the relevant part of) my code:
def build_decoder(bitrate):
    input_layer = layers.Input(shape=(bitrate,))
    hidden = tf.expand_dims(input_layer, -1)
    hidden = layers.Conv1D(filters=128, kernel_size=2, padding="same", activation="elu")(hidden)
    hidden = layers.AvgPool1D(pool_size=2)(hidden)
    hidden = layers.Conv1D(filters=128, kernel_size=2, padding="same", activation="elu")(hidden)
    hidden = layers.AvgPool1D(pool_size=2)(hidden)
    hidden = layers.Bidirectional(
        layers.GRU(units=128,
                   activation="tanh",
                   recurrent_activation="sigmoid",
                   reset_after=True,
                   return_sequences=True))(hidden)
    [...]
This was working fine, but for some unimportant reasons I wanted to use subclassing to build the model instead. This is my approach so far.
class Decoder(tf.keras.Model):
    def __init__(self, config: SplitDecoderConfig or int or str):
        super(Decoder, self).__init__()
        [...]
        self.upper = []
        for i in range(config.conv_layers):
            self.upper.append(layers.Conv1D(filters=config.conv_filters[i],
                                            kernel_size=config.conv_kernel_sizes[i],
                                            padding="same", activation="elu"))
            self.upper.append(layers.AvgPool1D(pool_size=config.conv_pool_sizes[i]))
        [...]
        print("building")
        self.build([None, config.bitrate])
        print("calling")
        self.call(tf.random.uniform([8, config.bitrate]) < .5)

    def call(self, inputs, *args, **kwargs):
        # inputs = tf.cast(inputs, tf.half)  # un-commenting this works, but is very slow during training
        upper = tf.expand_dims(inputs, -1)
        for layer in self.upper:
            upper = layer(upper)  # crashes here when iterating over the first Conv1D layer
        [...]
        return self.output_layer(combined)
Constructing the model works fine, and self.build() (which internally uses a float32 tensor) works as well, but as soon as I call the model with bools (see the function call at the end of __init__()), the following error is raised:
tensorflow.python.framework.errors_impl.InvalidArgumentError: Value for attr 'T' of bool is not in the list of allowed values: half, bfloat16, float, double, int32
; NodeDef: {{node Conv2D}}; Op<name=Conv2D; signature=input:T, filter:T -> output:T; attr=T:type,allowed=[DT_HALF, DT_BFLOAT16, DT_FLOAT, DT_DOUBLE, DT_INT32]; attr=strides:list(int); attr=use_cudnn_on_gpu:bool,default=true; attr=padding:string,allowed=["SAME", "VALID", "EXPLICIT"]; attr=explicit_paddings:list(int),default=[]; attr=data_format:string,default="NHWC",allowed=["NHWC", "NCHW"]; attr=dilations:list(int),default=[1, 1, 1, 1]> [Op:Conv2D]
Note that the operation where it crashes is a 2D convolution, even though I don't use such layers at all... is this a backend thing?
Casting the inputs at the beginning of call() would fix the problem, but it generates huge storage overhead and slows down training by a factor of almost 50 (at least I think that's where the slowdown comes from). I also tried to add an Input wrapper like in the functional example, but without any positive results.
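For reference, the casting workaround I mean is just this at the top of call() (a minimal sketch; self.compute_dtype should resolve to the model's float dtype):

def call(self, inputs, *args, **kwargs):
    # cast the boolean bits to the working float dtype once, up front
    inputs = tf.cast(inputs, self.compute_dtype)
    upper = tf.expand_dims(inputs, -1)
    [...]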
Looking forward to your help, thanks!
====== EDIT ======
I continued debugging and found out that the layers within upper do not save any information about the output_shape or anything related. Thus, after building (or even calling my model with data) I cannot use model.summary(): it crashes with the error message that my layers were not built and do not have a known output_shape (even though I can compute the shape, e.g. with compute_output_shape())...
Maybe the problem lies in the way I am subclassing Model?
Additionally, model.layers is empty, which I find weird...
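For completeness, this is roughly the check that fails for me (a sketch):

decoder = Decoder(config)
_ = decoder(tf.random.uniform([8, config.bitrate]))  # call on real float data
decoder.summary()      # crashes: layers reported as not built / no known output_shape
print(decoder.layers)  # empty list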
I get a dimension mismatch error when I run the predictor on the model, even though training, validation and testing work. I suppose this means there is a problem in image processing in the predictor model.
class Predictor(nn.Module):
    def __init__(self, model, class_names, mean, std):
        super().__init__()
        self.model = model.eval()
        self.class_names = class_names
        self.transforms = nn.Sequential(  # --- THIS MIGHT BE THE PROBLEM
            T.Resize([256, ]),
            T.CenterCrop(224),
            T.ConvertImageDtype(torch.float),
            T.Normalize(mean.tolist(), std.tolist())
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        with torch.no_grad():
            # Apply transforms --- THIS MIGHT BE THE PROBLEM TOO
            x = self.transforms(x)
            # Get the logits
            x = self.model(x)
            # Apply softmax
            x = F.softmax(x, dim=1)
            return x
I tried hardcoding the dimensions of the input neurons of the model class; it worked for a couple of seconds, then I got another dimension mismatch.
For example, at training, the model's input neurons were 128*7*7, and then I hardcoded that to 57600 as this was the dimension of the input that raised the error. It worked for about 26 images during prediction, but then raised another dimension mismatch error, this time with a dimension of 51200.
This means the images passed to the model have inconsistent dimensions!
It also means that self.transforms(...) is not doing its job, because if it were, there would be no dimension mismatch.
I could resolve the error by forcing the data plugged into the model to be transformed with a hardcoded T.Compose([..., T.CenterCrop(224), ...]), so the images are already cropped by the time they reach the model.
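Roughly, the hardcoded pipeline that works looks like this (illustrative; the exact transform list in my code differs):

from torchvision import transforms as T

preprocess = T.Compose([
    T.Resize(256),
    T.CenterCrop(224),   # guarantees every image reaches the model as 224x224
    T.ToTensor(),
])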
But I still do not understand why the following does not resolve that error:
self.transforms = nn.Sequential(
    T.Resize([256, ]),
    T.CenterCrop(224),
    T.ConvertImageDtype(torch.float),
    T.Normalize(mean.tolist(), std.tolist())
)
Calling the model itself with 224x224 inputs eliminates the error, but then the model does not accept any other dimension and raises a dimensionality error instead.
I'm trying to fine-tune ReformerModelWithLMHead (google/reformer-enwik8) for NER. I used the same padding sequence length as in the encode method (max_length = max([len(string) for string in list_of_strings])), along with attention_masks, and I got this error:
ValueError: If training, make sure that config.axial_pos_shape factors: (128, 512) multiply to sequence length. Got prod((128, 512)) != sequence_length: 2248. You might want to consider padding your sequence length to 65536 or changing config.axial_pos_shape.
When I changed the sequence length to 65536, my Colab session crashed because every input was padded to length 65536.
As for the second option (changing config.axial_pos_shape), I could not find a way to change it.
I would like to know: is there any way to change config.axial_pos_shape while fine-tuning the model? Or am I missing something in encoding the input strings for reformer-enwik8?
Thanks!
Question Update: I have tried the following methods:
By giving parameters at the time of model instantiation:
model = transformers.ReformerModelWithLMHead.from_pretrained(
    "google/reformer-enwik8", num_labels=9, max_position_embeddings=1024,
    axial_pos_shape=[16, 64], axial_pos_embds_dim=[32, 96], hidden_size=128)
It gives me the following error:
RuntimeError: Error(s) in loading state_dict for ReformerModelWithLMHead:
size mismatch for reformer.embeddings.word_embeddings.weight: copying a param with shape torch.Size([258, 1024]) from checkpoint, the shape in current model is torch.Size([258, 128]).
size mismatch for reformer.embeddings.position_embeddings.weights.0: copying a param with shape torch.Size([128, 1, 256]) from checkpoint, the shape in current model is torch.Size([16, 1, 32]).
This is quite a long error.
Then I tried this code to update the config:
model1 = transformers.ReformerModelWithLMHead.from_pretrained('google/reformer-enwik8', num_labels=9)

# Reshape the axial position embeddings layer to match the desired max seq length
model1.reformer.embeddings.position_embeddings.weights[1] = torch.nn.Parameter(
    model1.reformer.embeddings.position_embeddings.weights[1][0][:128])

# Update the config to match the custom max seq length
model1.config.axial_pos_shape = 16, 128
model1.config.max_position_embeddings = 16 * 128  # 2048
model1.config.axial_pos_embds_dim = 32, 96
model1.config.hidden_size = 128

output_model_path = "model"
model1.save_pretrained(output_model_path)
By this implementation, I am getting this error:
RuntimeError: The expanded size of the tensor (512) must match the existing size (128) at non-singleton dimension 2. Target sizes: [1, 128, 512, 768]. Tensor sizes: [128, 768]
This is because the updated sizes/shapes don't match the original config parameters of the pretrained model. The original parameters are:

axial_pos_shape = 128, 512
max_position_embeddings = 128 * 512  # 65536
axial_pos_embds_dim = 256, 768
hidden_size = 1024
Is this the right way to change the config parameters, or do I have to do something else?
Is there any example where the ReformerModelWithLMHead('google/reformer-enwik8') model has been fine-tuned?
My main code implementation is as follows:
class REFORMER(torch.nn.Module):
    def __init__(self):
        super(REFORMER, self).__init__()
        self.l1 = transformers.ReformerModelWithLMHead.from_pretrained("google/reformer-enwik8", num_labels=9)

    def forward(self, input_ids, attention_masks, labels):
        output_1 = self.l1(input_ids, attention_masks, labels=labels)
        return output_1

model = REFORMER()

def train(epoch):
    model.train()
    for _, data in enumerate(training_loader, 0):
        ids = data['input_ids'][0]  # input_ids from the encode method of the model card: https://huggingface.co/google/reformer-enwik8#:~:text=import%20torch%0A%0A%23%20Encoding-,def%20encode,-(list_of_strings%2C%20pad_token_id%3D0
        input_shape = ids.size()
        targets = data['tags']
        print("tags: ", targets, targets.size())
        least_common_mult_chunk_length = 65536
        padding_length = least_common_mult_chunk_length - input_shape[-1] % least_common_mult_chunk_length
        # pad input
        input_ids, inputs_embeds, attention_mask, position_ids, input_shape = _pad_to_mult_of_chunk_length(
            self=model.l1,
            input_ids=ids,
            inputs_embeds=None,
            attention_mask=None,
            position_ids=None,
            input_shape=input_shape,
            padding_length=padding_length,
            padded_seq_length=None,
            device=None,
        )
        outputs = model(input_ids, attention_mask, labels=targets)  # sending inputs to the forward method
        print(outputs)
        loss = outputs.loss
        logits = outputs.logits
        if _ % 500 == 0:
            print(f'Epoch: {epoch}, Loss: {loss}')

for epoch in range(1):
    train(epoch)
First of all, you should note that google/reformer-enwik8 is not a properly trained language model and that you will probably not get decent results from fine-tuning it. enwik8 is a compression challenge and the reformer authors used this dataset for exactly that purpose:
To verify that the Reformer can indeed fit large models on a single core and train fast on long sequences, we train up to 20-layer big Reformers on enwik8 and imagenet64...
This is also the reason why they haven't trained a sub-word tokenizer and operate on character level.
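For context, the character-level encode helper from the model card looks roughly like this (reproduced from memory; byte values are shifted by 2 to reserve ids for special tokens):

import torch

def encode(list_of_strings, pad_token_id=0):
    max_length = max(len(s) for s in list_of_strings)
    input_ids = torch.full((len(list_of_strings), max_length), pad_token_id, dtype=torch.long)
    attention_masks = torch.zeros((len(list_of_strings), max_length), dtype=torch.long)
    for i, s in enumerate(list_of_strings):
        # make sure the string is in byte format
        if not isinstance(s, bytes):
            s = s.encode()
        input_ids[i, :len(s)] = torch.tensor([b + 2 for b in s])
        attention_masks[i, :len(s)] = 1
    return input_ids, attention_masks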
You should also note that the LMHead is usually used for predicting the next token of a sequence (causal language modeling). You probably want to use a token classification head instead, i.e. use an encoder ReformerModel and add a linear layer with 9 classes on top (plus maybe a dropout layer).
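A minimal sketch of such a setup (the class name and dropout rate are my own choices, not an official API):

import torch
import transformers

class ReformerForTokenClassification(torch.nn.Module):
    def __init__(self, num_labels=9, dropout=0.1):
        super().__init__()
        self.encoder = transformers.ReformerModel.from_pretrained("google/reformer-enwik8")
        self.dropout = torch.nn.Dropout(dropout)
        self.classifier = torch.nn.Linear(self.encoder.config.hidden_size, num_labels)

    def forward(self, input_ids, attention_mask=None):
        hidden = self.encoder(input_ids, attention_mask=attention_mask).last_hidden_state
        return self.classifier(self.dropout(hidden))  # (batch, seq_len, num_labels)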
Anyway, in case you want to try it still, you can do the following to reduce the memory footprint of the google/reformer-enwik8 reformer:
Reduce the number of hashes during training:
from transformers import ReformerConfig, ReformerModel

conf = ReformerConfig.from_pretrained('google/reformer-enwik8')
conf.num_hashes = 2  # or maybe even 1
model = ReformerModel.from_pretrained("google/reformer-enwik8", config=conf)
After you have finetuned your model, you can increase the number of hashes again to increase the performance (compare Table 2 of the reformer paper).
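For example (a sketch; the checkpoint path is hypothetical), you could reload your fine-tuned weights with a higher number of hashes for evaluation:

eval_conf = ReformerConfig.from_pretrained("path/to/finetuned-checkpoint")
eval_conf.num_hashes = 4  # more hashes -> better LSH attention quality, more memory
eval_model = ReformerModel.from_pretrained("path/to/finetuned-checkpoint", config=eval_conf)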
Replace axial-position embeddings:
from transformers import ReformerConfig, ReformerModel

conf = ReformerConfig.from_pretrained('google/reformer-enwik8')
conf.axial_pos_embds = False
model = ReformerModel.from_pretrained("google/reformer-enwik8", config=conf)
This will replace the learned axial positional embeddings with learnable position embeddings like BERT's, which do not require the full sequence length of 65536. They are untrained and randomly initialized (i.e. consider a longer training).
The Reformer model was proposed in the paper Reformer: The Efficient Transformer by Nikita Kitaev, Łukasz Kaiser and Anselm Levskaya.
The paper introduces a method for factorizing the gigantic position-embedding matrix that results from working with very long sequences. This factorization relies on two assumptions:
the parameter config.axial_pos_embds_dim is set to a tuple (d1, d2) whose sum has to be equal to config.hidden_size
config.axial_pos_shape is set to a tuple (n1, n2) whose product has to be equal to config.max_position_embeddings
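Concretely, for google/reformer-enwik8 (using the numbers quoted in the question), both assumptions check out like this:

from transformers import ReformerConfig

conf = ReformerConfig.from_pretrained('google/reformer-enwik8')
assert sum(conf.axial_pos_embds_dim) == conf.hidden_size  # 256 + 768 == 1024
assert conf.axial_pos_shape[0] * conf.axial_pos_shape[1] == conf.max_position_embeddings  # 128 * 512 == 65536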
Finally your question ;)
I'm almost sure your session crashed due to a RAM overflow.
You can change any config parameter during model instantiation, as shown in the official documentation.
In keras / tensorflow it is often quite simple to describe layers directly as functions that map their input to an output, like so:
def resnet_block(x, kernel_size):
    ch = x.shape[-1]
    out = Conv2D(ch, kernel_size, strides=(1, 1), padding='same', activation='relu')(x)
    out = Conv2D(ch, kernel_size, strides=(1, 1), padding='same', activation='relu')(out)
    out = Add()([x, out])
    return out
whereas subclassing Layer to get something like
r = ResNetBlock(kernel_size=(3,3))
y = r(x)
is a little more cumbersome (or even a lot more cumbersome for more complex examples).
Since keras seems perfectly happy to construct the underlying weights of its layers when they're being called for the first time, I was wondering if it was possible to just wrap functions such as the one above and let keras figure things out once there are inputs, i.e. I would like it to look like this:
r = FunctionWrapperLayer(lambda x:resnet_block(x, kernel_size=(3,3)))
y = r(x)
I've made an attempt at implementing FunctionWrapperLayer, which looks as follows:
class FunctionWrapperLayer(Layer):
    def __init__(self, fn):
        super(FunctionWrapperLayer, self).__init__()
        self.fn = fn

    def build(self, input_shape):
        shape = input_shape[1:]
        inputs = Input(shape)
        outputs = self.fn(inputs)
        self.model = Model(inputs=inputs, outputs=outputs)
        self.model.compile()

    def call(self, x):
        return self.model(x)
This looks like it might work, however I've run into some bizarre issues whenever I use activations, e.g. with
def bad(x):
    out = tf.keras.activations.sigmoid(x)
    out = Conv2D(1, (1, 1), strides=(1, 1), padding='same')(out)
    return out

x = tf.constant(tf.reshape(tf.range(48, dtype=tf.float32), [1, 4, -1, 1]))
w = FunctionWrapperLayer(bad)
w(x)
I get the following error
FailedPreconditionError: Error while reading resource variable _AnonymousVar34 from Container: localhost. This could mean that the variable was uninitialized. Not found: Resource localhost/_AnonymousVar34/class tensorflow::Var does not exist.
[[node conv2d_6/BiasAdd/ReadVariableOp (defined at <ipython-input-33-fc380d9255c5>:12) ]] [Op:__inference_keras_scratch_graph_353]
What this suggests to me is that there is something inherently wrong with initializing models like that in the build method. Maybe someone has a better idea as to what might be going on there or how else to get the functionality I would like.
Update:
As mentioned by jr15, the above does work when the function involved only uses keras layers. However, the following ALSO works, which has me a little puzzled:
i = Input(x.shape[1:])
o = bad(i)
model = Model(inputs=i, outputs=o)
model(x)
Incidentally, model.submodules yields
(<tensorflow.python.keras.engine.input_layer.InputLayer at 0x219d80c77c0>,
<tensorflow.python.keras.engine.base_layer.TensorFlowOpLayer at 0x219d7afc820>,
<tensorflow.python.keras.layers.convolutional.Conv2D at 0x219d7deafa0>)
meaning the activation is automatically turned into a "TensorFlowOpLayer" when doing it like that.
Another update:
Looking at the original error message, it seems like the activation isn't the only culprit. If I remove the convolution and use the wrapper everything works as well and again I find a "TensorFlowOpLayer" when inspecting the submodules.
Your solution actually works! The trouble you're running into is that tf.keras.activations.sigmoid is not a Layer, but a plain TensorFlow function. To make it work, use keras.layers.Activation("sigmoid")(x) instead. For the more general case, where you want to use some TensorFlow function as a layer, you can wrap it in a Lambda layer like so:
out = keras.layers.Lambda(lambda x: tf.some_function(x))(out)
See the docs for more info: https://keras.io/api/layers/core_layers/lambda/
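So the failing example above should work once the activation is a Layer object (a sketch of the same function):

def good(x):
    # same computation as `bad`, but sigmoid is now a proper Layer
    out = keras.layers.Activation("sigmoid")(x)
    out = Conv2D(1, (1, 1), strides=(1, 1), padding='same')(out)
    return out

w = FunctionWrapperLayer(good)
w(x)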
With Tensorflow 2.4 it apparently just works now. The submodules now show a "TFOpLambda" layer.
To anybody interested, here is some slightly improved wrapper code that also accommodates multi-input models:
class FunctionWrapperLayer(Layer):
    def __init__(self, fn):
        super(FunctionWrapperLayer, self).__init__()
        self.fn = fn

    def build(self, input_shapes):
        super(FunctionWrapperLayer, self).build(input_shapes)
        if type(input_shapes) is list:
            inputs = [Input(shape[1:]) for shape in input_shapes]
        else:
            inputs = Input(input_shapes[1:])
        outputs = self.fn(inputs)
        self.fn_model = Model(inputs=inputs, outputs=outputs)
        self.fn_model.compile()

    def call(self, x):
        return self.fn_model(x)
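For example, a hypothetical two-input usage (assuming TF 2.4+, where plain TF ops on Keras tensors are wrapped automatically):

add_mul = FunctionWrapperLayer(lambda xs: xs[0] + 2.0 * xs[1])
a = tf.random.normal([4, 8])
b = tf.random.normal([4, 8])
y = add_mul([a, b])  # builds an internal two-input Model on first call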
I have a complex keras model in which one of the layers is a custom pretrained layer which expects "int32" as inputs. This model is implemented as a class that inherits from Model and it is implemented like this:
class MyModel(tf.keras.models.Model):
    def __init__(self, size, input_shape):
        super(MyModel, self).__init__()
        self.layer = My_Layer()
        self.build(input_shape)

    def call(self, inputs):
        return self.layer(inputs)
But when it reaches the self.build method, it throws the following error:
ValueError: You cannot build your model by calling `build` if your layers do not support float type inputs. Instead, in order to instantiate and build your model, `call` your model on real tensor data (of the correct dtype).
How can I fix it?
The exception is thrown when building a model with model.build.
The model.build function builds a model based on the given input shape.
The error is raised because, when we try to build a model, it first calls the model with an x argument whose type depends on the input shape, in the following code:
if (isinstance(input_shape, list) and
    all(d is None or isinstance(d, int) for d in input_shape)):
  input_shape = tuple(input_shape)
if isinstance(input_shape, list):
  x = [base_layer_utils.generate_placeholders_from_shape(shape)
       for shape in input_shape]
elif isinstance(input_shape, dict):
  x = {
      k: base_layer_utils.generate_placeholders_from_shape(shape)
      for k, shape in input_shape.items()
  }
else:
  x = base_layer_utils.generate_placeholders_from_shape(input_shape)
Here, x is a TensorFlow placeholder. So trying to call the model with x as input raises a TypeError, and the surrounding except block catches it and produces the error you see.
I assume your input shape is 16x16. Instead of using self.build([(16, 16)]), call the model on a real tensor:
inputs = tf.keras.Input(shape=(16,))
self.call(inputs)
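If the layer really needs int32 inputs, the same idea works with a concrete tensor; here is a sketch (the (1, 16) shape and int32 dtype are assumptions about My_Layer):

class MyModel(tf.keras.models.Model):
    def __init__(self, size):
        super(MyModel, self).__init__()
        self.layer = My_Layer()
        # build by calling on a real tensor of the right dtype instead of self.build(...)
        self(tf.zeros((1, 16), dtype=tf.int32))

    def call(self, inputs):
        return self.layer(inputs)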
Workaround
I've encountered the same problem when trying to export a model with multiple int-typed input tensors as a SavedModel. I worked around it by overriding the build method and manually specifying self._build_input_shape. So your solution would look like:
class MyModel(tf.keras.models.Model):
    def __init__(self, size, input_shape):
        super(MyModel, self).__init__()
        self.layer = My_Layer()
        self.build(input_shape)

    def call(self, inputs):
        return self.layer(inputs)

    def build(self, input_shapes):
        super(tf.keras.Model, self).build(input_shapes)
What happened in the original code
The default build method of a tf.keras.Model object will by default treat input tensors as float tensors, which ends up throwing the exception.
This behavior of tf.keras.Model is defined here: the inputs for your model are created by base_layer_utils.generate_placeholders_from_shape, which specifies the dtype as float.
What would happen with the workaround
As tf.keras.Model.build finally invokes its superclass's build function, tf.keras.layers.Layer.build, the workaround skips the tf.keras.Model.build logic that causes the problem. You may have to add complementary code after that in case you rely on other logic defined in tf.keras.Model.build.
I want to write some custom Keras Layers and do some advanced calculations in the layer, for example with Numpy, Scikit, OpenCV...
I know there are some math functions in keras.backend that can operate on tensors, but I need some more advanced functions.
However, I have no clue how to implement this correctly; I get the error message:
You must feed a value for placeholder tensor 'input_1' with dtype float and shape [...]
Here is my custom layer:
class MyCustomLayer(Layer):
    def __init__(self, **kwargs):
        super(MyCustomLayer, self).__init__(**kwargs)

    def call(self, inputs):
        """
        How to implement this correctly in Keras?
        """
        nparray = K.eval(inputs)  # <-- does not work
        # do some calculations here with nparray
        # for example with Numpy, Scipy, Scikit, OpenCV...
        result = K.variable(nparray, dtype='float32')
        return result

    def compute_output_shape(self, input_shape):
        output_shape = tuple([input_shape[0], 256, input_shape[3]])
        return output_shape  # (batch, 256, channels)
The error appears here in this dummy model:
inputs = Input(shape=(96, 96, 3))
x = MyCustomLayer()(inputs)
x = Flatten()(x)
x = Activation("relu")(x)
x = Dense(1)(x)
predictions = Activation("sigmoid")(x)
model = Model(inputs=inputs, outputs=predictions)
Thanks for all hints...
TL;DR: You should not mix Numpy code into Keras layers. Keras uses TensorFlow underneath because it has to track all the computations to be able to compute the gradients in the backward phase.
If you dig into TensorFlow, you will see that it covers almost all of the Numpy functionality (and even extends it), and if I remember correctly, TensorFlow functionality can be accessed through the Keras backend (K).
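For instance, a hypothetical layer that keeps the whole computation in TF ops (so gradients can flow) instead of dropping to Numpy:

import tensorflow as tf
from tensorflow.keras.layers import Layer

class MyTFLayer(Layer):
    def call(self, inputs):
        # tf.reduce_mean / tf.clip_by_value replace np.mean / np.clip
        x = tf.reduce_mean(inputs, axis=1)
        return tf.clip_by_value(x, 0.0, 1.0)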
What are the advanced calculations/functions you need?
I think this kind of processing should be applied before the model, because the process does not contain trainable variables, so it can't be optimized.
K.eval(inputs) does not work because you are trying to evaluate a placeholder, not a variable; placeholders have no values to evaluate. If you want to get values, you have to feed them, or you can unpack the tensor element by element with tf.unstack():
nparray = tf.unstack(tf.unstack(tf.unstack(inputs, 96, 0), 96, 0), 3, 0)
Your call function is also wrong because it returns a variable; you should return a constant:
result = K.constant(nparray, dtype='float32')
return result