Note: I posted about this issue already here. I'm creating a new question because:
1. I think the issue specifically relates to reshaping my mask within my custom layer, but I'm not sure enough of that to completely ignore the other error I wrote about in the original post.
2. There are many posts about reshaping Keras layers or adding Masking layers, but I couldn't find any about reshaping a mask within a layer, so I hope this post can be useful more generally.
The issue:
I have a custom Keras layer that takes 2D input and returns 3D output (batch_size, max_length, 1024), which is passed on to a BiLSTM followed by a CRF.
The custom Keras layer is copied from this repository. The difference is that I take the 'elmo' output instead of the 'default' output from the ELMo model, so that the output is 3D as required by the BiLSTM:
result = self.elmo(K.squeeze(K.cast(x, tf.string), axis=1),
                   as_dict=True,
                   signature='default',
                   )['elmo']  # The original code used 'default'
However, the compute_mask function isn't appropriate for my architecture, as its output is 2D. Thus I get the error:
InvalidArgumentError: Incompatible shapes: [32,47] vs. [32,0] [[{{node loss/crf_1_loss/mul_6}}]]
where 32 is batch size and 47 is one less than my specified max_length.
I'm sure I need to reshape the mask, but I couldn't find out anywhere how.
Happy to make a git repo with the whole thing and/or full stack trace if need be.
Custom ELMo Layer:
import tensorflow as tf
import tensorflow_hub as hub
from keras import backend as K
from keras.engine import Layer  # imports assumed, following the original repo

class ElmoEmbeddingLayer(Layer):
    def __init__(self, **kwargs):
        self.dimensions = 1024
        self.trainable = True
        super(ElmoEmbeddingLayer, self).__init__(**kwargs)

    def build(self, input_shape):
        self.elmo = hub.Module('https://tfhub.dev/google/elmo/2',
                               trainable=self.trainable,
                               name="{}_module".format(self.name))
        self.trainable_weights += K.tf.trainable_variables(scope="^{}_module/.*".format(self.name))
        super(ElmoEmbeddingLayer, self).build(input_shape)

    def call(self, x, mask=None):
        result = self.elmo(K.squeeze(K.cast(x, tf.string), axis=1),
                           as_dict=True, signature='default',)['elmo']
        return result

    # Original compute_mask function. Raises:
    # InvalidArgumentError: Incompatible shapes: [32,47] vs. [32,0] [[{{node loss/crf_1_loss/mul_6}}]]
    def compute_mask(self, inputs, mask=None):
        return K.not_equal(inputs, '__PAD__')

    def compute_output_shape(self, input_shape):
        return input_shape[0], 48, self.dimensions
The model is built as follows:
def build_model():  # uses CRF from keras_contrib
    input = layers.Input(shape=(1,), dtype=tf.string)
    model = ElmoEmbeddingLayer(name='ElmoEmbeddingLayer')(input)
    model = Bidirectional(LSTM(units=512, return_sequences=True))(model)
    crf = CRF(num_tags)
    out = crf(model)
    model = Model(input, out)
    model.compile(optimizer="rmsprop", loss=crf_loss,
                  metrics=[crf_accuracy, categorical_accuracy, mean_squared_error])
    model.summary()
    return model
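One possible direction (a hedged, untested sketch, not taken from the repo above): feed pre-tokenised input of shape (batch_size, max_length), padded with '__PAD__', and switch the module to its 'tokens' signature. compute_mask then naturally returns one boolean per timestep, matching the (batch_size, max_length, 1024) output:

class ElmoTokensLayer(Layer):  # hypothetical variant of ElmoEmbeddingLayer
    def __init__(self, max_length, **kwargs):
        self.dimensions = 1024
        self.trainable = True
        self.max_length = max_length
        super(ElmoTokensLayer, self).__init__(**kwargs)

    def build(self, input_shape):
        self.elmo = hub.Module('https://tfhub.dev/google/elmo/2',
                               trainable=self.trainable,
                               name="{}_module".format(self.name))
        self.trainable_weights += K.tf.trainable_variables(
            scope="^{}_module/.*".format(self.name))
        super(ElmoTokensLayer, self).build(input_shape)

    def call(self, x, mask=None):
        tokens = K.cast(x, tf.string)
        # sequence lengths derived from the padding token
        seq_len = K.sum(K.cast(K.not_equal(tokens, '__PAD__'), 'int32'), axis=-1)
        return self.elmo(inputs={'tokens': tokens, 'sequence_len': seq_len},
                         as_dict=True, signature='tokens')['elmo']

    def compute_mask(self, inputs, mask=None):
        # (batch_size, max_length): one boolean per timestep
        return K.not_equal(inputs, '__PAD__')

    def compute_output_shape(self, input_shape):
        return input_shape[0], self.max_length, self.dimensions

The model input would then be layers.Input(shape=(max_length,), dtype=tf.string) rather than shape=(1,).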
I want to get the weights of my custom layer, but I couldn't retrieve them with model.layers[x].get_weights().
So I checked the layers of the model; it seems that the custom layer is decomposed into several operations, and no weights can be found in these layers.
Here is the custom layer code
import numpy as np
import tensorflow as tf
from tensorflow.python.keras import backend as K
from tensorflow.python.keras.layers import Layer  # note: the rest of the model used tensorflow.keras (see the answer below)

class PixelBaseConv(Layer):
    def __init__(self, output_dim, **kwargs):
        self.output_dim = output_dim
        super(PixelBaseConv, self).__init__(**kwargs)

    def build(self, input_shape):
        # kernel_shape: w*h*c*output_dim
        kernel_size = input_shape[1:]
        kernel_shape = (1,) + kernel_size + (self.output_dim, )
        self.kernel = self.add_weight(name='kernel',
                                      shape=kernel_shape,
                                      initializer='uniform',
                                      trainable=True)
        super(PixelBaseConv, self).build(input_shape)

    def call(self, inputs):
        # output_shape: w*h*output_dim
        outputs = []
        inputs = K.cast(inputs, dtype="float32")
        for i in range(self.output_dim):
            # output = tf.keras.layers.Multiply()([inputs, self.kernel[..., i]])
            output = inputs * self.kernel[..., i]
            output = K.sum(output, axis=-1)
            if len(outputs) != 0:
                outputs = np.dstack([outputs, output])
            else:
                outputs = output[..., np.newaxis]
        return tf.convert_to_tensor(outputs)

    def compute_output_shape(self, input_shape):
        return input_shape + (self.output_dim, )
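As an aside, the per-channel Python loop above mixes NumPy (np.dstack) with graph tensors; the same computation can be written as one broadcasted op (a sketch, assuming the kernel shape defined in build, and not a fix for the weights issue below):

def call(self, inputs):
    inputs = K.cast(inputs, dtype="float32")
    # inputs: (batch, w, h, c) -> (batch, w, h, c, 1)
    # kernel: (1, w, h, c, output_dim); summing over the channel axis
    # gives (batch, w, h, output_dim), like the loop above
    return K.sum(inputs[..., None] * self.kernel, axis=-2)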
Here is part of the model structure
[screenshot omitted: the model summary shows the custom layer decomposed into several operation layers]
I tried different ways to obtain the weights but due to the strange layers, failed.
Expected: the first five layers are replaced with a single layer which has a trainable kernel, and the weights can be obtained directly via get_weights().
I listed weight list length of the first 10 layers and printed weight of layer 1 by following codes
for i in range(len(model.layers)):
    print("layer " + str(i), len(model.layers[i].get_weights()))
print(model.layers[1].get_weights()[0])
and got the result and error
[screenshots omitted: the printed weight-list lengths and the resulting error]
I found out why this problem occurred.
I wrote the custom layer with
import tensorflow.python.keras
while using other Keras layers and creating the model with
import tensorflow.keras
I think these two libraries may not be compatible, so my custom layer was split into several operation layers. Thus, weights could not be obtained or updated.
I changed all imports to tensorflow.keras, and now everything works.
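For reference, a minimal sketch of the consistent import style that resolved it (with a single source of Keras, the custom layer stays one node in the graph and its kernel is visible to get_weights()):

import numpy as np
import tensorflow as tf
from tensorflow.keras import backend as K   # single source of Keras
from tensorflow.keras.layers import Layer   # subclass this Layer everywhere

# With PixelBaseConv defined as above (but subclassing this Layer),
# model.layers[1].get_weights() now returns the kernel as expected.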
I want to compare performance on a classification problem using a GIN vs. a fully connected network. I started from the spektral library example TUDataset classification with GIN. I created a custom dataset for my problem, and it is loaded using DisjointLoader from spektral.data.
My supervised learning shows good results on this data with the GIN network. However, to compare these results with a fully connected network, I am having trouble feeding inputs from the dataset into the FC network. The dataset is stored in graph format with a node-attribute matrix and an adjacency matrix. There are 18 nodes in the graph, and each node has 7 attributes in the attribute matrix.
I have tried feeding the FC network just the node-attribute matrix, but I get a shape-mismatch error.
Here is the FC network I defined in place of the GIN0 network from the example above:
from tensorflow.keras.layers import Dense, Dropout  # imports assumed
from tensorflow.keras.models import Model

class FCN0(Model):
    def __init__(self, channels, outputs):
        super().__init__()
        self.dense1 = Dense(channels, activation="relu")
        self.dropout = Dropout(0.5)
        self.dense2 = Dense(channels*3, activation="relu")
        self.dense3 = Dense(outputs, activation="relu")

    def call(self, inputs):
        x, a, i = inputs
        x = self.dense1(x)
        x = self.dense2(x)
        return self.dense3(x)
The error message is as follows:
File "/home/xx/lib/python3.7/site-packages/tensorflow/python/eager/execute.py", line 60, in quick_execute
inputs, attrs, num_outputs)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes: [126,1] vs. [7,1]
[[node gradient_tape/mean_squared_error/BroadcastGradientArgs (defined at dut_gin_vs_circuits.py:147) ]] [Op:__inference_train_step_588]
Function call stack:
train_step
I would highly appreciate it if someone could help me identify what transformations are needed at the input to fit the data into the FC network while loading it with the same dataset loader.
The problem is that your FC network does not have a global pooling layer (also sometimes called "readout"), so the output of the network will have shape (batch_size * 18, 1) instead of (batch_size, 1), which is the shape of the target.
Essentially, your FC network is suitable for node-level prediction, but not graph-level prediction.
To fix this, you can introduce a global pooling layer as follows:
from spektral.layers import GlobalSumPool

class FCN0(Model):
    def __init__(self, channels, outputs):
        super().__init__()
        self.dense1 = Dense(channels, activation="relu")
        self.dropout = Dropout(0.5)
        self.dense2 = Dense(channels*3, activation="relu")
        self.dense3 = Dense(outputs, activation="relu")
        self.pool = GlobalSumPool()

    def call(self, inputs):
        x, a, i = inputs
        x = self.dense1(x)
        x = self.dense2(x)
        x = self.dense3(x)
        return self.pool([x, i])  # Only pass `i` if in disjoint mode
You can put the pooling layer anywhere in your computational graph; the important thing is that at some point you reduce the node-level representation to a graph-level representation.
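For intuition, here is a small shape sketch of disjoint mode (illustrative, using the 18-node, 7-feature graphs described in the question, with two graphs per batch assumed):

import tensorflow as tf
from spektral.layers import GlobalSumPool

x = tf.random.normal((36, 7))         # node features of both graphs, stacked
i = tf.constant([0] * 18 + [1] * 18)  # graph ID of each node
pooled = GlobalSumPool()([x, i])
print(pooled.shape)                   # (2, 7): one row per graph, not per node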
Cheers
I know others have posted similar questions already, but I couldn't find a solution that was appropriate here.
I've written a custom Keras layer to average outputs from DistilBert based on a mask. That is, I have input of dim=[batch_size, n_tokens_out, 768] coming in, and I mask along n_tokens_out using a mask of dim=[batch_size, n_tokens_out]. The output should be dim=[batch_size, 768]. Here's the code for the layer:
class CustomPool(tf.keras.layers.Layer):
    def __init__(self, output_dim, **kwargs):
        self.output_dim = output_dim
        super(CustomPool, self).__init__(**kwargs)

    def call(self, x, mask):
        masked = tf.cast(tf.boolean_mask(x, mask=mask, axis=0), tf.float32)
        mn = tf.reduce_mean(masked, axis=1, keepdims=True)
        return tf.reshape(mn, (tf.shape(x)[0], self.output_dim))

    def compute_output_shape(self, input_shape):
        return (input_shape[0], self.output_dim)
The model compiles without error, but as soon as the training starts I get this error:
InvalidArgumentError: 2 root error(s) found.
  (0) Invalid argument: Input to reshape is a tensor with 967 values, but the requested shape has 12288
      [[node pooled_distilBert/CustomPooling/Reshape (defined at <ipython-input-245-a498c2817fb9>:13) ]]
      [[assert_greater_equal/Assert/AssertGuard/pivot_f/_3/_233]]
  (1) Invalid argument: Input to reshape is a tensor with 967 values, but the requested shape has 12288
      [[node pooled_distilBert/CustomPooling/Reshape (defined at <ipython-input-245-a498c2817fb9>:13) ]]
0 successful operations.
0 derived errors ignored. [Op:__inference_train_function_211523]
Errors may have originated from an input operation.
Input Source operations connected to node pooled_distilBert/CustomPooling/Reshape:
pooled_distilBert/CustomPooling/Mean (defined at <ipython-input-245-a498c2817fb9>:11)
The dimensions I get back are smaller than the expected dimensions, which is strange to me.
Here is what the model looks like (TFDistilBertModel is from the huggingface transformers library):
dbert_layer = TFDistilBertModel.from_pretrained('distilbert-base-uncased')
in_id = tf.keras.layers.Input(shape=(seq_max_length,), dtype='int32', name="input_ids")
in_mask = tf.keras.layers.Input(shape=(seq_max_length,), dtype='int32', name="input_masks")
dbert_inputs = [in_id, in_mask]
dbert_output = dbert_layer(dbert_inputs)[0]
x = CustomPool(output_dim = dbert_output.shape[2], name='CustomPooling')(dbert_output, in_mask)
dense1 = tf.keras.layers.Dense(256, activation = 'relu', name='dense256')(x)
pred = tf.keras.layers.Dense(n_classes, activation='softmax', name='MODEL_OUT')(dense1)
model = tf.keras.models.Model(inputs = dbert_inputs, outputs = pred, name='pooled_distilBert')
Any help here would be greatly appreciated as I had a look through existing questions, most end up being solved by specifying an input shape (not applicable in my case).
Using tf.reshape before a pooling layer
I know my answer is kind of late, but I want to share my solution to the problem. The issue arises when you try to reshape to a fixed size during model training: the incoming tensor's size changes from batch to batch, so a fixed reshape like tf.reshape(updated_inputs, shape=fixed_shape) triggers exactly this problem (it was my problem too). Hope it helps.
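Following that observation, here is a hedged sketch of a pooling layer that avoids a fixed-size reshape altogether: multiply by the mask and divide by the token count, so every shape stays dynamic (MaskedMeanPool is a hypothetical replacement for the CustomPool above; the mask is assumed to mark real tokens with 1):

import tensorflow as tf

class MaskedMeanPool(tf.keras.layers.Layer):
    def call(self, x, mask):
        # x: (batch, n_tokens, 768); mask: (batch, n_tokens)
        mask = tf.cast(mask, tf.float32)[..., tf.newaxis]       # (batch, n_tokens, 1)
        summed = tf.reduce_sum(x * mask, axis=1)                # (batch, 768)
        counts = tf.maximum(tf.reduce_sum(mask, axis=1), 1.0)   # avoid division by zero
        return summed / counts                                  # (batch, 768)

Because the mean is computed per example, no reshape to a fixed batch size is needed.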
I would like to extract and store the dropout mask [array of 1/0s] from a dropout layer in a Sequential Keras model at each batch while training. I was wondering if there was a straightforward way to do this within Keras, or if I would need to switch over to TensorFlow (How to get the dropout mask in Tensorflow).
Would appreciate any help! I'm quite new to TensorFlow and Keras.
There are a couple of functions (dropout_layer.get_output_mask(), dropout_layer.get_input_mask()) for the dropout layer that I tried using, but I got None when calling them on the previous layer.
model = tf.keras.Sequential()
model.add(tf.keras.layers.Flatten(name="flat", input_shape=(28, 28, 1)))
model.add(tf.keras.layers.Dense(
    512,
    activation='relu',
    name='dense_1',
    kernel_initializer=tf.keras.initializers.GlorotUniform(seed=123),
    bias_initializer='zeros'))
dropout = tf.keras.layers.Dropout(0.2, name='dropout')  # want this layer's mask
model.add(dropout)
x = dropout.output_mask
y = dropout.input_mask
model.add(tf.keras.layers.Dense(
    10,
    activation='softmax',
    name='dense_2',
    kernel_initializer=tf.keras.initializers.GlorotUniform(seed=123),
    bias_initializer='zeros'))

model.compile(...)
model.fit(...)
It's not easily exposed in Keras; the call goes deep down until it reaches the TensorFlow dropout op.
So, although you're using Keras, the mask will also be a tensor in the graph that can be fetched by name (to find its name: In Tensorflow, get the names of all the Tensors in a graph).
This option, of course, will lack some Keras information; you would probably have to do it inside a Lambda layer so Keras adds certain information to the tensor. And you must take extra care because the tensor will exist even when not training (when the mask is skipped).
Now, you can also use a less hacky way that may cost a little processing:
def getMask(x):
    boolMask = tf.not_equal(x, 0)
    floatMask = tf.cast(boolMask, tf.float32)  # or tf.float64
    return floatMask

Use a Lambda(getMask)(output_of_dropout_layer).
But instead of using a Sequential model, you will need a functional API Model.
inputs = tf.keras.layers.Input((28, 28, 1))
outputs = tf.keras.layers.Flatten(name="flat")(inputs)
outputs = tf.keras.layers.Dense(
    512,
    # activation='relu', # relu will be a problem here
    name='dense_1',
    kernel_initializer=tf.keras.initializers.GlorotUniform(seed=123),
    bias_initializer='zeros')(outputs)
outputs = tf.keras.layers.Dropout(0.2, name='dropout')(outputs)
mask = Lambda(getMask)(outputs)
# there isn't an "input_mask"

# add the missing relu:
outputs = tf.keras.layers.Activation('relu')(outputs)
outputs = tf.keras.layers.Dense(
    10,
    activation='softmax',
    name='dense_2',
    kernel_initializer=tf.keras.initializers.GlorotUniform(seed=123),
    bias_initializer='zeros')(outputs)
model = Model(inputs, outputs)
model.compile(...)
model.fit(...)
Training and predicting
Since you can't train the masks (that wouldn't make any sense), they should not be outputs of the model for training.
Now, we could try this:
trainingModel = Model(inputs, outputs)
predictingModel = Model(inputs, [outputs, mask])
But masks don't exist in prediction, because dropout is only applied in training. So this doesn't bring us anything good in the end.
The only way for training is then using a dummy loss and dummy targets:
def dummyLoss(y_true, y_pred):
    return y_true  # this might evoke a "None" gradient problem, since it's not trainable and has no connection to any weights

model.compile(loss=[loss_for_main_output, dummyLoss], ...)
model.fit(x_train, [y_train, np.zeros((len(y_train),) + mask_shape)], ...)
It's not guaranteed that these will work.
I found a very hacky way to do this by trivially extending the provided dropout layer. (Almost all code from TF.)
import tensorflow as tf
from tensorflow.python.ops import array_ops, math_ops
from tensorflow.python.framework import tensor_shape

class MyDR(tf.keras.layers.Layer):
    def __init__(self, rate, **kwargs):
        super(MyDR, self).__init__(**kwargs)
        self.noise_shape = None
        self.rate = rate

    def _get_noise_shape(self, x, noise_shape=None):
        # If noise_shape is None, return immediately.
        if noise_shape is None:
            return array_ops.shape(x)
        try:
            # Best effort to figure out the intended shape.
            # If not possible, let the op handle it.
            # In eager mode the exception will show up.
            noise_shape_ = tensor_shape.as_shape(noise_shape)
        except (TypeError, ValueError):
            return noise_shape
        if x.shape.dims is not None and len(x.shape.dims) == len(noise_shape_.dims):
            new_dims = []
            for i, dim in enumerate(x.shape.dims):
                if noise_shape_.dims[i].value is None and dim.value is not None:
                    new_dims.append(dim.value)
                else:
                    new_dims.append(noise_shape_.dims[i].value)
            return tensor_shape.TensorShape(new_dims)
        return noise_shape

    def build(self, input_shape):
        self.noise_shape = input_shape
        print(self.noise_shape)
        super(MyDR, self).build(input_shape)

    @tf.function
    def call(self, input):
        self.noise_shape = self._get_noise_shape(input)
        random_tensor = tf.random.uniform(self.noise_shape, seed=1235, dtype=input.dtype)
        keep_prob = 1 - self.rate
        scale = 1 / keep_prob
        # NOTE: if (1.0 + rate) - 1 is equal to rate, then we want to consider that
        # float to be selected, hence we use a >= comparison.
        self.keep_mask = random_tensor >= self.rate
        # NOTE: here is where I save the binary masks.
        # The file grows quite big!
        tf.print(self.keep_mask, output_stream="file://temp/dropout_mask.txt")
        ret = input * scale * math_ops.cast(self.keep_mask, input.dtype)
        return ret
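A quick usage sketch of the layer above (untested; shapes follow the Sequential example from the question, and the layer name is illustrative):

inputs = tf.keras.layers.Input((28, 28, 1))
x = tf.keras.layers.Flatten()(inputs)
x = tf.keras.layers.Dense(512, activation='relu')(x)
x = MyDR(0.2, name='my_dropout')(x)  # writes its keep mask to the file on every call
outputs = tf.keras.layers.Dense(10, activation='softmax')(x)
model = tf.keras.Model(inputs, outputs)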
I'm trying to set the initial state of an encoder composed of a Bidirectional LSTM layer to zeros. However, if I pass in a single zero matrix, I get an error saying that a bidirectional layer has to be initialized with a list of tensors (which makes sense). When I duplicate this zero matrix into a list containing two of them (to initialize both RNNs), I get an error that the input shape is wrong. What am I missing here?
class Encoder(tf.keras.Model):
    def __init__(self, vocab_size, embedding_dim, enc_units, batch_sz):
        super(Encoder, self).__init__()
        self.batch_sz = batch_sz
        self.enc_units = enc_units
        self.embedding = keras.layers.Embedding(vocab_size, embedding_dim)
        self.lstmb = keras.layers.Bidirectional(lstm(self.enc_units, dropout=0.1))

    def call(self, x, hidden):
        x = self.embedding(x)
        output, forward_h, forward_c, backward_h, backward_c = self.lstmb(x, initial_state=[hidden, hidden])
        return output, forward_h, forward_c, backward_h, backward_c

def initialize_hidden_state(batch_sz, enc_units):
    return tf.zeros((batch_sz, enc_units))
The error I get is:
ValueError: An `initial_state` was passed that is not compatible with `cell.state_size`. Received `state_spec`=[InputSpec(shape=(128, 512), ndim=2)]; however `cell.state_size` is [512, 512]
Note: the output of the function initialize_hidden_state is fed to the parameter hidden for the call function.
Reading all of the comments and answers, I think I managed to create a working example.
But first some notes:
I think the call to self.lstmb will only return all five states if you specify it in the LSTM's constructor.
I don't think you need to pass the hidden state as a list of hidden states. You should just pass it as the initial state.
class Encoder(tf.keras.Model):
    def __init__(self, vocab_size, embedding_dim, enc_units, batch_sz):
        super(Encoder, self).__init__()
        self.batch_sz = batch_sz
        self.enc_units = enc_units
        self.embedding = tf.keras.layers.Embedding(vocab_size, embedding_dim)
        # tell the LSTM you want the states and sequences returned
        self.lstmb = tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(self.enc_units,
                                                                        return_sequences=True,
                                                                        return_state=True,
                                                                        dropout=0.1))

    def call(self, x, hidden):
        x = self.embedding(x)
        # no need to pass [hidden, hidden], just pass it as is
        output, forward_h, forward_c, backward_h, backward_c = self.lstmb(x, initial_state=hidden)
        return output, forward_h, forward_c, backward_h, backward_c

    def initialize_hidden_state(self):
        # I stole this idea from iamlcc, so the credit is not mine.
        return [tf.zeros((self.batch_sz, self.enc_units)) for i in range(4)]
encoder = Encoder(vocab_inp_size, embedding_dim, units, BATCH_SIZE)

# sample input
sample_hidden = encoder.initialize_hidden_state()
sample_output, forward_h, forward_c, backward_h, backward_c = encoder(example_input_batch, sample_hidden)
print('Encoder output shape: (batch size, sequence length, units) {}'.format(sample_output.shape))
print('Encoder forward_h shape: (batch size, units) {}'.format(forward_h.shape))
print('Encoder forward_c shape: (batch size, units) {}'.format(forward_c.shape))
print('Encoder backward_h shape: (batch size, units) {}'.format(backward_h.shape))
print('Encoder backward_c shape: (batch size, units) {}'.format(backward_c.shape))
You are inputting a state size of (batch_size, hidden_units), and you should input a state with size (hidden_units, hidden_units). Also, it has to have 4 initial states: two for the two LSTM states (h and c), and two more because you have one forward and one backward pass due to the Bidirectional wrapper.
Try and change this:
def initialize_hidden_state(batch_sz, enc_units):
    return tf.zeros((batch_sz, enc_units))
To
def initialize_hidden_state(enc_units, enc_units):
    init_state = [np.zeros((enc_units, enc_units)) for i in range(4)]
    return init_state
Hope this helps
If it's not too late, I think your initialize_hidden_state function should be:
def initialize_hidden_state(self):
    init_state = [tf.zeros((self.batch_sz, self.enc_units)) for i in range(4)]
    return init_state
@BCJuan has the right answer, but I had to make some changes to make it work:
def initialize_hidden_state(batch_sz, enc_units):
    init_state = [tf.zeros((batch_sz, enc_units)) for i in range(2)]
    return init_state
Very important: use tf.zeros, not np.zeros, since it is expecting a tf.Tensor type.
If you are using a single LSTM layer in the Bidirectional wrapper, you need to return a list of two tensors to init each RNN: one for the forward pass, and one for the backward pass.
Also, if you look at an example in TF's documentation, they use batch_sz and enc_units to specify the size of the hidden state.
I ended up not using the Bidirectional wrapper, and just created two LSTM layers, one of them receiving the parameter go_backwards=True, and concatenated the outputs, in case it helps anyone.
I think the Bidirectional Keras wrapper can't handle this sort of thing at the moment.
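For completeness, a hedged sketch of that manual bidirectional setup (names are illustrative; h0 and c0 are assumed to be zero tensors of shape (batch_sz, units)):

fwd = tf.keras.layers.LSTM(units, return_sequences=True, return_state=True)
bwd = tf.keras.layers.LSTM(units, return_sequences=True, return_state=True,
                           go_backwards=True)
out_f, h_f, c_f = fwd(x, initial_state=[h0, c0])
out_b, h_b, c_b = bwd(x, initial_state=[h0, c0])
out_b = tf.reverse(out_b, axis=[1])  # go_backwards emits the sequence reversed
output = tf.keras.layers.Concatenate()([out_f, out_b])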
I constructed my encoder with tf.keras.Model and met the same error. This PR may help you.
Finally I built my model with tf.keras.layers.Layer, and I'm still working on it. I'll update once I succeed!