Input to reshape doesn't match requested shape - python

I know others have posted similar questions already, but I couldn't find a solution that was appropriate here.
I've written a custom Keras layer to average the outputs from DistilBert based on a mask. That is, the input has dim=[batch_size, n_tokens_out, 768], and I want to average along n_tokens_out according to a mask of dim=[batch_size, n_tokens_out]. The output should be dim=[batch_size, 768]. Here's the code for the layer:
class CustomPool(tf.keras.layers.Layer):
    def __init__(self, output_dim, **kwargs):
        self.output_dim = output_dim
        super(CustomPool, self).__init__(**kwargs)

    def call(self, x, mask):
        masked = tf.cast(tf.boolean_mask(x, mask=mask, axis=0), tf.float32)
        mn = tf.reduce_mean(masked, axis=1, keepdims=True)
        return tf.reshape(mn, (tf.shape(x)[0], self.output_dim))

    def compute_output_shape(self, input_shape):
        return (input_shape[0], self.output_dim)
The model compiles without error, but as soon as the training starts I get this error:
InvalidArgumentError: 2 root error(s) found.
(0) Invalid argument: Input to reshape is a tensor with 967 values, but the requested shape has 12288
[[node pooled_distilBert/CustomPooling/Reshape (defined at <ipython-input-245-a498c2817fb9>:13) ]]
[[assert_greater_equal/Assert/AssertGuard/pivot_f/_3/_233]]
(1) Invalid argument: Input to reshape is a tensor with 967 values, but the requested shape has 12288
[[node pooled_distilBert/CustomPooling/Reshape (defined at <ipython-input-245-a498c2817fb9>:13) ]]
0 successful operations.
0 derived errors ignored. [Op:__inference_train_function_211523]
Errors may have originated from an input operation.
Input Source operations connected to node pooled_distilBert/CustomPooling/Reshape:
pooled_distilBert/CustomPooling/Mean (defined at <ipython-input-245-a498c2817fb9>:11)
Input Source operations connected to node pooled_distilBert/CustomPooling/Reshape:
pooled_distilBert/CustomPooling/Mean (defined at <ipython-input-245-a498c2817fb9>:11)
The dimensions I get back are smaller than the expected dimensions, which is strange to me.
Here is what the model looks like (TFDistilBertModel is from the huggingface transformers library):
dbert_layer = TFDistilBertModel.from_pretrained('distilbert-base-uncased')
in_id = tf.keras.layers.Input(shape=(seq_max_length,), dtype='int32', name="input_ids")
in_mask = tf.keras.layers.Input(shape=(seq_max_length,), dtype='int32', name="input_masks")
dbert_inputs = [in_id, in_mask]
dbert_output = dbert_layer(dbert_inputs)[0]
x = CustomPool(output_dim = dbert_output.shape[2], name='CustomPooling')(dbert_output, in_mask)
dense1 = tf.keras.layers.Dense(256, activation = 'relu', name='dense256')(x)
pred = tf.keras.layers.Dense(n_classes, activation='softmax', name='MODEL_OUT')(dense1)
model = tf.keras.models.Model(inputs = dbert_inputs, outputs = pred, name='pooled_distilBert')
Any help here would be greatly appreciated. I had a look through existing questions, but most end up being solved by specifying an input shape, which is not applicable in my case.

Using tf.reshape before a pooling layer
I know my answer is kind of late, but I want to share my solution to this problem. The issue is reshaping to a fixed size during model training: tf.boolean_mask keeps a variable number of elements, so the tensor's size changes from batch to batch, and a fixed reshape like tf.reshape(updated_inputs, shape=fixed_shape) will trigger exactly this error (it was my problem too). Hope it helps.
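For this specific layer, one way to avoid the variable-size reshape altogether is to compute the masked mean with a multiply-and-sum instead of tf.boolean_mask. A minimal sketch (the layer name here is mine, not from the original post), assuming the mask is 1 for real tokens and 0 for padding:

import tensorflow as tf

class MaskedMeanPool(tf.keras.layers.Layer):
    def call(self, x, mask):
        # x: [batch, n_tokens, 768]; mask: [batch, n_tokens], 1 = keep
        mask = tf.cast(tf.expand_dims(mask, axis=-1), tf.float32)  # [batch, n_tokens, 1]
        summed = tf.reduce_sum(x * mask, axis=1)                   # [batch, 768]
        counts = tf.reduce_sum(mask, axis=1)                       # [batch, 1]
        return summed / tf.maximum(counts, 1.0)                    # guard against all-pad rows

The output is [batch_size, 768] by construction, so no reshape (and no compute_output_shape trickery) is needed.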

Related

fit() method returns ValueError when using Keras subclassing API

I am trying to tidy up my code by moving from the Keras functional API to the subclassing API. The class I came up with so far is below:
class FeedForwardNN(Model):
    def __init__(self, params):
        super().__init__()
        self.params = params
        self.layout = params['layout']
        # Define layers
        self.dense = Dense(units=params['layout'][1],
                           activation=params['activation'],
                           kernel_initializer=params['initializer'])
        self.output_layer = Dense(units=params['layout'][-1],
                                  kernel_initializer=params['initializer'])
        self.dropout = Dropout(params['dropout'])
        self.batch_norm = BatchNormalization()

    def call(self, x):
        for layer in self.layout[1:-1]:
            x = self.dropout(self.dense(x))
            if self.params['batch_norm']:
                x = self.batch_norm(x)
        x = self.output_layer(x)
        return x
Where layout is a list of the neurons in each layer (including input and output layers).
However, when fitting the model, the following error is raised:
ValueError: Input 0 of layer "dense" is incompatible with the layer: expected axis -1 of input shape to have value 5, but received input with shape (None, 100)
Call arguments received:
• x=tf.Tensor(shape=(None, 5), dtype=float32)
which seems to occur on the line:
x = self.dropout(self.dense(x))
I checked the shape of the training data X that is passed to the fit() method, and it appears to have the right shape i.e. (number of observations, number of predictors).
Does anyone have an idea of where my mistake is?
The problem is that you are using the same self.dense layer over and over again in your for loop:
for layer in self.layout[1:-1]:
    x = self.dropout(self.dense(x))
After the first iteration, x has shape (batch, 100). Then in the second iteration, instead of passing this x to a second Dense layer (which you don't seem to have created in the first place), you re-pass it to the first Dense layer, which expects shape (batch, 5), causing the error.
You can create a list of Dense layers in __init__ as follows:
self.denses = [Dense(units=units,
                     activation=params['activation'],
                     kernel_initializer=params['initializer'])
               for units in self.layout[1:-1]]
and call them in sequence:
for dense_layer in self.denses:
    x = self.dropout(dense_layer(x))
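Putting it together, a minimal sketch of the corrected class, assuming params carries the same keys as in the question ('layout', 'activation', 'initializer', 'dropout', 'batch_norm'). Note that BatchNormalization is given the same per-layer treatment here, since sharing one instance across hidden layers of different widths would fail for the same reason as the shared Dense:

import tensorflow as tf
from tensorflow.keras import Model
from tensorflow.keras.layers import Dense, Dropout, BatchNormalization

class FeedForwardNN(Model):
    def __init__(self, params):
        super().__init__()
        self.params = params
        # one distinct Dense (and BatchNormalization) per hidden layer
        self.denses = [Dense(units=units,
                             activation=params['activation'],
                             kernel_initializer=params['initializer'])
                       for units in params['layout'][1:-1]]
        self.batch_norms = [BatchNormalization() for _ in params['layout'][1:-1]]
        self.output_layer = Dense(units=params['layout'][-1],
                                  kernel_initializer=params['initializer'])
        self.dropout = Dropout(params['dropout'])

    def call(self, x):
        for dense_layer, batch_norm in zip(self.denses, self.batch_norms):
            x = self.dropout(dense_layer(x))
            if self.params['batch_norm']:
                x = batch_norm(x)
        return self.output_layer(x)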

Keras custom data generator giving dimension errors with multi input and multi output (functional API model)

I have written a generator function with Keras. Before returning X, y from __getitem__ I double-checked the shapes of the X's and y's and they are alright, but the generator keeps giving dimension-mismatch errors and warnings.
(Colab Code to reproduce: https://colab.research.google.com/drive/1bSJm44MMDCWDU8IrG2GXKBvXNHCuY70G?usp=sharing)
My training and validation generators are pretty much the same:
class ValidGenerator(Sequence):
    def __init__(self, df, batch_size=64, num_classes=None, shuffle=True):
        self.batch_size = batch_size
        self.df = df
        self.indices = self.df.index.tolist()
        self.num_classes = num_classes
        self.shuffle = shuffle
        self.on_epoch_end()

    def __len__(self):
        return int(len(self.indices) // self.batch_size)

    def __getitem__(self, index):
        index = self.index[index * self.batch_size:(index + 1) * self.batch_size]
        batch = [self.indices[k] for k in index]
        X, y = self.__get_data(batch)
        return X, y

    def on_epoch_end(self):
        self.index = np.arange(len(self.indices))
        if self.shuffle == True:
            np.random.shuffle(self.index)

    def __get_data(self, batch):
        # some logic is written here
        # that prepares 3 X features and 3 y outputs
        X = [input_array_1, input_array_2, input_array_3]
        y = [out_1, out_2, out_3]
        # print(len(X))
        return X, y
I return a tuple X, y from __getitem__; X has 3 input features and y has 3 outputs, so the shape of X is (3, 32, 10, 1).
I am using the functional API to build the model (I have things like concatenation and multiple inputs/outputs, which isn't possible with the Sequential API).
When I try to fit the model with the generator using the following code
train_datagen = TrainGenerator(df=train_df, batch_size=32, num_classes=None, shuffle=True)
valid_datagen = ValidGenerator(df=train_df, batch_size=32, num_classes=None, shuffle=True)
model.fit(train_datagen, epochs=2,verbose=1,callbacks=[checkpoint,es])
I get these warnings and errors, which don't go away:
Epoch 1/2
WARNING:tensorflow:Model was constructed with shape (None, 10) for input Tensor("input_1:0", shape=(None, 10), dtype=float32), but it was called on an input with incompatible shape (None, None, None).
WARNING:tensorflow:Model was constructed with shape (None, 10) for input Tensor("input_2:0", shape=(None, 10), dtype=float32), but it was called on an input with incompatible shape (None, None, None).
WARNING:tensorflow:Model was constructed with shape (None, 10) for input Tensor("input_3:0", shape=(None, 10), dtype=float32), but it was called on an input with incompatible shape (None, None, None).
...
...
call
return super(RNN, self).call(inputs, **kwargs)
/home/eduardo/.virtualenvs/kgpu3/lib/python3.8/site-packages/tensorflow/python/keras/engine/base_layer.py:975
call
input_spec.assert_input_compatibility(self.input_spec, inputs,
/home/eduardo/.virtualenvs/kgpu3/lib/python3.8/site-packages/tensorflow/python/keras/engine/input_spec.py:176
assert_input_compatibility
raise ValueError('Input ' + str(input_index) + ' of layer ' +
ValueError: Input 0 of layer lstm is incompatible with the layer: expected ndim=3, found ndim=4. Full shape received: [None, None, None, 88]
I have rechecked the whole code and it isn't possible to have an input of (None, None, None) as in the warning or the error; my input dimension is (3, 32, 10, 1).
Update
I have also tried to write a generator function in plain Python and got exactly the same error.
My generator function
def generate_arrays_from_file(batchsize, df):
    inputs = []
    targets = []
    batchcount = 0
    while True:
        df3 = df.loc[np.arange(batchcount * batchsize, (batchcount * batchsize) + batchsize)]
        # Some pre-processing
        X = [input_array_1, input_array_2, input_array_3]
        y = [out_1, out_2, out_3]
        yield X, y
        batchcount = batchcount + 1
It seems like something is wrong internally with Keras (maybe due to the fact that I am using the functional API).
Update 2
I also tried outputting tuples:
X = (input1_X,input2_X,input3_X)
y = (output1_y,output2_y,output3_y)
and also named inputs/outputs, but it doesn't work:
X = {"input_1": input1_X, "input_2": input2_X,"input_3": input3_X}
y = {"output_1": output1_y, "output_2": output2_y,"output_3": output3_y}
Note about problem formulation:
Changing the individual X features to shape (32, 10) instead of (32, 10, 1) might get rid of this error, but that is not what I want: it changes my problem (I would no longer have 10 time steps with one feature each).
Keras uses None for dynamic dimensions.
As you can see in the model.summary() chart, the model expects shape (None, 10) for each of your inputs, which is two-dimensional; with the batch dimension you should be feeding three-dimensional data to the model, but you are feeding four-dimensional data.
I would guess that your model doesn't split your input list into three inputs. Try changing your inputs to a tuple:
X = (input_array_1,input_array_2,input_array_3)
In order to resolve this error:
ValueError: Input 0 of layer lstm is incompatible with the layer: expected ndim=3, found ndim=4. Full shape received: [None, None, None, 88]
TrainGenerator should be changed in the following way.
Current code:
input1_X = np.array(df3['input1_X'].to_list()).reshape(dlen,pad_len,1)
input2_X = np.array(df3['input2_X'].to_list()).reshape(dlen,pad_len,1)
input3_X = np.array(df3['input3_X'].to_list()).reshape(dlen,pad_len,1)
Should be changed to:
input1_X = np.array(df3['input1_X'].to_list()).reshape(dlen,pad_len)
input2_X = np.array(df3['input2_X'].to_list()).reshape(dlen,pad_len)
input3_X = np.array(df3['input3_X'].to_list()).reshape(dlen,pad_len)
The reason is that each of the 3 Inputs expects a 2-dimensional array, but the generator provides a 3-dimensional one. The expected shape is (batch_size, 10).
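To make that expectation concrete, here is a minimal sketch (the model and shapes are illustrative, not from the question): the shape passed to Input() excludes the batch dimension, so Input(shape=(10,)) expects each batch to be (batch_size, 10):

import numpy as np
import tensorflow as tf

in1 = tf.keras.layers.Input(shape=(10,), name="input_1")
in2 = tf.keras.layers.Input(shape=(10,), name="input_2")
in3 = tf.keras.layers.Input(shape=(10,), name="input_3")
merged = tf.keras.layers.concatenate([in1, in2, in3])
out = tf.keras.layers.Dense(1)(merged)
model = tf.keras.Model([in1, in2, in3], out)

batch = [np.zeros((32, 10), dtype="float32") for _ in range(3)]  # 3 arrays of (batch, 10)
print(model(batch).shape)  # (32, 1)

If you really do want 10 time steps of one feature each, the inputs themselves would have to be declared as Input(shape=(10, 1)) instead, so that (batch, 10, 1) arrays match.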
I had a similar issue with a custom generator that just had to pass a numpy array of size 10 as input and one single output.
To solve this problem I had to transform the shapes of the 2 vectors passed to the neural network like this:
def slides_generator(integer_list):
    # stuff happens
    x = np_ts[np_index:np_index + 10]  # numpy array
    y = np_ts[np_index + 10]           # numpy array
    yield tf.convert_to_tensor(x)[np.newaxis, ...], tf.convert_to_tensor(y)[np.newaxis, ...]

doge_gen = slides_generator(integer_list)  # next(doge_gen)
Basically you need to pass the 2 arrays with shape (None, size); in my case these were (None, 10) and (None, 1), and to achieve this I just passed 2 reshaped tensors. You need the None dimension as the batch size.

How to reshape keras mask within custom layer

Note: I posted about this issue already here. I'm creating a new question because:
1. I think the issue specifically relates to reshaping my mask within my custom layer, but I'm not sure enough of that to completely ignore the other error I wrote about in the original post.
2. There are many posts about reshaping Keras layers or adding Masking layers, but I couldn't find any about reshaping a mask within a layer, so I hope this post can be useful more generally.
The issue:
I have a custom Keras layer that takes 2D input and returns 3D output (batch_size, max_length, 1024), which is passed on to a BiLSTM followed by a CRF.
The custom Keras layer is copied from this repository. The difference is I take the 'elmo' instead of 'default' outputs from the Elmo model, so that the output is 3D as required by the BiLSTM:
result = self.elmo(K.squeeze(K.cast(x, tf.string), axis=1),
                   as_dict=True,
                   signature='default',
                   )['elmo']  # The original code used 'default'
However, the compute_mask function isn't appropriate for my architecture, as its output is 2D. Thus I get the error:
InvalidArgumentError: Incompatible shapes: [32,47] vs. [32,0] [[{{node loss/crf_1_loss/mul_6}}]]
where 32 is the batch size and 47 is one less than my specified max_length.
I'm sure I need to reshape the mask, but I couldn't find out anywhere how.
Happy to make a git repo with the whole thing and/or full stack trace if need be.
Custom ELMo Layer:
class ElmoEmbeddingLayer(Layer):
    def __init__(self, **kwargs):
        self.dimensions = 1024
        self.trainable = True
        super(ElmoEmbeddingLayer, self).__init__(**kwargs)

    def build(self, input_shape):
        self.elmo = hub.Module('https://tfhub.dev/google/elmo/2',
                               trainable=self.trainable,
                               name="{}_module".format(self.name))
        self.trainable_weights += K.tf.trainable_variables(scope="^{}_module/.*".format(self.name))
        super(ElmoEmbeddingLayer, self).build(input_shape)

    def call(self, x, mask=None):
        result = self.elmo(K.squeeze(K.cast(x, tf.string), axis=1),
                           as_dict=True, signature='default',)['elmo']
        return result

    # Original compute_mask function. Raises:
    # InvalidArgumentError: Incompatible shapes: [32,47] vs. [32,0] [[{{node loss/crf_1_loss/mul_6}}]]
    def compute_mask(self, inputs, mask=None):
        return K.not_equal(inputs, '__PAD__')

    def compute_output_shape(self, input_shape):
        return input_shape[0], 48, self.dimensions
The model is built as follows:
def build_model():  # uses crf from keras_contrib
    input = layers.Input(shape=(1,), dtype=tf.string)
    model = ElmoEmbeddingLayer(name='ElmoEmbeddingLayer')(input)
    model = Bidirectional(LSTM(units=512, return_sequences=True))(model)
    crf = CRF(num_tags)
    out = crf(model)
    model = Model(input, out)
    model.compile(optimizer="rmsprop", loss=crf_loss,
                  metrics=[crf_accuracy, categorical_accuracy, mean_squared_error])
    model.summary()
    return model

How to set an initial state for a Bidirectional LSTM Layer in Keras?

I'm trying to set the initial state of an encoder, which is composed of a Bidirectional LSTM layer, to zeros. However, if I pass in a single matrix of zeros I get an error saying that a bidirectional layer has to be initialized with a list of tensors (which makes sense). When I try duplicating this zero matrix into a list containing two of them (to initialize both RNNs), I get an error that the input shape is wrong. What am I missing here?
class Encoder(tf.keras.Model):
    def __init__(self, vocab_size, embedding_dim, enc_units, batch_sz):
        super(Encoder, self).__init__()
        self.batch_sz = batch_sz
        self.enc_units = enc_units
        self.embedding = keras.layers.Embedding(vocab_size, embedding_dim)
        self.lstmb = keras.layers.Bidirectional(lstm(self.enc_units, dropout=0.1))

    def call(self, x, hidden):
        x = self.embedding(x)
        output, forward_h, forward_c, backward_h, backward_c = self.lstmb(x, initial_state=[hidden, hidden])
        return output, forward_h, forward_c, backward_h, backward_c

def initialize_hidden_state(batch_sz, enc_units):
    return tf.zeros((batch_sz, enc_units))
The error I get is:
ValueError: An `initial_state` was passed that is not compatible with `cell.state_size`. Received `state_spec`=[InputSpec(shape=(128, 512), ndim=2)]; however `cell.state_size` is [512, 512]
Note: the output of the function initialize_hidden_state is fed to the parameter hidden for the call function.
Reading all of the comments and answers, I think I managed to create a working example.
But first some notes:
I think the call to self.lstmb will only return all five outputs if you specify return_state=True (and return_sequences=True) in the LSTM's constructor.
I don't think you need to duplicate the hidden state into [hidden, hidden]. You should just pass the list of initial states as it is.
class Encoder(tf.keras.Model):
    def __init__(self, vocab_size, embedding_dim, enc_units, batch_sz):
        super(Encoder, self).__init__()
        self.batch_sz = batch_sz
        self.enc_units = enc_units
        self.embedding = tf.keras.layers.Embedding(vocab_size, embedding_dim)
        # tell the LSTM you want the states and sequences returned
        self.lstmb = tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(self.enc_units,
                                                                        return_sequences=True,
                                                                        return_state=True,
                                                                        dropout=0.1))

    def call(self, x, hidden):
        x = self.embedding(x)
        # no need to pass [hidden, hidden], just pass it as is
        output, forward_h, forward_c, backward_h, backward_c = self.lstmb(x, initial_state=hidden)
        return output, forward_h, forward_c, backward_h, backward_c

    def initialize_hidden_state(self):
        # I stole this idea from iamlcc, so the credit is not mine.
        return [tf.zeros((self.batch_sz, self.enc_units)) for i in range(4)]

encoder = Encoder(vocab_inp_size, embedding_dim, units, BATCH_SIZE)

# sample input
sample_hidden = encoder.initialize_hidden_state()
sample_output, forward_h, forward_c, backward_h, backward_c = encoder(example_input_batch, sample_hidden)
print('Encoder output shape: (batch size, sequence length, units) {}'.format(sample_output.shape))
print('Encoder forward_h shape: (batch size, units) {}'.format(forward_h.shape))
print('Encoder forward_c shape: (batch size, units) {}'.format(forward_c.shape))
print('Encoder backward_h shape: (batch size, units) {}'.format(backward_h.shape))
print('Encoder backward_c shape: (batch size, units) {}'.format(backward_c.shape))
You are inputting a state size of (batch_size, hidden_units), but you should input a state of size (hidden_units, hidden_units). Also, it has to have 4 initial states: 2 for the 2 LSTM states, and 2 more because you have one forward and one backward pass due to the bidirectional wrapper.
Try changing this:
def initialize_hidden_state(batch_sz, enc_units):
    return tf.zeros((batch_sz, enc_units))
to:
def initialize_hidden_state(enc_units):
    init_state = [np.zeros((enc_units, enc_units)) for i in range(4)]
    return init_state
Hope this helps
If it's not too late, I think your initialize_hidden_state function should be:
def initialize_hidden_state(self):
    init_state = [tf.zeros((self.batch_sz, self.enc_units)) for i in range(4)]
    return init_state
@BCJuan has the right answer, but I had to make some changes to make it work:
def initialize_hidden_state(batch_sz, enc_units):
    init_state = [tf.zeros((batch_sz, enc_units)) for i in range(2)]
    return init_state
Very important: use tf.zeros, not np.zeros, since a tf.Tensor is expected. If you are using a single LSTM layer in the Bidirectional wrapper, you need to return a list of 2 tf.Tensors to init each RNN: one for the forward pass and one for the backward pass. Also, if you look at an example in TF's documentation, they use batch_sz and enc_units to specify the size of the hidden state.
In case it helps anyone: I ended up not using the Bidirectional wrapper, and instead created 2 LSTM layers, one of them receiving the parameter go_backwards=True, and concatenated their outputs. I think the Bidirectional Keras wrapper can't handle this sort of thing at the moment.
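For reference, a minimal sketch of that alternative (the shapes here are illustrative); the backward sequence is reversed before concatenating, which is what Bidirectional does internally:

import tensorflow as tf

inputs = tf.keras.layers.Input(shape=(None, 64))  # (batch, time, features)
fwd_out = tf.keras.layers.LSTM(512, return_sequences=True)(inputs)  # an initial_state list can be passed here
bwd_out = tf.keras.layers.LSTM(512, return_sequences=True,
                               go_backwards=True)(inputs)            # ...and a separate one here
# go_backwards=True emits the sequence in reverse time order, so flip it back
bwd_out = tf.keras.layers.Lambda(lambda t: tf.reverse(t, axis=[1]))(bwd_out)
outputs = tf.keras.layers.Concatenate()([fwd_out, bwd_out])
model = tf.keras.Model(inputs, outputs)

This way each LSTM takes its own 2-element initial_state list ([h, c]), which sidesteps the 4-state bookkeeping of the wrapper.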
I constructed my encoder with tf.keras.Model and met the same error; this PR may help you. In the end I built my model with tf.keras.layers.Layer, and I'm still working on it. I'll update once I succeed!

How does batching work in a seq2seq model in pytorch?

I am trying to implement a seq2seq model in PyTorch and I am having some problems with the batching.
For example I have a batch of data whose dimensions are
[batch_size, sequence_lengths, encoding_dimension]
where the sequence lengths are different for each example in the batch.
Now, I managed to do the encoding part by padding each element in the batch to the length of the longest sequence.
This way, if I feed my net a batch with the shape described above, I get the following outputs:
output, of shape [batch_size, sequence_lengths, hidden_layer_dimension]
hidden state, of shape [batch_size, hidden_layer_dimension]
cell state, of shape [batch_size, hidden_layer_dimension]
Now, from the output, I take for each sequence the last relevant element, that is, the element along the sequence_lengths dimension corresponding to the last non-padded element of the sequence. Thus the final output I get is of shape [batch_size, hidden_layer_dimension].
But now I have the problem of decoding from this vector. How do I handle decoding sequences of different lengths in the same batch? I tried to google it and found this, but it doesn't seem to address the problem. I thought of going element by element for the whole batch, but then there's the problem of passing the initial hidden states, given that the ones from the encoder will be of shape [batch_size, hidden_layer_dimension], while the ones for the decoder will be of shape [1, hidden_layer_dimension].
Am I missing something? Thanks for the help!
You are not missing anything. I can help you, since I have worked on several sequence-to-sequence applications using PyTorch. I am giving you a simple example below.
class Seq2Seq(nn.Module):
    """A Seq2seq network trained on predicting the next query."""

    def __init__(self, dictionary, embedding_index, args):
        super(Seq2Seq, self).__init__()
        self.config = args
        self.num_directions = 2 if self.config.bidirection else 1
        self.embedding = EmbeddingLayer(len(dictionary), self.config)
        self.embedding.init_embedding_weights(dictionary, embedding_index, self.config.emsize)
        self.encoder = Encoder(self.config.emsize, self.config.nhid_enc, self.config.bidirection, self.config)
        self.decoder = Decoder(self.config.emsize, self.config.nhid_enc * self.num_directions, len(dictionary),
                               self.config)

    @staticmethod
    def compute_decoding_loss(logits, target, seq_idx, length):
        losses = -torch.gather(logits, dim=1, index=target.unsqueeze(1)).squeeze()
        mask = helper.mask(length, seq_idx)  # mask: batch x 1
        losses = losses * mask.float()
        num_non_zero_elem = torch.nonzero(mask.data).size()
        if not num_non_zero_elem:
            return losses.sum(), 0
        else:
            return losses.sum(), num_non_zero_elem[0]

    def forward(self, q1_var, q1_len, q2_var, q2_len):
        # encode the query
        embedded_q1 = self.embedding(q1_var)
        encoded_q1, hidden = self.encoder(embedded_q1, q1_len)

        if self.config.bidirection:
            if self.config.model == 'LSTM':
                h_t, c_t = hidden[0][-2:], hidden[1][-2:]
                decoder_hidden = torch.cat((h_t[0].unsqueeze(0), h_t[1].unsqueeze(0)), 2), torch.cat(
                    (c_t[0].unsqueeze(0), c_t[1].unsqueeze(0)), 2)
            else:
                h_t = hidden[0][-2:]
                decoder_hidden = torch.cat((h_t[0].unsqueeze(0), h_t[1].unsqueeze(0)), 2)
        else:
            if self.config.model == 'LSTM':
                decoder_hidden = hidden[0][-1], hidden[1][-1]
            else:
                decoder_hidden = hidden[-1]

        decoding_loss, total_local_decoding_loss_element = 0, 0
        for idx in range(q2_var.size(1) - 1):
            input_variable = q2_var[:, idx]
            embedded_decoder_input = self.embedding(input_variable).unsqueeze(1)
            decoder_output, decoder_hidden = self.decoder(embedded_decoder_input, decoder_hidden)
            local_loss, num_local_loss = self.compute_decoding_loss(decoder_output, q2_var[:, idx + 1], idx, q2_len)
            decoding_loss += local_loss
            total_local_decoding_loss_element += num_local_loss

        if total_local_decoding_loss_element > 0:
            decoding_loss = decoding_loss / total_local_decoding_loss_element

        return decoding_loss
You can see the complete source code here. This application is about predicting users' next web-search query given the current web-search query.
The answer to your question:
How do I handle a decoding of sequences of different lengths in the same batch?
You have padded sequences, so you can treat all the sequences as having the same length. But when you compute the loss, you need to ignore the loss for the padded terms using masking.
I have used a masking technique to achieve the same in the above example.
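A tiny sketch of that masking idea in isolation (the shapes are illustrative), assuming per-token losses of shape (batch, seq_len) and the true length of each sequence:

import torch

per_token_loss = torch.rand(4, 7)      # loss at every position, (batch, seq_len)
lengths = torch.tensor([7, 5, 3, 6])   # real (unpadded) length of each sequence
# 1.0 where the position is a real token, 0.0 where it is padding
mask = (torch.arange(7).unsqueeze(0) < lengths.unsqueeze(1)).float()
loss = (per_token_loss * mask).sum() / mask.sum()  # average over real tokens only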
Also, you are absolutely correct: you need to decode element by element for the mini-batches. The initial decoder state of shape [batch_size, hidden_layer_dimension] is also fine; you just need to unsqueeze it at dimension 0 to make it [1, batch_size, hidden_layer_dimension].
Please note, you do not need to loop over each example in the batch; you can execute the whole batch at a time, but you need to loop over the elements of the sequences.
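And the unsqueeze step itself, assuming an encoder state of shape (batch_size, hidden_layer_dimension):

import torch

batch_size, hidden_dim = 32, 512
encoder_state = torch.zeros(batch_size, hidden_dim)  # final encoder hidden state
decoder_state = encoder_state.unsqueeze(0)           # (1, batch_size, hidden_dim)
print(decoder_state.shape)                           # torch.Size([1, 32, 512])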
