I am trying to check whether evaluating MobileNet's feature vector on a mini-batch gives the same result as feeding the elements of the mini-batch one by one.
Look at the following code:
import tensorflow as tf
import tensorflow_hub as hub

model = tf.keras.models.Sequential(
    (
        hub.KerasLayer("https://tfhub.dev/google/tf2-preview/mobilenet_v2/feature_vector/4",
                       output_shape=[1280],
                       trainable=False),
    )
)
images = tf.random.uniform(shape=(20, 224, 224, 3))
features = model.predict(images)
for i in range(20):
    image = tf.reshape(images[i, ...], (1, 224, 224, 3))
    image_feature = model.predict(image)
    self.assertTrue((image_feature == features[i, ...]).all())
The assertTrue fails in my test. Shouldn't it give the same feature vector for every image, whether the images are fed as a mini-batch or one by one?
I guess it has something to do with which means and variances (of the BatchNormalization layers) the model is using. If the moving means and variances from the training stage are used (and they should be, IMO), the single-image outputs should be exactly the same as the mini-batch output. A difference <= 10e-4 is still large enough to give inconsistent predictions.
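As a minimal sketch of that check (using the model and images from the snippet above), you can force training=False explicitly so BatchNormalization uses its moving statistics, and compare with an explicit tolerance rather than exact equality:
import numpy as np

# training=False is already the default for inference, but being explicit
# rules out batch statistics as the cause of the difference.
batch_features = model(images, training=False).numpy()
single_features = model(images[:1], training=False).numpy()

# Compare with a tolerance; bit-for-bit equality across different batch sizes
# is generally not guaranteed, since different batch sizes can take different
# computation paths in the underlying kernels.
np.testing.assert_allclose(batch_features[0], single_features[0], rtol=1e-5, atol=1e-5)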
Here is my setup:
I have an autoencoder model which generates a new grayscale image (1, 256, 256, 1) by mixing three input images (3, 256, 256, 1). This works quite well; however, I gave up on batching, so in every training step the gradient is calculated on a single data sample instead of a whole batch.
To train on batches, I wrote a custom data loader based on tf.keras.utils.Sequence to get batches of shape (bs, 3, 256, 256, 1).
Further, I want to train the autoencoder together with a discriminator, so I built one and created a "GAN-based model" to train both alternately. Here is the code for it:
full_model = GanBasedModel(
    autoencoder.input, discriminator(autoencoder(autoencoder.input)))
In my GAN-based model, I customized the train_step function like this:
@tf.function
def train_step(self, train_data):
    generated = []
    real_images = []
    for train_input in train_data:
        generated.append(autoencoder(train_input))
        # some code to get real_images
    generated_images = tf.stack(generated)
    # some more code
So I got this error InaccessibleTensorError: tf.Graph captured an external symbolic tensor. The symbolic tensor <tf.Tensor 'while/sequential/decoder/residual_block_16/StatefulPartitionedCall:0' shape=(1, 256, 256, 1) dtype=float32> is captured by FuncGraph(name=train_step, id=140089464958688), but it is defined at FuncGraph(name=while_body_12507, id=140089463711200). A tf.Graph is not allowed to capture symoblic tensors from another graph. Use return values, explicit Python locals or TensorFlow collections to access it. Please see https://www.tensorflow.org/guide/function#all_outputs_of_a_tffunction_must_be_return_values for more information.
from line generated_images = tf.stack(generated).
As far as I understand, iterating over train_data in the for loop happens inside a tf.while_loop graph, so the tensors created there (train_input and the autoencoder outputs appended to generated) belong to that inner graph and cannot be captured by the outer train_step graph anymore.
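A minimal, untested sketch of one possible workaround (assuming autoencoder accepts the same (3, 256, 256, 1) element as in the loop above) is to let tf.map_fn do the per-element iteration, so no Python list of symbolic tensors has to leave the loop body:
@tf.function
def train_step(self, train_data):
    # train_data: (bs, 3, 256, 256, 1). Map the autoencoder over the batch
    # dimension instead of collecting its outputs in a Python list.
    generated_images = tf.map_fn(
        lambda train_input: autoencoder(train_input),
        train_data,
    )  # shape (bs, 1, 256, 256, 1), like tf.stack(generated) above
    # ... rest of the train step (real_images, discriminator, losses) ...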
So is there a better way to write the train_step function?
Or are there even better approaches to create a Dataloader which provides batches of triples for my autoencoder?
Thanks for any help
I followed the code example for structured data classification at keras.io to build a model for a rather simple dataset similar to the one in the example.
I wanted to extend the model to handle a second output, but I cannot train this model. The dataset is generated as in the example (but with two results):
res1 = dataframe.pop("result1")
res2 = dataframe.pop("result2")
ds = tf.data.Dataset.from_tensor_slices((dict(dataframe),(res1,res2)))
The model is also similar to the example but using a two-dimensional output:
x = layers.Dense(32, activation="relu")(all_features)
x = layers.Dropout(0.5)(x)
output = layers.Dense(2, activation="sigmoid")(x)
model = keras.Model(all_inputs, output)
model.compile("adam", "binary_crossentropy", metrics=["accuracy"])
It compiles, but when I try to run fit...
model.fit(train_ds,epochs=30)
I get an error message:
ValueError: logits and labels must have the same shape ((None, 2) vs (None, 1))
How can I prepare the dataset to meet the shape constraints?
I believe you should use the zip() function:
ds = tf.data.Dataset.from_tensor_slices((dict(dataframe), list(zip(res1, res2))))
This way, you are passing from_tensor_slices() the labels as a single array of shape (N, 2), so each dataset element carries a label of shape (2,) that matches the model's (None, 2) output, instead of a tuple of two separate label vectors.
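For illustration, a quick way to check the resulting label shape (using the same names as in the question):
labels = list(zip(res1, res2))  # converts to an array-like of shape (N, 2)
ds = tf.data.Dataset.from_tensor_slices((dict(dataframe), labels))
print(ds.element_spec)  # the label spec should now have shape (2,)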
I'm trying to fine-tune ReformerModelWithLMHead (google/reformer-enwik8) for NER. I used the same padded sequence length as in the encode method (max_length = max([len(string) for string in list_of_strings])), along with attention_masks, and I got this error:
ValueError: If training, make sure that config.axial_pos_shape factors: (128, 512) multiply to sequence length. Got prod((128, 512)) != sequence_length: 2248. You might want to consider padding your sequence length to 65536 or changing config.axial_pos_shape.
When I changed the sequence length to 65536, my Colab session crashed because all the inputs were padded to length 65536.
As for the second option (changing config.axial_pos_shape), I cannot change it.
I would like to know: is there any way to change config.axial_pos_shape while fine-tuning the model? Or am I missing something in encoding the input strings for reformer-enwik8?
Thanks!
Question Update: I have tried the following methods:
By giving parameters at the time of model instantiation:
model = transformers.ReformerModelWithLMHead.from_pretrained(
    "google/reformer-enwik8",
    num_labels=9,
    max_position_embeddings=1024,
    axial_pos_shape=[16, 64],
    axial_pos_embds_dim=[32, 96],
    hidden_size=128,
)
It gives me the following error:
RuntimeError: Error(s) in loading state_dict for ReformerModelWithLMHead:
size mismatch for reformer.embeddings.word_embeddings.weight: copying a param with shape torch.Size([258, 1024]) from checkpoint, the shape in current model is torch.Size([258, 128]).
size mismatch for reformer.embeddings.position_embeddings.weights.0: copying a param with shape torch.Size([128, 1, 256]) from checkpoint, the shape in current model is torch.Size([16, 1, 32]).
This is quite a long error.
Then I tried this code to update the config:
model1 = transformers.ReformerModelWithLMHead.from_pretrained('google/reformer-enwik8', num_labels=9)

# Reshape the axial position embeddings layer to match the desired max seq length
model1.reformer.embeddings.position_embeddings.weights[1] = torch.nn.Parameter(
    model1.reformer.embeddings.position_embeddings.weights[1][0][:128])

# Update the config to match the custom max seq length
model1.config.axial_pos_shape = 16, 128
model1.config.max_position_embeddings = 16 * 128  # 2048
model1.config.axial_pos_embds_dim = 32, 96
model1.config.hidden_size = 128

output_model_path = "model"
model1.save_pretrained(output_model_path)
With this implementation, I get this error:
RuntimeError: The expanded size of the tensor (512) must match the existing size (128) at non-singleton dimension 2. Target sizes: [1, 128, 512, 768]. Tensor sizes: [128, 768]
This is because the updated size/shape doesn't match the original config parameters of the pretrained model, which are: axial_pos_shape = (128, 512), max_position_embeddings = 128*512 = 65536, axial_pos_embds_dim = (256, 768), hidden_size = 1024.
Is this the right way to change the config parameters, or do I have to do something else?
Is there any example where a ReformerModelWithLMHead('google/reformer-enwik8') model has been fine-tuned?
My main code implementation is as follows:
import torch
import transformers

class REFORMER(torch.nn.Module):
    def __init__(self):
        super(REFORMER, self).__init__()
        self.l1 = transformers.ReformerModelWithLMHead.from_pretrained("google/reformer-enwik8", num_labels=9)

    def forward(self, input_ids, attention_masks, labels):
        output_1 = self.l1(input_ids, attention_masks, labels=labels)
        return output_1

model = REFORMER()

def train(epoch):
    model.train()
    for _, data in enumerate(training_loader, 0):
        ids = data['input_ids'][0]  # input_ids from the encode method of the model https://huggingface.co/google/reformer-enwik8#:~:text=import%20torch%0A%0A%23%20Encoding-,def%20encode,-(list_of_strings%2C%20pad_token_id%3D0
        input_shape = ids.size()
        targets = data['tags']
        print("tags: ", targets, targets.size())
        least_common_mult_chunk_length = 65536
        padding_length = least_common_mult_chunk_length - input_shape[-1] % least_common_mult_chunk_length
        # pad the input
        input_ids, inputs_embeds, attention_mask, position_ids, input_shape = _pad_to_mult_of_chunk_length(
            self=model.l1,
            input_ids=ids,
            inputs_embeds=None,
            attention_mask=None,
            position_ids=None,
            input_shape=input_shape,
            padding_length=padding_length,
            padded_seq_length=None,
            device=None,
        )
        outputs = model(input_ids, attention_mask, labels=targets)  # send the inputs to the forward method
        print(outputs)
        loss = outputs.loss
        logits = outputs.logits
        if _ % 500 == 0:
            print(f'Epoch: {epoch}, Loss: {loss}')

for epoch in range(1):
    train(epoch)
First of all, you should note that google/reformer-enwik8 is not a properly trained language model and that you will probably not get decent results from fine-tuning it. enwik8 is a compression challenge and the reformer authors used this dataset for exactly that purpose:
To verify that the Reformer can indeed fit large models on a single
core and train fast on long sequences, we train up to 20-layer big
Reformers on enwik8 and imagenet64...
This is also the reason why they haven't trained a sub-word tokenizer and operate on character level.
You should also note that the LMHead is usually used for predicting the next token of a sequence (CLM). You probably want to use a token classification head instead, i.e. use an encoder ReformerModel and add a linear layer with 9 classes on top, plus maybe a dropout layer.
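A rough sketch of that suggestion (ReformerForNER is not an official transformers class; the 2 * hidden_size input width reflects the fact that the Hugging Face Reformer concatenates its two reversible streams in the output, so double-check it against your transformers version):
import torch
from transformers import ReformerConfig, ReformerModel

class ReformerForNER(torch.nn.Module):
    # Sketch: Reformer encoder + dropout + per-token linear head with 9 classes.
    def __init__(self, num_labels=9):
        super().__init__()
        conf = ReformerConfig.from_pretrained("google/reformer-enwik8")
        self.encoder = ReformerModel.from_pretrained("google/reformer-enwik8", config=conf)
        self.dropout = torch.nn.Dropout(0.1)
        self.classifier = torch.nn.Linear(2 * conf.hidden_size, num_labels)

    def forward(self, input_ids, attention_mask=None):
        hidden = self.encoder(input_ids, attention_mask=attention_mask)[0]  # (batch, seq_len, 2 * hidden_size)
        return self.classifier(self.dropout(hidden))                        # (batch, seq_len, num_labels)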
Anyway, in case you want to try it still, you can do the following to reduce the memory footprint of the google/reformer-enwik8 reformer:
Reduce the number of hashes during training:
from transformers import ReformerConfig, ReformerModel

conf = ReformerConfig.from_pretrained('google/reformer-enwik8')
conf.num_hashes = 2  # or maybe even 1
model = ReformerModel.from_pretrained("google/reformer-enwik8", config=conf)
After you have fine-tuned your model, you can increase the number of hashes again to improve performance (compare Table 2 of the Reformer paper).
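For instance (a sketch; the checkpoint path and the value 8 are placeholders):
# Load the fine-tuned checkpoint with more hashes for evaluation/inference.
eval_conf = ReformerConfig.from_pretrained("path/to/finetuned-checkpoint")
eval_conf.num_hashes = 8
eval_model = ReformerModel.from_pretrained("path/to/finetuned-checkpoint", config=eval_conf)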
Replace axial-position embeddings:
from transformers import ReformerConfig, ReformerModel

conf = ReformerConfig.from_pretrained('google/reformer-enwik8')
conf.axial_pos_embds = False
model = ReformerModel.from_pretrained("google/reformer-enwik8", config=conf)
This will replace the learned axial positional embeddings with learnable position embeddings like BERT's, which do not require the full sequence length of 65536. They are untrained and randomly initialized (i.e. consider a longer training).
The Reformer model was proposed in the paper Reformer: The Efficient Transformer by Nikita Kitaev, Łukasz Kaiser and Anselm Levskaya.
The paper describes a method for factorizing the gigantic position-embedding matrix that results from working with very long sequences. This factorization relies on two assumptions:
the parameter config.axial_pos_embds_dim is set to a tuple (d1, d2) whose sum has to be equal to config.hidden_size
config.axial_pos_shape is set to a tuple (n1, n2) whose product has to be equal to config.max_position_embeddings
(more on these in the Hugging Face Reformer documentation)
Finally, to your question ;)
I'm almost sure your session crashed due to a RAM overflow.
You can change any config parameter during model instantiation, as shown in the official documentation.
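For example (a sketch; note that parameters which change weight shapes, such as axial_pos_shape or hidden_size, will still fail to load the pretrained weights, as the size-mismatch errors above show):
from transformers import ReformerModelWithLMHead

# Keyword arguments that correspond to config attributes override the loaded config.
model = ReformerModelWithLMHead.from_pretrained("google/reformer-enwik8", num_hashes=2)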
I have a trained Keras model which takes inputs of size (batchSize, 2). This works well and gives good results.
My main problem is to build a model which takes as input a tensor of size (batchSize, 2, 16), slices it inside the model into 16 tensors of size (batchSize, 2), and concatenates the outputs together.
I have used this code for that:
import tensorflow as tf
from tensorflow.keras import layers, Model
from tensorflow.keras.layers import Lambda
from tensorflow.keras.models import load_model

y = layers.Input(shape=(2, 16,))
model_x = load_model('saved_model')
for i in range(16):
    x_input = Lambda(lambda x: x[:, :, i])(y)
    if i == 0:
        x_output = model_x(x_input)
    else:
        x_output = layers.concatenate([x_output, model_x(x_input)])
x_output = Lambda(lambda x: x[:, :tf.cast(N, tf.int32)])(x_output)
final_model = Model(y, x_output)
Although the saved model gives me good performance, this code does not train well and doesn't give the intended performance.
What can I do to get better results?
I can't say anything about the bad performance of your final model because it might be due to various reasons and this is not readily evident from the content of your question. But to answer your original question: yes, you can use for loops that way, because you are essentially creating layers/tensors and connecting them to each other (i.e. building the graph of the model). So it's a valid thing to do. The problem might be somewhere else, e.g. a wrong indexing, a wrong loss function, etc.
Further, you can build your final model in a much simpler approach. You already have a trained model which gets inputs of shape (batch_size, 2) and gives outputs of shape (batch_size, 8). Now you want to build a model which takes inputs of shape (batch_size, 2, 16), apply the already trained model on each of the 16 (batch_size, 2) segments and then concatenate the results. You can easily do that with a TimeDistributed wrapper:
from tensorflow.keras import layers, Model
from tensorflow.keras.models import load_model

# load your already trained model
model_x = load_model('saved_model')
inp = layers.Input(shape=(2,16))
# this makes the input shape as `(16,2)`
x = layers.Permute((2,1))(inp)
# this would apply `model_x` on each of the 16 segments; the output shape would be (None, 16, 8)
x = layers.TimeDistributed(model_x)(x)
# flatten to make it have a shape of (None, 128)
out = layers.Flatten()(x)
final_model = Model(inp, out)
I am using model.predict() on a test tensor, which has the same shape as the input used for training, (N_tr*70, 1025, 11, 3).
The model is trained by regression, with three outputs as ground truth, each of shape (N_te*70, 1025).
For reference, when testing the model, N_te = 180.
According to the documentation, the output of model.predict() should be a NumPy array; instead I get a list of three elements, each with shape (N_te*70, 1025).
I am afraid that the output might have been somehow shuffled (which would explain my unexpected results).
Do you have any advice on how to get a NumPy array that is compatible with the one I used as ground truth? If not, do you know any other workaround?
EDIT: added the neural network code
from tensorflow.keras.layers import Input, Flatten, Dense, BatchNormalization, LeakyReLU

input_img = Input(shape=(1025, 11, 3))
x = Flatten()(input_img)
for i in range(0, 4):
    x = Dense(1024 * 3)(x)
    x = BatchNormalization()(x)
    x = LeakyReLU()(x)
o0 = Dense(1025, activation='sigmoid')(x)
o1 = Dense(1025, activation='sigmoid')(x)
o2 = Dense(1025, activation='sigmoid')(x)
Model prediction:
output = model.predict(X_in, batch_size=batch_size, verbose=1)
That is expected: in a multi-output model, predict returns a list of NumPy arrays, with each element being the corresponding output. Remember that the loss is computed individually between each output and its ground truth, so this format is already ideal for that purpose.
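If you still want a single array, here is a minimal sketch (gt0, gt1 and gt2 are hypothetical names for your three ground-truth arrays):
import numpy as np

# Stack the per-output predictions into one array of shape (3, N_te*70, 1025);
# the list order matches the order of the model's outputs (o0, o1, o2).
preds = np.stack(output, axis=0)

# Or compare each output against its own ground truth directly.
for pred, gt in zip(output, [gt0, gt1, gt2]):  # gt0/gt1/gt2: hypothetical ground-truth arrays
    print(np.mean((pred - gt) ** 2))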