I found an implementation of the DeepSpeech2 model here: https://github.com/tensorflow/models/tree/master/research/deep_speech#deepspeech2-model
This implementation only provides training and evaluation scripts, and no pre-trained models are available. So I trained the model on a minimal dataset myself and tried to run inference, but I didn't get meaningful results. My questions are:
Does this implementation work well if I train it with a large amount of data?
Is there a pre-trained model available for this?
Thanks in advance
I have been coding my own models for a while, but I recently discovered Hugging Face and started using it. I want to know whether I should use a pretrained model as-is or further train the same Hugging Face model on my own dataset. I am trying to build a question answering model.
I have a dataset of 10k-20k questions.
The state-of-the-art approach is to take a pre-trained model that was pre-trained on tasks that are relevant to your problem and fine-tune the model on your dataset.
So assuming your dataset is in English, you should take a model pre-trained on English natural-language text. You can then fine-tune it.
This will most likely work better than training from scratch, but you can experiment on your own. You can also load a model without the pre-trained weights in Hugging Face.
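For illustration, a minimal fine-tuning sketch with the transformers Trainer API could look like the following. The checkpoint name and the train_dataset/eval_dataset objects are placeholders; you would substitute a checkpoint suited to your language and domain, and your own tokenized QA examples.
from transformers import (AutoTokenizer, AutoModelForQuestionAnswering,
                          TrainingArguments, Trainer)

# Placeholder checkpoint; pick one pre-trained on data relevant to your task.
checkpoint = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForQuestionAnswering.from_pretrained(checkpoint)

training_args = TrainingArguments(
    output_dir="qa-finetuned",
    num_train_epochs=3,
    per_device_train_batch_size=16,
    learning_rate=3e-5,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,  # your ~10k-20k tokenized QA examples
    eval_dataset=eval_dataset,    # a held-out split for evaluation
)
trainer.train()
To compare against training from scratch, you could instead build the model from a configuration (e.g. AutoConfig.from_pretrained(checkpoint) followed by AutoModelForQuestionAnswering.from_config(config)), which initializes the weights randomly.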
I just started to learn PyTorch.
Do you know how to create pretrained weights for SSD in PyTorch?
We have a custom dataset, so we want to create pretrained weights from that dataset using VGG16, to enhance the performance of SSD.
The weights obtained from that training would then be used for SSD.
Is this feasible?
Thank you in advance
Pretrained weights are acquired by training the neural network on a large dataset, such as ImageNet, on a classification task. It is common for libraries to provide an option to load the weights from such training (hence the name pre-trained model): for instance, the models found in torchvision.models have a pretrained option.
If you have a custom dataset, you will have to train your model either from scratch with randomly initialized weights, or by starting from ImageNet weights, which is usually the better option.
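As a rough sketch of the idea (assuming an ImageNet-pretrained VGG16 from torchvision; how you attach it to your SSD implementation depends on that codebase, and the class count below is a placeholder):
import torch
import torchvision

# Load VGG16 with ImageNet weights (the usual starting point).
vgg = torchvision.models.vgg16(pretrained=True)

# Optional: first fine-tune VGG16 as a classifier on your custom dataset,
# e.g. by replacing the last classifier layer with your number of classes,
# then training it as usual before reusing the weights.
num_custom_classes = 10  # placeholder for your dataset
vgg.classifier[6] = torch.nn.Linear(vgg.classifier[6].in_features,
                                    num_custom_classes)

# The convolutional part (vgg.features) is what SSD implementations
# typically reuse as a backbone; save these weights and load them into
# your SSD model's backbone.
torch.save(vgg.features.state_dict(), "vgg16_backbone.pth")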
I'm doing sentiment analysis of Spanish tweets.
After reviewing some of the recent literature, I've seen that there has been a recent effort to train a RoBERTa model exclusively on Spanish text (roberta-base-bne). It seems to perform better than BETO, the previous state-of-the-art model for Spanish language modeling.
The RoBERTa model has been trained for a variety of tasks, which do not include text classification.
I want to take this RoBERTa model and fine-tune it for text classification, more specifically, sentiment analysis.
I've done all the preprocessing and created the dataset objects, and want to natively train the model.
Code
# Training with native TensorFlow
import tensorflow as tf
from transformers import TFRobertaForSequenceClassification

model = TFRobertaForSequenceClassification.from_pretrained("BSC-TeMU/roberta-base-bne")
optimizer = tf.keras.optimizers.Adam(learning_rate=5e-5)
model.compile(optimizer=optimizer, loss=model.compute_loss)  # can also use any keras loss fn
# batch_size must not be passed to fit() when the data is an already-batched tf.data.Dataset
model.fit(train_dataset.shuffle(1000).batch(16), epochs=3)
Question
My question is regarding TFRobertaForSequenceClassification:
Is it correct to use this class, even though it is not the AutoModelForMaskedLM specified in the model card?
By simply using TFRobertaForSequenceClassification, do we imply that the pretrained knowledge will automatically be applied to the new task, namely text classification?
The model class referenced in the model card essentially reflects what the model has been trained on. If you are familiar with architectural choices for different modeling tasks (e.g., token classification vs. sequence classification), it should become clear that these models have slightly different layouts, specifically in the layers after the Transformer output layer. For token classification, this is (generally speaking) Dropout and an additional linear layer, mapping from the hidden_size of the model to the number of output classes. See here for an example with BERT.
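To make that concrete, a minimal PyTorch sketch of such a head (illustrative only, not the exact transformers internals) could be:
import torch.nn as nn

# Illustrative task-specific head placed on top of the Transformer output:
# dropout followed by a linear projection from hidden_size to the label space.
class ClassificationHead(nn.Module):
    def __init__(self, hidden_size, num_labels, dropout_prob=0.1):
        super().__init__()
        self.dropout = nn.Dropout(dropout_prob)
        self.out_proj = nn.Linear(hidden_size, num_labels)

    def forward(self, hidden_state):
        return self.out_proj(self.dropout(hidden_state))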
This means that the model checkpoint which was pre-trained with a different learning objective will not have weights for this final layer, but instead you train these (comparably few) parameters during your fine-tuning. In fact, for PyTorch models you will generally get a warning when loading a model checkpoint that slightly differs in the available weights:
Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForSequenceClassification: [...]
This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model). [...]
This is exactly what you are doing, so as long as you have a decent number of fine-tuning examples (depending on the number of classes, I would suggest 10e3-10e4 as a rule of thumb), this will not affect your training by much.
I want to point out, however, that it might be necessary for you to specify the number of labels that your classification head has. You can do this by specifying it when loading your model:
from transformers import TFRobertaForSequenceClassification
roberta = TFRobertaForSequenceClassification.from_pretrained("BSC-TeMU/roberta-base-bne",
num_labels=<your_value>)
I have a pre-trained TensorFlow model which was trained on a publicly available dataset; I have its meta file and ckpt file. I'd like to continue training this model on new data from a privately obtained dataset. Since my dataset is small, I'd like to fine-tune the model according to ‘Strategy 2’ or ‘Strategy 3’:
Strategy 2: Train some layers and leave the others frozen.
Strategy 3: Freeze the convolutional base.
Reference site: https://towardsdatascience.com/transfer-learning-from-pre-trained-models-f2393f124751
However, I couldn't find sample code that implements transfer learning and fine-tuning for a plain TensorFlow model; there are many examples for Keras models. How can I implement transfer learning and fine-tuning for my TensorFlow model?
If you are not tied to the low-level TensorFlow API, you can also use the example code with the tf.keras module of TensorFlow 2.0, as sketched below.
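Here is a minimal tf.keras sketch of Strategy 3 (freeze the convolutional base and train only a new head). It assumes an ImageNet-pretrained VGG16 from tf.keras.applications rather than your own meta/ckpt files, and num_classes is a placeholder for your dataset:
import tensorflow as tf

num_classes = 10  # placeholder: number of classes in your private dataset

# Strategy 3: freeze the convolutional base, train only the new head.
base = tf.keras.applications.VGG16(include_top=False, weights="imagenet",
                                   input_shape=(224, 224, 3))
base.trainable = False

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dense(num_classes, activation="softmax"),
])

model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
              loss="categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_dataset, epochs=5)

# Strategy 2: unfreeze only the top of the base and keep earlier layers frozen.
# base.trainable = True
# for layer in base.layers[:-4]:
#     layer.trainable = False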
I built a CNN model for image classification using the Keras library. However, training takes many hours. Once I have trained my model, how can I use it again without retraining? I want to reuse the trained model many times.
I will be using my model in Android Studio.
Any help is appreciated
Thank you.
EDIT
When I wrote this question, I did not know about model.save and load_model; the answers below show their appropriate usage.
You can easily save your model after the training process by using:
model.save('my_model.h5')
You can later load that model by using:
from keras.models import load_model
model = load_model('my_model.h5')
for more details have a look at the documentation: https://keras.io/getting-started/faq/#how-can-i-save-a-keras-model