I would like to finetune facebook/mbart-large-cc25 on my data using pre-training tasks, in particular Masked Language Modeling (MLM).
How can I do that in HuggingFace?
Edit: rewrote the question for the sake of clarity
Since you are doing everything in HuggingFace, fine-tuning a model on a pre-training task (assuming that pre-training task is provided in HuggingFace) is pretty much the same for most models. What tasks are you interested in fine-tuning mBART on?
HuggingFace provides extensive documentation for several fine-tuning tasks. For instance, the links below will help you fine-tune HF models for language modelling, MNLI, SQuAD, etc.: https://huggingface.co/transformers/v2.0.0/examples.html and https://huggingface.co/transformers/training.html
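Since you mention MLM specifically: mBART itself is a seq2seq model pretrained with a denoising objective rather than plain BERT-style MLM, so a rough sketch of an MLM-style setup with the Trainer API could look like the following. The masking function, hyperparameters and toy data here are illustrative assumptions, not something taken from the linked docs.

```python
# Rough sketch of MLM-style denoising fine-tuning for mBART with the Trainer API.
# Assumptions (not from the linked docs): random input tokens are replaced by
# <mask> and the seq2seq model is trained to reconstruct the original sentence.
import random

import torch
from transformers import (
    MBartForConditionalGeneration,
    MBartTokenizer,
    Trainer,
    TrainingArguments,
)

tokenizer = MBartTokenizer.from_pretrained("facebook/mbart-large-cc25")
model = MBartForConditionalGeneration.from_pretrained("facebook/mbart-large-cc25")

texts = ["Example sentence one.", "Another training sentence."]  # replace with your data

def make_example(text, mask_prob=0.15):
    enc = tokenizer(text, truncation=True, max_length=128)
    labels = list(enc["input_ids"])
    input_ids = [
        tokenizer.mask_token_id
        if tok not in tokenizer.all_special_ids and random.random() < mask_prob
        else tok
        for tok in labels
    ]
    return {"input_ids": input_ids,
            "attention_mask": enc["attention_mask"],
            "labels": labels}

train_dataset = [make_example(t) for t in texts]

def collate(batch):
    # Pad inputs with the pad token and labels with -100 so padding is ignored in the loss.
    pad = tokenizer.pad_token_id
    max_len = max(len(ex["input_ids"]) for ex in batch)
    pad_to = lambda seq, value: seq + [value] * (max_len - len(seq))
    return {
        "input_ids": torch.tensor([pad_to(ex["input_ids"], pad) for ex in batch]),
        "attention_mask": torch.tensor([pad_to(ex["attention_mask"], 0) for ex in batch]),
        "labels": torch.tensor([pad_to(ex["labels"], -100) for ex in batch]),
    }

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="mbart-mlm",
                           num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=train_dataset,
    data_collator=collate,
)
trainer.train()
```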
I understand that GPT-2 is based on the transformer architecture, but where is the source code? There are limited resources and no tutorials on how to write one.
I am new to NLP. Also, if I had to generate novels, would training the transformer on multiple novels help, or just one?
I think the best way to train GPT and other transformers is by using the library https://huggingface.co/docs/transformers. They also have a course that can help you familiarize yourself with the topic: https://huggingface.co/course/
Yes, transformer models, if they are not too large, can be trained on Colab.
And yes, GPT-like models can be trained to generate novels, but only short ones (like several paragraphs), because almost all such models can work only with texts of limited length.
Yes, it is possible, and it is better if you use a GPU for training. Make sure to adjust the num_train_epochs, per_device_train_batch_size and per_gpu_train_batch_size arguments in TrainingArguments to prevent the runtime from crashing with RuntimeError: CUDA out of memory.
Otherwise it will use up the whole GPU and RAM most of the time and the notebook will crash!
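For example, a rough sketch of conservative settings (the dataset, batch sizes and fp16 flag are illustrative choices to keep memory usage low on a Colab GPU, not values from any particular notebook):

```python
# Rough sketch: conservative TrainingArguments for fine-tuning GPT-2 on a Colab GPU.
# The toy corpus, batch sizes and fp16 flag are illustrative choices to keep memory low.
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained("gpt2")

texts = ["First chapter of a novel...", "Second chapter..."]  # replace with your corpus
train_dataset = [tokenizer(t, truncation=True, max_length=256) for t in texts]

training_args = TrainingArguments(
    output_dir="gpt2-novels",
    num_train_epochs=1,                # start small, increase if memory allows
    per_device_train_batch_size=2,     # small batch to avoid CUDA out of memory
    gradient_accumulation_steps=8,     # simulate a larger effective batch size
    fp16=True,                         # half precision saves GPU memory (needs a GPU)
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),  # causal LM labels
)
trainer.train()
```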
I am trying to train a custom object detector. Is there a limit on the number of target classes that the YOLOv5 architecture can be trained on?
For example, the COCO dataset has 80 target classes; suppose I have 500 object types to detect, is it advisable to use YOLOv5?
Can this be explained with reasons?
You can add as many classes as you want to any network.
The YOLO architecture is known for prioritizing inference time over raw accuracy. Although it achieves good results on conventional datasets, the YOLO model is built for speed.
But essentially you want a network with a good backbone (deep and wide) that can extract rich features from your images.
From my experience, there is really no straightforward answer. It also depends on your dataset, for example whether you have large, medium or small objects to detect. I really recommend trying out different models, because every single model will perform differently on custom datasets; from there you select the best one. The state-of-the-art model is not necessarily the best model for transfer learning and fine-tuning.
For me, YOLO and the other single-shot detectors were the ones that worked best for fine-tuning (RetinaNet has been the best for my use cases so far). They are good for hyperparameter tuning because you can train them fast and test what works and what doesn't. With two-stage detectors (Faster R-CNN etc.) I never achieved good overall results, mainly because the training process is different and much slower.
I recommend you read this article; it explains both architecture types, with their pros and cons.
Additionally, if you want to train a model for more than 500 classes, the TensorFlow Object Detection API has models pre-trained on the OpenImages dataset (600 classes), and there is Detectron2 on the LVIS dataset (1200 classes). I recommend starting with models that were trained on a higher number of classes if you want to fine-tune to a similar number of classes in your dataset.
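As a rough illustration that the class count is just a parameter of the detection head (the `classes` argument here is how I understand the ultralytics/yolov5 torch.hub interface; double-check against the repo before relying on it):

```python
import torch

# Build YOLOv5s with a 500-class detection head; pretrained backbone weights
# are reused wherever the shapes still match.
model = torch.hub.load("ultralytics/yolov5", "yolov5s", classes=500, pretrained=True)
```

For actual training you would normally point yolov5's train.py at a dataset YAML with nc: 500, but the call above shows that nothing in the architecture hard-codes 80 classes.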
I want to train a BERT model to perform a multiclass text classification. I use transformers and followed this tutorial (https://towardsdatascience.com/multi-class-text-classification-with-deep-learning-using-bert-b59ca2f5c613) to train it on Google Colab.
The issue is that I have a huge number of classes (about 600), and I feel like it affects the performance, which is quite disappointing.
I looked a bit on Stackoverflow and found this thread (Intent classification with large number of intent classes) that answered my question but I don't know how to implement it.
The answer to the similar question was: "If you could classify your intents into some coarse-grained classes, you could train a classifier to specify which of these coarse-grained classes your instance belongs to. Then, for each coarse-grained class train another classifier to specify the fine-grained one. This hierarchical structure will probably improve the results. Also for the type of classifier, I believe a simple fully connected layer on top of BERT would suffice."
Do I have to train my models separately and use "if" conditions to build the workflow, or is there a way to train all the BERT models simultaneously and have one unifying model?
Thanks in advance
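For context, a minimal sketch of what the hierarchical setup from the quoted answer could look like at inference time, with a dictionary routing to the fine-grained models instead of a chain of if statements (the model paths and coarse class names below are hypothetical):

```python
# Sketch of two-level routing for the hierarchical approach described in the
# quoted answer. Model paths and coarse class names are hypothetical.
from transformers import pipeline

# One coarse classifier plus one fine-grained classifier per coarse class,
# each fine-tuned separately (e.g. BertForSequenceClassification under the hood).
coarse_clf = pipeline("text-classification", model="path/to/coarse-model")
fine_clfs = {
    "electronics": pipeline("text-classification", model="path/to/fine-electronics"),
    "clothing": pipeline("text-classification", model="path/to/fine-clothing"),
    # ... one entry per coarse class
}

def predict(text):
    # Route to the matching fine-grained model instead of chaining if statements.
    coarse_label = coarse_clf(text)[0]["label"]
    fine_label = fine_clfs[coarse_label](text)[0]["label"]
    return coarse_label, fine_label
```

Each of those models would still be trained separately, as the quoted answer describes; the routing only unifies them at inference time.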
I (a complete noob in machine learning and natural language processing) am using the doc2vec approach (gensim Python library) to find the document most similar to a random string. The problem is that whenever I want to add a new document to the trained model, I need to retrain the model from scratch.
Is there an approach that stands out in its ability to add a new document/vocabulary to a trained model without the need to train from scratch, or an approach that would train faster?
I'm overwhelmed by all the approaches to NLP and only started with what I found to be the most popular (word2vec/doc2vec), and now I'm looking for a direction on what to study next. Thanks for any recommendations.
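One thing worth noting about gensim's Doc2Vec (a minimal sketch with the gensim 4.x API; the toy corpus is made up): querying with a brand-new string does not require retraining, because infer_vector() builds a vector for unseen text against the frozen model. Only adding new documents to the trained corpus itself forces a retrain.

```python
# Minimal gensim 4.x sketch: infer_vector() handles brand-new query strings
# without retraining; the toy corpus below is made up.
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

corpus = [
    TaggedDocument(words="the cat sat on the mat".split(), tags=["doc0"]),
    TaggedDocument(words="dogs chase cats in the yard".split(), tags=["doc1"]),
]
model = Doc2Vec(corpus, vector_size=50, min_count=1, epochs=40)

query_vec = model.infer_vector("a cat on a mat".split())  # no retraining needed
print(model.dv.most_similar([query_vec], topn=1))          # most similar stored document
```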
I have code to classify images as nude or non-nude. It is implemented with deep learning in TensorFlow (Python). The code can be found in the Tensorflow Implementation of Yahoo's Open NSFW Model.
I want to add some more images to the dataset in order to do fine-tuning. How can I do fine-tuning in this implementation using another dataset?
Just load their model and initialize its weights with the ones they provide, similar to how they do it here. Assuming that you are familiar with TensorFlow, you should then proceed to train that model on your images.
Besides this blog post I'm not aware of any other publications the team has made on their work. This is a bit of an issue as they don't state their training parameters (choice of optimizer, learning rate, etc.). If you want to fine-tune this model you will have to experiment a bit in this regard.
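Purely as an illustration of the general pattern, not the repo's actual API (it uses a stand-in Keras model with an ImageNet backbone just to show the shape of a "load pretrained weights, then keep training" loop):

```python
# Generic illustration of "load pretrained weights, then keep training".
# NOT the open_nsfw repo's actual API: a stand-in Keras model with an
# ImageNet backbone is used just to show the shape of a fine-tuning loop.
import tensorflow as tf

base = tf.keras.applications.ResNet50(weights="imagenet", include_top=False,
                                      pooling="avg", input_shape=(224, 224, 3))
base.trainable = False  # freeze the backbone at first, unfreeze later if needed

model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(2, activation="softmax"),  # nude / non-nude
])

model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# new_images: float array of shape (N, 224, 224, 3); new_labels: ints in {0, 1}
# model.fit(new_images, new_labels, epochs=5, batch_size=32)
```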
Do they give you the original dataset that the provided model was trained on? If so, you can easily add your own dataset to theirs and train a completely new model on the combined dataset.
I wrote more about this "combined" dataset, where you can add more or less data, here.
Good Luck!