BERT model classification with many classes - python

I want to train a BERT model to perform a multiclass text classification. I use transformers and followed this tutorial (https://towardsdatascience.com/multi-class-text-classification-with-deep-learning-using-bert-b59ca2f5c613) to train it on Google Colab.
The issue is that I have a huge number of classes (about 600), and I suspect this is hurting the performance, which is quite disappointing.
I looked a bit on Stackoverflow and found this thread (Intent classification with large number of intent classes) that answered my question but I don't know how to implement it.
The answer to the similar question was: "If you could classify your intents into some coarse-grained classes, you could train a classifier to specify which of these coarse-grained classes your instance belongs to. Then, for each coarse-grained class train another classifier to specify the fine-grained one. This hierarchical structure will probably improve the results. Also for the type of classifier, I believe a simple fully connected layer on top of BERT would suffice."
Do I have to train my models separately and use "if" conditions to build the workflow, or is there a way to train all the BERT models simultaneously and have one unifying model?
Thanks in advance

Related

How to improve a language model (e.g. BERT) on text unseen during training?

I am using a pre-trained language model for binary classification. I fine-tune the model by training on data from my downstream task. The results are good, almost 98% F-measure.
However, when I remove a specific kind of sentence from the training data and add it to my test data, the classifier fails to predict the class of those sentences. For example, in a sentiment analysis task with sentences like
"I love the movie more specifically the acting was great"
I removed from training all sentences containing the words "more specifically", and surprisingly they were all misclassified in the test set, so the precision dropped by a huge amount.
Any ideas on how I can further fine-tune/improve my model to work better on text unseen during training and avoid the problem described above? (Of course without feeding the model sentences containing the words "more specifically".)
Note: I observed the same performance regardless of the language model in use (BERT, RoBERTa etc).
I think you might have an overfitting problem, i.e. your model is focusing on features that are too specific for it to generalize well. This can lead to results such as yours, where some obscure part of the sentence is the main factor for a correct classification.
There are multiple ways to address this (see here). One is cross-validation, where you rotate your validation set; another is to use dropout layers; yet another is to not let your model train for too long, since overtraining also leads to overfitting.
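To make the dropout and early-stopping suggestions concrete, here is a minimal PyTorch-style sketch; the hidden size, dropout probability, and patience value are illustrative assumptions, not values from the question:

```python
import torch.nn as nn

# Hypothetical classification head on top of a pooled BERT output (768-dim).
# Dropout randomly zeroes activations during training, which discourages the
# head from latching onto overly specific features such as rare phrases.
class ClassificationHead(nn.Module):
    def __init__(self, hidden_size=768, num_labels=2, dropout_prob=0.3):
        super().__init__()
        self.dropout = nn.Dropout(dropout_prob)
        self.classifier = nn.Linear(hidden_size, num_labels)

    def forward(self, pooled_output):
        return self.classifier(self.dropout(pooled_output))

# Simple early stopping: stop once validation loss has not improved
# for `patience` consecutive epochs.
def should_stop(val_losses, patience=3):
    if len(val_losses) <= patience:
        return False
    best_before = min(val_losses[:-patience])
    return all(loss >= best_before for loss in val_losses[-patience:])
```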

Optimum number of target class objects for yolov5 custom model training

I am trying to train a custom object detector. Is there a limit on the number of target classes that the yolov5 architecture can be trained on?
For example, the COCO dataset has 80 target classes; suppose I have 500 object types to detect, is it advisable to use yolov5?
Can this be explained with reasons?
You can add as many classes as you want to any network.
The YOLO architecture is known for prioritizing inference time over raw accuracy. Although it achieves good results on conventional datasets, the YOLO model is built for speed.
But essentially you want a network that has a good backbone (deep and wide) that can really obtain rich features from your image.
From my experience, there is really no straightforward answer. It also depends on your dataset, e.g. whether you have large/medium/small objects to detect. I really recommend trying out different models, because every single model will perform differently on custom datasets; from there you select the best one. The state-of-the-art model is not necessarily the best model for transfer learning and fine-tuning.
For me, YOLO and the other single-shot detectors were the ones that worked best for fine-tuning (RetinaNet was the best for my use cases so far); they are good for hyperparameter tuning because you can train them fast and test what works and what doesn't. With two-stage detectors (Faster R-CNN etc.) I never achieved overall good results, mainly because the training process is different and much slower.
I recommend you read this article; it explains both architecture types, with their pros and cons.
Additionally, if you want to train a model for more than 500 classes, the TensorFlow Object Detection API has pre-trained models for the OpenImages dataset (600 classes), and there is Detectron2 trained on the LVIS dataset (1200 classes). I recommend starting with models that were trained on a higher number of classes if you want to fine-tune to a similar number of classes in your dataset.

How do you train a pytorch model with multiple outputs

I am trying to train a model with the following data
The image is the input and 14 features must be predicted.
Could I please know how to go about training such a model?
Thank you.
These are not really features, as far as I am concerned. These are classes, and if I got it correctly, your images sometimes belong to more than one class.
This is a very broad question but I think here might be a good start to learn more about multi-label image classification.
Note that your model should not be much different from an image classification model used for the CIFAR-10 challenge, for example, but you need to structure your data and choose your loss function accordingly.
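For example, here is a minimal PyTorch sketch of the multi-label setup; the tiny convolutional backbone and the dummy data are placeholders, and the relevant parts are the 14 independent outputs and BCEWithLogitsLoss (a per-label sigmoid) instead of softmax cross-entropy:

```python
import torch
import torch.nn as nn

NUM_LABELS = 14  # one independent output per label

# Illustrative tiny backbone; in practice any image classification backbone works.
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(16, NUM_LABELS),   # raw logits, one per label
)

# BCEWithLogitsLoss applies a sigmoid per output, so each of the 14 labels
# is predicted independently (multi-label), unlike softmax cross-entropy.
criterion = nn.BCEWithLogitsLoss()

images = torch.randn(8, 3, 64, 64)                       # dummy image batch
targets = torch.randint(0, 2, (8, NUM_LABELS)).float()   # multi-hot labels

loss = criterion(model(images), targets)
loss.backward()
```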

Intent classification with large number of intent classes

I am working on a data set of approximately 3000 questions and I want to perform intent classification. The data set is not labelled yet, but from the business perspective there is a requirement to identify approximately 80 intent classes. Let's assume my training data has an approximately equal number of examples of each class and is not majorly skewed towards some of the classes. I intend to convert the text to word2vec or GloVe embeddings and then feed these into my classifier.
I am familiar with cases with a smaller number of intent classes, such as 8 or 10, and the usual choice of classifiers such as SVM, naive Bayes, or deep learning (CNN or LSTM).
My question is whether you have had experience with such a large number of intent classes before, and which machine learning algorithm you think will perform reasonably. Do you think that if I use deep learning frameworks, the large number of labels will still cause poor performance, given the training data described above?
We need to start labelling the data, and it would be rather laborious to label 80 classes and then realise that the model is not performing well, so I want to make the right decision up front: what is the maximum number of intent classes I should consider, and what machine learning algorithm do you suggest?
Thanks in advance...
First, word2vec and GloVe are almost dead. You should probably consider using more recent embeddings like BERT or ELMo (both of which are sensitive to the context; in other words, you get different embeddings for the same word in a different context). Currently, BERT is my own preference since it's completely open-source and available (GPT-2 was released a couple of days ago and is apparently a little bit better, but it's not completely available to the public).
Second, when you use BERT's pre-trained embeddings, your model has the advantage of having seen a massive amount of text (Google massive) and thus can be trained on small amounts of data, which will increase its performance drastically.
Finally, if you could classify your intents into some coarse-grained classes, you could train a classifier to specify which of these coarse-grained classes your instance belongs to. Then, for each coarse-grained class train another classifier to specify the fine-grained one. This hierarchical structure will probably improve the results. Also for the type of classifier, I believe a simple fully connected layer on top of BERT would suffice.
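As a rough sketch of how the hierarchical setup could look at inference time (the model paths, the number of fine-grained models, and the label ids below are placeholders; each model is assumed to be a separately fine-tuned BERT sequence classifier):

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Placeholder paths: one coarse classifier plus one fine-grained classifier per
# coarse class, each fine-tuned separately with its own num_labels.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
coarse_model = AutoModelForSequenceClassification.from_pretrained("./coarse-model")
fine_models = {
    0: AutoModelForSequenceClassification.from_pretrained("./fine-model-0"),
    1: AutoModelForSequenceClassification.from_pretrained("./fine-model-1"),
    # ... one entry per coarse class
}

def predict(text):
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        # Stage 1: pick the coarse-grained class.
        coarse_id = coarse_model(**inputs).logits.argmax(dim=-1).item()
        # Stage 2: route to the fine-grained classifier for that coarse class.
        fine_id = fine_models[coarse_id](**inputs).logits.argmax(dim=-1).item()
    return coarse_id, fine_id
```

Each classifier is trained separately on its own label set; at inference the coarse prediction simply routes the input to the corresponding fine-grained model.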

How do I perform machine learning classification that can output two simultaneous class predictions?

amateur machine learning programmer here. I would like to perform a classification task wherein two simultaneous class predictions could occur.
For instance, in flowers image classification. Apart from being able to classify an image of a rose, or an orchid; I would also like to be able to classify if an image contains both roses and orchids simultaneously. Do I have to train my model to distinguish "Rose + Orchid" as an independent class?
Here's an example image of the task.
In scikit-learn, all classifiers that have a predict_proba function support this. The function returns the probability of each class for the input x. Hence, you can use SVC, logistic regression, naive Bayes, random forest, or any other such classifier in scikit-learn, depending on your problem.
Once you have the predicted probability for each class, if the probabilities of the two most probable classes are close to each other, you can label the input with both of those classes.
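A minimal scikit-learn sketch of that idea; the data, the classifier, and the probability margin are placeholders:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy data, purely illustrative: 2-D points in three classes.
X = np.random.randn(300, 2)
y = np.random.randint(0, 3, size=300)
clf = LogisticRegression().fit(X, y)

def predict_top_two(x, margin=0.15):
    """Return the top class, plus the runner-up when their probabilities are close."""
    proba = clf.predict_proba([x])[0]
    top_two = np.argsort(proba)[::-1][:2]
    if proba[top_two[0]] - proba[top_two[1]] < margin:
        return list(top_two)      # ambiguous: report both classes
    return [top_two[0]]           # confident: report a single class

print(predict_top_two(X[0]))
```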
This is called a multi-label classification problem. There are many approaches to solving it; see the scikit-learn documentation about multi-label classification.
An example with sklearn
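A minimal sketch along those lines; the feature vectors below are a toy stand-in for per-image features (e.g. the output of a CNN), and the flower labels are just an example:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import MultiLabelBinarizer

# Toy stand-in for per-image feature vectors; each image can carry several labels.
X = np.random.randn(6, 10)
labels = [["rose"], ["orchid"], ["rose", "orchid"],
          ["rose"], ["orchid"], ["rose", "orchid"]]

mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(labels)          # multi-hot targets, one column per flower

# One binary classifier per label, so "rose" and "orchid" can be predicted together.
clf = OneVsRestClassifier(LogisticRegression())
clf.fit(X, Y)

pred = clf.predict(np.random.randn(1, 10))
print(mlb.inverse_transform(pred))     # e.g. [('orchid', 'rose')] or [()]
```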
