Are Transformers (positional embedding + encoder) slow to train? - python

I am completely new to transformers. I built a transformer-based model that has only the encoder and positional-embedding parts, and I stacked 12 encoder blocks to classify around 1 million samples of time-series data. The model is very slow (around half an hour per epoch). My GPU is an RTX 3080 in a laptop.
Is it normal for Transformers to learn slowly?
Is there any way to improve the performance?
Is there an easy way to tune the hyperparameters with a highly skewed and very noisy dataset?
I tried different learning rates to speed up the process; 0.001 gives me not-bad results but a very slow process. I followed the TensorFlow implementation.
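For concreteness, here is a minimal sketch of an encoder-only (positional embedding + encoder) classifier of this kind in Keras. It is not the asker's actual code; the sequence length, feature count, model width, and number of classes are placeholder assumptions.

```python
import tensorflow as tf

# Placeholder sizes, not the asker's actual settings
SEQ_LEN, N_FEATURES, N_CLASSES = 128, 1, 2
D_MODEL, N_HEADS, N_LAYERS = 64, 4, 12

class PositionalEmbedding(tf.keras.layers.Layer):
    """Project the inputs to d_model and add a learned positional embedding."""
    def __init__(self, seq_len, d_model):
        super().__init__()
        self.proj = tf.keras.layers.Dense(d_model)
        self.pos_emb = tf.keras.layers.Embedding(seq_len, d_model)
        self.seq_len = seq_len

    def call(self, x):
        positions = tf.range(self.seq_len)
        return self.proj(x) + self.pos_emb(positions)

def encoder_block(x):
    # Self-attention sublayer with residual connection and layer norm
    attn = tf.keras.layers.MultiHeadAttention(
        num_heads=N_HEADS, key_dim=D_MODEL // N_HEADS)(x, x)
    x = tf.keras.layers.LayerNormalization()(x + attn)
    # Position-wise feed-forward sublayer
    ffn = tf.keras.layers.Dense(4 * D_MODEL, activation="relu")(x)
    ffn = tf.keras.layers.Dense(D_MODEL)(ffn)
    return tf.keras.layers.LayerNormalization()(x + ffn)

inputs = tf.keras.Input(shape=(SEQ_LEN, N_FEATURES))
x = PositionalEmbedding(SEQ_LEN, D_MODEL)(inputs)
for _ in range(N_LAYERS):  # 12 stacked encoder blocks, as in the question
    x = encoder_block(x)
x = tf.keras.layers.GlobalAveragePooling1D()(x)
outputs = tf.keras.layers.Dense(N_CLASSES, activation="softmax")(x)

model = tf.keras.Model(inputs, outputs)
model.compile(optimizer=tf.keras.optimizers.Adam(1e-3),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```

With a model shaped like this, per-epoch time grows roughly linearly with the number of encoder blocks and quadratically with the sequence length (self-attention is O(L²)), so trimming either is usually the first thing to try.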

Related

Is there a way we can calculate the estimated model training time when applying machine learning with scikit-learn or otherwise?

Many times, especially when the dataset is large or has many features, it takes ages (long hours) for scikit-learn to train a model. Since training is using the computational resources, working on other things on the same machine during this time becomes exceptionally slow, thus reducing overall productivity.
Is there a way to estimate the time required for training a model? It doesn't have to be known beforehand; it can be estimated once training has started.
I have tried scitime, but that's a very invasive method. I would prefer a method that is more tightly coupled with sklearn's functionality.
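One rough, do-it-yourself approach (not a built-in sklearn feature) is to time the estimator on growing subsamples and extrapolate the trend to the full dataset size. A minimal sketch, with a placeholder estimator and synthetic data standing in for the real problem:

```python
import time
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the real dataset
X, y = make_classification(n_samples=50_000, n_features=20, random_state=0)
est = RandomForestClassifier(n_estimators=100, random_state=0)

for n in (1_000, 5_000, 10_000):
    start = time.perf_counter()
    est.fit(X[:n], y[:n])
    print(f"{n:>6} samples: {time.perf_counter() - start:.1f} s")

# If the timings grow roughly linearly (or quadratically) in n, extrapolating
# that trend to len(X) gives a ballpark estimate for the full fit.
```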

Which (ML/DL) model is best for multi-class classification on smaller datasets?

I am working with a health dataset.
The dataset is about body signals (8 features), and the target variable is the body failing temperature.
There are 6 different temperatures, i.e. 6 classes (targets).
My dataset has shape (1500, 9) and is entirely numerical.
I fitted my data with a Random Forest classifier, but it gives an accuracy of only around 80%.
I need my accuracy and F1 score to improve further.
In the meantime, I am tweaking some parameters for better accuracy.
Apart from Random Forest, I would like some suggestions on which model would be the best choice for this problem. Since my dataset is small, I am not sure how to select the best ML model.
I thought of going with boosting, SVM, or neural nets.
Kindly share your thoughts.
To find the best model for your problem you can use scikit-learn's GridSearchCV. Use a Pipeline and configure GridSearchCV to experiment with different learning methods and their hyperparameters; it will find the best model for you.
A group of researchers found that, given data of sufficient quality and quantity, the performance of different ML models varies little (Hands-On Machine Learning with Scikit-Learn and TensorFlow, first edition, page 23). You should also spend some effort on feature engineering to see if you can increase the number of informative features. You can get some ideas from this Titanic solution.
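A minimal sketch of the GridSearchCV-over-a-Pipeline idea, searching over both the estimator and its hyperparameters; the synthetic data stands in for the (1500, 9) dataset and the grids are illustrative, not tuned recommendations:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the 1500-sample, 8-feature, 6-class dataset
X, y = make_classification(n_samples=1500, n_features=8, n_informative=6,
                           n_classes=6, random_state=0)

pipe = Pipeline([("scale", StandardScaler()),
                 ("clf", RandomForestClassifier())])

# The "clf" step itself is a searchable parameter, so each grid can swap in
# a different model family with its own hyperparameters.
param_grid = [
    {"clf": [RandomForestClassifier(random_state=0)],
     "clf__n_estimators": [200, 500], "clf__max_depth": [None, 10]},
    {"clf": [GradientBoostingClassifier(random_state=0)],
     "clf__learning_rate": [0.05, 0.1]},
    {"clf": [SVC()], "clf__C": [1, 10], "clf__kernel": ["rbf", "linear"]},
]

search = GridSearchCV(pipe, param_grid, scoring="f1_macro", cv=5, n_jobs=-1)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

Scoring with f1_macro (rather than plain accuracy) matches the goal of improving both accuracy and F1 on a multi-class problem.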

How can one quickly verify that a CNN actually learns?

I tried to build a CNN from scratch based on the LeNet architecture from this article.
I implemented backprop and am now trying to train it on the MNIST dataset using SGD with a batch size of 16. I want a quick way to verify that learning is going well and that there are no bugs. For this, I visualize the loss for every 100th batch, but it takes too long on my laptop and I can't see the overall dynamic (the loss fluctuates downwards but occasionally jumps back up, so I am not sure). Could anyone suggest a proven way to check that the CNN works, without waiting many hours of training?
MNIST consists of 60k training images of 28 × 28 pixels. Training a CNN with a batch size of 16 means roughly 3,750 forward passes per epoch.
That said, you are using LeNet, which is not a very deep model.
I would suggest the following:
Check your PC specifications, such as RAM, processor, GPU, etc.
Try to train your model on a cloud service such as Google Colab, Kaggle, or others.
Try a batch size of 64 or 128.
Normalize your image dataset before training (a short sketch follows below).
Training speed also depends on the machine learning framework you are using, such as TensorFlow, PyTorch, etc.
I hope this helps.
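As a quick illustration of the normalization point above, a minimal NumPy sketch; the random array is a stand-in for the real MNIST pixel data:

```python
import numpy as np

# Stand-in for the real (60000, 28, 28) uint8 MNIST training images
X_train = np.random.randint(0, 256, size=(60_000, 28, 28), dtype=np.uint8)

X_train = X_train.astype(np.float32) / 255.0          # scale pixels to [0, 1]
X_train = (X_train - X_train.mean()) / X_train.std()  # then zero mean, unit variance
```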

How should I approach a 300-class classification machine learning problem?

I am trying to build a multi-class classification application, but my dataset has 300 classes. Is it possible to train a model with all these classes on a normal PC?
Sure it is. You can even train ImageNet with 1000 categories or more, if you have enough time! ;)
You just have to think about which loss function you want (categorical crossentropy, sparse categorical crossentropy, or even binary crossentropy if you want to penalize each output node independently); apart from that, there is not really much difference between 10, 100 or 1000 classes.
Of course, you have to increase your model size to account for more classes, so RAM may be an issue, but you can always decrease the batch size. If you are using images and convnets and your model is still too large, try downsampling the images, using pooling layers, or using larger strides.
If your computer is too old and slow, you can also try Google Colab, which offers a free GPU and even a TPU online!
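To make the loss-function point concrete, here is a minimal Keras sketch; the input dimension and hidden width are placeholder assumptions, and only the 300-way output matches the question:

```python
import tensorflow as tf

N_CLASSES = 300   # as in the question
INPUT_DIM = 64    # placeholder feature size

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(INPUT_DIM,)),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dense(N_CLASSES, activation="softmax"),  # one output node per class
])

# With integer labels (0..299), sparse categorical crossentropy avoids having
# to one-hot encode 300 classes; use categorical_crossentropy instead if the
# labels are already one-hot vectors.
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```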
It is difficult to answer this question, because the training time of your model depends on a number of factors. It might be best to train your model for a fixed number of hours and evaluate the performance. You could also fit a learning curve, which provides an estimate of how many data points you need to reach a certain performance; you can then link the required number of data points to computation time.
Here is an article that provides an algorithm for fitting a learning curve: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3307431/.
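A minimal sketch of fitting a learning curve with scikit-learn's learning_curve helper; the synthetic data and the Random Forest estimator are placeholders for your own problem:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import learning_curve

# Synthetic stand-in for the real multi-class dataset
X, y = make_classification(n_samples=5_000, n_features=20, n_informative=8,
                           n_classes=5, random_state=0)

sizes, train_scores, val_scores = learning_curve(
    RandomForestClassifier(random_state=0), X, y,
    train_sizes=np.linspace(0.1, 1.0, 5), cv=5, n_jobs=-1)

for n, score in zip(sizes, val_scores.mean(axis=1)):
    print(f"{n:>5} training samples -> CV accuracy {score:.3f}")

# Extrapolating this curve gives a rough idea of how much data (and therefore
# how much compute time) is needed to reach a target score.
```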

SciKit One-class SVM classifier training time increases exponentially with size of training data

I am using scikit-learn's OneClassSVM classifier in Python to detect outliers in lines of text. The text is first converted to numerical features using bag of words and TF-IDF.
When I train (fit) the classifier on my computer, the time seems to increase exponentially with the number of items in the training set:
Number of items in the training data and training time taken:
10K: 1 sec, 15K: 2 sec, 20K: 8 sec, 25K: 12 sec, 30K: 16 sec, 45K: 44 sec.
Is there anything I can do to reduce the training time, and to avoid it becoming too long when the training data grows to a couple of hundred thousand items?
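For reference, a minimal sketch of the setup described above (TF-IDF features fed into a one-class SVM); the toy documents and the nu/gamma values are placeholders:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import OneClassSVM

# Toy stand-in for the real lines of text
docs = ["normal log line one", "normal log line two", "another ordinary line"] * 100

X = TfidfVectorizer().fit_transform(docs)        # sparse TF-IDF feature matrix
clf = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale")
clf.fit(X)                                       # the step whose cost blows up with size
print(clf.predict(X[:5]))                        # +1 = inlier, -1 = outlier
```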
Well, scikit-learn's SVM is a high-level implementation, so there is only so much you can do. In terms of speed, from their website: "SVMs do not directly provide probability estimates, these are calculated using an expensive five-fold cross-validation."
You can increase the kernel cache size parameter (cache_size) based on your available RAM, but it does not help much.
You can also try changing the kernel, though your model might then be less appropriate for the data.
Here is some advice from http://scikit-learn.org/stable/modules/svm.html#tips-on-practical-use: scale your data.
Otherwise, don't use scikit-learn's SVM and implement something yourself, for example using neural nets.
Hope I'm not too late. OCSVM, and SVM in general, is resource hungry, and training time grows at least quadratically with the number of samples (the numbers you show are consistent with this). If you can, see if Isolation Forest or Local Outlier Factor work for you. If you're considering applying this to a larger dataset, I would suggest building a custom anomaly-detection model that closely resembles the behaviour of these off-the-shelf solutions; that way you can work either in parallel or with threads.
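If you want to try the Isolation Forest route mentioned above, here is a minimal sketch on the same kind of TF-IDF features (the toy documents and parameters are placeholders):

```python
from sklearn.ensemble import IsolationForest
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy stand-in for the real lines of text
docs = ["normal log line one", "normal log line two", "another ordinary line"] * 100
X = TfidfVectorizer().fit_transform(docs)

# Tree-based, so it scales much better with sample count than a kernel OCSVM
# and can use all cores via n_jobs.
iso = IsolationForest(n_estimators=200, contamination="auto",
                      n_jobs=-1, random_state=0)
labels = iso.fit_predict(X)      # +1 = inlier, -1 = outlier
print(labels[:10])
```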
For anyone coming here from Google, sklearn has implemented SGDOneClassSVM, which "has a linear complexity in the number of training samples". It should be faster for large datasets.
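A minimal sketch of that linear-time alternative, optionally combined with a Nystroem kernel approximation so it still behaves like an RBF one-class SVM; the toy documents and parameters are placeholders:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.kernel_approximation import Nystroem
from sklearn.linear_model import SGDOneClassSVM
from sklearn.pipeline import make_pipeline

# Toy stand-in for the real lines of text
docs = ["normal log line one", "normal log line two", "another ordinary line"] * 100
X = TfidfVectorizer().fit_transform(docs)

# SGDOneClassSVM trains in time roughly linear in the number of samples;
# Nystroem adds an approximate RBF kernel while keeping that scaling.
clf = make_pipeline(Nystroem(n_components=100, random_state=0),
                    SGDOneClassSVM(nu=0.05, random_state=0))
clf.fit(X)
print(clf.predict(X[:5]))        # +1 = inlier, -1 = outlier
```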
