I have the multilingual BERT model from Google, and I have a lot of text data in my language (Korean). I want BERT to produce better vectors for texts in this language, so I want to train BERT further on the text corpus I have, the way I would continue training a w2v model on new data. Is that possible with BERT?
There are a lot of examples of "fine-tuning" BERT on specific tasks, including the original one from Google, where you can train BERT further on your data. But as far as I understand it (I might be wrong), we do this inside a task-specific model (for a classification task, for example), i.e. at the same time as training the classifier.
What I want is to train BERT further separately and then get fixed vectors for my data, not to build it into some task-specific model. I just need vector representations for my data (using the get_features function), like they do here. In other words, I need to additionally train the BERT model on more data in the specific language.
I would be endlessly grateful for any suggestions/links on how to train the BERT model further (preferably in TensorFlow). Thank you.
The transformers package provides code for using and fine-tuning most of the currently popular pre-trained Transformers, including BERT, XLNet, GPT-2, and others. You can easily load the model and continue training.
You can get the multilingual BERT model:

from transformers import BertTokenizer, TFBertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained('bert-base-multilingual-cased')
model = TFBertForSequenceClassification.from_pretrained('bert-base-multilingual-cased')
The tokenizer is used both for tokenizing the input and for converting the sub-words into embedding IDs. Calling the model on the sub-word indices will give you the hidden states of the model.
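For example, here is a minimal sketch of extracting fixed vectors with the plain TFBertModel (the Korean sentence is just a placeholder, and the mean-pooling at the end is one common choice, not something the package prescribes):

import tensorflow as tf
from transformers import BertTokenizer, TFBertModel

tokenizer = BertTokenizer.from_pretrained('bert-base-multilingual-cased')
model = TFBertModel.from_pretrained('bert-base-multilingual-cased')

# Tokenize into sub-words and convert them to ids; shape [1, seq_len].
input_ids = tokenizer.encode("이 문장은 예시입니다.", return_tensors='tf')

# The first output is the sequence of hidden states, shape [1, seq_len, 768].
hidden_states = model(input_ids)[0]

# One common way to get a single fixed vector per text: mean-pool over tokens.
sentence_vector = tf.reduce_mean(hidden_states, axis=1)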
Unfortunately, the package does not implement the pre-training procedure, i.e., the masked language model and next sentence prediction objectives. You will need to write it yourself, but the training procedure is well described in the paper and the implementation is straightforward.
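For illustration, here is a rough sketch of what a masked-LM training step could look like. This is my own simplification of the procedure from the paper, not official pre-training code (the paper also replaces some selected positions with random or unchanged tokens instead of [MASK], and a real implementation would avoid masking special tokens):

import tensorflow as tf
from transformers import BertTokenizer, TFBertForMaskedLM

tokenizer = BertTokenizer.from_pretrained('bert-base-multilingual-cased')
model = TFBertForMaskedLM.from_pretrained('bert-base-multilingual-cased')
optimizer = tf.keras.optimizers.Adam(learning_rate=5e-5)
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

def train_step(input_ids):
    # Mask roughly 15% of the positions, as in the paper.
    mask = tf.random.uniform(tf.shape(input_ids)) < 0.15
    masked_ids = tf.where(mask, tokenizer.mask_token_id, input_ids)
    with tf.GradientTape() as tape:
        logits = model(masked_ids)[0]  # [batch, seq_len, vocab_size]
        # Only the masked positions contribute to the loss.
        loss = loss_fn(tf.boolean_mask(input_ids, mask),
                       tf.boolean_mask(logits, mask))
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss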
Related
I am currently using the pre-trained ELMo model provided by tensorflow_hub.
I want ELMo embeddings to represent words such as technical terms and abbreviations well.
Is there a way to improve the pre-trained ELMo model by additionally training it on new documents?
I am new to the NLP community and need some light shed on something.
I saw that Keras has an Embedding layer that is generally used before an LSTM layer. But what algorithm hides behind it? Is it Word2Vec, GloVe, or something else?
My task is a supervised text classification problem.
The embedding layer is a randomly initialized matrix with dimensions (number_of_words_in_vocab * embedding_dimension). The embedding_dimension is a custom-defined dimension, a hyper-parameter that you have to choose.
Here, the embeddings are updated during back-propagation and are learnt from your task and task-specific corpus.
However, pre-trained embeddings such as word2vec and GloVe are learnt in an unsupervised manner on huge corpora. Pre-trained embeddings provide a good initialization for this embedding layer. Thus, you can use pre-trained embeddings to initialize this layer, and also choose whether to freeze these embeddings or update them during back-propagation.
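A minimal sketch of both options in Keras (the vocabulary size, embedding dimension, and the random glove_matrix placeholder are assumptions for illustration; in practice you would fill that matrix with real GloVe vectors):

import numpy as np
from tensorflow.keras import Sequential, layers

vocab_size, embedding_dim = 10000, 100

# Option 1: randomly initialized, learnt from your task via back-propagation.
learned = layers.Embedding(vocab_size, embedding_dim)

# Option 2: initialized from pre-trained vectors; trainable=False freezes them.
glove_matrix = np.random.rand(vocab_size, embedding_dim)  # placeholder weights
pretrained = layers.Embedding(vocab_size, embedding_dim,
                              weights=[glove_matrix], trainable=False)

# Typical use before an LSTM for supervised text classification.
model = Sequential([pretrained,
                    layers.LSTM(64),
                    layers.Dense(1, activation='sigmoid')])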
I am a total rookie in computer vision. I am looking to build a model without using models pre-trained on the COCO dataset or any other open-source image dataset. Any articles or references on building such models would be appreciated. I would like to build this model from scratch, so suggestions of pre-existing trained models or APIs are irrelevant to this question. Thanks in advance for any suggestions. The programming language of preference for this project is Python.
How about this tutorial on the Keras blog:
https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html
It should be pretty straightforward, and it is written in a step-by-step manner by the author of Keras. It has these three stages, but you only need the first one:
training a small network from scratch (as a baseline; a minimal sketch follows this list)
using the bottleneck features of a pre-trained network
fine-tuning the top layers of a pre-trained network
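As an illustration of that first stage, here is a minimal from-scratch Keras CNN, roughly in the spirit of the tutorial (the 150x150x3 input shape and binary output are assumptions borrowed from the tutorial's cats-vs-dogs setting, not from your dataset):

from tensorflow.keras import Sequential, layers

model = Sequential([
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(150, 150, 3)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(64, activation='relu'),
    layers.Dense(1, activation='sigmoid'),  # binary classification, as in the tutorial
])
model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['accuracy'])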
I have recently been looking into using TensorFlow for creating a custom CNN, and have been attempting to use the tutorials for insight on the most straightforward way to design, train, and deploy an image classification network.
The two approaches that have stood out to me are:
TF Layers API: This API seems to provide the most straightforward and intuitive way of defining a network, layer by layer. That said, the way that they train and evaluate the model uses the tf.learn.Estimator class, which seems a bit limiting in that the network is strictly trained using the Estimator's fit() method and validated using the evaluate() method. This tutorial does not even use a tf.Session.
Low-level API: Defining a network seems a bit more tedious. Also, training and deploying is done in a very manual fashion, but it appears to offer more control.
For a TensorFlow novice looking to implement and train relatively basic CNNs, and who wants the ability to tinker with the network architecture and do basic hyperparameter tuning, which would be the best API to get familiar with?
Also, if there are any useful tutorials or examples using your preferred interface, links would be greatly appreciated.
Keras is a nice frontend for TensorFlow. It sounds like it should fit your needs. Here is an example of someone training a CNN with Keras.
I like what is described on this page: write your stuff "by hand", but give the model a transparent class interface to the outside. TF graph stuff is handled as properties and set up at construction, and then the model can be used without having to know TF. A bit like Keras, but giving you full control (and forcing you to learn the low level). It lacks Keras' composability, though.
Basically, it recommends you do something along the lines of this:
import tensorflow as tf

def lazy_property(function):
    # Cache the property so the TF ops are added to the graph only once
    # (a minimal version of the decorator from the linked page).
    attribute = '_cache_' + function.__name__
    @property
    def wrapper(self):
        if not hasattr(self, attribute):
            setattr(self, attribute, function(self))
        return getattr(self, attribute)
    return wrapper

class Model:
    @lazy_property
    def prediction(self):
        ...

    @lazy_property
    def optimize(self):
        # actual TF stuff here, e.g.:
        cross_entropy = -tf.reduce_sum(self.target * tf.log(self.prediction))
        optimizer = tf.train.RMSPropOptimizer(0.03)
        return optimizer.minimize(cross_entropy)

    @lazy_property
    def error(self):
        ...
If it interests you, I tried to package up that approach with a common base class and decorators here. I tried to stick to the Keras API, and sessions are explicitly handled with "with" blocks. The code, however, has not actually been used in training -- I just wrote it after getting fed up with repeating everything across several university projects, and wanting to produce something cleaner.
I am planning on building a gender classifier. I know the two popular models are tf-idf and word2vec.
While tf-idf focuses on the importance of a word in a document and similarity of documents, word2vec focuses more on the relationship between words and similarity between them.
However, neither of them seems perfect for building vector features for gender classification. Is there any other vectorization model that might suit this task?
Yes, there is another alternative to w2v: GloVe.
GloVe stands for Global Vectors (for Word Representation).
As someone who has used this technique before to good effect, I would recommend GloVe.
GloVe trains word embeddings not just by looking at local context windows but by using global word-word co-occurrence statistics aggregated over the whole corpus, thereby embedding a deeper level of semantics into the vectors.
With GloVe, it is easy to model relationships such as X[man] - X[woman] ≈ X[king] - X[queen], where these are all vectors.
Credits: GloVe GitHub page (linked below).
You can train your own GloVe embeddings, or you may use their pre-trained models, which are available for download. Even for specific domains, the general models seem to work reasonably well, although you would get a lot more out of your models if you trained them yourself. Please look at the GitHub page for instructions on how to train your own models; it is very easy.
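For instance, loading the published vectors is a few lines of plain Python (the file name glove.6B.100d.txt is one of the downloads from their site; the analogy check below assumes those general-domain vectors):

import numpy as np

embeddings = {}
with open('glove.6B.100d.txt', encoding='utf-8') as f:
    for line in f:
        word, *values = line.split()
        embeddings[word] = np.asarray(values, dtype='float32')

def cosine(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# The relationship from above: X[man] - X[woman] vs. X[king] - X[queen].
print(cosine(embeddings['man'] - embeddings['woman'],
             embeddings['king'] - embeddings['queen']))  # close to 1 if it holds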
Additional reading:
GloVe: Global Vectors for Word Representation
GloVe repository