I am currently using the pre-trained ELMo model provided by tensorflow_hub.
I want ELMo embeddings to represent words such as technical terms and abbreviations well.
Is there a way to improve the pre-trained ELMo model by additionally training it on new documents?
Related
I have been coding my own models for a while, but I saw Hugging Face and started using it. I want to know whether I should use the pre-trained model as-is or train the same Hugging Face model further on my own dataset. I am trying to build a question answering model.
I have a dataset of 10k-20k questions.
The state-of-the-art approach is to take a model that was pre-trained on tasks relevant to your problem and fine-tune it on your dataset.
So, assuming your dataset is in English, you should take a model pre-trained on English natural-language text and then fine-tune it.
This will most likely work better than training from scratch, but you can experiment on your own. You can also load a model without the pre-trained weights in Hugging Face.
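For illustration, a minimal sketch of both options, assuming a recent transformers version (the checkpoint name and auto classes below are examples, not recommendations):

from transformers import AutoConfig, AutoModelForQuestionAnswering, AutoTokenizer

model_name = "bert-base-uncased"  # example checkpoint; pick one suited to your language/domain
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Option 1: start from the pre-trained weights and fine-tune on your 10k-20k questions.
model = AutoModelForQuestionAnswering.from_pretrained(model_name)

# Option 2: the same architecture, but randomly initialised (no pre-trained weights).
config = AutoConfig.from_pretrained(model_name)
model_from_scratch = AutoModelForQuestionAnswering.from_config(config)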
I have a textual dataset of 15000 rows and a label for each row. Since my dataset is in the clinical domain, I want to apply BioBERT pre-trained word embeddings to the textual data using TensorFlow and then feed them into a CNN for prediction. Can anyone show me how to implement this on a textual field using Python and TensorFlow?
I have a multilingual BERT model from Google, and I have a lot of text data in my language (Korean). I want BERT to produce better vectors for texts in this language, so I want to additionally train BERT on that text corpus. It would be like having a w2v model trained on some data and continuing to train it. Is that possible with BERT?
There are a lot of examples of "fine-tuning" BERT on specific tasks, including the original one from Google, where you can train BERT further on your data. But as far as I understand it (I might be wrong), we do that within a task-specific model (for a classification task, for example). So... we do it at the same time as training our classifier (??)
What I want is to train BERT further separately and then get fixed vectors for my data, not to build it into some task-specific model, but just to get vector representations for my data (using the get_features function) like they do here. I just need to train the BERT model additionally on more data in the specific language.
I would be endlessly grateful for any suggestions/links on how to train the BERT model further (preferably in TensorFlow). Thank you.
The transformers package provides code for using and fine-tuning most currently popular pre-trained Transformers, including BERT, XLNet, GPT-2, ... You can easily load a model and continue training.
You can get the multilingual BERT model:
from transformers import BertTokenizer, TFBertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained('bert-base-multilingual-cased')
model = TFBertForSequenceClassification.from_pretrained('bert-base-multilingual-cased')
The tokenizer is used both for tokenizing the input and for converting the sub-words into embedding ids. Calling the model on the subword indices will give you hidden states of the model.
Unfortunately, the package does not implement the training procedure itself, i.e., masked language modeling and next-sentence prediction. You will need to write it yourself, but the training procedure is well described in the paper and the implementation will be straightforward.
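As a rough illustration of the masked-language-modeling part only, here is a minimal sketch of a single training step, assuming a recent transformers version that ships TFBertForMaskedLM and accepts a labels argument; a real implementation would iterate over the whole corpus in batches, avoid masking special and padding tokens, and also cover next-sentence prediction:

import tensorflow as tf
from transformers import BertTokenizer, TFBertForMaskedLM

tokenizer = BertTokenizer.from_pretrained('bert-base-multilingual-cased')
model = TFBertForMaskedLM.from_pretrained('bert-base-multilingual-cased')
optimizer = tf.keras.optimizers.Adam(learning_rate=2e-5)

texts = ["a sentence from my Korean corpus", "another sentence"]  # placeholder corpus
enc = tokenizer(texts, padding=True, truncation=True, max_length=128, return_tensors='tf')
input_ids = enc['input_ids']

# Randomly mask ~15% of the tokens; only masked positions get a real label,
# the rest are set to -100 so the loss ignores them.
mask = tf.random.uniform(tf.shape(input_ids)) < 0.15
masked_ids = tf.where(mask, tokenizer.mask_token_id, input_ids)
labels = tf.where(mask, input_ids, -100)

with tf.GradientTape() as tape:
    outputs = model(masked_ids, attention_mask=enc['attention_mask'], labels=labels, training=True)
    loss = tf.reduce_mean(outputs.loss)
grads = tape.gradient(loss, model.trainable_variables)
optimizer.apply_gradients(zip(grads, model.trainable_variables))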
I tried freezing the pretrained embeddings and learning embeddings only for the new vocabulary, but the embeddings for the predefined words also got changed.
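For what it is worth, a hypothetical sketch of one way to guarantee the pretrained rows stay fixed: keep them in a constant tensor and make only the new-vocabulary rows a trainable variable (the sizes and the random stand-in matrix below are made up):

import numpy as np
import tensorflow as tf

pretrained = tf.constant(np.random.rand(5000, 300), dtype=tf.float32)  # stand-in for the real pretrained vectors
new_rows = tf.Variable(tf.random.uniform((200, 300)))                  # rows for the new vocab, trainable

def embed(token_ids):
    # ids 0..4999 hit the frozen pretrained matrix, ids 5000+ hit the trainable rows;
    # gradients can only reach new_rows because pretrained is a constant.
    table = tf.concat([pretrained, new_rows], axis=0)
    return tf.nn.embedding_lookup(table, token_ids)

vectors = embed(tf.constant([[1, 42, 5001]]))  # shape (1, 3, 300)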
I am new to Deep Learning and I want to explore Deep Learning for NLP. I went through word embeddings and tested them with gensim's word2vec. I also heard about pre-trained models. I am confused about the difference between using a pre-trained model and training a model yourself, and about how to use the results.
I want to do this in Keras because I do not want to write the formulas and everything myself in Theano or TensorFlow.
When training word2vec with gensim, the result you get is a representation of the words in your vocabulary as vectors. The dimension of these vectors is a parameter you choose: the size of the network's hidden layer.
The pre-trained word2vec models simply contain a list of those vectors that were pre-trained on a large corpus. You will find pre-trained vectors of various sizes.
How do you use those vector representations? That depends on what you want to do. Some interesting properties have been shown for these vectors: for example, the vector for 'king' - 'man' + 'woman' will often have the vector for 'queen' as its closest match. You may also consider using the word vectors as input for another neural network/computational model.
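A small sketch of how this looks in gensim, assuming gensim 4.x (the dimension parameter is called size rather than vector_size in older 3.x releases); the toy corpus and file name are placeholders:

from gensim.models import Word2Vec, KeyedVectors

# Train your own vectors: each sentence is a list of tokens.
sentences = [["deep", "learning", "for", "nlp"], ["word", "embeddings", "with", "gensim"]]
model = Word2Vec(sentences, vector_size=100, window=5, min_count=1)
print(model.wv["nlp"].shape)  # (100,) -- the dimension you chose

# Or load pre-trained vectors (here in the word2vec binary format) and query them,
# e.g. the classic king - man + woman ~ queen analogy:
# vectors = KeyedVectors.load_word2vec_format("GoogleNews-vectors-negative300.bin", binary=True)
# vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=1)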
Gensim is a highly optimized library for the CBOW and skip-gram algorithms, but if you really want to set up the neural network yourself, you will first have to learn the structure of CBOW and skip-gram and how to code them, in Keras for example. This should not be particularly complex, and a Google search on these subjects will give you many results to help you along.
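For orientation only, a minimal sketch of the skip-gram-with-negative-sampling structure in Keras; the vocabulary size and dimension are placeholders, and the (target, context, label) training pairs would have to be generated separately, e.g. with tf.keras.preprocessing.sequence.skipgrams:

import tensorflow as tf

vocab_size, dim = 10000, 100
target = tf.keras.Input(shape=(1,), dtype='int32')
context = tf.keras.Input(shape=(1,), dtype='int32')

# One embedding table for the target word, one for the context word.
target_vec = tf.keras.layers.Embedding(vocab_size, dim)(target)    # (batch, 1, dim)
context_vec = tf.keras.layers.Embedding(vocab_size, dim)(context)  # (batch, 1, dim)

# Dot product of the two vectors -> probability that the pair really co-occurred.
score = tf.keras.layers.Dot(axes=-1)([target_vec, context_vec])    # (batch, 1, 1)
prob = tf.keras.layers.Activation('sigmoid')(tf.keras.layers.Flatten()(score))

model = tf.keras.Model([target, context], prob)
model.compile(optimizer='adam', loss='binary_crossentropy')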