Get weight matrices from gensim word2Vec

Get weight matrices from gensim word2Vec - python

I am using gensim word2vec package in python.
I would like to retrieve the W and W' weight matrices that have been learn during the skip-gram learning.
It seems to me that model.syn0 gives me the first one but I am not sure how I can get the other one. Any idea?
I would actually love to find any exhaustive documentation on models accessible attributes because the official one does not seem to be precise (for instance syn0 is not described as an attribute)

The model.wv.syn0 contains the input embedding matrix. Output embedding is stored in model.syn1 when it's trained with hierarchical softmax (hs=1) or in model.syn1neg when it uses negative sampling (negative>0). That's it! When both hierarchical softmax and negative sampling are not enabled, Word2Vec uses a single weight matrix model.wv.syn0 for training.
See also a related discussion here.

Related

Is it possible to update a Doc2Vec Vector?

I am working with a steadily growing corpus. I train my Document Vector with Doc2Vec which is implemented in Python.
Is it possible to update a Document Vector?
I want to use the Document Vector for Document recommendations.

Individual vectors can be updated, but the gensim Doc2Vec model class doesn't have much support for adding more doc-vectors to itself.
It can, however, return individual vectors for new texts that are compatible (comparable) with the existing vectors, via the .infer_vector(words) method. You can retain these vectors in your own data structures for lookup.
When enough new documents have arrived that you think your core model would be better, if trained on all documents, you can re-train the model with all available data, using it as the new base for .infer_vector(). (Note that vectors from the retrained model won't usually be compatible/comparable with those from the prior model: each training session bootstraps a different self-consistent coordinate space.)

How can we use artificial neural networks to find similar documents?

How can we use ANN to find some similar documents? I know its a silly question, but I am new to this NLP field.
I have made a model using kNN and bag-of-words approach to solve my problem. Using that I can get n number of documents (along with their closeness) that are somewhat similar to the input, but now I want to implement the same using ANN and I am not getting any idea.
Thanks in advance for any help or suggestions.

You can use "word embeddings" - technique, that presents words in the dense vector representation. To find similar documents as the vectors, you can simply use cosine similarity.
An example how to build word2vec model using TensorFlow. One more example how to use embeddings layer from Keras.

The way to obtain embeddings for your language is either training them yourself on your corpus of choice (large enough - e.g. wikipedia) or downloading the trained embeddings (for python there are plenty of sources for embeddings trained or loadable with gensim module - which is a de facto standard for Python word2vec).
You can also use GloVe (using glove-python) or FastText word embeddings.
If you are interested you can find more detailed descriptions of embeddings with code examples and source papers.

Have a look at the paper https://arxiv.org/pdf/1805.10685.pdf that gives you a overall idea.
check this link for more references https://github.com/Hironsan/awesome-embedding-models

Multi-output regression

I have been looking in to Multi-output regression the last view weeks. I am working with the scikit learn package. My machine learning problem has an a input of 3 features an needs to predict two output variables. Some ML models in the sklearn package support multioutput regression nativly. If the models do not support this, the sklearn multioutput regression algorithm can be used to convert it. The multioutput class fits one regressor per target.
Does the mulioutput regressor class or supported multi-output regression algorithms take the underlying relationship of the input variables in to account?
Instead of a multi-output regression algorithm should I use a Neural network?

1) For your first question, I have divided that into two parts.
First part has the answer written in the documentation you linked and also in this user guide topic, which states explicitly that:
As MultiOutputRegressor fits one regressor per target it can not take
advantage of correlations between targets.
Second part of first question asks about other algorithms which support this. For that you can look at the "inherently multiclass" part in the user-guide. Inherently multi-class means that they don't use One-vs-Rest or One-vs-One strategy to be able to handle multi-class (OvO and OvR uses multiple models to fit multiple classes and so may not use the relationship between targets). Inherently multi-class means that they can structure the multi-class setting into a single model. This lists the following:
sklearn.naive_bayes.BernoulliNB
sklearn.tree.DecisionTreeClassifier
sklearn.tree.ExtraTreeClassifier
sklearn.ensemble.ExtraTreesClassifier
sklearn.naive_bayes.GaussianNB
sklearn.neighbors.KNeighborsClassifier
sklearn.semi_supervised.LabelPropagation
sklearn.semi_supervised.LabelSpreading
sklearn.discriminant_analysis.LinearDiscriminantAnalysis
sklearn.svm.LinearSVC (setting multi_class=”crammer_singer”)
sklearn.linear_model.LogisticRegression (setting multi_class=”multinomial”)
...
...
...
Try replacing the 'Classifier' at the end with 'Regressor' and see the documentation of fit() method there. For example let's take DecisionTreeRegressor.fit():
y : array-like, shape = [n_samples] or [n_samples, n_outputs]
The target values (real numbers).
Use dtype=np.float64 and order='C' for maximum efficiency.
You see that it supports a 2-d array for targets (y). So it may be able to use correlation and underlying relationship of targets.
2) Now for your second question about using neural network or not, it depends on personal preference, the type of problem, the amount and type of data you have, the training iterations you want to do. Maybe you can try multiple algorithms and choose what gives best output for your data and problem.

Gensim save_word2vec_format() vs. model.save()

I am using gensim version 0.12.4 and have trained two separate word embeddings using the same text and same parameters. After training I am calculating the Pearsons correlation between the word occurrence-frequency and vector-length. One model I trained using save_word2vec_format(fname, binary=True) and then loaded using load_word2vec_format the other I trained using model.save(fname) and then loaded using Word2Vec.load(). I understand that the word2vec algorithm is non deterministic so the results will vary however the difference in the correlation between the two models is quite drastic. Which method should I be using in this instance?

EDIT: this was intended as a comment. Don't know how to change it now, sorry
correlation between the word occurrence-frequency and vector-length I don't quite follow - aren't all your vectors the same length? Or are you not referring to the embedding vectors?

implementing a perceptron classifier

Hi I'm pretty new to Python and to NLP. I need to implement a perceptron classifier. I searched through some websites but didn't find enough information. For now I have a number of documents which I grouped according to category(sports, entertainment etc). I also have a list of the most used words in these documents along with their frequencies. On a particular website there was stated that I must have some sort of a decision function accepting arguments x and w. x apparently is some sort of vector ( i dont know what w is). But I dont know how to use the information I have to build the perceptron algorithm and how to use it to classify my documents. Have you got any ideas? Thanks :)

How a perceptron looks like
From the outside, a perceptron is a function that takes n arguments (i.e an n-dimensional vector) and produces m outputs (i.e. an m-dimensional vector).
On the inside, a perceptron consists of layers of neurons, such that each neuron in a layer receives input from all neurons of the previous layer and uses that input to calculate a single output. The first layer consists of n neurons and it receives the input. The last layer consist of m neurons and holds the output after the perceptron has finished processing the input.
How the output is calculated from the input
Each connection from a neuron i to a neuron j has a weight w(i,j) (I'll explain later where they come from). The total input of a neuron p of the second layer is the sum of the weighted output of the neurons from the first layer. So
total_input(p) = Σ(output(k) * w(k,p))
where k runs over all neurons of the first layer. The activation of a neuron is calculated from the total input of the neuron by applying an activation function. An often used activation function is the Fermi function, so
activation(p) = 1/(1-exp(-total_input(p))).
The output of a neuron is calculated from the activation of the neuron by applying an output function. An often used output function is the identity f(x) = x (and indeed some authors see the output function as part of the activation function). I will just assume that
output(p) = activation(p)
When the output off all neurons of the second layer is calculated, use that output to calculate the output of the third layer. Iterate until you reach the output layer.
Where the weights come from
At first the weights are chosen randomly. Then you select some examples (from which you know the desired output). Feed each example to the perceptron and calculate the error, i.e. how far off from the desired output is the actual output. Use that error to update the weights. One of the fastest algorithms for calculating the new weights is Resilient Propagation.
How to construct a Perceptron
Some questions you need to address are
What are the relevant characteristics of the documents and how can they be encoded into an n-dimansional vector?
Which examples should be chosen to adjust the weights?
How shall the output be interpreted to classify a document? Example: A single output that yields the most likely class versus a vector that assigns probabilities to each class.
How many hidden layers are needed and how large should they be? I recommend starting with one hidden layer with n neurons.
The first and second points are very critical to the quality of the classifier. The perceptron might classify the examples correctly but fail on new documents. You will probably have to experiment. To determine the quality of the classifier, choose two sets of examples; one for training, one for validation. Unfortunately I cannot give you more detailed hints to answering these questions due to lack of practical experience.

I think that trying to solve an NLP problem with a Neural Network when you're not familiar with either might be a step too far. That you're doing it in a new language is the least of your worries.
I'll link you to my Neural Computation module slides that gets taught at my university. You'll want the slides from session 1 and session 2 in week 2. Right at the bottom of the page is a link to how to implement a neural network in C. With a few modifications should be able to port it to python. You should note that it details how to implement a multilayer perceptron. You only need to implement a single layer perceptron, so ignore anything that talks about hidden layers.
A quick explanation of x and w. Both x and w are vectors. x is the input vector. x contains normalised frequencies for each word you are concerned about. w contains weights for each word you are concerned with. The perceptron works by multiplying the input frequency for each word by its respective weight and summing them up. It passes the result to a function (typically a sigmoid function) that turns the result into a value between 0 and 1. 1 means the perceptron is positive that the inputs are an instance of the class it represents and 0 means it is sure that the inputs really aren't an example of its class.
With NLP you typically learn about the bag of words model first, before moving on to other, more complex, models. With a neural network, hopefully, it will learn its own model. The problem with this is that the neural network will not give you much of an understanding of NLP, other than documents can be classified by the words they contain, and that usually the number and type of words in a document contains most of the information you need to classify a document -- context and grammar do not add much extra detail.
Anyway, I hope that gives a better place from which to start your project. If you're still stuck on a particular part then ask again and I'll do my best to help.

You should take a look at this survey paper on text classification by Frabizio Sebastiani. It tells you all of the best ways to do text classification.
Now, I'm not going to bother you to read the whole thing, but there's one table near the end, where he compares how lots of different people's techniques stack up on lots of different test corpora. Find it, pick the best one (the best perceptron one, if you assignment is specifically to learn how to do this with perceptron), and read the paper he cites that describes that method in detail.
You now know how to construct a good topical text classifier.
Turning the algorithm that Oswald gave you (and that you posted in your other question) into code is a Small Matter of Programming (TM). And if you encounter unfamiliar terms like TF-IDF while you're working, ask your teacher to help you by explaining those terms.

MultiLayer perceptrons (A specific NeuralNet architecture for general classification problem.) Now available for Python from the GraphLab folks:
https://dato.com/products/create/docs/generated/graphlab.deeplearning.MultiLayerPerceptrons.html#graphlab.deeplearning.MultiLayerPerceptrons

I had a try at implementing something similar the other day. I made some code to recognize english looking text vs non-english. I hadn't done AI or statistics in many years, so it was a bit of a shotgun attempt.
My code is here (don't want to bloat the post): http://cnippit.com/content/perceptron-statistically-recognizing-english
Inputs:
I take a text file, split it up into
tri-grams (eg "abcdef" => ["abc",
"bcd", "cde", "def"])
I calculate the relative frequencies of each, and feed that as the inputs to the perceptron (so there are 26^3 inputs)
Despite me not really knowing what I was doing, it seems to work fairly well. The success depends quite heavily on the training data though. I was getting poor results until I trained it on more french/spanish/german text etc.
It's a very small example though, with lots of "lucky guesses" at values (eg. initial weights, bias, threshold, etc.).
Multiple classes:
If you have multiple classes you want to distinquish between (ie. not as simple as "is A or NOT-A"), then one approach is to use a perceptron for each class. Eg. one for sport, one for news, etc.
Train the sport-perceptron on data grouped as either sport or NOT-sport. Similar for news or Not-news, etc.
When classifying new data, you pass your input to all perceptrons, and whichever one returns true (or "fires"), then that's the class the data belongs to.
I used this approach way back in university, where we used a set of perceptrons for recognizing handwritten characters. It's simple and worked pretty effectively (>98% accuracy if I recall correctly).

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.