What versions of spaCy support en_vectors_web_lg?

I am trying to download en_vectors_web_lg, but keep getting the below error:
ERROR: Could not install requirement en-vectors-web-lg==3.0.0 from https://github.com/explosion/spacy-models/releases/download/en_vectors_web_lg-3.0.0/en_vectors_web_lg-3.0.0-py3-none-any.whl#egg=en_vectors_web_lg==3.0.0 because of HTTP error 404 Client Error: Not Found for url: https://github.com/explosion/spacy-models/releases/download/en_vectors_web_lg-3.0.0/en_vectors_web_lg-3.0.0-py3-none-any.whl for URL https://github.com/explosion/spacy-models/releases/download/en_vectors_web_lg-3.0.0/en_vectors_web_lg-3.0.0-py3-none-any.whl#egg=en_vectors_web_lg==3.0.0
Does spaCy still support en_vectors_web_lg?
I also just updated spaCy to the latest version.

The naming conventions changed in v3 and the equivalent model is en_core_web_lg. It includes vectors and you can install it like this:
spacy download en_core_web_lg
I would not recommend downgrading to use the old vectors model unless you need to run old code.
If you are concerned about accuracy and have a decent GPU, the transformer model, en_core_web_trf, is also worth considering, though it doesn't include word vectors.
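To confirm that the installed pipeline really ships word vectors, something like the following sketch may help. It assumes spaCy v3 and that en_core_web_lg has already been downloaded; the helper name vector_table_shape is made up for illustration and returns None when spaCy or the model is missing.

```python
def vector_table_shape(model_name="en_core_web_lg"):
    """Return (rows, dimensions) of the pipeline's vector table, or None if unavailable."""
    try:
        import spacy  # imported lazily so the sketch still runs without spaCy
        nlp = spacy.load(model_name)
    except (ImportError, OSError):
        # spaCy is not installed, or the model package has not been downloaded
        return None
    return nlp.vocab.vectors.shape


if __name__ == "__main__":
    print(vector_table_shape())
```

A pipeline without a vector table would be expected to report zero rows here.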

It looks like en_vectors_web_lg is not supported by spaCy v3.0. The spaCy v3.0 installation guide offers en_core_web_trf instead, which is a transformer-based pipeline.

Related

How to fix spaCy en_training incompatible with current spaCy version

UserWarning: [W094] Model 'en_training' (0.0.0) specifies an under-constrained spaCy version requirement: >=2.1.4.
This can lead to compatibility problems with older versions,
or as new spaCy versions are released, because the model may say it's compatible when it's not.
Consider changing the "spacy_version" in your meta.json to a version range,
with a lower and upper pin. For example: >=3.2.1,<3.3.0
spaCy version 3.2.1
Python version 3.9.7
OS Windows

For spacy v2 models, the under-constrained requirement >=2.1.4 means >=2.1.4,<2.2.0 in effect, and as a result this model will only work with spacy v2.1.x.
There is no way to convert a v2 model to v3. You can either use the model with v2.1.x or retrain the model from scratch with your training data.
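The rule described above, where a bare lower bound on a v2 model is treated as pinned to its minor release, can be written out as a small sketch. This is only an illustration of that interpretation, not spaCy's own code, and the function name effective_v2_range is hypothetical.

```python
def effective_v2_range(requirement):
    """Expand an under-constrained '>=X.Y.Z' into '>=X.Y.Z,<X.(Y+1).0'."""
    requirement = requirement.strip()
    if not requirement.startswith(">=") or "," in requirement:
        # Already has an upper pin, or uses a form this sketch does not handle
        return requirement
    parts = requirement[2:].split(".")
    major, minor = int(parts[0]), int(parts[1])
    return f"{requirement},<{major}.{minor + 1}.0"


if __name__ == "__main__":
    print(effective_v2_range(">=2.1.4"))  # >=2.1.4,<2.2.0
```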

pip3 install spacy==2.1.4
This can download required

How can I use thinc.types with spaCy version 2?

I am using spacy==2.2.4 for named entity recognition and want to use the same version to test a custom spaCy relation extraction pipeline. Unfortunately, I get the error below when running the custom relation extraction model with that spaCy version.
ModuleNotFoundError: No module named 'thinc.types'
I used the spaCy GitHub link to train the custom relation extraction pipeline. For training, I used spacy==3.1.4.
Now I need to connect two different models: the named entity recognition model is trained on spaCy v2, whereas the relation extraction model works fine with spaCy v3.
I did some debugging, and here are my results:
I read in spaCy GitHub issue 7219 that to use the relation extraction model with spaCy v2, one should use spacy-transformers==0.6.2. I did exactly that, but without success. The PyPI page for spacy-transformers says that it requires spacy>=3.0.
I did not stop there and went to another spaCy GitHub issue, 7910, which says to use thinc version 8.0.3. That version is not compatible with spacy==2.2.4.
I am having trouble using spaCy v2 to test a custom spaCy relation extraction pipeline. If that is not possible, one solution would be to use the same spaCy version on both ends. I could easily implement this, but there is another challenge: I also use neuralcoref in between, which cannot be installed with spaCy v3. So any solution to this problem would help.
I am also thinking about using different environments for (NER + coreference) and (relation extraction). Does this sound like a good solution?
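If the two-environment idea is pursued, one possible way to connect the stages is to run each in its own interpreter and pass JSON between them. The sketch below is an assumption about how that could look, not a tested spaCy workflow: run_in_other_env and WORKER are hypothetical names, and the worker is a stand-in that only counts characters instead of running NER or relation extraction.

```python
import json
import subprocess
import sys


def run_in_other_env(python_exe, script, payload):
    """Run `script` with another interpreter: JSON in on stdin, JSON out on stdout."""
    result = subprocess.run(
        [python_exe, "-c", script],
        input=json.dumps(payload),
        capture_output=True,
        text=True,
        check=True,  # raise if the worker exits with an error
    )
    return json.loads(result.stdout)


# Stand-in worker; a real one would load the model installed in that environment.
WORKER = (
    "import json, sys; "
    "data = json.load(sys.stdin); "
    "print(json.dumps({'n_chars': len(data['text'])}))"
)

if __name__ == "__main__":
    # sys.executable is used here only so the sketch runs anywhere; in practice
    # this would be the path to the other environment's python executable.
    print(run_in_other_env(sys.executable, WORKER, {"text": "hello"}))
```

The entities found by the v2 environment would then be serialized as plain data (text plus character offsets, for example) before being handed to the v3 environment, since the Doc objects of the two versions are not interchangeable.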

How to resolve Spacy POS Attribute E1005 Error

I was able to install spaCy and download the standard English model (en_core_web_sm).
But by just loading the standard data model, I received the following error message:
import spacy

# Load English tokenizer, tagger, parser and NER
nlp = spacy.load("en_core_web_sm")
ValueError: [E1005] Unable to set attribute 'POS' in tokenizer exception for ' '.
Tokenizer exceptions are only allowed to specify ORTH and NORM.
I checked the config.cfg but don't see any POS attribute. Any help is greatly appreciated, as I have searched the Internet for an answer.
PS: from pip freeze, here are some of the installed libraries:
spacy==3.0.6
spacy-legacy==3.0.5
en-core-web-sm==2.2.0

You have a model for spaCy v2 (the model version starts with 2), but you are using spaCy v3. The models are not compatible with different major versions. You need to uninstall the model and then download the new model:
pip uninstall en-core-web-sm
python -m spacy download en_core_web_sm
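The mismatch described above (a 2.x model package under spaCy 3.x) can also be checked from Python. The following is a minimal sketch using only the standard library; the helper names same_major and installed_version are hypothetical, and comparing major versions is a simplification of the real compatibility rules.

```python
from importlib import metadata


def same_major(version_a, version_b):
    """True if two version strings share the same major version number."""
    return version_a.split(".")[0] == version_b.split(".")[0]


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    spacy_version = installed_version("spacy")
    model_version = installed_version("en-core-web-sm")
    if spacy_version and model_version:
        print("compatible major versions:", same_major(spacy_version, model_version))
    else:
        print("spaCy or the model package is not installed in this environment")
```

With the versions from the question, same_major("3.0.6", "2.2.0") is False, which matches the diagnosis. spaCy also provides a built-in check for this, python -m spacy validate.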

Train NGramModel in Python

I am using Python 3.5, installed and managed with Anaconda. I want to train an NGramModel (from nltk) using some text. My installation does not find the module nltk.model
There are some possible answers to this question (pick the correct one, and explain how to do it):
A different version of nltk can be installed using conda, so that it contains the model module. This is not just an older version (it would need to be too old), but a different version containing the model (or model2) branch of the current nltk development.
The version of nltk mentioned in the previous point cannot be installed using conda, but can be installed using pip.
nltk.model is deprecated, better use some other package (explain which package)
there are better options than nltk for training an ngram model, use some other library (explain which library)
none of the above, to train an ngram model the best option is something else (explain what).

try
import nltk
nltk.download('all')
in your notebook

Gensim doc2vec infer_vector method missing

Have a hell of a blocker trying to use Gensim's doc2vec.
I import gensim.models.doc2vec.Doc2Vec and successfully train it on a set of tweets. I am able to pull my document vectors fine, using model['DOC_[0123..]'].
My issue now is that I'm trying to get a vector representation for a new, unseen document so that I can feed that vector back into a classifier. As far as I know, the only method that exists to do this with doc2vec is infer_vector().
HOWEVER, when I try to call this method, I get the following:
AttributeError: 'Doc2Vec' object has no attribute 'infer_vector'
I'm able to use all the other methods described in the doc2vec documentation: https://radimrehurek.com/gensim/models/doc2vec.html
I've tried using different versions of gensim including 0.10.3 (the version released with doc2vec || http://rare-technologies.com/doc2vec-tutorial/) and the 0.13.1 (latest version).
PLEASE HELP.

The latest versions (specifically 0.12.1+) have this method; if you're getting that error, you may be using an older version, from a path/environment/python-interpreter that isn't pulling its libraries from where you expect.
Uninstall gensim and run your python, confirming gensim is actually gone from the python-environment you're using. Then re-install the latest gensim, and the expected version/methods should be available.
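To see which copy of a package the current interpreter would actually use, a quick check along these lines can help. It is a sketch using only the standard library; describe_package is a hypothetical helper name, and it reports where the package would be imported from together with the version recorded in the installed metadata.

```python
import importlib.util
import sys
from importlib import metadata


def describe_package(name):
    """Report where `name` would be imported from and which version is installed."""
    try:
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError):
        spec = None
    try:
        version = metadata.version(name)
    except metadata.PackageNotFoundError:
        version = None
    return {"location": spec.origin if spec else None, "version": version}


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("gensim:", describe_package("gensim"))
```

If the reported location is not inside the environment you expect, or the version is older than the one you installed, the interpreter is picking up a different copy of gensim.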
