How to resolve Spacy POS Attribute E1005 Error - python

I was able to install spaCy and download the standard English model (en_core_web_sm).
But by just loading the standard data model, I received the following error message:
import spacy

# Load English tokenizer, tagger, parser and NER
nlp = spacy.load("en_core_web_sm")
ValueError: [E1005] Unable to set attribute 'POS' in tokenizer exception for ' '.
Tokenizer exceptions are only allowed to specify ORTH and NORM.
I checked config.cfg but don't see any POS attribute. Any help is greatly appreciated, as I have searched the Internet for an answer.
PS: according to pip freeze, these are some of the installed libraries:
spacy==3.0.6
spacy-legacy==3.0.5
en-core-web-sm==2.2.0

You have a model for spaCy v2 (the model version starts with 2), but you are using spaCy v3. The models are not compatible with different major versions. You need to uninstall the model and then download the new model:
pip uninstall en-core-web-sm
python -m spacy download en_core_web_sm
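A quick way to spot this kind of mismatch before loading is to compare the major version of the installed spaCy package with that of the model package. The sketch below is only illustrative: `major_version`, `same_major` and `check_installed` are hypothetical helper names, and `check_installed` assumes both packages were installed with pip so that `importlib.metadata` can see them.

```python
from importlib import metadata


def major_version(version: str) -> int:
    """Return the leading (major) component of a version string like '3.0.6'."""
    return int(version.split(".")[0])


def same_major(spacy_version: str, model_version: str) -> bool:
    """True if the library and the model share a major version."""
    return major_version(spacy_version) == major_version(model_version)


def check_installed(model_package: str = "en-core-web-sm") -> bool:
    """Compare the installed spaCy and model versions (both assumed pip-installed)."""
    return same_major(metadata.version("spacy"), metadata.version(model_package))


# Versions taken from the pip freeze output in the question
print(same_major("3.0.6", "2.2.0"))  # False
```

If `check_installed()` returns False in your environment, reinstalling the model as described above should bring the two back in line.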

Related

How to fix spaCy en_training incompatible with current spaCy version

UserWarning: [W094] Model 'en_training' (0.0.0) specifies an under-constrained spaCy version requirement: >=2.1.4.
This can lead to compatibility problems with older versions,
or as new spaCy versions are released, because the model may say it's compatible when it's not.
Consider changing the "spacy_version" in your meta.json to a version range,
with a lower and upper pin. For example: >=3.2.1,<3.3.0
spaCy version 3.2.1
Python version 3.9.7
OS Windows
For spacy v2 models, the under-constrained requirement >=2.1.4 means >=2.1.4,<2.2.0 in effect, and as a result this model will only work with spacy v2.1.x.
There is no way to convert a v2 model to v3. You can either use the model with v2.1.x or retrain the model from scratch with your training data.
pip3 install spacy==2.1.4
This can download required
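To see which requirement a model actually declares, you can read the "spacy_version" field from its meta.json. This is a minimal sketch, assuming the model is available as a directory on disk; `declared_spacy_requirement` is a hypothetical helper name and the path in the usage comment is a placeholder.

```python
import json
from pathlib import Path


def declared_spacy_requirement(model_dir: str) -> str:
    """Return the "spacy_version" string from a model directory's meta.json."""
    meta_path = Path(model_dir) / "meta.json"
    with meta_path.open(encoding="utf8") as f:
        meta = json.load(f)
    # An empty string is returned if the field is missing
    return meta.get("spacy_version", "")


# Example (placeholder path, left commented out):
# print(declared_spacy_requirement("path/to/en_training"))
```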

How can I use thinc.types with spaCy version 2?

I am using spacy==2.2.4 for named entity recognition and wish to use the same version for testing a custom spaCy relation extraction pipeline. Unfortunately, I get the error below when running the custom relation extraction model with that spaCy version.
ModuleNotFoundError: No module named 'thinc.types'
I used the code from the spaCy GitHub link to train the custom relation extraction pipeline. For training, I used spacy==3.1.4.
Now I need to connect two different models: the named entity recognition model was trained on spaCy v2, whereas the relation extraction model works fine with spaCy v3.
I did some debugging and here are my results:
I read in spaCy GitHub issue 7219 that to use the relation extraction model with spaCy v2, one should use spacy-transformers==0.6.2. I did exactly that, but without success. The PyPI page for spacy-transformers says that it requires spacy>=3.0.
I did not stop researching there and went to another spaCy GitHub issue, 7910, which says to use thinc version 8.0.3. This version is not compatible with spacy==2.2.4.
I am having trouble using spaCy v2 to test the custom spaCy relation extraction pipeline. If that is not possible, one solution would be to use the same spaCy version on both ends. I could easily implement this, but there is another challenge: I also use neuralcoref in between, which cannot be installed with spaCy v3. Any solution to this problem would help with that as well.
I am also thinking about using different environments for (NER + Coreference) and (Relation Extraction). Does this sound like a good solution?

Load custom trained spaCy model

I am trying to load a spaCy text classification model that I trained previously. After training, the model was saved into the en_textcat_demo-0.0.0.tar.gz file.
I want to use this model in a jupyter notebook, but when I do
import spacy
spacy.load("spacy_files/en_textcat_demo-0.0.0.tar.gz")
I get
OSError: [E053] Could not read meta.json from spacy_files/en_textcat_demo-0.0.0.tar.gz
What is the correct way to load my model here?
You need to either unzip the tar.gz file or install it with pip.
If you unzip it, that will result in a directory, and you can give the directory name as an argument to spaCy load.
If you use pip install, it will be put with your other libraries, and you can use the model name like you would with a pretrained spaCy model.
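The unzip route can be sketched as follows. This is an illustration under assumptions rather than the canonical procedure: `extract_archive` and `find_model_dirs` are hypothetical helpers, the target folder name is made up, and the sketch assumes the data directory is the deepest folder in the archive that contains a meta.json, which you should verify against your own unpacked layout.

```python
import tarfile
from pathlib import Path
from typing import List


def extract_archive(archive_path: str, target_dir: str) -> Path:
    """Unpack a .tar.gz archive into target_dir and return that directory."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(target)  # only unpack archives you trust
    return target


def find_model_dirs(root: Path) -> List[Path]:
    """List directories under root that contain a meta.json, deepest first."""
    dirs = {p.parent for p in root.rglob("meta.json")}
    return sorted(dirs, key=lambda p: len(p.parts), reverse=True)


# Usage (archive path from the question; requires spaCy, so left commented out):
# import spacy
# root = extract_archive("spacy_files/en_textcat_demo-0.0.0.tar.gz", "spacy_files/unpacked")
# nlp = spacy.load(find_model_dirs(root)[0])
```

The pip route needs no extra code: after `pip install spacy_files/en_textcat_demo-0.0.0.tar.gz`, the model should load by name with `spacy.load("en_textcat_demo")`.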

What versions of spaCy support en_vectors_web_lg?

I am trying to download en_vectors_web_lg, but keep getting the below error:
ERROR: Could not install requirement en-vectors-web-lg==3.0.0 from https://github.com/explosion/spacy-models/releases/download/en_vectors_web_lg-3.0.0/en_vectors_web_lg-3.0.0-py3-none-any.whl#egg=en_vectors_web_lg==3.0.0 because of HTTP error 404 Client Error: Not Found for url: https://github.com/explosion/spacy-models/releases/download/en_vectors_web_lg-3.0.0/en_vectors_web_lg-3.0.0-py3-none-any.whl for URL https://github.com/explosion/spacy-models/releases/download/en_vectors_web_lg-3.0.0/en_vectors_web_lg-3.0.0-py3-none-any.whl#egg=en_vectors_web_lg==3.0.0
Is spaCy still supporting en_vectors_web_lg?
I also just updated spaCy to the latest version.
The naming conventions changed in v3 and the equivalent model is en_core_web_lg. It includes vectors and you can install it like this:
spacy download en_core_web_lg
I would not recommend downgrading to use the old vectors model unless you need to run old code.
If you are concerned about accuracy and have a decent GPU, the transformer model, en_core_web_trf, is also worth considering, though it doesn't include word vectors.
It looks like en_vectors_web_lg is not supported by spaCy v3.0. The spaCy v3.0 installation guide offers en_core_web_trf instead, which is a transformer-based pipeline.

Unable to load the spacy model 'en_core_web_lg' on Google colab

I am using spacy in google colab to build an NER model for which I have downloaded the spaCy 'en_core_web_lg' model using
import spacy.cli
spacy.cli.download("en_core_web_lg")
and I get a message saying
✔ Download and installation successful
You can now load the model via spacy.load('en_core_web_lg')
However, when I then try to load the model
nlp = spacy.load('en_core_web_lg')
the following error is printed:
OSError: [E050] Can't find model 'en_core_web_lg'. It doesn't seem to be a shortcut link, a Python package or a valid path to a data directory.
Could anyone help me with this problem?
Running
import spacy.cli
spacy.cli.download("en_core_web_lg")
nlp = spacy.load("en_core_web_lg")
shouldn't yield any errors anymore with recent spaCy versions.
If running the code still gives errors, run the following in a single cell (it takes a while but, unlike spacy.cli, gives you visual feedback about progress):
!python -m spacy download en_core_web_lg
Then, *** restart the Colab runtime *** via either
the Colab menu Runtime > Restart runtime, or
the keyboard shortcut Ctrl+M followed by . (period).
After that, executing
import spacy
nlp = spacy.load('en_core_web_lg')
should work flawlessly.
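One plausible reason the restart helps (an assumption, not something confirmed above) is that the already-running interpreter has not yet picked up a package that was installed after it started. The sketch below uses only the standard library to check whether the model package is visible to the current process; `is_importable` is a hypothetical helper name.

```python
import importlib
import importlib.util


def is_importable(package_name: str) -> bool:
    """Return True if the running interpreter can locate the named package."""
    # Drop cached directory listings so packages installed after startup can be found
    importlib.invalidate_caches()
    return importlib.util.find_spec(package_name) is not None


print(is_importable("en_core_web_lg"))
```

If this prints False right after a successful download, restarting the runtime as described above is the next thing to try.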
In Google Colab Notebooks, you should import the model as a package.
However you download and install the model:
!pip install <model_s3_url> # tar.gz file e.g. from release notes like https://github.com/explosion/spacy-models/releases//tag/en_core_web_lg-2.3.1
!pip install en_core_web_lg
import spacy
you don't have permission in Colab to load the model with normal spacy usage:
nlp = spacy.load("en_core_web_lg") # not via packages
nlp = spacy.load("/path/to/en_core_web_lg") #not via paths
nlp = spacy.load("en") # nor via shortcut links
spacy.load()
Instead, import the model and load it directly:
import en_core_web_lg
nlp = en_core_web_lg.load()
Then use as directed:
doc = nlp("This is a sentence. Soon, it will be knowledge.")
It seems the best answer is on this thread: How to install models/download packages on Google Colab?
import spacy.cli
spacy.cli.download("en_core_web_lg")
import en_core_web_lg
nlp = en_core_web_lg.load()
I ran into a similar issue on Google Colab with:
nlp = spacy.load('en_core_web_md')
I suspect it may have something to do with the size of the model. It worked for me using the small spacy model.
!python -m spacy download en_core_web_sm
nlp = spacy.load('en_core_web_sm')
