I am using Python 3.5, installed and managed with Anaconda. I want to train an NgramModel (from nltk) on some text, but my installation cannot find the module nltk.model.
There are some possible answers to this question (pick the correct one, and explain how to do it):
1. A different version of nltk that contains the model module can be installed using conda. This would not just be an older version (it would need to be too old), but a different version containing the model (or model2) branch of current nltk development.
2. The version of nltk mentioned in the previous point cannot be installed using conda, but can be installed using pip.
3. nltk.model is deprecated, and it is better to use some other package (explain which package).
4. There are better options than nltk for training an n-gram model; use some other library (explain which library).
5. None of the above; the best option for training an n-gram model is something else (explain what).
Try

import nltk
nltk.download('all')

in your notebook.
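Note that recent NLTK releases replaced the old nltk.model module with nltk.lm, so if the goal is simply to train an n-gram model, a minimal sketch (with a placeholder two-sentence corpus) looks like this:

from nltk.lm import MLE
from nltk.lm.preprocessing import padded_everygram_pipeline

# placeholder corpus: a list of tokenized sentences
text = [["this", "is", "a", "sentence"], ["this", "is", "another", "one"]]

n = 2  # order of the n-gram model (bigrams)
train_data, padded_vocab = padded_everygram_pipeline(n, text)

model = MLE(n)
model.fit(train_data, padded_vocab)
print(model.counts[["this"]]["is"])  # count of the bigram ("this", "is")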
As the title says, my requirement is simple: since I will probably need to use the latest features of Python at work, I want to know the latest version of Python that can be used with TensorFlow v2.x without any compatibility trouble. I must emphasize that I need to use the tensorflow.keras module, and I don't want to get an error message during model training. Any advice?
I did try to follow the issue on their GitHub about supporting Python 3.9. While the issue is closed, most of the comments there are NOT from the contributors/maintainers, and the last comment is from June 2021. Is Python 3.9 the latest compatible version for running TensorFlow v2.x?
TensorFlow 2 is supported from Python 3.7 to 3.10 according to their website: https://www.tensorflow.org/install?hl=en
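A quick way to confirm that your interpreter and TensorFlow build actually work together is a small tensorflow.keras smoke test (a minimal sketch; the tiny model exists only to exercise the training path):

import sys
import numpy as np
import tensorflow as tf

print(sys.version)      # the Python interpreter version
print(tf.__version__)   # the installed TensorFlow version

# minimal tensorflow.keras model to make sure training runs without errors
model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])
model.compile(optimizer="adam", loss="mse")
model.fit(np.random.rand(8, 4), np.random.rand(8, 1), epochs=1, verbose=0)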
UserWarning: [W094] Model 'en_training' (0.0.0) specifies an under-constrained spaCy version requirement: >=2.1.4.
This can lead to compatibility problems with older versions,
or as new spaCy versions are released, because the model may say it's compatible when it's not.
Consider changing the "spacy_version" in your meta.json to a version range,
with a lower and upper pin. For example: >=3.2.1,<3.3.0
spaCy version: 3.2.1
Python version: 3.9.7
OS: Windows
For spaCy v2 models, the under-constrained requirement >=2.1.4 effectively means >=2.1.4,<2.2.0, so this model will only work with spaCy v2.1.x.
There is no way to convert a v2 model to v3. You can either use the model with spaCy v2.1.x or retrain the model from scratch with your training data.
pip3 install spacy==2.1.4
This installs the required v2.1.x release of spaCy.
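Once the pinned version is installed, loading the model should no longer trigger the warning. A minimal sketch, assuming the en_training package from the warning above is installed (otherwise pass its directory path to spacy.load):

import spacy

print(spacy.__version__)         # should print 2.1.4
nlp = spacy.load("en_training")  # the v2 model named in the warning
doc = nlp("A quick check that the model loads and runs.")
print([(ent.text, ent.label_) for ent in doc.ents])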
I'm just getting started with telemetry, and I got stuck trying to use metrics with the new versions of opentelemetry-api and opentelemetry-sdk.
What I have found
1 - Documentation
This is an old getting-started guide (do not try those scripts; they are not up to date):
https://open-telemetry.github.io/opentelemetry-python/getting-started.html
And this is the latest getting-started guide:
https://opentelemetry-python.readthedocs.io/en/latest/sdk/sdk.html
As you can see, the latest guide has no information about metrics, just tracing.
2 - The packages
As you can see in this image, version 1.10a0 of opentelemetry has a metrics module, while the current version 1.4 has no metrics module (see image).
The problem
To use metrics one must run pip install opentelemetry-instrumentation-system-metrics; doing this makes pip uninstall the opentelemetry api and sdk and reinstall an old version (see image). When that happens I am able to import the metrics module, but tracing no longer works.
Question
Where is the metrics module in the new version of opentelemetry?
How can I instrument metrics in the latest version of opentelemetry?
You can't, as of now. There is an ongoing prototype (https://github.com/open-telemetry/opentelemetry-python/pull/1887) for the metrics API & SDK, based on a specification that is itself not yet stable. There is no guaranteed timeline for when metrics will be released for end-user instrumentation. You may safely assume it will take a few more months to get a stable release, but there should be an alpha/beta release much earlier.
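In the meantime, tracing is stable in the 1.x line. A minimal sketch of setting it up with the current packages (the console exporter is used only for illustration):

from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import ConsoleSpanExporter, SimpleSpanProcessor

# wire up a tracer provider that prints finished spans to stdout
trace.set_tracer_provider(TracerProvider())
trace.get_tracer_provider().add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))

tracer = trace.get_tracer(__name__)
with tracer.start_as_current_span("example-span"):
    pass  # application work goes here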
I am trying to download en_vectors_web_lg, but keep getting the below error:
ERROR: Could not install requirement en-vectors-web-lg==3.0.0 from https://github.com/explosion/spacy-models/releases/download/en_vectors_web_lg-3.0.0/en_vectors_web_lg-3.0.0-py3-none-any.whl#egg=en_vectors_web_lg==3.0.0 because of HTTP error 404 Client Error: Not Found for url: https://github.com/explosion/spacy-models/releases/download/en_vectors_web_lg-3.0.0/en_vectors_web_lg-3.0.0-py3-none-any.whl for URL https://github.com/explosion/spacy-models/releases/download/en_vectors_web_lg-3.0.0/en_vectors_web_lg-3.0.0-py3-none-any.whl#egg=en_vectors_web_lg==3.0.0
Is spaCy still supporting en_vectors_web_lg?
I also just updated my spaCy to the latest version.
The naming conventions changed in v3 and the equivalent model is en_core_web_lg. It includes vectors and you can install it like this:
spacy download en_core_web_lg
I would not recommend downgrading to use the old vectors model unless you need to run old code.
If you are concerned about accuracy and have a decent GPU the transformers model, en_core_web_trf, is also worth considering, though it doesn't include word vectors.
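A quick check that the replacement pipeline actually carries word vectors (a minimal sketch, assuming en_core_web_lg has been downloaded):

import spacy

nlp = spacy.load("en_core_web_lg")
token = nlp("apple")[0]
print(token.has_vector)    # True: the lg pipeline includes vectors
print(token.vector.shape)  # dimensionality of the word vector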
It looks like en_vectors_web_lg is not supported by spaCy v3.0. The spaCy v3.0 installation guide offers en_core_web_trf instead, which is a transformer-based pipeline.
I have a question about scikit-learn models and (retro-)compatibility.
I have a model (saved using joblib) created in Python 3.5 with scikit-learn 0.21.2, which I then analyze with the package shap, version 0.30. Since I upgraded to Ubuntu 20.04 I have Python 3.8 (and newer versions of both scikit-learn and shap).
Because of the new package versions I cannot load the model with Python 3.8, so I made a virtual environment with Python 3.5 and the original package versions.
Now my question is: is there a way to re-dump the models with joblib so I can also open them with Python 3.8? I'd like to re-analyze the model with the newest version of shap (but of course that requires a scikit-learn version that would break the joblib loading).
Alternatively, what other options do I have? (The only thing I do not want to do is re-train the model.)
There are no standard solutions within scikit-learn. If your model is supported, you can try sklearn-json.
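A minimal sketch of that re-export, run inside the old Python 3.5 environment (the file names are placeholders, and this only works if your estimator is one of the types sklearn-json supports):

import joblib
import sklearn_json as skljson

# in the Python 3.5 / scikit-learn 0.21.2 environment: load the pickled model...
model = joblib.load("model.joblib")  # placeholder filename

# ...and re-export it to a version-independent JSON representation
skljson.to_json(model, "model.json")

# later, in the Python 3.8 environment:
# restored = skljson.from_json("model.json")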
Although this does not solve your current issue, you can in the future save your models in formats with fewer compatibility issues – see the Interoperable formats section in scikit-learn's Model persistence page.