Why are some Python package names different than their import name?

Some packages are imported with a string which is different from the name of the package on PyPI, e.g.:
$ pip list | grep -i "yaml\|qt"
PyYAML 3.13
QtPy 1.5.2
PyYAML: installed with pip install pyyaml, but imported as yaml
QtPy: installed with pip install qtpy and imported as qtpy, but the package is listed as QtPy
Several tools can't handle that, e.g. Sphinx:
$ make html
WARNING: autodoc: failed to import module 'wireshark' from module 'logcollector.plugins'; the following exception was raised:
No module named 'qtpy'
I can't recall which ones right now, but the same goes for tools that scan the requirements.txt file and warn that the yaml package isn't installed (it is, under the name pyyaml).

There are multiple reasons why authors choose to use different names in different environments:
Drop-in replacements: Sometimes it is helpful when you can install a fork and keep the rest of your code the same. I guess the most famous example is pyyaml / yaml. I did it when I created propy3 which can be used as a drop-in replacement for propy. I would say that this is also what happened with pillow.
Convenience: beautifulsoup4 can be imported as bs4 (+ package parking for bs4)
Lost credentials: I don't know of an example where only the package name was changed, but I think for flask-restx both the package name and the import name were changed.
A word of caution
As Ziyad Edher has pointed out in a related discussion, typosquatting is an issue on PyPI (source). If you add packages with different names, this gets more likely.
Other examples
Name in the docs vs "import" package name vs pypi package name vs anaconda packages vs Debian:
scikit-learn vs sklearn vs scikit-learn vs scikit-learn vs python-sklearn and python3-sklearn
OpenCV-Python vs cv2 vs opencv-python vs py-opencv vs python-opencv
PyTables vs tables vs tables vs pytables vs python-tables

Because these two concepts are not really related.
One is a python concept of package/module names, the other one a package manager concept.
Look at a simple packaging command with zip:
zip -r MyCoolTool.zip tool.py
The tool is named tool, which probably is not unique, and if you do not know that it's MyCoolTool you do not know which tool it is. When I upload it somewhere I name it MyCoolTool, so you now have a more unique name that may also be a bit more descriptive.
The other point is that a pip package may include more than one module. PyYAML could, for example, include a second Python module yaml2xml in addition to yaml.
Finally, there can be several implementations. PyYAML sounds like a pure Python implementation. Now assume you need a really fast parser; then you might write CYAML with a C backend, but expose the same interface under the name yaml.

In case of sphinx you can mock 3rd party packages with: autodoc_mock_imports
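For the qtpy failure shown in the question, a minimal conf.py fragment might look like the following. The module list is an assumption; name whichever third-party imports autodoc fails on in your own build.

```python
# conf.py (Sphinx configuration), a fragment only
extensions = ["sphinx.ext.autodoc"]

# Modules named here are replaced by mock objects while autodoc imports your code,
# so they do not need to be installed in the documentation build environment.
autodoc_mock_imports = ["qtpy"]
```

Note that the entries are import names (qtpy), not PyPI distribution names (QtPy).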

Related

Difference between version pip show and importlib.metadata.version

I am creating a module, henceforth called mymodule, which I distribute using a pyproject.toml. This file contains a version number. I would like to write this version number in the logfile of mymodule. In mymodule I use the following snippet (in __init__.py) to obtain the version:
import importlib.metadata
__version__ = importlib.metadata.version(__package__)
del importlib.metadata
However, this version is wrong: it appears to be the highest version I have ever installed. For reference, the command python3 -m pip show mypackage does show the correct version after installing the module locally. I struggle to explain this difference. Can anyone think of a cause of this discrepancy?
I also ran importlib.metadata.version("mypackage"), which returned the same incorrect version.
The problem was related to left over build artifacts from using setup.py. importlib and pkg_resources will detect these artifacts in a local installation and pip will not. Deleting the mypackage.egg-info directory fixed the issue.
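One way to spot such leftovers is to list every metadata directory that importlib.metadata can see for a given name. This is only a sketch: find_metadata is a hypothetical helper, and the name comparison is a plain case-insensitive match rather than full name normalization.

```python
import importlib.metadata


def find_metadata(name, path=None):
    """Return (version, location) for every metadata directory matching name."""
    # With no explicit path, importlib.metadata searches sys.path.
    kwargs = {} if path is None else {"path": path}
    found = []
    for dist in importlib.metadata.distributions(**kwargs):
        dist_name = dist.metadata.get("Name", "")
        if dist_name.lower() == name.lower():
            # locate_file("") resolves to the directory that contains the
            # .dist-info or .egg-info directory.
            found.append((dist.version, str(dist.locate_file(""))))
    return found
```

If find_metadata("mypackage") returns more than one entry, a stale .egg-info directory in one of the listed locations is a likely suspect.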

Is there a way to rename a python package upon installation?

The Problem
I am working on a project that uses a package in beta with multiple versions (package name: psychxr). After some confusing error messages about missing modules, I have discovered that depending on where I source my installation from I get different package contents.
If I use pip to install psychxr, I get an ovr sub-package. However, if I install from source (via official github repository), I get a libovr sub-package. Is there a way I can rename the source package such that I can get both modules? Alternatively, is there a better way to go about this? Although the packages complete roughly the same task, their implementations are noticeably different, and I'd like access to both.
CMD output of >>>python -c help('psychxr')
Version 1 (OVR)
NAME
psychxr
PACKAGE CONTENTS
ovr (package)
VERSION
0.1.4
Version 2 (LIBOVR)
NAME
psychxr
PACKAGE CONTENTS
libovr (package)
VERSION
0.2.0
Post Script: I do apologize for any misuse of terminology, or illegibility. I'm fairly new to both python and cmd in windows.

How does pip determine a Python package's version?

When I use pip to install a package from source, it generates a version number for the package, which I can see using 'pip show '. But I can't find out how that version number is generated, and I can't find the version string in the source code. Can someone tell me how the version is generated?
The version number that pip uses comes from the setup.py (if you pip install a file, directory, repo, etc.) and/or the information in the PyPI index (if you pip install a package name). (Since these two must be identical, it doesn't really matter which.)
It's recommended that packages make the same string available as a __version__ attribute on their top-level module/package(s) at runtime that they put in their setup, but that isn't required, and not every package does.
And if the package doesn't expose its version, there's really no way for you to get it. (Well, unless you want to grub through the pip data trying to figure out which package owns a module and then get its version.)
Here's an example:
In the source code for bs4 (BeautifulSoup4), the setup.py file has this line:
version = "4.3.2",
That's the version that's used, directly or indirectly, by pip.
Then, inside bs4/__init__.py, there's this line:
__version__ = "4.3.2"
That means that Leonard Richardson is a nice guy who follows the recommendations, so I can import bs4; print(bs4.__version__) and get back the same version string that pip show beautifulsoup4 gives me.
But, as you can see, they're two completely different strings in completely different files. If he wasn't nice, they could be totally different, or the second one could be missing, or named something different.
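To see whether the two strings agree for a given package, something like the following could be used. It is a sketch that assumes Python 3.8+ (for importlib.metadata); compare_versions is a hypothetical helper, and you must supply both the import name and the distribution name yourself.

```python
import importlib
import importlib.metadata


def compare_versions(import_name, dist_name):
    """Return (module __version__, installed metadata version); None where unavailable."""
    try:
        module = importlib.import_module(import_name)
        attr_version = getattr(module, "__version__", None)
    except ImportError:
        attr_version = None
    try:
        meta_version = importlib.metadata.version(dist_name)
    except importlib.metadata.PackageNotFoundError:
        meta_version = None
    return attr_version, meta_version


# Example call (needs beautifulsoup4 installed to return real values):
# compare_versions("bs4", "beautifulsoup4")
```

The first element comes from the module's source code, the second from the metadata that pip recorded at install time, so a mismatch or a None on either side is possible.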
The OpenStack people came up with a nifty library named PBR that helps you manage version numbers. You can read the linked doc page for the full details, but the basic idea is that it either generates the whole version number for you out of git, or verifies your specified version number (in the metadata section of setup.cfg) and appends the dev build number out of git. (This relies on you using Semantic Versioning in your git repo.)
Instead of specifying the version number in code, tools such as setuptools-scm may use tags from version control. Sometimes the magic is not directly visible. For example PyScaffold uses it, but in the project's root folder's __init__.py one may just see:
import pkg_resources
try:
    __version__ = pkg_resources.get_distribution(__name__).version
except:
    __version__ = "unknown"
If, for example, the highest version tag in Git is 6.10.0, then pip install -e . will generate a local version number such as 6.10.0.post0.dev23+ngc376c3c (c376c3c being the short hash of the last commit) or 6.10.0.post0.dev23+ngc376c3c.dirty (if it has uncommitted changes).
For more complicated strings such as 4.0.0rc1, they are usually hand edited in the PKG-INFO file. Such as:
# cat ./<package-name>.egg-info/PKG-INFO
...
Version: 4.0.0rc1
...
This makes it infeasible to obtain it from within any Python code.

How do I get the version of an installed module in Python programmatically?

For the modules:
required_modules = ['nose', 'coverage', 'webunit', 'MySQLdb', 'pgdb', 'memcache']
and programs:
required_programs = ['psql', 'mysql', 'gpsd', 'sox', 'memcached']
Something like:
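import importlib
# Report on the versions of modules installed
for name in required_modules:
    try:
        print(name, importlib.import_module(name).__version__)
    except (ImportError, AttributeError):
        # Module is missing or does not define __version__
        pass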
Unfortunately, module.__version__ isn't present in all modules.
A workaround is to use a package manager. When you install a library using easy_install or pip, it keeps a record of the installed version. Then you can do:
import pkg_resources
version = pkg_resources.get_distribution("nose").version
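On Python 3.8 and later, the same lookup can be done for a whole list with the standard library's importlib.metadata. This is a sketch under one assumption: the entries are distribution names as pip knows them, which for some of the modules in the question may differ from the import name. installed_versions is a hypothetical helper.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(dist_names):
    """Map each distribution name to its installed version, or None if not found."""
    result = {}
    for name in dist_names:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


# Example call: installed_versions(["nose", "coverage"])
```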
I found it quite unreliable to use the various tools available (including the best one pkg_resources mentioned by moraes' answer), as most of them do not cover all cases. For example
built-in modules
modules not installed but just added to the python path (by your IDE for example)
two versions of the same module available (one in python path superseding the one installed)
Since we needed a reliable way to get the version of any package, module or submodule, I ended up writing getversion. It is quite simple to use:
from getversion import get_module_version
import foo
version, details = get_module_version(foo)
See the documentation for details.

Checking a Python module version at runtime

Many third-party Python modules have an attribute which holds the version information for the module (usually something like module.VERSION or module.__version__), however some do not.
Particular examples of such modules are libxslt and libxml2.
I need to check that the correct version of these modules is being used at runtime. Is there a way to do this?
A potential solution would be to read in the source at runtime, hash it, and then compare it to the hash of the known version, but that's nasty.
Is there a better solution?
Use pkg_resources. Anything installed from PyPI at least should have a version number.
>>> import pkg_resources
>>> pkg_resources.get_distribution("blogofile").version
'0.7.1'
If you're on python >=3.8 you can use a module from the built-in library for that. To check a package's version (in this example lxml) run:
>>> from importlib.metadata import version
>>> version('lxml')
'4.3.1'
This functionality has been ported to older versions of python (<3.8) as well, but you need to install a separate library first:
pip install importlib_metadata
and then to check a package's version (in this example lxml) run:
>>> from importlib_metadata import version
>>> version('lxml')
'4.3.1'
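Keep in mind that this works only for packages that were installed with their metadata (for example with pip from PyPI), not for modules that were merely placed on the path. Also, you must pass the package name as the argument to the version function, rather than the name of a module that the package provides (although they're usually the same).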
I'd stay away from hashing. The version of libxslt being used might contain some type of patch that doesn't affect your use of it.
As an alternative, I'd like to suggest that you don't check at run time (don't know if that's a hard requirement or not). For the python stuff I write that has external dependencies (3rd party libraries), I write a script that users can run to check their python install to see if the appropriate versions of modules are installed.
For the modules that don't have a defined 'version' attribute, you can inspect the interfaces they contain (classes and methods) and see if they match the interface you expect. Then in the actual code that you're working on, assume that the 3rd party modules have the interface you expect.
Some ideas:
Try checking for functions that exist or don't exist in your needed versions.
If there are no function differences, inspect function arguments and signatures.
If you can't figure it out from function signatures, set up some stub calls at import time and check their behavior.
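The first two ideas can be sketched with getattr and inspect.signature. The helper name supports and the json example are illustrative assumptions, not part of any library's API.

```python
import inspect


def supports(module, func_name, required_params=()):
    """Check that module has a callable func_name accepting the named parameters."""
    func = getattr(module, func_name, None)
    if not callable(func):
        return False
    try:
        params = inspect.signature(func).parameters
    except (TypeError, ValueError):
        # Some C-implemented callables expose no signature; in that case only
        # existence can be confirmed, so succeed only if no parameters were asked for.
        return not required_params
    return all(name in params for name in required_params)
```

For example, supports(json, "dumps", ["indent"]) checks that json.dumps exists and takes an indent parameter, without relying on any version attribute.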
I found it quite unreliable to use the various tools available (including the best one pkg_resources mentioned by this other answer), as most of them do not cover all cases. For example
built-in modules
modules not installed but just added to the python path (by your IDE for example)
two versions of the same module available (one in python path superseding the one installed)
Since we needed a reliable way to get the version of any package, module or submodule, I ended up writing getversion. It is quite simple to use:
from getversion import get_module_version
import foo
version, details = get_module_version(foo)
See the documentation for details.
You can use
pip freeze
to see the installed packages in requirements format.
For modules which do not provide __version__ the following is ugly but works:
#!/usr/bin/env python3.6
import re
import subprocess
# Ask pip for the package metadata and pull out the "Version:" line.
sp = subprocess.run(["pip3", "show", "numpy"], stdout=subprocess.PIPE)
res = re.search(r'^Version: (.*)$', sp.stdout.decode('utf-8'), re.MULTILINE)
print(res.group(1).strip())
or
#!/usr/bin/env python3.7
import re
import subprocess
# Same approach, using capture_output (available from Python 3.7).
sp = subprocess.run(["pip3", "show", "numpy"], capture_output=True)
res = re.search(r'^Version: (.*)$', sp.stdout.decode('utf-8'), re.MULTILINE)
print(res.group(1).strip())
