I have a project that needs some DevOps TLC, so I am finally building my installation script. This will eventually be a package that will be installable by pip locally, but may not end up on PyPI.
It has a dependency on a module called u2py. The one I want is the u2py package Rocket created for U2 Database operations; the other is an unrelated package of the same name. The one I want is only ever installed by the 3rd-party vendor (Rocket); the one I don't want is on PyPI.
What should be the expected behavior of my package in this case? I will include a blurb about this in my readme doc, but is that sufficient?
I've thought about throwing an exception to identify when the wrong package is present, but that makes me feel weird. It seems that maybe the most pythonic thing is to NOT add this to my install script, and blindly assume import u2py results in a module I can use. If it quacks like a duck, parses DynArrays like a duck, and call()s SUBROUTINEs like a duck, then it's a duck, right? Otherwise, if there is an error the user will just go and actually read the docs.
I've looked at classifiers, but I'm not sure they apply here.
Ideally there would be a way at install-time (in setup.py) to detect whether the package is being installed into a "u2 environment" or not, and to fail the installation (with an appropriate error message) if it isn't.
With this solution, you won't be able to provide built distributions (wheels) since they don't execute the setup.py file at install-time, but just publishing source distributions should be fine.
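For what it's worth, a rough sketch of such an install-time check in setup.py could look like the following. The DynArray check used to tell the two packages apart is only an assumption drawn from the question, and the project name is a placeholder; treat this as a sketch, not a guaranteed solution:

# setup.py (sketch only): refuse to install when Rocket's u2py is absent
import setuptools

try:
    import u2py
    # Assumption: Rocket's u2py exposes DynArray, while the unrelated PyPI package does not.
    if not hasattr(u2py, 'DynArray'):
        raise ImportError('wrong u2py')
except ImportError:
    raise SystemExit(
        'This package requires the u2py module shipped by Rocket with the U2 databases. '
        'See the README for details.'
    )

setuptools.setup(
    name='my-u2-tools',  # placeholder
    version='0.1.0',
    # ...
)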
It's a case where it would be nice if Python projects had namespaces (pip install com.rocket.u2py and import com.rocket.u2py as u2py).
From my point of view there are 2 aspects to consider: the project level and the package level.
1. project (distribution package)
I believe it is a bad practice to force alternative download sources onto the end user of your project. By default, pip should download from PyPI and nowhere else, unless the user decides otherwise themselves (via --find-links or similar options, which you could instruct your users to use in your documentation).
Since it is such a niche dependency, I think I would simply not add it to install_requires. I would assume the end users of your project know about the dependency already and are able to install it themselves directly.
Also, I don't believe it is possible to reliably check at install-time whether the correct dependency is installed, since setup.py does not always run (overriding the bdist_wheel command can help, but is probably not 100% effective).
2. package (importable package)
I am not sure any specific action is needed. The code would most likely fail naturally sooner or later, because a module or function is not importable. Which might be okay-ish, maybe?
But detecting whether the dependency is installed (and that it is the correct one) is probably relatively easy and would provide a better user experience. Either check that some specific modules or functions are importable, or inspect the metadata (import importlib_metadata; importlib_metadata.distribution('u2py').metadata['Author']).
In the case of an application, I would try to fail gracefully as soon as possible. In the case of a library, I would try to find one strategic spot to place the check and raise a custom exception (CannotFindU2pyException).
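For illustration, a minimal sketch of such a check, assuming the 'Author' metadata field of the Rocket distribution is a usable discriminator (verify against a real installation before relying on it):

# u2py check (sketch only; the 'Rocket' author string is an assumption)
from importlib import metadata  # use the importlib_metadata backport on older Pythons

class CannotFindU2pyException(Exception):
    """Raised when Rocket's u2py is missing or the wrong u2py is installed."""

def ensure_u2py():
    try:
        author = metadata.distribution('u2py').metadata['Author']
    except metadata.PackageNotFoundError:
        raise CannotFindU2pyException('u2py is not installed')
    if 'Rocket' not in (author or ''):
        raise CannotFindU2pyException('the installed u2py is not the Rocket U2 package')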
Links:
Prevent pip from caching a package
https://docs.python.org/3/library/importlib.metadata.html#distribution-metadata
https://github.com/pypa/pip/issues/4187#issuecomment-415067034
Equivalent for `--find-links` in `setup.py`
You can specify the URL of the package directly in install_requires using setuptools (requires pip version 18.1 or greater).
Requirement specifiers
Example
setup.py
import setuptools

setuptools.setup(
    name='MyPackage',
    version='1.0.0',
    # ...
    install_requires=[
        'requests @ https://github.com/psf/requests/archive/v2.22.0.zip',
    ],
    # ...
)
and do python setup.py install
Also
Since version 19.1, pip also supports direct references like so:
SomeProject @ file:///somewhere/...
Ref
https://www.python.org/dev/peps/pep-0508/
https://github.com/pypa/pip/pull/4175
Related
I am using the Python package fsspec. This library provides a way to register external filesystems (backend integrations). One way to register the required filesystem is to add it to the entry_points in the setuptools configuration.
To accomplish that, we can manually add the entry_points in setup.py and then install the package. But what I am looking for is a way to programmatically add the entry point before or after installation of the fsspec package.
I don't believe there is a simple way to update the metadata that normally gets written into your system during a package install. Digging through the code of setuptools or importlib.metadata might tell you how. It may be that the Distribution object exposed by metadata can be edited in memory.
The point of the entry-point approach is that fsspec does not need to import anything (aside from importlib.metadata itself) to know what extra implementations are available.
However, you might get far enough by using fsspec.register_implementation, which is done dynamically at runtime. The only downside is that your package will need to be imported and the function called before you try to access the protocol it provides.
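For illustration, a minimal runtime registration might look like this; MyFileSystem and the 'myfs' protocol name are placeholders, not part of fsspec itself:

# Sketch: register a custom filesystem at runtime
import fsspec
from fsspec.spec import AbstractFileSystem

class MyFileSystem(AbstractFileSystem):
    protocol = 'myfs'  # placeholder protocol name
    # ... implement _open, ls, etc. ...

# Must run before the first fsspec.filesystem('myfs') / fsspec.open('myfs://...') call
fsspec.register_implementation('myfs', MyFileSystem, clobber=True)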
I searched a bit but could not find a clear answer.
The goal is to have two pip indexes: a private index, which should take first priority, and the standard PyPI. The priority is there to prevent the security risk of code injection.
Say I have library named lib, and I configure index_url = http://my_private_pypi_repo and extra_index_url = https://pypi.org/simple
If I pip install lib, and lib exists in both indexes, which index gets priority? Where is it going to be installed from?
Also, if I pip install lib==0.0.2 but lib only exists in my private index at version 0.0.1, is it going to look at PyPI as well?
And what is a good way to stay in control, so that certain libraries will only be fetched from the private index if they exist there, and will not be looked for on PyPI?
The short answer is: there is no prioritization and you probably should avoid using --extra-index-url entirely.
This is asked and answered here: https://github.com/pypa/pip/issues/5045#issuecomment-369521345
Question:
I have this in my pip.conf:
[global]
index-url = https://myregistry-xyz.com
extra-index-url = https://pypi.python.org/pypi
Let's assume packageX exists in both registries and I run pip install packageX.
I expect pip to install packageX from https://myregistry-xyz.com, but pip will use https://pypi.python.org/pypi instead.
If I switch the values for index-url and extra-index-url I get the same result. pypi is always prioritized.
Answer:
Packages are expected to be unique up to name and version, so two wheels with the same package name and version are treated as indistinguishable by pip. This is a deliberate feature of the package metadata, and not likely to change.
I would also recommend reading this discussion: https://discuss.python.org/t/dependency-notation-including-the-index-url/5659
There are quite a lot of things addressed in this discussion, some of which are clearly out of scope for this question, but everything is very informative anyway.
In there you should find the key takeaway for you:
Pip does not really prioritize one index over the other in theory. In practice, because of a coincidence in the way things are implemented in code, it might be that one is always checked first, but it is not a behavior you should rely on.
And what is a good way to stay in control, so that certain libraries will only be fetched from the private index if they exist there, and will not be looked for on PyPI?
You should set up and curate your own package index (devpi, pydist, jfrog artifactory, sonatype nexus, etc.) and use it exclusively, meaning: never use --extra-index-url. This is the only way you can have exact control over what gets downloaded. This custom repository might function mostly as a proxy for the public PyPI, except for a couple of dependencies.
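For example, a pip.conf that relies exclusively on such a curated index might look like this (the URL is a placeholder):

[global]
index-url = https://my-private-index.example.com/simple
# deliberately no extra-index-url; the private index itself proxies PyPI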
Related:
pip: selecting index url based on package name?
The title of this question feels a bit like an instance of the XY problem [1]. If you would elaborate more on what you want to achieve and what your constraints are, we may be able to give you a better answer.
That said, sinoroc's suggestion to curate your own package index and use only that is a good one. A few other ideas also come to mind:
Update: it turns out pip may run distributions other than those in the constraints file, so this method should probably be considered insecure. Additionally, hashes are kind of broken on recent releases of pip.
Use a constraints file with hashes. This file can be generated using pip-tools, e.g. pip-compile --generate-hashes, assuming you have documented your dependencies in a file named requirements.in. You can then install packages like pip install -c requirements.txt some_package.
Pro: What may be installed is documented alongside your code in your VCS.
Con: Controlling what is downloaded the first time is either tricky or laborious.
Con: Hash checking can be slow.
Con: You run into issues more frequently than when not using hashes. Some can be worked around, others cannot; it is for instance not possible to combine constraints like -e file:// with hashes.
Use an alternative packaging tool like pipenv. It works similarly to the previous suggestion.
Pro: Easy to use
Con: Harder to integrate into your workflow if it does not fit naturally.
Curate packages locally. Packages and dependencies can be downloaded like pip download --dest some_dir some_package and installed like pip install --no-index --find-links some_dir.
Pro: What may be installed can be documented alongside your code, if you track the artifacts in VCS e.g. git lfs.
Con: Either all packages are downloaded or none are.
Use a hermetic build system. I know bazel advertises this as a feature; I'm not sure about others like pants and buck.
Pro: May be the ultimate solution if you want control over your builds.
Con: Does not integrate well with the open-source Python ecosystem, AFAIK.
Con: A lot of overhead.
1: https://en.wikipedia.org/wiki/XY_problem
I maintain a Python utility that allows bpy to be installable as a Python module. Due to the size of the source code, and the length of time it takes to download the libraries, I have chosen to provide this module as a wheel.
Unfortunately, platform differences and Blender runtime expectations make support for this tricky at times.
Currently, one of my big goals is to get the Blender addon scripts directory to install into the correct location. The directory (simply named after the version of the Blender API) has to exist in the same directory as the Python executable.
Unfortunately the way that setuptools works (or at least the way that I have it configured) the 2.79 directory is not always placed as a sibling to the Python executable. It fails on Windows platforms outside of virtual environments.
However, I noticed in setuptools documentation that you can specify eager_resources that supposedly guarantees the location of extracted files.
https://setuptools.readthedocs.io/en/latest/setuptools.html#automatic-resource-extraction
https://setuptools.readthedocs.io/en/latest/pkg_resources.html#resource-extraction
There was a lot of hand waving and jargon in the documentation, and 0 examples. I'm really confused as to how to structure my setup.py file in order to guarantee the resource extraction. Currently, I just label the whole 2.79 directory as "scripts" in my setuptools Extension and ship it.
Is there a way to write my setup.py and package my module so as to guarantee the 2.79 directory's location is the same as the currently running python executable when someone runs
py -3.6.8-32 -m pip install bpy
Besides simply "hacking it in"? I was considering writing an install_requires module that would simply move it if possible, but that is meddling with the user's file system and kind of hacky. However, it's the route I am going to go if this proves impossible.
Here is the original issue for anyone interested.
https://github.com/TylerGubala/blenderpy/issues/13
My build process is identical to the process described in my answer here
https://stackoverflow.com/a/51575996/6767685
Maybe try the data_files option of distutils/setuptools.
You could start by adding data_files=[('mydata', ['setup.py'],)], to your setuptools.setup function call. Build a wheel, then install it and see if you can find mydata/setup.py somewhere in your sys.prefix.
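A minimal sketch of that experiment might look like this in setup.py (the name, version and 'mydata' target are placeholders):

# setup.py (experiment only)
import setuptools

setuptools.setup(
    name='bpy',           # placeholder
    version='2.79.0',     # placeholder
    # ...
    # copies setup.py itself into <prefix>/mydata/ so you can see where data_files end up
    data_files=[('mydata', ['setup.py'])],
)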
In your case the difficult part will be to compute the actual target directory (mydata in this example). It will depend on the platform (Linux, Windows, etc.), if it's in a virtual environment or not, if it's a global or local install (not actually feasible with wheels currently, see update below) and so on.
Finally of course, check that everything gets removed cleanly on uninstall. It's a bit unnecessary when working with virtual environments, but very important in case of a global installation.
Update
Looks like your use case requires a custom step at install time of your package (since the location of the binary for the Python interpreter relative to sys.prefix can not be known in advance). This can not be done currently with wheels. You have seen it yourself in this discussion.
Knowing this, my recommendation would be to follow the advice from Jan Vlcinsky in his comment for his answer to this question:
Post install script after installing a wheel.
Add an extra setuptools console entry point to your package (let's call it bpyconfigure); a sketch of the declaration follows after this list.
Instruct the users of your package to run it immediately after installing your package (pip install bpy && bpyconfigure).
The purpose of bpyconfigure should be clearly stated (in the documentation and maybe also as a notice shown in the console right after starting bpyconfigure) since it would write into locations of the file system where pip install does not usually write.
bpyconfigure should figure out where the Python interpreter is, and where to write the extra data.
The extra data to write should be packaged as package_data, so that it can be found with pkg_resources.
Of course bpyconfigure --uninstall should be available as well!
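A sketch of how the entry point could be declared; the bpy_post_install module and its main() function are assumptions, and the package name is a placeholder:

# setup.py (sketch only)
import setuptools

setuptools.setup(
    name='bpy',  # placeholder
    # ...
    entry_points={
        'console_scripts': [
            # installs a 'bpyconfigure' command that runs bpy_post_install.main()
            'bpyconfigure = bpy_post_install:main',
        ],
    },
)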
Imagine a project MyLibrary which used to have its own requirements.txt file specifying all the versions needed by each of the dependencies...
lib_a==0.1
lib_b==0.11
lib_c==0.1.1
lib_d==0.1.2
lib_e==0.1.8
And a project ChildProject which happens to have the same kind of setup, with its own requirements.txt file and everything.
ChildProject uses MyLibrary as it needs some common functionality from it. The problem with these two is that ChildProject has a library which is also specified in MyLibrary, but with a different version, which causes a conflict and makes the build fail.
What I've done to get rid of the problem is to erase the pinned dependencies in MyLibrary and specify minimum and maximum versions for each of the libraries in the install_requires argument of the setup() call...
setup(
    setup_requires=['pbr', 'pytest-runner'],
    install_requires=[
        'lib_a>=0,<1',
        'lib_b>=0,<2',
        'lib_c>=0,<3',
        'lib_d>=0,<4',
        'lib_e>=0,<5',
    ],
    pbr=True,
)
And here is where I get lost...
Should I remove requirements.txt from MyLibrary and leave all the versioning to the child projects using it?
If so, how do I know that ChildProject is specifying all of the needed dependencies? What if I forget to specify lib_a in ChildProject?
Does the latest version that complies with the install_requires constraints get installed automatically, or how does it work? (I ask this because, AFAIK, install_requires just specifies the constraints but doesn't actually include any library whatsoever in the project.)
General suggestions for managing dependency versions:
Libraries don't pin versions (i.e. install_requires either has no versions at all or only loose restrictions, e.g. <4). That's what you have already.
Applications can do whatever is needed. In reality, it is highly recommended to pin your dependencies to exact versions (and better yet, provide hashes, to save yourself from forged libs). The reason is that you cannot guarantee 3rd-party libraries follow semver, which means that having >2, <3 in your requirements.txt may lead to a broken build/deployment because a 3rd-party lib released 2.5, which turns out to be backward-incompatible with 2.4. So you must do your best to avoid breaking builds simply by re-building at a different time. In other words, your build should not depend on the current state of PyPI.
In general: you pin versions at some state, test your application, and commit/save/build/however you deliver. Some time later, you revise the versions (e.g. to update a framework or address a security patch), update requirements.txt, and test your app with the new dependency state; if there are no conflicts or broken parts, you "freeze" that state with pinned versions and build/deploy/etc. This kind of loop gives you room to occasionally update your requirements and stay up to date, while at the same time your code will not be broken by merely re-installing dependencies.
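To make the split concrete, here is a sketch reusing the names from the question (versions and paths are illustrative):

# MyLibrary/setup.py -- the library keeps loose constraints
from setuptools import setup

setup(
    name='MyLibrary',
    version='1.0.0',
    install_requires=[
        'lib_a>=0,<1',
        'lib_b>=0,<2',
    ],
)

# ChildProject/requirements.txt -- the application pins the exact versions it tested:
#     MyLibrary==1.0.0
#     lib_a==0.1
#     lib_b==0.11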
If you're looking for easier dependency management with versions, I suggest taking a look at pipenv.
Trying to install nosetests as per the learnpythonthehardway tutorial, I'm having problems. Any clues on what I should try next?
$ easy_install nose
Searching for nose
Best match: nose 1.1.2
Processing nose-1.1.2-py2.6.egg
nose 1.1.2 is already the active version in easy-install.pth
Installing nosetests-2.6 script to /usr/local/bin
error: /usr/local/bin/nosetests-2.6: Permission denied
One question about installing that I have: if I have something saved in a random location on my computer, can it be imported into a Python script regardless of where it is? So if I execute runthis.py that's in a folder called "projects", and I have from setuptools import setup as the first line of the program, does setuptools have to be anywhere in particular (such as the "projects" folder) for Python to find it?
Are you able to use sudo?
If so, simply use sudo easy_install nose to install as root.
If not, you'll need to install somewhere you can write to, not the default location which you don't have permission to modify. This can be done easily in the traditional way, or using virtualenv, which can be a bit trickier to get set up initially.
As for the second question: no, Python will only find things that are in directories listed in sys.path, which by default is set to the contents of the PYTHONPATH environment variable plus the installed Python's own library directories.
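You can see exactly where Python will look by printing sys.path:

import sys

# Directories searched for imports, in order: the script's directory,
# the entries from PYTHONPATH, then the installed Python's own library directories.
for entry in sys.path:
    print(entry)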
It is often (highly!) advisable to set up your own "local" repository of packages, for whatever language system (be it Python or otherwise) you are using. Leave the "system installed" packages, whatever they might be, completely alone, in case some uber-important system tool (the package manager, anyone?) is also using them and depends on them.
The means of doing this vary from language to language, but they'll be documented somewhere all the same. You might even find that the "distro" you're using has already anticipated this requirement and has set aside some agreed-upon location, e.g. /usr/local/..., just for your own personal use.