My understanding is that when you make your requirements.txt file, you should always freeze your package versions, like so:
scikit-learn==0.24.0
and avoid doing:
scikit-learn>=0.24.0
since new versions of scikit-learn might deprecate some functions that you use in your code, and because other packages might have dependency clashes with a newer scikit-learn.
Is this assumption correct?
I see that for pandas==1.0.1 the dependencies listed in the downloaded wheel file are:
python-dateutil>=2.6.1
pytz>=2017.2
numpy>=1.13.3
pytest>=4.0.2
pytest-xdist
hypothesis>=3.58
Is there a reason this was done?
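For reference, you can list a package's declared dependencies yourself. Here is a minimal sketch using the standard-library importlib.metadata (it assumes Python 3.8+ and that pandas is installed in the current environment):

from importlib import metadata

# Print every requirement string declared in pandas' wheel metadata.
# Entries guarded by a marker such as  ; extra == "test"  are optional:
# they are only installed when the matching extra is requested,
# e.g.  pip install "pandas[test]"
for req in metadata.requires("pandas") or []:
    print(req)

If the pytest, pytest-xdist, and hypothesis entries carry such an extra marker in the wheel's METADATA, that would explain why a plain pip install pandas does not pull them in.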
When I am trying to install some packages with pip, sometimes some dependency fails to install, and I need to sort out why. A fundamental question is "why did I need to install package X at all?", but I cannot find any way to answer this from the pip install output, even with -vvv. Pip tells me what it is installing, but it doesn't say why it is doing it.
There are various tools to introspect dependency trees of packages, e.g. pipdeptree, especially when they are already installed, but these don't help me when the installation has failed. And it seems that internally pip must already have solved these dependencies and know why it has chosen to install a particular package. So how can I get it to share this information with me at install time?
Edit: It already shows this information when telling you what dependencies are already satisfied, e.g.
Requirement already satisfied: pillow>=6.2.0 in /data2/users/bfarmer/envs/bfarmer_dev_py39/lib/python3.9/site-packages (from matplotlib>=1.3.1->stf-modelling) (8.0.1)
Requirement already satisfied: certifi>=2020.06.20 in /data2/users/bfarmer/envs/bfarmer_dev_py39/lib/python3.9/site-packages (from matplotlib>=1.3.1->stf-modelling) (2020.12.5)
(at least I assume that is what it is telling me at the end of those lines). But when I need this information the most, i.e. when something goes wrong, I get nothing:
Collecting PIL
Downloading http://ehp.bom.gov.au/ehprepo/pypi/simple/pil/PIL-1.1.7.tar.gz (506 kB)
|████████████████████████████████| 506 kB 8.1 MB/s
ERROR: Command errored out with exit status 1:
...
<traceback etc. follows>
In this example I am left wondering why the heck some package wanted PIL when pillow is already there. PIL is basically dead, so I need to update whatever package has a dependency on PIL to use pillow instead. But I have no idea what package that might be, and cannot figure out any way to find out. This seems like basic information; there must surely be a way to get it.
It kind of seems like no, pip cannot do this. I found this issue about it here:
https://github.com/pypa/pip/issues/53
It sounds like they are working on it, but nothing exists currently. I am still interested in workarounds/third-party solutions though, or advice from other developers about what they typically do in this situation. It kind of seems like I just have to manually trawl through all the dependencies of my dependencies, which just seems stupid. Maybe I can hack something into the pip source...
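In the meantime, a partial workaround for the already-installed case: pipdeptree has a reverse mode (pipdeptree --reverse --packages pillow, give or take the exact flags in your version), and you can roll a similar reverse lookup yourself. A minimal sketch using the standard-library importlib.metadata (Python 3.8+); note that it only sees distributions that are already installed, so it cannot explain a requirement coming from a package whose install failed mid-way:

import sys
from importlib import metadata

def find_requirers(target):
    # Yield (distribution name, requirement string) for every installed
    # distribution that declares a dependency on `target`.
    target = target.lower()
    for dist in metadata.distributions():
        for req in dist.requires or []:
            # A requirement string looks like "pillow>=6.2.0; extra == 'img'".
            # Crudely strip markers and version specifiers to get the bare name.
            name = req.split(";")[0]
            for sep in "<>=!~ ([":
                name = name.split(sep)[0]
            if name.lower() == target:
                yield dist.metadata["Name"], req

if __name__ == "__main__":
    for requirer, req in find_requirers(sys.argv[1]):
        print(f"{requirer} requires: {req}")

Running it as, say, python find_requirers.py PIL (the file name is arbitrary) would at least narrow down which installed package is asking for the dead package.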
I am asking myself which version of the library pip will install in this scenario:
requirements.txt contains:
numpy<=1.14
scikit-learn
Now imagine that scikit-learn depends on numpy>=1.10.
If I start pip install -r requirements.txt now, how will pip install the dependencies?
Does it parse the whole dependency structure before installing and find a valid version of numpy?
Does it just parse the file and dependencies sequentially (package by package) and try to go for the best "last" dependency?
In my example this would be:
numpy==1.14
numpy==latest
The essential question is: In which order will pip install its dependencies? How does it determine the proper version, respecting all cross dependencies?
EDIT: My initial guess would be that it builds an internal list of valid versions, ruling out invalid versions by parsing all dependencies before installing. Then it takes the highest valid remaining version of each package.
First thing to know: most likely the order will change soon. pip is currently (today: September 2020) slowly rolling out a new dependency resolver. It can already be used today, but is not the default. So depending on which dependency resolver you are using, results might differ.
A couple of pointers:
pip's documentation chapter "Installation Order"
pip's documentation chapter "Changes to the pip dependency resolver in 20.2 (2020)"
I have a dependency tree of modules that works like this (→ indicating a dependency):
a → b, c
b → ruamel.yaml >= 0.16.5
c → ruamel.yaml < 0.16.6, >=0.12.4
It's very clear to me that ruamel.yaml 0.16.5 satisfies both of these requirements. However, when I pip install a, I get the following logs:
Collecting ruamel.yaml>=0.16.5
Downloading ruamel.yaml-0.16.10-py2.py3-none-any.whl (111 kB)
And then later:
ERROR: <package c> 0.4.0 has requirement ruamel.yaml<0.16.6,>=0.12.4, but you'll have ruamel-yaml 0.16.10 which is incompatible.
So pip has completely ignored the grandchild dependencies when choosing which packages to install. But it realises that it has messed up at the end. Why is pip not choosing the correct package here? Is there a way to help it work better?
I believe this is a well-known problem that is currently being worked on. Message from one week ago: http://pyfound.blogspot.com/2020/03/new-pip-resolver-to-roll-out-this-year.html
In the meantime there are some measures that can be taken to try and mitigate this kind of issue:
Reverse the order of the dependencies (in your example, a could list c before b)
Use an additional requirements.txt or constraints.txt file (see the sketch after this list)
Depending on the actual needs, an alternative tool could help (I believe poetry, pipenv, and most likely others as well have better dependency resolvers, but they are not a one-to-one replacement for pip)
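For the constraints route, a minimal sketch: put the stricter grandchild bound into a constraints file so the resolver can never select 0.16.10 in the first place.

constraints.txt:
ruamel.yaml<0.16.6,>=0.12.4

Then:
python -m pip install -c constraints.txt a

A constraints file only restricts which versions may be chosen; unlike a requirements file, it does not cause anything extra to be installed.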
It appears to be already possible to test pip's future dependency resolver today:
Install pip from source
Run path/to/python -m pip install --unstable-feature=resolver ...
In a way it also seems to be possible to somewhat test this dependency resolution in current releases of pip via the pip check command.
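A quick sketch of that: after an install that ended up in the broken state above, pip check reports the conflict (output illustrative):

python -m pip check
<package c> 0.4.0 has requirement ruamel.yaml<0.16.6,>=0.12.4, but you have ruamel-yaml 0.16.10.

Note that this only diagnoses the environment after the fact; it does not prevent the bad install.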
Some more references on the topic:
https://pradyunsg.me/blog/2020/03/27/pip-resolver-testing/
https://discuss.python.org/t/an-update-on-pip-and-dependency-resolution/1898/2
TL;DR Even though I've specified dependencies with install_requires in setup.py, the install through pip fails because some dependencies can't be found.
I've developed a package which I intend to distribute via PyPI. I've created a built distribution wheel and uploaded it to TestPyPI to see if everything is working with the upload, and whether the package can be installed from a user's perspective.
However, when I try to pip install the package inside a vanilla Python 2.7 environment, the installation process fails while installing the dependencies.
My package depends on these packages (which I added to the setup.py file accordingly):
...
install_requires=['numpy','gdal','h5py','beautifulsoup4','requests','tables','progress'],
...
So when I run pip install, everything looks normal for a moment, until I receive this error:
Could not find a version that satisfies the requirement progress (from #NAME#) (from versions: )
No matching distribution found for progress (from #NAME#)
When I remove the progress dependency (I could live without it), the same thing happens for pytables:
Could not find a version that satisfies the requirement tables (from #NAME#) (from versions: )
No matching distribution found for tables (from #NAME#)
If I run pip install tables and pip install progress manually beforehand, everything works as expected.
So how can I ensure that if someone downloads my package, all missing dependencies are installed with it?
Related bonus question:
Can I include a wheel file in my package (maybe through MANIFEST.in) and install it as dependency if the module is not available? If so, how?
And I think I've found the answer to my question myself.
When installing a package from TestPyPI, the dependencies are also installed from there. And it seems that while there are many packages available, pytables and progress are apparently missing. This caused the installation to fail.
Naturally, manually installing with pip install gets the package from the "normal" PyPI, which of course works. This obviously added to my confusion.
Here's a look at the output from pip install when installing the package from TestPyPI:
Downloading https://test-files.pythonhosted.org/packages/4f/96/b3329750a04fcfc316f15f658daf6d81acc3ac61e3db390abe8954574c18/numpy-1.9.3.tar.gz (4.0MB)
While installing the wheel directly, it looks slightly different:
Downloading https://files.pythonhosted.org/packages/2e/91/504e434d3b95d943caab926f33dee5691768fbb622bc290a0fa6df77e1d8/numpy-1.14.2-cp27-none-win32.whl (9.8MB)
Additionally, running
pip install --index-url https://test.pypi.org/simple/ tables
produces the same error as described in my question.
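A common workaround, if you want to keep testing against TestPyPI, is to add the real PyPI as a fallback index so that dependencies missing from TestPyPI can still be resolved:

pip install --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ #NAME#

--index-url points pip at TestPyPI, while --extra-index-url adds the regular index as a secondary source, so dependencies such as tables and progress can still be found.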
I'm trying to write a Python package that can be installed from PyPI, and am having trouble getting my head around how exactly to structure setup.py and requirements.txt.
I know that they have different semantics and different purposes, with setup.py defining what's needed and requirements.txt giving exact versions. I also know that you shouldn't read requirements.txt into setup.py.
So what I need to know is how to structure setup.py and requirements.txt so that when my package is installed from PyPI, the right requirements are installed.
In my example, I need django-haystack (the latest version is 2.5.1), but my code is only compatible with django-haystack version 2.5.0, so my setup.py and requirements.txt are as shown below:
setup.py:
setup(
    name='my_package',
    install_requires=[
        'django-haystack',
    ],
)
requirements.txt:
django-haystack==2.5.0
How can I structure my setup code so that when this is installed, django-haystack==2.5.0 is installed, not the latest?
First, a warning: specifying exact version requirements in a setup.py file, without a range, will guarantee frustration for end users in the future.
That said, you can simply do it like so in the setup.py file:
setup(
    name='my_package',
    install_requires=[
        'django-haystack==2.5.0',
    ],
)
However, if another user wishes to use another package that requires the latest version of django-haystack, they won't be able to install your package as defined, due to version conflict issues. Of course, if the package at hand is so flaky that it can't even attempt to use semantic versioning, then there isn't really much that can be done.
Now, if all you are after is a reproducible build, the requirements.txt method can be used to specify explicit versions for all packages within your environment. This is out of band from the typical package dependency structure, but it won't suffer from the potentially crippling lockdown caused by requirements that conflict on paper without actually being in conflict. zc.buildout is an alternative, though much heavier, and it does a lot more than just Python.
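Putting the two files together, a common pattern (a sketch, assuming your code genuinely works with any 2.5.x release) is to declare a compatible range in setup.py and keep the exact pin in requirements.txt for reproducible installs:

setup.py:
setup(
    name='my_package',
    install_requires=[
        # any 2.5.x release the code is known to work with
        'django-haystack>=2.5.0,<2.6',
    ],
)

requirements.txt:
# exact pin, used to reproduce the tested environment
django-haystack==2.5.0

That way pip install my_package leaves room for packages installed alongside it, while pip install -r requirements.txt reproduces the environment you actually tested.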