I have come across some nasty dependency conflicts.
Looking around, I found solutions like "upgrade this, downgrade that"...
Some solutions work for some people but not for others.
Is there a more 'rounded' way to tackle this issue?
To test this:
I wrote a script that creates a virtual environment, attempts to install a requirements file, and then deletes the environment. I run it to quickly test whether changes to the requirements result in a successful installation or not (a sketch of such a script is below).
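A minimal sketch of such a script (the environment path and requirements file name are just examples):

#!/bin/sh
# Create a throwaway virtual environment, try to install the requirements,
# report the result, then remove the environment again.
python3 -m venv /tmp/req-test-env
if /tmp/req-test-env/bin/pip install -r requirements.txt; then
    echo "requirements install: OK"
else
    echo "requirements install: FAILED"
fi
rm -rf /tmp/req-test-env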
I wrote a "test app" which exercises the basic functionality of all the libraries I use in one place, to see whether, despite a successful installation, something breaks because of dependencies pip is unaware of (due to problematic third-party library architecture). I can edit the dependencies of this app, then commit to trigger build actions that run it and show whether it completes or crashes.
This way I can upgrade and add libraries more rapidly.
If all libraries used semantic versioning, declared all their dependencies via requirements files, and did not pin upper versions in those files, this would be a lot simpler. Unfortunately, one of the DB vendors I use (which I shall not name) is very problematic and ships a problematic Python library (among many other problems).
First, you need to understand that pip resolves problems one at a time, and when you back it into a corner, it can't go further.
But if you give pip the 'big problem' all at once, it has a nice way of trying to resolve it. It may not always work, but for most cases it will.
The solutions you normally find out there are in some cases a coincidence: someone has an environment similar to another person's, and one solution works for both.
But if the solution takes 'your present environment' into consideration, it should work for more people than just 'the coincidences'.
Disclaimer: the commands below are Linux/terminal commands.
Upgrade pip. We want the smartest pip we can get.
pip install --upgrade pip
Extract the list of packages you want to install.
In my case (these, among many others, trimmed for brevity):
google-cloud-texttospeech attrdict google-cloud-language transformers
Give them all at once to pip.
pip install google-cloud-texttospeech attrdict google-cloud-language transformers
It will try combinations of versions and dependencies' versions until it finds something suitable. This will potentially download a ton of packages just to inspect their dependencies, so you only want to do this once.
If you are happy with the result, extract the requirements file:
pip freeze > requirements.txt
This contains all the packages installed; we are not interested in all of them.
And from it, extract the specific versions of your desired packages:
cat requirements.txt | egrep -i "google-cloud-texttospeech|attrdict|google-cloud-language|transformers"
Note: The command above may not list all your required packages, just the ones that appear in the requirements.txt, so check that all show up.
attrdict==2.0.1
google-cloud-language==1.2.0
google-cloud-texttospeech==2.12.3
transformers==2.11.0
Now you can put that in a file like resolved-dependencies.txt.
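For example, using the same package list as above, you could write it directly with:

cat requirements.txt | egrep -i "google-cloud-texttospeech|attrdict|google-cloud-language|transformers" > resolved-dependencies.txt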
And next time, install the packages directly at these known valid and compatible versions:
pip install -r resolved-dependencies.txt
Related
I have a requirements.txt file for a Python code base. The file has everything specified:
pytz==2017.2
requests==2.18.4
six==1.11.0
I am adding a new package. Should I list its version? If yes, how do I pick a version to specify?
Check out the pip docs for more info, but basically you do not need to specify a version. Doing so can avoid headaches, though, since pinning a version lets you guarantee you do not end up in dependency hell.
Note that if you are creating a package to be deployed and pip-installed, you should use the install_requires metadata instead of relying on requirements.txt.
Also, it's a good idea to get into the habit of using virtual environments to avoid dependency issues, especially when developing your own stuff. Anaconda offers a simple option with the conda create command, and virtualenv works great with virtualenvwrapper as a lighter-weight solution. Another tool, pipenv, is also quite popular.
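For example, a minimal virtualenv workflow might look like this (the environment name is just an example):

virtualenv myproject-env
source myproject-env/bin/activate
pip install -r requirements.txt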
Specifying a version is not a requirement, though it does help a lot in the future. Some versions of packages will not work well with other packages and their respective versions. It is hard to predict how future changes will affect these interrelationships. This is why it is very beneficial to create a snapshot in time (in your requirements.txt) showing which version combinations do work.
To create a requirements.txt file that includes the versions of the packages you're using, do the following. In your console/terminal, cd into the location where you would like your requirements.txt to be and enter:
pip freeze > requirements.txt
This will automatically generate a requirements.txt file including the packages that you have installed with their respective versions.
A tip - you should aim to use a virtual environment for each individual project that you'll be working on. This creates a 'bubble' for you to work within and to install specific package versions in, without affecting your other projects. It will save you a lot of headaches in the future as your packages and versions will be kept project-specific. I suggest using Anaconda's virtual environments.
No, there is no need to specify a version. It's probably a good idea to specify one, though.
If you want to specify a version but you don't know which version to specify, try using pip freeze, which will dump out a list of all the packages you currently have installed and what their versions are.
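For example, to check the installed version of just one of the packages from the question (the output line is what it might look like):

pip freeze | grep -i requests
requests==2.18.4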
I'm using pip to install modules from a requirements file produced with pip freeze. However, the problem is that sometimes it is unable to install or download one module, and then everything fails and nothing gets installed. Is there a way to make it install the modules that it can?
With pip only, I would say no. pip and Python packages in general are designed in such a way that you might need dependencies installed in order to install the package itself. Thus, there is no option to keep trying despite failures.
However, pip install -r requirements.txt simply goes through the file line by line. You can iterate over every single item yourself and call pip install for it, without caring about the result (whether the installation was successful or not). With shell scripting this could be done, e.g.:
cat requirements.txt | xargs -n 1 pip install
The example does not understand comments, blank lines, etc., so you might need to have something more complex in place for a real-life scenario.
Alternatively, you can simply run pip in a loop until it gives a successful return value.
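As a slightly more robust sketch that goes through the file line by line, skips comments and blank lines, and keeps going on failures (assuming one requirement per line):

while read -r req; do
    # skip blank lines and comment lines
    case "$req" in ''|'#'*) continue ;; esac
    # try to install this one requirement; report but ignore failures
    pip install "$req" || echo "could not install: $req"
done < requirements.txt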
But as a real solution, I would recommend setting up your own Python package mirror server, or a local cache, which would be another question.
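As a rough sketch of the local-cache idea (the directory name is just an example; these pip options exist in reasonably recent pip releases), download everything once into a directory, then install from that directory without hitting the network:

pip download -r requirements.txt -d ./package-cache
pip install --no-index --find-links=./package-cache -r requirements.txt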
If I type pip freeze > requirements.txt, the resulting file looks similar to this:
argparse==1.2.1
h5py==2.2.0
wsgiref==0.1.2
Some libraries are under ongoing development. This happened to me regarding h5py, which is now (as of this writing) available in version 2.2.1. Thus, using pip install -r requirements.txt throws an error, saying version 2.2.0 of h5py was not found:
No distributions matching the version for h5py==2.2.0 (from -r requirements.txt (line 2))
Is it considered good practice to maintain the requirements via pip freeze at all? Obviously, I cannot rely on specific version numbers still being available in the future. I would like to be able to deploy my applications in the future, even if they are several years old, without compatibility problems regarding version numbers. Is there a way to make the output of pip freeze future-safe?
I thought about manipulating the output file of pip freeze by using the greater-than-or-equal operator >= instead of the exact pin ==, so the output would look like the following:
argparse>=1.2.1
h5py>=2.2.0
wsgiref>=0.1.2
But I can imagine that this will break my applications if any of the libraries breaks backward-compatibility in a future version.
To answer the first question, yes, it is pretty common to use pip freeze to manage requirements. If your project is packaged you can also set dependencies directly in the setup.py file.
You can set the requirements to greater than or equal to version x, but as you speculate, this can turn around and bite you if a dependency makes changes to its API that break your required functionality. You can also ensure that an installed dependency is less than a certain version, e.g. if you're on version 1.0 of a package and would like minor updates, but a major release scares you (whether it's released yet or not), you can require example>=1.0.0,<2.0.0.
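Applied to the file from the question, a range-pinned version might look like this (the upper bounds here are purely illustrative):

argparse>=1.2.1,<2.0.0
h5py>=2.2.0,<3.0.0
wsgiref>=0.1.2,<0.2.0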
More info on requirements files
In the end, pip freeze is just a tool that shows you what you currently have installed; it doesn't know, or care, whether it works for you. What you use to replicate environments based on this data also doesn't really matter; version conflicts, updates breaking backwards compatibility, bugs, and other such issues in dependencies will (at least once) cause you grief. Keeping tabs on the state of your project's major dependencies and doing automated testing with new versions will save you a lot of time and headache (or at least headache).
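As one small aid for keeping tabs on dependencies, reasonably recent versions of pip can list installed packages that have newer releases available:

pip list --outdated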
The following works:
pip install git+git://github.com/pydata/pandas#master
But the following doesn't:
pip install -e git+git://github.com/pydata/pandas#master
The error is:
--editable=git+git://github.com/pydata/pandas#master is not the right format; it must have #egg=Package
Why?
Also, I read that the -e does the following:
--egg
Install as self contained egg file, like easy_install does.
What is the value of this? When would this be helpful? (I always work in a virtualenv and install through pip.)
Generally, you don't want to install as a .egg file. However, there are a few rare cases where you might. For example:
It's one of a handful of packages that needs to override a built-in package, and knows how to do so when installed as a .egg. With Apple Python, readline is such a package. I don't know of any other common exceptions.
The egg has binary dependencies that point to other eggs on PyPI, and can serve as a binary dependency for yet other eggs on PyPI. This is pretty rare nowadays, because it doesn't actually work in many important cases.
You want a package embedded in a single file that you can copy-and-paste, FTP, etc. from one installation to another.
You want a package that you can install into another installation straight out of site-packages.
The package is badly broken (and you can't fix it, for whatever reason), so that setup.py install does not work, but it can properly build an egg and run out of an egg.
Meanwhile, if you want to use editable mode, the package, and all other packages it depends on, have to be egg-compatible, whether or not you install them as eggs; pip will add #egg=<project name> to the VCS URL for each one, and if any of them don't understand that, it will fail.
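So, for the command in the question, the fix should presumably be to add the egg fragment, and (if you want a specific branch) to put it after an @ rather than in the fragment:

pip install -e git+git://github.com/pydata/pandas@master#egg=pandas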
I am not a total newbie, but I have been trying to install modules for quite a long time, and at this point I would like to have a fresh start and install Python and all the modules I need so that I really understand them. My problem is that some of them import, but most of them either install into the wrong site-packages or don't import, maybe because I messed up my system/Python. I also tried setting PYTHONPATH and PATH to sort this out, but it never worked.
So my questions are:
Is there a way to ensure I can clean everything up and start from zero ?
Ideally this would be without having to set up Mac OSX new.
Is there a way to install all the modules in the correct place (whatever the directory is, I don't care; it should just work)?
Is there a good step-by-step description of how installing modules works? And I don't mean just the info on using easy_install, pip install, etc., but a way to fully understand what I need to consider, where I need to put them, why these modules are recognized in certain directories, how the system finds them, and most importantly, what the most common pitfalls are and how to avoid them.
I also tried MacPorts and various other similar ways to install, and even though some of them worked, and while I am sure that these are really great tools, I had to hack most of them to get them to work.
So if someone can recommend a good and stable way to install a lot of modules at once, this would be incredibly useful.
Thanks a lot!
And sorry for the long questions.
Buildout and virtualenv should be what you are looking for.
Buildout helps you configure a Python installation, and virtualenv allows you to isolate multiple different configurations from each other.
Here's a nice blog post explaining how to use them together.
Also, see this other question: Buildout and Virtualenv
You can safely install an up-to-date Python 2 and/or Python 3 on OS X using the python.org installers here. They will coexist with any other Pythons you have installed or that were shipped by Apple with OS X. To install packages, for each Python instance first install Distribute which will install a version-specific easy_install command, i.e. easy_install-2.7 or easy_install-3.2. Many people prefer to use pip to manage packages; you can use easy_install to install a version-specific copy of it. If you want something fancier, you could also install virtualenv but, with the isolation provided by Python framework builds on OS X, that isn't as necessary as on most other platforms.
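For example, after installing a python.org Python 2.7 and Distribute, installing pip for that specific interpreter might look like this (the exact name of the resulting pip executable, e.g. pip-2.7 vs pip2.7, depends on the pip release):

easy_install-2.7 pip
pip-2.7 install virtualenv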
Is there a way to install all the modules in the correct place?
Download and untar/gunzip/etc. the module source (most of the modules are available in gzip form at http://pypi.python.org/pypi), then run configure with --prefix set to the same thing for every install:
[ 11:06 jon#hozbox.com ~ ]$ ./configure --prefix=/usr/local
/usr/local is usually the default, but it doesn't hurt to specify it, and doing so will ensure that every module you install is placed in /usr/local/lib/python/...
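A sketch of the full sequence for a single module, assuming it ships a configure script as described above (the archive name is hypothetical):

tar xzf some-module-1.0.tar.gz
cd some-module-1.0
./configure --prefix=/usr/local
make
make install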
Is there a good step-by-step description on how installing modules works?
The Python website has a great page called Installing Python Modules: http://docs.python.org/install/index.html
Most modules are hosted on the Python Package Index: http://pypi.python.org/pypi