Installing my Python application's virtual environment takes too much time during deployment, due to the large number of dependencies. To minimize that time, I want to commit the dependencies residing in the virtual environment to git, so that they are already there on deployment.
The main issue with that is that dependencies with C code need to be rebuilt because of architecture differences between machines.
Is there a way to rebuild all dependencies that need compilation in my virtual environment?
The wheel format is what you need.
A popular example is lxml: installing it from source on Linux takes about 3 minutes to download, compile and install.
Installing lxml from a local wheel file takes a fraction of a second.
For detailed instructions on how I use it, see this detailed SO answer on how to configure pip, including instructions on how to take advantage of wheels.
For more information:
pythonwheels page listing already available wheels.
wheel ReadTheDocs
using pip to build a wheel
Some notes:
Pure Python packages can be distributed in wheel format regardless of target platform (apart from possibly being Python version dependent).
Compiled Python packages must be built on the same platform where you are going to install them. There may be some cross-compilation options, but I have no real experience with them.
Some consider wheel the "package format of the future"; others claim it is meant to be built on your own side, so that you use your own wheels. The latter case is why lxml is not provided as a wheel - see the launchpad issue related to lxml in wheel format. Consider adding yourself there as an affected person, if you care.
Once you manage using wheels the first time, you will love it.
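For the deployment scenario in the question, a common pattern is to build wheels for all dependencies once (on a machine with the same architecture as the deployment target) and then install from them offline. A sketch, where the requirements.txt file name and wheelhouse/ directory are assumptions:

```shell
# Build wheels for every dependency listed in requirements.txt
# into a local directory (the "wheelhouse")
python3 -m pip wheel -r requirements.txt -w wheelhouse/

# On deployment: install from the wheelhouse only,
# without hitting the network or compiling anything
python3 -m pip install --no-index --find-links=wheelhouse/ -r requirements.txt
```

The wheelhouse directory can then be committed to git instead of the virtual environment itself.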
Related
The advantage of wheels over eggs is clear (see the section "Why not egg?" at https://pypi.python.org/pypi/wheel).
However, it is not entirely clear to me what the advantage of using wheels over tar.gz is. I might be missing something obvious, like "they are the same".
As I see it, both can be installed directly by pip (even on Windows), have similar sizes, and require a similar packaging effort.
It sounds to me like the kind of question you might get when justifying a packaging methodology.
EDIT:
Just found an example where tar.gz might be better than wheels: CherryPy (https://pypi.python.org/pypi/CherryPy) provides wheels for Python 3.x only, so if you want a local repository to serve CherryPy for both Python 2.7 and 3.x dependencies, it seems to make more sense to store the tarball. Is this correct? (Just to add a couple of case-based justifications to the discussion.)
This answered it for me (directly from the wheel PEP):
Python needs a package format that is easier to install than sdist.
Python's sdist packages are defined by and require the distutils and
setuptools build systems, running arbitrary code to build-and-install,
and re-compile, code just so it can be installed into a new
virtualenv. This system of conflating build-install is slow, hard to
maintain, and hinders innovation in both build systems and installers.
Wheel attempts to remedy these problems by providing a simpler
interface between the build system and the installer. The wheel binary
package format frees installers from having to know about the build
system, saves time by amortizing compile time over many installations,
and removes the need to install a build system in the target
environment.
https://www.python.org/dev/peps/pep-0427/#rationale
Note the tarballs we're speaking of are what are referred to as "sdists" above.
From Python Wheels
Advantages of wheels
• Faster installation for pure python and native C extension packages.
• Avoids arbitrary code execution for installation. (Avoids setup.py)
• Installation of a C extension does not require a compiler on Windows or OS X.
• Allows better caching for testing and continuous integration.
• Creates .pyc files as part of installation to ensure they match the python interpreter used.
• More consistent installs across platforms and machines.
Make sure wheel is installed.
python3 -m pip install wheel
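With wheel installed, building a wheel of your own project takes one command. A sketch, assuming a project with a setup.py in the current directory:

```shell
# Build a .whl file for the project in the current directory;
# the result lands in ./dist/
python3 setup.py bdist_wheel

# A wheel installs without any build step
python3 -m pip install dist/*.whl
```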
I'm writing a source bundle (not a fully packaged module, but some scripts with dependencies) to be installed and executed inside a framework application (Specifically, Amazon SageMaker's TensorFlow serving container - running Python 3.5).
One of my dependencies is matplotlib, which in turn needs kiwisolver, which has C++ components.
It seems like my target container doesn't have wheel installed by default, because when I supply just a requirements.txt file I get the error described in "Why is python setup.py saying invalid command 'bdist_wheel' on Travis CI?".
I think I got it working by supplying a setup.py instead, with setup_requires=["wheel"] as advised in the answers to that Travis CI question.
My Python packaging-fu is weak, so my question is: Who should be specifying this dependency, because it seems like it shouldn't be me?
Should kiwisolver be advertising that it needs wheel?
Does a framework application/environment installing user code modules via requirements.txt have an implicit contract to make wheel available in the environment, for some reason in Python's packaging ethos?
Maybe it really is on me to know that, since I'm indirectly consuming a module like kiwisolver, my package requires wheel for setup and a straight pip install -r requirements.txt won't work?
Even better if somebody can explain whether this answer is changing with PEP 518 and the deprecation of setup_requires :S
Usually wheel could be considered a build-time dependency and not an install-time dependency. But actually, wheel is just a way of distributing Python projects (libraries or applications), so it usually isn't a mandatory dependency.
The system building the library (kiwisolver) might need the wheel tool installed. But if I am not mistaken, recent versions of pip come with wheel support bundled in, so nowadays there is often no need to install it explicitly.
In many cases there are wheels already built available on PyPI. But sometimes there are no wheels compatible with the target system (Python interpreter version, operating system, CPU bitness). In your case here, kiwisolver has a wide range of wheels available but not for Python 3.5.
So it seems the system you want to install kiwisolver on is not compatible with any of the wheels available on PyPI, and pip has to build the library locally. Usually pip tries to build a wheel first, but as far as I know it is not a deal-breaker if one cannot be built: pip then simply continues and installs the project without the wheel intermediary step.
But still pip has to be able to build the library, which might require some C/C++ compilers or that other unusual conditions are met on the local system. Which is why distributing libraries as wheel is very comfortable, since the build step is already done.
So to sum it up, from my point of view no one really has to declare wheel as a dependency or install wheel unless they actually want to build wheels. Wheel is just an optional intermediary step - a way of distributing Python projects (libraries or applications). I don't see an absolute need to add wheel to setuptools' setup_requires (which is deprecated, or close to it) nor to pyproject.toml's build-system.requires; it's more of a (very common, quasi-standard) convenience.
Now what would I do in your situation?
Before installing from the requirements.txt file that contains kiwisolver (directly or indirectly), make sure that pip is up to date or explicitly install wheel, and then either:
Use a version of Python for which wheels are already available on PyPI.
If you want to stay on Python 3.5:
Make sure the target system is able to build kiwisolver itself (maybe it requires a C/C++ compiler plus some other native libraries).
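The pre-install steps suggested above might look like this. A sketch; the requirements.txt file name is an assumption:

```shell
# A recent pip can build and cache wheels on its own
python3 -m pip install --upgrade pip
# ...or install the wheel tool explicitly
python3 -m pip install wheel

# Then install the requirements that pull in kiwisolver
python3 -m pip install -r requirements.txt
```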
The following takes place in a Python 3 virtual environment.
I just authored a little package that requires numpy. So, in setup.py, I wrote install_requires=['numpy']. I ran python3 setup.py install, and it took something like two minutes -- I got the full screen dump of logs, warnings, and configurations that normally comes with a numpy installation.
Then, I created a new virtual environment, and this time simply wrote pip3 install numpy -- which took only a few seconds -- and then ran python3 setup.py install, and I was done almost immediately.
What's the difference between the two, and why was pip3 install numpy so much faster? Should I thus include a requirements.txt just so people can pip-install the requirements rather than using setuptools?
Note that when I wrote pip3 install numpy, I got the following:
Collecting numpy
Using cached numpy-1.12.0-cp36-cp36m-manylinux1_x86_64.whl
Installing collected packages: numpy
Successfully installed numpy-1.12.0
Is it possible that this was so much faster because the numpy wheel was already cached?
pip install uses wheel packages, which were designed partly with the purpose of speeding up the installation process.
The Rationale section of PEP 427, which introduced the wheel format, states:
Python needs a package format that is easier to install than sdist.
Python's sdist packages are defined by and require the distutils and
setuptools build systems, running arbitrary code to build-and-install,
and re-compile, code just so it can be installed into a new
virtualenv. This system of conflating build-install is slow, hard to
maintain, and hinders innovation in both build systems and installers.
Wheel attempts to remedy these problems by providing a simpler
interface between the build system and the installer. The wheel binary
package format frees installers from having to know about the build
system, saves time by amortizing compile time over many installations,
and removes the need to install a build system in the target
environment.
Installing from a wheel is faster since it is a Built Distribution format:
Built Distribution
A Distribution format containing files and metadata that only need to be moved to the correct location on the target system, to be
installed. Wheel is such a format, whereas distutil’s Source
Distribution is not, in that it requires a build step before it can be
installed. This format does not imply that python files have to be
precompiled (Wheel intentionally does not include compiled python
files).
Since numpy's source distribution contains a significant amount of C code, compiling it takes noticeable time, which is what you observed when you installed it via bare setuptools. pip avoided compiling the C code because the wheel already contained binary code compiled for your system.
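If you want pip to fail fast rather than silently fall back to a slow source build, you can restrict it to wheels. A sketch, using numpy as the example:

```shell
# Install from a wheel only; error out if no compatible wheel exists
python3 -m pip install --only-binary :all: numpy
```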
I'm an author of a pure Python library that aims to be also convenient to use from a command line. For Windows users it would be nice just installing the package from an .exe or .msi package.
However, I cannot get the installer to install the package's dependencies (especially the dependency on setuptools itself, so that running the software fails with an import error on pkg_resources). I don't believe that providing an easy .exe installer makes much sense if the user then needs to manually install setuptools and other libraries on top. I'd rather tell them how to add easy_install to their PATH and go that way (http://stackoverflow.com/questions/1449494/how-do-i-install-python-packages-on-windows).
I've built .exe packages in the past, but don't remember whether that ever worked the way I'd have preferred.
It is quite common to distribute packages that have dependencies, especially those as you have, but I understand your wish to make installation as simple as possible.
Have a look at deployment bootstrapper, a tool dedicated to solving the problem of delivering software including its prerequisites.
Regardless of which packaging method you eventually choose, maintain your sanity by staying away from including MSIs in other MSIs in any way. That just does not work, because of transactional installation requirements and locking of the Windows Installer database.
I recently began learning Python, and I am a bit confused about how packages are distributed and installed.
I understand that the official way of installing packages is distutils: you download the source tarball, unpack it, and run python setup.py install; the module then automagically installs itself.
I also know about setuptools, which comes with the easy_install helper script. It uses eggs for distribution, and from what I understand is built on top of distutils and does the same thing as above, plus it takes care of any required dependencies, all fetched from PyPI.
Then there is also pip, and I'm still not sure how it differs from the others.
Finally, as I am on a Windows machine, a lot of packages also offer binary builds through a Windows installer, especially the ones that require compiling C/Fortran code, which would otherwise be a nightmare to compile manually on Windows (assuming you have an MSVC or MinGW/Cygwin dev environment with all the necessary libraries set up; nonetheless, try to build numpy or scipy yourself and you will understand!).
So can someone help me make sense of all this and explain the differences and pros/cons of each method? I'd like to know how each keeps track of packages (Windows registry, config files, ...), and in particular how to manage all my third-party libraries (list installed packages, disable/uninstall, etc.).
I use pip, and not on Windows, so I can't provide comparison with the Windows-installer option, just some information about pip:
Pip is built on top of setuptools, and requires it to be installed.
Pip is a replacement (improvement) for setuptools' easy_install. It does everything easy_install does, plus a lot more (make sure all desired distributions can be downloaded before actually installing any of them to avoid broken installs, list installed distributions and versions, uninstall, search PyPI, install from a requirements file listing multiple distributions and versions...).
Pip currently does not support installing any form of precompiled or binary distributions, so any distributions with extensions requiring compilation can only be installed if you have the appropriate compiler available. Supporting installation from Windows binary installers is on the roadmap, but it's not clear when it will happen.
Until recently, pip's Windows support was flaky and untested. Thanks to a lot of work from Dave Abrahams, pip trunk now passes all its tests on Windows (and there's a continuous integration server helping us ensure it stays that way), but a release has not yet been made including that work. So more reliable Windows support should be coming with the next release.
All the standard Python package installation mechanisms store all metadata about installed distributions in a file or files next to the actual installed package(s). Distutils uses a distribution_name-X.X-pyX.X.egg-info file, pip uses a similarly-named directory with multiple metadata files in it. Easy_install puts all the installed Python code for a distribution inside its own zipfile or directory, and places an EGG-INFO directory inside that directory with metadata in it. If you import a Python package from the interactive prompt, check the value of package.__file__; you should find the metadata for that package's distribution nearby.
Info about installed distributions is only stored in any kind of global registry by OS-specific packaging tools such as Windows installers, Apt, or RPM. The standard Python packaging tools don't modify or pay attention to these listings.
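To see this for yourself, you can locate an installed distribution's files from the interpreter. A sketch; pip is used here merely as an example of an installed distribution:

```python
import os
import pip  # any installed third-party distribution works here

# Directory holding the package's code; the distribution's metadata
# (an *.egg-info or *.dist-info entry) normally lives alongside it
# in the same site-packages directory.
package_dir = os.path.dirname(pip.__file__)
site_packages = os.path.dirname(package_dir)
print(package_dir)
print(site_packages)
```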
Pip (or, in my opinion, any Python packaging tool) is best used with virtualenv, which allows you to create isolated per-project Python mini-environments into which you can install packages without affecting your overall system. Every new virtualenv automatically comes with pip installed in it.
A couple other projects you may want to be aware of as well (yes, there's more!):
distribute is a fork of setuptools which has some additional bugfixes and features.
distutils2 is intended to be the "next generation" of Python packaging. It is (hopefully) adopting the best features of distutils/setuptools/distribute/pip. It is being developed independently and is not ready for use yet, but eventually should replace distutils in the Python standard library and become the de facto Python packaging solution.
Hope all that helped clarify something! Good luck.
I use Windows and Python. It is somewhat frustrating, because pip doesn't always manage to install things. Python is moving to pip, so I still use it. Pip is nice, because you can uninstall items and use
pip freeze > requirements.txt
pip install -r requirements.txt
Another reason I like pip is for virtual environments like venv with python 3.4. I have found venv a lot easier to use on windows than virtualenv.
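A minimal venv workflow on Windows might look like this. A sketch; the environment name "env" is arbitrary:

```shell
# Create an isolated environment named "env"
python -m venv env

# Activate it (on Windows: env\Scripts\activate,
# on Linux/macOS: source env/bin/activate)
env\Scripts\activate

# pip now installs into the venv only
pip install -r requirements.txt
```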
If you cannot install a package you have to find the binary for it. http://www.lfd.uci.edu/~gohlke/pythonlibs/
I have found these binaries to be very useful.
Pip supports something called a wheel for binary installations.
pip install wheel
wheel convert path\to\binary.exe
pip install converted_wheel.whl
You will also have to do this for any required libraries that do not install and are required for that package.