Why is pipenv not using its wheel cache? - python

Whenever I install a new package (without using the --skip-lock option), pipenv downloads and (in the case of non-binary dependencies) compiles all the packages from scratch, even though the wheels are already cached in ~/.cache/pipenv. This makes the whole development process slow, since I have a lot of packages that need to be compiled from source.
Currently I download and compile my packages using pip, use pypi-server to run a local package server, and point my pipenv to it (using [[source]]). But I'm wondering if there is a better way.
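For reference, pointing Pipenv at a local index via [[source]] in the Pipfile looks roughly like this (a sketch; the "local" name and URL are placeholders for your pypi-server instance):
# Pipfile (sketch)
[[source]]
name = "local"
url = "http://localhost:8080/simple"
verify_ssl = false

[[source]]
name = "pypi"
url = "https://pypi.org/simple"
verify_ssl = true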

Related

Slow xmlsec package build with pip install

I'm using xmlsec 1.3.3 in my python web application.
Every time I run a clean pip install, this is the package it hangs on, for about 5 minutes.
The package size is 15KB and pip shows a "Using cached..." message, so I guess the time is taken by building some specific security libraries.
Is there a way to do a clean pip install, but without rebuilding the xmlsec related libraries?
xmlsec is distributed as source code only, but it's written in C, so pip needs to compile it on every fresh installation. There is no way to skip the compilation.
You can pre-compile it yourself if you target one specific platform and always install from your own package instead of from PyPI.
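For example, a sketch of that pre-compilation approach using pip's own tooling (the version and paths are illustrative):
# build the wheel once on the target platform
pip wheel xmlsec==1.3.3 --wheel-dir ./wheels
# reuse the prebuilt wheel on every clean install instead of recompiling
pip install --no-index --find-links ./wheels xmlsec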

Install a new package from requirements.txt without upgrading the dependencies that are already satisfied

I am using requirements.txt to specify the package dependencies used in my Python application. Everything seems to work fine for packages that either have no internal dependencies or whose dependencies are not already installed.
The issue occurs when I try to install a package that has a nested dependency on some other package, and an older version of that package is already installed.
I know I can avoid this when installing a package manually by using pip install -U --no-deps <package_name>. I want to understand how to do this using the requirements.txt, as the deployment and requirements installation is an automated process.
Note:
The already installed package is not something I am directly using in my project but is part of a different project on the same server.
Thanks in advance.
Dependency resolution is a fairly complicated problem. A requirements.txt just specifies your dependencies, with optional version ranges. If you want to "lock" your transitive dependencies (dependencies of dependencies) in place, you would have to produce a requirements.txt that contains exact versions of every package you install, with something like pip freeze. This doesn't solve the problem, but it would at least point out on an install which dependencies conflict, so that you can manually pick the right versions.
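A minimal sketch of that pinning approach, echoing the --no-deps flag from the question (assuming the frozen file is the complete set you want installed):
# capture exact versions of everything currently installed
pip freeze > requirements.txt
# install/upgrade exactly those versions without re-resolving nested dependencies
pip install --upgrade --no-deps -r requirements.txt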
That being said, the new (as of writing) officially supported tool for managing application dependencies is Pipenv. This tool both manages the exact versions of transitive dependencies for you (so you won't have to maintain a requirements.txt manually) and isolates the packages that your code requires from the rest of the system (it does this using the virtualenv tool under the hood). This isolation should fix your problems with breaking a colocated project, since your project can have different versions of libraries than the rest of the system.
(TL;DR Try using Pipenv and see if your problem just disappears)
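For illustration, a minimal Pipenv workflow might look like this (the package name is just an example):
pipenv install requests    # records it in Pipfile and pins transitive deps in Pipfile.lock
pipenv sync                # on another machine, recreates exactly the locked environment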

Build conda package upon installation

So I have published a conda package (link).
This package contains .c extensions (generated from Cython code), which need to be compiled when the package is installed. My problem is that none of the extensions are compiled when running the install command
conda install -c nicolashug scikit-surprise
Compiling the extensions can be done by simply running
python setup.py install
which is exactly what pip does. The package is on PyPI and works fine.
As far as I understand, this setup.py command is only called when I build the conda package using conda build: the meta.yaml file (created with conda skeleton) contains
build:
  script: python setup.py install --single-version-externally-managed --record=record.txt
But I need this to be done when the package is installed, not built.
Reading the conda docs, it looks like the install process is merely a matter of copying files:
Installing the files of a conda package into an environment can be thought of as changing the directory to an environment, and then downloading and extracting the .zip file and its dependencies
That would mean I would have to build the package for all platforms and architectures and then upload them to Anaconda Cloud... which is not feasible for me.
So, is there a way to build the package when it is installed, just like pip does?
As far as I know, there is no way to have the compilation happen on the user's machine when installing a conda package. Indeed, the whole idea of a conda package is that you do the compiling so that I don't have to on my machine, and all that's distributed is the compiled library. On Windows in particular, setting up compilers so they work properly (with Python) is a big big PITA, which is one of the biggest reasons for conda (and also wheels installed by pip).
If you don't have access to a particular OS directly, you can use Continuous Integration (CI) services, such as Appveyor (Windows), Travis CI (Linux/macOS), or CircleCI (Linux/macOS) to build packages and upload them to Anaconda cloud (or to PyPI for that matter). These services integrate directly with GitHub and other code-sharing services, and are generally free for FOSS projects. That way, you can build packages on each commit, on each tag, or some other variation that you desire.
In the end, you may save more time by setting up these services, because you won't have to provide compiler support for users who can't install a source package from PyPI.
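For context, the build-and-upload step such a CI job runs is roughly the following (the recipe path and package file name are placeholders; this requires conda-build and anaconda-client):
# build the package from the recipe directory
conda build recipe/
# upload the resulting package to Anaconda Cloud
anaconda upload path/to/scikit-surprise-*.tar.bz2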

Why is `pip3 install numpy` much faster than setting it in `install_requires`?

The following takes place in a Python 3 virtual environment.
I just authored a little package that requires numpy. So, in setup.py, I wrote install_requires=['numpy']. I ran python3 setup.py install, and it took something like two minutes -- I got the full screen dump of logs, warnings, and configurations that normally comes with a numpy installation.
Then, I created a new virtual environment, and this time simply wrote pip3 install numpy -- which took only a few seconds -- and then ran python3 setup.py install, and I was done almost immediately.
What's the difference between the two, and why was pip3 install numpy so much faster? Should I thus include a requirements.txt just so people can pip-install the requirements rather than using setuptools?
Note that when I wrote pip3 install numpy, I got the following:
Collecting numpy
Using cached numpy-1.12.0-cp36-cp36m-manylinux1_x86_64.whl
Installing collected packages: numpy
Successfully installed numpy-1.12.0
Is it possible that this was so much faster because the numpy wheel was already cached?
pip install uses wheel packages, which were designed partly with the purpose of speeding up the installation process.
The Rationale section of PEP 427, which introduced the wheel format, states:
Python needs a package format that is easier to install than sdist. Python's sdist packages are defined by and require the distutils and setuptools build systems, running arbitrary code to build-and-install, and re-compile, code just so it can be installed into a new virtualenv. This system of conflating build-install is slow, hard to maintain, and hinders innovation in both build systems and installers.
Wheel attempts to remedy these problems by providing a simpler interface between the build system and the installer. The wheel binary package format frees installers from having to know about the build system, saves time by amortizing compile time over many installations, and removes the need to install a build system in the target environment.
Installing from a wheel is faster since it is a Built Distribution format:
Built Distribution
A Distribution format containing files and metadata that only need to be moved to the correct location on the target system, to be installed. Wheel is such a format, whereas distutil's Source Distribution is not, in that it requires a build step before it can be installed. This format does not imply that python files have to be precompiled (Wheel intentionally does not include compiled python files).
Since numpy's source distribution contains a significant amount of C code, compiling it takes noticeable time, which is what you observed when you installed it via bare setuptools. pip avoided compiling the C code because the wheel already contained binary code compiled for your system.
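You can reproduce the difference yourself by telling pip to ignore wheels (a sketch; run each in a fresh environment, and note the second command needs a C compiler toolchain and is the slow path):
pip install numpy                      # installs from the downloaded or cached wheel, fast
pip install --no-binary :all: numpy    # forces a build from the source distribution, slow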

Python packages installation in Windows

I recently began learning Python, and I am a bit confused about how packages are distributed and installed.
I understand that the official way of installing packages is distutils: you download the source tarball, unpack it, and run python setup.py install, and the module will automagically install itself.
I also know about setuptools, which comes with the easy_install helper script. It uses eggs for distribution, and from what I understand it is built on top of distutils and does the same thing as above, plus it takes care of any required dependencies, all fetched from PyPI.
Then there is also pip, and I'm still not sure how it differs from the others.
Finally, as I am on a Windows machine, a lot of packages also offer binary builds through a Windows installer, especially the ones that require compiling C/Fortran code, which would otherwise be a nightmare to compile manually on Windows (assuming you have an MSVC or MinGW/Cygwin dev environment with all the necessary libraries set up... nonetheless, try to build numpy or scipy yourself and you will understand!).
So can someone help me make sense of all this and explain the differences and pros/cons of each method? I'd like to know how each keeps track of packages (Windows Registry, config files, ...). In particular, how would you manage all your third-party libraries (be able to list installed packages, disable/uninstall, etc.)?
I use pip, and not on Windows, so I can't provide comparison with the Windows-installer option, just some information about pip:
Pip is built on top of setuptools, and requires it to be installed.
Pip is a replacement (improvement) for setuptools' easy_install. It does everything easy_install does, plus a lot more (make sure all desired distributions can be downloaded before actually installing any of them to avoid broken installs, list installed distributions and versions, uninstall, search PyPI, install from a requirements file listing multiple distributions and versions...).
Pip currently does not support installing any form of precompiled or binary distributions, so any distributions with extensions requiring compilation can only be installed if you have the appropriate compiler available. Supporting installation from Windows binary installers is on the roadmap, but it's not clear when it will happen.
Until recently, pip's Windows support was flaky and untested. Thanks to a lot of work from Dave Abrahams, pip trunk now passes all its tests on Windows (and there's a continuous integration server helping us ensure it stays that way), but a release has not yet been made including that work. So more reliable Windows support should be coming with the next release.
All the standard Python package installation mechanisms store all metadata about installed distributions in a file or files next to the actual installed package(s). Distutils uses a distribution_name-X.X-pyX.X.egg-info file, pip uses a similarly-named directory with multiple metadata files in it. Easy_install puts all the installed Python code for a distribution inside its own zipfile or directory, and places an EGG-INFO directory inside that directory with metadata in it. If you import a Python package from the interactive prompt, check the value of package.__file__; you should find the metadata for that package's distribution nearby.
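For example, from the interactive prompt (numpy here is just an illustrative package; any installed distribution works, and the path shown is a placeholder):
>>> import numpy
>>> numpy.__file__
'.../site-packages/numpy/__init__.py'
The egg-info metadata described above sits alongside the package in that same site-packages directory.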
Info about installed distributions is only stored in any kind of global registry by OS-specific packaging tools such as Windows installers, Apt, or RPM. The standard Python packaging tools don't modify or pay attention to these listings.
Pip (or, in my opinion, any Python packaging tool) is best used with virtualenv, which allows you to create isolated per-project Python mini-environments into which you can install packages without affecting your overall system. Every new virtualenv automatically comes with pip installed in it.
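As an illustration, a typical virtualenv workflow on Windows looks roughly like this (the environment and package names are placeholders; on Linux/macOS the activate script lives in myenv/bin instead):
virtualenv myenv
myenv\Scripts\activate
pip install somepackage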
A couple other projects you may want to be aware of as well (yes, there's more!):
distribute is a fork of setuptools which has some additional bugfixes and features.
distutils2 is intended to be the "next generation" of Python packaging. It is (hopefully) adopting the best features of distutils/setuptools/distribute/pip. It is being developed independently and is not ready for use yet, but eventually should replace distutils in the Python standard library and become the de facto Python packaging solution.
Hope all that helped clarify something! Good luck.
I use Windows and Python. It is somewhat frustrating, because pip doesn't always work to install things. Python is moving to pip, so I still use it. Pip is nice, because you can uninstall items and use
pip freeze > requirements.txt
pip install -r requirements.txt
Another reason I like pip is for virtual environments, like venv with Python 3.4. I have found venv a lot easier to use on Windows than virtualenv.
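For reference, creating and activating a venv on Windows with Python 3.4+ is roughly (the environment name is a placeholder):
python -m venv myenv
myenv\Scripts\activate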
If you cannot install a package, you have to find the binary for it: http://www.lfd.uci.edu/~gohlke/pythonlibs/
I have found these binaries to be very useful.
Pip is moving toward a format called the wheel for binary installations.
pip install wheel
wheel convert path\to\binary.exe
pip install converted_wheel.whl
You will also have to do this for any required libraries that do not install and are required for that package.
