How do you list all the libraries and their versions in a Python project?
(similar to Java Maven's pom.xml and Node.js npm's package.json)
There is certainly a way to specify a version on the command line,
e.g. pip install pandas==0.23, but that will at best be noted in the README and will most likely be forgotten.
Also, since both pip and conda exist, do they address this? Or does some third-party tool try to unify this, whichever of pip or conda is used?
There are two main mechanisms to specify libraries for a package/repository:
For an out-of-the-box package that can be installed with all dependencies, use setup.py/pyproject.toml. These are used by package managers to automatically install required dependencies on package installation.
For a reproducible installation of a given setup of possibly many packages, use requirements.txt. These are used by package managers to explicitly install required dependencies.
Notably, the two handle different use-cases: Package dependencies define the general dependencies, only restricting dependency versions to specific ranges when needed for compatibility. Requirements define the precise dependencies, restricting all direct and indirect dependency versions to replicate some pre-existing setup.
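As a rough illustration (the package names and version numbers below are only examples), the dependency list in a pyproject.toml stays deliberately loose:

[project]
dependencies = [
    "pandas>=0.23",
    "requests",
]

while a pinned requirements.txt, typically produced from a working environment, fixes every direct and indirect package exactly:

pandas==0.23.4
numpy==1.15.4
requests==2.21.0
certifi==2018.11.29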
Related
Similar to how Node.js automatically adds dependencies to package-lock.json, is there a way I can automatically add requirements to my requirements.txt file for Python?
Since you mentioned Node.js specifically, the Python project that comes closest to what you're looking for is probably Pipenv.
Blurb from the Pipenv documentation:
Pipenv is a dependency manager for Python projects. If you're familiar with Node.js's npm or Ruby’s bundler, it is similar in spirit to those tools. While pip can install Python packages, Pipenv is recommended as it’s a higher-level tool that simplifies dependency management for common use cases.
It's quite a popular package among developers, as its many stars on GitHub attest.
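A minimal sketch of the Pipenv workflow (the package names are just examples): Pipenv records the loose requirement in a Pipfile and the exact resolved versions in Pipfile.lock, much like package.json and package-lock.json in Node.js.

pipenv install pandas        # adds pandas to the Pipfile and locks exact versions in Pipfile.lock
pipenv install pytest --dev  # development-only dependency
pipenv shell                 # spawns a shell inside the project's virtual environment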
Alternatively, you can use a "virtual environment" in which you only install the external dependencies that your project needs. You can either use the venv module from the standard library or the Virtualenv package from PyPI, which offers certain additional features (that you may or may not need). With either of those, you can then use Python's (standard) package manager Pip to update the requirements file:
pip freeze > requirements.txt
This is the "semi-automatic" way, so to speak. Personally, I prefer to do this manually. That's because in a typical development environment ("virtual" or not), you also install packages that are only required for development tasks, such as running tests or building the documentation. They don't need to be installed along with your package on end-user machines, so shouldn't be in requirements.txt. Popular packaging tools such as Flit and Poetry manage these "extra dependencies" separately, as does Pip.
If you are using Linux, you can create an alias like this:
alias req='pip3 freeze > ~/requirements.txt'
Then, when you want to install a new package, use this command (&& runs the freeze only after the install has succeeded; piping with | would start both commands at once, so the freeze could run before the install finishes):
pip3 install <package> && req
I think to-requirements.txt is what you need:
pip install to-requirements.txt
requirements-txt setup
After that, installed packages will be appended to requirements.txt, and uninstalled packages will be removed.
It might require root access if you install it on a system-wide Python interpreter; add sudo if it fails.
I am using requirements.txt to specify the package dependencies used in my Python application. Everything works fine for packages that either have no internal dependencies or whose dependencies are not already installed.
The issue occurs when I try to install a package that has a nested dependency on some other package, and an older version of that other package is already installed.
I know I can avoid this when installing a package manually by using pip install -U --no-deps <package_name>. I want to understand how to do this via requirements.txt, as the deployment and requirements installation is an automated process.
Note:
The already installed package is not something I am directly using in my project; it is part of a different project on the same server.
Thanks in advance.
Dependency resolution is a fairly complicated problem. A requirements.txt just specifies your dependencies with optional version ranges. If you want to "lock" your transitive dependencies (dependencies of dependencies) in place you would have to produce a requirements.txt that contains exact versions of every package you install with something like pip freeze. This doesn't solve the problem but it would at least point out to you on an install which dependencies conflict so that you can manually pick the right versions.
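A sketch of that approach (any version numbers would come from your own environment): freeze the working environment once, then install from the fully pinned file while telling pip not to resolve anything on its own, so nothing outside the pins is touched:

pip freeze > requirements.txt                  # pins every direct and transitive dependency
pip install -U --no-deps -r requirements.txt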
That being said, the new (as of writing) officially supported tool for managing application dependencies is Pipenv. This tool will both manage the exact versions of transitive dependencies for you (so you won't have to maintain a requirements.txt manually) and isolate the packages that your code requires from the rest of the system (it does this using the virtualenv tool under the hood). This isolation should fix your problem of breaking a colocated project, since your project can have different versions of libraries than the rest of the system.
(TL;DR Try using Pipenv and see if your problem just disappears)
I've had a quick look around, but because of terminology like dependencies and packages being used in different ways, it's quite tricky to pin down an answer.
I'm building a mixed-language source tree (Fortran, some C, and Python), and the Fortran calls a Python script which depends on the networkx package from PyPI. Normally, I just have networkx installed anyway, so it isn't a problem for me when rebuilding.
However, for distribution, I want the best way to:
Install pip or equivalent, if it is not installed.
Possibly install virtualenv and create a virtual environment, if appropriate.
Download and install networkx using the --user option with pip.
Is there a standard way? Or should I just use CMake dependencies with custom commands that install pip etc.?
It depends. For a "manual" install, you should definitely detect whether all the tools required to build are installed, and issue an error if they aren't. Then use execute_process() to run pip and whatever else you want.
On the other hand, if you are going to produce a real package for some particular Linux distribution, you just pack your binaries and declare (via the corresponding syntax of the particular package format, like *.rpm or *.deb) that your package depends on some other packages. That way, you can be sure they will be installed with (or even before) your package.
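For instance, the dependency declaration could look like this (the package names are illustrative; check how your target distribution names its Python packages):

In a Debian control file:
Depends: python3, python3-networkx

In an RPM .spec file:
Requires: python3, python3-networkx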
I'd like to distribute a whole virtualenv, or a bunch of Python wheels of exact versions with their runtime dependencies, for example:
pycurl
pycurl.so
libcurl.so
libz.so
libssl.so
libcrypto.so
libgssapi_krb5.so
libkrb5.so
libresolv.so
I suppose I could rely on the system to have libssl.so installed, but surely not libcurl.so of the correct version and probably not Kerberos.
What is the easiest way to package one library in a wheel with all of its run-time dependencies?
Or is that a fool's errand and I should package entire virtualenv?
How to do that reliably?
P.S. Compiling on the fly is not an option; some modules are patched.
AFAIK, there is no good standard way to portably install dependencies with your package. Continuum has made conda for precisely this purpose. The numpy guys wrote their own distutils submodule in their package to install some complicated dependencies, and now at least some of them advocate conda as a solution. Unfortunately, you may have to make conda packages for some of these dependencies yourself.
If you're fine without portability, then targeting the package manager of the target machines will obviously work. Otherwise, for a portable package manager, conda is the only option I know of.
Alternatively, from your post ("compiling on the fly is not an option") it sounds like portability may not be an issue for you, in which case you could also install all the requirements to a prefix directory (most installers I've come across support a configure --prefix=/some/dir/ option). If you have a guaranteed single architecture, you could probably prefix-install all your dependencies to a single directory and pass that around like a file. The conda approach would probably be cleaner, but I've used prefix installs quite a bit and they tend to be one of the easiest solutions to get going.
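A rough sketch of the prefix-install approach (the paths are illustrative, and the exact environment variables depend on how your code loads its dependencies):

./configure --prefix=$HOME/myapp-deps
make && make install
# at run time, point the loader and Python at that prefix
export LD_LIBRARY_PATH=$HOME/myapp-deps/lib:$LD_LIBRARY_PATH
export PYTHONPATH=$HOME/myapp-deps/lib/pythonX.Y/site-packages:$PYTHONPATH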
Edit:
As for conda, it is simultaneously a package manager and a "virtualenv"-like environment/Python install. While virtualenv is added on top of an existing Python install, conda takes over the whole install, so you can be more sure that all the dependencies are accounted for. Compared to pip, it is designed for adding generalized non-Python dependencies, instead of just compiling C/C++ extensions. For more info, I would see:
pip vs conda (also recommends buildout as a possibility)
conda as a python install
As for how to use conda for your purpose, the docs explain how to create a recipe:
Conda build framework
Building a package requires a recipe. A recipe is a flat directory which
contains the following files:
meta.yaml (metadata file)
build.sh (Unix build script which is executed using bash)
bld.bat (Windows build script which is executed using cmd)
run_test.py (optional Python test file)
patches to the source (optional, see below)
other resources, which are not included in the source and cannot be
generated by the build scripts.
The same recipe should be used to build a package on all platforms.
When building a package, the following steps are invoked:
read the metadata
download the source (into a cache)
extract the source in a source directory
apply the patches
create a build environment (build dependencies are installed here)
run the actual build script. The current working directory is the source
directory with environment variables set. The build script installs into
the build environment
do some necessary post processing steps: shebang, rpath, etc.
add conda metadata to the build environment
package up the new files in the build environment into a conda package
test the new conda package:
create a test environment with the package (and its dependencies)
run the test scripts
There are example recipes for many conda packages in the conda-recipes repo (https://github.com/continuumio/conda-recipes).
The conda skeleton command can help to make skeleton recipes for common repositories, such as PyPI (https://pypi.python.org/pypi).
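As a very rough sketch (the name, version, and source location are purely illustrative), a minimal meta.yaml could look like:

package:
  name: mylib
  version: "1.0.0"

source:
  path: ./mylib

requirements:
  build:
    - python
    - setuptools
  run:
    - python
    - pycurl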
Then, as a client, you would install the package much as you would install something with pip.
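For example (the recipe and package names are again illustrative):

conda build ./mylib-recipe
conda install --use-local mylib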
Lastly, Docker may also be interesting to you, though I haven't seen it used much for Python.
You may want to look into PEX: https://pex.readthedocs.io/en/stable/whatispex.html
'Files with the .pex extension – “PEX files” or ”.pex files” – are self-contained executable Python virtual environments. PEX files make it easy to deploy Python applications: the deployment process becomes simply scp.'
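For example (the requirement and output name are illustrative; note that PEX bundles Python distributions, not arbitrary system shared libraries):

pex pycurl -o myapp.pex
./myapp.pex        # starts a Python interpreter with pycurl importable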
I recently began learning Python, and I am a bit confused about how packages are distributed and installed.
I understand that the official way of installing packages is distutils: you download the source tarball, unpack it, and run python setup.py install; the module then automagically installs itself.
I also know about setuptools, which comes with the easy_install helper script. It uses eggs for distribution and, from what I understand, is built on top of distutils, doing the same thing as above plus taking care of any required dependencies, all fetched from PyPI.
Then there is also pip, and I'm still not sure how it differs from the others.
Finally, as I am on a Windows machine, a lot of packages also offer binary builds through a Windows installer, especially the ones that require compiling C/Fortran code, which would otherwise be a nightmare to compile manually on Windows (assuming you have an MSVC or MinGW/Cygwin dev environment with all the necessary libraries set up... nonetheless, try to build numpy or scipy yourself and you will understand!).
So can someone help me make sense of all this and explain the differences and pros/cons of each method? I'd like to know how each keeps track of packages (Windows Registry, config files, ...). In particular, how would you manage all your third-party libraries (list installed packages, disable/uninstall them, etc.)?
I use pip, and not on Windows, so I can't provide comparison with the Windows-installer option, just some information about pip:
Pip is built on top of setuptools, and requires it to be installed.
Pip is a replacement (improvement) for setuptools' easy_install. It does everything easy_install does, plus a lot more (make sure all desired distributions can be downloaded before actually installing any of them to avoid broken installs, list installed distributions and versions, uninstall, search PyPI, install from a requirements file listing multiple distributions and versions...).
Pip currently does not support installing any form of precompiled or binary distributions, so any distributions with extensions requiring compilation can only be installed if you have the appropriate compiler available. Supporting installation from Windows binary installers is on the roadmap, but it's not clear when it will happen.
Until recently, pip's Windows support was flaky and untested. Thanks to a lot of work from Dave Abrahams, pip trunk now passes all its tests on Windows (and there's a continuous integration server helping us ensure it stays that way), but a release has not yet been made including that work. So more reliable Windows support should be coming with the next release.
All the standard Python package installation mechanisms store all metadata about installed distributions in a file or files next to the actual installed package(s). Distutils uses a distribution_name-X.X-pyX.X.egg-info file, pip uses a similarly-named directory with multiple metadata files in it. Easy_install puts all the installed Python code for a distribution inside its own zipfile or directory, and places an EGG-INFO directory inside that directory with metadata in it. If you import a Python package from the interactive prompt, check the value of package.__file__; you should find the metadata for that package's distribution nearby.
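For example (the package name is only an illustration):

python -c "import numpy; print(numpy.__file__)"
# list the enclosing site-packages directory; an .egg-info (or, with newer
# tools, .dist-info) entry for that distribution should sit right next to it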
Info about installed distributions is only stored in any kind of global registry by OS-specific packaging tools such as Windows installers, Apt, or RPM. The standard Python packaging tools don't modify or pay attention to these listings.
Pip (or, in my opinion, any Python packaging tool) is best used with virtualenv, which allows you to create isolated per-project Python mini-environments into which you can install packages without affecting your overall system. Every new virtualenv automatically comes with pip installed in it.
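A minimal sketch of that workflow (the environment and package names are arbitrary):

virtualenv env
source env/bin/activate        # on Windows: env\Scripts\activate
pip install numpy              # installed only into this environment
pip freeze                     # lists what this environment contains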
A couple other projects you may want to be aware of as well (yes, there's more!):
distribute is a fork of setuptools which has some additional bugfixes and features.
distutils2 is intended to be the "next generation" of Python packaging. It is (hopefully) adopting the best features of distutils/setuptools/distribute/pip. It is being developed independently and is not ready for use yet, but eventually should replace distutils in the Python standard library and become the de facto Python packaging solution.
Hope all that helped clarify something! Good luck.
I use Windows and Python. It is somewhat frustrating, because pip doesn't always manage to install things. Python is moving to pip, so I still use it. Pip is nice because you can uninstall items and use
pip freeze > requirements.txt
pip install -r requirements.txt
Another reason I like pip is for virtual environments like venv with Python 3.4. I have found venv a lot easier to use on Windows than virtualenv.
If you cannot install a package, you have to find a binary for it, e.g. at http://www.lfd.uci.edu/~gohlke/pythonlibs/
I have found these binaries to be very useful.
Pip is trying to make something called a wheel for binary installations.
pip install wheel
wheel convert path\to\binary.exe
pip install converted_wheel.whl
You will also have to do this for any required libraries of that package that do not install on their own.