Why are Python packages installed from a different index-url with pip?

I noticed on this page https://anaconda.org/pypi/urllib3 that the pip command to install the package was slightly different than normal:
pip install -i https://pypi.anaconda.org/pypi/simple urllib3
Digging through pip's help, I found the following, which says that packages are normally installed from https://pypi.python.org/simple.
Why is there a separate Python repository that Anaconda uses? I would have expected that you could simply pip install anything, but this seems to suggest there is a choice between the following two indexes:
https://pypi.python.org/simple
https://pypi.anaconda.org/pypi/simple
Package Index Options (including deprecated options):
-i, --index-url Base URL of Python Package Index (default
https://pypi.python.org/simple). This should point to a
repository compliant with PEP 503 (the simple
repository API) or a local directory laid out in the
same format.

Why is there a separate Python repository that Anaconda uses?
Because Continuum Analytics (the maintainers of conda and Anaconda) decided that they wanted to have their own pip repository, I suppose. As far as I know, there's no difference between the two, except that some package versions may differ between the two repositories, or one repository may have packages that aren't present in the other.
In any case, in my experience, the pip that's installed by default with Anaconda searches the https://pypi.python.org/simple repository by default, and one has to manually include the -i option to get to the Anaconda pip repository.
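If you want a different index to be the default so you don't have to pass -i every time, pip can be configured persistently. A minimal sketch, assuming pip 10+ (which provides the pip config subcommand):
pip config set global.index-url https://pypi.anaconda.org/pypi/simple
This writes the equivalent of the following into pip's config file (pip.conf, or pip.ini on Windows):
[global]
index-url = https://pypi.anaconda.org/pypi/simple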


Why use both conda and pip? [duplicate]

In this article, the author suggests the following:
To install fuzzy matcher, I found it easier to conda install the
dependencies (pandas, metaphone, fuzzywuzzy) then use pip to install
fuzzymatcher. Given the computational burden of these algorithms you
will want to use the compiled c components as much as possible and
conda made that easiest for me.
Can someone explain why he suggests using Conda to install the dependencies and then pip to install the actual package, i.e. fuzzymatcher? Why can't we just use Conda for both? Also, how do we know whether we are using the compiled C packages as he suggested?
Other answers have addressed how Conda does a better job managing non-Python dependencies. As for why use Pip at all, in this case it's not complicated: the fuzzymatcher package was not available on Conda when the article was written (18 Feb 2020). The first and only version of the package was uploaded on Conda Forge on 1 Dec 2020.
Unless one wants an older version (< 0.0.5), one can now just use Conda. Going forward, Conda Forge's Autotick Bot will automatically submit pull requests and build any new versions of the package whenever they get pushed to PyPI.
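Since the package is now on Conda Forge, installing it from that channel should work directly (a sketch, assuming the conda-forge package keeps the PyPI name fuzzymatcher):
conda install -c conda-forge fuzzymatcher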
conda is the package manager (installer and uninstaller) for Anaconda or Miniconda.
pip is the package manager for Python.
Depending on your system environment and additional settings, pip and conda may install into the same Python installation folder ($PYTHONPATH/Lib/site-packages or %PYTHONPATH%\Lib\site-packages). Hence conda and pip usually work well together.
However, conda and pip get their Python packages from different channels or websites.
conda searches and downloads from the official channel: https://repo.anaconda.com/pkgs/
These packages are officially supported by Anaconda and maintained in that channel.
However, we may not find every Python package there, or we may need newer versions than those in the official channel. That is why we sometimes install Python packages from "conda-forge" or "bioconda". These are unofficial channels maintained by developers and other community members.
We can specify another channel like this:
conda install <package1> --channel conda-forge
conda install <package2> --channel bioconda
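If you rely on one of these channels regularly, you can add it to your conda configuration so it is searched automatically (a sketch; this writes to ~/.condarc):
conda config --add channels conda-forge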
pip searches and downloads from PyPI.
We should be able to download every publicly available Python package there.
These packages are generated and uploaded by developers and friendly users.
The dependency settings in each package may not be fully tested or verified.
These packages may not support older or newer versions of Python.
Hence, if you are using Anaconda or Miniconda, you should use conda. If you cannot find specific packages in the official channel, you may try conda-forge or bioconda. Failing that, get them from PyPI.
However, if you do not use Anaconda, then stick with pip.
Advanced users may download the latest libraries from their sources (such as GitHub, GitLab, etc.). However, there is a catch:
Some Python packages are written in pure Python. In this case, you should have no issue installing them on your system.
Some Python packages are written in C, C++, Go, etc. In this case, you would need:
A compiler supported by your system and matching your Python environment (32- or 64-bit, version).
Python header files, linkable Python libraries, and archives specific to your installed Python version. Anaconda includes these in its installation.
How do we know if a Python package needs a particular compiler?
It may not be easy to find out. However, you can check in the following ways (roughly in this order):
Look at the landing page (or README.md or README.txt files) in the source repository.
For example, if you go to pandas's source repository, it shows that it needs Cython, hence the installation needs a C compiler.
Look at the setup.py in the source repository.
For example, if you go to numpy's setup.py, it needs a C compiler.
Look at how much of the source code is written in languages that need compilation (such as C, C++, Go, etc.). For example, the numpy repository is about 35.7% C and 1.0% C++. However, this is only a guide, as some of that code may be testing routines only.
Ask on Stack Overflow.
For the compiled C packages, you could import a package, see where it's located, and check the package itself to see what it imports. At some point, you would run into an import of a compiled module (.so extension on *nix). There's possibly an easier way, but that may depend on at what point in the import sequence of the package the compiled module is loaded.
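A minimal sketch of that inspection in Python, using pandas as an example (assuming it is installed; on Windows the extension modules use .pyd instead of .so):

import pathlib
import pandas  # any installed package will do

pkg_dir = pathlib.Path(pandas.__file__).parent
print("installed at:", pkg_dir)

# List the compiled extension modules shipped with the package;
# their presence means the package uses compiled (C/C++/Cython) code.
for ext in sorted(pkg_dir.rglob("*.so")):
    print(ext.relative_to(pkg_dir))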
Fuzzymatcher may not be available through Conda, or only as an outdated version, or only as a version that matches an outdated set of dependencies. Then you may end up with an out-of-date set of packages. Pip may have a more recent version of fuzzymatcher, and likely cares less (for better or worse) about the versions of various other packages in your environment. I'm not familiar with fuzzymatcher, so I can't give you an exact reason: you'd have to ask the author.
Note that the point of that paragraph, on installing the necessary packages with Conda, is that some packages require (C) libraries (not necessarily compiled packages, though those depend on such libraries) that may not be installed by default on your system. Conda will install these for you; Pip will not.

How to install python packages in a virtual environment without downloading them again?

It's a great hassle when installing some packages in a VE that conda or pip downloads them again even when I already have them in my base environment. Since I have limited internet bandwidth and I'm assuming I'll work with many different VEs, it will take a lot of time to download basic packages such as OpenCV/TensorFlow.
By default, pip caches anything it downloads, and will use the cached version whenever possible. This cache is shared between your base environment and all virtual environments. So unless you pass the --no-cache-dir option, pip downloading a package means it has not previously downloaded a compatible version of that package. If you already have that package installed in your base environment or another virtual environment and it downloads it anyway, this probably means one or more of the following is true:
You installed your existing version with a method other than pip.
There is a newer version available, and you didn't specify, for example, pip install pandas==1.1.5 (if that's the version you already have elsewhere). Pip will install the newest compatible version for your environment, unless you tell it otherwise.
The VE you're installing to is a different Python version (e.g. created with Pyenv), and needs a different build.
I'm less familiar with the specifics of conda, and I can't seem to find anything in its online docs that focuses on the default caching behavior. However, a how-to for modifying the cache location seems to assume that the default behavior is similar to how pip works. Perhaps someone else with more Anaconda experience can chime in as well.
So except for the caveats above, as long as you're installing a package with the same method you did last time, you shouldn't have to download anything.
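If you want to see where the cache lives and what is in it, recent pip versions (20.1+) provide a cache subcommand. A quick sketch:
pip cache dir
pip cache list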
If you want to simplify the process of installing all the same packages (that were installed via pip) in a new VE that you already have in another environment, pip can automate that too. Run pip freeze > requirements.txt in the first environment, and copy the resulting file to your newly created VE. There, run pip install -r requirements.txt and pip will install all the packages that were installed (via pip) in the first environment. (Note that pip freeze records version numbers as well, so this won't install newer versions that may be available -- whether this is a good or bad thing depends on your needs.)
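A sketch of that round trip, assuming both environments use pip:
pip freeze > requirements.txt
pip install -r requirements.txt
(Run the first command in the existing environment and the second in the newly created VE.)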

"command-not-found==0.2.44" in pip's freeze

The output of pip freeze on my machine has included the following odd line:
command-not-found==0.2.44
When trying to install requirements on a different machine, I got the obvious No distributions at all found for command-not-found==0.2.44. Is this a pip bug? Or is there any real python package of that name, one which does not exist in pypi?
Indeed, as mentioned in the follow-up comments, Ubuntu has a Python package, installed via dpkg/apt, that is called "python-commandnotfound":
$ apt-cache search command-not-found
command-not-found - Suggest installation of packages in interactive bash sessions
command-not-found-data - Set of data files for command-not-found.
python-commandnotfound - Python 2 bindings for command-not-found.
python3-commandnotfound - Python 3 bindings for command-not-found.
As this is provided via apt, and not available in the pypi repo, you won't be able to install it via pip, but pip will see that it is installed. For the purposes of showing installed packages, pip doesn't care if a package is installed via apt, easy_install, pip, manually, etc.
In short, if you actually need it on another host (which I assume you don't) you'll need to apt-get install python-commandnotfound.
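If the goal is simply a requirements file that installs cleanly on another host, one workaround is to filter such system-provided packages out when freezing. A sketch (adjust the pattern to the offending package):
pip freeze | grep -v command-not-found > requirements.txt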

Why use pip over easy_install?

A tweet reads:
Don't use easy_install, unless you
like stabbing yourself in the face.
Use pip.
Why use pip over easy_install? Doesn't the fault lie with PyPI and package authors mostly? If an author uploads a crap source tarball (e.g. missing files, no setup.py) to PyPI, then both pip and easy_install will fail. Other than cosmetic differences, why do Python people (like in the above tweet) seem to strongly favor pip over easy_install?
(Let's assume that we're talking about easy_install from the Distribute package, that is maintained by the community)
From Ian Bicking's own introduction to pip:
pip was originally written to improve on easy_install in the following ways
All packages are downloaded before installation. Partially-completed installation doesn’t occur as a result.
Care is taken to present useful output on the console.
The reasons for actions are kept track of. For instance, if a package is being installed, pip keeps track of why that package was required.
Error messages should be useful.
The code is relatively concise and cohesive, making it easier to use programmatically.
Packages don’t have to be installed as egg archives, they can be installed flat (while keeping the egg metadata).
Native support for other version control systems (Git, Mercurial and Bazaar)
Uninstallation of packages.
Simple to define fixed sets of requirements and reliably reproduce a set of packages.
Many of the answers here are out of date for 2015 (although the initially accepted one from Daniel Roseman is not). Here's the current state of things:
Binary packages are now distributed as wheels (.whl files)—not just on PyPI, but in third-party repositories like Christoph Gohlke's Extension Packages for Windows. pip can handle wheels; easy_install cannot.
Virtual environments (which come built-in with 3.4, or can be added to 2.6+/3.1+ with virtualenv) have become a very important and prominent tool (and recommended in the official docs); they include pip out of the box, but don't even work properly with easy_install.
The distribute package that included easy_install is no longer maintained. Its improvements over setuptools got merged back into setuptools. Trying to install distribute will just install setuptools instead.
easy_install itself is only quasi-maintained.
All of the cases where pip used to be inferior to easy_install—installing from an unpacked source tree, from a DVCS repo, etc.—are long-gone; you can pip install ., pip install git+https://.
pip comes with the official Python 2.7 and 3.4+ packages from python.org, and a pip bootstrap is included by default if you build from source.
The various incomplete bits of documentation on installing, using, and building packages have been replaced by the Python Packaging User Guide. Python's own documentation on Installing Python Modules now defers to this user guide, and explicitly calls out pip as "the preferred installer program".
Other new features have been added to pip over the years that will never be in easy_install. For example, pip makes it easy to clone your site-packages by building a requirements file and then installing it with a single command on each side. Or to convert your requirements file to a local repo to use for in-house development. And so on.
The only good reason that I know of to use easy_install in 2015 is the special case of using Apple's pre-installed Python versions with OS X 10.5-10.8. Since 10.5, Apple has included easy_install, but as of 10.10 they still don't include pip. With 10.9+, you should still just use get-pip.py, but for 10.5-10.8, this has some problems, so it's easier to sudo easy_install pip. (In general, easy_install pip is a bad idea; it's only for OS X 10.5-10.8 that you want to do this.) Also, 10.5-10.8 include readline in a way that easy_install knows how to kludge around but pip doesn't, so you also want to sudo easy_install readline if you want to upgrade that.
Another—as of yet unmentioned—reason for favoring pip is because it is the new hotness and will continue to be used in the future.
The infographic from the Current State of Packaging section in The Hitchhiker's Guide to Packaging v1.0 shows that setuptools/easy_install will go away in the future.
Another infographic, from distribute's documentation, shows that setuptools and easy_install will be replaced by the new hotness: distribute and pip. While pip is still the new hotness, distribute merged with setuptools in 2013 with the release of setuptools v0.7.
Two reasons, though there may be more:
pip provides an uninstall command
if an installation fails in the middle, pip will leave you in a clean state.
REQUIREMENTS files.
Seriously, I use this in conjunction with virtualenv every day.
QUICK DEPENDENCY MANAGEMENT TUTORIAL, FOLKS
Requirements files allow you to create a snapshot of all packages that have been installed through pip. By encapsulating those packages in a virtual environment, you can have your codebase work off a very specific set of packages and share that codebase with others.
From Heroku's documentation https://devcenter.heroku.com/articles/python
You create a virtual environment, and set your shell to use it. (bash/*nix instructions)
virtualenv env
source env/bin/activate
Now all Python scripts run from this shell will use this environment's packages and configuration. Now you can install a package locally to this environment without needing to install it globally on your machine.
pip install flask
Now you can dump the info about which packages are installed with
pip freeze > requirements.txt
If you checked that file into version control, when someone else gets your code, they can setup their own virtual environment and install all the dependencies with:
pip install -r requirements.txt
Any time you can automate tedium like this is awesome.
pip won't install binary packages and isn't well tested on Windows.
As Windows doesn't come with a compiler by default pip often can't be used there. easy_install can install binary packages for Windows.
UPDATE: setuptools has absorbed distribute as opposed to the other way around, as some thought. setuptools is up-to-date with the latest distutils changes and the wheel format. Hence, easy_install and pip are more or less on equal footing now.
Source: http://pythonhosted.org/setuptools/merge-faq.html#why-setuptools-and-not-distribute-or-another-name
As an addition to fuzzyman's reply:
pip won't install binary packages and isn't well tested on Windows.
As Windows doesn't come with a compiler by default pip often can't be
used there. easy_install can install binary packages for Windows.
Here is a trick on Windows:
you can use easy_install <package> to install binary packages to avoid building a binary
you can use pip uninstall <package> even if you used easy_install.
This is just a work-around that works for me on Windows.
Actually I always use pip if no binaries are involved.
See the current pip documentation: http://www.pip-installer.org/en/latest/other-tools.html#pip-compared-to-easy-install
I will ask on the mailing list what is planned for that.
Here is the latest update:
The new supported way to install binaries is going to be wheel!
It is not yet in the standard, but almost. The current version is still an alpha: 1.0.0a1.
https://pypi.python.org/pypi/wheel
http://wheel.readthedocs.org/en/latest/
I will test wheel by creating an OS X installer for PySide using wheel instead of eggs. Will get back and report about this.
cheers - Chris
A quick update:
The transition to wheel is almost over. Most packages support wheel.
I promised to build wheels for PySide, and I did that last summer. Works great!
HINT:
A few developers have so far failed to support the wheel format, simply because they forgot to replace distutils with setuptools.
Often, it is easy to convert such packages by replacing this single word in setup.py.
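A minimal sketch of that one-word change in a hypothetical setup.py (building the wheel additionally requires the wheel package to be installed):

# before: distutils alone does not provide the bdist_wheel command
# from distutils.core import setup

# after: importing setup from setuptools enables wheel support
from setuptools import setup

setup(name="example-package", version="0.1")

After the change, python setup.py bdist_wheel builds a .whl file.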
I just met one special case in which I had to use easy_install instead of pip, or I would have had to pull the source code directly.
For the package GitPython, the version on pip is too old (0.1.7), while the one from easy_install is the latest (0.3.2.rc1).
I'm using Python 2.7.8. I'm not sure about the underlying mechanisms of easy_install and pip, but at least the versions of some packages may differ between the two, and sometimes easy_install is the one with the newer version.
easy_install GitPython

Python package install using pip or easy_install from repos

The simplest way to deal with python package installations, so far, to me, has been to check out the source from the source control system and then add a symbolic link in the python dist-packages folder.
Clearly since source control provides the complete control to downgrade, upgrade to any branch, tag, it works very well.
Is there a way, using one of the package installers (easy_install, pip, or another), to achieve the same?
easy_install obtains the tar.gz and installs it using setup.py install, which installs into the dist-packages folder in Python 2.6. Is there a way to configure it, or pip, to use a source version control system (SVN/Git/Hg/Bzr) instead?
Using pip this is quite easy. For instance:
pip install -e hg+http://bitbucket.org/andrewgodwin/south/#egg=South
Pip will automatically clone the source repo and run "setup.py develop" for you to install it into your environment (which hopefully is a virtualenv). Git, Subversion, Bazaar and Mercurial are all supported.
You can also then run "pip freeze" and it will output a list of your currently-installed packages with their exact versions (including, for develop-installs, the exact revision from the VCS). You can put this straight into a requirements file and later run
pip install -r requirements.txt
to install that same set of packages at the exact same versions.
If you download or check out the source distribution of a package — the one that has its "setup.py" inside of it — then if the package is based on the "setuptools" (which also power easy_install), you can move into that directory and say:
$ python setup.py develop
and it will create the right symlinks in dist-packages so that the .py files in the source distribution are the ones that get imported, rather than copies installed separately (which is what "setup.py install" would do — create separate copies that don't change immediately when you edit the source code to try a change).
As the other response indicates, you should try reading the "setuptools" documentation to learn more. "setup.py develop" is a really useful feature! Try using it in combination with a virtualenv, and you can "setup.py develop" painlessly and without messing up your system-wide Python with packages you are only developing on temporarily:
http://pypi.python.org/pypi/virtualenv
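A sketch of that combination, assuming a source checkout in a hypothetical ./myproject directory containing a setup.py:
virtualenv env
source env/bin/activate
cd myproject
python setup.py develop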
easy_install has support for downloading specific versions. For example:
easy_install python-dateutil==1.4.0
This will install v1.4.0, while the latest version (1.4.1) would be picked if no version was specified.
There is also support for SVN checkouts, but using that doesn't give you much benefit over your manual approach. See the manual for more information.
Being able to switch to specific branches is rarely useful unless you are developing the packages in question, and then it's typically not a good idea to install them in site-packages anyway.
easy_install accepts a URL for the source tree too. Works at least when the sources are in Subversion.
