Creating Anaconda environment satisfying prerequisites below - python

I have installed conda 4.5.12 and managed to create an environment from a .yml file without problems.
Now I need to set up an environment to run a simulation of this project here.
Its prerequisites list includes these components:
Python 2: HDF5, OpenCV 2 interfaces for python.
C++: HDF5, OpenCV 2, Boost
Lua JIT and Torch 7.
Torch 7 packages: class, GPU support cunn and cutorch, Matlab support mattorch, JSON support lunajson, Torch image library image
Please note that mattorch is an outdated package which is no longer maintained.
So my question is mainly whether I can write a YAML file that covers this list and create a virtual environment from it, so that I can start development while I keep researching.
You can find the GitHub branch [HERE]

If all these packages are available on conda or PyPI, then you can indeed just write a yml file and install from that.
From personal experience, it's often a good idea to add packages gradually and test installing the conda environment often. This way, you can more easily identify when a dependency conflict arises and you need to pin some versions manually.
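The gradual approach above can be sketched as a small helper that renders an environment file from a growing list of packages. This is only a sketch: the function name, the environment name and the package names (python=2.7, hdf5, h5py) are assumptions for illustration and should be checked against the channels you actually use. The Lua and Torch 7 parts of the list may well need to be installed outside conda.

```python
def render_environment_yml(name, channels, dependencies, pip_dependencies=()):
    """Return the text of a minimal conda environment file."""
    lines = [f"name: {name}", "channels:"]
    lines += [f"  - {channel}" for channel in channels]
    lines.append("dependencies:")
    lines += [f"  - {dep}" for dep in dependencies]
    if pip_dependencies:
        # pip-only packages go in a nested "pip:" list, and pip itself
        # has to be listed as a conda dependency.
        if "pip" not in dependencies:
            lines.append("  - pip")
        lines.append("  - pip:")
        lines += [f"    - {dep}" for dep in pip_dependencies]
    return "\n".join(lines) + "\n"


# Hypothetical first iteration: start small, then add one package at a
# time and re-create the environment after each addition.
step_one = render_environment_yml(
    name="sim-env",
    channels=["conda-forge"],
    dependencies=["python=2.7", "hdf5", "h5py"],
)
print(step_one)
```

Writing the result to environment.yml and running conda env create -f environment.yml after each change makes it easier to see which addition introduces a conflict.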

Related

How to create conda environment with locally compiled python?

How should one create a new environment with a locally compiled/built python? All the guides on the internet are about installing a python with a specific version, or installing local packages.
Any idea how to install a custom built python? Should I build a conda package and install it locally? (-c)
Thanks!
It would have to be built as a Conda package, otherwise other Conda packages will not respect it as satisfying the python dependency requirement. The Conda Forge recipe for Python is licensed under BSD-3, so that may be a good starting point. It is a bit complicated, but I am not sure there is a way to make the compilation trivial.

Why use both conda and pip? [duplicate]

In this article, the author suggests the following
To install fuzzy matcher, I found it easier to conda install the
dependencies (pandas, metaphone, fuzzywuzzy) then use pip to install
fuzzymatcher. Given the computational burden of these algorithms you
will want to use the compiled c components as much as possible and
conda made that easiest for me.
Can someone explain why he suggests using Conda to install the dependencies and then pip to install the actual package, i.e. fuzzymatcher? Why can't we just use Conda for both? Also, how do we know if we are using the compiled C packages as he suggested?
Other answers have addressed how Conda does a better job managing non-Python dependencies. As for why use Pip at all, in this case it's not complicated: the fuzzymatcher package was not available on Conda when the article was written (18 Feb 2020). The first and only version of the package was uploaded to Conda Forge on 1 Dec 2020.
Unless one wants a version older than 0.0.5, one can now just use Conda. Going forward, Conda Forge's Autotick Bot will automatically submit pull requests and build any new versions of the package whenever they get pushed to PyPI.
conda is the package manager (installer and uninstaller) for Anaconda or Miniconda.
pip is the package manager for Python.
Depending on your system environment and additional settings, pip and conda may install into the same site-packages folder of the active Python installation (for example <prefix>/lib/pythonX.Y/site-packages on Linux and macOS, or <prefix>\Lib\site-packages on Windows). Hence conda and pip usually work well together.
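To see which folder the current interpreter installs packages into, a minimal sketch using only the standard library is shown here. The function name is made up for illustration.

```python
import sys
import sysconfig


def install_locations():
    """Report where the running interpreter keeps third-party packages."""
    paths = sysconfig.get_paths()
    return {
        "prefix": sys.prefix,           # root of the active environment
        "purelib": paths["purelib"],    # site-packages for pure-Python code
        "platlib": paths["platlib"],    # site-packages for compiled code
    }


for key, value in install_locations().items():
    print(f"{key}: {value}")
```

Running this with the python of an activated conda environment and with the python used by pip in the same shell should print the same paths if both tools target the same installation.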
However, conda and pip get their Python packages from different channels or websites.
conda searches and downloads from the official channel: https://repo.anaconda.com/pkgs/
These packages are officially supported by Anaconda and hence maintained in that channel.
However, we may not find every Python package there, or versions newer than those in the official channel. That is why we sometimes install Python packages from "conda-forge" or "bioconda". These are unofficial channels maintained by developers and other friendly users.
We can specify other channels like this:
conda install <package1> --channel conda-forge
conda install <package2> --channel bioconda
pip searches and downloads from PyPI.
We should be able to download every publicly available Python package there.
These packages are generated and uploaded by developers and friendly users.
The dependency settings in each package may not be fully tested or verified.
These packages may not support older or newer versions of Python.
Hence, if you are using Anaconda or Miniconda, you should use conda. If you cannot find specific packages in the official channels, you may try conda-forge or bioconda. As a last resort, get them from PyPI.
However, if you do not use Anaconda, then stick with pip.
Advanced users may download the latest libraries from their source (such as GitHub, GitLab, etc.). However, there is a catch:
Some Python packages are written in pure Python. In this case, you should have no issue installing these packages on your system.
Some Python packages are written in C, C++, Go, etc. In this case, you would need:
A supported compiler for your system as well as your Python environment (32- or 64-bit, versions).
Python header files, linkable Python libraries and archives specific for your installed Python version. Anaconda includes these in its installation.
How do we know if a Python package needs a particular compiler?
It may not be easy to find out. However, you could try the following (roughly in this order):
Look at the landing page (or the README.md or README.txt file) in the source repository.
For example, if you go to pandas' source repository, it shows that it needs Cython, hence the installation would need a C compiler.
Look at the setup.py in the source repository.
For example, if you go to numpy's setup.py, it needs a C compiler.
Look at the amount of source code written in programming languages that need compilation (such as C, C++, Go, etc.). For example, the numpy library is 35.7% C, 1.0% C++, etc. However, this is only a guide, as that source code may only be test routines.
Ask on Stack Overflow.
For the compiled C packages, you could import a package, see where it's located, and check the package itself to see what it imports. At some point, you would read into an import of a compiled module (.so extension on *nix). There's possibly an easier way, but that may depend on at what point in the import sequence of the package the compiled module is loaded.
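One way to do that check without reading the package by hand is to ask the import system where a module comes from and to scan a package folder for compiled extension files. This is a rough sketch with made-up function names, not a definitive test: a package can also load compiled code from places this scan does not look.

```python
import importlib.machinery
import importlib.util
import os

# File suffixes the running interpreter accepts for compiled modules
# (for example ".so" on Linux, ".pyd" on Windows).
EXT_SUFFIXES = tuple(importlib.machinery.EXTENSION_SUFFIXES)


def classify_module(name):
    """Classify a top-level module by where the import system finds it."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return "not found"
    origin = spec.origin or ""
    if origin == "built-in":
        return "built-in"
    if origin.endswith(EXT_SUFFIXES):
        return "extension"
    if origin.endswith(".py"):
        return "source"
    return "other"


def compiled_files(package_name):
    """List compiled extension files found inside a package's folders."""
    spec = importlib.util.find_spec(package_name)
    if spec is None or not spec.submodule_search_locations:
        return []
    found = []
    for location in spec.submodule_search_locations:
        for root, _dirs, files in os.walk(location):
            found += [os.path.join(root, f) for f in files if f.endswith(EXT_SUFFIXES)]
    return sorted(found)


print(classify_module("json"), len(compiled_files("json")))
```

For an installed package such as numpy, compiled_files("numpy") would list the extension modules it ships, if any are present in your installation.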
Fuzzymatcher may not be available through Conda, or only in an outdated version, or only in a version that matches an outdated set of dependencies. Then you may end up with an out-of-date set of packages. Pip may have a more recent version of fuzzymatcher, and likely cares less (for better or worse) about the versions of various other packages in your environment. I'm not familiar with fuzzymatcher, so I can't give you an exact reason: you'd have to ask the author.
Note that the point of that paragraph, on installing the necessary packages with Conda, is that some packages require (C) libraries (not necessarily compiled packages, though these will depend on these libraries) that may not be installed by default on your system. Conda will install these for you; Pip will not.

Why use Anaconda environments to install tensorflow on Windows?

In tensorflow installation guide it is said, that I should use "environment" to install tensorflow: https://www.tensorflow.org/install/install_windows#installing_with_anaconda
Why? Can't I just install with pip?
If installed with environment, should I "activate" it each time I use tensorflow?
If I use tensorflow from within other thing like keras and/or PyCharm, then how can I activate environment?
The question is about Windows. I assume you installed python using anaconda. Then you have a default environment, called root. You can create as many environments as you want, think of each as a separate installation of python. Using conda or pip installs stuff at your current installation. Conda stuff is kind of pre-compiled to work with your machine/anaconda environment, while pip stuff is usually compiled on the spot. I assume compiling tensorflow might not be completely trivial...
'Activate' changes from one environment to the other, so unless you have multiple environments you shouldn't need it. You run all these on command prompt.
Bottom line is, unless you have multiple environments (I highly recommend it so you can try different things) I cannot see you using activate. Install tensorflow and keras in the same one and only root environment you have. You should be able to access both (it is also possible that just installing keras would install tensorflow, if it's a dependency).
If you see no environment name in the prompt, it is the default, root environment. You can see all your environments with: conda info --envs. But unless you create some environment (using e.g. conda create --name py python=2) you probably only have root. One of the nice things with environments is you can have one with python=2 (latest Python 2), one with python=3, another with python=2.7, etc.
On your follow-up: if you have multiple environments, you can switch between them in PyCharm by changing the interpreter. In the image you see me selecting e.g. py2_olv
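From inside Python you can also check which environment a script is running in. The sketch below relies on the CONDA_DEFAULT_ENV and CONDA_PREFIX variables that conda sets on activation, and on the conda-meta folder that conda keeps in each environment. The function name and the example values are made up for illustration.

```python
import os
import sys


def describe_environment(environ=None, prefix=None):
    """Summarize which (conda) environment the interpreter belongs to."""
    environ = os.environ if environ is None else environ
    prefix = sys.prefix if prefix is None else prefix
    return {
        "interpreter_prefix": prefix,
        # Set by "conda activate"; None if no environment was activated.
        "conda_env_name": environ.get("CONDA_DEFAULT_ENV"),
        "conda_prefix": environ.get("CONDA_PREFIX"),
        # conda stores its package records under <prefix>/conda-meta.
        "has_conda_meta": os.path.isdir(os.path.join(prefix, "conda-meta")),
    }


for key, value in describe_environment().items():
    print(f"{key}: {value}")
```

If conda_prefix and interpreter_prefix differ, the script is probably not being run by the python of the activated environment.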
Professional answer:
Quote from https://machinelearningspace.com/installing-tensorflow-2-0-in-anaconda-environment/:
What is Anaconda and why I recommend it?
...
[dropped intro to Anaconda]
...
For a Python developer or a data science researcher, using Anaconda
has a lot of advantages, such as independently installing/updating
packages without ruining the system. So, we no need to worry about the
system library or anything like that. This can save time and energy
for other things.
Anaconda can be used across different platforms, Windows, macOS, and
Linux. If we want to use a different Python version or package
libraries, just create a different environment and play around without
any risk of crashing the system library.
####
Unprofessional research:
Now in addition my own research. I am not a professional, I have little knowledge of the seemingly chaotic world of different install methods. This refers to some first research at https://superuser.com/questions/1572640/do-i-need-to-install-cuda-separately-after-installing-the-nvidia-display-driver/1572762#1572762. Mind that I am guessing a lot here. Please comment if I am wrong.
We see that at the moment, Pytorch supports version 10.2, Tensorflow supports 10.1, and it is not just the version that differs: mind that "CUDA Toolkit" (standalone) and cudatoolkit (conda binary install) are different! One is a standalone / executable install, the other is a binary install. And tensorflow needs tensorflow-gpu to reach the standalone cuda install.
Therefore you should consider separate environments for Tensorflow and Pytorch, since any update of the conda cudatoolkit to version 11.0 could break Pytorch's dependency requirements. (This is not completely right, since Pytorch uses a CUDA that is installed inside Pytorch, but it is still a useful way to understand why separate environments are recommended.) For Tensorflow, you have to install CUDA Toolkit 10.1 although 11.0 is already available, so your whole card must run on a lower version than possible only to support Tensorflow, even if some games would like to have version 11.0.
Unprofessional answer:
If all of the dependencies are so important and so easily wrong when updated separately, like you could do with pip, any install that you do by yourself using pip might crash your sensitive tensorflow install. Therefore it is recommended to keep to a full service approach which Anaconda offers, where all dependencies are kept right, even if you enter conda install --all. That is why you better search for an Anaconda guide, for example https://machinelearningspace.com/installing-tensorflow-2-0-in-anaconda-environment/.
If you had read through the entire document, you would have seen that the Anaconda installation is community supported, not officially supported. They want you to install TensorFlow using native pip through Python 3.5.x. That being said, from personal experience, I will tell you that if you are looking to run basic level TensorFlow Python scripts, such as training and testing an MNIST model, a Windows installation will be fine, or using a model that has already been trained for some purpose will also be fine. However, if you want to train advanced models such as Inception, which are the state-of-the-art image classifiers with less than 5% error for normal images, Windows is not suitable. You should try using a Linux installation for any training purposes. I would recommend using VirtualBox, having used it in the past.
As for activating the environment, as long as, in any script / in the bash, you include the line "import tensorflow as tf", you should be fine, at least for native pip installation.
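Note that the import only succeeds when the script is run by the interpreter into which TensorFlow was installed. A quick way to check this from a script or from an IDE console, as a sketch with a made-up helper name:

```python
import importlib.util
import sys


def can_import(module_name):
    """Return True if the running interpreter can locate the module."""
    return importlib.util.find_spec(module_name) is not None


# Which python is running, and does it see tensorflow?
print(sys.executable)
print(can_import("tensorflow"))
```

If this prints False in PyCharm but True in your command prompt, the two are most likely using different interpreters or environments.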
Good luck!

Does Conda replace the need for virtualenv?

I recently discovered Conda after I was having trouble installing SciPy, specifically on a Heroku app that I am developing.
With Conda you create environments, very similar to what virtualenv does. My questions are:
If I use Conda will it replace the need for virtualenv? If not, how do I use the two together? Do I install virtualenv in Conda, or Conda in virtualenv?
Do I still need to use pip? If so, will I still be able to install packages with pip in an isolated environment?
Conda replaces virtualenv. In my opinion it is better. It is not limited to Python but can be used for other languages too. In my experience it provides a much smoother experience, especially for scientific packages. The first time I got MayaVi properly installed on Mac was with conda.
You can still use pip. In fact, conda installs pip in each new environment. It knows about pip-installed packages.
For example:
conda list
lists all installed packages in your current environment.
Conda-installed packages show up like this:
sphinx_rtd_theme 0.1.7 py35_0 defaults
and the ones installed via pip have the <pip> marker:
wxpython-common 3.0.0.0 <pip>
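If you want to separate the two kinds of entries programmatically, the lines above can be parsed with a few lines of Python. This is a sketch that assumes the whitespace-separated layout shown here. It also treats a pypi channel entry as pip-installed, on the assumption that newer conda versions print that instead of the <pip> marker.

```python
def parse_conda_list_line(line):
    """Parse one line of "conda list" output into a small dict."""
    fields = line.split()
    # Skip blank lines, comment/header lines, and anything too short.
    if len(fields) < 2 or fields[0].startswith("#"):
        return None
    rest = fields[2:]
    return {
        "name": fields[0],
        "version": fields[1],
        "via_pip": "<pip>" in rest or "pypi" in rest,
    }


sample = [
    "sphinx_rtd_theme 0.1.7 py35_0 defaults",
    "wxpython-common 3.0.0.0 <pip>",
]
for entry in map(parse_conda_list_line, sample):
    print(entry)
```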
Short answer is, you only need conda.
Conda effectively combines the functionality of pip and virtualenv in a single package, so you do not need virtualenv if you are using conda.
You would be surprised how many packages conda supports. If it is not enough, you can use pip under conda.
Here is a link to the conda page comparing conda, pip and virtualenv:
https://docs.conda.io/projects/conda/en/latest/commands.html#conda-vs-pip-vs-virtualenv-commands.
I use both and (as of Jan, 2020) they have some superficial differences that lend themselves to different usages for me. By default Conda prefers to manage a list of environments for you in a central location, whereas virtualenv makes a folder in the current directory. The former (centralized) makes sense if you are e.g. doing machine learning and just have a couple of broad environments that you use across many projects and want to jump into them from anywhere. The latter (per project folder) makes sense if you are doing little one-off projects that have completely different sets of lib requirements that really belong more to the project itself.
The empty environment that Conda creates is about 122MB whereas the virtualenv's is about 12MB, so that's another reason you may prefer not to scatter Conda environments around everywhere.
Finally, another superficial indication that Conda prefers its centralized envs is that (again, by default) if you do create a Conda env in your own project folder and activate it the name prefix that appears in your shell is the (way too long) absolute path to the folder. You can fix that by giving it a name, but virtualenv does the right thing by default.
I expect this info to become stale rapidly as the two package managers vie for dominance, but these are the trade-offs as of today :)
EDIT: I reviewed the situation again in 04/2021 and it is unchanged. It's still awkward to make a local directory install with conda.
Virtual Environments and pip
I will add that creating and removing conda environments is simple with Anaconda.
> conda create --name <envname> python=<version> <optional dependencies>
> conda remove --name <envname> --all
In an activated environment, install packages via conda or pip:
(envname)> conda install <package>
(envname)> pip install <package>
These environments are strongly tied to conda's pip-like package management, so it is simple to create environments and install both Python and non-Python packages.
Jupyter
In addition, installing ipykernel in an environment adds a new listing in the Kernels dropdown menu of Jupyter notebooks, extending reproducible environments to notebooks. As of Anaconda 4.1, nbextensions were added, adding extensions to notebooks more easily.
Reliability
In my experience, conda is faster and more reliable at installing large libraries such as numpy and pandas. Moreover, if you wish to transfer your preserved state of an environment, you can do so by sharing or cloning an env.
Comparisons
A non-exhaustive, quick look at features from each tool:
Feature       virtualenv  conda
Global        n           y
Local         y           n
PyPI          y           y
Channels      n           y
Lock File     n           n
Multi-Python  n           y
Description
virtualenv creates project-specific, local environments usually in a .venv/ folder per project. In contrast, conda's environments are global and saved in one place.
PyPI works with both tools through pip, but conda can add additional channels, which can sometimes install faster.
Sadly neither has an official lock file, so reproducing environments has not been solid with either tool. However, both have a mechanism to create a file of pinned packages.
Python is needed to install and run virtualenv, but conda already ships with Python. virtualenv creates environments using the same Python version it was installed with. conda allows you to create environments with nearly any Python version.
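As an illustration of the "file of pinned packages" idea, the sketch below builds a list in the name==version style that pip freeze produces, using only the standard library. It is not what pip or conda run internally, and the function name is made up.

```python
from importlib import metadata


def pinned_requirements():
    """Return name==version lines for the distributions Python can see."""
    pins = set()
    for dist in metadata.distributions():
        meta = dist.metadata
        name = meta.get("Name") if meta else None
        if name:  # skip distributions with broken metadata
            pins.add(f"{name}=={dist.version}")
    return sorted(pins, key=str.lower)


for line in pinned_requirements():
    print(line)
```

In practice you would use pip freeze > requirements.txt or conda env export > environment.yml, which are the mechanisms the paragraph above refers to.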
See Also
virtualenvwrapper: global virtualenv
pyenv: manage python versions
mamba: "faster" conda
In my experience, conda fits well in a data science application and serves as a good general env tool. However in software development, dropping in local, ephemeral, lightweight environments with virtualenv might be convenient.
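To show what such a local, throwaway environment looks like, here is a sketch that uses venv, the standard library counterpart of virtualenv (an assumption made here so that the example needs no extra install). The .venv folder name follows the convention mentioned above, and the helper name is made up.

```python
import os
import tempfile
import venv


def create_local_env(project_dir, env_name=".venv"):
    """Create a project-local environment folder and return its path."""
    env_dir = os.path.join(project_dir, env_name)
    # with_pip=False keeps the sketch fast; pass True to bootstrap pip.
    venv.create(env_dir, with_pip=False)
    return env_dir


# Create the environment inside a temporary "project" and inspect it.
with tempfile.TemporaryDirectory() as project:
    env_dir = create_local_env(project)
    print(sorted(os.listdir(env_dir)))
```

Everything lives inside the project folder, so deleting that folder removes the environment as well.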
Installing Conda will enable you to create and remove python environments as you wish, therefore providing you with same functionality as virtualenv would.
In case of both distributions you would be able to create an isolated filesystem tree, where you can install and remove python packages (probably, with pip) as you wish. Which might come in handy if you want to have different versions of same library for different use cases or you just want to try some distribution and remove it afterwards conserving your disk space.
Differences:
License agreement. While virtualenv comes under the very liberal MIT license, Conda uses the 3-clause BSD license.
Conda provides you with its own package control system. This package control system often provides precompiled versions (for most popular systems) of popular non-Python software, which can ease one's way to getting some machine learning packages working. Namely, you don't have to compile optimized C/C++ code for your system. While this is a great relief for most of us, it might affect the performance of such libraries.
Unlike virtualenv, Conda duplicates some system libraries, at least on Linux systems. These libraries can get out of sync, leading to inconsistent behavior of your programs.
Verdict:
Conda is great and should be your default choice when starting out with machine learning. It will save you some time messing with gcc and numerous packages. Yet, Conda does not replace virtualenv. It introduces some additional complexity which might not always be desired. It comes under a different license. You might want to avoid using conda in distributed environments or on HPC hardware.
Another new option and my current preferred method of getting an environment up and running is Pipenv
It is currently the officially recommended Python packaging tool from Python.org
Conda has a better API no doubt. But, I would like to touch upon the negatives of using conda since conda has had its share of glory in the rest of the answers:
Solving environment Issue - One big thorn in the rear end of conda environments. As a remedy, you get advised to not use conda-forge channel. But, since it is the most prevalent channel and some packages (not just trivial ones, even really important ones like pyspark) are exclusively available on conda-forge you get cornered pretty fast.
Packing the environment is an issue
There are other known issues as well. virtualenv is an uphill journey but, rarely a wall on the road. conda on the other hand, IMO, has these occasional hard walls where you just have to take a deep breath and use virtualenv
1. No, if you're using conda, you don't need to use any other tool for managing virtual environments (such as venv, virtualenv, pipenv etc).
Maybe there's some edge case which conda doesn't cover but virtualenv (being more heavyweight) does, but I haven't encountered any so far.
2. Yes, not only can you still use pip, but you will probably have to. The conda package repositories contain fewer packages than PyPI, so conda install will sometimes not be able to find the package you're looking for, more so if it's not a data-science package.
And, if I remember correctly, conda's repositories aren't updated as fast/often as PyPI, so if you want to use the latest version of a package, pip might once again be your only option.
Note: if the pip command isn't available within a conda virtual environment, you will have to install it first, by running:
conda install pip
Yes, conda is a lot easier to install than virtualenv, and pretty much replaces the latter.
I work in a corporate environment, behind several firewalls, on a machine on which I have no admin access.
In my limited experience with Python (2 years) I have come across a few libraries (JayDeBeApi, sasl) which threw C++ dependency errors when installed via pip:
error: Microsoft Visual C++ 14.0 is required. Get it with "Microsoft Visual C++ Build Tools": http://landinghub.visualstudio.com/visual-cpp-build-tools
These installed fine with conda, so since then I have been working with conda environments.
However, it isn't easy to stop conda from installing dependencies inside C:\Program Files, where I don't have write access.

What is the difference between pip and conda?

I know pip is a package manager for python packages. However, I saw that the installation instructions on IPython's website use conda to install IPython.
Can I use pip to install IPython? Why should I use conda as another python package manager when I already have pip?
What is the difference between pip and conda?
Quoting from the Conda blog:
Having been involved in the python world for so long, we are all aware of pip, easy_install, and virtualenv, but these tools did not meet all of our specific requirements. The main problem is that they are focused around Python, neglecting non-Python library dependencies, such as HDF5, MKL, LLVM, etc., which do not have a setup.py in their source code and also do not install files into Python’s site-packages directory.
So Conda is a packaging tool and installer that aims to do more than what pip does; handle library dependencies outside of the Python packages as well as the Python packages themselves. Conda also creates a virtual environment, like virtualenv does.
As such, Conda should be compared to Buildout perhaps, another tool that lets you handle both Python and non-Python installation tasks.
Because Conda introduces a new packaging format, you cannot use pip and Conda interchangeably; pip cannot install the Conda package format. You can use the two tools side by side (by installing pip with conda install pip) but they do not interoperate either.
Since writing this answer, Anaconda has published a new page on Understanding Conda and Pip, which echoes this as well:
This highlights a key difference between conda and pip. Pip installs Python packages whereas conda installs packages which may contain software written in any language. For example, before using pip, a Python interpreter must be installed via a system package manager or by downloading and running an installer. Conda on the other hand can install Python packages as well as the Python interpreter directly.
and further on
Occasionally a package is needed which is not available as a conda package but is available on PyPI and can be installed with pip. In these cases, it makes sense to try to use both conda and pip.
Disclaimer: This answer describes the state of things as it was a decade ago; at that time pip did not support binary packages. Conda was specifically created to better support building and distributing binary packages, in particular data science libraries with C extensions. For reference, pip only gained widespread support for portable binary packages with wheels (pip 1.4 in 2013) and the manylinux1 specification (pip 8.1 in March 2016). See the more recent answer for more history.
Here is a short rundown:
pip
Python packages only.
Compiles everything from source. EDIT: pip now installs binary wheels, if they are available.
Blessed by the core Python community (i.e., Python 3.4+ includes code that automatically bootstraps pip).
conda
Python agnostic. The main focus of existing packages is Python, and indeed Conda itself is written in Python, but you can also have Conda packages for C libraries, or R packages, or really anything.
Installs binaries. There is a tool called conda build that builds packages from source, but conda install itself installs things from already built Conda packages.
External. conda is an environment and package manager. It is included in the Anaconda Python distribution provided by Continuum Analytics (now called Anaconda, Inc.).
conda is an environment manager written in Python and is language-agnostic. conda's environment management functions cover the functionality provided by venv, virtualenv, pipenv, pyenv, and other Python-specific package managers. You could use conda within an existing Python installation by pip installing it (though this is not recommended unless you have a good reason to use an existing installation). As of 2022, conda and pip are not fully aware of one another's package management activities within a virtual environment, nor are they interoperable for Python package management.
In both cases:
Written in Python
Open source (conda is BSD and pip is MIT)
Warning: While conda itself is open-source, the package repositories are hosted by Anaconda Inc and have restrictions around commercial usage.
The first two bullet points of conda are really what make it advantageous over pip for many packages. Since pip installs from source, it can be painful to install things with it if you are unable to compile the source code (this is especially true on Windows, but it can even be true on Linux if the packages have some difficult C or FORTRAN library dependencies). conda installs from binary, meaning that someone (e.g., Continuum) has already done the hard work of compiling the package, and so the installation is easy.
There are also some differences if you are interested in building your own packages. For instance, pip is built on top of setuptools, whereas conda uses its own format, which has some advantages (like being static, and again, Python agnostic).
The other answers give a fair description of the details, but I want to highlight some high-level points.
pip is a package manager that facilitates installation, upgrade, and uninstallation of python packages. It also works with virtual python environments.
conda is a package manager for any software (installation, upgrade and uninstallation). It also works with virtual system environments.
One of the goals with the design of conda is to facilitate package management for the entire software stack required by users, of which one or more python versions may only be a small part. This includes low-level libraries, such as linear algebra, compilers, such as mingw on Windows, editors, version control tools like Hg and Git, or whatever else requires distribution and management.
For version management, pip allows you to switch between and manage multiple python environments.
Conda allows you to switch between and manage multiple general purpose environments across which multiple other things can vary in version number, like C-libraries, or compilers, or test-suites, or database engines and so on.
Conda is not Windows-centric, but on Windows it is by far the superior solution currently available when complex scientific packages requiring compilation are required to be installed and managed.
I want to weep when I think of how much time I have lost trying to compile many of these packages via pip on Windows, or debug failed pip install sessions when compilation was required.
As a final point, Continuum Analytics also hosts (free) binstar.org (now called anaconda.org) to allow regular package developers to create their own custom (built!) software stacks that their package-users will be able to conda install from.
(2021 UPDATE)
TL;DR Use pip, it's the official package manager since Python 3.
pip
basics
pip is the default package manager for python
pip is bundled with Python as of Python 3.4 (bootstrapped through ensurepip)
Usage: python3 -m venv myenv; source myenv/bin/activate; python3 -m pip install requests
Packages are downloaded from pypi.org, the official public python repository
It can install precompiled binaries (wheels) when available, or source (tar/zip archive).
Compiled binaries are important because many packages are mixed Python/C/other with third-party dependencies and complex build chains. They MUST be distributed as binaries to be ready-to-use.
advanced
pip can actually install from any archive, wheel, or git/svn repo...
...that can be located on disk, or on a HTTP URL, or a personal pypi server.
pip install git+https://github.com/psf/requests.git@v2.25.0 for example (it can be useful for testing patches on a branch).
pip install https://download.pytorch.org/whl/cpu/torch-1.9.0%2Bcpu-cp39-cp39-linux_x86_64.whl (that wheel is Python 3.9 on Linux).
when installing from source, pip will automatically build the package. (it's not always possible, try building TensorFlow without the google build system :D)
binary wheels can be python-version specific and OS specific, see manylinux specification to maximize portability.
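The Python version and OS that a wheel targets are encoded in its filename. A minimal sketch that splits a filename such as the PyTorch one above into its tags, following the wheel naming scheme name-version[-build]-python-abi-platform.whl (the function name is made up for illustration):

```python
from urllib.parse import unquote


def parse_wheel_filename(filename):
    """Split a wheel filename into its name, version and compatibility tags."""
    name = unquote(filename)  # "%2B" in a URL stands for "+"
    if not name.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename}")
    parts = name[: -len(".whl")].split("-")
    if len(parts) == 5:
        dist, version, py_tag, abi_tag, platform_tag = parts
        build = None
    elif len(parts) == 6:
        dist, version, build, py_tag, abi_tag, platform_tag = parts
    else:
        raise ValueError(f"unexpected wheel filename layout: {filename}")
    return {
        "distribution": dist,
        "version": version,
        "build": build,
        "python": py_tag,
        "abi": abi_tag,
        "platform": platform_tag,
    }


print(parse_wheel_filename("torch-1.9.0%2Bcpu-cp39-cp39-linux_x86_64.whl"))
```

pip compares these tags with the ones supported by the running interpreter and only installs a wheel when they match. Otherwise it falls back to another wheel or to the source archive.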
conda
You are NOT permitted to use Anaconda or packages from Anaconda repositories for commercial use, unless you acquire a license.
Conda is a third-party package manager from Anaconda, Inc.
It's popularized by anaconda, a Python distribution including most common data science libraries ready-to-use.
You will use conda when you use anaconda.
Packages are downloaded from the anaconda repo.
It only installs precompiled packages.
Conda has its own format of packages. It doesn't use wheels.
conda install to install a package.
conda build to build a package.
conda can build the python interpreter (and other C packages it depends on). That's how an interpreter is built and bundled for anaconda.
conda allows you to install and upgrade the Python interpreter (pip does not).
advanced
Historically, the selling point of conda was to support building and installing binary packages, because pip did not support binary packages very well (until wheels and manylinux2010 spec).
Emphasis on building packages. Conda has extensive build settings and it stores extensive metadata, to work with dependencies and build chains.
Some projects use conda to initiate complex build systems and generate a wheel, that is published to pypi.org for pip.
easy_install/egg
For historical reference only. DO NOT USE
egg is an abandoned format of package, it was used up to mid 2010s and completely replaced by wheels.
an egg is a zip archive, it contains python source files and/or compiled libraries.
eggs are used with easy_install and the first releases of pip.
easy_install was yet another package manager, that preceded pip and conda. It was removed in setuptools v58.3 (year 2021).
it too caused a lot of confusion, just like pip vs conda :D
egg files are slow to load, poorly specified, and OS specific.
Each egg was setup in a separate directory, an import mypackage would have to look for mypackage.py in potentially hundreds of directories (how many libraries were installed?). That was slow and not friendly to the filesystem cache.
Historically, the above three tools were open-source and written in Python.
However, Anaconda, Inc., the company behind conda, updated its Terms of Service in 2020 to restrict commercial use of the Anaconda distribution and its package repository, so watch out!
Funfact: The only strictly-required dependency to build the Python interpreter is zlib (a zip library), because compression is necessary to load more packages. Eggs and wheels packages are zip files.
Why so many options?
A good question.
Let's delve into the history of Python and computers. =D
Pure python packages have always worked fine with any of these packagers. The troubles were with not-only-Python packages.
Most of the code in the world depends on C. That is true for the Python interpreter, that is written in C. That is true for numerous Python packages, that are python wrappers around C libraries or projects mixing python/C/C++ code.
Anything that involves SSL, compression, GUI (X11 and Windows subsystems), math libraries, GPU, CUDA, etc... is typically coupled with some C code.
This creates troubles to package and distribute Python libraries because it's not just Python code that can run anywhere. The library must be compiled, compilation requires compilers and system libraries and third party libraries, then once compiled, the generated binary code only works for the specific system and python version it was compiled on.
Originally, python could distribute pure-python libraries just fine, but there was little support for distributing binary libraries. In and around 2010 you'd get a lot of errors trying to use numpy or cassandra. It downloaded the source and failed to compile, because of missing dependencies. Or it downloaded a prebuilt package (maybe an egg at the time) and it crashed with a SEGFAULT when used, because it was built for another system. It was a nightmare.
This was resolved by pip and wheels from 2012 onward. It then took many years for people to adopt the tools and for the tools to propagate to stable Linux distributions (many developers rely on /usr/bin/python). The issues with binary packages extended to the late 2010s.
For reference, that's why the first command to run is python3 -m venv myvenv && source myvenv/bin/activate && pip install --upgrade pip setuptools on antiquated systems, because the OS comes with an old python+pip from 5 years ago that's buggy and can't recognize the current package format.
Conda worked on their own solution in parallel. Anaconda was specifically meant to make data science libraries easy to use out-of-the-box (data science = C and C++ everywhere), hence they had to come up with a package manager specifically meant to address building and distributing binary packages, conda.
If you install any package with pip install xxx nowadays, it just works. That's the recommended way to install packages and it's built-in in current versions of Python.
Not to confuse you further,
but you can also use pip within your conda environment, which validates the general vs. python specific managers comments above.
conda install -n testenv pip
conda activate testenv   # on conda older than 4.4: source activate testenv
pip <pip command>
You can also add pip to the default packages of every new environment (the create_default_packages setting in .condarc), so it is always present and you can skip the first line of the snippet above.
Quote from the Conda for Data Science article on Continuum's website:
Conda vs pip
Python programmers are probably familiar with pip to download packages from PyPI and manage their requirements. Although, both conda and pip are package managers, they are very different:
Pip is specific for Python packages and conda is language-agnostic, which means we can use conda to manage packages from any language
Pip compiles from source and conda installs binaries, removing the burden of compilation
Conda creates language-agnostic environments natively whereas pip relies on virtualenv to manage only Python environments
Though it is recommended to always use conda packages, conda also includes pip, so you don’t have to choose between the two. For example, to install a python package that does not have a conda package, but is available through pip, just run, for example:
conda install pip
pip install gensim
pip is a package manager.
conda is both a package manager and an environment manager.
Detail:
Dependency check
Pip and conda also differ in how dependency relationships within an environment are fulfilled. When installing packages, pip installs dependencies in a recursive, serial loop. No effort is made to ensure that the dependencies of all packages are fulfilled simultaneously. This can lead to environments that are broken in subtle ways, if packages installed earlier in the order have incompatible dependency versions relative to packages installed later in the order. In contrast, conda uses a satisfiability (SAT) solver to verify that all requirements of all packages installed in an environment are met. This check can take extra time but helps prevent the creation of broken environments. As long as package metadata about dependencies is correct, conda will predictably produce working environments.
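The difference described above can be sketched with a toy model. Everything here is hypothetical: the package names, the versions, and both installers are invented for illustration, and the "simultaneous" installer is a brute-force search standing in for a real SAT solver. It is not how pip or conda are implemented, only a small instance of "serial, first pin wins" versus "all constraints at once".

```python
from itertools import product

# Hypothetical toy index: the versions available for each package.
AVAILABLE = {"pkg_a": [1], "pkg_b": [1], "lib": [1, 2, 3]}

# Hypothetical requirements: (package, version) -> {dependency: allowed versions}.
REQUIRES = {
    ("pkg_a", 1): {"lib": {2, 3}},  # pkg_a needs lib >= 2
    ("pkg_b", 1): {"lib": {1, 2}},  # pkg_b needs lib <= 2
}


def is_consistent(env):
    """True if every installed package has all dependencies at an allowed version."""
    for name, version in env.items():
        for dep, allowed in REQUIRES.get((name, version), {}).items():
            if env.get(dep) not in allowed:
                return False
    return True


def serial_install(targets):
    """Install packages one by one; the first pin for a dependency wins."""
    env = {}

    def install(name, allowed=None):
        if name in env:
            return  # already pinned: later constraints are never checked
        candidates = [v for v in AVAILABLE[name] if allowed is None or v in allowed]
        env[name] = max(candidates)  # greedily take the newest allowed version
        for dep, dep_allowed in REQUIRES.get((name, env[name]), {}).items():
            install(dep, dep_allowed)

    for target in targets:
        install(target)
    return env


def simultaneous_install(targets):
    """Try every version combination (newest first) until all constraints hold."""
    names = sorted(AVAILABLE)
    choices = (sorted(AVAILABLE[n], reverse=True) for n in names)
    for versions in product(*choices):
        env = dict(zip(names, versions))
        if all(t in env for t in targets) and is_consistent(env):
            return env
    return None


serial_env = serial_install(["pkg_a", "pkg_b"])
solved_env = simultaneous_install(["pkg_a", "pkg_b"])
print("serial:", serial_env, "consistent:", is_consistent(serial_env))
print("simultaneous:", solved_env, "consistent:", is_consistent(solved_env))
```

In the serial run, lib is pinned to the newest version when pkg_a is installed and pkg_b's constraint is never looked at, so the environment ends up subtly broken. The simultaneous search picks the one lib version that both packages accept.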
References
Understanding Conda and Pip
Quoting from Conda: Myths and Misconceptions (a comprehensive description):
...
Myth #3: Conda and pip are direct competitors
Reality: Conda and pip serve different purposes, and only directly compete in a small subset of tasks: namely installing Python packages in isolated environments.
Pip, which stands for Pip Installs Packages, is Python's officially-sanctioned package manager, and is most commonly used to install packages published on the Python Package Index (PyPI). Both pip and PyPI are governed and supported by the Python Packaging Authority (PyPA).
In short, pip is a general-purpose manager for Python packages; conda is a language-agnostic cross-platform environment manager. For the user, the most salient distinction is probably this: pip installs python packages within any environment; conda installs any package within conda environments. If all you are doing is installing Python packages within an isolated environment, conda and pip+virtualenv are mostly interchangeable, modulo some difference in dependency handling and package availability. By isolated environment I mean a conda-env or virtualenv, in which you can install packages without modifying your system Python installation.
Even setting aside Myth #2, if we focus on just installation of Python packages, conda and pip serve different audiences and different purposes. If you want to, say, manage Python packages within an existing system Python installation, conda can't help you: by design, it can only install packages within conda environments. If you want to, say, work with the many Python packages which rely on external dependencies (NumPy, SciPy, and Matplotlib are common examples), while tracking those dependencies in a meaningful way, pip can't help you: by design, it manages Python packages and only Python packages.
Conda and pip are not competitors, but rather tools focused on different groups of users and patterns of use.
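If you are unsure which kind of isolated environment a given interpreter is running in, a rough check is possible from Python itself. This is a sketch under two assumptions: conda environments contain a conda-meta directory under their prefix (the base conda install does too), and venv or recent virtualenv environments make sys.prefix differ from sys.base_prefix. The function name describe_environment is invented for this example.

```python
import sys
from pathlib import Path


def describe_environment():
    """Guess which kind of environment the running interpreter belongs to."""
    prefix = Path(sys.prefix)
    # Assumption: conda keeps its package records in <prefix>/conda-meta.
    if (prefix / "conda-meta").is_dir():
        return "conda environment"
    # venv (and recent virtualenv) point sys.prefix at the environment
    # while sys.base_prefix still points at the interpreter it was created from.
    if sys.prefix != sys.base_prefix:
        return "virtual environment (venv/virtualenv)"
    return "no isolated environment detected"


kind = describe_environment()
print(kind, "at", sys.prefix)
```

Running this before a pip install is a cheap way to confirm that packages will land in the environment you expect rather than in the system Python.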
For WINDOWS users
"standard" packaging tools situation is improving recently:
on pypi itself, 48% of packages are now wheels as of Sept. 11th 2015 (up from 38% in May 2015 and 24% in Sept. 2014),
the wheel format is now supported out-of-the-box per latest python 2.7.9,
"standard"+"tweaks" packaging tools situation is improving also:
you can find nearly all scientific packages on wheel format at http://www.lfd.uci.edu/~gohlke/pythonlibs,
the mingwpy project may bring one day a 'compilation' package to windows users, allowing to install everything from source when needed.
"Conda" packaging remains better for the market it serves, and highlights areas where the "standard" should improve.
(also, the dependency specification multiple-effort, in standard wheel system and in conda system, or buildout, is not very pythonic, it would be nice if all these packaging 'core' techniques could converge, via a sort of PEP)
(2022 UPDATE) This answer was derived from the one above by @user5994461
You can use pip for package management. pip is the official package manager for Python; the python.org installers have bundled it since Python 3.4 (and Python 2.7.9).
pip is not a virtual environment manager.
pip
basics
pip is the default package manager for python
pip is bundled with Python as of Python 3.4 (and 2.7.9), through the ensurepip module
Usage: python3 -m venv myenv; source myenv/bin/activate; python3 -m pip install requests
Packages are downloaded from pypi.org, the official public python repository
It can install precompiled binaries (wheels) when available, or source (tar/zip archive).
Compiled binaries are important because many packages are mixed Python/C/other with third-party dependencies and complex build chains. They MUST be distributed as binaries to be ready-to-use.
advanced
pip can actually install from any archive, wheel, or git/svn repo...
...that can be located on disk, or on a HTTP URL, or a personal pypi server.
pip install git+https://github.com/psf/requests.git@v2.25.0 for example (the part after @ is a tag, branch, or commit; this is useful for testing patches on a branch).
pip install https://download.pytorch.org/whl/cpu/torch-1.9.0%2Bcpu-cp39-cp39-linux_x86_64.whl (that wheel is Python 3.9 on Linux).
when installing from source, pip will automatically build the package. (it's not always possible, try building TensorFlow without the google build system :D)
binary wheels can be python-version specific and OS specific, see manylinux specification to maximize portability.
conda
conda is an open source environment manager AND package manager maintained by the open source community. It is separate from Anaconda, Inc. and does not require a commercial license to use.
conda is also bundled into Anaconda Navigator, a popular commercial Python distribution from Anaconda, Inc. that includes most common data science and Python developer libraries ready-to-use.
You will use conda when you use Anaconda Navigator GUI.
Packages may be downloaded from conda-forge, the Anaconda repo, and other public and private conda package "channels" (aka repos).
It only installs precompiled packages.
conda has its own package format. It doesn't use wheels.
conda install to install a package.
conda build to build a package.
conda can build the python interpreter (and other C packages it depends on). That's how an interpreter is built and bundled for Anaconda Navigator.
conda can install and upgrade the Python interpreter itself (pip cannot).
advanced
Historically, one selling point of conda was to support building and installing binary packages, because pip did not support binary packages very well (until wheels and manylinux2010 spec).
Emphasis on building packages. conda has extensive build settings and it stores extensive metadata, to work with dependencies and build chains.
Some projects use conda to initiate complex build systems and generate a wheel, that is published to pypi.org for pip.
conda emphasizes building and managing virtual environments. conda is by design a programming language-agnostic virtual environment manager. conda can install and manage other package managers such as npm, pip, and other language package managers.
Can I use Anaconda Navigator packages for commercial use?
The new language states that use by individual hobbyists, students, universities, non-profit organizations, or businesses with less than 200 employees is allowed, and all other usage is considered commercial and thus requires a business relationship with Anaconda. (as of Oct 28, 2020)
If you are a large developer organization, i.e., more than 200 employees, you are NOT permitted to use Anaconda or packages from the Anaconda repository for commercial use, unless you acquire a license.
Pulling and using (properly open-sourced) packages from conda-forge repository do not require commercial licenses from Anaconda, Inc. Developers are free to build their own conda packages using the packaging tools provided in the conda-forge infrastructure.
easy_install/egg
For historical reference only. DO NOT USE
egg is an abandoned format of package, it was used up to mid 2010s and completely replaced by wheels.
an egg is a zip archive, it contains python source files and/or compiled libraries.
eggs are used with easy_install and the first releases of pip.
easy_install was yet another package manager, that preceded pip and conda. It was removed in setuptools v58.3 (year 2021).
it too caused a lot of confusion, just like pip vs conda :D
egg files are slow to load, poorly specified, and OS specific.
Each egg was setup in a separate directory, an import mypackage would have to look for mypackage.py in potentially hundreds of directories (how many libraries were installed?). That was slow and not friendly to the filesystem cache.
Funfact: The only strictly-required dependency to build the Python interpreter is zlib (a zip library), because compression is necessary to load more packages. Eggs and wheels packages are zip files.
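Since eggs and wheels are zip files, the interpreter's built-in zip import support is enough to load pure-Python code straight from an archive. The sketch below builds a tiny archive and imports a module from it; the archive name toy_package.egg, the module name toy_egg_module, and the helper import_from_zip are all invented for this illustration, and a real egg also carries metadata that is omitted here.

```python
import importlib
import sys
import tempfile
import zipfile
from pathlib import Path


def import_from_zip(module_name, source):
    """Write `source` into a zip archive and import it as `module_name`."""
    tmp_dir = Path(tempfile.mkdtemp())
    archive = tmp_dir / "toy_package.egg"  # hypothetical name; any zip works
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr(module_name + ".py", source)
    # Putting the archive itself on sys.path lets the import system look inside it.
    sys.path.insert(0, str(archive))
    importlib.invalidate_caches()
    try:
        return importlib.import_module(module_name)
    finally:
        sys.path.remove(str(archive))


mod = import_from_zip("toy_egg_module", "ANSWER = 42\n")
print(mod.ANSWER, "loaded from", mod.__file__)
```

Each zipped egg added one more entry to sys.path in this way, which is part of why imports became slow once many libraries were installed.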
Why so many options?
A good question.
Let's delve into the history of Python and computers. =D
Pure python packages have always worked fine with any of these packagers. The troubles were with not-only-Python packages.
Most of the code in the world depends on C. That is true for the Python interpreter, that is written in C. That is true for numerous Python packages, that are python wrappers around C libraries or projects mixing python/C/C++ code.
Anything that involves SSL, compression, GUI (X11 and Windows subsystems), math libraries, GPU, CUDA, etc... is typically coupled with some C code.
This creates troubles to package and distribute Python libraries because it's not just Python code that can run anywhere. The library must be compiled, compilation requires compilers and system libraries and third party libraries, then once compiled, the generated binary code only works for the specific system and python version it was compiled on.
Originally, python could distribute pure-python libraries just fine, but there was little support for distributing binary libraries. In and around 2010 you'd get a lot of errors trying to use numpy or cassandra. It downloaded the source and failed to compile, because of missing dependencies. Or it downloaded a prebuilt package (maybe an egg at the time) and it crashed with a SEGFAULT when used, because it was built for another system. It was a nightmare.
This was resolved by pip and wheels from 2012 onward. It then took many years for people to adopt the tools and for the tools to propagate to stable Linux distributions (many developers rely on /usr/bin/python). The issues with binary packages extended to the late 2010s.
For reference, that's why the first command to run is python3 -m venv myvenv && source myvenv/bin/activate && pip install --upgrade pip setuptools on antiquated systems, because the OS comes with an old python+pip from 5 years ago that's buggy and can't recognize the current package format.
Continuum Analytics (later renamed Anaconda, Inc.) worked on their own solution (released as Anaconda Navigator) in parallel. Anaconda Navigator was specifically meant to make data science libraries easy to use out-of-the-box (data science = C and C++ everywhere), hence they came up with a package manager specifically meant to address building and distributing binary packages, and built it into the environment manager, conda.
If you install any package with pip install xxx nowadays, it usually just works. pip is a recommended way to install packages that is built into current versions of Python.
To answer the original question,
For installing packages, PIP and Conda are different ways to accomplish the same thing. Both are standard applications to install packages. The main difference is the source of the package files.
PIP/PyPI will have more "experimental" packages, or newer, less common, versions of packages
Conda will typically have more well established packages or versions
An important cautionary side note: If you use both sources (pip and conda) to install packages in the same environment, this may cause issues later.
Recreating the environment will be more difficult
Fixing package incompatibilities becomes more complicated
Best practice is to select one application, PIP or Conda, to install packages, and use that application to install any packages you need.
However, there are many exceptions or reasons to still use pip from within a conda environment, and vice versa.
For example:
When there are packages you need that only exist on one, and the other doesn't have them.
You need a certain version that is only available in one environment
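When an environment already mixes both tools, it helps to see which installer put each package there. The sketch below groups installed distributions by the contents of their INSTALLER metadata file. It assumes that pip records "pip" there and that conda-built Python packages record "conda"; packages without the file are reported as "unknown". The function name packages_by_installer is invented for this example.

```python
from collections import defaultdict
from importlib import metadata


def packages_by_installer():
    """Group installed distributions by the tool named in their INSTALLER file."""
    groups = defaultdict(list)
    for dist in metadata.distributions():
        name = dist.metadata.get("Name", "<unknown>")
        # read_text returns None when the distribution has no INSTALLER file.
        installer = (dist.read_text("INSTALLER") or "").strip() or "unknown"
        groups[installer].append(name)
    return {installer: sorted(names) for installer, names in groups.items()}


report = packages_by_installer()
for installer, names in sorted(report.items()):
    print(installer, len(names), "packages")
```

A long "pip" list inside a conda environment is a hint that conda env export alone may not be enough to recreate it faithfully.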
Can I use pip to install iPython?
Sure, both (first approach on page)
pip install ipython
and (third approach, second is conda)
You can manually download IPython from GitHub or PyPI. To install one
of these versions, unpack it and run the following from the top-level
source directory using the Terminal:
pip install .
are officially recommended ways to install.
Why should I use conda as another python package manager when I already have pip?
As said here:
If you need a specific package, maybe only for one project, or if you need to share the project with someone else, conda seems more appropriate.
Conda surpasses pip in (YMMV)
projects that use non-python tools
sharing with colleagues
switching between versions
switching between projects with different library versions
What is the difference between pip and conda?
That is extensively answered by everyone else.
pip is for Python only
conda is only for Anaconda + other scientific packages like R dependencies etc. NOT everyone needs Anaconda that already comes with Python. Anaconda is mostly for those who do Machine learning/deep learning etc. Casual Python dev won't run Anaconda on his laptop.
I may have found one further difference of a minor nature. I have my python environments under /usr rather than /home or whatever. In order to install to it, I would have to use sudo pip install. For me, the undesired side effect of sudo pip install was slightly different from what is widely reported elsewhere: after doing so, I had to run python with sudo in order to import any of the sudo-installed packages. I gave up on that and eventually found I could use sudo conda to install packages to an environment under /usr, which then imported normally without needing sudo permission for python. I even used sudo conda to fix a broken pip rather than using sudo pip uninstall pip or sudo pip install --upgrade pip.
