do PIP and Conda have conflicts?

do PIP and Conda have conflicts? - python

In some documents and resources, it has been noted that installing packages using pip after installing conda could cause some problems like crashing conda and so on. could anybody tell me when and in which cases is that happening? I tried pip for some packages due to the smallness of conda's repository and I found no problem using both together.
I'm using pip 21.1.2 and conda 4.10.1.

The point is that pip is less strict wrt. to dependency checks, so it's clearly better to prioritize conda installs over pip:
"Pip and conda differ in how dependency relationships within an environment are fulfilled. When installing packages, pip installs dependencies in a recursive, serial loop. No effort is made to ensure that the dependencies of all packages are fulfilled simultaneously. This can lead to environments that are broken in subtle ways, if packages installed earlier in the order have incompatible dependency versions relative to packages installed later in the order. In contrast, conda uses a satisfiability (SAT) solver to verify that all requirements of all packages installed in an environment are met. This check can take extra time but helps prevent the creation of broken environments." (https://www.anaconda.com/blog/understanding-conda-and-pip)
However, some packages are only available via PyPI, and you have to resort to pip. In that case stick to the best practices as recommended in the conda docs.
(https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html#using-pip-in-an-environment)
So far I never experienced any inconsistencies with conda and pip installs myself, and when you are a serious programmer and have your test suites properly set up, you'll anyhow get aware of issues well in advance and before they can cause any trouble.

Related

How can we conda install packages have been downloaded successfully, ignoring download failed packages?

everyone:
Because of the speed of network, when I conda install some packages, there will exist some related packages can not be downloaded completely. But we can not install packages have been downloaded successfully without other "related" packages(maybe "related" means the best march in version, but not necessary).
For example, When I install pytorch, it need numpy-1.14.2, but I am with numpy-1.15.1. I don't need verson 1.14.2 numpy in practice.
So I am a little confused how to make "conda" trying to install packages have been downloaded successfully, ignoring download failed packages?
Thanks!
EricKani

From the conda documentation there are two options that may help https://docs.conda.io/projects/conda/en/latest/commands/install.html
--no-update-deps
Do not update or change already-installed dependencies.
--no-deps
Do not install, update, remove, or change dependencies. This WILL lead to broken environments and inconsistent behavior. Use at your own
risk.
I believe by default conda tries with --no-update-deps first and then if that fails tries to update deps; giving it that option will make sure some version of each needed package is installed, if not necessarily the latest.
You could try --no-deps as well, which will literally prevent conda from installinh ANYTHING other than the exact packages you tell it to, but things may not work with that.

Poetry - force install when versions are incompatible?

Poetry has a very good version solver, too good sometimes :) I'm trying to use poetry in a project that uses two incompatible packages. However they are incompatible only by declaration as one of them is no longer developed, but otherwise they work together just fine.
With pip I'm able to install these in one environment (with an error printed) and it works. Poetry will declare that the dependencies versions can't be resolved and refuse to install anything.
Is there a way to force poetry to install these incompatible dependencies? Thank you!

No.
Alternative solutions might be:
contacting the offending package's maintainers and asking for a fix + release
forking the package and releasing a fix yourself
vendoring the package in your source code - there is no need to install it if it's already there, and many of the usual downsides to vendoring disappear if the project in question is not maintained anymore
installing the package by hand post poetry install with an installer that has the option to ignore the dependency resolver, like pip (similar to what you're already doing)

What to do when pip & conda overlap?

I have a reasonable understanding of the difference between conda install & pip install; How pip installs python only packages & conda can install non-python binaries. However, there is some overlap between these two. Which leads me to ask:
What's the rule of thumb for whether to use conda or pip when both offer a package?
For example, TensorFlow is available on both repositories but from the tensorflow docs:
within Anaconda, we recommend installing TensorFlow with the
pip install command, not with the conda install command.
But, there are many other packages that overlap, like numpy, scipy etc.
However, this Stackoverflow answer suggests that conda install should be the default & pip should only be used if a package is unavailable from conda. Is this true even for TensorFlow or other python-only packages?

The Tensorflow maintainers actually publish the wheels of TensorFlow on PyPI that's why it's the recommended official way. The conda packages are created by the Anaconda staff and/or the community. That doesn't mean the conda packages are bad, it just means that the TensorFlow maintainers don't participate there (officially). Basically they are just saying: "If you have trouble installing it with pip the TensorFlow devs will try to help you. But we don't officially support the conda packages so if something goes wrong with the conda package you need to ask the conda-package maintainers. You've been warned."
In the more general case:
For Python-only packages you should always use conda install. There might be exceptions, for example if there is no conda-package at all or the conda package is out-of-date (and nobody is releasing a new version of that package) and you really need that package/version.
However it's different for packages that require compilation (e.g. C-Extensions, etc.). It's different because with pip you can install a package either:
as pre-compiled wheel
as package, compiled on your computer
While conda just provides the
compiled conda package
With compiled packages you have to be careful with binary compatibility. That means that a package is compiled against specific binary interface of another library - which could depend on the version of the libraries or the compilation flags, etc.
With conda you have to take the package as-is, which means that you have to assume that the packages are binary-compatible. If they aren't it won't work (segfault or linking errors or whatever).
If you use pip and can choose which wheel (if any) to install or compile it against the available libraries on your computer. That means it's less likely that you get a binary-incompatibility. That is (or was) a big problem if you install conda packages from different conda-channels. Because they might simply be binary-incompatible (e.g. conda-forge and the anaconda-channel have or had a few problems there).
However it should probably be decided on a case-by-case basis. I had no problems with my tensorflow conda environment where I installed all packages from the conda-forge channel, including tensorflow. However I have heard that several people had trouble with tensorflow in mixed conda-forge and anaconda channel environments. For example NumPy from the main channel and TensorFlow from the conda-forge channel might just be binary-incompatible.
My rule of thumb is:
If it's a Python-only package just install it (it's unlikely to make trouble). Use the conda package when possible but it won't make (much) trouble if you use pip. If you install it using pip it's not managed by conda so it's possible it won't be recognized as available dependency and you have to update it yourself, but that's about all the difference.
If it's a compiled package (like a C extension or a wrapper around a C library or such like) it becomes a bit more complicated. If you want to be "careful" or you have reason to expect problems:
Always create a new environment if you need to test compiled packages from different channels and/or conda and pip. It's easy to discard a messed up conda environment but it's a lot more annoying to fix an environment that you depend on.
If possible install all compiled packages using conda install from one and only one channel (if possible the main anaconda channel).
If not possible try to mix main anaconda channel compiled packages with conda packages from a different channel.
If that doesn't work try to mix conda compiled packages and pip compiled-packages (pre-compiled wheels or self-compiled installers).
You asked about why you cannot install packages from PyPI with conda. I don't know the exact reasons but pip mostly provides the package and you have to install it yourself. With conda you get an already compiled and installed package that is just "copied" without installation. That requires that the package is installed on different operating systems (Mac, Windows, Linux) and on platforms (32-bit, 64-bit), against different Python versions (2.7, 3.5, 3.6) and possibly against different NumPy versions. That means conda has to provide several packages instead of just one. That takes resources (space for the final installed packages and time for the installation) which probably aren't available or feasible. Aside from that there is probably no converter for a pypi package to a conda recipe aside from all the specifics you have to know about a package (compilation, installation) to make it work. That's just my guess though.

Does Conda replace the need for virtualenv?

I recently discovered Conda after I was having trouble installing SciPy, specifically on a Heroku app that I am developing.
With Conda you create environments, very similar to what virtualenv does. My questions are:
If I use Conda will it replace the need for virtualenv? If not, how do I use the two together? Do I install virtualenv in Conda, or Conda in virtualenv?
Do I still need to use pip? If so, will I still be able to install packages with pip in an isolated environment?

Conda replaces virtualenv. In my opinion it is better. It is not limited to Python but can be used for other languages too. In my experience it provides a much smoother experience, especially for scientific packages. The first time I got MayaVi properly installed on Mac was with conda.
You can still use pip. In fact, conda installs pip in each new environment. It knows about pip-installed packages.
For example:
conda list
lists all installed packages in your current environment.
Conda-installed packages show up like this:
sphinx_rtd_theme 0.1.7 py35_0 defaults
and the ones installed via pip have the <pip> marker:
wxpython-common 3.0.0.0 <pip>

Short answer is, you only need conda.
Conda effectively combines the functionality of pip and virtualenv in a single package, so you do not need virtualenv if you are using conda.
You would be surprised how many packages conda supports. If it is not enough, you can use pip under conda.
Here is a link to the conda page comparing conda, pip and virtualenv:
https://docs.conda.io/projects/conda/en/latest/commands.html#conda-vs-pip-vs-virtualenv-commands.

I use both and (as of Jan, 2020) they have some superficial differences that lend themselves to different usages for me. By default Conda prefers to manage a list of environments for you in a central location, whereas virtualenv makes a folder in the current directory. The former (centralized) makes sense if you are e.g. doing machine learning and just have a couple of broad environments that you use across many projects and want to jump into them from anywhere. The latter (per project folder) makes sense if you are doing little one-off projects that have completely different sets of lib requirements that really belong more to the project itself.
The empty environment that Conda creates is about 122MB whereas the virtualenv's is about 12MB, so that's another reason you may prefer not to scatter Conda environments around everywhere.
Finally, another superficial indication that Conda prefers its centralized envs is that (again, by default) if you do create a Conda env in your own project folder and activate it the name prefix that appears in your shell is the (way too long) absolute path to the folder. You can fix that by giving it a name, but virtualenv does the right thing by default.
I expect this info to become stale rapidly as the two package managers vie for dominance, but these are the trade-offs as of today :)
EDIT: I reviewed the situation again in 04/2021 and it is unchanged. It's still awkward to make a local directory install with conda.

Virtual Environments and pip
I will add that creating and removing conda environments is simple with Anaconda.
> conda create --name <envname> python=<version> <optional dependencies>
> conda remove --name <envname> --all
In an activated environment, install packages via conda or pip:
(envname)> conda install <package>
(envname)> pip install <package>
These environments are strongly tied to conda's pip-like package management, so it is simple to create environments and install both Python and non-Python packages.
Jupyter
In addition, installing ipykernel in an environment adds a new listing in the Kernels dropdown menu of Jupyter notebooks, extending reproducible environments to notebooks. As of Anaconda 4.1, nbextensions were added, adding extensions to notebooks more easily.
Reliability
In my experience, conda is faster and more reliable at installing large libraries such as numpy and pandas. Moreover, if you wish to transfer your preserved state of an environment, you can do so by sharing or cloning an env.
Comparisons
A non-exhaustive, quick look at features from each tool:
Feature
virtualenv
conda
Global
n
y
Local
y
n
PyPI
y
y
Channels
n
y
Lock File
n
n
Multi-Python
n
y
Description
virtualenv creates project-specific, local environments usually in a .venv/ folder per project. In contrast, conda's environments are global and saved in one place.
PyPI works with both tools through pip, but conda can add additional channels, which can sometimes install faster.
Sadly neither has an official lock file, so reproducing environments has not been solid with either tool. However, both have a mechanism to create a file of pinned packages.
Python is needed to install and run virtualenv, but conda already ships with Python. virtualenv creates environments using the same Python version it was installed with. conda allows you to create environments with nearly any Python version.
See Also
virtualenvwrapper: global virtualenv
pyenv: manage python versions
mamba: "faster" conda
In my experience, conda fits well in a data science application and serves as a good general env tool. However in software development, dropping in local, ephemeral, lightweight environments with virtualenv might be convenient.

Installing Conda will enable you to create and remove python environments as you wish, therefore providing you with same functionality as virtualenv would.
In case of both distributions you would be able to create an isolated filesystem tree, where you can install and remove python packages (probably, with pip) as you wish. Which might come in handy if you want to have different versions of same library for different use cases or you just want to try some distribution and remove it afterwards conserving your disk space.
Differences:
License agreement. While virtualenv comes under most liberal MIT license, Conda uses 3 clause BSD license.
Conda provides you with their own package control system. This package control system often provides precompiled versions (for most popular systems) of popular non-python software, which can easy ones way getting some machine learning packages working. Namely you don't have to compile optimized C/C++ code for you system. While it is a great relief for most of us, it might affect performance of such libraries.
Unlike virtualenv, Conda duplicating some system libraries at least on Linux system. This libraries can get out of sync leading to inconsistent behavior of your programs.
Verdict:
Conda is great and should be your default choice while starting your way with machine learning. It will save you some time messing with gcc and numerous packages. Yet, Conda does not replace virtualenv. It introduces some additional complexity which might not always be desired. It comes under different license. You might want to avoid using conda on a distributed environments or on HPC hardware.

Another new option and my current preferred method of getting an environment up and running is Pipenv
It is currently the officially recommended Python packaging tool from Python.org

Conda has a better API no doubt. But, I would like to touch upon the negatives of using conda since conda has had its share of glory in the rest of the answers:
Solving environment Issue - One big thorn in the rear end of conda environments. As a remedy, you get advised to not use conda-forge channel. But, since it is the most prevalent channel and some packages (not just trivial ones, even really important ones like pyspark) are exclusively available on conda-forge you get cornered pretty fast.
Packing the environment is an issue
There are other known issues as well. virtualenv is an uphill journey but, rarely a wall on the road. conda on the other hand, IMO, has these occasional hard walls where you just have to take a deep breath and use virtualenv

1.No, if you're using conda, you don't need to use any other tool for managing virtual environments (such as venv, virtualenv, pipenv etc).
Maybe there's some edge case which conda doesn't cover but virtualenv (being more heavyweight) does, but I haven't encountered any so far.
2.Yes, not only can you still use pip, but you will probably have to. The conda package repository contains less than pip's does, so conda install will sometimes not be able to find the package you're looking for, more so if it's not a data-science package.
And, if I remember correctly, conda's repository isn't updated as fast/often as pip's, so if you want to use the latest version of a package, pip might once again be your only option.
Note: if the pip command isn't available within a conda virtual environment, you will have to install it first, by hitting:
conda install pip

Yes, conda is a lot easier to install than virtualenv, and pretty much replaces the latter.

I work in corporate, behind several firewall with machine on which I have no admin acces
In my limited experience with python (2 years) i have come across few libraries (JayDeBeApi,sasl) which when installing via pip threw C++ dependency errors
error: Microsoft Visual C++ 14.0 is required. Get it with "Microsoft Visual C++ Build Tools": http://landinghub.visualstudio.com/visual-cpp-build-tools
these installed fine with conda, hence since those days i started working with conda env.
however it isnt easy to stop conda from installing dependency inside c.programfiles where i dont have write access.

applying debian packages instead of pip packages

I've a strange problem with co-existence of debian package and pip package. For example, I've python-requests (deb version 0.8.2) installed. Then when i install the requests (pip version 2.2.1), the system only apply the deb version instead of pip new version. Does anyone can resolve this problem? Thank you in advance.

In regard to installing python packages by system packages and pip, you have to define clear plan.
Personally, I follow these rules:
Install only minimal set of python packages by system installation packages
There I include supervisord in case, I am not on too old system.
Do not install pip or virtualenv by system package.
Especially with pip in last year there were many situations, when system packages were far back behind what was really needed.
Use Virtualenv and prefer to install packages (by pip) in here
This will keep your system wide Python rather clean. It takes a moment to get used, but it is rather easy to follow, especially, if you use virtualenvwrapper which helps a lot during development.
Prepare conditions for quick installation of compiled packages
Some packages require compilation and this often fails on missing dependencies.
Such packages include e.g. lxml, pyzmq, pyyaml.
Make sure, which ones you are going to use, prepare packages in the system and you are able to install them into virtualenv.
Fine-tuning speed of installation of compiled packages
There is great package format (usable by pip) called wheel. This allows to install a package (like lxml) to install on the same platform within fraction of a second (compared to minutes of compilation). See my answer at SO on this topic

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.