What does conda do when "solving environment"


Whenever I run conda install/remove/update <package>, it tells me it's "Solving environment" for some time before telling me the list of things it's going to download/install/update. Presumably it's looking for dependencies for <package>, but why does it sometimes remove packages after doing this operation? For example, as I was trying to install Mayavi, it decided it needed to remove Anaconda Navigator.
Furthermore it does not provide an option to perform only a subset of the suggested operations. Is there a way to specify that I don't want a package removed?

You can add the --debug option to the conda command and inspect the output in the console (or terminal). For example, run conda update --debug numpy.
From the output, we can see that the client requests repodata.json from each channel in the channel list and then does some computation locally during the "Solving environment" step.
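To make the "computation locally" part more concrete, here is a minimal sketch of what reading a channel index might look like. The sample below mimics the general layout of repodata.json (a "packages" mapping from file name to a record with name, version, build and depends), but the entries themselves are made up for illustration, and the candidates helper is hypothetical, not part of conda.

```python
import json

# Made-up excerpt in the general shape of a channel's repodata.json.
SAMPLE_REPODATA = json.loads("""
{
  "info": {"subdir": "linux-64"},
  "packages": {
    "numpy-1.21.0-py39_0.tar.bz2": {
      "name": "numpy", "version": "1.21.0", "build": "py39_0",
      "depends": ["python >=3.9,<3.10"]
    },
    "numpy-1.22.0-py310_0.tar.bz2": {
      "name": "numpy", "version": "1.22.0", "build": "py310_0",
      "depends": ["python >=3.10,<3.11"]
    }
  }
}
""")


def candidates(repodata, name):
    """Return (version, build, depends) for every build of `name` in the index."""
    return [
        (record["version"], record["build"], record["depends"])
        for record in repodata["packages"].values()
        if record["name"] == name
    ]


for version, build, depends in candidates(SAMPLE_REPODATA, "numpy"):
    print(version, build, depends)
```

Every build listed this way is a candidate the solver has to consider, together with the constraints in its depends list, which is why the step runs on your machine and can take a while for large channels.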

As a side note on the "Solving Environment" step...
Lack of administrator privileges may affect whether or where you can install python packages.
I observed that my installs would hang on the "Solving Environment" step and never get through when attempting to install packages while logged in as a non-administrator.
Getting switched to admin was possible for me on the machine I was stuck on, so I just did that and it solved the problem.
A commenter explains a workaround for when this is not possible.

JUST WAIT! I wasted hours trying to fix this. It turns out, it just took around 45 minutes :/

The short answer is: use mamba as a drop-in replacement for conda. It is much faster at solving environments, so there is no more waiting for minutes. mamba has been officially endorsed by the conda team.
Mamba also allows you to configure more precisely which packages you require to be installed and allows you to pin versions, as conda does. For a more detailed comparison of conda and mamba see this Stack Overflow answer: https://stackoverflow.com/a/68043228/7483211
The long answer is: solving conda environments with more than a few packages, each with dependencies of its own, quickly becomes a fairly complicated SAT problem (see Boolean satisfiability problem and dependency hell).
With good algorithms, even fairly big SAT problems can be solved fast. In contrast to mamba's solver which is written in C++ and designed to be fast, it seems that conda's solver is not very high performance. It worked well enough when people used small environments in the past, but with bigger and bigger environments, conda has started to struggle.
I made the switch about a year ago and I have not once looked back. The open source project I'm working for (Nextstrain) has also started to recommend mamba in place of conda for new users. I have not seen anyone advocating against using mamba in place of conda.
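The long answer above can be illustrated with a toy example. This is only a sketch under stated assumptions: the package index (app, viewer, numlib) is invented, and the brute-force search stands in for a real SAT solver, which neither conda nor mamba implements this way. It does show why a solve can end with a package being removed: no combination of versions satisfies both the new request and the old package.

```python
from itertools import product

# Hypothetical index: name -> version -> {dependency: set of allowed versions}.
INDEX = {
    "app": {1: {"numlib": {1}}, 2: {"numlib": {2}}},
    "viewer": {1: {"numlib": {1}}},
    "numlib": {1: {}, 2: {}},
}


def is_consistent(env):
    """True if every package in env has each dependency at an allowed version."""
    for name, version in env.items():
        for dep, allowed in INDEX[name][version].items():
            if env.get(dep) not in allowed:
                return False
    return True


def solve(requested, installed):
    """Try every combination of versions (None means 'not installed').

    Among consistent environments that contain the requested versions,
    prefer the one that keeps the most installed packages, then the
    newest versions overall. Returns None if nothing is consistent.
    """
    names = sorted(INDEX)
    best = None
    for choice in product(*[[None] + sorted(INDEX[n]) for n in names]):
        env = {n: v for n, v in zip(names, choice) if v is not None}
        if any(env.get(n) != v for n, v in requested.items()):
            continue
        if not is_consistent(env):
            continue
        kept = sum(1 for n in installed if n in env)
        score = (kept, sum(env.values()))
        if best is None or score > best[0]:
            best = (score, env)
    return None if best is None else best[1]


installed = {"viewer": 1, "numlib": 1}
result = solve({"app": 2}, installed)
print("new environment:", result)
print("removed:", sorted(set(installed) - set(result)))  # ['viewer']
```

The search space here is the product of all version choices, so it grows multiplicatively with every package added. That growth is the reason solver quality matters so much for large environments.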

conda install --prune <package> helped me install from the right channel.
I suspect the environment I was using had been set up for zipline, and the channel it used was not compatible with the existing one. --prune takes a lot of time, but it helped me resolve the environment issues.

Related

How to install and use PyPy for my Python script?

Let me start by clarifying that I am relatively new to coding and a noob when it comes to everything that goes beyond python coding. For a project of mine which involves simulation, I really need to decrease the running time. After some research I got the impression that using PyPy interpreter would be a possible solution for my problem.
I use Spyder & Anaconda and I have been trying some stuff to implement PyPy, but I have found that my understanding is not sufficient and it has been rather time consuming without success. I have also installed VScode, for which I did succeed in loading and using the PyPy interpreter. However, I need to use several packages that I use in my original Python script; pandas, numpy and scipy. If it is even possible to use these packages, I have no clue how to install these for PyPy.
I have read on this website that it is recommended to use conda-forge (right?) for these situations, but my Anaconda is often buggy for some reason. It would be amazing if someone could give a step-by-step guide or some advice on how to tackle this problem, either in VS Code or Spyder & Anaconda.
I have downloaded PyPy 7.3.9.
Thanks in advance.

The recommended way to get binary packages like NumPy and Pandas is to use the conda-forge packages via conda. There is a blog post about it; the short version is:
# create an environment
conda create -c conda-forge -n my-pypy-env pypy python=3.8
# activate it
conda activate my-pypy-env
# install some things
conda install numpy pandas
# run your script
python my_script.py
That said, PyPy will be no faster than CPython when running scripts that make heavy use of NumPy and Pandas data structures, since they are written in C. In fact, the hoops PyPy must jump through to use these data structures in Python mean that PyPy could be significantly slower. We have a plan for that called HPy, but it will take a while to happen.

How to check if python environment is stable and consistent

I recently ran into a problem with a Conda environment and I believe I solved it using this. Is there a way to check whether what I did actually solved the problem? Can I check if there are still any inconsistencies?
This was meant to be a general question and I did not think it would be dependent on the actual issue I had. It all started when I was having issues with Keras and TensorFlow for GPU in a Conda environment. I came to the conclusion that something was corrupt (I couldn't even run basic Keras commands) and decided to try to update everything and if that didn't fix my issue I would reinstall the packages I needed in that environment. As soon as I tried to update TensorFlow for GPU, the first thing that appeared was:
The environment is inconsistent, please check the package plan carefully The following packages are causing the inconsistency:
Followed by a bunch of packages and version numbers. As I said before I believe I solved the inconsistency issues by force updating Conda but am uncertain how to check if the environment is now consistent.

Updating a specific module with Conda removes numerous packages

I have recently started using the Anaconda Python distribution as it offers a lot of Data Analysis libraries out of the box. And using conda to create environments and install packages is also a breeze. But I have faced some serious issues when I want to update Python itself or any other module, I am informed beforehand that a LOT of my existing libraries will be removed.
For example, this is what I get when I use conda update [package_name]
$ conda update pandas
Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... done
## Package Plan ##
environment location: C:\Users\User\Anaconda3
added / updated specs:
- matplotlib
The following packages will be REMOVED:
[Almost half of my existing packages]
The following packages will be UPDATED:
[Some packages including my desired library, in this case, pandas]
I have searched the web on how to update packages and Python using conda and almost everywhere I saw that conda update [package name] was suggested. But why doesn't it work for me? I mean it will work but at the expense of tons of important libraries that I need.
So I have tried using the Anaconda Navigator to update the desired libraries (like matplotlib and pandas) hoping that the removal of existing libraries might be a command line issue on my computer. But I had seriously messed up my base (root) environment by updating pandas using Navigator. I didn't get any warnings that a lot of my modules will be removed so I thought I was doing fine. But after the update was done and I wrote some matplotlib code, I wasn't able to run it. I got errors that resembled something that indicated matplotlib was a "non-conda module". So I had to do conda install --revision n to go back to a state where I had my modules.
Right now, the only way for me to update any package or Python is to do this:
conda install pandas=[package_version_that_is_higher_than_mine]
But there's got to be a reason why I am facing this issue. Any help is absolutely appreciated.
EDIT: It turns out that the issue is mainly when I am trying to update using the base environment. When I use my other conda environments, the conda update [package_name] or conda update --all works fine.

Anaconda (as distinct from Conda) is designed to be used as a fixed set of package builds that have been vetted for compatibility (see "What's in a Name? Clarifying the Anaconda Metapackage"). When you try to introduce new packages or package upgrades into that context, Conda can be rather unpredictable as to how it will solve that. I think it helps to keep in mind that commands like conda (install|upgrade|remove) mean requesting a distinct environment as a whole, and do not represent low-level commands to change a single package.
Conda does offer some options to get this more low-level behavior. One thing to try is the --freeze-installed flag, which would do what you're asking for. Recent versions of Conda do this by default in the first round of solves, and if that doesn't work they attempt a full solve. There is also the more dangerous, brute-force --no-deps flag, which won't do a solve at all and just installs the package. The documentation for this literally says,
"This WILL lead to broken environments and inconsistent behavior. Use at your own risk."
Typically, if you want to use newer packages, it is better to create a new env (conda create -n my_env [pkg1 pkg2 ...]) because the fact is that you no longer want the Anaconda distribution, but instead a custom one with newer versions. My personal view is that most non-beginners should be using Miniconda and relegate their base env to only having conda, while being very liberal about creating envs for projects that have different package requirements. If you ever need a true Anaconda distribution, there's always the anaconda package for that.

Does Conda replace the need for virtualenv?

I recently discovered Conda after I was having trouble installing SciPy, specifically on a Heroku app that I am developing.
With Conda you create environments, very similar to what virtualenv does. My questions are:
If I use Conda will it replace the need for virtualenv? If not, how do I use the two together? Do I install virtualenv in Conda, or Conda in virtualenv?
Do I still need to use pip? If so, will I still be able to install packages with pip in an isolated environment?

Conda replaces virtualenv. In my opinion it is better. It is not limited to Python but can be used for other languages too. In my experience it provides a much smoother experience, especially for scientific packages. The first time I got MayaVi properly installed on Mac was with conda.
You can still use pip. In fact, conda installs pip in each new environment. It knows about pip-installed packages.
For example:
conda list
lists all installed packages in your current environment.
Conda-installed packages show up like this:
sphinx_rtd_theme 0.1.7 py35_0 defaults
and the ones installed via pip have the <pip> marker:
wxpython-common 3.0.0.0 <pip>
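As a small illustration of the listing above, the following sketch splits conda list output into conda-installed and pip-installed packages. The split_by_installer helper and the header path are hypothetical. The "<pip>" marker comes from the output shown here; treating a channel column of "pypi" as a pip install is an assumption about how newer conda releases label such packages.

```python
def split_by_installer(conda_list_output):
    """Split `conda list` text into (conda_packages, pip_packages) by name."""
    conda_pkgs, pip_pkgs = [], []
    for line in conda_list_output.splitlines():
        fields = line.split()
        # Skip blank lines and the "#" header lines conda prints.
        if not fields or fields[0].startswith("#"):
            continue
        name = fields[0]
        # "<pip>" is the marker shown above; "pypi" is an assumed newer label.
        if fields[-1] in ("<pip>", "pypi"):
            pip_pkgs.append(name)
        else:
            conda_pkgs.append(name)
    return conda_pkgs, pip_pkgs


SAMPLE = """\
# packages in environment at /path/to/env:
sphinx_rtd_theme 0.1.7 py35_0 defaults
wxpython-common 3.0.0.0 <pip>
"""

conda_pkgs, pip_pkgs = split_by_installer(SAMPLE)
print("conda:", conda_pkgs)
print("pip:", pip_pkgs)
```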

Short answer is, you only need conda.
Conda effectively combines the functionality of pip and virtualenv in a single package, so you do not need virtualenv if you are using conda.
You would be surprised how many packages conda supports. If it is not enough, you can use pip under conda.
Here is a link to the conda page comparing conda, pip and virtualenv:
https://docs.conda.io/projects/conda/en/latest/commands.html#conda-vs-pip-vs-virtualenv-commands.

I use both and (as of Jan, 2020) they have some superficial differences that lend themselves to different usages for me. By default Conda prefers to manage a list of environments for you in a central location, whereas virtualenv makes a folder in the current directory. The former (centralized) makes sense if you are e.g. doing machine learning and just have a couple of broad environments that you use across many projects and want to jump into them from anywhere. The latter (per project folder) makes sense if you are doing little one-off projects that have completely different sets of lib requirements that really belong more to the project itself.
The empty environment that Conda creates is about 122MB whereas the virtualenv's is about 12MB, so that's another reason you may prefer not to scatter Conda environments around everywhere.
Finally, another superficial indication that Conda prefers its centralized envs is that (again, by default) if you do create a Conda env in your own project folder and activate it the name prefix that appears in your shell is the (way too long) absolute path to the folder. You can fix that by giving it a name, but virtualenv does the right thing by default.
I expect this info to become stale rapidly as the two package managers vie for dominance, but these are the trade-offs as of today :)
EDIT: I reviewed the situation again in 04/2021 and it is unchanged. It's still awkward to make a local directory install with conda.

Virtual Environments and pip
I will add that creating and removing conda environments is simple with Anaconda.
> conda create --name <envname> python=<version> <optional dependencies>
> conda remove --name <envname> --all
In an activated environment, install packages via conda or pip:
(envname)> conda install <package>
(envname)> pip install <package>
These environments are strongly tied to conda's pip-like package management, so it is simple to create environments and install both Python and non-Python packages.
Jupyter
In addition, installing ipykernel in an environment adds a new listing in the Kernels dropdown menu of Jupyter notebooks, extending reproducible environments to notebooks. As of Anaconda 4.1, nbextensions were added, making it easier to add extensions to notebooks.
Reliability
In my experience, conda is faster and more reliable at installing large libraries such as numpy and pandas. Moreover, if you wish to transfer your preserved state of an environment, you can do so by sharing or cloning an env.
Comparisons
A non-exhaustive, quick look at features from each tool:
Feature       virtualenv  conda
Global        n           y
Local         y           n
PyPI          y           y
Channels      n           y
Lock File     n           n
Multi-Python  n           y
Description
virtualenv creates project-specific, local environments usually in a .venv/ folder per project. In contrast, conda's environments are global and saved in one place.
PyPI works with both tools through pip, but conda can add additional channels, which can sometimes install faster.
Sadly neither has an official lock file, so reproducing environments has not been solid with either tool. However, both have a mechanism to create a file of pinned packages.
Python is needed to install and run virtualenv, but conda already ships with Python. virtualenv creates environments using the same Python version it was installed with. conda allows you to create environments with nearly any Python version.
See Also
virtualenvwrapper: global virtualenv
pyenv: manage python versions
mamba: "faster" conda
In my experience, conda fits well in a data science application and serves as a good general env tool. However in software development, dropping in local, ephemeral, lightweight environments with virtualenv might be convenient.
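If you use both tools, it can be handy to check from inside Python which kind of environment is active. The sketch below relies on two facts: a venv-style environment has sys.prefix different from sys.base_prefix, and a conda environment has a conda-meta directory under its prefix. The environment_kind function is a hypothetical helper, and it assumes a virtualenv version that sets sys.base_prefix the way venv does (very old releases used a different attribute).

```python
import os
import sys


def environment_kind(prefix, base_prefix):
    """Classify the Python environment rooted at `prefix`."""
    # venv-style environments point base_prefix at the interpreter
    # they were created from, so the two prefixes differ.
    if prefix != base_prefix:
        return "virtualenv/venv"
    # conda keeps per-environment package metadata in <prefix>/conda-meta.
    if os.path.isdir(os.path.join(prefix, "conda-meta")):
        return "conda"
    return "system"


print(environment_kind(sys.prefix, sys.base_prefix), sys.prefix)
```

A venv created on top of a conda-provided Python is reported as "virtualenv/venv" here, since the prefix check runs first.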

Installing Conda will enable you to create and remove Python environments as you wish, therefore providing you with the same functionality as virtualenv would.
With either tool you can create an isolated filesystem tree where you can install and remove Python packages (probably with pip) as you wish. This comes in handy if you want different versions of the same library for different use cases, or if you just want to try some distribution and remove it afterwards to conserve disk space.
Differences:
License agreement. While virtualenv comes under the very liberal MIT license, Conda uses the 3-clause BSD license.
Conda provides its own package management system. It often provides precompiled versions (for the most popular systems) of popular non-Python software, which can ease the way to getting some machine learning packages working. Namely, you don't have to compile optimized C/C++ code for your system. While that is a great relief for most of us, it might affect the performance of such libraries.
Unlike virtualenv, Conda duplicates some system libraries, at least on Linux. These libraries can get out of sync, leading to inconsistent behavior of your programs.
Verdict:
Conda is great and should be your default choice when starting out with machine learning. It will save you some time messing with gcc and numerous packages. Yet Conda does not replace virtualenv. It introduces some additional complexity which might not always be desired, and it comes under a different license. You might want to avoid using Conda in distributed environments or on HPC hardware.

Another new option and my current preferred method of getting an environment up and running is Pipenv
It is currently the officially recommended Python packaging tool from Python.org

Conda has a better API no doubt. But, I would like to touch upon the negatives of using conda since conda has had its share of glory in the rest of the answers:
Solving environment Issue - One big thorn in the rear end of conda environments. As a remedy, you get advised to not use conda-forge channel. But, since it is the most prevalent channel and some packages (not just trivial ones, even really important ones like pyspark) are exclusively available on conda-forge you get cornered pretty fast.
Packing the environment is an issue
There are other known issues as well. virtualenv is an uphill journey but, rarely a wall on the road. conda on the other hand, IMO, has these occasional hard walls where you just have to take a deep breath and use virtualenv

1. No, if you're using conda, you don't need any other tool for managing virtual environments (such as venv, virtualenv, pipenv, etc.).
Maybe there's some edge case which conda doesn't cover but virtualenv (being more heavyweight) does, but I haven't encountered any so far.
2. Yes, not only can you still use pip, but you will probably have to. The conda package repository contains fewer packages than pip's, so conda install will sometimes not find the package you're looking for, more so if it's not a data-science package.
And, if I remember correctly, conda's repository isn't updated as fast or as often as pip's, so if you want to use the latest version of a package, pip might once again be your only option.
Note: if the pip command isn't available within a conda virtual environment, you will have to install it first by running:
conda install pip

Yes, conda is a lot easier to install than virtualenv, and pretty much replaces the latter.

I work in a corporate environment, behind several firewalls, on a machine on which I have no admin access.
In my limited experience with Python (2 years) I have come across a few libraries (JayDeBeApi, sasl) which threw C++ dependency errors when installed via pip:
error: Microsoft Visual C++ 14.0 is required. Get it with "Microsoft Visual C++ Build Tools": http://landinghub.visualstudio.com/visual-cpp-build-tools
These installed fine with conda, so since then I have been working with conda envs.
However, it isn't easy to stop conda from installing dependencies inside C:\Program Files, where I don't have write access.

Anaconda and VirtualEnv

I have a virtualenv running python 2.7.7. It has a pretty extensive set of libraries which support a pretty complicated set of proprietary modules. In other words, the virtualenv needs to maintain its integrity. That is of course the whole point of virtualenv.
Recently, I encountered a number of problems that are very easily solved by using Anaconda. I tried it out in a test environment and it worked quite well. Now I'm tasked with incorporating this new configuration into production. It isn't clear to me how to incorporate Anaconda into a virtualenv, or whether this is even a good idea. In fact, it almost seems to me like I should use the Anaconda install as the new source and deconstruct the old virtualenv, merging the libraries it held into the conda installation.
Does anyone have a recommendation as to the best approach? If merging the environments is called for, can anyone point to an explanation of how to go about it?

It doesn't really make sense to merge Anaconda and a virtualenv, as Anaconda is a completely independent installation of Python. You can do it, typically by setting your PYTHONPATH, but things have a good chance of breaking when you do this sort of thing, and I would recommend against it.
If there are libraries in your virtualenv, you can use them with Anaconda by making conda packages for them. They may already have conda packages (search with conda search and search https://binstar.org/). Otherwise, you can build a package using a conda recipe. See http://conda.pydata.org/docs/build.html and https://github.com/conda/conda-recipes for some example recipes.
