I've been asked to look at some DevOps work involving Python and I'm a bit stuck. The network I'm working on is not internet-connected, so I've been setting up Nexus repositories to bring in dependencies for Docker, Java, and PyPI that the other developers can access and pull down locally. However, they have started using conda more and more, and we are on a fixed version on our dev network to match a delivery network.
I'm trying to use Nexus' conda repos, but every time I try to install something it tries to update everything else, including the python and conda versions, which are:
conda version : 4.8.3
conda-build version : 3.18.11
python version : 3.8.3.final.0
I've edited my .condarc file to read:
channels:
- http://master:8041/repository/anaconda-proxy/main/
- http://master:8041/repository/conda-forge/
remote_read_timeout_secs: 1200.0
auto_update_conda: false
channel_priority: false
However, every time I try to install something to cache the dependencies, I get a huge list of updates. For example:
conda install cudatoolkit
<snip>
The following packages will be downloaded:
package | build
---------------------------|-----------------
alabaster-0.7.12 | py_0 16 KB http://master:8041/repository/anaconda-proxy/main
anaconda-client-1.7.2 | py38_0 172 KB http://master:8041/repository/anaconda-proxy/main
anaconda-project-0.8.4 | py_0 210 KB http://master:8041/repository/anaconda-proxy/main
argh-0.26.2 | py38_0 36 KB http://master:8041/repository/anaconda-proxy/main
.....
Any advice would be great. I've added the auto_update_conda and channel_priority flags but to no avail. Thanks in advance.
Additional info:
I'm a Java developer and I only use a bit of Python, so I'm not massively familiar with the Anaconda setup, so apologies if this is simpler than I'm making it.
How Conda Solves
Conda always first attempts to solve the install directive without changing existing packages (i.e., it runs first with a --freeze-installed flag) and will only proceed to a full solve (what you are seeing) if it can't find any version of your requested package that already has all its dependencies satisfied in the environment. That is, this result implies that what you are asking for is not possible, or at least not via the CLI if you want a valid environment.[1]
At the core of the issue is that even if there is only a single dependency that needs updating, there is no intermediate mode to indicate that you want to minimize the total number of changes (which I think would actually be a nice enhancement). Conda only has two solving modes:
Change nothing else (--freeze-installed).
All dependencies are allowed to update (--update-deps).
The exceptions to this are the aggressive_update_packages list and the auto_update_conda setting, which Conda will always attempt to update whenever the environment is mutated. But it seems you've already realized those can be disabled through configuration settings.[2]
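For completeness, the relevant configuration can also be set from the CLI (a minimal sketch; aggressive_update_packages is a list-valued .condarc key that you would empty by hand):
## equivalent to the auto_update_conda line in .condarc
conda config --set auto_update_conda false
## to disable aggressive updates, edit .condarc to contain:
##   aggressive_update_packages: []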
Manual Dependency Updating
This doesn't mean what you are hoping to accomplish is impossible, but that there isn't a clean way to automate it via the CLI. Instead, you might need to manually track down the dependencies that need updating (e.g., conda search cudatoolkit --info), update them first (conda install with specific versions), and then try installing your package again. I would strongly recommend first settling on the exact version of cudatoolkit you plan to install, otherwise conda search cudatoolkit --info will be too much info.
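A hedged sketch of that workflow (the libgcc-ng spec is purely illustrative; substitute whatever the search output actually reports):
## inspect the dependencies of the version you've settled on
conda search 'cudatoolkit=10.2' --info
## hypothetical: suppose the info output shows a dependency needs updating
conda install 'libgcc-ng>=9'
## then retry the original install
conda install 'cudatoolkit=10.2'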
Package Pinning
For packages that you really do want absolutely fixed there is package pinning. You could do this for conda, python, and other core packages.
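For example, a minimal sketch of pinning via the documented pinned file in the environment's conda-meta directory (the install path is an assumption; use your actual environment's location):
## keep python and conda at their current minor versions
echo "python 3.8.*" >> ~/anaconda3/conda-meta/pinned
echo "conda 4.8.*" >> ~/anaconda3/conda-meta/pinned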
Base Environment
I find it a bit odd that the base environment (the one that has the conda package) is being mutated at all. Instead, I would expect software engineers to always use non-base environments for development and production. It is easy to create new environments; one can define them with version-controlled YAML files, use them modularly by creating them on a per-project or per-task basis, and mutate them without worrying about affecting the Conda infrastructure. However, I'm not entirely clear on your setup, so this comment may not apply.
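For instance, a minimal sketch of that workflow (env and package names are illustrative):
## create a per-project env instead of mutating base
conda create -n cuda-project python=3.8 cudatoolkit
## capture it in a YAML file you can version control
conda env export -n cuda-project > environment.yml
## recreate it elsewhere from that file
conda env create -f environment.yml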
[1] If one doesn't care about validity (probably not a good idea for production) then there is always the --no-deps flag.
[2] The default aggressive_update_packages packages are ones that frequently become vulnerable to exploits (e.g., openssl), so carefully consider the implications of leaving them outdated.
I'm working outward from this conda-build example to eventually build a conda package of my own. (If you try it out, note that the meta.yaml in the example is out of date and you need to use a different meta.yaml; details in this issue.)
The source code in this conda-build example is an existing project called click, which seems to have a very specific structure with elements like tox.ini and setup.py and setup.cfg. It's hard for me to find definitive guidance on Conda's requirements or expectations about the structure of the source code anywhere in the conda-build docs, so I've just been changing one thing at a time starting from the working example and checking if it still works.
Each conda build command takes several minutes. It makes debugging slow and I've gotten impatient. How can I speed up conda build so that I can easily experiment with different inputs? There are tips to speed up conda environment solving here, but I'm not solving an environment; I'm building a package.
My package is pure Python, so I don't need to bother with any compiler details.
I use boa, which is an add-on to conda-build that will use Mamba as the solver instead (much faster solves). Once installed, one uses:
conda mambabuild
instead of
conda build
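For reference, a minimal sketch of installing boa and invoking it (the recipe path is illustrative):
## boa lives on conda-forge
conda install -n base -c conda-forge boa
## drop-in for 'conda build', same recipe arguments
conda mambabuild ./recipe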
Not just me, but the entire Conda Forge CI has used boa for several months now.
I notice that if I am trying to remove huge conda packages that occupy hundreds of megabytes of space, running conda remove <package> takes forever. Some examples of these huge packages are pystan and spacy-model-en_core_web_lg.
It gets stuck here, with no error messages:
Collecting package metadata (repodata.json): done
Solving environment:
Any hints on how to fix this problem?
I am using Anaconda, Python 3.8, Windows 10.
Conda's remove operation still needs to satisfy all the other specifications for the environment, so Conda invokes its solver and this can be complicated. Essentially, it re-solves the entire environment sans the specified package, compares that against the existing state, then makes a plan based on the difference.
I very much doubt there is anything directly impactful about the size of a package, which the OP alludes to. Instead, things that negatively impact solving are:
having a large environment (e.g., anaconda package is installed)
channel mixing - in particular, including the conda-forge channel at equal or higher priority than defaults in an environment with the anaconda package; that package and all its dependencies are intended to be sourced from the anaconda channel
having an underspecified environment (see conda env export --from-history to see your explicit specifications); e.g., an environment with a python=3.8 specification will be easier on the solver than just a python specification
In general, using smaller specialized (e.g., per-project) environments, rather than large monolithic ones helps avoid such problems. The anaconda package is particularly problematic.
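A quick way to check the last point about underspecification (the package specs shown are illustrative):
## shows only the specs you explicitly asked for
conda env export --from-history
## a pinned spec is easier on the solver than a bare name
conda install python=3.8 pystan=2.19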
Try Mamba
Other than adopting better practices, one can also get significantly faster solves with Mamba, a drop-in compiled replacement for conda. Try it out:
## install Mamba in base env
conda install -n base conda-forge::mamba
## use it like you would the 'conda' command
mamba remove -n foo bar
In this article, the author suggests the following
To install fuzzy matcher, I found it easier to conda install the
dependencies (pandas, metaphone, fuzzywuzzy) then use pip to install
fuzzymatcher. Given the computational burden of these algorithms you
will want to use the compiled c components as much as possible and
conda made that easiest for me.
Can someone explain why he suggests using Conda to install the dependencies and then pip to install the actual package, i.e., fuzzymatcher? Why can't we just use Conda for both? Also, how do we know if we are using the compiled C packages as he suggests?
Other answers have addressed how Conda does a better job managing non-Python dependencies. As for why use Pip at all, in this case it's not complicated: the fuzzymatcher package was not available on Conda when the article was written (18 Feb 2020). The first and only version of the package was uploaded on Conda Forge on 1 Dec 2020.
Unless one wants an older version (< 0.0.5), one can now just use Conda. Going forward, Conda Forge's Autotick Bot will automatically submit pull requests and build any new versions of the package whenever they get pushed to PyPI.
conda is the package manager (installer and uninstaller) for Anaconda or Miniconda.
pip is the package manager for Python.
Depending on your system environment and additional settings, pip and conda may install into the same Python installation folder ($PYTHONPATH/Lib/site-packages or %PYTHONPATH%\Lib\site-packages). Hence both conda and pip usually work well together.
However, conda and pip get their Python packages from different channels or websites.
conda searches and downloads from the official channel: https://repo.anaconda.com/pkgs/
These packages are officially supported by Anaconda and maintained in that channel.
However, we may not find every Python package, or packages of newer versions than those in the official channel. That is why we sometimes install Python packages from "conda-forge" or "bioconda". These are unofficial channels maintained by developers and other friendly users.
We can specify other channels like these:
conda install <package1> --channel conda-forge
conda install <package2> --channel bioconda
pip searches and downloads from PyPI.
We should be able to download every publicly available Python package there.
These packages are generated and uploaded by developers and friendly users.
The dependency settings in each package may not be fully tested or verified.
These packages may not support older or newer versions of Python.
Hence, if you are using Anaconda or Miniconda, you should use conda. If you cannot find specific packages in the official channels, you may try conda-forge or bioconda. Finally, get them from PyPI.
However, if you do not use Anaconda, then stick with pip.
For advanced users, you may download the latest libraries from their source (such as GitHub, GitLab, etc.). However, there is a catch:
Some Python packages are written in pure Python. In this case, you should have no issue installing these packages into your system.
Some Python packages are written in C, C++, Go, etc. In this case, you would need:
A supported compiler for your system as well as your Python environment (32- or 64-bit, matching versions).
Python header files, linkable Python libraries, and archives specific to your installed Python version. Anaconda includes these in its installation.
How do we know if a Python package needs a particular compiler?
It may not be easy to find out. However, you can find out by the following means (in rough order):
Look at the landing page (or the README.md or README.txt files) in the source repository.
For example, if you go to Pandas's source repository, it shows that it needs Cython, hence the installation would need a C compiler.
Look at the setup.py in the source repository.
For example, if you go to numpy's setup.py, you can see that it needs a C compiler.
Look at the amount of source code written in languages that need compilation (such as C, C++, Go, etc.). For example, the numpy library is about 35.7% C and 1.0% C++. However, this is only a guide, as that source code may only be testing routines.
Ask on Stack Overflow.
For the compiled C packages, you could import a package, see where it's located, and check the package itself to see what it imports. At some point, you will run into an import of a compiled module (.so extension on *nix). There's possibly an easier way, but that may depend on at what point in the import sequence of the package the compiled module is loaded.
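A minimal sketch of that check on *nix (numpy is just an example package):
## find where the package lives
python -c "import numpy, os; print(os.path.dirname(numpy.__file__))"
## compiled extension modules show up as .so files under that directory
find "$(python -c 'import numpy, os; print(os.path.dirname(numpy.__file__))')" -name '*.so'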
Fuzzymatcher may not be available through Conda, or only as an outdated version, or only as a version that matches an outdated set of dependencies. Then you may end up with an out-of-date set of packages. Pip may have a more recent version of fuzzymatcher, and likely cares less (for better or worse) about the versions of various other packages in your environment. I'm not familiar with fuzzymatcher, so I can't give you an exact reason: you'd have to ask the author.
Note that the point of that paragraph, on installing the necessary packages with Conda, is that some packages require (C) libraries (not necessarily compiled packages, though those will depend on these libraries) that may not be installed by default on your system. Conda will install these for you; Pip will not.
I have recently started using the Anaconda Python distribution, as it offers a lot of data-analysis libraries out of the box, and using conda to create environments and install packages is also a breeze. But I have faced some serious issues when I want to update Python itself or any other module: I am informed beforehand that a LOT of my existing libraries will be removed.
For example, this is what I get when I use conda update [package_name]
$ conda update pandas
Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... done
## Package Plan ##
environment location: C:\Users\User\Anaconda3
added / updated specs:
- matplotlib
The following packages will be REMOVED:
[Almost half of my existing packages]
The following packages will be UPDATED:
[Some packages including my desired library, in this case, pandas]
I have searched the web on how to update packages and Python using conda and almost everywhere I saw that conda update [package name] was suggested. But why doesn't it work for me? I mean it will work but at the expense of tons of important libraries that I need.
So I tried using the Anaconda Navigator to update the desired libraries (like matplotlib and pandas), hoping that the removal of existing libraries might be a command-line issue on my computer. But I seriously messed up my base (root) environment by updating pandas using Navigator. I didn't get any warnings that a lot of my modules would be removed, so I thought I was doing fine. But after the update was done and I wrote some matplotlib code, I wasn't able to run it. I got errors indicating that matplotlib was a "non-conda module". So I had to do conda install --revision n to go back to a state where I had my modules.
Right now, the only way for me to update any package or Python is to do this:
conda install pandas=[package_version_that_is_higher_than_mine]
But there's got to be a reason why I am facing this issue. Any help is absolutely appreciated.
EDIT: It turns out that the issue is mainly when I am trying to update using the base environment. When I use my other conda environments, the conda update [package_name] or conda update --all works fine.
Anaconda (as distinct from Conda) is designed to be used as a fixed set of package builds that have been vetted for compatibility (see "What's in a Name? Clarifying the Anaconda Metapackage"). When you try to introduce new packages or package upgrades into that context, Conda can be rather unpredictable in how it will solve that. I think it helps to keep in mind that commands like conda (install|upgrade|remove) are requests for a distinct environment as a whole, and do not represent low-level commands to change a single package.
Conda does offer some options to get this more low-level behavior. One thing to try is the --freeze-installed flag, which would do what you're asking for. Recent versions of Conda do this by default in the first round of solves, and if that doesn't work they attempt a full solve. There is also the more dangerous and brute-force --no-deps flag, which won't do a solve at all and will just install the package. The documentation for this literally says,
"This WILL lead to broken environments and inconsistent behavior. Use at your own risk."
Typically, if you want to use newer packages, it is better to create a new env (conda create -n my_env [pkg1 pkg2 ...]) because the fact is that you no longer want the Anaconda distribution, but instead a custom one with newer versions. My personal view is that most non-beginners should be using Miniconda and relegate their base env to only having conda, while being very liberal about creating envs for projects that have different package requirements. If you ever need a true Anaconda distribution, there's always the anaconda package for that.
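A minimal sketch of that (env name and packages are illustrative):
## get the newer pandas in a fresh env; base stays a pristine Anaconda distribution
conda create -n pandas-latest python=3.8 pandas matplotlib
conda activate pandas-latest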
I recently discovered Conda after I was having trouble installing SciPy, specifically on a Heroku app that I am developing.
With Conda you create environments, very similar to what virtualenv does. My questions are:
If I use Conda will it replace the need for virtualenv? If not, how do I use the two together? Do I install virtualenv in Conda, or Conda in virtualenv?
Do I still need to use pip? If so, will I still be able to install packages with pip in an isolated environment?
Conda replaces virtualenv. In my opinion it is better. It is not limited to Python but can be used for other languages too. In my experience it provides a much smoother experience, especially for scientific packages. The first time I got MayaVi properly installed on Mac was with conda.
You can still use pip. In fact, conda installs pip in each new environment. It knows about pip-installed packages.
For example:
conda list
lists all installed packages in your current environment.
Conda-installed packages show up like this:
sphinx_rtd_theme 0.1.7 py35_0 defaults
and the ones installed via pip have the <pip> marker:
wxpython-common 3.0.0.0 <pip>
Short answer is, you only need conda.
Conda effectively combines the functionality of pip and virtualenv in a single package, so you do not need virtualenv if you are using conda.
You would be surprised how many packages conda supports. If it is not enough, you can use pip under conda.
Here is a link to the conda page comparing conda, pip and virtualenv:
https://docs.conda.io/projects/conda/en/latest/commands.html#conda-vs-pip-vs-virtualenv-commands.
I use both and (as of Jan, 2020) they have some superficial differences that lend themselves to different usages for me. By default Conda prefers to manage a list of environments for you in a central location, whereas virtualenv makes a folder in the current directory. The former (centralized) makes sense if you are e.g. doing machine learning and just have a couple of broad environments that you use across many projects and want to jump into them from anywhere. The latter (per project folder) makes sense if you are doing little one-off projects that have completely different sets of lib requirements that really belong more to the project itself.
The empty environment that Conda creates is about 122MB whereas the virtualenv's is about 12MB, so that's another reason you may prefer not to scatter Conda environments around everywhere.
Finally, another superficial indication that Conda prefers its centralized envs is that (again, by default) if you do create a Conda env in your own project folder and activate it, the name prefix that appears in your shell is the (way too long) absolute path to the folder. You can fix that by giving it a name, but virtualenv does the right thing by default.
I expect this info to become stale rapidly as the two package managers vie for dominance, but these are the trade-offs as of today :)
EDIT: I reviewed the situation again in 04/2021 and it is unchanged. It's still awkward to make a local directory install with conda.
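For what it's worth, a sketch of the local-folder workaround (paths are illustrative; env_prompt is a documented .condarc option):
## create the env inside the project folder
conda create --prefix ./env python=3.9
conda activate ./env
## show just the env name instead of the full absolute path
conda config --set env_prompt '({name}) '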
Virtual Environments and pip
I will add that creating and removing conda environments is simple with Anaconda.
> conda create --name <envname> python=<version> <optional dependencies>
> conda remove --name <envname> --all
In an activated environment, install packages via conda or pip:
(envname)> conda install <package>
(envname)> pip install <package>
These environments are strongly tied to conda's pip-like package management, so it is simple to create environments and install both Python and non-Python packages.
Jupyter
In addition, installing ipykernel in an environment adds a new listing in the Kernels dropdown menu of Jupyter notebooks, extending reproducible environments to notebooks. As of Anaconda 4.1, nbextensions were added, making it easier to add extensions to notebooks.
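A minimal sketch of wiring an env into Jupyter (the env name is illustrative):
(envname)> conda install ipykernel
(envname)> python -m ipykernel install --user --name envname --display-name "Python (envname)"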
Reliability
In my experience, conda is faster and more reliable at installing large libraries such as numpy and pandas. Moreover, if you wish to transfer your preserved state of an environment, you can do so by sharing or cloning an env.
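For example, sharing via a YAML file or cloning locally (env names are illustrative):
> conda env export --name envname > environment.yml
> conda create --name envcopy --clone envname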
Comparisons
A non-exhaustive, quick look at features from each tool:
Feature      | virtualenv | conda
-------------|------------|------
Global       | n          | y
Local        | y          | n
PyPI         | y          | y
Channels     | n          | y
Lock File    | n          | n
Multi-Python | n          | y
Description
virtualenv creates project-specific, local environments usually in a .venv/ folder per project. In contrast, conda's environments are global and saved in one place.
PyPI works with both tools through pip, but conda can add additional channels, which can sometimes install faster.
Sadly neither has an official lock file, so reproducing environments has not been solid with either tool. However, both have a mechanism to create a file of pinned packages.
Python is needed to install and run virtualenv, but conda already ships with Python. virtualenv creates environments using the same Python version it was installed with. conda allows you to create environments with nearly any Python version.
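For example, conda can provision the interpreter itself (env name illustrative):
> conda create --name py310 python=3.10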
See Also
virtualenvwrapper: global virtualenv
pyenv: manage python versions
mamba: "faster" conda
In my experience, conda fits well in a data science application and serves as a good general env tool. However in software development, dropping in local, ephemeral, lightweight environments with virtualenv might be convenient.
Installing Conda will enable you to create and remove Python environments as you wish, therefore providing you with the same functionality as virtualenv would.
With both tools you can create an isolated filesystem tree, where you can install and remove Python packages (probably with pip) as you wish. This might come in handy if you want to have different versions of the same library for different use cases, or you just want to try some distribution and remove it afterwards, conserving your disk space.
Differences:
License agreement. While virtualenv comes under the most liberal MIT license, Conda uses a 3-clause BSD license.
Conda provides you with its own package control system. This system often provides precompiled versions (for the most popular systems) of popular non-Python software, which can ease one's way to getting some machine-learning packages working. Namely, you don't have to compile optimized C/C++ code for your system. While this is a great relief for most of us, it might affect the performance of such libraries.
Unlike virtualenv, Conda duplicates some system libraries, at least on Linux systems. These libraries can get out of sync, leading to inconsistent behavior of your programs.
Verdict:
Conda is great and should be your default choice when starting out with machine learning. It will save you some time messing with gcc and numerous packages. Yet, Conda does not replace virtualenv. It introduces some additional complexity which might not always be desired, and it comes under a different license. You might want to avoid using conda in distributed environments or on HPC hardware.
Another new option and my current preferred method of getting an environment up and running is Pipenv
It is currently the officially recommended Python packaging tool from Python.org
Conda has a better API, no doubt. But I would like to touch upon the negatives of using conda, since conda has had its share of glory in the rest of the answers:
The "Solving environment" issue - one big thorn in the rear end of conda environments. As a remedy, you get advised not to use the conda-forge channel. But since it is the most prevalent channel, and some packages (not just trivial ones, even really important ones like pyspark) are exclusively available on conda-forge, you get cornered pretty fast.
Packing the environment is an issue
There are other known issues as well. virtualenv is an uphill journey but rarely a wall in the road. conda, on the other hand, IMO has these occasional hard walls where you just have to take a deep breath and use virtualenv.
1. No, if you're using conda, you don't need to use any other tool for managing virtual environments (such as venv, virtualenv, pipenv, etc.).
Maybe there's some edge case which conda doesn't cover but virtualenv (being more heavyweight) does, but I haven't encountered any so far.
2. Yes, not only can you still use pip, but you will probably have to. The conda package repository contains fewer packages than pip's does, so conda install will sometimes not be able to find the package you're looking for, more so if it's not a data-science package.
And, if I remember correctly, conda's repository isn't updated as fast/often as pip's, so if you want to use the latest version of a package, pip might once again be your only option.
Note: if the pip command isn't available within a conda virtual environment, you will have to install it first by running:
conda install pip
Yes, conda is a lot easier to install than virtualenv, and pretty much replaces the latter.
I work in a corporate environment, behind several firewalls, with a machine on which I have no admin access.
In my limited experience with Python (2 years) I have come across a few libraries (JayDeBeApi, sasl) which, when installed via pip, threw C++ dependency errors:
error: Microsoft Visual C++ 14.0 is required. Get it with "Microsoft Visual C++ Build Tools": http://landinghub.visualstudio.com/visual-cpp-build-tools
These installed fine with conda, so since then I have been working with conda environments.
However, it isn't easy to stop conda from installing dependencies inside C:\Program Files, where I don't have write access.