Anaconda and VirtualEnv - python

I have a virtualenv running python 2.7.7. It has a pretty extensive set of libraries which support a pretty complicated set of proprietary modules. In other words, the virtualenv needs to maintain its integrity. That is of course the whole point of virtualenv.
Recently, I encountered a number of problems that are very easily solved by using Anaconda. I tried it out in a test environment and it worked quite well. Now I'm tasked with incorporating this new configuration into production. It isn't clear to me how to incorporate Anaconda into a virtualenv, or whether this is even a good idea. In fact, it almost seems to me like I should use the Anaconda install as the new source and deconstruct the old virtualenv, merging the libraries it held into the conda installation.
Does anyone have a recommendation as to the best approach? If merging the environments is called for, can anyone point to an explanation of how to go about it?

It doesn't really make sense to merge Anaconda and a virtualenv, as Anaconda is a completely independent installation of Python. You can do it, typically by setting your PYTHONPATH, but things have a good chance of breaking when you do this sort of thing, and I would recommend against it.
If there are libraries in your virtualenv, you can use them with Anaconda by making conda packages for them. They may already have conda packages (search with conda search and search https://binstar.org/). Otherwise, you can build a package using a conda recipe. See http://conda.pydata.org/docs/build.html and https://github.com/conda/conda-recipes for some example recipes.

Related

Module or Incorrect Python Version Problem?

I'm installing a bunch of python modules on my system that are specific to this code I am going to be working with. Specifically, I used pip install pyda to get the pyda module. To make sure I had gotten all the modules, I went through and ran some of the code snippets, and came across the following error:
ModuleNotFoundError: No module named 'pyda.utilities'
I tried using pip install pyda.utilities, but that honestly doesn't make sense, it should have just come with the pyda module. According to this website https://pypi.org/project/pyda/ it seems like it should come with the package. I tried determining if I just installed it in the wrong python version, but I'm having a difficult time forcing it to use the specific python version that I installed the module in (specifically, I tried to create an alias for /usr/bin/python3.7 or something like this as I have seen on other websites, but it just fusses at me that this is simply a directory, incredibly unhelpful because I can't find the corresponding executable, so I'm a bit confused here).
This is a very long question likely with a very simple answer, any thoughts or help on what the issue might be would be appreciated.
Edit: I have determined that it's a package problem, not a python problem. The command 'pip install pyda' is not actually installing everything, oddly enough, which is why it cannot find the pyda.utilities module. Unfortunately, I think this means I will have to install the package manually. I will keep this question posted because of the useful answer on virtual environments, so thanks everyone.
The answer is indeed straightforward. As @Chris indicated in the comments, start using virtual environments.
It's not as complicated as it sounds, and there are plenty of tutorials on getting started with virtualenv for Python, like https://uoa-eresearch.github.io/eresearch-cookbook/recipe/2014/11/26/python-virtual-env/
The basic steps:
1. Check that you're using the version of Python you want in your virtual environment.
2. If you aren't, change directories to where that version lives.
3. Ensure you have pip and that it works.
4. Check whether you have virtualenv and, if you don't, run pip install virtualenv.
5. Create a virtual environment: virtualenv /your/env/folder/here
6. Activate the virtual environment: /your/env/folder/here/Scripts/activate on Windows, or source /your/env/folder/here/bin/activate on Linux and macOS.
After that, just install the packages you need with pip and they will end up in your virtual environment, with no interference from other Python versions or packages.
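The steps above can also be scripted. Here is a minimal sketch, assuming Python 3 and using the standard-library venv module (a close relative of virtualenv, not virtualenv itself). The helper name create_env and its arguments are made up for illustration:

```python
import sys
import venv
from pathlib import Path


def create_env(env_dir, with_pip=True):
    """Create a virtual environment in env_dir and return its scripts folder."""
    env_dir = Path(env_dir)
    # venv.create builds the environment from the interpreter running this code,
    # so the new environment uses that same Python version.
    venv.create(env_dir, with_pip=with_pip)
    # Windows keeps the activate scripts in Scripts/, Linux and macOS in bin/.
    scripts = "Scripts" if sys.platform == "win32" else "bin"
    return env_dir / scripts
```

Calling create_env("/your/env/folder/here") returns the folder that holds the activate script mentioned in the last step.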
Check your Python version. If it still does not work, restart your computer and try running python setup.py install from the command line.
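To check which interpreter is actually running and where it would import a module from, something like the following may help. It is only a sketch: locate_module and report are hypothetical helper names, and note that looking up a submodule such as pyda.utilities imports the parent package as a side effect.

```python
import importlib.util
import sys


def locate_module(name):
    """Return the file a module would be imported from, or None if it is not found."""
    try:
        spec = importlib.util.find_spec(name)
    except ModuleNotFoundError:
        # Raised when the parent package (e.g. "pyda" in "pyda.utilities") is missing.
        return None
    if spec is None:
        return None
    return spec.origin or "<namespace package>"


def report(names):
    """Build a short report: the interpreter in use, then one line per module."""
    lines = [
        "interpreter: {}".format(sys.executable),
        "version: {}".format(sys.version.split()[0]),
    ]
    for name in names:
        lines.append("{}: {}".format(name, locate_module(name) or "NOT FOUND"))
    return lines
```

Running print("\n".join(report(["pyda", "pyda.utilities"]))) with each Python on the machine shows whether the package landed in a different installation or is simply incomplete.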

Is it ok to use the anaconda distribution for web development?

I started learning Python working on projects involving data and following tutorials advising to install the anaconda bundle, to take advantage of the other libraries coming with it.
So I did, and I got comfortable with it; I liked the way it manages environments.
Over the last few months, I have been teaching myself web development with django and flask, continuing to use the Anaconda Python. Most of the time, though, I install the dependencies I need with pip install (inside the conda environment).
I've never seen any tutorials or podcasts mentioning the conda environment as an option for developing web apps, so I'm starting to get worried. Is there a good reason for that?
Everywhere, it's the combination of pip and virtualenv that prevails. And virtualenv isn't compatible with Anaconda, which has its own env management system.
My newbie question is: Will I run into problems later (dependencies management in production or deployment maybe?) using the anaconda distribution to develop my web apps?
Yes, albeit with a few caveats. First, I don't recommend using the big Anaconda distribution. I recommend installing Miniconda(3).
To set up the second caveat, it's important to figure out which part of Conda you are talking about using. Conda is two things: it has the functionality of both virtualenv (an environment manager) and pip (a package manager).
So you certainly can use Conda in place of virtualenv (an environment manager) and still use pip within that Conda environment as your package manager. Actually, this is my preference. Jake VanderPlas had a good comparison of virtualenv vs Conda as an environment manager. Conda has a more limited offering of packages, so I try to keep everything under one package manager (pip) within that environment. One problem I've found with virtualenv is that you can't choose a particular version of Python (e.g. 2.7, 3.3, 3.6), whereas Conda can seamlessly install that version of Python within your environment.
Here's a list of command comparisons of Conda, virtualenv, and pip if that helps clear things up a bit on how you can utilize Conda and/or virtualenv and/or pip.

Managing python modules 101

I am very confused about how to handle Python's modules. There are multiple ways of installing packages. I am currently using three ways.
1) Packages of the linux distribution
Currently, many of the most popular modules/packages, like ipython, can be installed via the distribution's package manager. This gives me system-wide access to the package. I don't have to do anything about my PATHs or user access rights. It just works, and until now it was my favorite method.
2) Pip or conda
As I started to use packages that are not that famous, I found they don't have a distribution package, so I have to obtain them elsewhere. Until now, if I couldn't find a package in my distribution, I could just use pip to install it. Conda is another option, though.
My Question:
What is the "best" approach? From the user's point of view, all the possibilities do exactly the same thing. I would like to use only one so I don't get confused; however, as I stated, not everything is available everywhere, so I am forced to use all of them right now. This is very annoying, especially in terms of updates, conflicts, user rights and access, and path variables, not to mention the Python 2.7 and 3.x "war".
So how do you do it? How do you maintain a system (i.e. a desktop) with so many different package managers? Do you stick to pip/conda only, installing everything for yourself (i.e. in your home directory)?
The Python documentation recommends pip for installing Python modules:
pip is the preferred installer program. Starting with Python 3.4, it is included by default with the Python binary installers.
For a more complete overview, you can check out the documentation on installing modules.
In terms of dealing with the conflicts you've mentioned, you should be using virtual environments, either with pyenv or virtualenv. Virtual environments allow you to use different modules or versions of modules for different projects. Using virtual environments also allows you to replicate that environment elsewhere, for instance, on a server.
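To illustrate the "replicate that environment elsewhere" part: in practice pip freeze is the usual tool, but the idea can be sketched in a few lines with the standard library. This assumes Python 3.8 or later (for importlib.metadata), and pinned_requirements is a made-up helper name:

```python
from importlib import metadata  # available from Python 3.8


def pinned_requirements():
    """Return sorted 'name==version' lines for packages visible to this interpreter."""
    pins = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip broken installs that lack a name
            pins.add("{}=={}".format(name, dist.version))
    return sorted(pins, key=str.lower)
```

Writing "\n".join(pinned_requirements()) to a requirements.txt file from inside an activated environment gives a list that pip install -r requirements.txt can consume on another machine.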

Does Conda replace the need for virtualenv?

I recently discovered Conda after I was having trouble installing SciPy, specifically on a Heroku app that I am developing.
With Conda you create environments, very similar to what virtualenv does. My questions are:
If I use Conda will it replace the need for virtualenv? If not, how do I use the two together? Do I install virtualenv in Conda, or Conda in virtualenv?
Do I still need to use pip? If so, will I still be able to install packages with pip in an isolated environment?
Conda replaces virtualenv. In my opinion it is better. It is not limited to Python but can be used for other languages too. In my experience it provides a much smoother experience, especially for scientific packages. The first time I got MayaVi properly installed on Mac was with conda.
You can still use pip. In fact, conda installs pip in each new environment. It knows about pip-installed packages.
For example:
conda list
lists all installed packages in your current environment.
Conda-installed packages show up like this:
sphinx_rtd_theme 0.1.7 py35_0 defaults
and the ones installed via pip have the <pip> marker:
wxpython-common 3.0.0.0 <pip>
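If you want to separate the two kinds of packages programmatically, a rough sketch could look like this. It only relies on the line format shown above. The "pypi" check is an assumption about how newer conda releases label pip installs, and split_by_installer is a hypothetical helper:

```python
def split_by_installer(conda_list_output):
    """Split `conda list` output into (conda_packages, pip_packages) name lists."""
    conda_pkgs, pip_pkgs = [], []
    for line in conda_list_output.splitlines():
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue  # skip blank lines and header comments
        # Assumption: pip installs carry "<pip>" (as shown above) or "pypi"
        # (newer conda releases) in one of the columns after the name.
        if "<pip>" in fields[1:] or "pypi" in fields[1:]:
            pip_pkgs.append(fields[0])
        else:
            conda_pkgs.append(fields[0])
    return conda_pkgs, pip_pkgs


sample = """\
sphinx_rtd_theme 0.1.7 py35_0 defaults
wxpython-common 3.0.0.0 <pip>
"""
print(split_by_installer(sample))
```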
Short answer is, you only need conda.
Conda effectively combines the functionality of pip and virtualenv in a single package, so you do not need virtualenv if you are using conda.
You would be surprised how many packages conda supports. If it is not enough, you can use pip under conda.
Here is a link to the conda page comparing conda, pip and virtualenv:
https://docs.conda.io/projects/conda/en/latest/commands.html#conda-vs-pip-vs-virtualenv-commands.
I use both and (as of Jan, 2020) they have some superficial differences that lend themselves to different usages for me. By default Conda prefers to manage a list of environments for you in a central location, whereas virtualenv makes a folder in the current directory. The former (centralized) makes sense if you are e.g. doing machine learning and just have a couple of broad environments that you use across many projects and want to jump into them from anywhere. The latter (per project folder) makes sense if you are doing little one-off projects that have completely different sets of lib requirements that really belong more to the project itself.
The empty environment that Conda creates is about 122MB whereas the virtualenv's is about 12MB, so that's another reason you may prefer not to scatter Conda environments around everywhere.
Finally, another superficial indication that Conda prefers its centralized envs is that (again, by default) if you do create a Conda env in your own project folder and activate it the name prefix that appears in your shell is the (way too long) absolute path to the folder. You can fix that by giving it a name, but virtualenv does the right thing by default.
I expect this info to become stale rapidly as the two package managers vie for dominance, but these are the trade-offs as of today :)
EDIT: I reviewed the situation again in 04/2021 and it is unchanged. It's still awkward to make a local directory install with conda.
Virtual Environments and pip
I will add that creating and removing conda environments is simple with Anaconda.
> conda create --name <envname> python=<version> <optional dependencies>
> conda remove --name <envname> --all
In an activated environment, install packages via conda or pip:
(envname)> conda install <package>
(envname)> pip install <package>
These environments are strongly tied to conda's pip-like package management, so it is simple to create environments and install both Python and non-Python packages.
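If you drive these commands from a script, it can help to build the argument lists in one place. This is just a sketch that mirrors the commands above without running them. The function names and the "demo" environment are made up, and shlex.join needs Python 3.8 or later:

```python
import shlex


def conda_create_args(env_name, python_version, dependencies=()):
    """Build the argument list for `conda create`, without running it."""
    return ["conda", "create", "--name", env_name,
            "python={}".format(python_version), *dependencies]


def conda_remove_args(env_name):
    """Build the argument list for removing a whole environment."""
    return ["conda", "remove", "--name", env_name, "--all"]


# Hypothetical environment name and packages, for illustration only.
print(shlex.join(conda_create_args("demo", "3.9", ["numpy", "pandas"])))
print(shlex.join(conda_remove_args("demo")))
```

The lists can be passed to subprocess.run(args, check=True) when you actually want to execute them.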
Jupyter
In addition, installing ipykernel in an environment adds a new listing in the Kernels dropdown menu of Jupyter notebooks, extending reproducible environments to notebooks. As of Anaconda 4.1, nbextensions were added, making it easier to add extensions to notebooks.
Reliability
In my experience, conda is faster and more reliable at installing large libraries such as numpy and pandas. Moreover, if you wish to transfer your preserved state of an environment, you can do so by sharing or cloning an env.
Comparisons
A non-exhaustive, quick look at features from each tool:
Feature        virtualenv   conda
Global         n            y
Local          y            n
PyPI           y            y
Channels       n            y
Lock File      n            n
Multi-Python   n            y
Description
virtualenv creates project-specific, local environments usually in a .venv/ folder per project. In contrast, conda's environments are global and saved in one place.
PyPI works with both tools through pip, but conda can add additional channels, which can sometimes install faster.
Sadly neither has an official lock file, so reproducing environments has not been solid with either tool. However, both have a mechanism to create a file of pinned packages.
Python is needed to install and run virtualenv, but conda already ships with Python. virtualenv creates environments using the same Python version it was installed with. conda allows you to create environments with nearly any Python version.
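When both tools are installed, it is not always obvious which kind of environment a given interpreter belongs to. A rough heuristic, assuming that conda environments keep a conda-meta directory at their root (environment_kind is a made-up helper name):

```python
import os
import sys


def environment_kind():
    """Guess whether this interpreter runs in a conda env, a virtualenv/venv, or neither."""
    # Assumption: conda environments (including the base install) contain "conda-meta".
    if os.path.isdir(os.path.join(sys.prefix, "conda-meta")):
        return "conda"
    # venv and recent virtualenv set sys.base_prefix; older virtualenv sets sys.real_prefix.
    base = getattr(sys, "real_prefix", None) or getattr(sys, "base_prefix", sys.prefix)
    if base != sys.prefix:
        return "virtualenv"
    return "system"


print(environment_kind(), sys.prefix)
```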
See Also
virtualenvwrapper: global virtualenv
pyenv: manage python versions
mamba: "faster" conda
In my experience, conda fits well in a data science application and serves as a good general env tool. However in software development, dropping in local, ephemeral, lightweight environments with virtualenv might be convenient.
Installing Conda will enable you to create and remove Python environments as you wish, therefore providing you with the same functionality as virtualenv would.
With both tools you are able to create an isolated filesystem tree where you can install and remove Python packages (probably with pip) as you wish. This might come in handy if you want to have different versions of the same library for different use cases, or if you just want to try some distribution and remove it afterwards to conserve disk space.
Differences:
License agreement. While virtualenv comes under the very liberal MIT license, Conda uses the 3-clause BSD license.
Conda provides you with its own package control system. This package control system often provides precompiled versions (for most popular systems) of popular non-Python software, which can ease one's way to getting some machine learning packages working. Namely, you don't have to compile optimized C/C++ code for your system. While it is a great relief for most of us, it might affect the performance of such libraries.
Unlike virtualenv, Conda duplicates some system libraries, at least on Linux systems. These libraries can get out of sync, leading to inconsistent behavior of your programs.
Verdict:
Conda is great and should be your default choice when starting out with machine learning. It will save you some time messing with gcc and numerous packages. Yet, Conda does not replace virtualenv. It introduces some additional complexity which might not always be desired. It comes under a different license. You might want to avoid using conda in distributed environments or on HPC hardware.
Another new option and my current preferred method of getting an environment up and running is Pipenv
It is currently the officially recommended Python packaging tool from Python.org
Conda has a better API no doubt. But, I would like to touch upon the negatives of using conda since conda has had its share of glory in the rest of the answers:
Solving environment Issue - One big thorn in the rear end of conda environments. As a remedy, you get advised to not use conda-forge channel. But, since it is the most prevalent channel and some packages (not just trivial ones, even really important ones like pyspark) are exclusively available on conda-forge you get cornered pretty fast.
Packing the environment is an issue
There are other known issues as well. virtualenv is an uphill journey, but rarely a wall on the road. conda, on the other hand, IMO has these occasional hard walls where you just have to take a deep breath and use virtualenv.
1. No, if you're using conda, you don't need to use any other tool for managing virtual environments (such as venv, virtualenv, pipenv etc).
Maybe there's some edge case which conda doesn't cover but virtualenv (being more heavyweight) does, but I haven't encountered any so far.
2. Yes, not only can you still use pip, but you will probably have to. The conda package repository contains fewer packages than pip's does, so conda install will sometimes not be able to find the package you're looking for, more so if it's not a data-science package.
And, if I remember correctly, conda's repository isn't updated as fast/often as pip's, so if you want to use the latest version of a package, pip might once again be your only option.
Note: if the pip command isn't available within a conda virtual environment, you will have to install it first, by hitting:
conda install pip
Yes, conda is a lot easier to install than virtualenv, and pretty much replaces the latter.
I work in a corporate environment, behind several firewalls, on a machine where I have no admin access.
In my limited experience with Python (2 years) I have come across a few libraries (JayDeBeApi, sasl) which threw C++ dependency errors when installed via pip:
error: Microsoft Visual C++ 14.0 is required. Get it with "Microsoft Visual C++ Build Tools": http://landinghub.visualstudio.com/visual-cpp-build-tools
These installed fine with conda, so since then I have been working with conda environments.
However, it isn't easy to stop conda from installing dependencies inside C:\Program Files, where I don't have write access.

Managing Python installations

There are many versions of Python, and it becomes difficult to manage them all.
Often I need to install one module into 3 different versions of Python.
Is there a tool which can simplify things?
I'm on Windows.
Thanks.
Are you using virtualenv? If not, you definitely want to check that out: http://pypi.python.org/pypi/virtualenv
It helps you by managing and switching between several virtual Python environments, with different versions of Python if you want to.
There are loads of tutorials of how to set it up, all over dem interweb.
What Legogris said: use virtualenv.
I just answered a question on pip, virtualenv, and virtualenvwrapper applicable here. I highly recommend this combination of tools for maintaining isolated python environments.
As a further point, I strongly recommend using the --no-site-packages option so that each virtualenv has all its requirements in one place.
Because some modules contain binary code which is linked against a specific Python version, it will not be possible to install a module only once. You will always have to install it for each installed version. But if you use pip, you should have a look at pip: dealing with multiple Python versions? Just create a batch file which calls pip for each installed version. That should at least simplify your life.
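The batch-file idea can also be written as a small Python script. This is a sketch under assumptions: the interpreter paths you pass in are your own, the helper names are made up, and running pip through python -m pip ties each install to that specific interpreter.

```python
import subprocess


def pip_install_commands(package, interpreters):
    """Return one `<python> -m pip install <package>` command per interpreter."""
    return [[python, "-m", "pip", "install", package] for python in interpreters]


def install_everywhere(package, interpreters, dry_run=True):
    """Run (or, by default, only print) the install command for each interpreter."""
    for command in pip_install_commands(package, interpreters):
        if dry_run:
            print(" ".join(command))
        else:
            subprocess.run(command, check=True)
```

For example, install_everywhere("requests", [r"C:\Python27\python.exe", r"C:\Python39\python.exe"]) prints the two commands it would run. The paths here are hypothetical, and passing dry_run=False executes the commands.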
I'm not aware of any Python facility for doing that, that's really the OS's job. Debian/Ubuntu, for example, has support for installing multiple versions of Python and installing libraries into each version. I doubt there's any such support in Windows.
