Setting packages version using pip in a conda environment

Setting packages version using pip in a conda environment - python

Background:
I've been working on a server last year as a part of a student project. Now the server is being changed and unfortunately, a piece of code that I'd written is failed to pass a test. I don't have access to the server (because of security issues) but the good news is I have the scripts as well as the packages (and their version) which are installed on the new server. So I've decided to install all the packages on the server on my own local machine in an environment to imitate the server.
Problem:
Since not all the packages are available with conda (or channels need to be specified) I decided to make a conda environment and install all the packages with pip. Once I create the environment, conda installs a bunch of packages (like wheel,pip,ipython, certifi, etc.). But I need a certain (older) version of them. So I tried to override them by pip install -U <package_name>==<version>. But for some packages, I ended up with two different versions: one the default version installed by conda at the environment creation, and the other one with pip. And now, when I run my code, I'm not sure which version of those packages are being used. So I want to keep only one version: the one which is installed via pip!
Question(s):
How can I uninstall the packages that are installed with conda while keeping the pip ones? I used conda remove <package_name> with the expectation that only conda-installed packages would be wiped, yet that wasn't the case and both were gone for good!
How to override a conda-installed version via pip? I know conda is made to avoid this action to protect the environment from getting messy, but in my case it leads me to another question,
How in the first place, can I make the environment exactly the same as the server so that I don't deal with conda/pip packages any more?

Related

How to install python packages in a virtual environment without downloading them again?

It's a great hassle when installing some packages in a VE and conda or pip downloads them again even when I already have it in my base environment. Since I have limited internet bandwidth and I'm assuming I'll work with many different VE's, it will take a lot of time to download basic packages such as OpenCV/Tensorflow.

By default, pip caches anything it downloads, and will used the cached version whenever possible. This cache is shared between your base environment and all virtual environments. So unless you pass the --no-cache-dir option, pip downloading a package means it has not previously downloaded a compatible version of that package. If you already have that package installed in your base environment or another virtual environment and it downloads it anyway, this probably means one or more of the following is true:
You installed your existing version with a method other than pip.
There is a newer version available, and you didn't specify, for example, pip install pandas=1.1.5 (if that's the version you already have elsewhere). Pip will install the newest compatible version for your environment, unless you tell it otherwise.
The VE you're installing to is a different Python version (e.g. created with Pyenv), and needs a different build.
I'm less familiar with the specifics of conda, and I can't seem to find anything in its online docs that focuses on the default caching behavior. However, a how-to for modifying the cache location seems to assume that the default behavior is similar to how pip works. Perhaps someone else with more Anaconda experience can chime in as well.
So except for the caveats above, as long as you're installing a package with the same method you did last time, you shouldn't have to download anything.
If you want to simplify the process of installing all the same packages (that were installed via pip) in a new VE that you already have in another environment, pip can automate that too. Run pip freeze > requirements.txt in the first environment, and copy the resulting file to your newly created VE. There, run pip install -r requirements.txt and pip will install all the packages that were installed (via pip) in the first environment. (Note that pip freeze records version numbers as well, so this won't install newer versions that may be available -- whether this is a good or bad thing depends on your needs.)

How can we conda install packages have been downloaded successfully, ignoring download failed packages?

everyone:
Because of the speed of network, when I conda install some packages, there will exist some related packages can not be downloaded completely. But we can not install packages have been downloaded successfully without other "related" packages(maybe "related" means the best march in version, but not necessary).
For example, When I install pytorch, it need numpy-1.14.2, but I am with numpy-1.15.1. I don't need verson 1.14.2 numpy in practice.
So I am a little confused how to make "conda" trying to install packages have been downloaded successfully, ignoring download failed packages?
Thanks!
EricKani

From the conda documentation there are two options that may help https://docs.conda.io/projects/conda/en/latest/commands/install.html
--no-update-deps
Do not update or change already-installed dependencies.
--no-deps
Do not install, update, remove, or change dependencies. This WILL lead to broken environments and inconsistent behavior. Use at your own
risk.
I believe by default conda tries with --no-update-deps first and then if that fails tries to update deps; giving it that option will make sure some version of each needed package is installed, if not necessarily the latest.
You could try --no-deps as well, which will literally prevent conda from installinh ANYTHING other than the exact packages you tell it to, but things may not work with that.

What is the difference between <package name> and python-<package name>?

I want to downgrade tornado to a previous version because the new one causes an error according to the answers here: Jupyter notebook kernel not connecting. I am on Ubuntu, in a virtual environment.
To check the current version of it, I used pip freeze and got this: tornado==6.0.1. When I use apt-cache policy tornado the output is: "Unable to locate package tornado". When I type apt-cache policy python-tornado the output is "python-tornado: Installed: 4.5.3-1".
How do I proceed from here? My ultimate goal is to make the jupyter notebook run, and I need to figure out this tornado module for that. What is the difference between tornado and python-tornado? Which one I should care about?

One of those names is the actual package name under which it is published to the Python Package Index (PyPI), which is the namespace that pip deals in.
The other is the name as set by your Ubuntu operating system, and given the version string, I am guessing that you are using Ubuntu 18.04 Bionic Beaver. Ubuntu uses a strict naming convention, where all Python packages must start with a python- prefix. These packages are managed and installed by your OS package manager.
How to proceed depends on your Jupyter setup. If it is installed and running from a virtualenv, then you can use the pip command when the virtualenv is active to alter versions there. Take into account that using pip should already ensure you are getting compatible versions installed; you could try to upgrade jupyter if tornado was upgraded independently.
If you are using the Ubuntu-managed jupyter package then there too the package manager should take care of matching versions.
However, if you you are using a virtualenv that still has access to the OS-mananged jupyter system while locally only tornado is installed, then you want to add jupyter to your virtualenv to mask the system version, which is too old.

Is it ok to use the anaconda distribution for web development?

I started learning Python working on projects involving data and following tutorials advising to install the anaconda bundle, to take advantage of the other libraries coming with it.
So did I and I got confortable with it, I liked the way it manages environments.
During the last months, I have been teaching myself web development with django, flask, continuing to use the anaconda python. But most of the time I install the dependencies I need with pip install though (inside the conda environment).
I've never seen any tutorials or podcasts mentioning the conda environment as an option for developing web apps so I start to get worried. Is it for a good reason?
Everywhere its the combination of pip and virtualenv that prevail. And virtualenv isn't compatible with anaconda that has its own env management system.
My newbie question is: Will I run into problems later (dependencies management in production or deployment maybe?) using the anaconda distribution to develop my web apps?

Yes. Albeit, with a few caveats. First, I don't recommend using the big Anaconda distribution. I recommend installing Miniconda(3) (link).
To set up the second caveat, it's important to figure out what part of Conda you are talking about using. Conda is two things, that is, it is has both the functionality of virtualenv (an environment manager) and pip (a package manager).
So you certainly can use Conda in place of virtualenv (an environment manager) and still use pip within that Conda environment as your package manager. Actually this is my preference. Jake VanderPlas had a good comparison of virtualenv vs Conda as an environment manager. Conda has a more limited offering of packages thus I try to keep everything as one package manager (pip) within that environment. One problem I've found with virtualenv is you can't choose any particular flavor of Python, e.g. 2.7, 3.3, 3.6, etc like you can seemlessly install that version of Python within your environment with Conda.
Here's a list of command comparisons of Conda, virtualenv, and pip if that helps clear things up a bit on how you can utilize Conda and/or virtualenv and/or pip.

Does Conda replace the need for virtualenv?

I recently discovered Conda after I was having trouble installing SciPy, specifically on a Heroku app that I am developing.
With Conda you create environments, very similar to what virtualenv does. My questions are:
If I use Conda will it replace the need for virtualenv? If not, how do I use the two together? Do I install virtualenv in Conda, or Conda in virtualenv?
Do I still need to use pip? If so, will I still be able to install packages with pip in an isolated environment?

Conda replaces virtualenv. In my opinion it is better. It is not limited to Python but can be used for other languages too. In my experience it provides a much smoother experience, especially for scientific packages. The first time I got MayaVi properly installed on Mac was with conda.
You can still use pip. In fact, conda installs pip in each new environment. It knows about pip-installed packages.
For example:
conda list
lists all installed packages in your current environment.
Conda-installed packages show up like this:
sphinx_rtd_theme 0.1.7 py35_0 defaults
and the ones installed via pip have the <pip> marker:
wxpython-common 3.0.0.0 <pip>

Short answer is, you only need conda.
Conda effectively combines the functionality of pip and virtualenv in a single package, so you do not need virtualenv if you are using conda.
You would be surprised how many packages conda supports. If it is not enough, you can use pip under conda.
Here is a link to the conda page comparing conda, pip and virtualenv:
https://docs.conda.io/projects/conda/en/latest/commands.html#conda-vs-pip-vs-virtualenv-commands.

I use both and (as of Jan, 2020) they have some superficial differences that lend themselves to different usages for me. By default Conda prefers to manage a list of environments for you in a central location, whereas virtualenv makes a folder in the current directory. The former (centralized) makes sense if you are e.g. doing machine learning and just have a couple of broad environments that you use across many projects and want to jump into them from anywhere. The latter (per project folder) makes sense if you are doing little one-off projects that have completely different sets of lib requirements that really belong more to the project itself.
The empty environment that Conda creates is about 122MB whereas the virtualenv's is about 12MB, so that's another reason you may prefer not to scatter Conda environments around everywhere.
Finally, another superficial indication that Conda prefers its centralized envs is that (again, by default) if you do create a Conda env in your own project folder and activate it the name prefix that appears in your shell is the (way too long) absolute path to the folder. You can fix that by giving it a name, but virtualenv does the right thing by default.
I expect this info to become stale rapidly as the two package managers vie for dominance, but these are the trade-offs as of today :)
EDIT: I reviewed the situation again in 04/2021 and it is unchanged. It's still awkward to make a local directory install with conda.

Virtual Environments and pip
I will add that creating and removing conda environments is simple with Anaconda.
> conda create --name <envname> python=<version> <optional dependencies>
> conda remove --name <envname> --all
In an activated environment, install packages via conda or pip:
(envname)> conda install <package>
(envname)> pip install <package>
These environments are strongly tied to conda's pip-like package management, so it is simple to create environments and install both Python and non-Python packages.
Jupyter
In addition, installing ipykernel in an environment adds a new listing in the Kernels dropdown menu of Jupyter notebooks, extending reproducible environments to notebooks. As of Anaconda 4.1, nbextensions were added, adding extensions to notebooks more easily.
Reliability
In my experience, conda is faster and more reliable at installing large libraries such as numpy and pandas. Moreover, if you wish to transfer your preserved state of an environment, you can do so by sharing or cloning an env.
Comparisons
A non-exhaustive, quick look at features from each tool:
Feature
virtualenv
conda
Global
n
y
Local
y
n
PyPI
y
y
Channels
n
y
Lock File
n
n
Multi-Python
n
y
Description
virtualenv creates project-specific, local environments usually in a .venv/ folder per project. In contrast, conda's environments are global and saved in one place.
PyPI works with both tools through pip, but conda can add additional channels, which can sometimes install faster.
Sadly neither has an official lock file, so reproducing environments has not been solid with either tool. However, both have a mechanism to create a file of pinned packages.
Python is needed to install and run virtualenv, but conda already ships with Python. virtualenv creates environments using the same Python version it was installed with. conda allows you to create environments with nearly any Python version.
See Also
virtualenvwrapper: global virtualenv
pyenv: manage python versions
mamba: "faster" conda
In my experience, conda fits well in a data science application and serves as a good general env tool. However in software development, dropping in local, ephemeral, lightweight environments with virtualenv might be convenient.

Installing Conda will enable you to create and remove python environments as you wish, therefore providing you with same functionality as virtualenv would.
In case of both distributions you would be able to create an isolated filesystem tree, where you can install and remove python packages (probably, with pip) as you wish. Which might come in handy if you want to have different versions of same library for different use cases or you just want to try some distribution and remove it afterwards conserving your disk space.
Differences:
License agreement. While virtualenv comes under most liberal MIT license, Conda uses 3 clause BSD license.
Conda provides you with their own package control system. This package control system often provides precompiled versions (for most popular systems) of popular non-python software, which can easy ones way getting some machine learning packages working. Namely you don't have to compile optimized C/C++ code for you system. While it is a great relief for most of us, it might affect performance of such libraries.
Unlike virtualenv, Conda duplicating some system libraries at least on Linux system. This libraries can get out of sync leading to inconsistent behavior of your programs.
Verdict:
Conda is great and should be your default choice while starting your way with machine learning. It will save you some time messing with gcc and numerous packages. Yet, Conda does not replace virtualenv. It introduces some additional complexity which might not always be desired. It comes under different license. You might want to avoid using conda on a distributed environments or on HPC hardware.

Another new option and my current preferred method of getting an environment up and running is Pipenv
It is currently the officially recommended Python packaging tool from Python.org

Conda has a better API no doubt. But, I would like to touch upon the negatives of using conda since conda has had its share of glory in the rest of the answers:
Solving environment Issue - One big thorn in the rear end of conda environments. As a remedy, you get advised to not use conda-forge channel. But, since it is the most prevalent channel and some packages (not just trivial ones, even really important ones like pyspark) are exclusively available on conda-forge you get cornered pretty fast.
Packing the environment is an issue
There are other known issues as well. virtualenv is an uphill journey but, rarely a wall on the road. conda on the other hand, IMO, has these occasional hard walls where you just have to take a deep breath and use virtualenv

1.No, if you're using conda, you don't need to use any other tool for managing virtual environments (such as venv, virtualenv, pipenv etc).
Maybe there's some edge case which conda doesn't cover but virtualenv (being more heavyweight) does, but I haven't encountered any so far.
2.Yes, not only can you still use pip, but you will probably have to. The conda package repository contains less than pip's does, so conda install will sometimes not be able to find the package you're looking for, more so if it's not a data-science package.
And, if I remember correctly, conda's repository isn't updated as fast/often as pip's, so if you want to use the latest version of a package, pip might once again be your only option.
Note: if the pip command isn't available within a conda virtual environment, you will have to install it first, by hitting:
conda install pip

Yes, conda is a lot easier to install than virtualenv, and pretty much replaces the latter.

I work in corporate, behind several firewall with machine on which I have no admin acces
In my limited experience with python (2 years) i have come across few libraries (JayDeBeApi,sasl) which when installing via pip threw C++ dependency errors
error: Microsoft Visual C++ 14.0 is required. Get it with "Microsoft Visual C++ Build Tools": http://landinghub.visualstudio.com/visual-cpp-build-tools
these installed fine with conda, hence since those days i started working with conda env.
however it isnt easy to stop conda from installing dependency inside c.programfiles where i dont have write access.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.