Scikit-learn - installing development version (0.20)

I currently have scikit-learn 0.19 installed. I'd like to test my code using the latest development version as there seems to be a fix for Incremental PCA.
How do I go about installing this new version if I've previously installed scikit-learn using anaconda?
Also, how would I revert back to the stable release in the event that 0.20 does not solve my problem?
I am in need of some hand holding here, as I've read the docs on the website and am not sure I completely understand the process (especially being able to revert back to the stable version if needed).

The whole point of the Anaconda Python distribution (apart from the convenience of having a bunch of useful packages included) is that you get the conda environment manager, which exists to meet exactly this sort of requirement.
What you want to do is to create a new conda environment by launching the Anaconda prompt and typing
conda create -n myenv scikit-learn other-package other-package2 etc
where myenv is the name you want to give the new environment and other-package other-package2 etc are the names of any other packages you will want to use (import) in your code. conda will figure out any dependencies of these packages and show you a list of what is going to be installed before it proceeds.
If you want to specify that a package should be a particular version, add that to the package name e.g. other-package=1.1.0, otherwise conda will install the latest versions of each package that are mutually compatible. You can also specify a particular version of Python by including it in the package list, e.g. python=3.4. You can check what versions of a package are available with conda search package-name (where package-name is the name of the package you want, obviously).
To run your code in the newly created environment, first activate the environment at the Anaconda prompt. If you use the Spyder IDE, launch it after activating the correct environment, or use the start menu shortcut specific to that environment if you have one. Other IDEs may have their own method of selecting a specific environment to work in.
To revert to the version(s) you were using before, activate the environment containing those versions. If you've never created a new environment before, that will be root (named base in conda 4.4 and later).
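A quick way to confirm which scikit-learn version an activated environment actually provides is to ask Python itself. This is only a minimal sketch: installed_version is a hypothetical helper name, not part of conda or scikit-learn, and it assumes Python 3.8 or newer (for importlib.metadata).

```python
from importlib import metadata  # standard library since Python 3.8


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent.

    `installed_version` is a hypothetical helper for illustration only.
    """
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Run this once per environment (after activating it) and compare.
    print("scikit-learn:", installed_version("scikit-learn"))
```

Running it in the old environment and again in the new one should show two different versions if the environments were set up as intended.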

Just in case someone comes here looking for a solution without conda:
The website recommends that you download the latest code via
git clone https://github.com/scikit-learn/scikit-learn.git
and then include it in pip via (after changing to the directory)
pip install --editable .
You can also add the --user flag to have pip install to a local directory. Then, uninstalling should be as easy as pip uninstall scikit-learn.
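After an editable install it can be worth checking that Python really imports the package from your git checkout rather than from an older copy in site-packages. The following is a rough sketch under the assumption that you only query top-level module names; module_origin is a made-up helper name.

```python
import importlib.util


def module_origin(module_name):
    """Return the file a top-level module would be imported from, or None.

    `module_origin` is a hypothetical helper; it does not import the module,
    it only asks the import system where the module would come from.
    """
    spec = importlib.util.find_spec(module_name)
    return None if spec is None else spec.origin


if __name__ == "__main__":
    # With an editable install this path should point into the cloned repository.
    print("sklearn is loaded from:", module_origin("sklearn"))
```

If the printed path still points into site-packages, the editable install is probably not the one being picked up.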

Related

why does pip export with a different version on subsequent conda env exports?

I have been trying to establish a pre-commit git hook to detect environment changes and create a new env.yml export automatically ... similar to the ones described here
Where I am having trouble is that the git hook is detecting an environment change with the pip package on every run of the pre-commit file. Is this possibly related to some scripts using different versions of pip?
If so, I don't understand why the same version isn't being exported every time I run conda env export > env.yml. It almost seems like it is randomly toggling between versions ... but I know there must be some rationale

conda and pip each keep their own version of every package you install with them (provided that you have installed a given package using both). Anaconda (if that's what you're using) is also known for giving plenty of headaches, even in simple cases, when you pip install something instead of using conda install and start mixing dependencies installed with either of those. The general advice is to be very careful about staying consistent within each environment. In my personal experience, anaconda always tries to superimpose itself by breaking dependencies managed by pip. In short, if you are using a conda env, make sure that you're using the dependencies installed by conda and conda only.
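To see how mixed an environment already is, you can look at the optional INSTALLER file that installers may write into a package's metadata. Treat the sketch below as a hint only: not every package records an installer, group_by_installer is a hypothetical helper name, and Python 3.8 or newer is assumed.

```python
from collections import defaultdict
from importlib import metadata


def group_by_installer():
    """Group installed distributions by the tool recorded in their INSTALLER file.

    Hypothetical helper for illustration; distributions without an INSTALLER
    record end up under the key "unknown".
    """
    groups = defaultdict(list)
    for dist in metadata.distributions():
        installer = (dist.read_text("INSTALLER") or "").strip() or "unknown"
        try:
            name = dist.metadata["Name"]
        except Exception:  # broken or missing metadata; keep going
            name = None
        groups[installer].append(name or "<unnamed>")
    return {installer: sorted(names) for installer, names in groups.items()}


if __name__ == "__main__":
    for installer, names in group_by_installer().items():
        print(installer, len(names))
```

A long list under "pip" inside a conda environment is a sign that the two tools are both managing packages there.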

Mark package as manually installed in anaconda virtualenv (miniconda)

I've had to install a package with pip in a conda environment to get it to work for my application (link).
The package works fine. However, every time I modify the virtual environment in any way, conda tries to install the "missing" package - which would effectively result in downgrading it.
Question: is there a way to mark the pip package as 'manually installed' in the conda venv (e.g. in the same way apt-mark would handle it)? The intention is to get miniconda to leave it alone while still handling the remaining dependencies for the desired additional package.
The pip installed package indeed shows up when typing conda list, with Channel "pypi".
Can give any additional information if needed.
Thanks in advance for any help.

What is the relationship between a python virtual environment and specific system libraries?

We have an application which does some of its work in Python in a python virtual environment setup using virtualenv.
We've hit a problem where the version of a system library does not match the version installed in the virtual environment. That is, we have NetCDF4 installed into the virtual environment and previously had libnetcdf.so.7 installed through yum. The python package appears to be dependent on having libnetcdf.so.7 available.
Due to a system update libnetcdf.so.7 no longer exists and has been replaced by libnetcdf.so.11.
So the question is this: Does setting up the virtual environment detect the system library version or is there some other mechanism? Also do we need to re-build the environment to fix this or is there another option?

When you use virtualenv to create a virtual environment you have the option of whether or not to include the standard site packages as part of the environment. Since excluding them is now the default behaviour (though it can be asserted by using --no-site-packages in the command line) it's possible that you are using an older version of virtualenv that doesn't insist on this.
In that case you should be able to re-create the environment fairly easily. First of all capture the currently-installed packages in the existing environment with the command
pip freeze > /tmp/requirements.txt
Then delete the virtual environment, and re-create it with the following commands:
virtualenv --no-site-packages envname
source envname/bin/activate
pip install -r /tmp/requirements.txt
However none of this addresses the tricky issue of not having the required support libraries installed. You might try creating a symbolic link to the new library from the old library's position - it may be that NetCDF4 can work with multiple versions of libnetCDF and is simply badly configured to use a specific version. If not then solving this issue might turn out to be long and painful.
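Before and after trying something like the symbolic link, you can check from inside the virtual environment whether the dynamic loader can actually find a given library file. This is a small sketch using ctypes from the standard library; soname_loads is a hypothetical helper, and the library names in the example are simply the ones from the question.

```python
import ctypes
import ctypes.util


def soname_loads(soname):
    """Return True if the dynamic loader can load the given shared library name."""
    try:
        ctypes.CDLL(soname)
    except OSError:
        return False
    return True


if __name__ == "__main__":
    # What the loader finds for the generic name (may be None if nothing is found).
    print("netcdf resolves to:", ctypes.util.find_library("netcdf"))
    # Whether the two specific versions from the question are loadable.
    for name in ("libnetcdf.so.7", "libnetcdf.so.11"):
        print(name, "loadable:", soname_loads(name))
```

Note that a library being loadable does not prove the Python package is binary compatible with it; it only tells you whether the file can be found and opened.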

anaconda update all possible packages?

I tried conda search --outdated and there are lots of outdated packages; for example, scipy is 0.17.1 but the latest is 0.18.0. However, when I run conda update --all, it does not update any packages.
update 1
conda update --all --alt-hint
Fetching package metadata .......
Solving package specifications: ..........
# All requested packages already installed.
# packages in environment at /home/user/opt/anaconda2:
#
update 2
I can update those packages separately. I can do conda update scipy. But why can't I update all of them in one go?

TL;DR: dependency conflicts: updating one package requires (by its requirements) downgrading another
You are right:
conda update --all
is actually the way to go (see footnote 1). Conda always tries to upgrade the packages to the newest version in the series (say Python 2.x or 3.x).
Dependency conflicts
But it is possible that there are dependency conflicts (which prevent a further upgrade). Conda usually warns very explicitly if they occur.
e.g. X requires Y <5.0, so Y will never be >= 5.0
That's why you 'cannot' upgrade them all.
Resolving
Update 1: for a while now, mamba has proven to be an extremely powerful drop-in replacement for conda in terms of dependency resolution and (in my humble experience) finds solutions to problems where conda fails. A way to invoke it without installing mamba is via the --solver=libmamba flag (requires conda-libmamba-solver), as pointed out by matteo in the comments.
To add: maybe it could work but a newer version of X working with Y > 5.0 is not available in conda. It is possible to install with pip, since more packages are available in pip. But be aware that pip also installs packages if dependency conflicts exist and that it usually breaks your conda environment in the sense that you cannot reliably install with conda anymore. If you do that, do it as a last resort and after all packages have been installed with conda. It's rather a hack.
A safe way you can try is to add conda-forge as a channel when upgrading (add -c conda-forge as a flag) or any other channel you find that contains your package if you really need this new version. This way conda also searches in these places for available packages.
Considering your update: You can upgrade them each separately, but doing so will not only include an upgrade but also a downgrade of another package as well. Say, to add to the example above:
X > 2.0 requires Y < 5.0, X < 2.0 requires Y > 5.0
So upgrading Y > 5.0 implies downgrading X to < 2.0 and vice versa.
(this is a pedagogical example, of course, but it's the same in reality, usually just with more complicated dependencies and sub-dependencies)
So you still cannot upgrade them all by doing the upgrades separately; the dependencies are just not satisfiable, so sooner or later an upgrade will downgrade an already upgraded package again. Or it will break the compatibility of the packages (which you usually don't want!), which is only possible by explicitly invoking an ignore-dependencies and force command. But that is only to hack your way around issues, definitely not the normal-user case!
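The pedagogical X/Y example can be checked by brute force. The version numbers below are invented purely for illustration (two candidate versions per package), and this is of course not how conda's solver works internally; it only shows that no consistent combination contains the newest version of both packages.

```python
from itertools import product

# Hypothetical candidate versions, chosen to straddle the limits in the example.
X_VERSIONS = [1.0, 3.0]
Y_VERSIONS = [4.0, 6.0]


def compatible(x, y):
    """Encode: X > 2.0 requires Y < 5.0, X < 2.0 requires Y > 5.0."""
    if x > 2.0:
        return y < 5.0
    if x < 2.0:
        return y > 5.0
    return True


def consistent_pairs():
    """Return every (X, Y) combination that satisfies the constraints."""
    return [(x, y) for x, y in product(X_VERSIONS, Y_VERSIONS) if compatible(x, y)]


if __name__ == "__main__":
    print(consistent_pairs())  # [(1.0, 6.0), (3.0, 4.0)]
    newest = (max(X_VERSIONS), max(Y_VERSIONS))
    print("newest of both possible:", newest in consistent_pairs())  # False
```

Whichever package you upgrade first, the only consistent states pair the newest version of one with the older version of the other.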
1 If you actually want to update the packages of your installation, which you usually don't. The command run in the base environment will update the packages in this, but usually you should work with virtual environments (conda create -n myenv and then conda activate myenv). Executing conda update --all inside such an environment will update the packages inside this environment. However, since the base environment is also an environment, the answer applies to both cases in the same way.

To answer more precisely to the question:
conda (which is the same conda for Miniconda as for Anaconda) updates all packages, but ONLY within a specific version series of each package (major and minor). That's the paradigm.
In the documentation you will find "NOTE: Conda updates to the highest version in its series, so Python 2.7 updates to the highest available in the 2.x series and 3.6 updates to the highest available in the 3.x series."
doc
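To make the quoted note concrete, here is a toy illustration of "highest version in its series", treating the series as the major version the way the note does for Python 2.x and 3.x. The version list is made up, the helper names are hypothetical, and the parsing assumes purely numeric dotted versions; conda's real version handling is more involved.

```python
def parse(version):
    """Turn '2.7.18' into (2, 7, 18); assumes purely numeric dotted versions."""
    return tuple(int(part) for part in version.split("."))


def highest_in_series(current, available):
    """Return the highest available version sharing the major version of `current`."""
    major = parse(current)[0]
    same_series = [v for v in available if parse(v)[0] == major]
    return max(same_series + [current], key=parse)


if __name__ == "__main__":
    available = ["2.7.10", "2.7.18", "3.6.1", "3.9.2"]  # invented example data
    print(highest_in_series("2.7.10", available))  # 2.7.18
    print(highest_in_series("3.6.1", available))   # 3.9.2
```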
If Wang does not give a reproducible example, one can only assist.
For example, is it really the virtual environment that Wang wants to update, or could Wang get what he/she wants with
conda update -n ENVIRONMENT --all
PLEASE read the docs before executing "update --all"!
This does not lead to an update of all packages by nature. Because conda tries to resolve the relationship of dependencies between all packages in your environment, this can lead to DOWNGRADED packages without warnings.
If you only want to update almost all, you can create a pin file
echo "conda ==4.0.0" >> ~/miniconda3/envs/py35/conda-meta/pinned
echo "numpy 1.7.*" >> ~/miniconda3/envs/py35/conda-meta/pinned
before running the update. conda issues not pinned
If later on you want to ignore the file in your env for an update, you can do:
conda update --all --no-pin
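If you maintain pins for several environments, appending with echo can leave duplicate lines behind. A minimal sketch of an idempotent alternative follows; add_pin is a hypothetical helper, and the environment path in the example is just the one used above, so adjust it to your own installation.

```python
from pathlib import Path


def add_pin(env_prefix, spec):
    """Add `spec` to <env_prefix>/conda-meta/pinned unless it is already there.

    Returns True if the file was changed. Raises FileNotFoundError if the
    conda-meta directory does not exist (i.e. the prefix is not a conda env).
    """
    pinned = Path(env_prefix) / "conda-meta" / "pinned"
    existing = pinned.read_text().splitlines() if pinned.exists() else []
    if spec in existing:
        return False
    # Rewrite the whole file so every pin ends up on its own line.
    pinned.write_text("\n".join(existing + [spec]) + "\n")
    return True


if __name__ == "__main__":
    env = Path.home() / "miniconda3" / "envs" / "py35"  # path from the example above
    if (env / "conda-meta").is_dir():
        print(add_pin(env, "numpy 1.7.*"))
```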
You should not do update --all. If you need it nevertheless, it is safer to test this in a cloned environment.
First step should always be to backup your current specification:
conda list -n py35 --explicit
(but even so there is not always a link to the source available - like for jupyterlab extensions)
Next you can clone and update:
conda create -n py356 --clone py35
conda activate py356
conda config --set pip_interop_enabled True # for conda>=4.6
conda update --all
conda config
update:
Currently I would use mamba (or micromamba) as conda pkg-manager replacement
update:
Because the idea of conda is nice but it is not working out very well for complex environments I personally prefer the combination of nix-shell (or lorri) and poetry [as superior pip/conda .-)] (intro poetry2nix).
Alternatively you can use nix and mach-nix (where you only need your requirements file). It resolves and builds environments best.
On Linux / macOS you could use nix like
nix-env -iA nixpkgs.python37
to enter an environment that has e.g. in this case Python3.7 (for sure you can change the version)
or as a very good Python (advanced) environment you can use mach-nix (with nix) like
mach-nix env ./env -r requirements.txt
(which even supports conda [but currently in beta])
or via api like
nix-shell -p nixFlakes --run "nix run github:davhau/mach-nix#with.ipython.pandas.seaborn.bokeh.scikit-learn "
Finally if you really need to work with packages that are not compatible due to its dependencies, it is possible with technologies like NixOS/nix-pkgs.

Imagine the dependency graph of packages: when the number of packages grows large, the chance of encountering a conflict when upgrading/adding packages is much higher. To avoid this, simply create a new environment in Anaconda.
Be frugal, install only what you need. For me, I installed the following packages in my new environment:
pandas
scikit-learn
matplotlib
notebook
keras
And I have 84 packages in total.

I agree with Mayou36.
For example, I was making the mistake of installing new packages in the base environment, using conda for some packages and pip for others.
Why this is bad?
"None of this is going to help with updating packages that have been installed from PyPI via pip, or any packages installed using python setup.py install. conda list will give you some hints about the pip-based Python packages you have in an environment, but it won't do anything special to update them."
And I had all my projects in the same one environment! And I used update all -which is bad and did not update all-.
So, the best thing to do is to create a new environment for each project. Why?
"A Conda environment is a directory that contains a specific collection of Conda packages that you have installed. For example, you may be working on a research project that requires NumPy 1.18 and its dependencies, while another environment associated with a finished project has NumPy 1.12 (perhaps because version 1.12 was the most current version of NumPy at the time the project finished). If you change one environment, your other environments are not affected. You can easily activate or deactivate environments, which is how you switch between them."
So, to wrap it up:
1. Create a new environment for each project.
2. Be aware of the differences between conda and pip.
3. Only include the packages that you will actually need and update them properly only if necessary.

If working in MS Windows, you can use Anaconda Navigator. Click on the environment; the drop-down box shows "installed" by default. You can select "updatable" and start from there.

To update all possible packages I used conda update --update-all
It works!

I solved this problem with conda and pip.
First, I ran:
conda uninstall qt and conda uninstall matplotlib and conda uninstall PyQt5
After that, I opened cmd and ran:
pip uninstall qt, pip uninstall matplotlib, pip uninstall PyQt5
Lastly, you should install matplotlib with pip by running pip install matplotlib

How to install Python libraries under specific environments

I have two Anaconda installations on my computer. The first one is based on Python 2.7 and the other is based on Python 3.4. The default Python version is the 3.4 though. What is more, I can start Python 3.4 either by typing /home/eualin/.bin/anaconda3/bin/python or just python. I can do the same but for Python 2.7 by typing /home/eualin/.bin/anaconda2/bin/python. My problem is that I don't know how to install new libraries under certain environments (either under Python 2.7 or Python 3.4). For example, when I do pip install seaborn the library gets installed under Python 3.4 by default when in fact I want to install it under Python 2.7. Any ideas?
EDIT
This is what I am doing so far: the ~/.bashrc file contains the following two blocks, of which only one is enabled at any given time.
# added by Anaconda 2.1.0 installer
export PATH="/home/eualin/.bin/anaconda2/bin:$PATH"
# added by Anaconda3 2.1.0 installer
#export PATH="/home/eualin/.bin/anaconda3/bin:$PATH"
Depending on which version I want to work with, I open the file, comment out the opposite block and do source ~/.bashrc. Then, I install the libraries I want to use one by one. But, is this the recommended way?

You don't need multiple anaconda distributions for different python versions. I would suggest keeping only one.
conda basically lets you create environments for your different needs.
conda create -n myenv python=3.3 creates a new environment named myenv, which works with a python3.3 interpreter.
source activate myenv (conda activate myenv in conda 4.4 and later) switches to the newly created environment. This basically sets the PATH such that pip, conda, python and other binaries point to the correct environment and interpreter.
conda install pip is the first thing you may want to do. Afterwards you can use pip and conda to install the packages you need.
After activating your environment pip install <mypackage> will point to the right version of pip so no need to worry too much.
You may want to create environments for different python versions or different sets of packages. Of course you can easily switch between those environments using source activate <environment name>.
For more examples and details you may want to have a look at the docs.
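If you are ever unsure which interpreter and environment a pip install will end up in, you can ask the running Python. This is only a small sketch: describe_interpreter is a made-up helper name, and CONDA_DEFAULT_ENV is the environment variable conda sets on activation, so it may be missing outside an activated conda environment.

```python
import os
import sys


def describe_interpreter():
    """Summarise which Python is running and which conda env (if any) is active."""
    return {
        "executable": sys.executable,  # full path of the running interpreter
        "prefix": sys.prefix,          # root directory of the environment
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),  # None if not set
        "python": "%d.%d" % sys.version_info[:2],
    }


if __name__ == "__main__":
    for key, value in describe_interpreter().items():
        print(key, "=", value)
```

Running python -m pip install <mypackage> with the same interpreter then installs into exactly the environment reported here.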

Virtualenv seems like the obvious answer here, but I do want to suggest an alternative that we've been using to great effect lately: Fig - this is particularly effective since we use Docker in production as well, but I imagine that using Fig as a replacement for virtualenv would be quite effective regardless of your production environment.

Using virtualenv is your best option as @Dettorer has mentioned.
I found this method of installing and using virtualenv the most useful.
Check it out:
Proper way to install virtualenv
