I've written an Apache Spark Python script, and for compatibility reasons I need to ship a specific version of scikit-learn when I submit the Spark job to the cluster. The problem is that I'm not sure where to get a copy of the scikit-learn binaries I need. I don't think it's as straightforward as downloading the scikit-learn source and compiling it myself, because I've heard scikit-learn has a lot of dependencies, and I'm not sure which dependencies I need for the version I require. I was thinking I could create a Conda environment with the specific scikit-learn version I need, so that Conda does all the compiling for me, but I'm not sure where Conda saves the libraries it builds. I checked under the default venv folder but didn't see anything promising.
Conda takes care of the dependencies. Just pass the version to Conda:
$ conda install scikit-learn=0.16.1
If you want the exact version of every package, you can do the following:
$ conda list -e > requirements.txt
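For reference, the file produced by conda list -e consists of name=version=build lines, roughly like this (the package names and build strings below are illustrative):

```
# This file may be used to create an environment using:
# $ conda create --name <env> --file <this file>
numpy=1.9.2=py27_0
scikit-learn=0.16.1=np19py27_0
scipy=0.15.1=np19py27_0
```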
You then create a new environment as follows:
$ conda create -n my_environment --file requirements.txt
Packages are stored in the (prefix)/pkgs folder before being extracted. Extracted files can live in many places under the prefix - wherever the package specifies. You can ship the package tarballs around if necessary and install from them directly (specify them as arguments to conda install). However, it really is nicer to do what Alexander suggested here: create a requirements file that pins versions. You should also look into using conda-env. It gives you more flexibility in obtaining packages from anaconda.org than the plain requirements file produced by conda list.
Docs on conda-env: http://conda.pydata.org/docs/using/envs.html
Related
I have been trying to establish a pre-commit git hook to detect environment changes and create a new env.yml export automatically ... similar to the ones described here
Where I'm having trouble is that the git hook detects an environment change in the pip package on every run of the pre-commit file. Could this be related to some scripts using different versions of pip?
If so, I don't understand why the same version isn't exported every time I run conda env export > env.yml. It almost seems to toggle randomly between versions, but I know there must be some rationale.
conda and pip each keep their own copy of every package installed (provided you have installed a given package with both). Anaconda (if that's what you're using) is also known for causing plenty of headaches, even in simple cases, when you pip install something instead of conda install and start mixing dependencies installed by either of them. The general advice is to be careful and consistent within each environment separately. In my personal experience, anaconda tends to take precedence and break dependencies managed by pip. In short, if you are using a conda env, make sure you're using the dependencies installed by conda and conda only.
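A quick, conda-independent way to see where a given interpreter imports packages from (useful when diagnosing this kind of conda/pip mixing) is to ask Python itself; a minimal sketch:

```shell
# Print the site-packages directory the current `python3` imports from.
# If `pip install` reports installing somewhere other than this directory,
# you are mixing interpreters (e.g. conda's python with a system-wide pip).
python3 -c "import sysconfig; print(sysconfig.get_paths()['purelib'])"
```

When you do need pip inside a conda env, prefer `python -m pip install ...` over a bare `pip install ...`, so pip is guaranteed to run under the environment's own interpreter.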
I exported a conda environment in this way:
conda env export > environment.yml
Then I committed and pushed the environment.yml file to the git repo.
From another computer I cloned the repo and then tried to create the conda environment:
conda env create -f environment.yml
First I got a warning:
Warning: you have pip-installed dependencies in your environment file,
but you do not list pip itself as one of your conda dependencies.
Conda may not use the correct pip to install your packages, and they
may end up in the wrong place. Please add an explicit pip dependency.
I'm adding one for you, but still nagging you
I don't know why conda export does not include pip in the environment definition.
Then I got errors like wrong/unavailable versions of packages:
es-core-news-sm==3.0.0 version not found
I just removed the version part, leaving only the name of the package, and got it to work with:
conda env update --prefix ./env --file environment.yml --prune
Here are additional details:
I would like to know how I can avoid this behavior.
es-core-news-sm==3.0 does not exist on PyPI, where only 3.1 and 2.3.1 are available; hence your error message.
This is of course something very specific to the environment that you have and the packages that you have installed. In your specific case, just removing the version can be a fix, but no guarantee that this will work in all cases.
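If you hit several such pins, the "just remove the version" fix can be scripted. A sketch with a stand-in environment.yml (the package names and pins below are illustrative, not the full export):

```shell
# A stand-in for the file produced by `conda env export`.
cat > environment.yml <<'EOF'
name: myenv
dependencies:
  - python=3.8
  - pip
  - pip:
    - es-core-news-sm==3.0.0
    - requests==2.25.1
EOF

# Strip the `==version` suffix from the pip entries; the single `=` pins
# of the conda entries (e.g. python=3.8) are left untouched.
sed 's/==[0-9][0-9a-z.]*$//' environment.yml > environment_loose.yml

cat environment_loose.yml
```

As noted above, unpinning trades reproducibility for installability, so there is no guarantee the unpinned solve matches the original environment.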
As for the cause, I can only guess, but what I expect happened in your case is:
1. You installed es-core-news-sm==3.0 in your environment.
2. The developers of that package released a newer version and decided to delete the old one.
3. Exporting the environment correctly states that it contains es-core-news-sm==3.0.
4. Creating an environment from the .yaml produced in step 3 fails, because the package is no longer available (see step 2).
An alternative (depending on your use case) could be to use conda-pack, which creates a packed version of your environment that you can then unpack on the target machine. This only works, though, if the OS on the source and target machines is the same.
I currently have Python 3.7, installed using anaconda on the machine. My intention is to create a lower-version Python environment, say 3.6, for reasons of compatibility. I followed the documentation and created the conda environment with conda create -n py36 python=3.6. However, this environment is a clean Python installation missing many additional packages, like numpy and scipy, that are already installed in my Python 3.7 environment. So what is the best way to create not just a new Python, but also migrate all the other packages from the previous Python version (3.7)?
I understand the dependencies may differ, since some packages are not compatible with older versions of Python, but I still want to migrate as many packages as possible and let conda itself decide the dependency tree. Currently, all I can do is create a clean environment and manually conda install numpy and so on, which is definitely not a good idea.
# Save all the info about the previous env in a requirements file
conda list -e > requirement.txt
Then change the python version in the created requirement.txt file.
# then create the new env from the requirements file:
conda create -n py36 --file requirement.txt
Note that a file produced by conda list -e is consumed by conda create --file; conda env create -f expects a YAML export (environment.yml) instead.
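The "change the python version" step can also be scripted. A sketch against a stand-in requirement.txt (the lines below are illustrative; conda list -e emits name=version=build triples):

```shell
# Stand-in for the output of `conda list -e`.
cat > requirement.txt <<'EOF'
python=3.7.3=h33d41f4_1
numpy=1.16.4=py37h95a1406_0
scipy=1.3.0=py37h921218d_0
EOF

# Re-pin python to 3.6 and drop its build string so conda is free to
# pick a matching build; the other packages are left to the solver.
sed 's/^python=3\.7.*$/python=3.6/' requirement.txt > requirement36.txt

cat requirement36.txt
```

Note that the build strings of the remaining packages (the py37 in numpy's build above) also encode the Python version, so in practice you may need to drop the build column (everything after the second =) for conda to solve against Python 3.6.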
I currently have scikit-learn 0.19 installed. I'd like to test my code using the latest development version as there seems to be a fix for Incremental PCA.
How do I go about installing this new version if I've previously installed scikit-learn using anaconda?
Also, how would I revert back to the stable release in the event that 0.20 does not solve my problem?
I am in need of some hand holding here, as I've read the docs on the website and not sure I completely understand the process (especially being able to revert back to the stable version if needed).
The whole point of the Anaconda Python distribution (apart from the convenience of having a bunch of useful packages included) is that you get the conda environment manager, which exists to meet exactly this sort of requirement.
What you want to do is to create a new conda environment by launching the Anaconda prompt and typing
conda create -n myenv scikit-learn other-package other-package2 etc
where myenv is the name you want to give the new environment and other-package other-package2 etc are the names of any other packages you will want to use (import) in your code. conda will figure out any dependencies of these packages and show you a list of what is going to be installed before it proceeds.
If you want to specify that a package should be a particular version, add that to the package name e.g. other-package=1.1.0, otherwise conda will install the latest versions of each package that are mutually compatible. You can also specify a particular version of Python by including it in the package list, e.g. python=3.4. You can check what versions of a package are available with conda search package-name (where package-name is the name of the package you want, obviously).
To run your code in the newly created environment, first activate the environment at the Anaconda prompt. If you use the Spyder IDE, launch it after activating the correct environment, or use the start menu shortcut specific to that environment if you have one. Other IDEs may have their own method of selecting a specific environment to work in.
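To confirm that an activation actually took effect before running your code, you can ask the interpreter itself where it lives; a quick check that works the same inside or outside conda:

```shell
# After `conda activate myenv`, this path should point inside the
# myenv prefix (e.g. .../envs/myenv/bin/python3).
python3 -c "import sys; print(sys.executable)"
```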
To revert to the version(s) you were using before, activate the environment containing those versions - if you've never created a new environment before, that'll be root (renamed base in conda 4.4 and later).
Just in case someone comes here looking for a solution without conda:
The website recommends that you download the latest code via
git clone https://github.com/scikit-learn/scikit-learn.git
and then install it with pip (after changing into the directory) via
pip install --editable .
You can also add the --user flag to have pip install into a local directory. Then uninstalling should be as easy as pip uninstall scikit-learn (the pip package name is scikit-learn, even though the import name is sklearn).
I'm working on packaging up a suite of tools that can be installed in different environments, and I've run into many problems with dependencies, which are an issue since this package will be installed in air-gapped environments.
The package will be installed via Anaconda, and I have provided the installation script. In order to create the package, I ran the following command:
conda metapackage toolkit_bundle 0.0.1 --dependencies r-essentials tensorflow gensim spacy r-ggplot2 r-plotly r-dplyr r-rjson r-tm r-reshape2 r-shiny r-sparklyr r-slam r-nlp r-cluster r-ggvis r-plyr r-tidyr r-zoo r-magrittr r-xtable r-htmlwidgets r-formattable r-highcharter --summary "Toolkit Bundle"
This produced a .tar.bz2 file that I held on to and tried to install via the conda command:
conda install toolkit_bundle.tar.bz2
The command seemed to run successfully, but I was unsuccessful in importing the modules in Python. I also tried creating a virtual conda environment and importing the package.
conda create -n myenv toolkit_bundle-0.0.1.tar.bz2
There was no error, but none of the modules were able to be imported either.
Am I missing a step in this process, or is my thought process flawed?
Update:
It looks like my thinking was pretty flawed. A quick skim of the conda metapackage command documentation revealed the following:
Tool for building conda metapackages. A metapackage is a package with no files, only metadata. They are typically used to collect several packages together into a single package via dependencies.
So my initial understanding was incorrect, and the package contains only metadata. Are there any other ideas for creating packages, with dependencies resolved, that can be installed in an air-gapped environment?
I think you want to look at the command conda build for making packages, which just requires writing an appropriate meta.yaml file containing the dependencies, along with some other build parameters. There is good documentation for doing so on the conda website: https://conda.io/docs/user-guide/tasks/build-packages and there is a repo of examples.
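For orientation, a minimal meta.yaml for conda build could look roughly like the following. The name, version, and summary are taken from the metapackage command above; the rest is a placeholder sketch, not a tested recipe:

```yaml
package:
  name: toolkit_bundle
  version: "0.0.1"

build:
  number: 0

requirements:
  run:
    - python
    - tensorflow
    - gensim
    - spacy
    - r-essentials

about:
  summary: "Toolkit Bundle"
```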
If you have a working pip package, you can also auto-generate a conda package recipe from PyPI using conda skeleton pypi.
Once you have built a set of packages locally, you can use the --use-local option to conda install to install from your local repo, with no need for an internet connection (as long as the packages for all the dependencies are in your local repo).
I was able to download the packages I needed via the PyPI website; after determining their dependencies, I downloaded those manually as well and wrote a script to install everything in the required order.