Creating and installing Conda packages into Virtual Envs - python

I'm working on packaging up a suite of tools that can be installed in different environments, and I've run into many problems with dependencies, which are an issue since this package will be installed in air-gapped environments.
The package will be installed via Anaconda, and I have provided the installation script. In order to create the package, I ran the following command:
conda metapackage toolkit_bundle 0.0.1 --dependencies r-essentials tensorflow gensim spacy r-ggplot2 r-plotly r-dplyr r-rjson r-tm r-reshape2 r-shiny r-sparklyr r-slam r-nlp r-cluster r-ggvis r-plyr r-tidyr r-zoo r-magrittr r-xtable r-htmlwidgets r-formattable r-highcharter --summary "Toolkit Bundle"
This produced a .tar.bz2 file that I held on to and then tried to install via the conda command
conda install toolkit_bundle.tar.bz2
The command seemed to run successfully, but I couldn't import the modules in Python. I also tried creating a virtual conda environment and importing the package.
conda create -n myenv toolkit_bundle-0.0.1.tar.bz2
There was no error, but none of the modules could be imported either.
Am I missing a step in this process, or is my thought process flawed?
Update:
It looks like my thinking was pretty flawed. A quick skim of the conda metapackage command documentation revealed the following:
Tool for building conda metapackages. A metapackage is a package with no files, only metadata. They are typically used to collect several packages together into a single package via dependencies.
So my initial understanding was incorrect, and the package only contains metadata. Are there any other ideas for creating packages with dependencies resolved that can be installed in an air-gapped environment?

I think you want to look at the command conda build for making packages, which just requires writing an appropriate meta.yaml file containing the dependencies, along with some other build parameters. There is good documentation for doing so on the conda website: https://conda.io/docs/user-guide/tasks/build-packages and there is a repo of examples.
If you have a working PyPI package, you can also auto-generate a conda recipe using conda skeleton.
Once you have built a set of packages locally, you can use the --use-local option to conda install to install from your local repo, with no need for an internet connection (as long as the packages for all the dependencies are in your local repo).

I was able to download the packages I needed from the PyPI website, and after determining the dependencies, I downloaded them manually and wrote a script to install them in the required order.
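That kind of "install in the required order" script amounts to a topological sort over a hand-written dependency map. A minimal sketch, assuming you have already downloaded the wheels into a local directory (the package names and the dependency graph below are illustrative, not the real graphs of these libraries):

```python
import subprocess
import sys
from graphlib import TopologicalSorter  # Python 3.9+

# Hand-written dependency map: package -> packages it depends on.
# These entries are illustrative placeholders.
DEPENDS_ON = {
    "numpy": set(),
    "scipy": {"numpy"},
    "smart-open": set(),
    "gensim": {"numpy", "scipy", "smart-open"},
}

def install_order(graph):
    """Return package names ordered so dependencies come first."""
    return list(TopologicalSorter(graph).static_order())

def install_all(graph, wheel_dir):
    """Install each package from a local directory, dependencies first."""
    for name in install_order(graph):
        # --no-index / --find-links keep pip from contacting PyPI,
        # which is what makes this usable in an air-gapped environment.
        subprocess.check_call([
            sys.executable, "-m", "pip", "install",
            "--no-index", "--find-links", wheel_dir, name,
        ])
```

Calling `install_order(DEPENDS_ON)` yields an order such as numpy before scipy, and both before gensim; `install_all(DEPENDS_ON, "/path/to/wheels")` then runs pip once per package against the local directory.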

Related

Why does pip export with a different version on subsequent conda env exports?

I have been trying to establish a pre-commit git hook to detect environment changes and create a new env.yml export automatically ... similar to the ones described here
Where I am having trouble is that the git hook is detecting an environment change with the pip package on every run of the pre-commit file. Is this possibly related to some scripts using different versions of pip?
If so, I don't understand why the same version isn't being exported every time I run conda env export > env.yml. It almost seems like it is randomly toggling between versions ... but I know there must be some rationale
conda and pip each keep their own record of every package installed (provided you have installed a given package with both). Anaconda (if that's what you're using) is also known for causing plenty of headaches even in simple cases, when you pip install something instead of conda install and start mixing dependencies installed with either of those. The general advice is to be very careful about staying consistent within each environment. In my personal experience, Anaconda tends to take precedence and break dependencies managed by pip. In short, if you are using a conda env, make sure that you're using dependencies installed by conda and conda only.
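One way to see whether the two tools disagree is to compare their own records directly. A small sketch that parses `pip freeze` and `conda list --export` style output and reports packages recorded at different versions (the sample text is fabricated for illustration):

```python
def parse_pip_freeze(text):
    """Parse `pip freeze` output ('name==version') into a dict."""
    versions = {}
    for line in text.splitlines():
        if "==" in line:
            name, version = line.split("==", 1)
            versions[name.lower()] = version
    return versions

def parse_conda_list(text):
    """Parse `conda list --export` output ('name=version=build') into a dict."""
    versions = {}
    for line in text.splitlines():
        if line.startswith("#") or not line.strip():
            continue
        parts = line.split("=")
        if len(parts) >= 2:
            versions[parts[0].lower()] = parts[1]
    return versions

def mismatches(pip_versions, conda_versions):
    """Packages both tools know about but record at different versions."""
    shared = pip_versions.keys() & conda_versions.keys()
    return {name: (pip_versions[name], conda_versions[name])
            for name in shared
            if pip_versions[name] != conda_versions[name]}

# Fabricated sample output from the two tools:
pip_txt = "numpy==1.15.4\npandas==0.23.4"
conda_txt = "numpy=1.15.4=py37_0\npandas=0.24.0=py37_0"
print(mismatches(parse_pip_freeze(pip_txt), parse_conda_list(conda_txt)))
# → {'pandas': ('0.23.4', '0.24.0')}
```

Any package that shows up in the mismatch dict is a candidate for the kind of toggling described above, since `conda env export` consults both records.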

Is there a way to package up conda dependencies as a zip file

I'm trying to use conda for packaging dependencies and environment creation for python packages. However, I have a hard restriction that I have to do this in environments that are offline and can't go into conda-forge to grab the dependencies during the conda package installation.
So the solution I've currently come up with is this: in an environment that has online access, I create a conda environment and install my conda package along with its dependencies. I've set my .condarc so that pkgs_dirs points to "./pkgs/win-64". This way the dependency package .tar.bz2 files are captured in that directory. I then zip up the contents of "./pkgs/win-64" into a file called condaPkgDependencies.zip and pass that to the environments that are offline and don't have access to conda-forge. There I extract condaPkgDependencies.zip to some folder path, run conda index (https://conda.io/projects/conda-build/en/latest/resources/commands/conda-index.html) to create a local channel, and point to that local channel when running conda install in the offline environment.
I was wondering if there is a much cleaner way of handling this. If I use the default pkgs_dirs location, I believe ALL cached packages are stored there, so zipping up the default location would include conda dependencies I do not need.
I've also looked at https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html#sharing-an-environment but I don't think this solves my issue of packaging up dependencies for offline mode installation.
Here is what I've added to my .condarc
pkgs_dirs:
- ./pkgs/win-64
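The filtering concern above — shipping only the tarballs this environment actually references, rather than the whole package cache — can be scripted from `conda list --explicit` output, whose non-comment lines are channel URLs ending in the tarball name. A sketch (the URLs in the sample are fabricated):

```python
import shutil
from pathlib import Path, PurePosixPath

def tarballs_from_explicit(text):
    """Extract tarball filenames from `conda list --explicit` output."""
    names = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "@")):
            continue
        # Each remaining line is a channel URL ending in the tarball name,
        # optionally followed by '#<md5>'.
        names.append(PurePosixPath(line.split("#")[0]).name)
    return names

def copy_needed(explicit_text, pkgs_dir, dest_dir):
    """Copy only the tarballs this environment actually references."""
    Path(dest_dir).mkdir(parents=True, exist_ok=True)
    for name in tarballs_from_explicit(explicit_text):
        shutil.copy2(Path(pkgs_dir) / name, Path(dest_dir) / name)
```

Feeding it the saved output of `conda list --explicit` and the default package cache produces a directory containing exactly the environment's tarballs, which can then be zipped and indexed with conda index as described above.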

Scikit-learn - installing development version (0.20)

I currently have scikit-learn 0.19 installed. I'd like to test my code using the latest development version as there seems to be a fix for Incremental PCA.
How do I go about installing this new version if I've previously installed scikit-learn using anaconda?
Also, how would I revert to the stable release in the event that 0.20 does not solve my problem?
I am in need of some hand-holding here, as I've read the docs on the website and I'm not sure I completely understand the process (especially being able to revert to the stable version if needed).
The whole point of the Anaconda Python distribution (apart from the convenience of having a bunch of useful packages included) is that you get the conda environment manager, which exists to meet exactly this sort of requirement.
What you want to do is to create a new conda environment by launching the Anaconda prompt and typing
conda create -n myenv scikit-learn other-package other-package2 etc
where myenv is the name you want to give the new environment and other-package other-package2 etc are the names of any other packages you will want to use (import) in your code. conda will figure out any dependencies of these packages and show you a list of what is going to be installed before it proceeds.
If you want to specify that a package should be a particular version, add that to the package name e.g. other-package=1.1.0, otherwise conda will install the latest versions of each package that are mutually compatible. You can also specify a particular version of Python by including it in the package list, e.g. python=3.4. You can check what versions of a package are available with conda search package-name (where package-name is the name of the package you want, obviously).
To run your code in the newly created environment, first activate the environment at the Anaconda prompt. If you use the Spyder IDE, launch it after activating the correct environment, or use the start menu shortcut specific to that environment if you have one. Other IDEs may have their own method of selecting a specific environment to work in.
To revert to the version(s) you were using before, activate the environment containing those versions - if you've never created a new environment before, that'll be root.
Just in case someone comes here looking for a solution without conda:
The website recommends that you download the latest code via
git clone git://github.com/scikit-learn/scikit-learn.git
and then include it in pip via (after changing to the directory)
pip install --editable .
You can also add the --user flag to have pip install to a local directory. Then, uninstalling should be as easy as pip uninstall scikit-learn.

How to get my code dependencies from a conda env?

I wrote some Python code I would like to package to be easily installed from pip and conda.
The code runs in a conda environment containing all its dependencies.
For both pip and conda it seems that I need to write a setup.py file with an install_requires variable to set the dependencies. For pip I also need a requirements.txt with these dependencies.
conda list --export gives me a way to export the environment with everything including libraries, cython and ipython:
alabaster=0.7.3=py27_0
babel=1.3=py27_0
backports_abc=0.4=py27_0
cairo=1.12.18=3
cffi=0.9.2=py27_0
cython=0.23.4=py27_0
decorator=4.0.4=py27_0
docutils=0.12=py27_0
flake8=2.3.0=py27_0
fontconfig=2.11.1=3
...
but how can I get only the Python packages my code depends on? Shall I go through all my imports? In that case how would I manage the dependencies of the dependencies?
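Going through the imports can at least be automated with the standard-library ast module. A sketch that collects the top-level module names a piece of code imports — note it won't distinguish stdlib from third-party packages, and it says nothing about dependencies of dependencies, which the package manager has to resolve from install_requires anyway:

```python
import ast

def top_level_imports(source):
    """Collect the top-level module names imported by Python source code."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                names.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            if node.module and node.level == 0:  # skip relative imports
                names.add(node.module.split(".")[0])
    return sorted(names)

example = """
import numpy as np
from collections import OrderedDict
from . import local_module
import scipy.sparse
"""
print(top_level_imports(example))  # ['collections', 'numpy', 'scipy']
```

Running this over each file in the package (via ast.parse on its contents) gives a starting list to prune down to the third-party names that belong in install_requires.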

Where does Conda save the compiled libraries associated with Conda environments?

I've written an Apache Spark python script, and for compatibility reasons I need to pass a specific version of scikit-learn when I submit the Spark job to the cluster. The problem is I'm not sure where I can get a copy of the scikit-learn binary/executable that I need. I don't think it's as straightforward as downloading the scikit-learn source and compiling it myself, because I've heard scikit-learn has a lot of dependencies, and I'm not sure which dependencies I need for the version I require. I was thinking I could create a Conda environment with the specific scikit-learn version I need so that Conda could do all the compiling for me, but I'm not sure where Conda saves the libraries that it builds. I tried checking under the default venv folder, but didn't see anything promising.
Conda takes care of the dependencies. Just pass the version to Conda:
$ conda install scikit-learn=0.16.1
If you want the exact version of every package, you can do the following:
$ conda list -e > requirements.txt
You then create a new environment as follows:
$ conda create -n my_environment --file requirements.txt
Packages are stored in the (prefix)/pkgs folder before being extracted. Extracted files can live in lots of places in the prefix - just whatever the package specifies. You can ship the package tarballs around if necessary, and install from them directly (specify them as arguments to conda install). However, it really is nicer to do what Alexander suggested here: create a requirements file that pins versions. You should also look into using conda-env. It gives more flexibility in terms of obtaining packages from anaconda.org than the plain requirements file obtained from conda list.
Docs on conda-env: http://conda.pydata.org/docs/using/envs.html
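If you just need to find where the extracted files for a given package live inside the currently active environment, Python can tell you directly. A small sketch — shown here with a stdlib module so it runs anywhere; substitute scikit-learn's import name, sklearn, inside your conda env:

```python
import importlib.util

def package_location(name):
    """Path to an installed package's code, or None if it isn't installed."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec else None

# In a conda environment this resolves to a path under the env prefix,
# e.g. <prefix>/lib/pythonX.Y/site-packages/ (Lib\site-packages on Windows).
print(package_location("json"))  # stdlib example; always present
```

For the Spark use case above, the parent site-packages directory this reveals is the place to collect the compiled library files that conda installed for the pinned version.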

Categories

Resources