Miniconda with dockerfile, how to use the conda environment

Miniconda with dockerfile, how to use the conda environment - python

Goal: create a docker image from miniconda that will install all my dependencies and then run some commands for django and other packages. Also every time someone bin/bash into the container it should start with those packages available without me adding an entrypoint and do env hacks there.
Dockerfile:
FROM continuumio/miniconda3
ADD environment.yml /code/
WORKDIR /code/
RUN conda env create -f environment.yml # successful
RUN python test/manage.py 8000 # fails, no dependencies like pandas installed
But now I'm stuck, say I want to run some commands in the created environment:
RUN python manage.py runserver
it doesn't run it in my environment.
Some ugly hacks here: https://github.com/ContinuumIO/docker-images/issues/89 that don't actually work because you're using a new shell session when you enter a container or do another RUN command so you have to concatenate the commands with && (ugly).
Ideally I want to install all my conda packages globally from environment.yml but apparently I can't do that.

You have to tell docker where is the python version managed by conda, and inside docker this is done with conda run
conda run --no-capture-output -n myenv python run.py
Source
https://pythonspeed.com/articles/activate-conda-dockerfile/

Related

New Dockerfile can't use .env file to read environment variables in Python

I'm in the process of changing my dockerfile for a python script of mine from using a requirements.txt file to using pipenv & multi stage builds. I store the env variables in a .env file and then import os and use the os.environ.get function to read them. I'm having issues with environment variables getting to work with my new dockerfile. Below are examples of what's going on.
1st dockerfile where os.environ.get('my_env_var') works in the python script and returns a value
FROM python:3.8-slim as builder
WORKDIR /app
COPY . /app
RUN pip install -r requirements.txt
CMD ["python3", "app.py"]
2nd dockerfile where os.environ.get('my_env_var') doesn't work and returns None for everything. I've entered the container and confirmed the .env file is still there and in the same directory as the app.py file by doing a ls -a in the container and i ran cat .env to ensure the values are still there.
I've copied most of this code from a tutorial post, it seems to work fine for my use case.
FROM python:3.8-slim as base
# Setup env
ENV LANG C.UTF-8
ENV LC_ALL C.UTF-8
ENV PYTHONDONTWRITEBYTECODE 1
ENV PYTHONFAULTHANDLER 1
FROM base AS python-deps
# Install pipenv and compilation dependencies
RUN pip install pipenv
RUN apt-get update && apt-get install -y --no-install-recommends gcc
# Install python dependencies in /.venv
COPY Pipfile .
COPY Pipfile.lock .
RUN PIPENV_VENV_IN_PROJECT=1 pipenv install --deploy
FROM base AS runtime
# Copy virtual env from python-deps stage
COPY --from=python-deps /.venv /.venv
ENV PATH="/.venv/bin:$PATH"
# Create and switch to a new user
RUN useradd --create-home appuser
WORKDIR /home/appuser
USER appuser
# Install application into container
COPY . .
# Run the application
CMD ["python3", "app.py"]
I'm not familiar enough with Linux/dockerfiles/env variables to know if I'm like resetting something that would make it so i can no longer automatically use the .env file to read in env variables with the os.environ.get function. Maybe it's getting screwed up because there are ENV commands in there? If anyone has any info I'd appreciate it!

As far as I know ENV variables are set in the run command independently if the .env file is present in the image or not.
When running docker image, you should specify which .env file you wish to have: docker run --env-file=path/to/env/file image:tag
You could also set up individual env variables: --env ENVVARIABLE1=foobar
Similarly if you run with docker-compose, use env_file

Activate conda env within running Docker container?

Is there a way to spin up a Docker container and then activate a given conda environment within the container using a Python script? I don't have access to the Dockerfile of the image I'm using.

conda run
Conda includes a conda run command for running arbitrary commands within an environment.
docker run <image> "conda run -n <your_env> python <script.py>"
If the script requires interaction, you may need the --live-stream and/or --no-capture-output arguments (see conda run -h).

I think if you load right docker image there is no need to activate conda in your dockerfile try this image below
FROM continuumio/miniconda3
RUN conda info
This worked for me.

Why are there differences in container contents depending on whether I `docker run ...`or `docker-compose run ...`?

I'm experiencing differences with the contents of a container depending on whether I open a bash shell via docker run -i -t <container> bash or docker-compose run <container> bash and I don't know/understand how this is possible.
To aid in the explanation, please see this screenshot from my terminal. In both instances, I am running the image called blaze which has been built from the Dockerfile in my code. One of the steps during the build is to create a virutalenv called venv, however when I open a bash shell via docker-compose this virtualenv doesn't seem to exist unlike when I run docker run ....
I am relatively new to setting up my own builds with Docker, but surely if they are both referencing the same image, the output of ls within a bash shell should be the same? I would greatly appreciate any help or guidance to resources that would explain what exactly is going wrong here...
As an additional point, running docker images shows that both commands must be using the same image...
Thanks in advance!
This is my Dockerfile:
FROM blaze-base-image:latest
# add an URL that PIP automatically searches (e.g., Azure Artifact Store URL)
ARG INDEX_URL
ENV PIP_EXTRA_INDEX_URL=$INDEX_URL
# Copy source code to docker image
RUN mkdir /opt/app
COPY . /opt/app
RUN ls /opt/app
# Install Blaze pip dependencies
WORKDIR /opt/app
RUN python3.7 -m venv /opt/app/venv
RUN /opt/app/venv/bin/python -m pip install --upgrade pip
RUN /opt/app/venv/bin/python -m pip install keyring artifacts-keyring
RUN touch /opt/app/venv/pip.conf
RUN echo $'[global]\nextra-index-url=https://www.index.com' > /opt/app/venv/pip.conf
RUN /opt/app/venv/bin/python -m pip install -r /opt/app/requirements.txt
RUN /opt/app/venv/bin/python -m spacy download en_core_web_sm
# Comment
CMD ["echo", "Container build complete"]
And this is my docker-compose.yml:
version: '3'
services:
blaze:
build: .
image: blaze
volumes:
- .:/opt/app

There are two intersecting things going on here:
When you have a Compose volumes: or docker run -v option mounting host content over a container directory, the host content completely replaces what's in the image. If you don't have a ./venv directory on the host, then there won't be a /opt/app/venv directory in the container. That's why, when you docker-compose run blaze ..., the virtual environment is missing.
If you docker run a container, the only options that are considered are those in that specific docker run command. docker run doesn't know about the docker-compose.yml file and won't take options from there. That means there isn't this volume mount in the docker run case, which is why the virtual environment reappears.
Typically in Docker you don't need a virtual environment at all: the Docker image is isolated from other images and Python installations, and so it's safe and normal to install your application into the "system" Python. You also typically want your image to be self-contained and not depend on content from the host, so you wouldn't generally need the bind mount you show.
That would simplify your Dockerfile to:
FROM blaze-base-image:latest
# Any ARG will automatically appear as an environment variable to
# RUN directives; this won't be needed at run time
ARG PIP_EXTRA_INDEX_URL
# Creates the directory if it doesn't exist
WORKDIR /opt/app
# Install the Python-level dependencies
RUN pip install --upgrade pip
COPY requirements.txt .
RUN pip install -r requirements.txt
# The requirements.txt file should list every required package
# Install the rest of the application
COPY . .
# Set the main container command to run the application
CMD ["./app.py"]
The docker-compose.yml file can be similarly simplified to
version: '3.8' # '3' means '3.0'
services:
blaze:
build: .
# Compose picks its own image name
# Do not need volumes:, the image is self-contained
and then it will work consistently with either docker run or docker-compose run (or docker-compose up).

Activate conda environment in singularity container from dockerfile

I'm trying to set up a singularity container from an existing docker image in which a conda environment named "tensorflow" is activated as soon as I run the container. I've found some answers on this topic here. Unfortunately, in this post they only explain how they would set up the the singularity .def file to activate the conda environment by default. However, I want to modify my existing Dockerfile only and then build a singularity image from it.
What I've tried so far is setting up the Dockerfile like this:
FROM opensuse/tumbleweed
ENV PATH /opt/conda/bin:$PATH
ENV PATH /opt/conda/envs/tensorflow/bin:$PATH
# Add conda environment files (.yml)
COPY ["./conda_environments/", "."]
# Install with zypper
RUN zypper install -y sudo wget bzip2 vim tree which util-linux
# Get installation file
RUN wget --quiet https://repo.anaconda.com/archive/Anaconda3-2019.07-Linux-x86_64.sh -O ~/anaconda.sh
# Install anaconda at /opt/conda
RUN /bin/bash ~/anaconda.sh -b -p "/opt/conda"
# Remove installation file
RUN rm ~/anaconda.sh
# Make conda command available to all users
RUN ln -s /opt/conda/etc/profile.d/conda.sh /etc/profile.d/conda.sh
# Create tensorflow environment
RUN conda env create -f tensorflow.yml
# Activate conda environment with interactive bash session
RUN echo ". /opt/conda/etc/profile.d/conda.sh" >> ~/.bashrc
RUN echo "conda activate tensorflow" >> ~/.bashrc
# Default command
CMD ["/bin/bash"]
After building the docker image I run the docker container with:
docker run -t -d --rm --name=my_container opensuse_conda:latest
and enter the container with:
docker exec -it my_container bash
The result is as expected. The shell session is started directly with the "tensorflow" environment being active which is indicated by the (tensorflow) prefix.
To build a singularity image from this docker image I use:
sudo singularity build opensuse_conda.sif docker-daemon://opensuse_conda:latest
and run the container with:
sudo singularity run opensuse_conda.sif
This is where the problem occurs. Instead of the "tensorflow" environment the "base" environment is activated by default. However, I would rather have the "tensorflow" environment being activated when I run the singularity container.
How can I modify my Dockerfile so that when running both the docker container and the singularity container the default environment is "tensorflow"?
Thank you very much for your help!

Your problem is that .bashrc will only be read when you start an interactive shell, but not when the container is running with the default command. See this answer for background information.
There are a bunch of bash startup files where you could put the conda activate tensorflow command in instead. I recommend to define a file of your own, and put the filename into the BASH_ENV environment variable. Both can easily be done from the Dockerfile.

run a Python script with Conda dependencies on a Docker container [duplicate]

Scenario
I'm trying to setup a simple docker image (I'm quite new to docker, so please correct my possible misconceptions) based on the public continuumio/anaconda3 container.
The Dockerfile:
FROM continuumio/anaconda3:latest
# update conda and setup environment
RUN conda update conda -y \
&& conda env list \
&& conda create -n testenv pip -y \
&& source activate testenv \
&& conda env list
Building and image from this by docker build -t test . ends with the error:
/bin/sh: 1: source: not found
when activating the new virtual environment.
Suggestion 1:
Following this answer I tried:
FROM continuumio/anaconda3:latest
# update conda and setup environment
RUN conda update conda -y \
&& conda env list \
&& conda create -y -n testenv pip \
&& /bin/bash -c "source activate testenv" \
&& conda env list
This seems to work at first, as it outputs: prepending /opt/conda/envs/testenv/bin to PATH, but conda env list as well ass echo $PATH clearly show that it doesn't:
[...]
# conda environments:
#
testenv /opt/conda/envs/testenv
root * /opt/conda
---> 80a77e55a11f
Removing intermediate container 33982c006f94
Step 3 : RUN echo $PATH
---> Running in a30bb3706731
/opt/conda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
The docker files work out of the box as a MWE.
I appreciate any ideas. Thanks!

Using the docker ENV instruction it is possible to add the virtual environment path persistently to PATH. Although this does not solve the selected environment listed under conda env list.
See the MWE:
FROM continuumio/anaconda3:latest
# update conda and setup environment
RUN conda update conda -y \
&& conda create -y -n testenv pip
ENV PATH /opt/conda/envs/testenv/bin:$PATH
RUN echo $PATH
RUN conda env list

Method 1: use SHELL with a custom entrypoint script
EDIT: I have developed a new, improved approach which better than the "conda", "run" syntax.
Sample dockerfile available at this gist. It works by leveraging a custom entrypoint script to set up the environment before execing the arguments of the RUN stanza.
Why does this work?
A shell is (put very simply) a process which can act as an entrypoint for arbitrary programs. exec "$#" allows us to launch a new process, inheriting all of the environment of the parent process. In this case, this means we activate conda (which basically mangles a bunch of environment variables), then run /bin/bash -c CONTENTS_OF_DOCKER_RUN.
Method 2: SHELL with arguments
Here is my previous approach, courtesy of Itamar Turner-Trauring; many thanks to them!
# Create the environment:
COPY environment.yml .
RUN conda env create -f environment.yml
# Set the default docker build shell to run as the conda wrapped process
SHELL ["conda", "run", "-n", "vigilant_detect", "/bin/bash", "-c"]
# Set your entrypoint to use the conda environment as well
ENTRYPOINT ["conda", "run", "-n", "myenv", "python", "run.py"]
Modifying ENV may not be the best approach since conda likes to take control of environment variables itself. Additionally, your custom conda env may activate other scripts to further modulate the environment.
Why does this work?
This leverages conda run to "add entries to PATH for the environment and run any activation scripts that the environment may contain" before starting the new bash shell.
Using conda can be a frustrating experience, since both tools effectively want to monopolize the environment, and theoretically, you shouldn't ever need conda inside a container. But deadlines and technical debt being a thing, sometimes you just gotta get it done, and sometimes conda is the easiest way to provision dependencies (looking at you, GDAL).

Piggybacking on ccauet's answer (which I couldn't get to work), and Charles Duffey's comment about there being more to it than just PATH, here's what will take care of the issue.
When activating an environment, conda sets the following variables, as well as a few that backup default values that can be referenced when deactivating the environment. These variables have been omitted from the Dockerfile, as the root conda environment need never be used again. For reference, these are CONDA_PATH_BACKUP, CONDA_PS1_BACKUP, and _CONDA_SET_PROJ_LIB. It also sets PS1 in order to show (testenv) at the left of the terminal prompt line, which was also omitted. The following statements will do what you want.
ENV PATH /opt/conda/envs/testenv/bin:$PATH
ENV CONDA_DEFAULT_ENV testenv
ENV CONDA_PREFIX /opt/conda/envs/testenv
In order to shrink the number of layers created, you can combine these commands into a single ENV command setting all the variables at once as well.
There may be some other variables that need to be set, based on the package. For example,
ENV GDAL_DATA /opt/conda/envs/testenv/share/gdal
ENV CPL_ZIP_ENCODING UTF-8
ENV PROJ_LIB /opt/conda/envs/testenv/share/proj
The easy way to get this information is to call printenv > root_env.txt in the root environment, activate testenv, then call printenv > test_env.txt, and examine
diff root_env.txt test_env.txt.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.