How to add Python libraries to a Docker image

Today I started working with Docker, so please bear with me; I'm not even sure the title makes sense. I just installed TensorFlow using Docker and wanted to run a script. However, I got the following error saying that Matplotlib is not installed.
Traceback (most recent call last):
File "tf_mlp_v3.py", line 3, in <module>
import matplotlib.pyplot as plt
ModuleNotFoundError: No module named 'matplotlib'
I used the following command to install TensorFlow:
docker pull tensorflow/tensorflow:latest-gpu-jupyter
How can I now add other Python libraries such as Matplotlib to that image?

To customize an image you generally want to create a new one using the existing image as a base. In Docker it is extremely common to create custom images when existing ones don't quite do what you want. By basing your images off public ones you can add your own customizations without having to repeat (or even know) what the base image does.
Add the necessary steps to a new Dockerfile.
FROM tensorflow/tensorflow:latest-gpu-jupyter
RUN <extra install steps>
COPY <extra files>
RUN and COPY are examples of instructions you might use. RUN will run a command of your choosing such as RUN pip install matplotlib. COPY is used to add new files from your machine to the image, such as a configuration file.
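For example, a minimal Dockerfile that only adds Matplotlib on top of the TensorFlow image (a sketch, assuming pip is available in the base image, which it is for the official TensorFlow images) would be:
FROM tensorflow/tensorflow:latest-gpu-jupyter
RUN pip install matplotlib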
Build and tag the new image. Give it a new name of your choosing. I'll call it my-customized-tensorflow, but you can name it anything you like.
Assuming the Dockerfile is in the current directory, run docker build:
$ docker build -t my-customized-tensorflow .
Now you can use my-customized-tensorflow as you would any other image.
$ docker run my-customized-tensorflow
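If you rely on the GPU and Jupyter features of the base image, pass the same flags you would normally use with the original image; for example (GPU and port flags assumed from the official image's defaults):
$ docker run --gpus all -p 8888:8888 my-customized-tensorflow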

Add this to your Dockerfile after pulling the image:
RUN python -m pip install matplotlib

There are multiple options:
Get into the container and install the dependencies there (be aware that these changes will be lost when the container is recreated):
docker exec -it <your-container-id> /bin/bash
That should open an interactive bash. Then install the dependencies (pip or conda).
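For example, once inside the container you can run:
pip install matplotlib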
Another alternative is to add the dependencies at build time (of the image), that is, by adding RUN instructions to a Dockerfile.
In both cases the dependencies are installed with Python's usual tools (e.g. pip or conda).

As an alternative you can use pip's '--user' option to store the packages in mounted folders.
mkdir /path/to/local
mkdir /path/to/cache
Add these options to the docker run command:
--mount type=bind,source=/path/to/local,target=/.local --mount type=bind,source=/path/to/cache,target=/.cache
Then you can install packages using
pip install --user pandas
The packages will then persist without having to rebuild the image and restart your container every time.
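Putting it together, a sketch of the full command (the host paths are placeholders, and the /.local and /.cache targets assume the container's home directory resolves to /, as in the options above):
docker run --gpus all -it \
  --mount type=bind,source=/path/to/local,target=/.local \
  --mount type=bind,source=/path/to/cache,target=/.cache \
  tensorflow/tensorflow:latest-gpu-jupyter
Inside the container, pip install --user pandas will then write into the mounted /.local folder.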

Related

Airflow: Module not found

I am trying to schedule a Python script in Airflow.
When the script starts to execute, it fails to find a specific MySQL module.
I already installed mysql-connector, so that should not be the problem.
Please help me here.
Since you are running Airflow in a Docker image, the Python libraries you use have to be installed in that Docker image.
For example, if you are using the image apache/airflow, you can create a new docker image based on this image, with all the libs you need:
Dockerfile:
FROM apache/airflow
RUN pip install mysql-connector
Then you build the image (the path argument is the build context, i.e. the directory containing the Dockerfile):
docker build -t my_custom_airflow_image <path/to/build/context>
Then you can replace apache/airflow in your docker-compose file with my_custom_airflow_image, and it will work without any problem.
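For example, in a docker-compose.yml the change is just the image line (the service name here is only illustrative):
services:
  airflow:
    image: my_custom_airflow_image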

GCP - AI Platform notebook schedule

I am relatively new to GCP and am trying to schedule a notebook on GCP to run every day. This notebook has dependencies in terms of libraries and other Python modules/scripts. When I schedule it with Cloud Scheduler (as shown in the image), the logs show errors at the import statements for libraries and while importing the other Python modules.
I also created a requirements.txt file, but the scheduler doesn't seem to be reading it.
Am I doing something wrong?
Can anyone help or guide me toward some possible solutions? I've been stuck on this for a few days; any help would be highly appreciated.
PS: Cloud Functions would be my last option in case I'm not able to run it this way.
The problem is that we have two different environments:
The notebook document itself
The Docker container that the Notebook Executor uses when you click Execute: a Docker container is passed to the Executor backend (Notebooks API + Vertex custom job). Since you are installing the dependencies in the notebook itself (the managed notebook's underlying infrastructure), they are not included in that container, hence the failure. You need to pass a container that includes Selenium.
If you need to build a custom container I would do the following:
Create a custom container
# Dockerfile.example
FROM gcr.io/deeplearning-platform-release/tf-gpu:latest
RUN pip install selenium
Then you’ll need to build and push it somewhere accessible.
PROJECT="my-gcp-project"
docker build . -f Dockerfile.example -t "gcr.io/${PROJECT}/tf-custom:latest"
gcloud auth configure-docker
docker push "gcr.io/${PROJECT}/tf-custom:latest"
Specify the container when launching the execution, under "Custom container".
The error means that you are missing the selenium module; you need to install it. You can use one of the following commands:
python -m pip install -U selenium (you need pip installed)
pip install selenium
or depending on your permissions:
sudo pip install selenium
For python3:
sudo pip3 install selenium
Edit 1:
If you have selenium installed, check where Python is located and where it looks for libraries/packages, including the ones installed using pip. Sometimes Python runs from one location but looks for libraries in a different one. Make sure Python is looking for the libraries in the right directory.
Here is an answer that you can use to check if Python is configured correctly.
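For example, a quick way to check which interpreter is actually running and whether it can see selenium (assuming a Unix-like shell):
which python
python -c "import sys; print(sys.executable)"
python -m pip show selenium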

How to create a new docker image based on an existing image, but including more Python packages?

Let's say I've pulled the NVIDIA NGC PyTorch docker image like this:
docker pull nvcr.io/nvidia/pytorch:21.07-py3
Then I want to add these Python packages: omegaconf, wandb, and pycocotools.
How do I create a new Docker image with both the original Docker image and the additional Python packages?
Also, how do I distribute the new image throughout my organization?
Create a file named Dockerfile. Add to it the lines explained below.
Add a FROM line to specify the base image:
FROM nvcr.io/nvidia/pytorch:21.07-py3
Upgrade Pip to the latest version:
RUN python -m pip install --upgrade pip
Install the additional Python packages that you need:
RUN python -m pip install omegaconf wandb pycocotools
Altogether, the Dockerfile looks like this:
FROM nvcr.io/nvidia/pytorch:21.07-py3
RUN python -m pip install --upgrade pip
RUN python -m pip install omegaconf wandb pycocotools
In the same directory as the Dockerfile, run this command to build the new image, replacing my-new-image with a name of your choosing:
docker build -t my-new-image .
This works for me, but Pip generates a warning about installing packages as the root user. I found it best to ignore this warning. See the note at the end of this answer to understand why.
The new docker image should now appear on your system:
$ docker images
REPOSITORY               TAG         IMAGE ID       CREATED          SIZE
my-new-image             latest      082f76972805   13 seconds ago   15.1GB
nvcr.io/nvidia/pytorch   21.07-py3   7beec3ff8d35   5 weeks ago      15GB
[...]
You can now run the new image ..
$ docker run --gpus all -it --rm --ipc=host my-new-image
.. and verify that it has the additional Python packages:
# python -m pip list | grep 'omegaconf\|wandb\|pycocotools'
omegaconf 2.1.1
pycocotools 2.0+nv0.5.1
wandb 0.12.1
The Docker Hub Repositories documentation details the steps necessary to:
Create a repository (possibly private)
Push an image
Add collaborators
Pull the image from the repository
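For example, pushing the image to Docker Hub typically boils down to the following (the <your-dockerhub-user> part is a placeholder):
docker tag my-new-image <your-dockerhub-user>/my-new-image:latest
docker login
docker push <your-dockerhub-user>/my-new-image:latest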
NOTE: The problem of non-root users: Although it is considered "best practice" not to run a Docker container as root, in practice a non-root user can add several complications.
You could create a non-root user in your docker file with lines like this:
RUN useradd -ms /bin/bash myuser
USER myuser
ENV PATH "$PATH:/home/myuser/.local/bin"
However, if you run the container with mounted volumes using the -v flag, then myuser will be conferred access to those volumes based on whether their userid or groupid matches a user or group in the host system. You can modify the useradd commandline to specify the desired userid or groupid, but of course the resulting image will not be portable to systems that have different ids.
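For example, a sketch with explicit (placeholder) ids:
# 1000/1000 are placeholders; match them to the uid/gid on your host
RUN groupadd -g 1000 mygroup && \
    useradd -ms /bin/bash -u 1000 -g mygroup myuser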
Additionally, there appears to be a limitation that prevents a non-root user from accessing a mounted volume that points to an fscrypt encrypted folder. However, this works fine for me with the root docker user.
For these reasons, I found it easiest to just let the container run as root.

Heroku container:push always re-installs conda packages

I've followed the python-miniconda tutorial offered by Heroku in order to create my own ML server on Python, which utilizes Anaconda and its packages.
Everything seems to be in order; however, each time I want to update the scripts located at /webapp by entering
heroku container:push
a complete re-installation of the pip (or rather, conda) dependencies is performed, which takes quite some time and seems illogical to me. My understanding of both Docker and Heroku is very shaky, so I haven't been able to find a solution that allows me to push ONLY my code while leaving the container as is, without (re?)uploading an entire image.
Dockerfile:
FROM heroku/miniconda
ADD ./webapp/requirements.txt /tmp/requirements.txt
RUN pip install -qr /tmp/requirements.txt
ADD ./webapp /opt/webapp/
WORKDIR /opt/webapp
RUN conda install scikit-learn
RUN conda install opencv
CMD gunicorn --bind 0.0.0.0:$PORT wsgi
This happens because once you update the webapp directory, you invalidate the build cache for the ADD ./webapp /opt/webapp/ instruction. Everything after that line needs to be rebuilt.
When building an image, Docker steps through the instructions in your Dockerfile, executing each in the order specified. As each instruction is examined, Docker looks for an existing image in its cache that it can reuse, rather than creating a new (duplicate) image.
Once the cache is invalidated, all subsequent Dockerfile commands generate new images and the cache is not used. (docs)
Hence, to take advantage of the build cache, your Dockerfile should be structured like this:
FROM heroku/miniconda
RUN conda install scikit-learn opencv
ADD ./webapp /opt/webapp/
RUN pip install -qr /opt/webapp/requirements.txt
WORKDIR /opt/webapp
CMD gunicorn --bind 0.0.0.0:$PORT wsgi
You should merge the two RUN conda commands into a single statement to reduce the number of layers in the image. Also, merge the two ADD instructions into a single one and install the pip requirements from the copied directory instead of /tmp.
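If you also want the pip install layer to survive changes to your code (requirements.txt usually changes far less often than the scripts), a variant closer to your original ordering copies requirements.txt on its own first; this is a sketch of the idea, not something tested on Heroku:
FROM heroku/miniconda
RUN conda install scikit-learn opencv
ADD ./webapp/requirements.txt /tmp/requirements.txt
RUN pip install -qr /tmp/requirements.txt
ADD ./webapp /opt/webapp/
WORKDIR /opt/webapp
CMD gunicorn --bind 0.0.0.0:$PORT wsgi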

What is the purpose of running a django application in a virtualenv inside a docker container?

What is the purpose of a virtualenv inside a Docker Django application? Python and other dependencies are already installed in the container, but at the same time it's necessary to install lots of packages using pip, so the potential for conflicts still seems unclear to me.
Could you please explain the concept?
EDIT: Also, for example: I've created a virtualenv inside my Docker Django app, installed djangorestframework with pip (it shows up in pip freeze), and added it to INSTALLED_APPS in settings.py, but docker-compose up raises an error: No module named rest_framework. I've checked and everything looks correct. Could it be a Docker/virtualenv conflict?
Docker and containerization might inspire the illusion that you do not need a virtual environment. Glyph makes a very compelling argument against this misconception in this PyCon talk.
The same advantages of a virtualenv apply inside a container as they do in a non-containerized application, because fundamentally you're still running a Linux distribution.
Debian and Red Hat are fantastically complex engineering projects, integrating billions of lines of C code. For example, you can just apt install libavcodec, or yum install ffmpeg. Writing a working build system for one of those things is a PhD thesis. They integrate thousands of Python packages simultaneously into one working environment. They don't always tell you whether their tools use Python or not. And so, you might want to docker exec some tools inside a container; they might be written in Python, and if you sudo pip install your application in there, now it's all broken.
So even in containers, isolate your application code from the system's Python packages.
Regardless of whether you're using Docker or not, you should always run your application in a virtual environment.
Now, in Docker in particular, using a virtualenv is a little trickier than it should be. Inside Docker each RUN command runs in isolation and no state other than file system changes is kept from line to line. To install into a virtualenv you have to prepend the activation command on every line:
RUN apt-get update && apt-get install -y python-virtualenv
RUN virtualenv /appenv
RUN . /appenv/bin/activate; \
    pip install -r requirements.txt
ENTRYPOINT . /appenv/bin/activate; \
    run-the-app
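An alternative that avoids repeating the activation is to put the virtualenv's bin directory on PATH, which applies to all later RUN instructions and to the ENTRYPOINT; a sketch assuming the same /appenv location and that requirements.txt has already been copied into the image:
RUN apt-get update && apt-get install -y python-virtualenv
RUN virtualenv /appenv
# every python/pip below now resolves to /appenv/bin
ENV PATH="/appenv/bin:$PATH"
RUN pip install -r requirements.txt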
A virtualenv is there to isolate packages in a specific environment. Docker is also there to isolate settings in a specific environment. So in essence, if you use Docker there isn't much benefit to using a virtualenv too.
Just pip install things into the Docker environment directly; it'll do no harm. To pip install the requirements, use the Dockerfile, where you can execute commands.
You can find a pseudo-code example below.
FROM <your-base-image>
COPY requirements.txt /tmp/requirements.txt
RUN pip install -r /tmp/requirements.txt
