How to build Docker images more quickly - Python

I'm currently building a Docker image and running the container to run some tests for a Python application I'm working on. Currently the Dockerfile copies the files over from the host machine, sets the working directory to the copied directory, runs apt-get update, installs pip, and finally runs the tests from setup.py. The Dockerfile can be seen below.
FROM ubuntu
ADD . /home/dev/ProjectName
WORKDIR /home/dev/ProjectName
RUN apt-get update && \
apt-get install -y python3-pip && \
python3 setup.py test
I was curious whether there is a more conventional way to avoid running apt-get update and apt-get install pip every time I'd like to run a test. The main idea I had was to build an image with pip already on it, and then build this image from that one.

Docker builds using cached layers if it can. Adding files that you have changed invalidates the cache for that step and every subsequent step. Put the apt commands first, before the ADD, and they will only run the first time you build. See this blog post for more info.
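For example, a reordered version of the Dockerfile from the question might look like this (a sketch; only the instruction order changes, so the apt layers stay cached across code changes and only the test step reruns):
FROM ubuntu
# System packages first: this layer is cached until the line itself changes
RUN apt-get update && \
apt-get install -y python3-pip
# Copy the project in afterwards; only the steps from here on rerun when code changes
ADD . /home/dev/ProjectName
WORKDIR /home/dev/ProjectName
# Run the tests
RUN python3 setup.py test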

Related

Python script with numpy in a Dockerfile on Raspberry Pi - libf77blas.so.3 error

I want to run a Docker container on my Raspberry Pi 2 with a Python script that uses numpy. For this I have the following Dockerfile:
FROM python:3.7
COPY numpy_script.py /
RUN pip install numpy
CMD ["python", "numpy_script.py"]
But when I want to import numpy, I get the error message that libf77blas.so.3 was not found.
I have also tried to install numpy with a wheel from www.piwheels.org, but the same error occurs.
A Google search revealed that I need to install liblapack3. How do I need to modify my Dockerfile for this?
Inspired by the answer of om-ha, this worked for me:
FROM python:3.7
COPY numpy_script.py /
RUN apt-get update \
&& apt-get -y install libatlas-base-dev \
&& pip install numpy
CMD ["python", "numpy_script.py"]
Working Dockerfile
# Python image (debian-based)
FROM python:3.7
# Create working directory
WORKDIR /app
# Copy project files
COPY numpy_script.py numpy_script.py
# RUN command to update packages & install dependencies for the project
RUN apt-get update \
&& apt-get install -y libatlas-base-dev \
&& pip install numpy
# Commands to run within the container
CMD ["python", "numpy_script.py"]
Explanation
You have an extra trailing \ in your Dockerfile; the backslash is used to continue multi-line shell commands. You can see it used in action in my answer above. Be aware that the last shell command in a RUN (in this case pip) does not need a trailing \, a mishap that was in the code you showed.
You should probably use a working directory via WORKDIR /app
Run apt-get update just to be sure everything is up-to-date.
It's recommended to group multiple shell commands within ONE RUN directive using &&. See best practices and this discussion.
Resources, sorted by order of use in this Dockerfile:
FROM
WORKDIR
COPY
RUN
CMD

Flask, React and Docker: Non-Zero Codes

I am trying to follow the Flask/React tutorial here, on a plain Windows machine.
On Windows 10, without considering Docker, I have the tutorial working.
On Windows 10 under Docker (Ubuntu-based containers and docker-compose), I do not:
The React server works under the docker.
The Flask server won't successfully build.
The Dockerfile for the Flask server is:
FROM ubuntu:18.04
RUN apt-get update && apt-get install -y software-properties-common
RUN add-apt-repository universe
RUN apt-get update && apt-get install -y python3-pip yarn
RUN pip3 install flask
#RUN pip3 install venv
RUN mkdir -p /app
WORKDIR /app
COPY . /app
#RUN python3 -m venv venv
RUN cd api/venv/Scripts
RUN flask run --no-debugger
This fails at the very last line:
The command '/bin/sh -c flask run --no-debugger' returned a non-zero code: 1
Note that I find myself in the unenviable position of trying to use/teach myself all of Docker, venv, React, and Flask at the same time. The venv commands are commented out because I'm not even sure venv makes sense in a Docker container (but what would I know?) and also because the pip3 install venv command halts with a non-zero code: 2.
Any advice is welcome.
There are two obvious issues in the Dockerfile you show.
Each RUN command runs in a clean environment starting from the last known state of the image. Settings like the current directory (and also environment variable values) are not preserved when a RUN command exits. So RUN cd ... starts the RUN command from the old directory, changes to the new directory, and then doesn't remember that; the following RUN command starts again from the old directory. You need the WORKDIR directive to actually change directories.
The RUN commands also run during the build phase. They won't publish network ports or have access to databases; in a multi-container Compose setup they can't connect to other containers. You probably want to run the Flask app as the main container CMD.
So you can update your Dockerfile to look like:
FROM ubuntu:18.04
RUN apt-get update && apt-get install -y software-properties-common
RUN add-apt-repository universe
RUN apt-get update && apt-get install -y python3-pip yarn
# Creates the directory as well
WORKDIR /app
# requirements.txt includes "flask"
COPY requirements.txt ./
RUN pip3 install -r requirements.txt
COPY . ./
# Use WORKDIR, not `RUN cd ...`
WORKDIR /app/api/venv/Scripts
# Use CMD, not `RUN ...`
CMD flask run --no-debugger
It is in fact common to just not use a virtual environment in Docker; the Docker image is isolated from any other Python installation and so it's safe to use the "system" Python package tree. (I am a little suspicious of the venv directory in there, since virtual environments can't be transplanted into other setups very well.)
Note that I find myself in the unenviable position of trying to use/teach myself all of Docker, venv, react, and flask at the same time.
Put Docker away for another day. It's not necessary, especially during the development phase of your application. If you read through SO you'll find a lot of questions trying to contort Docker into acting just like a local development environment, which it's really not designed for. There's nothing wrong with locally installing the tools you need to do your job, especially when they're very routine tools like Python and Node.
I believe that Flask can't find your app when you run it this way (especially as the docker build itself attempts to run it). If you only want to use the container to run your app, use CMD in the Dockerfile, so that when you run the image it starts your Flask app first thing.
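As a sketch of that (the app.py filename is an assumption; adjust FLASK_APP to your actual module): inside a container Flask also needs to listen on 0.0.0.0 rather than the default 127.0.0.1, otherwise a published port won't be reachable from the host.
ENV FLASK_APP=app.py
CMD ["flask", "run", "--host=0.0.0.0", "--no-debugger"]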

how to use multiple dockerfiles for single application

I am using a CentOS base image and installing Python 3 with the following Dockerfile:
FROM centos:7
ENV container docker
ARG USER=dsadmin
ARG HOMEDIR=/home/${USER}
RUN yum clean all \
&& yum update -q -y -t \
&& yum install file -q -y
RUN useradd -s /bin/bash -d ${HOMEDIR} ${USER}
RUN export LC_ALL=en_US.UTF-8
# install Development Tools to get gcc
RUN yum groupinstall -y "Development Tools"
# install python development so that pip can compile packages
RUN yum -y install epel-release && yum clean all \
&& yum install -y python34-setuptools \
&& yum install -y python34-devel
# install pip
RUN easy_install-3.4 pip
# install virtualenv or virtualenvwrapper
RUN pip3 install virtualenv \
&& pip3 install virtualenvwrapper \
&& pip3 install pandas
# # install django
# RUN pip3 install django
USER ${USER}
WORKDIR ${HOMEDIR}
I build and tag the above as follows:
docker build . --label validation --tag validation
I then need to add a .tar.gz file to the home directory. This file contains all the Python scripts I maintain and will change frequently. If I add it to the Dockerfile above, Python is reinstalled every time I change the .tar.gz file, which adds a lot of time to development. As a workaround, I tried creating a second Dockerfile that uses the above image as its base and just adds the .tar.gz file on top.
FROM validation:latest
ARG USER=dsadmin
ARG HOMEDIR=/home/${USER}
ADD code/validation_utility.tar.gz ${HOMEDIR}/.
USER ${USER}
WORKDIR ${HOMEDIR}
After that, if I run the docker image and do an ls, all the files in the folder have an owner of games.
-rw-r--r-- 1 501 games 35785 Nov 2 21:24 Validation_utility.py
To fix the above, I added the following lines to the second docker file:
ADD code/validation_utility.tar.gz ${HOMEDIR}/.
RUN chown -R ${USER}:${USER} ${HOMEDIR} \
&& chmod +x ${HOMEDIR}/Validation_utility.py
but I get the error:
chown: changing ownership of '/home/dsadmin/Validation_utility.py': Operation not permitted
The goal is to have two Dockerfiles. Users will build the first Dockerfile to install the CentOS and Python dependencies. The second Dockerfile will install the custom Python scripts. If the scripts change, they will just rebuild from the second Dockerfile. Is that the right way to think about Docker? Thank you.
Is that the right way to think about docker?
This is the easy part of your question. Yes. You're thinking about the proper way to structure your Dockerfiles, reuse them, and keep your image builds efficient. Good job.
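As a sketch of that workflow (the file name Dockerfile.app for the second file is just an example; point docker build at it with -f):
# Build the base image once; it rarely changes
docker build . --tag validation
# Rebuild only the application image when the scripts change
docker build . -f Dockerfile.app --tag validation-app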
As for the error you're receiving, I'm less confident about why the ADD command is un-tarballing your tar.gz with games as the owner; I'm not nearly as familiar with CentOS. (Most likely ADD preserves the numeric owner IDs recorded in the tarball, and a tarball created as an ordinary macOS user carries UID 501 and GID 20, which happens to map to the games group on the CentOS image.) That ownership is the start of the problem: dsadmin, as a regular non-privileged user, can't change ownership of files he doesn't own, so since the un-tarballed script is owned by games, the chown command fails.
I used your Dockerfiles and got the same issue on macOS.
You can get around this by, well, not using ADD. Which is funny because local tarball extraction is the one use case where Docker thinks you should prefer ADD over COPY.
COPY code/validation_utility.tar.gz ${HOMEDIR}/.
RUN tar -xvf validation_utility.tar.gz
This properly extracts the tarball and, since dsadmin was the user at the time, the contents come out owned by dsadmin.
(An uglier route might be to switch the USER to root to set permissions, then set it back to dsadmin. I think this is icky, but it's an option.)
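On reasonably recent Docker (17.09 and later) there may be a third option: COPY and ADD accept a --chown flag, so you could keep ADD's automatic extraction and still end up with the right owner. A sketch, assuming the dsadmin user exists in the base image:
# --chown sets ownership on the added (and auto-extracted) content
ADD --chown=dsadmin:dsadmin code/validation_utility.tar.gz ${HOMEDIR}/.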

Supporting docker based and non docker based deployments

Currently for my Python project I have a deploy.sh file which runs some apt-get commands and pip installs, creates some directories, and copies some files, so the process is: git clone my private repo, then run deploy.sh.
Now I'm playing with Docker, and the basic question is: should the Dockerfile RUN a git clone and then RUN deploy.sh, or should the Dockerfile have its own RUNs for each apt-get, pip, etc. and ignore deploy.sh? The latter seems like duplicating work (typing) and has the possibility of going out of sync.
That work should be duplicated in the Dockerfile. The reason for this is to take advantage of Docker's layer and caching system. Take this example:
# this will only execute the first time you build the image, all future builds will use a cached layer
RUN apt-get update && apt-get install somepackage -y
# this will only run pip if your requirements file changes
ADD requirements.txt /app/requirements.txt
RUN pip install -r requirements.txt
ADD . /app/
Also, you should not do a git checkout of your code in the docker build. Simply add the files from the local checkout to the image, as in the example above.
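When you ADD the local checkout like this, a .dockerignore file is also worth having so the .git directory and other local clutter don't bloat the build context or invalidate the cache unnecessarily (the entries below are just typical examples):
.git
__pycache__/
*.pyc
.venv/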

How do I reduce a python (docker) image size using a multi-stage build?

I am looking for a way to create multi-stage builds with Python and a Dockerfile:
For example, using the following images:
1st image: install all compile-time requirements, and install all needed python modules
2nd image: copy all compiled/built packages from the first image to the second, without the compilers themselves (gcc, postgres-dev, python-dev, etc.)
The final objective is to have a smaller image, running python and the python packages that I need.
In short: how can I 'wrap' all the compiled modules (site-packages / external libs) that were created in the first image, and copy them in a 'clean' manner, to the 2nd image.
OK, so my solution uses wheel: it lets us compile in the first image, create wheel files for all dependencies, and install them in the second image without installing the compilers.
FROM python:2.7-alpine as base
RUN mkdir /svc
COPY . /svc
WORKDIR /svc
RUN apk add --update \
postgresql-dev \
gcc \
musl-dev \
linux-headers
RUN pip install wheel && pip wheel . --wheel-dir=/svc/wheels
FROM python:2.7-alpine
COPY --from=base /svc /svc
WORKDIR /svc
RUN pip install --no-index --find-links=/svc/wheels -r requirements.txt
You can see my answer regarding this in the following blog post
https://www.blogfoobar.com/post/2018/02/10/python-and-docker-multistage-build
I recommend the approach detailed in this article (section 2). He uses virtualenv so pip install stores all the Python code, binaries, etc. under one folder instead of spreading them all over the file system. Then it's easy to copy just that one folder to the final "production" image. In summary:
Compile image
Activate virtualenv in some path of your choosing.
Prepend that path to PATH via a docker ENV instruction. This is all virtualenv needs in order to function for all future docker RUN and CMD actions.
Install system dev packages and pip install xyz as usual.
Production image
Copy the virtualenv folder from the Compile Image.
Prepend the virtualenv folder to docker's PATH
This is a place where using a Python virtual environment inside Docker can be useful. Copying a virtual environment normally is tricky since it needs to be the exact same filesystem path on the exact same Python build, but in Docker you can guarantee that.
(This is the same basic recipe @mpoisot describes in their answer and it appears in other SO answers as well.)
Say you're installing the psycopg PostgreSQL client library. The extended form of this requires the Python C development library plus the PostgreSQL C client library headers; but to run it you only need the PostgreSQL C runtime library. So here you can use a multi-stage build: the first stage installs the virtual environment using the full C toolchain, and the final stage copies the built virtual environment but only includes the minimum required libraries.
A typical Dockerfile could look like:
# Name the single Python image we're using everywhere.
ARG python=python:3.10-slim
# Build stage:
FROM ${python} AS build
# Install a full C toolchain and C build-time dependencies for
# everything we're going to need.
RUN apt-get update \
&& DEBIAN_FRONTEND=noninteractive \
apt-get install --no-install-recommends --assume-yes \
build-essential \
libpq-dev
# Create the virtual environment.
RUN python3 -m venv /venv
ENV PATH=/venv/bin:$PATH
# Install the Python library dependencies, including those with
# C extensions. They'll get installed into the virtual environment.
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
# Final stage:
FROM ${python}
# Install the runtime-only C library dependencies we need.
RUN apt-get update \
&& DEBIAN_FRONTEND=noninteractive \
apt-get install --no-install-recommends --assume-yes \
libpq5
# Copy the virtual environment from the first stage.
COPY --from=build /venv /venv
ENV PATH=/venv/bin:$PATH
# Copy the application in.
COPY . .
CMD ["./main.py"]
If your application uses a Python entry point script then you can do everything in the first stage: RUN pip install . will copy the application into the virtual environment and create a wrapper script in /venv/bin for you. In the final stage you don't need to COPY the application again. Set the CMD to run the wrapper script out of the virtual environment, which is already at the front of the $PATH.
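For example (a sketch, assuming your project declares a console-script entry point named myapp), the two stages change only slightly:
# Build stage: install the application itself into the virtual environment
COPY . .
RUN pip install .
# Final stage: no COPY of the application needed; just run the generated wrapper
CMD ["myapp"]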
Again, note that this approach only works because it is the same Python base image in both stages, and because the virtual environment is on the exact same path. If it is a different Python or a different container path the transplanted virtual environment may not work correctly.
The docs on this explain exactly how to do this.
https://docs.docker.com/engine/userguide/eng-image/multistage-build/#before-multi-stage-builds
Basically you do exactly what you've said. The magic of the multi-stage build feature, though, is that you can do it all from one Dockerfile.
ie:
FROM golang:1.7.3
WORKDIR /go/src/github.com/alexellis/href-counter/
RUN go get -d -v golang.org/x/net/html
COPY app.go .
RUN CGO_ENABLED=0 GOOS=linux go build -a -installsuffix cgo -o app .
FROM alpine:latest
RUN apk --no-cache add ca-certificates
WORKDIR /root/
COPY --from=0 /go/src/github.com/alexellis/href-counter/app .
CMD ["./app"]
This builds a Go binary, then the next image runs the binary. The first image has all the build tools and the second is just a base Linux image that can run the binary.
