Common advice (example) for carrying out CI is to use an image with pre-installed dependencies. Unfortunately for a n00b like me, the link in question doesn't go into further detail.
When I look for docker tutorials, it seems they usually teach you how to containerise an app rather than, say, Python with some pre-installed dependencies.
For example, if this is what my .gitlab-ci.yml file looks like:
image: "python:3.7"
before_script:
- python --version
- pip install -r requirements.txt
stages:
- Static Analysis
flake8:
stage: Static Analysis
script:
- flake8 --max-line-length=120
How can I containerise Python with some pre-installed dependencies (here, the ones in requirements.txt), and how should I change the .gitlab-ci.yml file so that the CI process runs faster?
To make it faster, I recommend creating a custom Dockerfile based on python:3.7 that installs all of the dependencies during the image build. That way your job will not need to install the dependencies on every build, which saves time.
FROM python:3.7
RUN python --version
# Create app directory
WORKDIR /app
# copy requirements.txt
COPY local-src/requirements.txt ./
# Install app dependencies
RUN pip install -r requirements.txt
# Bundle app source
COPY src /app
You can read more about this practice in docker-python-pip-requirements and write-effective-docker-files-with-python.
Another option is to add a git client in the Dockerfile and pull the code while creating the container.
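Once that image is built and pushed to a registry your runner can pull from (the registry path below is just a placeholder), the .gitlab-ci.yml only needs to point at it and can drop the pip install step, roughly like this:
image: "registry.gitlab.com/yourgroup/yourproject/python-deps:3.7"
before_script:
  - python --version
stages:
  - Static Analysis
flake8:
  stage: Static Analysis
  script:
    - flake8 --max-line-length=120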
I have a web app built with a framework like FastAPI or Django, and my project uses Poetry to manage the dependencies.
I didn't find any topic similar to this.
The question is: should I install poetry in my production Dockerfile and install the dependencies using poetry, or should I export a requirements.txt and just use pip inside my docker image?
At the moment, I export the requirements.txt to the project's root before deploying the app and just use it inside the docker image.
My motivation is that I don't need the "complexity" of using poetry inside a Dockerfile: the requirements.txt is already generated by poetry, and using poetry inside the image adds an extra step to the docker build that can impact the build speed.
However, I have seen many Dockerfiles that install poetry, which makes me think I am misusing the tool.
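For reference, the export I run before deploying is roughly this one-liner (poetry export is built into Poetry 1.x; newer versions may need the poetry-plugin-export plugin):
poetry export -f requirements.txt --output requirements.txt --without-hashes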
There's no need to use poetry in production. To understand this, we should look back at the original reason poetry exists. There are basically two main reasons for poetry:-
To manage the Python venv for us - in the past people used a range of tools, from home-grown scripts to something like virtualenvwrapper, to automatically manage the virtual env.
To help us publish packages to PyPI.
Reason no. 2 is not really a concern for this question, so let's just look at reason no. 1. Why do we need something like poetry in dev? Because the dev environment can differ between developers. My venv could be in /home/kamal/.venv while John probably wants to be fancy and place his virtualenv in /home/john/.local/venv.
When writing notes on how to set up and run your project, how would you write them to cater for the difference between me and John? We would probably use a placeholder such as /path/to/your/venv. Using poetry, we don't have to worry about this. Just write in the notes that you should run the command as:-
poetry run python manage.py runserver ...
Poetry takes care of all the differences. But in production, we don't have this problem. Our app in production will be in a single place, let's say /app. When writing notes on how to run a command in production, we can just write:-
/app/.venv/bin/myapp manage collectstatic ...
Below is a sample Dockerfile we use to deploy our app using docker:-
FROM python:3.10-buster as py-build
# Install additional OS packages.
RUN apt-get update && export DEBIAN_FRONTEND=noninteractive \
&& apt-get -y install --no-install-recommends netcat util-linux \
vim bash-completion yamllint postgresql-client
RUN curl -sSL https://install.python-poetry.org | POETRY_HOME=/opt/poetry python3 -
COPY . /app
WORKDIR /app
ENV PATH=/opt/poetry/bin:$PATH
RUN poetry config virtualenvs.in-project true && poetry install
FROM node:14.20.0 as js-build
COPY . /app
WORKDIR /app
RUN npm install && npm run production
FROM python:3.10-slim-buster
EXPOSE 8000
COPY --from=py-build /app /app
COPY --from=js-build /app/static /app/static
WORKDIR /app
CMD /app/.venv/bin/run
We use a multistage build where, in the build stage, we still use poetry to install all the dependencies, but in the final stage we just copy /app, which also includes the .venv virtualenv folder.
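Building and running the result is then the usual docker workflow; a minimal sketch, assuming you tag the image myapp (the sample Dockerfile exposes port 8000):
docker build -t myapp .
docker run -p 8000:8000 myapp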
The Dockerfile below has the environment variable XDG_CACHE_HOME=/cache,
which allows the command
pip install -r requirements_test.txt
to use a local cache instead of downloading from the network.
But the Dockerfile below also has a /build folder.
So I would like to understand
whether the purpose (content) of the /build folder is different from that of the /cache folder.
Dockerfile
FROM useraccount/todobackend-base:latest
MAINTAINER Development team <devteam@abc.com>
RUN apt-get update && \
# Development image should have access to source code and
# be able to compile python package dependencies that installs from source distribution
# python-dev has core development libraries required to build & compile python application from source
apt-get install -qy python-dev libmysqlclient-dev
# Activate virtual environment and install wheel support
# Python wheels are application package artifacts
RUN . /appenv/bin/activate && \
pip install wheel --upgrade
# PIP environment variables (NOTE: must be set after installing wheel)
# Configure docker image to output wheels to folder called /wheelhouse
# PIP cache location using XDG_CACHE_HOME to improve performance during test/build/release operation
ENV WHEELHOUSE=/wheelhouse PIP_WHEEL_DIR=/wheelhouse PIP_FIND_LINKS=/wheelhouse XDG_CACHE_HOME=/cache
# OUTPUT: Build artifacts (wheels) are output here
# Read more - https://www.projectatomic.io/docs/docker-image-author-guidance/
VOLUME /wheelhouse
# OUTPUT: Build cache
VOLUME /build
# OUTPUT: Test reports are output here
VOLUME /reports
# Add test entrypoint script
COPY scripts/test.sh /usr/local/bin/test.sh
RUN chmod +x /usr/local/bin/test.sh
# Set defaults for entrypoint and command string
ENTRYPOINT ["test.sh"]
CMD ["python", "manage.py", "test", "--noinput"]
# Add application source
COPY src /application
WORKDIR /application
Below is the docker-compose.yml file
test: # Unit & integration testing
  build: ../../
  dockerfile: docker/dev/Dockerfile
  volumes_from:
    - cache
  links:
    - db
  environment:
    DJANGO_SETTINGS_MODULE: todobackend.settings.test
    MYSQL_HOST: db
    MYSQL_USER: root
    MYSQL_PASSWORD: password
    TEST_OUTPUT_DIR: /reports
builder: # Generate python artifacts
  build: ../../
  dockerfile: docker/dev/Dockerfile
  volumes:
    - ../../target:/wheelhouse
  volumes_from:
    - cache
  entrypoint: "entrypoint.sh"
  command: ["pip", "wheel", "--no-index", "-f", "/build", "."]
db:
  image: mysql:5.6
  hostname: db
  expose:
    - "3386"
  environment:
    MYSQL_ROOT_PASSWORD: password
cache: # volume container
  build: ../../
  dockerfile: docker/dev/Dockerfile
  volumes:
    - /tmp/cache:/cache
    - /build
  entrypoint: "true"
The volumes below
volumes:
  - /tmp/cache:/cache
  - /build
are created in the volume container (cache).
entrypoint file test.sh:
#!/bin/bash
# Activate virtual environment
. /appenv/bin/activate
# Download requirements to build cache
pip download -d /build -r requirements_test.txt --no-input
# Install application test requirements
# -r allows the requirements to be mentioned in a txt file
# pip install -r requirements_test.txt
pip install --no-index -f /build -r requirements_test.txt
# Run test.sh arguments
exec "$@"
Edit:
pip download -d /build -r requirements_test.txt --no-input stores the downloaded files in the /build folder.
pip install -r requirements_test.txt picks up the dependencies from the /build folder.
The two commands above are not using the /cache folder.
1) So, why do we need the /cache folder? The pip install command refers to /build.
2) In the test.sh file, with regard to using the /build vs /cache content, how is
pip install --no-index -f /build -r requirements_test.txt
different from
pip install -r requirements_test.txt ?
1) They might be the same, but they might not be. As I understand what's being done here, /cache uses your host cache (/tmp/cache is on the host), and then the container builds the cache (using the host cache) and stores it in /build, which points to /var/lib/docker/volumes/hjfhjksahfjksa on your host.
So, they might be the same at some point, but not always.
2) This container needs the cache stored in /build, so you need to use the -f flag to let pip know where it's located.
Python has a couple of different formats for packages. They're typically distributed as source code, which can run anywhere Python runs, but occasionally have C (or FORTRAN!) extensions that require an external compiler to build. The current recommended format is a wheel, which can be specific to a particular OS and specific Python build options, but doesn't depend on anything at the OS level outside of Python. The Python Packaging User Guide goes into a lot more detail on this.
The build volume contains .whl files for your application; the wheelhouse volume contains .whl files for other Python packages; the cache volume contains .tar.gz or .whl files that get downloaded from PyPI. The cache volume is only consulted when downloading things; the build and wheelhouse volumes are used to install code without needing to try to download at all.
The pip --no-index option says "don't contact public PyPI"; -f /build says "use artifacts located here". The environment variables mentioning /wheelhouse also have an effect. These combine to let you install packages using only what you've already built.
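Putting those pieces together, the offline install that test.sh performs is roughly equivalent to this single command (the second --find-links comes from the PIP_FIND_LINKS=/wheelhouse environment variable set in the Dockerfile):
pip install --no-index --find-links=/build --find-links=/wheelhouse -r requirements_test.txt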
The Docker Compose setup is a pretty long-winded way to build your application as wheels, and then make it available to a runtime image that doesn't have a toolchain.
The cache container does literally nothing. It has the two directories you show: /cache is a host-mounted directory, and /build is an anonymous volume. Other containers have volumes_from: cache to reuse these volumes. (Style-wise, adding named volumes: to the docker-compose.yml is almost definitely better.)
The builder container only runs pip wheel. It mounts an additional directory, ./target from the point of view of the Dockerfile, onto /wheelhouse. The pip install documentation discusses how caching works: if it downloads files they go into $XDG_CACHE_HOME (the /cache volume directory), and if it builds wheels they go into the /wheelhouse volume directory. The output of pip wheel will go into the /build volume directory.
The test container, at startup time, downloads some additional packages and puts them in the build volume. Then it does pip install --no-index to install packages only using what's in the build and wheelhouse volumes, without calling out to PyPI at all.
This setup is pretty complicated for what it does. Some general guidelines I'd suggest here:
Prefer named volumes to data-volume containers; see the compose sketch after this list. (Very early versions of Docker didn't have named volumes, but anything running on a modern Linux distribution will.)
Don't establish a virtual environment inside your image; just install directly into the system Python tree.
Install software at image build time (in the Dockerfile), not at image startup time (in an entrypoint script).
Don't declare VOLUME in a Dockerfile pretty much ever; it's not necessary for this setup and when it has effects it's usually more confusing than helpful.
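As a rough sketch of the first point, the cache wiring could use named volumes instead of a do-nothing cache container (the volume names here are placeholders):
# docker-compose.yml sketch: named volumes instead of a data-volume container
version: "3.8"
services:
  test:
    build:
      context: ../../
      dockerfile: docker/dev/Dockerfile
    volumes:
      - pip-cache:/cache
      - build:/build
volumes:
  pip-cache:
  build: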
A more typical setup would be to build all of this, in one shot, in a multi-stage build. The one downside of this is that downloads aren't cached across builds: if your list of requirements doesn't change then Docker will reuse it as a set, but if you add or remove any single thing, Docker will repeat the pip command to download the whole set.
This would look roughly like (not really tested):
# First stage: build and download wheels
FROM python:3 AS build
# Bootstrap some Python dependencies.
RUN pip install --upgrade pip \
&& pip install wheel
# This stage can need some extra host dependencies, like
# compilers and C libraries.
RUN apt-get update && \
apt-get install -qy python-dev libmysqlclient-dev
# Create a directory to hold built wheels.
RUN mkdir /wheel
# Install the application's dependencies (only).
WORKDIR /app
COPY requirements.txt .
RUN pip wheel --wheel-dir=/wheel -r requirements.txt \
&& pip install --no-index --find-links=/wheel -r requirements.txt
# Build a wheel out of the application.
COPY . .
RUN pip wheel --wheel-dir=/wheel --no-index --find-links=/wheel .
# Second stage: actually run the application.
FROM python:3
# Bootstrap some Python dependencies.
RUN pip install --upgrade pip \
&& pip install wheel
# Get the wheels from the first stage.
RUN mkdir /wheel
COPY --from=build /wheel /wheel
# Install them.
RUN pip install --no-index --find-links=/wheel /wheel/*.whl
# Standard application metadata.
# The application should be declared as entry_points in setup.py.
EXPOSE 3000
CMD ["the_application"]
I have created a Python command line application that is available through PyPi / pip install.
The application has native dependencies.
To make the installation less painful for Windows users I would like to create a Dockerised version out of this command line application.
What are the steps to turn a project with a setup.py entry point and a requirements.txt into a Dockerised command line application easily? Is there any tooling around this, or should I just write the Dockerfile by hand?
Well, you have to create a Dockerfile and build an image from it. There are best practices regarding docker image creation that you should apply, and there are language-specific best practices as well.
Just to give you some ideas about the process:
# base image
FROM python:3.7.1-alpine3.8
# add project files
ADD . /myapp
WORKDIR /myapp
# put your OS-level dependency packages here
RUN apk add dep1 dep2
# install pip packages
RUN pip install -r requirements.txt
# install the application itself
RUN pip install .
CMD myapp -h
Now build the image and push it to some public registry:
sudo docker build -t <yourusername>/myapp:0.1 .
Users can then just pull the image and run it:
sudo docker run -it <yourusername>/myapp:0.1 myapp <switches/arguments>
I can't wrap my head around how to dockerize existing Django app.
I've read this official manual by Docker explaining how to create a Django project during the creation of a Docker image, but what I need is to dockerize an existing project using the same method.
The main purpose of this approach is that I have no need to build docker images locally all the time; instead, I want to push my code to a remote repository that has a docker-hub watcher attached to it, so that as soon as the code base is updated it is built automatically on the server.
For now my Dockerfile looks like:
FROM python:3
ENV PYTHONUNBUFFERED 1
RUN mkdir /code
WORKDIR /code
ADD requirements.txt /code/
RUN pip install Django
RUN pip install djangorestframework
RUN pip install PyQRCode
ADD . /code/
Can anyone please explain how I should compose the Dockerfile, and whether I need to use a docker-compose.yml (if yes: how?) to achieve the functionality I've described?
Solution for this question:
FROM python:3
ENV PYTHONUNBUFFERED 1
RUN mkdir /code
WORKDIR /code
RUN pip install *name of package*
RUN pip install *name of another package*
ADD . /code/
EXPOSE 8000
CMD python3 manage.py runserver 0.0.0.0:8000
OR
FROM python:3
ENV PYTHONUNBUFFERED 1
RUN mkdir /code
WORKDIR /code
ADD requirements.txt /code/
RUN pip install -r requirements.txt
ADD . /code/
EXPOSE 8000
CMD python3 manage.py runserver 0.0.0.0:8000
requirements.txt should be a plain list of packages, for example:
Django==1.11
djangorestframework
pyqrcode
pypng
This question is too broad. What happens with the Dockerfile you've created?
You don't need docker compose unless you have multiple containers that need to interact.
Some general observations from your current Dockerfile (a combined sketch follows this list):
It would be better to collapse the pip install commands into a single statement. In docker, each statement creates a file system layer, and the layers in between the pip install commands probably serve no useful purpose.
It's better to declare dependencies in setup.py or a requirements.txt file (pip install -r requirements.txt), with fixed version numbers (foopackage==0.0.1) to ensure a repeatable build.
I'd recommend packaging your Django app into a python package and installing it with pip (cd /code/; pip install .) rather than directly adding the code directory.
You're missing a statement (CMD or ENTRYPOINT) to execute the app. See https://docs.docker.com/engine/reference/builder/#cmd
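A rough sketch that pulls those observations together might look like this (it assumes the project has a setup.py and pinned versions in requirements.txt):
FROM python:3
ENV PYTHONUNBUFFERED 1
WORKDIR /code
# Install pinned dependencies in a single layer
COPY requirements.txt /code/
RUN pip install -r requirements.txt
# Install the application itself as a package
COPY . /code/
RUN pip install .
EXPOSE 8000
CMD ["python", "manage.py", "runserver", "0.0.0.0:8000"]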
Warning: -onbuild images have been deprecated.
@AlexForbes raised very good points. But if you want a super simple Dockerfile for Django, you can probably just do:
FROM python:3-onbuild
RUN python manage.py collectstatic
CMD ["python", "manage.py"]
You then run your container with:
docker run myimagename runserver
The little -onbuild modifier does most of what you need. It creates /usr/src/app, sets it as the working directory, copies all your source code inside, and runs pip install -r requirements.txt (which you forgot to run). Finally we collect statics (might not be required in your case if statics are hosted somewhere), and set the default command to manage.py so everything is easy to run.
You would need docker-compose if you had to run other containers like Celery, Redis or any other background task or server not supplied by your environment.
I actually wrote an article about this in https://rehalcon.blogspot.mx/2018/03/dockerize-your-django-app-for-local.html
My case is very similar, but it adds a MySQL db service and environment variables for code secrets, as well as the use of docker-compose (needed on macOS). I also use the python:2.7-slim docker parent image instead, to make the image much smaller (under 150MB).
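A minimal sketch of that kind of docker-compose.yml could look like this (the service names, MySQL tag, and secret values are placeholders):
version: "3"
services:
  web:
    build: .
    command: python manage.py runserver 0.0.0.0:8000
    ports:
      - "8000:8000"
    environment:
      SECRET_KEY: change-me   # placeholder secret, usually injected from the environment
      MYSQL_HOST: db
    depends_on:
      - db
  db:
    image: mysql:5.7
    environment:
      MYSQL_ROOT_PASSWORD: password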
Currently, for my python project I have a deploy.sh file which runs some apt-get's and pip installs, creates some dirs, and copies some files... so the process is: git clone my private repo, then run deploy.sh.
Now I'm playing with docker, and the basic question is, should the dockerfile RUN a git clone and then RUN deploy.sh or should the dockerfile have its own RUNs for each apt-get, pip, etc and ignore deploy.sh... which seems like duplicating work (typing) and has the possibility of going out of sync?
That work should be duplicated in the dockerfile. The reason for this is to take advantage of docker's layer and caching system. Take this example:
# this will only execute the first time you build the image, all future builds will use a cached layer
RUN apt-get update && apt-get install somepackage -y
# this will only run pip if your requirements file changes
ADD requirements.txt /app/requirements.txt
RUN pip install -r requirements.txt
ADD . /app/
Also, you should not do a git checkout of your code in the docker build. Just add the files from the local checkout to the image, as in the example above.
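If you go that route, a .dockerignore file keeps the git metadata and other local clutter out of the build context; the entries below are just common examples:
# .dockerignore
.git
__pycache__/
*.pyc
.venv/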