Supporting Docker-based and non-Docker-based deployments

Currently, my Python project has a deploy.sh script that runs some apt-get commands and pip installs, creates some directories, and copies some files. So the deployment process is: git clone my private repo, then run deploy.sh.
Now I'm experimenting with Docker, and the basic question is: should the Dockerfile RUN a git clone and then RUN deploy.sh, or should the Dockerfile have its own RUN instructions for each apt-get, pip install, etc. and ignore deploy.sh? The latter seems like duplicated work (typing) and risks the two going out of sync.

That work should be duplicated in the Dockerfile, so that you can take advantage of Docker's layer caching. Take this example:
# this runs on the first build; later builds reuse the cached layer until this line (or one above it) changes
RUN apt-get update && apt-get install -y somepackage
# this only re-runs pip when requirements.txt (or an earlier step) changes
ADD requirements.txt /app/requirements.txt
RUN pip install -r /app/requirements.txt
# code changes only invalidate the cache from this step onward
ADD . /app/
Also, you should not do a git checkout of your code inside the Docker build. Simply add the files from your local checkout to the image, as in the example above.
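To make the caching argument concrete, here is a toy model of the rule (not Docker's actual implementation): each step's cache key is derived from its parent step's key plus the instruction text, and for ADD/COPY also from the content of the added files. Once one key changes, every later key changes too, so only the steps from the first change onward are rebuilt. The step list mirrors the example above; the `files` dictionary standing in for the build context, and the helper names, are assumptions for illustration.

```python
import hashlib


def layer_keys(instructions, files):
    """Toy cache keys: each key chains the parent key, the instruction, and ADD/COPY file content."""
    keys = []
    parent = ""
    for instr in instructions:
        h = hashlib.sha256()
        h.update(parent.encode())
        h.update(instr.encode())
        op, *args = instr.split()
        if op in ("ADD", "COPY"):
            # the "context" here is just a dict of source name -> content (assumption)
            h.update(files[args[0]].encode())
        parent = h.hexdigest()
        keys.append(parent)
    return keys


def rebuilt_steps(instructions, old_files, new_files):
    """Indices of the steps whose cache key differs between two builds."""
    old = layer_keys(instructions, old_files)
    new = layer_keys(instructions, new_files)
    return [i for i, (a, b) in enumerate(zip(old, new)) if a != b]


STEPS = [
    "RUN apt-get update && apt-get install -y somepackage",
    "ADD requirements.txt /app/requirements.txt",
    "RUN pip install -r /app/requirements.txt",
    "ADD . /app/",
]
BEFORE = {"requirements.txt": "flask==2.0\n", ".": "app v1"}
CODE_CHANGED = {"requirements.txt": "flask==2.0\n", ".": "app v2"}
REQS_CHANGED = {"requirements.txt": "flask==2.1\n", ".": "app v2"}

print(rebuilt_steps(STEPS, BEFORE, CODE_CHANGED))  # [3]: only the final ADD re-runs
print(rebuilt_steps(STEPS, BEFORE, REQS_CHANGED))  # [1, 2, 3]: pip install re-runs too
```

A code-only change never touches the apt or pip layers, which is exactly what running everything through a single deploy.sh in one RUN would throw away.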

Related

Should I use Poetry in a production Dockerfile?

I have a web app built with a framework like FastAPI or Django, and my project uses Poetry to manage the dependencies.
I didn't find any topic similar to this.
The question is: should I install Poetry in my production Dockerfile and install the dependencies with Poetry, or should I export a requirements.txt and just use pip inside my Docker image?
Currently, I export requirements.txt to the project root before deploying the app and just use it inside the Docker image.
My reasoning is that I don't need the "complexity" of using Poetry inside a Dockerfile: requirements.txt is already generated by Poetry, and using Poetry inside the image would add a new step to the Docker build that could slow it down.
However, I have seen many Dockerfiles that install Poetry, which makes me think I am misusing the tool.

There's no need to use Poetry in production. To understand why, we should look back at the original reasons Poetry exists. There are basically two:
1. To manage the Python venv for us. In the past, people used a range of tools, from home-grown scripts to things like virtualenvwrapper, to manage the virtual env automatically.
2. To help us publish packages to PyPI.
Reason no. 2 is not really a concern for this question, so let's just look at reason no. 1. Why do we need something like Poetry in dev? Because dev environments can differ between developers. My venv could be in /home/kamal/.venv, while John might want to be fancy and put his virtualenv in /home/john/.local/venv.
When writing notes on how to set up and run your project, how would you write them to cater for the difference between John and me? We would probably use a placeholder such as /path/to/your/venv. With Poetry, we don't have to worry about this. Just write in the notes that the command should be run as:
poetry run python manage.py runserver ...
Poetry takes care of all the differences. In production, however, we don't have this problem. Our app in production lives in a single place, let's say /app. When writing notes on how to run a command in production, we can just write:
/app/.venv/bin/myapp manage collectstatic ...
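With `poetry config virtualenvs.in-project true` (as in the Dockerfile below), Poetry creates the venv as `.venv` inside the project directory, so once the project location is fixed, the path to any console script is fixed too. A minimal sketch of that mapping, assuming the standard venv layout (`bin/` on POSIX, `Scripts\` with `.exe` launchers on Windows); `in_project_script` is a hypothetical helper, not part of Poetry.

```python
from pathlib import PurePosixPath, PureWindowsPath


def in_project_script(project_dir, script, windows=False):
    """Path of a console script inside an in-project venv (<project>/.venv)."""
    if windows:
        # Windows venvs put launchers in Scripts\ with an .exe suffix
        return PureWindowsPath(project_dir, ".venv", "Scripts", script + ".exe")
    return PurePosixPath(project_dir, ".venv", "bin", script)


# In the container the project always lives in /app, so this never varies:
print(in_project_script("/app", "myapp"))  # /app/.venv/bin/myapp
```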
Below is a sample Dockerfile we use to deploy our app with Docker:
FROM python:3.10-buster as py-build
# [Optional] Install additional OS packages (remove this step if you don't need them).
RUN apt-get update && export DEBIAN_FRONTEND=noninteractive \
&& apt-get -y install --no-install-recommends netcat util-linux \
vim bash-completion yamllint postgresql-client
RUN curl -sSL https://install.python-poetry.org | POETRY_HOME=/opt/poetry python3 -
COPY . /app
WORKDIR /app
ENV PATH=/opt/poetry/bin:$PATH
RUN poetry config virtualenvs.in-project true && poetry install
FROM node:14.20.0 as js-build
COPY . /app
WORKDIR /app
RUN npm install && npm run production
FROM python:3.10-slim-buster
EXPOSE 8000
COPY --from=py-build /app /app
COPY --from=js-build /app/static /app/static
WORKDIR /app
CMD ["/app/.venv/bin/run"]
We use a multi-stage build: in the build stage we still use Poetry to install all the dependencies, but in the final stage we just copy /app, which also includes the .venv virtualenv folder.
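Copying .venv between stages works only if the final stage still provides the interpreter the venv was built against: a venv records the directory of its base Python in the `home` key of `.venv/pyvenv.cfg`, its `bin/python` points back at that interpreter, and its packages live under a version-specific `lib/pythonX.Y` directory. Both stages here use official python:3.10 images, which install Python under /usr/local, so the paths line up. A minimal sketch of that check, assuming you supply the directories where the final stage provides the same Python version; `venv_home`, `copied_venv_ok` and the sample cfg contents are illustrative, not taken from a real build.

```python
def venv_home(cfg_text):
    """Return the 'home' value (base interpreter directory) from pyvenv.cfg text."""
    for line in cfg_text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "home":
            return value.strip()
    raise ValueError("pyvenv.cfg has no 'home' key")


def copied_venv_ok(cfg_text, final_stage_python_dirs):
    """True if the venv's base interpreter directory also exists in the final stage."""
    home = venv_home(cfg_text).rstrip("/")
    return home in {d.rstrip("/") for d in final_stage_python_dirs}


# Illustrative pyvenv.cfg content for a venv built on a python:3.10 image (assumption)
SAMPLE_CFG = """home = /usr/local/bin
include-system-site-packages = false
version = 3.10.8
"""

print(copied_venv_ok(SAMPLE_CFG, {"/usr/local/bin"}))  # True
```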

How to prevent pip from re-downloading all packages when I rebuild the Docker image after a minor change in the requirements list?

I have over 200 Python packages in requirements.txt. When I rebuild the image after modifying or adding a package in the list, Docker re-downloads all packages, even though most of them are unrelated to the change. This makes the build take over an hour.
This problem only happens inside Docker. If I add an item and run pip install -r requirements.txt outside Docker, pip downloads only the minimum set of relevant packages instead of starting from scratch.
Here is what my Dockerfile looks like:
ARG PYTORCH="1.9.0"
ARG CUDA="11.1"
ARG CUDNN="8"
FROM pytorch/pytorch:${PYTORCH}-cuda${CUDA}-cudnn${CUDNN}-devel
...
...
WORKDIR /usr/src/app
RUN pip install cmake
ADD ./requirements.txt /usr/src/app/requirements.txt
RUN pip install -r requirements.txt
ADD . /usr/src/app
...

The problem:
This is a Docker cache issue.
When you rebuild an image, Docker reuses cached layers, but once requirements.txt changes, Docker can no longer use the cache from the ADD ./requirements.txt /usr/src/app/requirements.txt step onward, so that step and every step after it must be re-run. (Detail: the build can only reuse layers from the previous image, and the layer that RUN pip install -r requirements.txt now starts from has none of those packages installed, so pip installs everything from scratch.)
The cached layer from the previous build is still used for RUN pip install cmake, but everything after it is re-run because requirements.txt has changed.
My recommendation:
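You can split your dependencies into several requirements files and order them in the Dockerfile from least to most frequently modified. Because Docker stops using the cache at the first step that changed and re-runs every step after it, editing a frequently changed file then only re-installs that file's packages and those of any files installed after it.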
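One way to do the split is a small script run before the build. A minimal sketch, assuming you know which packages change often (the `volatile` set and the names below are hypothetical); it handles plain requirement lines only and keeps pip option lines such as `-r` or `--index-url` in the stable list.

```python
import re


def normalize(name):
    # PEP 503-style normalization so "My_Internal.Lib" and "my-internal-lib" compare equal
    return re.sub(r"[-_.]+", "-", name).lower()


def split_requirements(lines, volatile):
    """Split requirement lines into (stable, volatile) lists by package name."""
    volatile = {normalize(v) for v in volatile}
    stable, changing = [], []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("-"):
            # pip options (-r, --index-url, ...) go to the stable file
            stable.append(line)
            continue
        # the package name ends at the first specifier, extra, marker, space or URL marker
        name = normalize(re.split(r"[<>=!~;\[ @]", line, maxsplit=1)[0])
        (changing if name in volatile else stable).append(line)
    return stable, changing


REQS = [
    "torch==1.9.0",
    "numpy>=1.20",
    "# pinned for CUDA 11.1",
    "",
    "My_Internal.Lib==0.3.1",
    "requests[socks]==2.26.0",
]
stable, changing = split_requirements(REQS, {"my-internal-lib"})

# Write each list to its own file, then in the Dockerfile (file names are hypothetical):
#   ADD ./requirements-stable.txt /usr/src/app/
#   RUN pip install -r requirements-stable.txt
#   ADD ./requirements-volatile.txt /usr/src/app/
#   RUN pip install -r requirements-volatile.txt
```

Editing only the volatile file leaves the expensive stable layer (torch, numpy, ...) cached.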

How to successfully run a Docker image as a container

Below is my Dockerfile:
FROM python:3.9.0
ARG WORK_DIR=/opt/quarter_1
RUN apt-get update && apt-get install cron -y && apt-get install -y default-jre
# Install python libraries
COPY requirements.txt /tmp/requirements.txt
RUN pip install --upgrade pip && pip install -r /tmp/requirements.txt
WORKDIR $WORK_DIR
EXPOSE 8888
VOLUME /home/data/quarter_1/
# Copy etl code
# copy code on container under your workdir "/opt/quarter_1"
COPY . .
I connected to the server, then built the image with docker build -t my-python-app .
When I tried to run a container from the built image, I got nothing and could not get it to work:
docker run -p 8888:8888 -v /home/data/quarter_1/:/opt/quarter_1 image_id
The workdir here is under /opt (WORK_DIR=/opt/quarter_1).

Update based on comments
If I understand everything you've posted correctly, my suggestion here is to use a base Docker Jupyter image, modify it to add your pip requirements, and then add your files to the work path. I've tested the following:
Start with a Dockerfile like the one below:
FROM jupyter/base-notebook:python-3.9.6
COPY requirements.txt /tmp/requirements.txt
RUN pip install --upgrade pip && pip install -r /tmp/requirements.txt
COPY ./quarter_1 /home/jovyan/quarter_1
The above assumes you are running the build from the folder containing the Dockerfile, "requirements.txt", and the "quarter_1" folder with your files.
Note that /home/jovyan is the default working folder in this image.
Build the image:
docker build -t biwia-jupyter:3.9.6 .
Start the container with port 8888 published, e.g.:
docker run -p 8888:8888 biwia-jupyter:3.9.6
Connect to the container to get the access token. There are a few ways to do this, for example:
docker exec -it CONTAINER_NAME bash
jupyter notebook list
Copy the token from the URL and connect using your server IP and port. You should be able to paste the token there, and afterwards access the folder you copied into the build.
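If you want to script that step, the token can be pulled out of the listing with the standard library. A minimal sketch, assuming the listing prints lines of the form `http://0.0.0.0:8888/?token=... :: /home/jovyan` (the text could come from `docker exec CONTAINER_NAME jupyter notebook list`); the sample token and the documentation-range server IP are made up, and the helper names are hypothetical.

```python
from urllib.parse import parse_qs, urlparse


def tokens_from_list_output(output):
    """Map each server base URL in the listing to its token."""
    tokens = {}
    for line in output.splitlines():
        url = line.split(" :: ", 1)[0].strip()
        if not url.startswith(("http://", "https://")):
            continue  # skip headers such as "Currently running servers:"
        parsed = urlparse(url)
        token = parse_qs(parsed.query).get("token", [""])[0]
        tokens[f"{parsed.scheme}://{parsed.netloc}{parsed.path}"] = token
    return tokens


def server_url(token, host, port=8888):
    """URL to open from your own machine, using the server's IP instead of 0.0.0.0."""
    return f"http://{host}:{port}/?token={token}"


SAMPLE = "Currently running servers:\nhttp://0.0.0.0:8888/?token=3f9c1e :: /home/jovyan\n"
token = tokens_from_list_output(SAMPLE)["http://0.0.0.0:8888/"]
print(server_url(token, "203.0.113.10"))  # http://203.0.113.10:8888/?token=3f9c1e
```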
If you are deploying the image to different hosts, building the files into the image with COPY/ADD as above is probably the best approach. Otherwise, look at Docker volumes, which give the container access to a host folder (for example quarter_1), so you don't have to keep rebuilding during development.
Second edit for Python 3.9.0 request
With the method above, a Python 3.9.0 variant is not immediately available on Docker Hub. I doubt you'll have many compatibility issues between 3.9.0 and 3.9.6, but we'll build it anyway: we can download the Dockerfile folder from GitHub, change a build argument, build our own 3.9.0 variant, and then proceed as above.
This assumes you have git; otherwise, download the repo manually.
Download the Jupyter Docker Stacks repo:
git clone https://github.com/jupyter/docker-stacks
Change into the base-notebook directory of the cloned repo:
cd docker-stacks/base-notebook
Build the image with Python 3.9.0 instead:
docker build --build-arg PYTHON_VERSION=3.9.0 -t jupyter-base-notebook:3.9.0 .
Then create your own image with the copied folders as in the steps above, but replace the first line of the Dockerfile with:
FROM jupyter-base-notebook:3.9.0
I've tested this and it works, running Python 3.9.0 without issue.
There are lots of ways to build Jupyter images; this is just one. Check out the Jupyter images on Docker Hub to see the available variants.

Containerising a Python command-line application

I have created a Python command-line application that is available through PyPI / pip install.
The application has native dependencies.
To make the installation less painful for Windows users I would like to create a Dockerised version out of this command line application.
What are the steps to easily turn a setup.py with an entry point and a requirements.txt into a Dockerised command-line application? Is there any tooling around this, or should I just write the Dockerfile by hand?

Well, you have to create a Dockerfile and build an image from it. There are general best practices for creating Docker images that you should apply, as well as language-specific best practices.
Just to give you some ideas about the process:
# base image
FROM python:3.7.1-alpine3.8
# add the project files
COPY . /myapp
WORKDIR /myapp
# put your native dependency packages here
RUN apk add --no-cache dep1 dep2
# install the pip packages, then the application itself (which provides the myapp entry point)
RUN pip install -r requirements.txt
RUN pip install .
CMD myapp -h
Now build the image and push it to a public registry:
sudo docker build -t <yourusername>/myapp:0.1 .
sudo docker push <yourusername>/myapp:0.1
Users can then just pull the image and use it:
sudo docker run -it <yourusername>/myapp:0.1 myapp <switches/arguments>
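`CMD myapp -h` and the `docker run ... myapp <switches/arguments>` form both assume that `pip install .` puts a `myapp` console script on the PATH. A minimal sketch of the module such an entry point could target, assuming a hypothetical `myapp/cli.py` registered in setup.py with `entry_points={"console_scripts": ["myapp = myapp.cli:main"]}`; the arguments are placeholders for your real ones.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="myapp", description="Example CLI (placeholder arguments).")
    parser.add_argument("inputs", nargs="*", help="hypothetical positional inputs")
    parser.add_argument("-v", "--verbose", action="store_true", help="print extra detail")
    return parser


def main(argv=None):
    """Entry point; the console-script wrapper pip generates calls sys.exit(main())."""
    args = build_parser().parse_args(argv)  # "-h" prints help and exits, as in CMD myapp -h
    for item in args.inputs:
        print(f"processing {item}" if args.verbose else item)
    return 0
```

Because everything after the image name in `docker run` replaces CMD, users pass their own switches exactly as they would to the pip-installed tool.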

How to build Docker images quicker

I'm currently building a Docker image and running a container from it to run the tests for a Python application I'm working on. Currently the Dockerfile copies the files over from the host machine, sets the working directory to those copied files, runs apt-get to install pip, and finally runs the tests via setup.py. The Dockerfile can be seen below.
FROM ubuntu
ADD . /home/dev/ProjectName
WORKDIR /home/dev/ProjectName
RUN apt-get update && \
apt-get install -y python3-pip && \
python3 setup.py test
I was curious whether there is a more conventional way to avoid running apt-get update and apt-get install python3-pip every time I want to run the tests. My main idea was to build an image with pip already installed, and then build this image from that one.

Docker builds using cached layers when it can. Adding files that have changed invalidates the cache for all subsequent instructions. Put the apt commands first, in their own RUN before the ADD, and move python3 setup.py test into a separate RUN after it; the apt steps will then only run the first time you build (or when those lines change).
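The ordering rule can also be checked mechanically. A rough heuristic sketch (not a real Docker tool): it joins backslash-continued lines, then flags any RUN apt-get that comes after an ADD/COPY of the whole build context (`.`), since such a step re-runs whenever any project file changes. The two sample Dockerfiles are the one from the question and a reordered version; `logical_lines` and `late_installs` are hypothetical helpers.

```python
def logical_lines(text):
    """Yield (first_line_number, joined_instruction), merging backslash continuations."""
    buf, start = [], None
    for n, raw in enumerate(text.splitlines(), 1):
        line = raw.strip()
        if not buf and (not line or line.startswith("#")):
            continue  # skip blank lines and comments between instructions
        if start is None:
            start = n
        if line.endswith("\\"):
            buf.append(line[:-1].strip())
            continue
        buf.append(line)
        yield start, " ".join(buf)
        buf, start = [], None
    if buf:
        yield start, " ".join(buf)


def late_installs(text):
    """Line numbers of RUN apt-get steps that follow an ADD/COPY of the whole context."""
    context_copied = False
    flagged = []
    for start, line in logical_lines(text):
        op, _, rest = line.partition(" ")
        op = op.upper()
        sources = [a for a in rest.split() if not a.startswith("--")]  # drop --chown etc.
        if op in ("ADD", "COPY") and sources and sources[0] in (".", "./"):
            context_copied = True
        elif op == "RUN" and "apt-get" in rest and context_copied:
            flagged.append(start)
    return flagged


ORIGINAL = """FROM ubuntu
ADD . /home/dev/ProjectName
WORKDIR /home/dev/ProjectName
RUN apt-get update && \\
    apt-get install -y python3-pip && \\
    python3 setup.py test
"""

REORDERED = """FROM ubuntu
RUN apt-get update && \\
    apt-get install -y python3-pip
ADD . /home/dev/ProjectName
WORKDIR /home/dev/ProjectName
RUN python3 setup.py test
"""

print(late_installs(ORIGINAL))   # [4]
print(late_installs(REORDERED))  # []
```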