I have a web app built with a framework like FastAPI or Django, and my project uses Poetry to manage the dependencies.
I didn't find any topic similar to this.
The question is: should I install Poetry in my production Dockerfile and install the dependencies with Poetry, or should I export a requirements.txt and just use pip inside my Docker image?
Currently, I export the requirements.txt to the project's root before deploying the app and just use it inside the Docker image.
My motivation is that I don't need the "complexity" of using Poetry inside a Dockerfile: the requirements.txt is already generated by Poetry, and installing Poetry inside the image would add an extra step to the Docker build that can slow it down.
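For example, my current workflow is roughly this (the file names and base image are just illustrative):
# On the host / in CI, before building the image:
poetry export -f requirements.txt --output requirements.txt --without-hashes

# Then in the Dockerfile:
FROM python:3.10-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .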
However, I have seen many Dockerfiles that install Poetry, which makes me think I am using the tool badly.
There's no need to use Poetry in production. To understand this, we should look back at the original reasons Poetry exists. There are basically two main reasons for Poetry:-
To manage the Python venv for us - in the past, people used a range of tools, from home-grown scripts to something like virtualenvwrapper, to manage the virtual env automatically.
To help us publish packages to PyPI.
Reason no. 2 is not really a concern for this question, so let's just look at reason no. 1. Why do we need something like Poetry in dev? Because the dev environment can differ between developers. My venv could be in /home/kamal/.venv while John probably wants to be fancy and place his virtualenv in /home/john/.local/venv.
When writing notes on how to set up and run your project, how would you write them to cater for the difference between me and John? We would probably use a placeholder such as /path/to/your/venv. Using Poetry, we don't have to worry about this. Just write in the notes that you should run the command as:-
poetry run python manage.py runserver ...
Poetry takes care of all the differences. But in production, we don't have this problem. Our app in production will be in a single place, let's say /app. When writing notes on how to run a command in production, we can just write:-
/app/.venv/bin/myapp manage collectstatic ...
Below is a sample Dockerfile we use to deploy our app using Docker:-
# Python build stage: install Poetry and the app's Python dependencies
FROM python:3.10-buster as py-build
# Install additional OS packages.
RUN apt-get update && export DEBIAN_FRONTEND=noninteractive \
&& apt-get -y install --no-install-recommends netcat util-linux \
vim bash-completion yamllint postgresql-client
RUN curl -sSL https://install.python-poetry.org | POETRY_HOME=/opt/poetry python3 -
COPY . /app
WORKDIR /app
ENV PATH=/opt/poetry/bin:$PATH
RUN poetry config virtualenvs.in-project true && poetry install
# JavaScript build stage: compile the front-end assets
FROM node:14.20.0 as js-build
COPY . /app
WORKDIR /app
RUN npm install && npm run production
# Final runtime stage
FROM python:3.10-slim-buster
EXPOSE 8000
COPY --from=py-build /app /app
COPY --from=js-build /app/static /app/static
WORKDIR /app
CMD /app/.venv/bin/run
We use a multi-stage build: in the build stage we still use Poetry to install all the dependencies, but in the final stage we just copy /app, which also includes the .venv virtualenv folder.
I am trying to follow the Flask/React tutorial here, on a plain Windows machine.
On Windows 10, without considering Docker, I have the tutorial working.
On Windows 10 under Docker (Ubuntu-based containers and docker-compose), I do not:
The React server works under Docker.
The Flask server won't successfully build.
The Dockerfile for the Flask server is:
FROM ubuntu:18.04
RUN apt-get update && apt-get install -y software-properties-common
RUN add-apt-repository universe
RUN apt-get update && apt-get install -y python3-pip yarn
RUN pip3 install flask
#RUN pip3 install venv
RUN mkdir -p /app
WORKDIR /app
COPY . /app
#RUN python3 -m venv venv
RUN cd api/venv/Scripts
RUN flask run --no-debugger
This fails at the very last line:
The command '/bin/sh -c flask run --no-debugger' returned a non-zero code: 1
Note that I find myself in the unenviable position of trying to use/teach myself all of Docker, venv, react, and flask at the same time. The venv commands are commented out because I'm not even sure venv makes sense in Docker (but what would I know?) and also because the pip3 install venv command halts with a non-zero code: 2.
Any advice is welcome.
There are two obvious issues in the Dockerfile you show.
Each RUN command runs in a clean environment starting from the last known state of the image. Settings like the current directory (and also environment variable values) are not preserved when a RUN command exits. So RUN cd ... starts the RUN command from the old directory, changes to the new directory, and then doesn't remember that; the following RUN command starts again from the old directory. You need the WORKDIR directive to actually change directories.
The RUN commands also run during the build phase. They won't publish network ports or have access to databases; in a multi-container Compose setup they can't connect to other containers. You probably want to run the Flask app as the main container CMD.
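As a tiny illustration of the first point (a hypothetical fragment, not part of your Dockerfile):
RUN cd /tmp
# the cd above did not persist, so this prints "/"
RUN pwd
WORKDIR /tmp
# WORKDIR does persist, so this prints "/tmp"
RUN pwd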
So you can update your Dockerfile to look like:
FROM ubuntu:18.04
RUN apt-get update && apt-get install -y software-properties-common
RUN add-apt-repository universe
RUN apt-get update && apt-get install -y python3-pip yarn
# Creates the directory as well
WORKDIR /app
# requirements.txt includes "flask"
COPY requirements.txt ./
RUN pip3 install -r requirements.txt
COPY . ./
# Not `RUN cd ...`
WORKDIR /app/api/venv/Scripts
# Not `RUN ...`
CMD flask run --no-debugger
It is in fact common to just not use a virtual environment in Docker; the Docker image is isolated from any other Python installation and so it's safe to use the "system" Python package tree. (I am a little suspicious of the venv directory in there, since virtual environments can't be transplanted into other setups very well.)
Note that I find myself in the unenviable position of trying to use/teach myself all of Docker, venv, react, and flask at the same time.
Put Docker away for another day. It's not necessary, especially during the development phase of your application. If you read through SO questions there are a lot of questions trying to contort Docker into acting just like a local development environment, which it's really not designed for. There's nothing wrong with locally installing the tools you need to do your job, especially when they're very routine tools like Python and Node.
I believe Flask can't find your app when you run your container (especially as the docker build attempts to run it). If you only want to use Docker to run your app, use CMD in the Dockerfile; then, when you run the Docker image, it will start your Flask app first thing.
I have created a Python command line application that is available through PyPi / pip install.
The application has native dependencies.
To make the installation less painful for Windows users I would like to create a Dockerised version out of this command line application.
What are the steps to easily convert a setup.py with an entry point and a requirements.txt into a Dockerised command line application? Is there any tooling around this, or should I just write the Dockerfile by hand?
Well, you have to create a Dockerfile and build an image off of it. There are best practices regarding Docker image creation that you should apply, and there are also language-specific best practices.
Just to give you some ideas about the process:
# Base image
FROM python:3.7.1-alpine3.8
# Add project files
ADD . /myapp
WORKDIR /myapp
# Put your OS dependency packages here
RUN apk add dep1 dep2
# Install pip packages
RUN pip install -r requirements.txt
# Install the application itself
RUN pip install .
CMD myapp -h
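For reference, the RUN pip install . step above relies on the entry point declared in your setup.py, which would look something like this (myapp and myapp.cli:main are placeholders):
from setuptools import setup, find_packages

setup(
    name="myapp",
    version="0.1",
    packages=find_packages(),
    entry_points={
        "console_scripts": [
            # exposes the `myapp` command used in the CMD above
            "myapp = myapp.cli:main",
        ],
    },
)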
Now build the image and push it to some public registry:
sudo docker build -t <yourusername>/myapp:0.1 .
Users can then pull the image and use it:
sudo docker run -it <yourusername>/myapp:0.1 myapp <switches/arguments>
I can't wrap my head around how to dockerize an existing Django app.
I've read this official manual by Docker explaining how to create a Django project during the creation of a Docker image, but what I need is to dockerize an existing project using the same method.
The main purpose of this approach is that I don't need to build Docker images locally all the time; instead, what I want to achieve is to push my code to a remote repository that has a Docker Hub watcher attached to it, so that as soon as the code base is updated it is built automatically on the server.
For now my Dockerfile looks like:
FROM python:3
ENV PYTHONUNBUFFERED 1
RUN mkdir /code
WORKDIR /code
ADD requirements.txt /code/
RUN pip install Django
RUN pip install djangorestframework
RUN pip install PyQRCode
ADD . /code/
Can anyone please explain how I should compose the Dockerfile, and whether I need to use docker-compose.yml (if yes: how?) to achieve the functionality I've described?
Solution for this question:
FROM python:3
ENV PYTHONUNBUFFERED 1
RUN mkdir /code
WORKDIR /code
RUN pip install *name of package*
RUN pip install *name of another package*
ADD . /code/
EXPOSE 8000
CMD python3 manage.py runserver 0.0.0.0:8000
OR
FROM python:3
ENV PYTHONUNBUFFERED 1
RUN mkdir /code
WORKDIR /code
ADD requirements.txt /code/
RUN pip install -r requirements.txt
ADD . /code/
EXPOSE 8000
CMD python3 manage.py runserver 0.0.0.0:8000
requirements.txt should be a plain list of packages, for example:
Django==1.11
djangorestframework
pyqrcode
pypng
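With either variant, the image can then be built and run along these lines (the image name is just an example):
docker build -t mydjangoapp .
docker run -p 8000:8000 mydjangoapp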
This question is too broad. What happens with the Dockerfile you've created?
You don't need docker compose unless you have multiple containers that need to interact.
Some general observations from your current Dockerfile:
It would be better to collapse the pip install commands into a single statement. In Docker, each statement creates a file system layer, and the layers in between the pip install commands probably serve no useful purpose (a combined sketch follows this list).
It's better to declare dependencies in setup.py or a requirements.txt file (pip install -r requirements.txt), with fixed version numbers (foopackage==0.0.1) to ensure a repeatable build.
I'd recommend packaging your Django app into a python package and installing it with pip (cd /code/; pip install .) rather than directly adding the code directory.
You're missing a statement (CMD or ENTRYPOINT) to execute the app. See https://docs.docker.com/engine/reference/builder/#cmd
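Putting these observations together, a Dockerfile along these lines would address them (just a sketch, assuming the project is installable as a package and has pinned requirements):
FROM python:3
ENV PYTHONUNBUFFERED 1
WORKDIR /code
# Install pinned dependencies in a single layer
COPY requirements.txt /code/
RUN pip install -r requirements.txt
# Install the Django app itself as a package
COPY . /code/
RUN pip install .
EXPOSE 8000
CMD ["python3", "manage.py", "runserver", "0.0.0.0:8000"]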
Warning: -onbuild images have been deprecated.
@AlexForbes raised very good points. But if you want a super simple Dockerfile for Django, you can probably just do:
FROM python:3-onbuild
RUN python manage.py collectstatic
CMD ["python", "manage.py"]
You then run your container with:
docker run myimagename runserver
The little -onbuild modifier does most of what you need. It creates /usr/src/app, sets it as the working directory, copies all your source code inside, and runs pip install -r requirements.txt (which you forgot to run). Finally we collect statics (might not be required in your case if statics are hosted somewhere), and set the default command to manage.py so everything is easy to run.
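Since the -onbuild images are deprecated (see the warning above), the explicit equivalent of what that image does would look roughly like this:
FROM python:3
WORKDIR /usr/src/app
# These steps are what the -onbuild variant ran automatically
COPY requirements.txt ./
RUN pip install -r requirements.txt
COPY . ./
RUN python manage.py collectstatic
CMD ["python", "manage.py"]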
You would need docker-compose if you had to run other containers like Celery, Redis or any other background task or server not supplied by your environment.
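If you do end up needing it, a minimal docker-compose.yml might look something like this (the service names and the Redis image are just examples):
version: "3"
services:
  web:
    build: .
    command: python manage.py runserver 0.0.0.0:8000
    ports:
      - "8000:8000"
    depends_on:
      - redis
  redis:
    image: redis:alpine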
I actually wrote an article about this in https://rehalcon.blogspot.mx/2018/03/dockerize-your-django-app-for-local.html
My case is very similar, but it adds a MySQL db service and environment variables for code secrets, as well as the use of docker-compose (needed on macOS). I also use the python:2.7-slim Docker parent image instead, to make the image much smaller (under 150MB).
Currently, for my Python project I have a deploy.sh file which runs some apt-gets, pip installs, creates some dirs, and copies some files... so the process is: git clone my private repo, then run deploy.sh.
Now I'm playing with Docker, and the basic question is: should the Dockerfile RUN a git clone and then RUN deploy.sh, or should the Dockerfile have its own RUNs for each apt-get, pip, etc. and ignore deploy.sh... which seems like duplicating work (typing) and risks going out of sync?
That work should be duplicated in the dockerfile. The reason for this is to take advantage of docker's layer and caching system. Take this example:
# this will only execute the first time you build the image, all future builds will use a cached layer
RUN apt-get update && apt-get install somepackage -y
# this will only run pip if your requirements file changes
ADD requirements.txt /app/requirements.txt
RUN pip install -r requirements.txt
ADD . /app/
Also, you should not do a git checkout of your code in the Docker build. Just add the files from the local checkout to the image, as in the example above.
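When adding the local checkout like that, it can also help to have a .dockerignore file so the .git directory and other local clutter don't end up in the image; a minimal example:
.git
__pycache__/
*.pyc
.venv/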
I am looking for a way to create multi-stage builds with Python and a Dockerfile:
For example, using the following images:
1st image: install all compile-time requirements, and install all needed python modules
2nd image: copy all compiled/built packages from the first image to the second, without the compilers themselves (gcc, postgres-dev, python-dev, etc.)
The final objective is to have a smaller image, running python and the python packages that I need.
In short: how can I 'wrap' all the compiled modules (site-packages / external libs) that were created in the first image, and copy them in a 'clean' manner, to the 2nd image.
OK, so my solution is using wheel. It lets us compile on the first image, create wheel files for all dependencies, and install them in the second image without installing the compilers.
FROM python:2.7-alpine as base
RUN mkdir /svc
COPY . /svc
WORKDIR /svc
RUN apk add --update \
postgresql-dev \
gcc \
musl-dev \
linux-headers
RUN pip install wheel && pip wheel . --wheel-dir=/svc/wheels
FROM python:2.7-alpine
COPY --from=base /svc /svc
WORKDIR /svc
RUN pip install --no-index --find-links=/svc/wheels -r requirements.txt
You can see my answer regarding this in the following blog post
https://www.blogfoobar.com/post/2018/02/10/python-and-docker-multistage-build
I recommend the approach detailed in this article (section 2). He uses virtualenv, so pip install stores all the Python code, binaries, etc. under one folder instead of spreading them out all over the file system. Then it's easy to copy just that one folder to the final "production" image. In summary:
Compile image
Activate virtualenv in some path of your choosing.
Prepend that path to the PATH in your Docker ENV. This is all virtualenv needs to function for all future Docker RUN and CMD actions.
Install system dev packages and pip install xyz as usual.
Production image
Copy the virtualenv folder from the Compile Image.
Prepend the virtualenv folder to docker's PATH
This is a place where using a Python virtual environment inside Docker can be useful. Copying a virtual environment normally is tricky since it needs to be the exact same filesystem path on the exact same Python build, but in Docker you can guarantee that.
(This is the same basic recipe @mpoisot describes in their answer, and it appears in other SO answers as well.)
Say you're installing the psycopg PostgreSQL client library. The extended form of this requires the Python C development library plus the PostgreSQL C client library headers; but to run it you only need the PostgreSQL C runtime library. So here you can use a multi-stage build: the first stage installs the virtual environment using the full C toolchain, and the final stage copies the built virtual environment but only includes the minimum required libraries.
A typical Dockerfile could look like:
# Name the single Python image we're using everywhere.
ARG python=python:3.10-slim
# Build stage:
FROM ${python} AS build
# Install a full C toolchain and C build-time dependencies for
# everything we're going to need.
RUN apt-get update \
&& DEBIAN_FRONTEND=noninteractive \
apt-get install --no-install-recommends --assume-yes \
build-essential \
libpq-dev
# Create the virtual environment.
RUN python3 -m venv /venv
ENV PATH=/venv/bin:$PATH
# Install the Python library dependencies, including those with
# C extensions. They'll get installed into the virtual environment.
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
# Final stage:
FROM ${python}
# Install the runtime-only C library dependencies we need.
RUN apt-get update \
&& DEBIAN_FRONTEND=noninteractive \
apt-get install --no-install-recommends --assume-yes \
libpq5
# Copy the virtual environment from the first stage.
COPY --from=build /venv /venv
ENV PATH=/venv/bin:$PATH
# Copy the application in.
COPY . .
CMD ["./main.py"]
If your application uses a Python entry point script then you can do everything in the first stage: RUN pip install . will copy the application into the virtual environment and create a wrapper script in /venv/bin for you. In the final stage you don't need to COPY the application again. Set the CMD to run the wrapper script out of the virtual environment, which is already at the front of the $PATH.
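For example, assuming the project defines a console-script entry point named myapp (a hypothetical name), the relevant changes would look roughly like this:
# In the build stage, also install the application itself into the
# virtual environment; pip creates the /venv/bin/myapp wrapper script
# from the project's entry point.
COPY . .
RUN pip install .

# In the final stage, no COPY of the application is needed; just run
# the generated wrapper, which is already on $PATH.
CMD ["myapp"]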
Again, note that this approach only works because it is the same Python base image in both stages, and because the virtual environment is on the exact same path. If it is a different Python or a different container path the transplanted virtual environment may not work correctly.
The Docker docs on multi-stage builds explain exactly how to do this.
https://docs.docker.com/engine/userguide/eng-image/multistage-build/#before-multi-stage-builds
Basically, you do exactly what you've said. The magic of the multi-stage build feature, though, is that you can do it all from one Dockerfile.
i.e.:
FROM golang:1.7.3
WORKDIR /go/src/github.com/alexellis/href-counter/
RUN go get -d -v golang.org/x/net/html
COPY app.go .
RUN CGO_ENABLED=0 GOOS=linux go build -a -installsuffix cgo -o app .
FROM alpine:latest
RUN apk --no-cache add ca-certificates
WORKDIR /root/
COPY --from=0 /go/src/github.com/alexellis/href-counter/app .
CMD ["./app"]
This builds a Go binary, then the next image runs the binary. The first image has all the build tools, and the second is just a base Linux machine that can run a binary.