We are trying to create a Docker container for a Python application. The Dockerfile installs dependencies using "pip install" and looks like this:
FROM ubuntu:latest
RUN apt-get update -y
RUN apt-get install -y git wget python3-pip
RUN mkdir /app
COPY . /app
RUN pip3 install asn1crypto
RUN pip3 install cffi==1.10.0
RUN pip3 install click==6.7
RUN pip3 install conda==4.3.16
RUN pip3 install Flask==0.12.2
RUN pip3 install Flask-SSLify==0.1.5
RUN pip3 install Flask-SSLify==0.1.5
RUN pip3 install flask-restful==0.3.6
WORKDIR /app
ENTRYPOINT ["python3"]
CMD [ "X.py", "/app/Y.yml" ]
The image builds successfully; the issue is the rebuild time.
If nothing is changed in the Dockerfile above, the rebuild uses the cache and is quick.
But if a line after the pip install commands is changed, the Docker daemon still re-runs all of the pip install commands, downloading all the packages even though it does not reinstall them.
Is there a way to optimize the rebuild?
Thx
Below is what I would do with the Dockerfile as a first pass at optimization -
FROM ubuntu:latest
RUN apt-get update -y && apt-get install -y \
git \
wget \
python3-pip \
&& rm -rf /var/lib/apt/lists/*
WORKDIR /app
COPY ./requirements.txt .
RUN pip3 install -r requirements.txt
COPY . /app
ENTRYPOINT ["python3"]
CMD [ "X.py", "/app/Y.yml" ]
Reduce the number of layers by combining multiple commands into a single one, especially when they are interdependent. This also helps reduce the image size.
Put the COPY of the source code as late as possible, since a routine source change invalidates the cache of every layer that follows it.
Use a single requirements.txt file for installation through pip, and copy it in its own step, so that a normal source code change does not force the packages to be reinstalled on every build (a sample requirements.txt is sketched below).
Always clean up intermediate artifacts that are not required in the final image.
Ref- https://docs.docker.com/engine/userguide/eng-image/dockerfile_best-practices/
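For completeness, the requirements.txt referenced above would simply pin the same packages as the RUN pip3 install lines in the original Dockerfile, e.g.:
asn1crypto
cffi==1.10.0
click==6.7
conda==4.3.16
Flask==0.12.2
Flask-SSLify==0.1.5
flask-restful==0.3.6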
Related
I have around 30 custom-made algorithm Python libraries, and the count is increasing day by day.
The docker file for deploying it is as follows:
FROM tiangolo/uvicorn-gunicorn-fastapi:python3.8
# updating image and installing jdk as well as removing caches
RUN apt-get update && apt-get install default-jdk -y && rm -rf /var/lib/apt/lists/* && mkdir -p /data/models/ && chmod -R 777 /data
# Setting up virtual environment for python
ENV VIRTUAL_ENV=/opt/venv
RUN python3 -m venv $VIRTUAL_ENV
ENV PATH="$VIRTUAL_ENV/bin:$PATH"
# COPY production dependencies.
COPY requirements.txt ./
# copy local dependencies
COPY localpackages ./localpackages/
# installing all dependencies
RUN mkdir -p $PWD/localpackages && pip install --no-cache-dir -r ./requirements.txt && pip install --no-cache-dir $PWD/localpackages/*
COPY ./ /app
ENV PYTHONPATH /app
EXPOSE 5000
WORKDIR /app
CMD ["python", "app/main.py"]
The problem with this approach is that if any package in localpackages changes, the whole install layer is rebuilt, and the image size is too high.
I tried installing each package in its own step, as below, but the number of layers will grow as the number of algorithms increases:
RUN pip install --no-cache-dir $PWD/localpackages/{abc}*
RUN pip install --no-cache-dir $PWD/localpackages/{def}*
RUN pip install --no-cache-dir $PWD/localpackages/{ghi}*
How can I optimise this Docker image so that the size is small and the number of layers is also reduced?
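Note that with the Dockerfile above, the single COPY localpackages ./localpackages/ step is invalidated by a change to any local package, so every per-package RUN after it is rebuilt anyway. One refinement, sketched below on the assumption that each package's files share a name prefix (abc, def, ghi are the placeholders from the question), is to pair each install with its own COPY, so that only the changed package's layers rebuild:
# One COPY + RUN pair per local package; a change to abc only rebuilds abc's layers
COPY localpackages/abc* ./localpackages/
RUN pip install --no-cache-dir ./localpackages/abc*
COPY localpackages/def* ./localpackages/
RUN pip install --no-cache-dir ./localpackages/def*
COPY localpackages/ghi* ./localpackages/
RUN pip install --no-cache-dir ./localpackages/ghi*
This trades more layers for better caching; the layer count itself adds little to the image size compared with the packages installed inside the layers.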
So I have a Flask web app that will be exposing some deep learning models.
I built the image and everything works fine.
The problem is that the size of this image is 5.58 GB, which is a bit ridiculous.
I have some deep learning models that are copied during the build. I thought they might be the culprit, but their combined size does not exceed 300 MB, so that's definitely not it.
Upon checking the history and the size of each layer, I discovered this:
RUN /bin/sh -c pip install -r requirements.txt is taking up 771MB.
RUN /bin/sh -c pip install torch==1.10.2 is taking up 2.8GB!
RUN /bin/sh -c apt-get install ffmpeg libsm6 libxext6 is taking up 400MB.
So how do I incorporate these libraries while keeping the image size reasonable? Is it OK to have images of this size when deploying ML models in Python?
Below are the Dockerfile and .dockerignore from the project root.
Dockerfile:
FROM python:3.7.13
WORKDIR /app
COPY ["rdm.pt", "autosort_model.pt", "rotated_model.pt", "yolov5x6.pt", "/app/"]
RUN pip install torch==1.10.2
COPY requirements.txt /app/requirements.txt
RUN pip install -r requirements.txt
RUN apt-get update
RUN apt-get install ffmpeg libsm6 libxext6 -y
COPY . /app
CMD python ./app.py
.dockerignore:
Dockerfile
README.md
__pycache__
By default the torch wheel bundles the CUDA libraries. Add --extra-index-url https://download.pytorch.org/whl/cpu and --no-cache-dir to the pip install command if you do not require CUDA:
RUN pip install --no-cache-dir -r requirements.txt --extra-index-url https://download.pytorch.org/whl/cpu
Also it's good practice to remove the apt list cache (both changes are combined in the sketch after this block):
RUN apt-get update \
&& apt-get install -y \
ffmpeg \
libsm6 \
libxext6 \
&& rm -rf /var/lib/apt/lists/*
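Putting both suggestions together, and moving the source code COPY to the end so it does not invalidate the installs, the Dockerfile from the question might look roughly like this (a sketch that keeps the same filenames and versions as the original):
FROM python:3.7.13
WORKDIR /app
# System libraries first; they change rarely, so this layer stays cached
RUN apt-get update \
    && apt-get install -y ffmpeg libsm6 libxext6 \
    && rm -rf /var/lib/apt/lists/*
# CPU-only torch wheel, with no pip download cache left in the layer
RUN pip install --no-cache-dir torch==1.10.2 --extra-index-url https://download.pytorch.org/whl/cpu
COPY requirements.txt /app/requirements.txt
RUN pip install --no-cache-dir -r requirements.txt --extra-index-url https://download.pytorch.org/whl/cpu
# Model weights and source code last, so code changes do not rebuild the installs above
COPY ["rdm.pt", "autosort_model.pt", "rotated_model.pt", "yolov5x6.pt", "/app/"]
COPY . /app
CMD ["python", "./app.py"]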
I am working with the following Dockerfile.
FROM ubuntu
RUN apt-get update
RUN apt-get install -y \
python3.9 \
python3-pip
COPY . /app
WORKDIR /app
RUN pip install -r requirements.txt
EXPOSE 8501
ENTRYPOINT [ "streamlit", "run" ]
CMD ["app.py"]
Whenever I rebuild the image, Docker uses the previously cached layers and builds quickly, except when it reaches the RUN pip install -r requirements.txt part, which runs every time I rebuild the image (it is not cached). The problem is that one of the requirements in requirements.txt is streamlit==1.9.0, which has a large number of dependencies, so it slows down the rebuild process. Long story short, Docker is caching RUN apt-get install foo but not RUN pip install bar, which, I guess, is expected. How would you work around this to speed up the image rebuild when you have a long list in the requirements.txt file?
It can't cache it because your files in app are changing constantly (I guess). The way it's usually done is to copy requirements.txt separately first, then install, then copy everything else. This way, Docker can cache the install as long as requirements.txt didn't change.
FROM ubuntu
RUN apt-get update
RUN apt-get install -y \
python3.9 \
python3-pip
WORKDIR /app
# Copy ONLY requirements first; this layer is cached if the file doesn't change
COPY requirements.txt .
# This uses the cached layer if the previous layer was cached as well
RUN pip install -r requirements.txt
COPY . .
EXPOSE 8501
ENTRYPOINT [ "streamlit", "run" ]
CMD ["app.py"]
Docker BuildKit has recently introduced mounting during build time.
See https://github.com/moby/buildkit/blob/master/frontend/dockerfile/docs/syntax.md
For the particular scenario of pip, check out https://dev.doroshev.com/docker-mount-type-cache/
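For reference, a pip cache mount applied to the Dockerfile above looks roughly like this (a sketch; it needs BuildKit enabled, e.g. DOCKER_BUILDKIT=1, plus the syntax directive on the first line):
# syntax=docker/dockerfile:1
FROM ubuntu
RUN apt-get update && apt-get install -y python3.9 python3-pip
WORKDIR /app
COPY requirements.txt .
# The pip download cache persists across builds, so even when this layer is
# rebuilt the packages do not have to be downloaded again
RUN --mount=type=cache,target=/root/.cache/pip \
    pip install -r requirements.txt
COPY . .
EXPOSE 8501
ENTRYPOINT [ "streamlit", "run" ]
CMD ["app.py"]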
I'm using Docker to automate my backend work in Python. I have a file backend.py which, when executed, downloads pdf files and converts them into images.
This is my Dockerfile:
FROM python:3.6.3
RUN apt-get update -y
RUN apt-get install -y python-pip python-dev build-essential
RUN pip install --upgrade pip
RUN apt-get install -y ghostscript libgs-dev
RUN apt-get install -y libmagickwand-dev imagemagick --fix-missing
RUN apt-get install -y libpng-dev zlib1g-dev libjpeg-dev
WORKDIR /app
COPY requirements.txt requirements.txt
RUN pip install --no-cache-dir -r requirements.txt
ADD backend.py .
ADD Vera.ttf .
CMD [ "python", "backend.py" ]
What I want is that when I run the image using the command:
docker run -d -it --name devtest-1 --mount type=bind,source=D:\projects\imageProject\public\assets,target=/app/data kidsuki-test3
I want the pdf files and images to be stored on my local machine in the path "D:\projects\imageProject\public\assets" and also in the container in the path "/app/data".
But for now, what I'm getting is that it copies the files from my "D:\projects\imageProject\public\assets" folder and stores them in "/app/data" in the Docker container devtest-1.
Thanks in advance!
Currently I am creating a virtual environment in the first stage.
I run pip install -r requirements.txt, which installs the executables in the /venv/bin directory.
In the second stage I copy the /venv/bin directory, but on running the Python app I get a "module not found" error, i.e. I need to run pip install -r requirements.txt again to run the app.
The application runs on Python 2.7 and some of the dependencies require a compiler to build. Those dependencies fail with the Alpine images' compiler and only work with the Ubuntu toolchain or the official python:2.7 image (which in turn uses Debian).
Am I missing some command in the second stage that would let me use the copied dependencies instead of installing them again?
FROM python:2.7-slim AS build
RUN apt-get update &&apt-get install -y --no-install-recommends build-essential gcc
RUN pip install --upgrade pip
RUN python3 -m venv /venv
COPY ./requirements.txt /project/requirements/
RUN /venv/bin/pip install -r /project/requirements/requirements.txt
COPY . /venv/bin
FROM python:2.7-slim AS release
COPY --from=build /venv /venv
WORKDIR /venv/bin
RUN apt-get update && apt-get install -y --no-install-recommends build-essential gcc
#RUN pip install -r requirements.txt //
RUN cp settings.py.sample settings.py
CMD ["/venv/bin/python3", "-m", "main.py"]
I am trying to avoid running pip install -r requirements.txt in the second stage to reduce the image size, which is not happening currently.
Only copying the bin dir isn't enough; for example, packages are installed in lib/pythonX.X/site-packages and headers go under include. I'd just copy the whole venv directory. You can also run pip with --no-cache-dir to avoid saving the wheel archives. A sketch is below.
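A minimal sketch of that layout, assuming the app really targets the python:2.7-slim base shown in the question (so the venv is created with virtualenv, since Python 2 has no built-in venv module), with paths and the main.py/settings.py.sample names taken from the question:
FROM python:2.7-slim AS build
RUN apt-get update && apt-get install -y --no-install-recommends build-essential gcc
RUN pip install --no-cache-dir virtualenv && virtualenv /venv
COPY requirements.txt /project/requirements/requirements.txt
# --no-cache-dir keeps the downloaded wheels out of the layer
RUN /venv/bin/pip install --no-cache-dir -r /project/requirements/requirements.txt

FROM python:2.7-slim AS release
# Copy the entire venv (bin, lib/python2.7/site-packages, include), not just bin;
# no compiler is needed here because the wheels were already built in the first stage
COPY --from=build /venv /venv
ENV PATH="/venv/bin:$PATH"
WORKDIR /app
COPY . /app
RUN cp settings.py.sample settings.py
CMD ["python", "main.py"]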
Insert this before everything else:
FROM yourimage:tag AS build