How to make docker build cache the pip installs as well? - python

I am working with the following Dockerfile.
FROM ubuntu
RUN apt-get update
RUN apt-get install -y \
python3.9 \
python3-pip
COPY . /app
WORKDIR /app
RUN pip install -r requirements.txt
EXPOSE 8501
ENTRYPOINT [ "streamlit", "run" ]
CMD ["app.py"]
Whenever I rebuild the image, Docker uses the previously cached layers and builds quickly, except when it reaches the RUN pip install -r requirements.txt step, which it runs every time I rebuild (it does not cache). The problem is that one of the requirements in requirements.txt is streamlit==1.9.0, which has a large number of dependencies, so it slows down the rebuild. Long story short, Docker is caching RUN apt-get install foo but not RUN pip install bar, which, I guess, is expected. How would you work around this to speed up the image rebuild when you have a long list in the requirements.txt file?

It can't cache it because the files in /app are changing constantly (I guess). The way it's usually done is to copy requirements.txt separately first, then install, then copy everything else. This way, Docker can cache the install step as long as requirements.txt didn't change.
FROM ubuntu
RUN apt-get update
RUN apt-get install -y \
python3.9 \
python3-pip
WORKDIR /app
# Copy ONLY requirements. This layer is cached if this file doesn't change.
COPY requirements.txt .
# This reuses the cache as long as the previous layers were cached as well.
RUN pip install -r requirements.txt
COPY . .
EXPOSE 8501
ENTRYPOINT [ "streamlit", "run" ]
CMD ["app.py"]

Docker BuildKit has recently introduced mounting during build time.
See https://github.com/moby/buildkit/blob/master/frontend/dockerfile/docs/syntax.md
For the particular scenario of pip, check out https://dev.doroshev.com/docker-mount-type-cache/
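For example, here is a minimal sketch of a cache mount applied to the question's Dockerfile (requires BuildKit, e.g. DOCKER_BUILDKIT=1):
# syntax=docker/dockerfile:1
FROM ubuntu
RUN apt-get update && apt-get install -y python3.9 python3-pip
WORKDIR /app
COPY requirements.txt .
# pip's download/wheel cache persists in the mount between builds,
# so even when this layer re-runs, packages are not re-downloaded
RUN --mount=type=cache,target=/root/.cache/pip \
    pip install -r requirements.txt
COPY . .
EXPOSE 8501
ENTRYPOINT [ "streamlit", "run" ]
CMD ["app.py"]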

Related

Dockerfile not respecting caching

FROM python:3.8
WORKDIR /app
COPY . .
RUN apt-get update
RUN apt-get install -y python3
RUN apt-get install -y python3-pip
RUN pip3 install -r requirements.txt
CMD ["/app"]
This is my Dockerfile, and according to the documentation, unchanged lines should be cached. But I'm seeing some weird behavior: not changing my Dockerfile at all, or only adding a line at the end, still causes a rebuild with no caching. Does anyone have any idea what's happening?
Using COPY . . will invalidate the cache for every instruction below it each time you make a change to the copied files.
Make sure you have a .dockerignore file that stops copying of files you don't need in the container, and/or only copy the files you need (see the example .dockerignore after the Dockerfile below).
Order your instructions so that changes in the container go from least likely to change at the top to most likely to change towards the bottom, as much as can be helped.
You don't need those apt installs; the python:3.8 image already includes Python and pip.
For further reading, check out Faster or slower: the basics of Docker build caching (and the whole series, too).
A slightly more optimized version of your dockerfile:
FROM python:3.8
# you would do apt update/install here if you needed it
# python3.8 image includes pip already so no need.
WORKDIR /app
COPY requirements.txt .
# only copy requirements file here
# This way, we only need to pip install again when `requirements.txt` changes
RUN pip install -r requirements.txt
# NOW copy the rest of the files you need
COPY . .
# Even better, specify exact directories/files needed e.g.,
# COPY mypackage .
# now nothing will need to update just because a file changes!
CMD ["/app"]

Running pip as the 'root' user can result in broken permissions in Dockerfile [duplicate]

This question already has answers here:
WARNING: Running pip as the 'root' user
(4 answers)
Closed 10 months ago.
I have this Dockerfile:
FROM python:3.8-slim
WORKDIR /app
COPY . .
RUN apt-get update
RUN apt-get install -y python3 python3-pip python3-venv
RUN pip freeze > requirements.txt
RUN pip install --no-cache-dir -r requirements.txt
CMD ["python3", "main.py"]
Everything works fine until this line:
RUN pip install --no-cache-dir -r requirements.txt
Running docker run --rm -it name bash and then pip install -r requirements.txt inside the container, I found this error:
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting
behaviour with the system package manager. It is recommended to use a virtual environment
instead: https://pip.pypa.io/warnings/venv
Here I found a solution suggesting that it's possible to resolve this just by creating a new user (which didn't work for me), but that doesn't seem to be the optimal solution. How can I fix this?
In this case the problem was with the image version. Using this Dockerfile I was able to fix it:
FROM python:3.9.3
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
CMD ["python3", "main.py"]
PS: I don't really know if this was the cause, but this image has the same Python version that I have on my computer, which could have an impact on the dependencies.
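If you would rather follow pip's recommendation than rely on the image version, a virtual environment inside the image also silences the warning — a minimal sketch, assuming the same main.py entry point:
FROM python:3.9.3
WORKDIR /app
# Create the venv and put it first on PATH so the pip/python below use it
RUN python -m venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
CMD ["python3", "main.py"]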

Multi-stage Dockerfile not working for python

Currently I am creating a virtual environment in the first stage.
Running the command pip install -r requirements.txt installs executables in the /venv/bin dir.
In the second stage I am copying the /venv/bin dir, but on running the Python app I get a "module not found" error, i.e. I need to run pip install -r requirements.txt again to run the app.
The application runs on Python 2.7 and some of the dependencies require a compiler to build. Those dependencies also fail with the Alpine image's compiler, and only work with the Ubuntu compiler or the official python:2.7 image (which in turn uses Debian).
Am I missing some command in the second stage that will help in using the copied dependencies instead of installing them again?
FROM python:2.7-slim AS build
RUN apt-get update && apt-get install -y --no-install-recommends build-essential gcc
RUN pip install --upgrade pip
RUN python3 -m venv /venv
COPY ./requirements.txt /project/requirements/
RUN /venv/bin/pip install -r /project/requirements/requirements.txt
COPY . /venv/bin
FROM python:2.7-slim AS release
COPY --from=build /venv /venv
WORKDIR /venv/bin
RUN apt-get update && apt-get install -y --no-install-recommends build-essential gcc
#RUN pip install -r requirements.txt //
RUN cp settings.py.sample settings.py
CMD ["/venv/bin/python3", "-m", "main.py"]
I am trying to avoid pip install -r requirements.txt in second stage to reduce the image size which is not happening currently.
Only copying the bin dir isn't enough; for example, packages are installed in lib/pythonX.X/site-packages and headers under include. I'd just copy the whole venv directory. You can also run it with --no-cache-dir to avoid saving the wheel archives.
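A minimal sketch of that whole-venv approach — note the question's base image has no python3, so this sketch creates the environment with virtualenv (pinned to a 2.7-compatible release; names and paths are illustrative):
FROM python:2.7-slim AS build
RUN apt-get update && apt-get install -y --no-install-recommends build-essential gcc
# virtualenv<20 can still run on and create Python 2.7 environments
RUN pip install --no-cache-dir "virtualenv<20" && virtualenv /venv
COPY requirements.txt /project/requirements/requirements.txt
RUN /venv/bin/pip install --no-cache-dir -r /project/requirements/requirements.txt

FROM python:2.7-slim AS release
# Copy the WHOLE venv: bin/, lib/python2.7/site-packages/, include/, ...
COPY --from=build /venv /venv
WORKDIR /app
COPY . .
CMD ["/venv/bin/python", "main.py"]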
Insert this before everything else:
FROM yourimage:tag AS build

Docker re-build time

We are trying to create a Docker container for a Python application. The Dockerfile installs dependencies using pip install. The Dockerfile looks like:
FROM ubuntu:latest
RUN apt-get update -y
RUN apt-get install -y git wget python3-pip
RUN mkdir /app
COPY . /app
RUN pip3 install asn1crypto
RUN pip3 install cffi==1.10.0
RUN pip3 install click==6.7
RUN pip3 install conda==4.3.16
RUN pip3 install Flask==0.12.2
RUN pip3 install Flask-SSLify==0.1.5
RUN pip3 install flask-restful==0.3.6
WORKDIR /app
ENTRYPOINT ["python3"]
CMD [ "X.py", "/app/Y.yml" ]
The Docker image gets created successfully; the issue is the rebuild time.
If nothing is changed in the Dockerfile above, the rebuild uses the cache.
If a line after the pip installs is changed, the Docker daemon still runs all the pip install commands, downloading all the packages though not installing them.
Is there a way to optimize the rebuild?
Thanks
Below is what I would do with the Dockerfile for optimization:
FROM ubuntu:latest
RUN apt-get update -y && apt-get install -y \
git \
wget \
python3-pip \
&& rm -rf /var/lib/apt/lists/*
WORKDIR /app
COPY ./requirements.txt .
RUN pip3 install -r requirements.txt
COPY . /app
ENTRYPOINT ["python3"]
CMD [ "X.py", "/app/Y.yml" ]
Reduce the layers by integrating multiple commands into a single one, specifically when they are interdependent. This helps reduce the image size.
Always put the COPY of the source near the end, since a regular source code change invalidates the caching of every layer after it.
Use a single requirements.txt file for installation through pip, and keep that installation a separate step, so a normal source code change doesn't force package installation on every build.
Always clean up intermediate things which are not required in the final image.
Ref- https://docs.docker.com/engine/userguide/eng-image/dockerfile_best-practices/

Pip install -e packages don't appear in Docker

I have a requirements.txt file containing, amongst others:
Flask-RQ==0.2
-e git+https://token:x-oauth-basic@github.com/user/repo.git#egg=repo
When I try to build a Docker container using Docker Compose, it downloads both packages and installs them both, but when I do a pip freeze there is no sign of the -e package. When I try to run the app, it looks as if this package hasn't been installed. Here's the relevant output from the build:
Collecting Flask-RQ==0.2 (from -r requirements.txt (line 3))
Downloading Flask-RQ-0.2.tar.gz
Obtaining repo from git+https://token:x-oauth-basic@github.com/user/repo.git#egg=repo (from -r requirements.txt (line 4))
Cloning https://token:x-oauth-basic@github.com/user/repo.git to ./src/repo
And here's my Dockerfile:
FROM python:2.7
RUN mkdir -p /usr/src/app
WORKDIR /usr/src/app
COPY requirements.txt /usr/src/app/
RUN pip install -r requirements.txt
COPY . /usr/src/app
I find this situation very strange and would appreciate any help.
I ran into a similar issue, and one possible way that the problem can appear is from:
WORKDIR /usr/src/app
being set before pip install. pip will create the src/ directory (where the package is installed) inside of the WORKDIR. Now all of this shouldn't be an issue since your app files, when copied over, should not overwrite the src/ directory.
However, you might be mounting a volume to /usr/src/app. When you do that, you'll overwrite the /usr/src/app/src directory and then your package will not be found.
So one fix is to move WORKDIR after the pip install. So your Dockerfile will look like:
FROM python:2.7
RUN mkdir -p /usr/src/app
COPY requirements.txt /usr/src/app/
RUN pip install -r /usr/src/app/requirements.txt
COPY . /usr/src/app
WORKDIR /usr/src/app
This fixed it for me. Hopefully it'll work for you.
@mikexstudios is correct, this happens because pip stores the package source in /usr/src/app/src, but you're mounting a local directory over top of it, meaning python can't find the package source.
Rather than changing the position of WORKDIR, I solved it by changing the pip command to:
pip install -r requirements.txt --src /usr/local/src
Either approach should work.
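A minimal sketch of the second approach, following the question's Dockerfile — the --src location is just a conventional choice:
FROM python:2.7
RUN mkdir -p /usr/src/app
WORKDIR /usr/src/app
COPY requirements.txt /usr/src/app/
# Keep editable (-e) checkouts outside the directory a volume may shadow
RUN pip install -r requirements.txt --src /usr/local/src
COPY . /usr/src/app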
If you are receiving a similar error when installing a git repo from a requirements file under a dockerized container, you may have forgotten to install git.
Here is the error I received:
Downloading/unpacking CMRESHandler from
git+git://github.com/zigius/python-elasticsearch-logger.git (from -r
/home/ubuntu/requirements.txt (line 5))
Cloning git://github.com/zigius/python-elasticsearch-logger.git to
/tmp/pip_build_root/CMRESHandler
Cleaning up...
Cannot find command 'git'
Storing debug log for failure in /root/.pip/pip.log
The command '/bin/sh -c useradd ubuntu -b /home && echo
"ubuntu ALL = NOPASSWD: ALL" >> /etc/sudoers &&
chown -R ubuntu:ubuntu /home/ubuntu && pip install -r /home/ubuntu/requirements.txt returned a non-zero code: 1
Here is an example Dockerfile that installs git and then installs all requirements:
FROM python:3.5-slim
RUN apt-get update && apt-get install -y --no-install-recommends git
ADD . /code
WORKDIR /code
RUN pip install --upgrade pip setuptools && pip install -r requirements.txt
Now you can use git packages in your requirements file in a Dockerized environment.
