I have a Dockerfile that fails to build with the error:
COPY failed: stat /var/lib/docker/tmp/docker-builder476469130/requirements.txt: no such file or directory
The error occurs on the COPY line for the requirements.txt file. I use a pretty standard Dockerfile:
FROM python:3.6.7-slim
# Version: 1.4
# Dockerfile to build the coroner container.
# Install Python and Package Libraries
RUN apt-get update && apt-get upgrade -y && apt-get autoremove && apt-get autoclean
RUN apt-get install -y \
libffi-dev \
libssl-dev \
default-libmysqlclient-dev \
libxml2-dev \
libxslt-dev \
libjpeg-dev \
libfreetype6-dev \
zlib1g-dev \
net-tools \
nano
ARG PROJECT=coroner
ARG PROJECT_DIR=/var/www/${PROJECT}
WORKDIR $PROJECT_DIR
ENV PYTHONUNBUFFERED 1
RUN mkdir -p $PROJECT_DIR
COPY requirements.txt $PROJECT_DIR/requirments.txt
RUN pip install --upgrade pip
RUN pip install -r $PROJECT_DIR/requirements.txt
EXPOSE 8888
STOPSIGNAL SIGINT
ENTRYPOINT ["python", "manage.py"]
CMD ["runserver", "0.0.0.0:8888"]
I am bashing my head against this and have been praying at the church of google for a while now. I have checked the context and it seems to be correct. My build command is:
sudo docker build -t coroner:dev .
Docker version: 19.03.6, build 369ce74a3c
Can somebody put me out of my misery, please?
You've got a typo in the destination of your COPY line: you've put 'requirments.txt' where you meant 'requirements.txt'.
However, because you're simply copying this to where you've specified your WORKDIR, you can just do:
COPY requirements.txt .
The file will then be copied into your CWD.
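With that fix, the relevant part of the Dockerfile would look like this (a sketch; the surrounding lines stay as they are):
WORKDIR $PROJECT_DIR
COPY requirements.txt .
RUN pip install --upgrade pip
RUN pip install -r requirements.txt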
So my scenario is that I'm trying to create a Dockerfile that I can build on my Mac for running spaCy in production. The production server has an Nvidia GPU with CUDA. To get spaCy to use the GPU, I need the lib cupy-cuda117. That lib won't build on my Mac because it can't find a CUDA GPU. So what I'm trying to do is create an image on the Linux server that has the CUDA GPU, with cupy-cuda117 already pre-built in it. I'll then use that as the parent image for Docker, as all the other libs in my requirements.txt will build on my Mac.
My goal at the moment is to build that lib into the image on the server, but I'm not sure of the right path forward. Is it sudo pip3 install cupy-cuda117? Or should I create a venv and pip3 install cupy-cuda117 there? Basically, my goal is to later add all the other app code and the full requirements.txt, and when pip3 install -r requirements.txt is run by Docker, it'll download/build/install everything except cupy-cuda117, because hopefully it'll see that it's already installed.
FYI, I've already got the handling of using the GPU on the prod server and the CPU on the dev computer sorted; it's just the building of that one package I'm stuck on. I basically just need it not to try to rebuild on my Mac. Thanks!
FROM "debian:bullseye-20210902-slim" as builder
# install build dependencies
RUN apt-get update -y && apt-get install --no-install-recommends -y build-essential git locales \
&& apt-get clean && rm -f /var/lib/apt/lists/*_*
# Set the locale
RUN sed -i '/en_US.UTF-8/s/^# //g' /etc/locale.gen && locale-gen
ENV LANG en_US.UTF-8
ENV LANGUAGE en_US:en
ENV LC_ALL en_US.UTF-8
WORKDIR "/app"
RUN apt update -y && apt upgrade -y && apt install -y sudo
# Install Python 3.9 reqs
RUN sudo apt install -y --no-install-recommends wget libxml2 libstdc++6 zlib1g-dev libncurses5-dev libgdbm-dev libnss3-dev libssl-dev libsqlite3-dev libreadline-dev libffi-dev curl libbz2-dev
# Install Python 3.9
RUN wget --no-check-certificate https://www.python.org/ftp/python/3.9.1/Python-3.9.1.tgz && \
tar -xf Python-3.9.1.tgz && \
cd Python-3.9.1 && \
./configure --enable-optimizations && \
make -j $(nproc) && \
sudo make altinstall && \
cd .. && \
sudo rm -rf Python-3.9.1 && \
sudo rm -rf Python-3.9.1.tgz
# Install CUDA
RUN wget --no-check-certificate https://developer.download.nvidia.com/compute/cuda/11.7.1/local_installers/cuda_11.7.1_515.65.01_linux.run && \
sudo chmod +x cuda_11.7.1_515.65.01_linux.run && \
sudo ./cuda_11.7.1_515.65.01_linux.run --silent --override --toolkit --samples --toolkitpath=/usr/local/cuda-11.7 --samplespath=/usr/local/cuda --no-opengl-libs && \
sudo ln -s /usr/local/cuda-11.7 /usr/local/cuda && \
sudo rm -rf cuda_11.7.1_515.65.01_linux.run
## Add NVIDIA CUDA to PATH and LD_LIBRARY_PATH ##
RUN echo 'case ":${PATH}:" in\n\
*:"/usr/local/cuda-11.7/lib64":*)\n\
;;\n\
*)\n\
if [ -z "${PATH}" ] ; then\n\
PATH=/usr/local/cuda-11.7/bin\n\
else\n\
PATH=/usr/local/cuda-11.7/bin:$PATH\n\
fi\n\
esac\n\
case ":${LD_LIBRARY_PATH}:" in\n\
*:"/usr/local/cuda-11.7/lib64":*)\n\
;;\n\
*)\n\
if [ -z "${LD_LIBRARY_PATH}" ] ; then\n\
LD_LIBRARY_PATH=/usr/local/cuda-11.7/lib64\n\
else\n\
LD_LIBRARY_PATH=/usr/local/cuda-11.7/lib64:$LD_LIBRARY_PATH\n\
fi\n\
esac\n\
export PATH LD_LIBRARY_PATH\n\
export GLPATH=/usr/lib/x86_64-linux-gnu\n\
export GLLINK=-L/usr/lib/x86_64-linux-gnu\n\
export DFLT_PATH=/usr/lib\n'\
>> ~/.bashrc
ENV PATH="$PATH:/usr/local/cuda-11.7/bin"
ENV LD_LIBRARY_PATH="/usr/local/cuda-11.7/lib64"
ENV GLPATH="/usr/lib/x86_64-linux-gnu"
ENV GLLINK="-L/usr/lib/x86_64-linux-gnu"
ENV DFLT_PATH="/usr/lib"
RUN python3.9 -m pip install -U wheel setuptools
RUN sudo pip3.9 install torch torchvision torchaudio
RUN sudo pip3.9 install -U 'spacy[cuda117,transformers]'
# set runner ENV
ENV ENV="prod"
CMD ["bash"]
My local Dockerfile is this:
FROM myacct/myimg:latest
ENV ENV=prod
WORKDIR /code
COPY ./requirements.txt /code/requirements.txt
COPY ./requirements /code/requirements
RUN pip3 install --no-cache-dir -r /code/requirements.txt
COPY ./app /code/app
ENV ENV=prod
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "80"]
I have an application that consists of a Docker container with a Redis instance, another container with a RabbitMQ server, and a third container in which I'm trying to start a number of Celery workers that can interact with the Redis and RabbitMQ containers. I'm starting and stopping these workers via a REST API, and have checked that I'm able to do this on my host machine. However, after moving the setup to Docker, the workers are not behaving as expected. Whereas on my host machine (Windows 10) I was able to see the reply from the REST API and console output from the workers, in Docker I can only see the response from the REST API (a log message) and no console output. It also seems that the workers are not accessing the Redis and RabbitMQ instances.
My container is built from a python:3.6 (Linux) base image. I have checked that everything is installed correctly, and there are no error logs. I build the image with the following Dockerfile:
FROM python:3.6
WORKDIR /opt
# create a virtual environment and add it to PATH so that it is applied for
# all future RUN and CMD calls
ENV VIRTUAL_ENV=/opt/venv
RUN python3 -m venv $VIRTUAL_ENV
ENV PATH="$VIRTUAL_ENV/bin:$PATH"
# install msodbcsql17
RUN apt-get update \
&& apt-get install -y curl apt-transport-https gnupg2 \
&& curl https://packages.microsoft.com/keys/microsoft.asc | apt-key add - \
&& curl https://packages.microsoft.com/config/debian/9/prod.list > /etc/apt/sources.list.d/mssql-release.list \
&& apt-get update \
&& ACCEPT_EULA=Y apt-get install -y msodbcsql17 mssql-tools
# Install Mono for pythonnet.
RUN apt-get update \
&& apt-get install --yes \
dirmngr \
clang \
gnupg \
ca-certificates \
# Dependency for pyodbc.
unixodbc-dev \
&& apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys 3FA7E0328081BFF6A14DA29AA6A19B38D3D831EF \
&& echo "deb http://download.mono-project.com/repo/debian stretch/snapshots/5.20 main" | tee /etc/apt/sources.list.d/mono-official-stable.list \
&& apt-get update \
&& apt-get install --yes \
mono-devel=5.20\* \
&& rm -rf /var/lib/apt/lists/*
COPY requirements.txt .
COPY src ./src
COPY setup.py ./setup.py
COPY config.json ./config.json
COPY Utility.dll ./Utility.dll
COPY settings_docker.ini ./settings.ini
COPY config.json ./config.json
COPY sql_config.json ./sql_config.json
RUN python3 -m venv $VIRTUAL_ENV \
# From here on, use virtual env's python.
&& venv/bin/pip install --no-cache-dir --upgrade pip setuptools wheel \
&& venv/bin/pip install --no-cache-dir -r requirements.txt \
# Dependency for pythonnet.
&& venv/bin/pip install --no-cache-dir pycparser \
&& venv/bin/pip install -U --no-cache-dir "pythonnet==2.5.1"
# && python -m pip install --no-cache-dir "pyodbc==4.0.25" "pythonnet==2.5.1"
EXPOSE 8081
CMD python src/celery_artifacts/docker_workers/worker_app.py
And then run it with this command:
docker run --name app -p 8081:8081 app
I then attach the container to the same bridge network as the other 2:
docker network connect my_network app
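As a side note, the separate connect step can be folded into the run command itself with the --network flag (same container and network names as above):
docker run --name app --network my_network -p 8081:8081 app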
Is there a way to see the same console output from my container as the one on the host?
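For reference, the container's stdout/stderr can be tailed with docker logs regardless of how it was started (using the container name from the run command above):
docker logs -f app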
I'm using the Cloudera Hive ODBC driver in my code, and I'm trying to containerize the app. Below is my Dockerfile:
FROM ubuntu:18.04
FROM continuumio/anaconda3
FROM node:10
RUN conda update -n base -c defaults conda
RUN conda create -n env python=3.7
RUN echo "conda activate env" > ~/.bashrc
ENV PATH /opt/conda/envs/env/bin:$PATH
RUN apt-get update && apt-get install -y \
curl apt-utils apt-transport-https debconf-utils gcc build-essential \
&& rm -rf /var/lib/apt/lists/*
RUN apt-get update && apt-get install -y \
python-pip python-dev python-setuptools \
--no-install-recommends \
&& rm -rf /var/lib/apt/lists/*
RUN pip install --upgrade pip
RUN pip install pyyaml pandas numpy pymysql sqlalchemy schedule tornado
RUN apt-get update && apt-get install -y --no-install-recommends git unzip unixodbc unixodbc-dev
RUN conda install -c conda-forge turbodbc=3.1.1
RUN apt-get update && apt-get install -y gettext nano vim -y
RUN yarn install --modules-folder ./static
WORKDIR /app
COPY entry.sh /usr/local/bin/
COPY . /app/
ENV SSH_PASSWD "root:Docker!"
RUN apt-get update \
&& apt-get install -y --no-install-recommends dialog \
&& apt-get update \
&& apt-get install -y --no-install-recommends openssh-server \
&& echo "$SSH_PASSWD" | chpasswd
COPY sshd_config /etc/ssh/
COPY entry.sh /usr/local/bin/
RUN chmod u+x /usr/local/bin/entry.sh
EXPOSE 5000 2222 22 80 8000
CMD ["entry.sh"]
The image builds successfully, but when I run it I see the error below:
Traceback (most recent call last):
File "app.py", line 14, in <module>
from abc_scheduler import scheduler_main
File "/app/abc_scheduler.py", line 5, in <module>
from methods import Methods
File "/app/methods.py", line 10, in <module>
from utils import *
File "/app/utils.py", line 2, in <module>
from turbodbc import connect, make_options
ModuleNotFoundError: No module named 'turbodbc'
I have tried many other ODBC drivers inside my Dockerfile, but no luck. Any help would be great.
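For reference, a quick way to check whether the module is importable in the built image at all (a sketch; substitute your actual image tag for my-image):
docker run --rm my-image python -c "import turbodbc; print(turbodbc.__file__)"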
As suggested by @DavidMaze, I managed to create a working Dockerfile, shown below:
FROM ubuntu:latest
FROM continuumio/anaconda3
FROM node:10
RUN conda update -n base -c defaults conda
RUN conda create -n env python=3.7
RUN echo 'conda init bash' >/.bashrc
RUN echo "conda activate env" > ~/.bashrc
ENV PATH /opt/conda/envs/env/bin:$PATH
RUN apt-get update && apt-get install -y \
curl apt-utils apt-transport-https debconf-utils gcc build-essential \
&& rm -rf /var/lib/apt/lists/*
RUN apt-get update && apt-get install -y \
python-pip python-dev python-setuptools \
--no-install-recommends \
&& rm -rf /var/lib/apt/lists/*
RUN pip install --upgrade pip
# ==================TURBODBC========================
RUN apt-get update
RUN apt-get upgrade -y
RUN apt-get dist-upgrade -y
RUN apt-get install -y alien # optional
COPY ClouderaHiveODBC-2.6.1.1001-1.x86_64.rpm /opt/cloudera/
RUN alien /opt/cloudera/ClouderaHiveODBC-2.6.1.1001-1.x86_64.rpm
RUN dpkg -i clouderahiveodbc_2.6.1.1001-2_amd64.deb
# ==================END=============================
RUN conda install --name env -c conda-forge turbodbc=4.1.1 tornado=6.0.4 pyyaml pymysql schedule sqlalchemy pyarrow numpy=1.19.3 \
    pandas=1.1.4 pybind11
COPY odbc.ini /etc/
RUN apt-get update && apt-get install -y gettext nano vim -y
RUN yarn install --modules-folder ./static
WORKDIR /app
COPY . /app/
ENV SSH_PASSWD "root:Docker!"
RUN apt-get update \
&& apt-get install -y --no-install-recommends dialog \
&& apt-get update \
&& apt-get install -y --no-install-recommends openssh-server \
&& echo "$SSH_PASSWD" | chpasswd
COPY sshd_config /etc/ssh/
COPY entry.sh /usr/local/bin/
RUN chmod u+x /usr/local/bin/entry.sh
EXPOSE 9988 2222 22 80 8000
CMD ["entry.sh"]
Keep a copy of ClouderaHiveODBC-2.6.1.1001-1.x86_64.rpm in the current directory.
Keep the files below as well:
odbc.ini - which has the DB info
entry.sh - a shell script that runs the command python app.py (a minimal sketch follows the sshd_config contents below)
sshd_config - a file without any extension, with the contents shown below
Port 2222
ListenAddress 0.0.0.0
LoginGraceTime 180
X11Forwarding yes
Ciphers aes128-cbc,3des-cbc,aes256-cbc
MACs hmac-sha1,hmac-sha1-96
StrictModes yes
SyslogFacility DAEMON
PrintMotd no
IgnoreRhosts no
#deprecated option
#RhostsAuthentication no
RhostsRSAAuthentication yes
RSAAuthentication no
PasswordAuthentication yes
PermitEmptyPasswords no
PermitRootLogin yes
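For reference, a minimal entry.sh matching the description above could look like this (a sketch; the original post only says it runs python app.py, so starting sshd here is an assumption based on the SSH setup above):
#!/bin/bash
# Start sshd for the SSH access configured above (assumption; drop if unused)
service ssh start
# Run the app, as described in the post
python app.py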
I want to expand on the answer by showing an approach that doesn't require conda: a full-pip, minimum viable Docker setup for installing turbodbc. I've fully documented the solution in this GitHub comment in the official turbodbc repo.
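The gist of a pip-only setup is a C++ toolchain plus the Boost and unixODBC headers, since pip builds turbodbc from source. A minimal sketch, assuming a Debian-based image (the linked comment has the authoritative steps):
FROM python:3.9-slim
# Build dependencies for compiling turbodbc from source
RUN apt-get update && apt-get install -y --no-install-recommends \
        build-essential libboost-dev unixodbc-dev \
    && rm -rf /var/lib/apt/lists/*
RUN pip install --no-cache-dir pybind11 \
    && pip install --no-cache-dir turbodbc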
I'm building an AI application in Python involving quite a number of Python libraries. At this point, I would like to run my application inside a Docker container to make the AI app a service.
What are my options concerning dependencies, so that all necessary libraries are downloaded automatically?
As a weak alternative, I tried this with a requirement.txt file on the same level as my Dockerfile, but this didn't work.
Your Dockerfile will need instructions to install the requirements, e.g.
COPY requirement.txt requirement.txt
RUN pip install -r requirement.txt
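Placing those two lines before the COPY of the rest of your source also lets Docker cache the pip install layer, so dependencies are only reinstalled when requirement.txt itself changes:
COPY requirement.txt requirement.txt
RUN pip install -r requirement.txt
COPY . .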
Thank you for the very useful comments:
My Dockerfile:
# Python 3.7.3
FROM python:3.7-slim
# Set the working directory to /app
WORKDIR /app
COPY greeter_server.py /app
COPY AspenTechClient.py /app
COPY OpcUa_pb2.py /app
COPY OpcUa_pb2_grpc.py /app
COPY helloworld_pb2.py /app
COPY helloworld_pb2_grpc.py /app
COPY Models.py /app
ADD ./requirement.txt /app
# Training & Validation data we need
RUN mkdir -p /app/output
RUN pip install -r requirement.txt
#RUN pip3 install grpcio grpcio-tools
#RUN pip install protobuf
#RUN pip install pandas
#RUN pip install scipy
#expose ports to outside container for web app access
EXPOSE 10500
# Argument to python command
CMD [ "python", "/app/greeter_server.py" ]
Following the tips here, I added the extra lines for requirement.txt and it works like a charm. Thank you very much!
Since I only want to run a deployment in the container, I will provide pre-trained models, so there's no need for a GPU; for training I have a local machine. With an appropriate mount I deliver the .h5 file to the container.
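For reference, delivering the model via a mount could look like this (a sketch; the host path and image tag are hypothetical, and /app/output is the directory created in the Dockerfile above):
docker run -p 10500:10500 -v "$(pwd)/models:/app/output" my-ai-app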
@pyeR_biz: Thank you very much for the tips about pipelines. This is something I don't have experience with yet, but I will certainly try it in the near future.
You have several options. It depends a lot on the use case, the number of containers you will eventually build, production vs. dev environment, etc.
Generally, if you have an AI application, you will need a graphics card driver pre-installed on your host system for model training, which means you'll eventually have to come up with a way to automate the driver install or write instructions for end users. For an app you might also need database drivers in the Docker image, if your front-end or back-end databases are outside the container. Here is a toned-down example from one of my use cases, where the requirement was building a Docker image for a data pipeline.
#Taken from puckel/docker-airflow
#can look up this image name on google to see which OS it is based on.
FROM python:3.6-slim-buster
LABEL maintainer="batman"
# Never prompt the user for choices on installation/configuration of packages
ENV DEBIAN_FRONTEND noninteractive
ENV TERM linux
# Set some default configuration for data pipeline management tool called airflow
ARG AIRFLOW_VERSION=1.10.9
ARG AIRFLOW_USER_HOME=/usr/local/airflow
ARG AIRFLOW_DEPS=""
ENV AIRFLOW_HOME=${AIRFLOW_USER_HOME}
# here install some linux dependencies required to run the pipeline.
# use apt-get install, apt-get auto-remove etc to reduce size of image
# curl and install sql server odbc driver for my linux
RUN set -ex \
&& buildDeps=' freetds-dev libkrb5-dev libsasl2-dev libssl-dev libffi-dev libpq-dev git' \
&& apt-get update -yqq \
&& apt-get upgrade -yqq \
&& apt-get install -yqq --no-install-recommends \
$buildDeps freetds-bin build-essential default-libmysqlclient-dev \
apt-utils curl rsync netcat locales gnupg wget \
&& useradd -ms /bin/bash -d ${AIRFLOW_USER_HOME} airflow \
&& curl https://packages.microsoft.com/keys/microsoft.asc | apt-key add - \
&& curl https://packages.microsoft.com/config/debian/10/prod.list > /etc/apt/sources.list.d/mssql-release.list \
&& apt-get update \
&& ACCEPT_EULA=Y apt-get install -y msodbcsql17 \
&& ACCEPT_EULA=Y apt-get install -y mssql-tools \
&& pip install apache-airflow[crypto,celery,postgres,hive,jdbc,mysql,ssh${AIRFLOW_DEPS:+,}${AIRFLOW_DEPS}]==${AIRFLOW_VERSION} \
&& apt-get purge --auto-remove -yqq $buildDeps \
&& apt-get autoremove -yqq --purge \
&& apt-get clean \
&& rm -rf \
/var/lib/apt/lists/* \
/tmp/* \
/var/tmp/* \
/usr/share/man \
/usr/share/doc \
/usr/share/doc-base
# Install all required packages in the python environment from requirements.txt (I generally remove version numbers if my python versions are the same)
ADD ./requirements.txt /config/
RUN pip install -r /config/requirements.txt
# CLEANUP
RUN apt-get autoremove -yqq --purge \
&& apt-get clean \
&& rm -rf \
/var/lib/apt/lists/* \
/tmp/* \
/var/tmp/* \
/usr/share/man \
/usr/share/doc \
/usr/share/doc-base
#CONFIGURATION
COPY script/entrypoint.sh /entrypoint.sh
COPY config/airflow.cfg ${AIRFLOW_USER_HOME}/airflow.cfg
# hand ownership of libraries to relevant user
RUN chown -R airflow: ${AIRFLOW_USER_HOME}
#expose ports to outside container for web app access
EXPOSE 8080 5555 8793
USER airflow
WORKDIR ${AIRFLOW_USER_HOME}
ENTRYPOINT ["/entrypoint.sh"]
CMD ["webserver"]
1) Select an appropriate base image which has the operating system you need.
2) Get your GPU drivers installed if you are training a model; this isn't mandatory if you are only serving the model.
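For completeness, building and running an image like the one above is the usual pair of commands (the image name is hypothetical; 8080 is the webserver port exposed in the Dockerfile):
docker build -t my-pipeline .
docker run -p 8080:8080 my-pipeline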
I have an app, ABC, which I want to run in a Docker environment. I built a Dockerfile and got the image abcd1234, which I use in docker-compose.yml.
But whenever I build with docker-compose, all the packages in requirements.txt get reinstalled. Can it not use the already existing image and save the time spent reinstalling them?
I'm new to Docker and trying to understand all the parameters. Also, is the context in docker-compose.yml correct, or should it contain a path inside the image?
PS: my docker-compose.yml is not in the same directory as the project because I'll be using multiple images to expose more ports.
docker-compose.yml:
services:
app:
build:
context: /Users/user/Desktop/ABC/
ports:
- "8000:8000"
image: abcd1234
command: >
sh -c "python manage.py migrate &&
python manage.py runserver 0.0.0.0:8000"
environment:
- PROJECT_ENV=development
Dockerfile:
FROM python:3.6-slim-buster AS build
MAINTAINER ABC
ENV PYTHONUNBUFFERED 1
RUN python3 -m venv /venv
RUN apt-get update && \
apt-get upgrade -y && \
apt-get install -y git && \
apt-get install -y build-essential && \
apt-get install -y awscli && \
apt-get install -y unzip && \
apt-get install -y nano
RUN apt-get install -y libsm6 libxext6 libxrender-dev
COPY . /ABC/
RUN apt-cache search mysql-server
RUN apt-cache search libmysqlclient-dev
RUN apt-get install -y libpq-dev
RUN apt-get install -y postgresql
RUN apt-cache search postgresql-server-dev-9.5
RUN pip install --upgrade awscli==1.14.5 s3cmd==2.0.1 python-magic
RUN pip install -r /ABC/requirements.txt
WORKDIR .
Please guide me on how to tackle these two scenarios. Thanks!
The context: directory is the directory on your host system that includes the Dockerfile. It's the same directory you would pass to docker build, and it frequently is just the current directory ..
Within the Dockerfile, Docker can cache individual build steps so that it doesn't repeat them, but only up to the point where something has changed. That "something" can be a changed RUN line; and at the point of your COPY, if any file at all has changed in your local source tree, that also invalidates the cache for everything after it.
For this reason, a typical Dockerfile has a couple of "phases"; you can repeat this pattern in other languages too. You can restructure your Dockerfile in this order:
# 1. Base information; this almost never changes
FROM python:3.6-slim-buster AS build
MAINTAINER ABC
ENV PYTHONUNBUFFERED 1
WORKDIR /ABC
# 2. Install OS packages. Doesn't depend on your source tree.
# Frequently just one RUN line (but could be more if you need
# packages that aren't in the default OS package repository).
RUN apt-get update && \
DEBIAN_FRONTEND=noninteractive apt-get upgrade -y && \
DEBIAN_FRONTEND=noninteractive apt-get install -y \
build-essential unzip libxrender-dev libpq-dev
# 3. Copy _only_ the file that declares language-level dependencies.
# Repeat starting from here only if this file changes.
COPY requirements.txt .
RUN pip install -r requirements.txt
# 4. Copy the rest of the application in. In a compiled language
# (Javascript/Webpack, Typescript, Java, Go, ...) build it.
COPY . .
# 5. Explain how to run the application.
EXPOSE 8000
CMD python manage.py migrate && \
python manage.py runserver 0.0.0.0:8000
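With this ordering, a rebuild after editing only application code reuses every cached layer up through the pip install step; only phases 4 and 5 re-run. Using the service name from your compose file:
docker-compose build app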