I have a Python script that requires OpenCV's haarcascade.xml file to perform facial recognition. I successfully built a Docker image and pushed it to Kubernetes (Google Cloud Platform) with the following Dockerfile:
FROM python:3.7
ENV APP_HOME /app
WORKDIR $APP_HOME
COPY . ./
RUN apt-get update && apt-get install -y sudo && rm -rf /var/lib/apt/lists/*
RUN sudo apt-get update && sudo apt-get install -y apt-utils build-essential cmake && sudo apt-get install -y libgtk-3-dev && sudo apt-get install -y libboost-all-dev && sudo apt-get install -y default-libmysqlclient-dev
RUN pip install setuptools mysqlclient cmake Flask gunicorn pybase64 google-cloud-vision google-cloud-storage protobuf nltk fuzzywuzzy PyPDF2 numpy python-csv google-cloud-language pandas SQLAlchemy PyMySQL pytz Unidecode torch tensorflow==1.15 transformers imutils scikit-learn scikit-image scipy==1.4.1 opencv-python text2num sklearn
RUN pip install dlib
COPY app.py ./app.py
COPY haarcascade_frontalface_default.xml ./haarcascade_frontalface_default.xml
CMD ["python", "./app.py"]
In the Python notebook I reference the haarcascade.xml file without a path:
def face_recog(image):
    face_cascade = cv2.CascadeClassifier('haarcascade_frontalface_default.xml')
    img = cv2.imread(image)
The pod runs smoothly, with no errors and no restarts, but this part of the code does not work because the haarcascade file is reported as "missing".
I know I have two options:
Put the full path to the haarcascade file in the notebook (which I don't know, given it is a Docker image).
Fix the COPY line for the haarcascade.xml file in the Dockerfile so that the Python script finds it.
Any help is appreciated.
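For what it's worth, one way to sidestep the working-directory question entirely is to build an absolute path in the script. A minimal sketch, assuming the Dockerfile above (WORKDIR /app, XML copied next to app.py); load_cascade is a hypothetical helper name, not part of the original code:

```python
import os

# The Dockerfile sets WORKDIR /app and copies the XML there, so build
# the absolute path explicitly instead of relying on the process CWD.
APP_HOME = '/app'  # matches ENV APP_HOME in the Dockerfile
CASCADE_PATH = os.path.join(APP_HOME, 'haarcascade_frontalface_default.xml')

def load_cascade(path=CASCADE_PATH):
    import cv2  # imported lazily so the path logic stays testable without OpenCV
    cascade = cv2.CascadeClassifier(path)
    # CascadeClassifier does not raise on a missing file; check empty() explicitly
    if cascade.empty():
        raise RuntimeError('Failed to load cascade from %s' % path)
    return cascade
```

Note that cv2.CascadeClassifier silently returns an empty classifier when the file is missing, which matches the "no errors, but not working" symptom described above.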
I made a Docker image of a Flask app, and when I try to run that image I am getting this error:
ERROR [7/7] COPY ./Users/chaitanya/Documents/python/covidform /opt/source-code 0.0s
[7/7] COPY ./Users/chaitanya/Documents/python/covidform /opt/source-code:failed to compute cache key: "/Users/chaitanya/Documents/python/covidform" not found: not found
Here is the Dockerfile I have created.
FROM ubuntu
RUN apt-get update
RUN set -xe \
    && apt-get update \
    && apt-get -y install python3-pip
#RUN apt-get -y install python3
RUN pip install --upgrade pip
RUN pip install flask
RUN pip install flask-mysql
COPY ./Users/chaitanya/Documents/python/covidform /opt/source-code
ENTRYPOINT ["python3"]
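For context on the error above: COPY sources are resolved relative to the build context (the directory passed to docker build), not the host's filesystem root, so ./Users/chaitanya/Documents/python/covidform is searched for inside the context and not found. A sketch of the usual fix, assuming docker build is run with the covidform directory itself as the context:

```dockerfile
# Build with: docker build -t covidform /Users/chaitanya/Documents/python/covidform
FROM ubuntu
RUN set -xe \
    && apt-get update \
    && apt-get -y install python3-pip
RUN pip install --upgrade pip
RUN pip install flask flask-mysql
# "." is the build context, i.e. the covidform directory itself
COPY . /opt/source-code
ENTRYPOINT ["python3"]
```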
I recently learned about building Docker images from a multi-stage Dockerfile.
I have been trying simple examples of multi-stage Dockerfiles, and they worked fine. However, when I tried applying the concept to my own application, I ran into some issues.
My application does object detection in videos, so I use Python and TensorFlow.
Here is my Dockerfile:
FROM python:3-slim AS base
WORKDIR /objectDetector
COPY detect_objects.py .
COPY detector.py .
COPY requirements.txt .
ADD data /objectDetector/data/
ADD models /objectDetector/models/
RUN apt-get update && \
    apt-get install protobuf-compiler -y && \
    apt-get install ffmpeg libsm6 libxext6 -y && \
    apt-get install gcc -y
RUN python3 -m pip install --upgrade pip
RUN pip3 install tensorflow-cpu==2.9.1
RUN pip3 install opencv-python==4.6.0.66
RUN pip3 install opencv-contrib-python
WORKDIR /objectDetector/models/research
RUN protoc object_detection/protos/*.proto --python_out=.
RUN cp object_detection/packages/tf2/setup.py .
RUN python -m pip install .
RUN python object_detection/builders/model_builder_tf2_test.py
WORKDIR /objectDetector/models/research
RUN pip3 install wheel && pip3 wheel . --wheel-dir=./wheels
FROM python:3-slim
RUN python3 -m pip install --upgrade pip
COPY --from=base /objectDetector /objectDetector
WORKDIR /objectDetector
RUN pip3 install --no-index --find-links=/objectDetector/models/research/wheels -r requirements.txt
When I try to run my application in the final stage of the container, I receive the following error:
root@3f062f9a5d64:/objectDetector# python detect_objects.py
Traceback (most recent call last):
  File "/objectDetector/detect_objects.py", line 3, in <module>
    import cv2
ModuleNotFoundError: No module named 'cv2'
So per my understanding, it seems that opencv-python is not successfully carried over from the first stage to the second.
I have been searching around, and I found some good blog posts and questions tackling multi-stage Dockerfiles, specifically for Python libraries. However, it seems I am missing something here.
Here are some references that I have been following to solve the issue:
How do I reduce a python (docker) image size using a multi-stage build?
Multi-stage build usage for cuda,cudnn,opencv and ffmpeg #806
So my question is: How can we use opencv in a multistage docker image?
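One likely cause (an assumption based on the Dockerfile above, not something confirmed by the build logs): pip installed opencv-python into the builder's /usr/local/lib/python3.x/site-packages, but the final stage only copies /objectDetector, so cv2 never reaches stage 2. A sketch of one fix is to copy site-packages across, keeping both stages on the same Python version:

```dockerfile
FROM python:3-slim AS base
# ... build steps from the question, unchanged ...

FROM python:3-slim
# cv2 still needs these shared libraries at runtime
RUN apt-get update && apt-get install -y ffmpeg libsm6 libxext6 \
    && rm -rf /var/lib/apt/lists/*
# Carry over everything pip installed in the builder, opencv-python included.
# The exact site-packages path depends on the Python minor version of the tag.
COPY --from=base /usr/local/lib/python3.11/site-packages /usr/local/lib/python3.11/site-packages
COPY --from=base /objectDetector /objectDetector
WORKDIR /objectDetector
```

Alternatively, list opencv-python in requirements.txt and build wheels for it in the builder stage (pip3 wheel -r requirements.txt --wheel-dir=./wheels), so the final stage's pip install --no-index --find-links=... can actually find it.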
So I have a Flask web app that will be exposing some deep learning models.
I built the image and everything works fine.
The problem is that the size of this image is 5.58GB, which is a bit ridiculous.
I have some deep learning models that are copied during the build. I thought they might be the culprit, but their combined size does not exceed 300MB, so that's definitely not it.
Upon checking the history and the size of each layer, I discovered this:
RUN /bin/sh -c pip install -r requirements.txt is taking up 771MB.
RUN /bin/sh -c pip install torch==1.10.2 is taking up 2.8GB!
RUN /bin/sh -c apt-get install ffmpeg libsm6 libxext6 is taking up 400MB.
So how do I incorporate these libraries while keeping the image size reasonable? Is it OK to have images of this size when deploying ML models in Python?
below is the root directory:
Dockerfile:
FROM python:3.7.13
WORKDIR /app
COPY ["rdm.pt", "autosort_model.pt", "rotated_model.pt", "yolov5x6.pt", "/app/"]
RUN pip install torch==1.10.2
COPY requirements.txt /app/requirements.txt
RUN pip install -r requirements.txt
RUN apt-get update
RUN apt-get install ffmpeg libsm6 libxext6 -y
COPY . /app
CMD python ./app.py
.dockerignore:
Dockerfile
README.md
__pycache__
By default torch pulls in CUDA packages and related dependencies. Add --extra-index-url https://download.pytorch.org/whl/cpu and --no-cache-dir to the pip install command if you do not require CUDA.
RUN pip install --no-cache-dir -r requirements.txt --extra-index-url https://download.pytorch.org/whl/cpu
It is also good practice to remove the apt list cache:
RUN apt-get update \
    && apt-get install -y \
        ffmpeg \
        libsm6 \
        libxext6 \
    && rm -rf /var/lib/apt/lists/*
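Putting both suggestions together, the questioner's Dockerfile might be reworked like this (a sketch; the CPU-only extra index assumes CUDA is not needed at runtime):

```dockerfile
FROM python:3.7.13
WORKDIR /app
RUN apt-get update \
    && apt-get install -y ffmpeg libsm6 libxext6 \
    && rm -rf /var/lib/apt/lists/*
# CPU-only torch wheel: no CUDA payload, no pip cache layer
RUN pip install --no-cache-dir torch==1.10.2 \
    --extra-index-url https://download.pytorch.org/whl/cpu
COPY requirements.txt /app/requirements.txt
RUN pip install --no-cache-dir -r requirements.txt \
    --extra-index-url https://download.pytorch.org/whl/cpu
COPY ["rdm.pt", "autosort_model.pt", "rotated_model.pt", "yolov5x6.pt", "/app/"]
COPY . /app
CMD ["python", "./app.py"]
```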
I'm using Docker to automate my backend work in Python. I have a file backend.py which, when executed, downloads PDF files and converts them into images.
This is my Dockerfile:
FROM python:3.6.3
RUN apt-get update -y
RUN apt-get install -y python-pip python-dev build-essential
RUN pip install --upgrade pip
RUN apt-get install -y ghostscript libgs-dev
RUN apt-get install -y libmagickwand-dev imagemagick --fix-missing
RUN apt-get install -y libpng-dev zlib1g-dev libjpeg-dev
WORKDIR /app
COPY requirements.txt requirements.txt
RUN pip install --no-cache-dir -r requirements.txt
ADD backend.py .
ADD Vera.ttf .
CMD [ "python", "backend.py" ]
What I want is, when I run the image using the command:
docker run -d -it --name devtest-1 --mount type=bind,source=D:\projects\imageProject\public\assets,target=/app/data kidsuki-test3
the PDF files and images should be stored on my local machine at "D:\projects\imageProject\public\assets" and also in the container at "/app/data".
But for now, what I'm getting is: it copies the files already in my "D:\projects\imageProject\public\assets" folder and exposes them at "/app/data" in the devtest-1 container.
Thanks in advance!
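A note on the symptom described: a bind mount is two-way, so anything backend.py writes under /app/data will also appear at D:\projects\imageProject\public\assets on the host. A minimal sketch of writing output into the mounted directory; save_page is a hypothetical helper, not part of the original backend.py:

```python
import os

def save_page(out_dir, name, data):
    """Write one converted page into out_dir (e.g. the bind-mounted /app/data)."""
    os.makedirs(out_dir, exist_ok=True)
    path = os.path.join(out_dir, name)
    with open(path, 'wb') as f:
        f.write(data)
    return path
```

Calling save_page('/app/data', 'page1.png', image_bytes) inside the container then makes page1.png show up in the mounted host folder as well.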
I am trying to install the RabbitMQ (pika) driver in my Python container; in my local deployment there is no problem.
FROM ubuntu:20.04
WORKDIR /usr/src/app
COPY requirements.txt ./
RUN apt-get update && apt-get -y install gcc python3.7 python3-pip
RUN pip3 install --upgrade pip
RUN pip3 install -r requirements.txt
COPY . .
CMD ["python","index.py"]
This is my requirements.txt file:
requests
telethon
Flask
flask-mongoengine
Flask_JWT_Extended
Flask_Bcrypt
flask-restful
flask-cors
jsonschema
werkzeug
pandas
xlrd
Kanpai
pika
Flask-APScheduler
The docker build steps complete with no errors and install all the dependencies, but when I try to run my container it crashes with this error:
no module named 'pika'
Installing python3.7 will not help here: by using the pip3 command you are still installing packages for Python 3.8, and your CMD will also start Python 3.8. I suggest using the python:3.7 base image instead.
So try this:
FROM python:3.7
WORKDIR /usr/src/app
COPY requirements.txt ./
RUN apt-get update && apt-get -y install gcc
RUN pip install --upgrade pip
RUN pip install -r requirements.txt
COPY . .
CMD ["python","index.py"]