How to set up Selenium for Python in Docker?

How do I set up Selenium in Docker in an Ubuntu environment? I am also using Tesseract for OCR as well as Flask.
How should I write my Dockerfile?
This is my Dockerfile:
FROM ubuntu:18.04
RUN apt-get update \
    && apt-get install tesseract-ocr -y \
        python3 \
        #python-setuptools \
        python3-pip \
    && apt-get clean \
    && apt-get autoremove
ADD . /home/App
WORKDIR /home/App
COPY requirements.txt ./
COPY . .
RUN pip3 install -r requirements.txt
VOLUME ["/data"]
EXPOSE 5001
ENTRYPOINT [ "python3" ]
CMD [ "app.py" ]
Any help is greatly appreciated!
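One possible direction (a sketch only, not a verified setup): install a browser and a matching driver from the distribution's packages alongside Tesseract, and let requirements.txt pull in selenium, flask and any OCR bindings. The package names chromium-browser and chromium-chromedriver are assumptions that depend on the Ubuntu release (newer releases ship Chromium as a snap, which does not install this way inside a container):
FROM ubuntu:18.04
RUN apt-get update \
    && apt-get install -y --no-install-recommends \
        tesseract-ocr \
        python3 \
        python3-pip \
        chromium-browser \
        chromium-chromedriver \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/*
WORKDIR /home/App
# requirements.txt is assumed to list selenium, flask, pytesseract, etc.
COPY requirements.txt ./
RUN pip3 install -r requirements.txt
COPY . .
EXPOSE 5001
CMD [ "python3", "app.py" ]
In the application code, Selenium would then be pointed at the installed chromedriver binary, and Chromium would be run headless (typically with the --headless and --no-sandbox flags), since there is no display inside the container.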

Related

pip - is it possible to preinstall a lib, so that any requirements.txt containing that lib won't need to rebuild it?

So my scenario is that I'm trying to create a Dockerfile that I can build on my Mac for running spaCy in production. The production server contains an Nvidia GPU with CUDA. To get spaCy to use the GPU, I need the lib cupy-cuda117. That lib won't build on my Mac because it can't find a CUDA GPU. So what I'm trying to do is create an image on the Linux server that has the CUDA GPU, with cupy-cuda117 already pre-built in it. I'll then use that as the parent image for Docker, as all the other libs in my requirements.txt will build on my Mac.
My goal at the moment is to build that lib into the image on the server, but I'm not sure of the right path forward. Is it sudo pip3 install cupy-cuda117? Or should I create a venv and pip3 install cupy-cuda117? Basically, my goal is to later add all the other app code and the full requirements.txt, and when pip3 install -r requirements.txt is run by Docker, it'll download/build/install everything except cupy-cuda117, because hopefully it'll see that it's already been built.
FYI, I've already got the handling of using the GPU on the prod server and the CPU on the dev computer sorted; it's just the building of that one package I'm stuck on. I basically just need it not to try to rebuild on my Mac. Thanks!
FROM "debian:bullseye-20210902-slim" as builder
# install build dependencies
RUN apt-get update -y && apt-get install --no-install-recommends -y build-essential git locales \
&& apt-get clean && rm -f /var/lib/apt/lists/*_*
# Set the locale
RUN sed -i '/en_US.UTF-8/s/^# //g' /etc/locale.gen && locale-gen
ENV LANG en_US.UTF-8
ENV LANGUAGE en_US:en
ENV LC_ALL en_US.UTF-8
WORKDIR "/app"
RUN apt update -y && apt upgrade -y && apt install -y sudo
# Install Python 3.9 reqs
RUN sudo apt install -y --no-install-recommends wget libxml2 libstdc++6 zlib1g-dev libncurses5-dev libgdbm-dev libnss3-dev libssl-dev libsqlite3-dev libreadline-dev libffi-dev curl libbz2-dev
# Install Python 3.9
RUN wget --no-check-certificate https://www.python.org/ftp/python/3.9.1/Python-3.9.1.tgz && \
tar -xf Python-3.9.1.tgz && \
cd Python-3.9.1 && \
./configure --enable-optimizations && \
make -j $(nproc) && \
sudo make altinstall && \
cd .. && \
sudo rm -rf Python-3.9.1 && \
sudo rm -rf Python-3.9.1.tgz
# Install CUDA
RUN wget --no-check-certificate https://developer.download.nvidia.com/compute/cuda/11.7.1/local_installers/cuda_11.7.1_515.65.01_linux.run && \
sudo chmod +x cuda_11.7.1_515.65.01_linux.run && \
sudo ./cuda_11.7.1_515.65.01_linux.run --silent --override --toolkit --samples --toolkitpath=/usr/local/cuda-11.7 --samplespath=/usr/local/cuda --no-opengl-libs && \
sudo ln -s /usr/local/cuda-11.7 /usr/local/cuda && \
sudo rm -rf cuda_11.7.1_515.65.01_linux.run
## Add NVIDIA CUDA to PATH and LD_LIBRARY_PATH ##
RUN echo 'case ":${PATH}:" in\n\
*:"/usr/local/cuda-11.7/lib64":*)\n\
;;\n\
*)\n\
if [ -z "${PATH}" ] ; then\n\
PATH=/usr/local/cuda-11.7/bin\n\
else\n\
PATH=/usr/local/cuda-11.7/bin:$PATH\n\
fi\n\
esac\n\
case ":${LD_LIBRARY_PATH}:" in\n\
*:"/usr/local/cuda-11.7/lib64":*)\n\
;;\n\
*)\n\
if [ -z "${LD_LIBRARY_PATH}" ] ; then\n\
LD_LIBRARY_PATH=/usr/local/cuda-11.7/lib64\n\
else\n\
LD_LIBRARY_PATH=/usr/local/cuda-11.7/lib64:$LD_LIBRARY_PATH\n\
fi\n\
esac\n\
export PATH LD_LIBRARY_PATH\n\
export GLPATH=/usr/lib/x86_64-linux-gnu\n\
export GLLINK=-L/usr/lib/x86_64-linux-gnu\n\
export DFLT_PATH=/usr/lib\n'\
>> ~/.bashrc
ENV PATH="$PATH:/usr/local/cuda-11.7/bin"
ENV LD_LIBRARY_PATH="/usr/local/cuda-11.7/lib64"
ENV GLPATH="/usr/lib/x86_64-linux-gnu"
ENV GLLINK="-L/usr/lib/x86_64-linux-gnu"
ENV DFLT_PATH="/usr/lib"
RUN python3.9 -m pip install -U wheel setuptools
RUN sudo pip3.9 install torch torchvision torchaudio
RUN sudo pip3.9 install -U 'spacy[cuda117,transformers]'
# set runner ENV
ENV ENV="prod"
CMD ["bash"]
My local Dockerfile is this:
FROM myacct/myimg:latest
ENV ENV=prod
WORKDIR /code
COPY ./requirements.txt /code/requirements.txt
COPY ./requirements /code/requirements
RUN pip3 install --no-cache-dir -r /code/requirements.txt
COPY ./app /code/app
ENV ENV=prod
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "80"]

How to use GITHUB_TOKEN in pip's requirements.txt without setting it as env variable in Dockerfile?

I have a private repo that can be installed via Python's pip:
requirements.txt
git+https://${GITHUB_TOKEN}@github.com/MY_ACCOUNT/MY_REPO.git
And a Dockerfile:
Dockerfile
FROM python:3.8.11
RUN apt-get update && \
apt-get -y install gcc curl && \
rm -rf /var/lib/apt/lists/*
ARG GITHUB_TOKEN
COPY ./requirements.txt /tmp/requirements.txt
RUN pip install -r /tmp/requirements.txt
It worked perfectly when I built the image:
$ docker build . --build-arg GITHUB_TOKEN=THIS_IS_MY_GITHUB_TOKEN -t wow/my_app:latest
But when I inspected the image, it showed GITHUB_TOKEN in the Cmd section:
$ docker image inspect wow/my_app:latest
...
"ContainerConfig": {
...
"Cmd": [
"|1",
"GITHUB_TOKEN=THIS_IS_MY_GITHUB_TOKEN", # Here!
"/bin/sh",
"-c",
"pip install -r /tmp/requirements.txt"
],
...
},
...
I think this could lead to a security problem. How can I solve this so that no credential info appears in docker inspect?
If you build your image using BuildKit, you can take advantage of Docker build secrets.
You would structure your Dockerfile something like this:
FROM python:3.8.11
RUN apt-get update && \
apt-get -y install gcc curl && \
rm -rf /var/lib/apt/lists/*
COPY ./requirements.txt /tmp/requirements.txt
RUN --mount=type=secret,id=GITHUB_TOKEN \
GITHUB_TOKEN=$(cat /run/secrets/GITHUB_TOKEN) \
pip install -r /tmp/requirements.txt
And then if you have a GITHUB_TOKEN environment variable in your local environment, you could run:
docker buildx build --secret id=GITHUB_TOKEN -t myimage .
Or if you have the value in a file, you could run:
docker buildx build \
--secret id=GITHUB_TOKEN,src=github_token.txt \
-t myimage .
In either case, the setting will not be baked into the resulting image. See the linked documentation for more information.
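One caveat worth adding (not part of the original answer): depending on the Docker version, the --mount flag may only be recognised when a BuildKit Dockerfile frontend is in use, which can be requested by putting # syntax=docker/dockerfile:1 on the first line of the Dockerfile. With the classic docker build command, BuildKit can also be enabled per invocation:
DOCKER_BUILDKIT=1 docker build \
    --secret id=GITHUB_TOKEN,src=github_token.txt \
    -t myimage .
Either way, the token is only visible to the single RUN command that mounts it and is not stored in any image layer.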

Implementing Poetry inside Docker Container

I currently have a Docker container and I would like to start using Poetry inside it.
I have looked at several pieces of documentation and several examples, but I am unsure whether this is the correct approach for my case. Poetry and its dependencies are already installed within the project.
The previous Dockerfile is:
FROM python:3.9 as base
ENV PYTHONUNBUFFERED 1
RUN apt-get update \
&& apt-get install -y --no-install-recommends \
binutils libproj-dev gdal-bin \
netcat postgresql-client sudo curl \
&& apt-get clean -y \
&& mkdir /code
ADD requirements.txt /code/requirements.txt
RUN pip install -r /code/requirements.txt
FROM base
ADD requirements.txt /code/requirements.txt
RUN pip install -r /code/requirements.txt
COPY . /code
WORKDIR /code
ENTRYPOINT ["./docker-entrypoint.sh"]
I have added:
FROM python:3.9 as base
ENV PYTHONUNBUFFERED 1
WORKDIR /code
FROM base as builder
ENV PIP_NO_CACHE_DIR=off \
    PIP_DISABLE_PIP_VERSION_CHECK=on \
    PIP_DEFAULT_TIMEOUT=100 \
    POETRY_VERSION=1.0.0
RUN apt-get update \
&& apt-get install -y --no-install-recommends \
binutils libproj-dev gdal-bin \
netcat postgresql-client sudo curl \
&& apt-get clean -y \
&& mkdir /code
ADD requirements.txt /code/requirements.txt
RUN pip install -r /code/requirements.txt
RUN curl -sSL https://install.python-poetry.org | python3 - --version $POETRY_VERSION
RUN python -m venv /venv
COPY poetry.lock pyproject.toml /code/
COPY . /code
FROM base as final
ADD requirements.txt /code/requirements.txt
RUN pip install -r /code/requirements.txt
COPY . /code
WORKDIR /code
ENTRYPOINT ["./docker-entrypoint.sh"]
Is this the correct way of implementing poetry inside docker containers? How would I test this out?
Three main things are important when it comes to Poetry:
RUN pip3 install -U pip poetry
RUN poetry config virtualenvs.create false
RUN poetry install --no-interaction --no-ansi
Install Poetry.
Configure it not to create a virtualenv.
Install the packages.
How you can test:
You can just build the Dockerfile to create an image.
So, docker build -t tag_name_your_project .
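Putting those three steps into the Dockerfile from the question might look roughly like this (a sketch only: the system packages and entrypoint script are taken from the question, the multi-stage split is dropped for brevity, and pyproject.toml and poetry.lock are assumed to sit next to the Dockerfile):
FROM python:3.9 as base
ENV PYTHONUNBUFFERED 1
WORKDIR /code
# system packages from the original Dockerfile
RUN apt-get update \
    && apt-get install -y --no-install-recommends \
        binutils libproj-dev gdal-bin \
        netcat postgresql-client sudo curl \
    && apt-get clean -y
# Poetry replaces the pip install -r requirements.txt step
RUN pip3 install -U pip poetry
RUN poetry config virtualenvs.create false
COPY pyproject.toml poetry.lock /code/
RUN poetry install --no-interaction --no-ansi
COPY . /code
ENTRYPOINT ["./docker-entrypoint.sh"]
Copying only pyproject.toml and poetry.lock before running poetry install keeps the dependency layer cached while the application code changes.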

Avoid dependency hell on docker

I am building an AI application in Python involving quite a number of Python libraries. At this point, I would like to run my application inside a Docker container to turn the AI app into a service.
What are my options concerning dependencies, so that all necessary libraries are downloaded automatically?
As a weak alternative, I tried this with a "requirement.txt" file at the same level as my Docker build file, but this didn't work.
Your Dockerfile will need instructions to install the requirements, e.g.
COPY requirement.txt requirement.txt
RUN pip install -r requirement.txt
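If the file does not exist yet, one common way to produce it (a suggestion beyond the answer above) is to export the packages from the environment where the app already runs:
pip freeze > requirement.txt
This pins exact versions, so the container installs the same library set you developed against.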
Thank you for the very useful comments:
My dockerfile:
# Python 3.7.3
FROM python:3.7-slim
# Set the working directory to /app
WORKDIR /app
COPY greeter_server.py /app
COPY AspenTechClient.py /app
COPY OpcUa_pb2.py /app
COPY OpcUa_pb2_grpc.py /app
COPY helloworld_pb2.py /app
COPY helloworld_pb2_grpc.py /app
COPY Models.py /app
ADD ./requirement.txt /app
# Training & Validation data we need
RUN mkdir -p /app/output
RUN pip install -r requirement.txt
#RUN pip3 install grpcio grpcio-tools
#RUN pip install protobuf
#RUN pip install pandas
#RUN pip install scipy
#expose ports to outside container for web app access
EXPOSE 10500
# Argument to python command
CMD [ "python", "/app/greeter_server.py" ]
Following the tips here, I added the extra lines for "requirement.txt" and it works like a charm. Thank you very much!
Since I only want to run a deployed model in the container, I will provide pre-trained models, so there is no need for a GPU; for that I have a local machine. With an appropriate mount I deliver the .h5 file to the container.
@pyeR_biz: Thank you very much for the tips about pipelines. This is something I don't have experience with yet, but I will certainly look into it in the near future.
You have several options. It depends a lot on the use case, the number of containers you will eventually build, production vs. dev environment, etc.
Generally, if you have an AI application you will need a graphics card driver pre-installed on your host system for model training, which means you will eventually have to come up with a way to automate the driver install or write instructions for end users to do it. For an app you might also need database drivers in the Docker image, if your front-end or back-end databases are outside the container. Here is a toned-down example of one of my use cases, with the requirement being to build a Docker image for a data pipeline.
#Taken from puckel/docker-airflow
#can look up this image name on google to see which OS it is based on.
FROM python:3.6-slim-buster
LABEL maintainer="batman"
# Never prompt the user for choices on installation/configuration of packages
ENV DEBIAN_FRONTEND noninteractive
ENV TERM linux
# Set some default configuration for data pipeline management tool called airflow
ARG AIRFLOW_VERSION=1.10.9
ARG AIRFLOW_USER_HOME=/usr/local/airflow
ARG AIRFLOW_DEPS=""
ENV AIRFLOW_HOME=${AIRFLOW_USER_HOME}
# here install some linux dependencies required to run the pipeline.
# use apt-get install, apt-get auto-remove etc to reduce size of image
# curl and install sql server odbc driver for my linux
RUN set -ex \
    && buildDeps=' freetds-dev libkrb5-dev libsasl2-dev libssl-dev libffi-dev libpq-dev git' \
    && apt-get update -yqq \
    && apt-get upgrade -yqq \
    && apt-get install -yqq --no-install-recommends \
        $buildDeps freetds-bin build-essential default-libmysqlclient-dev \
        apt-utils curl rsync netcat locales gnupg wget \
    && useradd -ms /bin/bash -d ${AIRFLOW_USER_HOME} airflow \
    && curl https://packages.microsoft.com/keys/microsoft.asc | apt-key add - \
    && curl https://packages.microsoft.com/config/debian/10/prod.list > /etc/apt/sources.list.d/mssql-release.list \
    && apt-get update \
    && ACCEPT_EULA=Y apt-get install -y msodbcsql17 \
    && ACCEPT_EULA=Y apt-get install -y mssql-tools \
    && pip install apache-airflow[crypto,celery,postgres,hive,jdbc,mysql,ssh${AIRFLOW_DEPS:+,}${AIRFLOW_DEPS}]==${AIRFLOW_VERSION} \
    && apt-get purge --auto-remove -yqq $buildDeps \
    && apt-get autoremove -yqq --purge \
    && apt-get clean \
    && rm -rf \
        /var/lib/apt/lists/* \
        /tmp/* \
        /var/tmp/* \
        /usr/share/man \
        /usr/share/doc \
        /usr/share/doc-base
# Install all required packages into the Python environment from requirements.txt (I generally remove version numbers if my Python versions are the same)
ADD ./requirements.txt /config/
RUN pip install -r /config/requirements.txt
# CLEANUP
RUN apt-get autoremove -yqq --purge \
    && apt-get clean \
    && rm -rf \
        /var/lib/apt/lists/* \
        /tmp/* \
        /var/tmp/* \
        /usr/share/man \
        /usr/share/doc \
        /usr/share/doc-base
#CONFIGURATION
COPY script/entrypoint.sh /entrypoint.sh
COPY config/airflow.cfg ${AIRFLOW_USER_HOME}/airflow.cfg
# hand ownership of libraries to relevant user
RUN chown -R airflow: ${AIRFLOW_USER_HOME}
#expose ports to outside container for web app access
EXPOSE 8080 5555 8793
USER airflow
WORKDIR ${AIRFLOW_USER_HOME}
ENTRYPOINT ["/entrypoint.sh"]
CMD ["webserver"]
1) Select an appropriate base image that has the operating system you need.
2) Get your GPU drivers installed if you are training a model; this is not mandatory if you are only serving the model.
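As a side note (not from the original answer): if the container does need GPU access at runtime, the host needs the NVIDIA driver plus the NVIDIA Container Toolkit, and the GPU is handed to the container when it is started, for example (my-ai-app being a placeholder image name):
docker run --gpus all -p 8080:8080 my-ai-app
The image itself never contains the kernel driver, only the user-space CUDA libraries that the framework expects.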

Dockerfile for Python Django failing on COPY

I have a Dockerfile that fails on build with the error:
COPY failed: stat /var/lib/docker/tmp/docker-builder476469130/requirements.txt: no such file or directory
The error occurs on the COPY line for the requirements.txt file. I use a pretty standard Dockerfile:
FROM python:3.6.7-slim
# Version: 1.4
# Dockerfile to build the coroner container.
# Install Python and Package Libraries
RUN apt-get update && apt-get upgrade -y && apt-get autoremove && apt-get autoclean
RUN apt-get install -y \
    libffi-dev \
    libssl-dev \
    default-libmysqlclient-dev \
    libxml2-dev \
    libxslt-dev \
    libjpeg-dev \
    libfreetype6-dev \
    zlib1g-dev \
    net-tools \
    nano
ARG PROJECT=coroner
ARG PROJECT_DIR=/var/www/${PROJECT}
WORKDIR $PROJECT_DIR
ENV PYTHONUNBUFFERED 1
RUN mkdir -p $PROJECT_DIR
COPY requirements.txt $PROJECT_DIR/requirments.txt
RUN pip install --upgrade pip
RUN pip install -r $PROJECT_DIR/requirements.txt
EXPOSE 8888
STOPSIGNAL SIGINT
ENTRYPOINT ["python", "manage.py"]
CMD ["runserver", "0.0.0.0:8888"]
I am bashing my head against this and have been praying at the church of Google for a while now. I have checked the build context and it seems to be correct. My build command is:
sudo docker build -t coroner:dev .
Docker version: Docker version 19.03.6, build 369ce74a3c
Can somebody put me out of my misery, please?
You've got a typo in 'requirements.txt' in the destination, you've put 'requirments.txt'.
However, because you're simply copying this to where you've specified your WORKDIR, you can just do:
COPY requirements.txt .
The file will then be copied into your CWD.
