I am desperately trying to containerise my web-scraping app (in Python). It uses Selenium.
I'm receiving an error which indicates that the chromedriver binary needs to run as a regular user.
How can I configure this in a Dockerfile?
The error I'm getting is:
selenium.common.exceptions.WebDriverException: Message: 'chromedriver' executable may have wrong permissions. Please see https://chromedriver.chromium.org/home
But when I read the chromedriver site, I can see the root cause: the driver is being run as root, and it doesn't like that.
Here is my Dockerfile so far:
#FROM python:3.9-buster
FROM --platform=linux/amd64 python:3.10-buster
#FROM --platform=linux/arm64/v8 python:3.9-buster
# FROM --platform=linux/amd64 python:3.9
# FROM selenium/node-chrome
ENV PYTHONDONTWRITEBYTECODE=1
ENV PYTHONUNBUFFERED=1
RUN apt-get update \
&& apt-get -y install gcc make \
&& rm -rf /var/lib/apt/lists/*
# RUN apt-get update
RUN apt-get install -y xvfb
RUN apt-get install -y gconf-service libasound2 libatk1.0-0 libcairo2 libcups2 libfontconfig1 libgdk-pixbuf2.0-0 libgtk-3-0 libnspr4 libpango-1.0-0 libxss1 fonts-liberation libappindicator1 libnss3 lsb-release xdg-utils
RUN apt-get install -y chromium
## RUN apt-get install -y chromium-browser
RUN apt-get install -y chromium-driver
# RUN apt-get install -y google-chrome-stable
# install chrome
RUN wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb
RUN dpkg -i google-chrome-stable_current_amd64.deb; apt-get -fy install
# RUN dpkg -i google-chrome-stable_current_amd64.deb --fix-missing; apt-get -fy install
RUN wget https://chromedriver.storage.googleapis.com/2.41/chromedriver_linux64.zip
RUN apt-get install -yqq unzip
RUN wget -O /tmp/chromedriver.zip http://chromedriver.storage.googleapis.com/`curl -sS chromedriver.storage.googleapis.com/LATEST_RELEASE`/chromedriver_linux64.zip
RUN unzip /tmp/chromedriver.zip chromedriver -d /usr/local/bin/
RUN wget -q -O - https://dl-ssl.google.com/linux/linux_signing_key.pub | apt-key add -
RUN sh -c 'echo "deb [arch=amd64] http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google-chrome.list'
RUN apt-get -y update
RUN apt-get update && apt-get install -y wget bzip2 libxtst6 packagekit-gtk3-module libx11-xcb-dev libdbus-glib-1-2 libxt6 libpci-dev && rm -rf /var/lib/apt/lists/*
RUN export PATH=$PATH:'/usr/local/bin/chromedriver'
#download and install chrome
RUN apt update -y
RUN apt install -y google-chrome-stable
RUN apt-get install -y google-chrome-stable
RUN python3 --version
RUN pip3 --version
RUN pip install --no-cache-dir --upgrade pip
RUN apt-get install -y libglib2.0-0 libnss3 libgconf-2-4 libfontconfig1
# COPY chromedriver "/usr/local/bin"
#install python dependencies
COPY requirements.txt requirements.txt
RUN pip install -r ./requirements.txt
#some envs
ENV APP_HOME /app
#set workspace
WORKDIR ${APP_HOME}
RUN chmod -x "/usr/bin/google-chrome"
RUN chmod -x "/usr/local/bin/chromedriver"
#copy local files
COPY . ${APP_HOME}
CMD ["python", "/app/main.py"]
So my scenario is that I'm trying to create a Dockerfile that I can build on my Mac for running spaCy in production. The production server contains an Nvidia GPU with CUDA. To get spaCy to use the GPU, I need the lib cupy-cuda117. That lib won't build on my Mac because it can't find the CUDA GPU. So what I'm trying to do is create an image on the Linux server that has the CUDA GPU, with cupy-cuda117 already pre-built in it. I'll then use that as the parent image for Docker, as all the other libs in my requirements.txt will build on my Mac.
My goal at the moment is to build that lib into the server image, but I'm not sure of the right path forward. Is it sudo pip3 install cupy-cuda117? Or should I create a venv and pip3 install cupy-cuda117? Basically my goal is to later add all the other app code and the full requirements.txt, and when pip3 install -r requirements.txt is run by Docker, it'll download/build/install everything except cupy-cuda117, because hopefully it'll see that it's already been installed.
FYI I've already got the handling of using the GPU on the prod server and the CPU on the dev computer sorted; it's just the building of that one package I'm stuck on. I basically just need it not to try to rebuild on my Mac. Thanks!
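For what it's worth, my understanding is that pip will skip a requirement if the installed version already satisfies it, so my plan is to pin the exact version that the parent image pre-installs (the version number here is only an example):
# requirements.txt
# pinned to the version already installed in the parent image, so pip logs
# "Requirement already satisfied" instead of trying to build it on my Mac
cupy-cuda117==10.6.0
spacy[cuda117,transformers]
# ...rest of the app requirements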
FROM "debian:bullseye-20210902-slim" as builder
# install build dependencies
RUN apt-get update -y && apt-get install --no-install-recommends -y build-essential git locales \
&& apt-get clean && rm -f /var/lib/apt/lists/*_*
# Set the locale
RUN sed -i '/en_US.UTF-8/s/^# //g' /etc/locale.gen && locale-gen
ENV LANG en_US.UTF-8
ENV LANGUAGE en_US:en
ENV LC_ALL en_US.UTF-8
WORKDIR "/app"
RUN apt update -y && apt upgrade -y && apt install -y sudo
# Install Python 3.9 reqs
RUN sudo apt install -y --no-install-recommends wget libxml2 libstdc++6 zlib1g-dev libncurses5-dev libgdbm-dev libnss3-dev libssl-dev libsqlite3-dev libreadline-dev libffi-dev curl libbz2-dev
# Install Python 3.9
RUN wget --no-check-certificate https://www.python.org/ftp/python/3.9.1/Python-3.9.1.tgz && \
tar -xf Python-3.9.1.tgz && \
cd Python-3.9.1 && \
./configure --enable-optimizations && \
make -j $(nproc) && \
sudo make altinstall && \
cd .. && \
sudo rm -rf Python-3.9.1 && \
sudo rm -rf Python-3.9.1.tgz
# Install CUDA
RUN wget --no-check-certificate https://developer.download.nvidia.com/compute/cuda/11.7.1/local_installers/cuda_11.7.1_515.65.01_linux.run && \
sudo chmod +x cuda_11.7.1_515.65.01_linux.run && \
sudo ./cuda_11.7.1_515.65.01_linux.run --silent --override --toolkit --samples --toolkitpath=/usr/local/cuda-11.7 --samplespath=/usr/local/cuda --no-opengl-libs && \
sudo ln -s /usr/local/cuda-11.7 /usr/local/cuda && \
sudo rm -rf cuda_11.7.1_515.65.01_linux.run
## Add NVIDIA CUDA to PATH and LD_LIBRARY_PATH ##
RUN echo 'case ":${PATH}:" in\n\
*:"/usr/local/cuda-11.7/lib64":*)\n\
;;\n\
*)\n\
if [ -z "${PATH}" ] ; then\n\
PATH=/usr/local/cuda-11.7/bin\n\
else\n\
PATH=/usr/local/cuda-11.7/bin:$PATH\n\
fi\n\
esac\n\
case ":${LD_LIBRARY_PATH}:" in\n\
*:"/usr/local/cuda-11.7/lib64":*)\n\
;;\n\
*)\n\
if [ -z "${LD_LIBRARY_PATH}" ] ; then\n\
LD_LIBRARY_PATH=/usr/local/cuda-11.7/lib64\n\
else\n\
LD_LIBRARY_PATH=/usr/local/cuda-11.7/lib64:$LD_LIBRARY_PATH\n\
fi\n\
esac\n\
export PATH LD_LIBRARY_PATH\n\
export GLPATH=/usr/lib/x86_64-linux-gnu\n\
export GLLINK=-L/usr/lib/x86_64-linux-gnu\n\
export DFLT_PATH=/usr/lib\n'\
>> ~/.bashrc
ENV PATH="$PATH:/usr/local/cuda-11.7/bin"
ENV LD_LIBRARY_PATH="/usr/local/cuda-11.7/lib64"
ENV GLPATH="/usr/lib/x86_64-linux-gnu"
ENV GLLINK="-L/usr/lib/x86_64-linux-gnu"
ENV DFLT_PATH="/usr/lib"
RUN python3.9 -m pip install -U wheel setuptools
RUN sudo pip3.9 install torch torchvision torchaudio
RUN sudo pip3.9 install -U 'spacy[cuda117,transformers]'
# set runner ENV
ENV ENV="prod"
CMD ["bash"]
My local Dockerfile is this:
FROM myacct/myimg:latest
ENV ENV=prod
WORKDIR /code
COPY ./requirements.txt /code/requirements.txt
COPY ./requirements /code/requirements
RUN pip3 install --no-cache-dir -r /code/requirements.txt
COPY ./app /code/app
ENV ENV=prod
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "80"]
I have a Python program which is to be executed in Azure Kubernetes.
Below is my Dockerfile; I have Python installed:
#Ubuntu Base image with openjdk8 with TomEE
FROM demo.azurecr.io/ubuntu/tomee/openjdk8:8.0.x
RUN apt-get update && apt-get install -y telnet && apt-get install -y ksh && apt-get install -y python2.7.x && apt-get -y clean && rm -rf /var/lib/apt/lists/*
However, I don't know how to install pip and the related dependent libraries (e.g. pymssql). How can I do that?
The best option is installing Miniconda on the Docker image. I always use it when I need Python on a Docker image that ships without Python or pip.
Here is the part that installs Miniconda in my simple Docker image:
FROM debian
RUN apt-get update && apt-get install -y curl wget
RUN rm -rf /var/lib/apt/lists/*
RUN wget \
https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh \
&& mkdir /root/.conda \
&& bash Miniconda3-latest-Linux-x86_64.sh -b \
&& rm -f Miniconda3-latest-Linux-x86_64.sh
# the batch (-b) install doesn't modify PATH, so add conda's bin dir explicitly (default prefix for root is /root/miniconda3)
ENV PATH="/root/miniconda3/bin:${PATH}"
RUN conda --version
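With conda's bin directory on the PATH as above, pip is already available (it ships with Miniconda), so the dependent libraries from the question install the usual way, e.g.:
RUN pip install pymssql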
I'm using the Cloudera Hive ODBC driver in my code and I'm trying to containerize the app.
Below is my Dockerfile:
FROM ubuntu:18.04
FROM continuumio/anaconda3
FROM node:10
RUN conda update -n base -c defaults conda
RUN conda create -n env python=3.7
RUN echo "conda activate env" > ~/.bashrc
ENV PATH /opt/conda/envs/env/bin:$PATH
RUN apt-get update && apt-get install -y \
curl apt-utils apt-transport-https debconf-utils gcc build-essential \
&& rm -rf /var/lib/apt/lists/*
RUN apt-get update && apt-get install -y \
python-pip python-dev python-setuptools \
--no-install-recommends \
&& rm -rf /var/lib/apt/lists/*
RUN pip install --upgrade pip
RUN pip install pyyaml pandas numpy pymysql sqlalchemy schedule tornado
RUN apt-get update && apt-get install -y --no-install-recommends git unzip unixodbc unixodbc-dev
RUN conda install -c conda-forge turbodbc=3.1.1
RUN apt-get update && apt-get install -y gettext nano vim -y
RUN yarn install --modules-folder ./static
WORKDIR /app
COPY entry.sh /usr/local/bin/
COPY . /app/
ENV SSH_PASSWD "root:Docker!"
RUN apt-get update \
&& apt-get install -y --no-install-recommends dialog \
&& apt-get update \
&& apt-get install -y --no-install-recommends openssh-server \
&& echo "$SSH_PASSWD" | chpasswd
COPY sshd_config /etc/ssh/
COPY entry.sh /usr/local/bin/
RUN chmod u+x /usr/local/bin/entry.sh
EXPOSE 5000 2222 22 80 8000
CMD ["entry.sh"]
Building the image succeeds, but when I run the Docker image I see the error below:
Traceback (most recent call last):
File "app.py", line 14, in <module>
from abc_scheduler import scheduler_main
File "/app/abc_scheduler.py", line 5, in <module>
from methods import Methods
File "/app/methods.py", line 10, in <module>
from utils import *
File "/app/utils.py", line 2, in <module>
from turbodbc import connect, make_options
ModuleNotFoundError: No module named 'turbodbc'
I have tried many other ODBC drivers inside my Dockerfile, but no luck. Any help would be great.
As suggested by @DavidMaze, I managed to create a working Dockerfile, shown below:
FROM ubuntu:latest
FROM continuumio/anaconda3
FROM node:10
RUN conda update -n base -c defaults conda
RUN conda create -n env python=3.7
RUN echo 'conda init bash' >/.bashrc
RUN echo "conda activate env" > ~/.bashrc
ENV PATH /opt/conda/envs/env/bin:$PATH
RUN apt-get update && apt-get install -y \
curl apt-utils apt-transport-https debconf-utils gcc build-essential \
&& rm -rf /var/lib/apt/lists/*
RUN apt-get update && apt-get install -y \
python-pip python-dev python-setuptools \
--no-install-recommends \
&& rm -rf /var/lib/apt/lists/*
RUN pip install --upgrade pip
# ==================TURBODBC========================
RUN apt-get update
RUN apt-get upgrade -y
RUN apt-get dist-upgrade -y
RUN apt-get install -y alien # optional
COPY ClouderaHiveODBC-2.6.1.1001-1.x86_64.rpm /opt/cloudera/
RUN alien /opt/cloudera/ClouderaHiveODBC-2.6.1.1001-1.x86_64.rpm
RUN dpkg -i clouderahiveodbc_2.6.1.1001-2_amd64.deb
# ==================END=============================
RUN conda install --name env -c conda-forge turbodbc=4.1.1 tornado=6.0.4 pyyaml pymysql schedule sqlalchemy pyarrow numpy=1.19.3 \
    pandas=1.1.4 pybind11
COPY odbc.ini /etc/
RUN apt-get update && apt-get install -y gettext nano vim -y
RUN yarn install --modules-folder ./static
WORKDIR /app
COPY . /app/
ENV SSH_PASSWD "root:Docker!"
RUN apt-get update \
&& apt-get install -y --no-install-recommends dialog \
&& apt-get update \
&& apt-get install -y --no-install-recommends openssh-server \
&& echo "$SSH_PASSWD" | chpasswd
COPY sshd_config /etc/ssh/
COPY entry.sh /usr/local/bin/
RUN chmod u+x /usr/local/bin/entry.sh
EXPOSE 9988 2222 22 80 8000
CMD ["entry.sh"]
Keep a copy of ClouderaHiveODBC-2.6.1.1001-1.x86_64.rpm in the current directory.
Keep the below files as well:
odbc.ini - which has the DB info
entry.sh - a shell script that runs python app.py (see the sketch after the sshd_config listing below)
sshd_config - a file without any extension, containing the information shown below:
Port 2222
ListenAddress 0.0.0.0
LoginGraceTime 180
X11Forwarding yes
Ciphers aes128-cbc,3des-cbc,aes256-cbc
MACs hmac-sha1,hmac-sha1-96
StrictModes yes
SyslogFacility DAEMON
PrintMotd no
IgnoreRhosts no
#deprecated option
#RhostsAuthentication no
RhostsRSAAuthentication yes
RSAAuthentication no
PasswordAuthentication yes
PermitEmptyPasswords no
PermitRootLogin yes
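For reference, a minimal sketch of what entry.sh does, as described above (starting sshd first is my assumption, given the SSH setup in the Dockerfile):
#!/bin/bash
# start the SSH daemon for the exposed port 2222, then run the app
service ssh start
python app.py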
I want to expand on the answer by showing an approach that works without conda. In other words, a full-pip minimum viable Docker setup for installing turbodbc. I've fully documented the solution in this GitHub comment in the official turbodbc repo.
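The short version is something like the sketch below; treat the package list as an approximation of the build requirements from the turbodbc docs (the pinned, verified setup is in the linked comment):
FROM python:3.9-slim
# system packages needed to compile turbodbc from source via pip
RUN apt-get update && apt-get install -y --no-install-recommends \
    build-essential unixodbc unixodbc-dev \
    libboost-date-time-dev libboost-locale-dev libboost-system-dev \
 && rm -rf /var/lib/apt/lists/*
# pybind11 must be present before turbodbc's own build starts
RUN pip install --no-cache-dir pybind11 numpy \
 && pip install --no-cache-dir turbodbc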
I am building an image using the following Dockerfile:
FROM ubuntu:18.04
RUN apt-get update \
&& apt-get install -y python3-pip python3-dev \
&& cd /usr/local/bin \
&& ln -s /usr/bin/python3 python \
&& pip3 install --upgrade pip
# Setup the Python's configs
RUN pip install --upgrade pip && \
pip install --no-cache-dir matplotlib==3.0.2 pandas==0.23.4 numpy==1.16.3 && \
pip install --no-cache-dir pybase64 && \
pip install --no-cache-dir scipy && \
pip install --no-cache-dir dask[complete] && \
pip install --no-cache-dir dash==1.6.1 dash-core-components==1.5.1 dash-bootstrap-components==0.7.1 dash-html-components==1.0.2 dash-table==4.5.1 dash-daq==0.2.2 && \
pip install --no-cache-dir plotly && \
pip install --no-cache-dir adjustText && \
pip install --no-cache-dir networkx && \
pip install --no-cache-dir scikit-learn && \
pip install --no-cache-dir tzlocal
# Setup the R configs
RUN apt-get update
RUN apt-get install -y software-properties-common
RUN apt-key adv --keyserver keyserver.ubuntu.com --recv-keys E298A3A825C0D65DFD57CBB651716619E084DAB9
RUN add-apt-repository 'deb https://cloud.r-project.org/bin/linux/ubuntu bionic-cran35/'
RUN apt update
ENV DEBIAN_FRONTEND=noninteractive
RUN apt install -y r-base
RUN pip install rpy2==2.9.4
RUN apt-get -y install libxml2 libxml2-dev libcurl4-gnutls-dev libssl-dev
RUN echo "r <- getOption('repos'); r['CRAN'] <- 'https://cran.r-project.org'; options(repos = r);" > ~/.Rprofile
RUN Rscript -e "install.packages('BiocManager')"
RUN Rscript -e "BiocManager::install('ggplot2')"
RUN Rscript -e "BiocManager::install('DESeq2')"
RUN Rscript -e "BiocManager::install('RColorBrewer')"
RUN Rscript -e "BiocManager::install('ggrepel')"
RUN Rscript -e "BiocManager::install('factoextra')"
RUN Rscript -e "BiocManager::install('FactoMineR')"
RUN Rscript -e "BiocManager::install('apeglm')"
When I build this on Linux, I can launch the web app from the container and it runs fine.
But when I build this on Windows using Docker Toolbox, although the installation of factoextra and FactoMineR completes successfully, launching the web app raises an error:
Error in library("factoextra") : there is no package called ‘factoextra’
Do you have any idea what might cause this problem? It's very strange, because when I build the image the installation of these 2 packages appears to succeed.
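If anyone wants to reproduce, this is a quick way to compare the two builds; it prints where R looks for libraries and which packages are actually in the image (the image name is a placeholder):
docker run --rm my-r-image Rscript -e 'print(.libPaths()); print(rownames(installed.packages()))'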
I have a Dockerfile where I am creating a virtual environment in Python 2.7.15 and installing all the required Python dependencies for my project.
Some of the dependencies that need gcc to compile, like pandas and lz4, are failing with the error below:
pandas/io/sas/sas.c:4:20: fatal error: Python.h: No such file or directory
#include "Python.h"
^
compilation terminated.
error: command 'gcc' failed with exit status 1
I even tried installing python-devel and gcc in the Docker image, but it doesn't help:
RUN yum install -y python-pip python-devel gcc
Dockerfile:
FROM registry-access-redhat-com.repo.lab.pl.*-*.com/rhel7.5
# CONFIGURE YUM
RUN rm -f /etc/yum.repos.d/*
ADD resources/yum.repos.d/* /etc/yum.repos.d/
RUN echo "sslverify=false" >> /etc/yum.conf
# INSTALL REQUIRED SYSTEM PACKAGES
RUN yum install -y python-pip python-devel gcc && yum clean all && rm -rf /var/cache/yum
RUN yum install -y wget && yum clean all && rm -rf /var/cache/yum && wget http://repo.lab.pl.alcatel-lucent.com/eden-yum-releases/installation-packages-rpm/python-2.7.15-2.x86_64.rpm
RUN yum install -y python-2.7.15-2.x86_64.rpm && yum clean all && rm -rf /var/cache/yum
#DOWNLOAD LATEST PIP
RUN wget -P /tmp/ https://files.pythonhosted.org/packages/c2/d7/90f34cb0d83a6c5631cf71dfe64cc1054598c843a92b400e55675cc2ac37/pip-18.1-py2.py3-none-any.whl
#INSTALL PIP ON PYTHON 2.7.15
RUN LD_LIBRARY_PATH=/usr/local/lib /usr/local/bin/python2.7 /tmp/pip-18.1-py2.py3-none-any.whl/pip install --find-links /tmp --upgrade --no-index /tmp/pip-18.1-py2.py3-none-any.whl
# CREATE foo GROUP AND USER
RUN groupadd foo
RUN useradd -d /home/foo -ms /bin/bash -g foo foo
# SETUP BASHRC for foo user
COPY file/.bashrc /home/foo
COPY file/.bash_profile /home/foo
RUN chown foo:foo /home/foo/.bash_profile
RUN chown foo:foo /home/foo/.bashrc
# SET WORKING DIRECTORY TO /home/foo
WORKDIR /home/foo
#CREATE VIRTUAL ENVIRONMENT
RUN wget -P /tmp/ https://files.pythonhosted.org/packages/e7/16/da8cb8046149d50940c6110310983abb359bbb8cbc3539e6bef95c29428a/setuptools-40.6.2-py2.py3-none-any.whl
RUN wget -P /tmp/ https://files.pythonhosted.org/packages/7c/17/9b7b6cddfd255388b58c61e25b091047f6814183e1d63741c8df8dcd65a2/virtualenv-16.1.0-py2.py3-none-any.whl
RUN LD_LIBRARY_PATH=/usr/local/lib /usr/local/bin/python2.7 /tmp/pip-18.1-py2.py3-none-any.whl/pip install --find-links /tmp --upgrade --no-index /tmp/virtualenv-16.1.0-py2.py3-none-any.whl
RUN LD_LIBRARY_PATH=/usr/local/lib /usr/local/bin/python2.7 /tmp/pip-18.1-py2.py3-none-any.whl/pip install --find-links /tmp --upgrade --no-index /tmp/setuptools-40.6.2-py2.py3-none-any.whl
RUN LD_LIBRARY_PATH=/usr/local/lib /usr/local/bin/virtualenv -p /usr/local/bin/python2.7 enet
RUN chown -R foo:foo /home/foo/enet
RUN export LD_LIBRARY_PATH=/usr/local/lib
RUN source /home/foo/enet/bin/activate
RUN LD_LIBRARY_PATH=/usr/local/lib /home/foo/enet/bin/python2.7 /tmp/pip-18.1-py2.py3-none-any.whl/pip install --find-links /tmp --upgrade --no-index /tmp/pip-18.1-py2.py3-none-any.whl
RUN LD_LIBRARY_PATH=/usr/local/lib /home/foo/enet/bin/python2.7 /tmp/pip-18.1-py2.py3-none-any.whl/pip install scipy
RUN LD_LIBRARY_PATH=/usr/local/lib /home/foo/enet/bin/python2.7 /tmp/pip-18.1-py2.py3-none-any.whl/pip install openpyxl
RUN LD_LIBRARY_PATH=/usr/local/lib /home/foo/enet/bin/python2.7 /tmp/pip-18.1-py2.py3-none-any.whl/pip install confluent-kafka
RUN LD_LIBRARY_PATH=/usr/local/lib /home/foo/enet/bin/python2.7 /tmp/pip-18.1-py2.py3-none-any.whl/pip install cython
RUN LD_LIBRARY_PATH=/usr/local/lib /home/foo/enet/bin/python2.7 /tmp/pip-18.1-py2.py3-none-any.whl/pip install pandas
RUN LD_LIBRARY_PATH=/usr/local/lib /home/foo/enet/bin/python2.7 /tmp/pip-18.1-py2.py3-none-any.whl/pip install lz4
How can I resolve this issue?
I guess you should:
Install the Python development files (as suggested in the comments).
Use a -I switch on the gcc command line to point to the correct path for the Python include files.
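For the second point, pip doesn't call gcc directly, but distutils picks up CFLAGS from the environment, so the -I switch can be passed that way. Since the custom Python 2.7.15 in this Dockerfile lives under /usr/local, a sketch of what that could look like (the include path is an assumption based on where a /usr/local install usually puts its headers):
RUN LD_LIBRARY_PATH=/usr/local/lib CFLAGS="-I/usr/local/include/python2.7" \
    /home/foo/enet/bin/python2.7 /tmp/pip-18.1-py2.py3-none-any.whl/pip install pandas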