I have a Python script that scrapes some web information using Selenium. I've built a Docker image of my project:
FROM python:3.7-slim
COPY requirements.txt ./
RUN pip install --upgrade pip && pip install -r requirements.txt
COPY . .
RUN pip install -e .
CMD ["python", "src/project/scraper.py"]
I get the following error when I run it: selenium.common.exceptions.WebDriverException: Message: 'chromedriver' executable needs to be in PATH. Please see https://sites.google.com/a/chromium.org/chromedriver/home
The chromedriver.exe file is located in a data folder and the .py script refers to the right place (it does run locally).
Does anyone know how I would be able to run chrome in this container?
The folder structure is as follows:
|-- data
|   |-- chromedriver.exe
|   |-- file.csv
|-- src
|   |-- project
|       |-- scraper.py
|-- Dockerfile
|-- requirements.txt
Let me share what has worked for me in the past.
Try installing Chrome and ChromeDriver, and setting up the PATH, from within the Dockerfile.
This uses Python 3.8; you can try changing it to 3.7 and see if it works for you.
It is NOT configured for multi-stage builds, i.e. you may want to remove "FROM python:3.7-slim" before appending your part at the end.
FROM python:3.8 AS builder
RUN apt-get update; apt-get clean
# Install chrome dependencies
RUN apt-get install -y x11vnc xvfb fluxbox wget wmctrl unzip
# Set up the Chrome PPA
RUN wget -q -O - https://dl-ssl.google.com/linux/linux_signing_key.pub | apt-key add -
RUN echo "deb http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google.list
# Update the package list and install chrome
RUN apt-get update -y
RUN apt-get install -y google-chrome-stable
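# Set up Chromedriver Environment variables (version from the note below; any writable directory works)
ENV CHROMEDRIVER_VERSION 88.0.4324.96
ENV CHROMEDRIVER_DIR /chromedriver
RUN mkdir $CHROMEDRIVER_DIR
# Download and install Chromedriver
RUN wget -q --continue -P $CHROMEDRIVER_DIR "http://chromedriver.storage.googleapis.com/$CHROMEDRIVER_VERSION/chromedriver_linux64.zip"
RUN unzip $CHROMEDRIVER_DIR/chromedriver* -d $CHROMEDRIVER_DIR
# Put Chromedriver into the PATH
ENV PATH $CHROMEDRIVER_DIR:$PATH
# Create a virtualenv for the Python dependencies
RUN python -m venv /opt/venv
# Make sure we use the virtualenv:
ENV PATH="/opt/venv/bin:$PATH"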
As of today (Jan 25th, 2021), the latest stable version of Google Chrome (released Jan 19th, 2021) is 88.0.4324.96. If the above doesn't work, try changing the ChromeDriver version so that it matches the installed Chrome browser.
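Because google-chrome-stable installs whatever release is current at build time, a pinned CHROMEDRIVER_VERSION can drift out of sync. The sketch below compares the major versions reported by `google-chrome --version` and `chromedriver --version`, and it builds the download URL using the pattern from the Dockerfile above. All function names are hypothetical. It assumes four-part version strings such as 88.0.4324.96, so older 2.x driver numbers are not handled. As far as I know, newer releases (after the 114 series) are no longer published at that URL.

```python
import re
import subprocess

# URL pattern taken from the Dockerfile above.
CHROMEDRIVER_URL = "https://chromedriver.storage.googleapis.com/{version}/chromedriver_linux64.zip"

# Matches four-part versions such as 88.0.4324.96.
_VERSION_RE = re.compile(r"(\d+)\.(\d+)\.(\d+)\.(\d+)")


def parse_version(text):
    """Return the first four-part version found in `text` as a tuple of ints."""
    match = _VERSION_RE.search(text)
    if match is None:
        raise ValueError(f"no four-part version found in {text!r}")
    return tuple(int(part) for part in match.groups())


def versions_match(chrome_output, driver_output):
    """ChromeDriver and Chrome are expected to share the same major version."""
    return parse_version(chrome_output)[0] == parse_version(driver_output)[0]


def chromedriver_url(version):
    """Download URL for a given ChromeDriver version, using the pattern above."""
    return CHROMEDRIVER_URL.format(version=version)


def installed_versions_match():
    """Run both binaries (they must be on PATH inside the container) and compare them."""
    chrome = subprocess.run(["google-chrome", "--version"],
                            capture_output=True, text=True, check=True).stdout
    driver = subprocess.run(["chromedriver", "--version"],
                            capture_output=True, text=True, check=True).stdout
    return versions_match(chrome, driver)
```

If `installed_versions_match()` returns False inside the container, update CHROMEDRIVER_VERSION to a release with the same major version as the installed Chrome.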
I assume you are using Linux containers in Docker, since python:3.7-slim is a Linux image. You cannot execute Windows binaries (.exe files) on Linux, so you need to install the Linux chromedriver: How to Setup Selenium with ChromeDriver on Ubuntu 18.04 & 16.04
Your Dockerfile should look something like this:
FROM python:3.7-slim
# install Chrome and chromedriver (python:3.7-slim ships without curl, wget and gnupg)
# note: chromedriver 2.41 targets old Chrome releases (67-69); pick a version matching the installed Chrome
RUN apt-get update && \
apt-get install -y curl wget gnupg unzip xvfb libxi6 libgconf-2-4 && \
apt-get install -y default-jdk && \
curl -sS -o - https://dl-ssl.google.com/linux/linux_signing_key.pub | apt-key add - && \
echo "deb [arch=amd64] http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google-chrome.list && \
apt-get -y update && \
apt-get install -y google-chrome-stable && \
wget https://chromedriver.storage.googleapis.com/2.41/chromedriver_linux64.zip && \
unzip chromedriver_linux64.zip && \
mv chromedriver /usr/bin/chromedriver && \
chown root:root /usr/bin/chromedriver && \
chmod +x /usr/bin/chromedriver
COPY requirements.txt ./
RUN pip install --upgrade pip && pip install -r requirements.txt
COPY . .
RUN pip install -e .
CMD ["python", "src/project/scraper.py"]
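The following check confirms, from inside the container, that a Linux chromedriver (not the Windows .exe) is reachable through PATH. It is a stdlib-only sketch. `diagnose_chromedriver` is a hypothetical name, and the function assumes a Linux container, where executables have no .exe suffix.

```python
import os
import shutil


def diagnose_chromedriver(path=None):
    """Explain whether a 'chromedriver' executable can be found on PATH (Linux)."""
    search_path = os.environ.get("PATH", "") if path is None else path
    found = shutil.which("chromedriver", path=search_path)
    if found:
        return f"ok: {found}"
    # A chromedriver.exe copied from Windows is not a Linux executable, so flag it.
    for directory in search_path.split(os.pathsep):
        candidate = os.path.join(directory, "chromedriver.exe")
        if directory and os.path.isfile(candidate):
            return f"only a Windows binary found: {candidate}"
    return "chromedriver not found on PATH"
```

Calling `diagnose_chromedriver()` with no argument checks the container's real PATH.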
I'm trying to dockerize and run a web scraper developed with the Selenium library in Python. I developed it on Windows 10, where it ran well. When running the same script as a Docker image, I'm getting multiple issues. This is how I create the driver on Windows:
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
I didn't use options, as I had no use case for them. Since I got a root user error while running in Docker, I added options and ran the code as below:
chrome_options = webdriver.ChromeOptions()
driver = webdriver.Chrome(options = chrome_options, service=Service(ChromeDriverManager().install()))
It still didn't start, so I hardcoded the driver path:
chrome_options = webdriver.ChromeOptions()
driver = webdriver.Chrome(executable_path=driverPath, options=chrome_options)
Even then it didn't start, because no display was configured. So I added the headless argument and ran it again, but in the end I got the error below:
Tkinter.TclError: no display name and no $DISPLAY environment variable
So I tried to start a virtual display with the code below:
if platform.system() == 'Linux':
    from pyvirtualdisplay import Display
    display = Display(visible=0, size=(800, 800))
chrome_options = webdriver.ChromeOptions()
driver = webdriver.Chrome(executable_path=driverPath, options=chrome_options)
But it doesn't run: it freezes and never creates the driver session.
This is my Dockerfile
FROM python
RUN wget -q -O - https://dl-ssl.google.com/linux/linux_signing_key.pub | apt-key add - \
&& echo "deb http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google.list
RUN apt-get update && apt-get -y install google-chrome-stable
RUN apt-get install -yqq unzip
RUN wget -O /tmp/chromedriver.zip http://chromedriver.storage.googleapis.com/`curl -sS chromedriver.storage.googleapis.com/LATEST_RELEASE`/chromedriver_linux64.zip
RUN unzip /tmp/chromedriver.zip chromedriver -d /usr/local/bin/
RUN apt-get install xvfb mesa-utils -y \
&& apt install freeglut3-dev -y
RUN mkdir -p /app/drivers
WORKDIR /app
ADD requirements.txt /app
ADD sample.py /app
COPY run.sh /app
COPY drivers /app/drivers
COPY csv /app/csv
RUN pip3 install -r requirements.txt
CMD ./run.sh
And this is run.sh:
#Xvfb :99 -screen 0 640x480x8 -nolisten tcp &
python3 ./sample.py
What mistakes did I make in the code? And how do I run the Selenium Python app with a display in Docker? Thank you.
It seems the display can't be enabled in the python image. So I created my own Python image from the ubuntu image, as described on this site. There I installed Python and the other dependencies my application requires, and now I'm able to run the application without any issues.
FROM ubuntu
# Enabling noninteractive environment and setting Timezone to install python3-tk without any interruption
# (ENV persists into later steps, unlike RUN export, and must come before the apt-get install)
ENV DEBIAN_FRONTEND=noninteractive
ENV TZ=Asia/Kolkata
# python
RUN apt-get update
RUN apt-get install -y python3 python3-setuptools python3-pip python3-tk
# Essential tools and xvfb
RUN apt-get update && apt-get install -y \
software-properties-common \
unzip \
curl \
xvfb
# Chrome browser to run the tests
RUN curl https://dl-ssl.google.com/linux/linux_signing_key.pub -o /tmp/google.pub \
&& cat /tmp/google.pub | apt-key add -; rm /tmp/google.pub \
&& echo 'deb http://dl.google.com/linux/chrome/deb/ stable main' > /etc/apt/sources.list.d/google.list \
&& mkdir -p /usr/share/desktop-directories \
&& apt-get -y update && apt-get install -y google-chrome-stable
# Disable the SUID sandbox so that chrome can launch without being in a privileged container
RUN dpkg-divert --add --rename --divert /opt/google/chrome/google-chrome.real /opt/google/chrome/google-chrome \
&& echo "#!/bin/bash\nexec /opt/google/chrome/google-chrome.real --no-sandbox --disable-setuid-sandbox \"\$@\"" > /opt/google/chrome/google-chrome \
&& chmod 755 /opt/google/chrome/google-chrome
# Chrome Driver
RUN mkdir -p /opt/selenium \
&& curl http://chromedriver.storage.googleapis.com/2.45/chromedriver_linux64.zip -o /opt/selenium/chromedriver_linux64.zip \
&& cd /opt/selenium; unzip /opt/selenium/chromedriver_linux64.zip; rm -rf chromedriver_linux64.zip; ln -fs /opt/selenium/chromedriver /usr/local/bin/chromedriver;
# display
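# ENV persists across build steps and into the running container; a RUN export or a
# background Xvfb started in a RUN step is gone once that step finishes, so start
# Xvfb when the container starts instead (e.g. in run.sh: Xvfb :20 -screen 0 1366x768x16 &)
ENV DISPLAY=:20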
RUN mkdir -p /app
WORKDIR /app
ADD requirements.txt /app
ADD app.py /app
COPY run.sh /app
RUN pip3 install -r requirements.txt
CMD ./run.sh
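A RUN export, or a background process started in a RUN step, only lasts for that build step. The virtual display therefore has to be started when the container starts. Instead of a shell run.sh, a small Python launcher could do this. The sketch below assumes a file name of launcher.py, and its helper names are hypothetical. A fixed start-up delay stands in for a proper readiness check.

```python
import os
import subprocess
import sys
import time


def xvfb_command(display=":20", screen="1366x768x16"):
    """Xvfb command line matching the display settings in the Dockerfile above."""
    return ["Xvfb", display, "-screen", "0", screen]


def env_with_display(display=":20", base=None):
    """Copy of the environment with DISPLAY pointing at the virtual display."""
    env = dict(os.environ if base is None else base)
    env["DISPLAY"] = display
    return env


def run_with_xvfb(script, display=":20", startup_delay=1.0):
    """Start Xvfb, run `script` with DISPLAY set, then stop Xvfb; returns the exit code."""
    xvfb = subprocess.Popen(xvfb_command(display))
    try:
        time.sleep(startup_delay)  # crude wait for Xvfb to start accepting connections
        return subprocess.call([sys.executable, script], env=env_with_display(display))
    finally:
        xvfb.terminate()
        xvfb.wait()
```

To use it, launcher.py would end with `sys.exit(run_with_xvfb("app.py"))`, and the image would use `CMD ["python3", "launcher.py"]`.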
I need a Dockerfile to run my Python script. The script uses Selenium, so it needs a driver to work. An ordinary .exe driver is not suitable, so, following the advice of the administrators of the hosting service where the script is located, I need to create a Dockerfile for the script to work properly.
The main problem is that I simply cannot run my script, because I do not understand how to load the required driver on the server.
This is sample code of what should be in the Dockerfile:
FROM python:3
RUN apt-get update -y
RUN apt-get install -y wget
RUN wget -O $HOME/geckodriver.tar.gz https://github.com/mozilla/geckodriver/releases/download/v0.23.0/geckodriver-v0.23.0-linux64.tar.gz
RUN tar xf $HOME/geckodriver.tar.gz -C $HOME
RUN cp $HOME/geckodriver /usr/local/bin/geckodriver
RUN chmod +x /usr/local/bin/geckodriver
RUN rm -f $HOME/geckodriver $HOME/geckodriver.tar.gz
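This is the code used in the Python script:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
options = Options()
driver = webdriver.Chrome(options=options, executable_path=r"chromedriver.exe")
big_stat = driver.find_element(by=By.CLASS_NAME, value="rating-product__numb")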
I can redo this snippet of code to make it work on Firefox, if necessary.
This is what the directories of the hosting where all the files are located look like
The directories of the hosting
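The hard-coded `executable_path=r"chromedriver.exe"` only works on Windows. One way to keep a single script is to pick the driver file name based on the operating system. This is a sketch: `driver_path` and its directory argument are hypothetical. It assumes the Linux binaries are installed without a suffix, as the sample Dockerfile does for /usr/local/bin/geckodriver. Note that geckodriver drives Firefox, so Firefox itself also has to be installed in the image.

```python
import os
import platform

# Driver executable names per browser (Linux/macOS names; Windows adds ".exe").
DRIVER_NAMES = {"chrome": "chromedriver", "firefox": "geckodriver"}


def driver_path(directory, browser="firefox", system=None):
    """Return the expected driver path in `directory` for the current (or given) OS."""
    system = system or platform.system()
    name = DRIVER_NAMES[browser]
    if system == "Windows":
        name += ".exe"
    return os.path.join(directory, name)
```

The result could be passed wherever the script currently passes `executable_path`, for example `driver_path("/usr/local/bin", "firefox")` inside the container.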
For getting Selenium to work with Python using a Dockerfile, here's an existing SeleniumBase Dockerfile.
For instructions on using it, see the README.
For building, it's basically this:
On anything other than an Apple M1 Mac:
docker build -t seleniumbase .
If running on an Apple M1 Mac, use this instead:
docker build --platform linux/amd64 -t seleniumbase .
Before building the Dockerfile, you'll need to clone SeleniumBase.
Here's what the Dockerfile currently looks like:
FROM ubuntu:18.04
# Install Python and Basic Python Tools
RUN apt-get -o Acquire::Check-Valid-Until=false -o Acquire::Check-Date=false update
RUN apt-get install -y python3 python3-pip python3-setuptools python3-dev python-distribute
RUN alias python=python3
RUN echo "alias python=python3" >> ~/.bashrc
# Install Bash Command Line Tools
RUN apt-get -qy --no-install-recommends install \
sudo \
unzip \
wget \
curl \
libxi6 \
libgconf-2-4 \
vim \
xvfb \
&& rm -rf /var/lib/apt/lists/*
# Install Chrome
RUN curl -sS -o - https://dl-ssl.google.com/linux/linux_signing_key.pub | apt-key add - && \
echo "deb http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google-chrome.list && \
apt-get -yqq update && \
apt-get -yqq install google-chrome-stable && \
rm -rf /var/lib/apt/lists/*
# Install Firefox
RUN apt-get -qy --no-install-recommends install \
$(apt-cache depends firefox | grep Depends | sed "s/.*ends:\ //" | tr '\n' ' ') \
&& rm -rf /var/lib/apt/lists/* \
&& cd /tmp \
&& wget --no-check-certificate -O firefox-esr.tar.bz2 \
'https://download.mozilla.org/?product=firefox-esr-latest&os=linux64&lang=en-US' \
&& tar -xjf firefox-esr.tar.bz2 -C /opt/ \
&& ln -s /opt/firefox/firefox /usr/bin/firefox \
&& rm -f /tmp/firefox-esr.tar.bz2
# Configure Virtual Display
RUN set -e
RUN echo "Starting X virtual framebuffer (Xvfb) in background..."
RUN Xvfb -ac :99 -screen 0 1280x1024x16 > /dev/null 2>&1 &
RUN export DISPLAY=:99
RUN exec "$@"
# Update Python Version
RUN apt-get update -y
RUN apt-get -qy --no-install-recommends install python3.8
RUN rm /usr/bin/python3
RUN ln -s python3.8 /usr/bin/python3
# Allow Special Characters in Python Programs
RUN echo "export PYTHONIOENCODING=utf8" >> ~/.bashrc
# Set up SeleniumBase
COPY sbase /SeleniumBase/sbase/
COPY seleniumbase /SeleniumBase/seleniumbase/
COPY examples /SeleniumBase/examples/
COPY integrations /SeleniumBase/integrations/
COPY requirements.txt /SeleniumBase/requirements.txt
COPY setup.py /SeleniumBase/setup.py
RUN find . -name '*.pyc' -delete
RUN find . -name __pycache__ -delete
RUN pip3 install --upgrade pip
RUN pip3 install --upgrade setuptools
RUN pip3 install --upgrade setuptools-scm
RUN cd /SeleniumBase && ls && pip3 install -r requirements.txt --upgrade
RUN cd /SeleniumBase && pip3 install .
# Download WebDrivers
RUN wget https://github.com/mozilla/geckodriver/releases/download/v0.31.0/geckodriver-v0.31.0-linux64.tar.gz
RUN tar -xvzf geckodriver-v0.31.0-linux64.tar.gz
RUN chmod +x geckodriver
RUN mv geckodriver /usr/local/bin/
RUN wget https://chromedriver.storage.googleapis.com/2.44/chromedriver_linux64.zip
RUN unzip chromedriver_linux64.zip
RUN chmod +x chromedriver
RUN mv chromedriver /usr/local/bin/
# Create entrypoint and grab example tests
COPY integrations/docker/docker-entrypoint.sh /
COPY integrations/docker/run_docker_test_in_firefox.sh /
COPY integrations/docker/run_docker_test_in_chrome.sh /
RUN chmod +x *.sh
COPY integrations/docker/docker_config.cfg /SeleniumBase/examples/
ENTRYPOINT ["/docker-entrypoint.sh"]
CMD ["/bin/bash"]
I need to put my little Flask app, which checks what type of Google Tags my company's clients have installed, into a Docker container. For that I need selenium-wire. You supply a website and you get JSON back telling you which tags are installed (a bit like http://gachecker.com/). It works just fine as a Flask app; the issue arises when I try to put it into Docker. Here is my Dockerfile:
FROM python:3.9
WORKDIR /bziiit_checker_app
RUN pip install flask flask_restful requests BeautifulSoup4 selenium-wire undetected-chromedriver chromedriver-py
COPY ./app ./app
CMD ["python", "./app/main.py"]
Once it's in Docker and I try to run it, I get this message:
"selenium.common.exceptions.WebDriverException: Message: 'chromedriver' executable needs to be in PATH"
This is a common issue when the chromedriver.exe file is not in the working directory. But it IS.
Do I need to set the PATH when I'm creating the virtual environment, and if so, how do I do that?
Again, I'm good at AI but terrible at app development.
I'm using Python 3.9 on Windows 10, with Visual Studio Code and Flask.
Thank you
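On the PATH question: Selenium looks up `chromedriver` by name on PATH. Inside a Linux container it needs the Linux build, because a Windows chromedriver.exe cannot run there no matter where it sits. If a Linux binary is kept in a project folder rather than in a directory already on PATH, you can add that folder with `ENV PATH` in the Dockerfile, or at runtime from Python before the driver is created. The sketch below shows the runtime variant; `prepend_to_path` is a hypothetical helper.

```python
import os


def prepend_to_path(directory, env=None):
    """Make `directory` the first PATH entry (like ENV PATH=<dir>:$PATH in a Dockerfile)."""
    env = os.environ if env is None else env
    entries = [p for p in env.get("PATH", "").split(os.pathsep) if p and p != directory]
    env["PATH"] = os.pathsep.join([directory] + entries)
    return env["PATH"]
```

For example, you could call `prepend_to_path("/bziiit_checker_app/app")` before creating the driver (that directory is an assumption). Changes to `os.environ` are inherited by processes started afterwards, including chromedriver.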
After a few days of pain and suffering I finally worked it out, so here is the Dockerfile I created to get chromedriver working in a Docker container.
This works on Windows 10 using VS Code:
FROM python:3.8
# Adding trusting keys to apt for repositories
RUN wget -q -O - https://dl-ssl.google.com/linux/linux_signing_key.pub | apt-key add -
# Adding Google Chrome to the repositories
RUN sh -c 'echo "deb [arch=amd64] http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google-chrome.list'
# Updating apt to see and install Google Chrome
RUN apt-get -y update
# Magic happens
RUN apt-get install -y google-chrome-stable
# Installing Unzip
RUN apt-get install -yqq unzip
# install chromedriver
RUN wget -O /tmp/chromedriver.zip http://chromedriver.storage.googleapis.com/`curl -sS chromedriver.storage.googleapis.com/LATEST_RELEASE`/chromedriver_linux64.zip
RUN unzip /tmp/chromedriver.zip chromedriver -d /usr/local/bin/
# Set display port as an environment variable
COPY ./app ./app
RUN pip install --upgrade pip
RUN pip install -r requirements.txt
CMD ["python", "./main.py"]
Then, in your script, add those arguments to ChromeDriver's options; otherwise it'll give you an error message saying "Chromedriver has exited abnormally":
option = webdriver.ChromeOptions()
I hope this will save someone all the headache that problem gave me
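The snippet above does not list the arguments themselves. The sketch below uses a set commonly used for Chrome in Docker; treat it as an assumption rather than the exact list from this answer. `chrome_arguments` is a hypothetical helper that returns plain strings, and each string would be passed to `option.add_argument(...)`.

```python
# Commonly used Chrome flags for containers (assumed set, not necessarily the author's).
DOCKER_CHROME_ARGS = (
    "--headless",               # no X display is available in the container
    "--no-sandbox",             # Chrome refuses to run as root with the sandbox enabled
    "--disable-dev-shm-usage",  # Docker's default /dev/shm is only 64 MB
)


def chrome_arguments(extra=(), headless=True):
    """Return the container flags (optionally without --headless) plus any extra ones."""
    args = [a for a in DOCKER_CHROME_ARGS if headless or a != "--headless"]
    for arg in extra:
        if arg not in args:
            args.append(arg)
    return args
```

In the script above, this becomes a loop such as `for arg in chrome_arguments(): option.add_argument(arg)`.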
You will also have to install ChromeDriver and Chrome inside your container:
RUN add-apt-repository -y ppa:openjdk-r/ppa
RUN apt-get install -y openjdk-12-jre cron wget unzip
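ARG CHROME_VERSION=78.0.3904.87-1
# set CHROME_DRIVER_VERSION to a ChromeDriver release matching CHROME_VERSION,
# e.g. docker build --build-arg CHROME_DRIVER_VERSION=<version> .
ARG CHROME_DRIVER_VERSION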
RUN wget -q -O - https://dl-ssl.google.com/linux/linux_signing_key.pub | apt-key add - \
&& echo "deb http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google-chrome.list \
&& apt-get update -qqy \
&& apt-get -qqy install google-chrome-stable=$CHROME_VERSION \
&& rm /etc/apt/sources.list.d/google-chrome.list \
&& rm -rf /var/lib/apt/lists/* /var/cache/apt/* \
&& sed -i 's/"$HERE\/chrome"/"$HERE\/chrome" --no-sandbox/g' /opt/google/chrome/google-chrome
RUN wget --no-verbose -O /tmp/chromedriver_linux64.zip https://chromedriver.storage.googleapis.com/$CHROME_DRIVER_VERSION/chromedriver_linux64.zip \
&& rm -rf /opt/chromedriver \
&& unzip /tmp/chromedriver_linux64.zip -d /opt \
&& rm /tmp/chromedriver_linux64.zip \
&& mv /opt/chromedriver /opt/chromedriver-$CHROME_DRIVER_VERSION \
&& chmod 755 /opt/chromedriver-$CHROME_DRIVER_VERSION \
&& ln -fs /opt/chromedriver-$CHROME_DRIVER_VERSION /usr/bin/chromedriver
I have a Python script that should output a CSV file. I'm trying to get this file into the current working directory, but without success.
This is my Dockerfile
FROM python:3.6.4
RUN apt-get update && apt-get install -y libaio1 wget unzip
WORKDIR /opt/oracle
RUN wget https://download.oracle.com/otn_software/linux/instantclient/instantclient-basiclite-linuxx64.zip && \
unzip instantclient-basiclite-linuxx64.zip && rm -f instantclient-basiclite-linuxx64.zip && \
cd /opt/oracle/instantclient* && rm -f *jdbc* *occi* *mysql* *README *jar uidrvci genezi adrci && \
echo /opt/oracle/instantclient* > /etc/ld.so.conf.d/oracle-instantclient.conf && ldconfig
RUN pip install --upgrade pip
WORKDIR /app
COPY . /app
RUN pip install --upgrade pip
RUN pip install pystan
RUN apt-get -y update && python3 -m pip install cx_Oracle --upgrade
RUN pip install -r requirements.txt
CMD [ "python", "Main.py" ]
And I run the container with the following command:
docker container run -v $pwd:/home/learn/rstudio_script/output image
It is bad practice to bind a volume just to have one file from your container saved onto your host.
Instead, you should leverage the copy command:
docker cp <containerId>:/file/path/within/container /host/path/target
You can have bash execute this command automatically after your docker run.
So something like:
# this stores the container id
CONTAINER_ID=$(docker run -dit img)
docker cp $CONTAINER_ID:/some_path host_path
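One caveat with running `docker cp` straight after a detached `docker run` is that the script may not have written the file yet. The Python sketch below waits for the container to exit before copying the file out, and then removes the container. `run_and_copy` is a hypothetical helper. Its `runner` parameter exists only so the commands can be checked without Docker.

```python
import subprocess


def run_and_copy(image, src, dest, runner=subprocess.run):
    """Run `image` detached, wait for it to exit, copy `src` out to `dest`, then remove it.

    Returns (container_id, exit_code).
    """
    started = runner(["docker", "run", "-d", image],
                     check=True, capture_output=True, text=True)
    container_id = started.stdout.strip()
    # `docker wait` blocks until the container stops and prints its exit code.
    waited = runner(["docker", "wait", container_id],
                    check=True, capture_output=True, text=True)
    exit_code = int(waited.stdout.strip())
    # `docker cp` also works on a stopped container.
    runner(["docker", "cp", f"{container_id}:{src}", dest], check=True)
    runner(["docker", "rm", container_id], check=True)
    return container_id, exit_code
```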
If you are adamant on using a bind volume, then as the others have pointed out, the issue is most likely your python script isn't outputting the csv to the correct path.
Your script Main.py is probably not trying to write to /home/learn/rstudio_script/output. The working directory in the container is /app because of the last WORKDIR directive in the Dockerfile. You can override that at runtime with --workdir but then the CMD would have to be changed as well.
One solution is to have your script write files to /output/ and then run it like this:
docker container run -v $PWD:/output/ image
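On the script side, this means writing the CSV under the mounted directory rather than to a relative path. The sketch below assumes the helper name `write_results`, a default /output directory matching the run command above, and an OUTPUT_DIR override variable.

```python
import csv
import os


def write_results(rows, filename="results.csv", output_dir=None):
    """Write `rows` as CSV into `output_dir` (default: $OUTPUT_DIR or /output)."""
    output_dir = output_dir or os.environ.get("OUTPUT_DIR", "/output")
    os.makedirs(output_dir, exist_ok=True)
    path = os.path.join(output_dir, filename)
    # newline="" lets the csv module control line endings, as its docs recommend.
    with open(path, "w", newline="") as handle:
        csv.writer(handle).writerows(rows)
    return path
```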
I'm trying to connect to an Oracle database at my company from a Docker container that runs some of my Python scripts using the cx_Oracle package. After I build and run the container, I get the following error:
conn = cx_Oracle.connect("{0}/{1}@{2}".format(configOracle["username"], configOracle["password"], "ed03:1521/" + configOracle["servername"]))
cx_Oracle.DatabaseError: DPI-1047: Cannot locate a 64-bit Oracle Client library: "libclntsh.so: cannot open shared object file: No such file or directory". See https://oracle.github.io/odpi/doc/installation.html#linux for help
I have an Oracle config file where the username, password, and server name are coming from and being filled in correctly. I can't seem to get it to work even after downloading the latest client from https://www.oracle.com/database/technologies/instant-client/linux-x86-64-downloads.html.
My directory structure looks like this:
Here is my Dockerfile:
FROM python:3.7.5
#Oracle Client setup
ENV ORACLE_HOME /opt/oracle/instantclient_19_5
COPY instantclient/* /tmp/
RUN mkdir -p /opt/oracle && \
unzip "/tmp/instantclient*.zip" -d /opt/oracle && \
ln -s $ORACLE_HOME/libclntsh.so.19.1 $ORACLE_HOME/libclntsh.so
# Working directory
WORKDIR /src
# Copying requirements.txt before entire build step
COPY requirements.txt /src/requirements.txt
RUN pip install --upgrade pip
# Installing necessary packages
RUN pip install -r requirements.txt
# Copying rest of files
COPY . /src
CMD ["python3", "/src/hello_oracle.py"]
Here is my requirements.txt file:
After many hours of trying, I finally solved it with this Dockerfile.
Note: I am using Python 3.7, Django 3.0, Oracle Database 12c and Pipenv for package management.
FROM python:3.7.5-slim-buster
# Installing Oracle instant client
WORKDIR /opt/oracle
RUN apt-get update && apt-get install -y libaio1 wget unzip \
&& wget https://download.oracle.com/otn_software/linux/instantclient/instantclient-basiclite-linuxx64.zip \
&& unzip instantclient-basiclite-linuxx64.zip \
&& rm -f instantclient-basiclite-linuxx64.zip \
&& cd /opt/oracle/instantclient* \
&& rm -f *jdbc* *occi* *mysql* *README *jar uidrvci genezi adrci \
&& echo /opt/oracle/instantclient* > /etc/ld.so.conf.d/oracle-instantclient.conf \
&& ldconfig
WORKDIR /app
# Copy my project folder content into the /app container directory
COPY . .
RUN pip3 install pipenv
RUN pipenv install
# For this statement to work you need to add the next two lines to your Pipfile
# [scripts]
# server = "python manage.py runserver"
ENTRYPOINT ["pipenv", "run", "server"]
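DPI-1047 means the dynamic loader could not find libclntsh, and the `ldconfig` step above is what registers the Instant Client directory with the loader. You could check this from Python inside the container with something like the following. It is a rough diagnostic sketch and the function names are hypothetical. `ctypes.util.find_library` searches in a similar way to the loader, but not identically.

```python
import ctypes.util
import os

CONF_PATH = "/etc/ld.so.conf.d/oracle-instantclient.conf"  # written by the Dockerfile above


def configured_dirs(conf_path=CONF_PATH):
    """Directories listed in the ld.so.conf.d file (empty list if the file is missing)."""
    if not os.path.exists(conf_path):
        return []
    with open(conf_path) as handle:
        return [line.strip() for line in handle
                if line.strip() and not line.lstrip().startswith("#")]


def instant_client_status(conf_path=CONF_PATH):
    """Summarise whether the Instant Client setup looks usable."""
    dirs = configured_dirs(conf_path)
    return {
        "configured_dirs": dirs,
        "dirs_exist": bool(dirs) and all(os.path.isdir(d) for d in dirs),
        "loader_finds_libclntsh": ctypes.util.find_library("clntsh") is not None,
    }
```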
The latest release of the Python driver for Oracle was renamed to python-oracledb and is now a 'thin' driver by default. It does not need Instant Client (that's optional). See the release announcement. The Dockerfile can be as simple as:
FROM python:3.10-bullseye
RUN python -m pip install oracledb
If you want the option to use the 'Thick' mode of python-oracledb, then you could use a Dockerfile like:
FROM python:3.10-bullseye
WORKDIR /opt/oracle
RUN apt-get update && apt-get install -y libaio1
RUN wget https://download.oracle.com/otn_software/linux/instantclient/instantclient-basiclite-linuxx64.zip && \
unzip instantclient-basiclite-linuxx64.zip && rm -f instantclient-basiclite-linuxx64.zip && \
cd /opt/oracle/instantclient* && rm -f *jdbc* *occi* *mysql* *README *jar uidrvci genezi adrci && \
echo /opt/oracle/instantclient* > /etc/ld.so.conf.d/oracle-instantclient.conf && ldconfig
RUN python -m pip install oracledb
Oracle has Python cx_Oracle Dockerfiles at https://github.com/oracle/docker-images/tree/master/OracleLinuxDevelopers and cx_Oracle containers at https://github.com/orgs/oracle/packages
There is a two-part blog post series Docker for Oracle Database Applications in Node.js and Python that shows various ways to install. Also there is an Oracle webcast recording discussing cx_Oracle and Docker here.
If you are still using the cx_Oracle namespace, you always need to install Instant Client so a solution is to use:
FROM python:3.10-bullseye
RUN apt-get update && apt-get install -y libaio1
WORKDIR /opt/oracle
RUN wget https://download.oracle.com/otn_software/linux/instantclient/instantclient-basiclite-linuxx64.zip && \
unzip instantclient-basiclite-linuxx64.zip && rm -f instantclient-basiclite-linuxx64.zip && \
cd /opt/oracle/instantclient* && rm -f *jdbc* *occi* *mysql* *README *jar uidrvci genezi adrci && \
echo /opt/oracle/instantclient* > /etc/ld.so.conf.d/oracle-instantclient.conf && ldconfig
RUN python -m pip install cx_Oracle
If you use a different base image you may need to explicitly install wget and unzip.
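Separately from the Instant Client setup, the connect call in the question assembles a user/password@dsn string by hand. Both `cx_Oracle.connect()` and `oracledb.connect()` also accept `user`, `password` and `dsn` keyword arguments, which avoids that string formatting. The sketch below assumes the configOracle keys from the question and the host ed03 and port 1521 it uses. `connect_kwargs` and `easy_connect` are hypothetical helpers.

```python
def easy_connect(host, port, service_name):
    """Easy Connect string of the form host:port/service_name."""
    return f"{host}:{int(port)}/{service_name}"


def connect_kwargs(config, host="ed03", port=1521):
    """Keyword arguments for cx_Oracle.connect() or oracledb.connect()."""
    return {
        "user": config["username"],
        "password": config["password"],
        "dsn": easy_connect(host, port, config["servername"]),
    }
```

Usage would be `conn = cx_Oracle.connect(**connect_kwargs(configOracle))`, or the same call with `oracledb.connect` after switching to python-oracledb.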