Docker image with python3, chromedriver, chrome & selenium - python

My objective is to scrape the web with Selenium driven by Python from a docker container.
I've looked around but haven't found a docker image with all of the following installed:
Python 3
ChromeDriver
Chrome
Selenium
Is anyone able to link me to a docker image with all of these installed and working together?
Perhaps building my own isn't as difficult as I think, but it has eluded me thus far.
Any and all advice appreciated.

Try https://github.com/SeleniumHQ/docker-selenium.
It has python installed:
$ docker run selenium/standalone-chrome python3 --version
Python 3.5.2
The instructions indicate you start it with
docker run -d -p 4444:4444 --shm-size=2g selenium/standalone-chrome
Edit:
To allow selenium to run through python it appears you need to install the packages. Create this Dockerfile:
FROM selenium/standalone-chrome
USER root
RUN wget https://bootstrap.pypa.io/get-pip.py
RUN python3 get-pip.py
RUN python3 -m pip install selenium
Then you could run it with
docker build . -t selenium-chrome && \
docker run -it selenium-chrome python3
The advantage compared to the plain python docker image is that you won't need to install the chromedriver itself since it comes from selenium/standalone-chrome.
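A hedged usage sketch (not part of the original answer): once a container built from this Dockerfile is running with port 4444 published, a script with the selenium package installed can drive the browser through the Remote webdriver. The localhost URL and example site are assumptions.

```python
# Sketch only: assumes a selenium/standalone-chrome container is listening on
# localhost:4444 and the selenium pip package is installed on the client side.
result = None
try:
    from selenium import webdriver

    driver = webdriver.Remote(
        command_executor="http://localhost:4444/wd/hub",
        options=webdriver.ChromeOptions(),
    )
    driver.get("https://example.com/")
    result = driver.title  # page title fetched through the container
    driver.quit()
    print(result)
except Exception as exc:
    # selenium not installed, or the container is not running
    print(f"selenium server unavailable: {exc}")
```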

I like Harald's solution.
However, as of 2021, my environment needed some modifications.
Docker version 20.10.5, build 55c4c88
I changed the Dockerfile as follows.
FROM selenium/standalone-chrome
USER root
RUN apt-get update && apt-get install python3-distutils -y
RUN wget https://bootstrap.pypa.io/get-pip.py
RUN python3 get-pip.py
RUN python3 -m pip install selenium

https://hub.docker.com/r/joyzoursky/python-chromedriver/
It uses python3 as the base image and installs chromedriver, chrome and selenium (as a pip package) on top. I used the alpine-based python3 version myself, as the image size is smaller.
$ cd [your working directory]
$ docker run -it -v $(pwd):/usr/workspace joyzoursky/python-chromedriver:3.6-alpine3.7-selenium sh
/ # cd /usr/workspace
See if the images suit your case. You could pip install selenium together with other packages via a requirements.txt file to build your own image, or take reference from the Dockerfiles of this repository.
If you want to pip install more packages apart from selenium, you could build your own image as this example:
First, in your working directory, you may have a requirements.txt storing the package versions you want to install:
selenium==3.8.0
requests==2.18.4
urllib3==1.22
... (your list of packages)
Then create the Dockerfile in the same directory like this:
FROM joyzoursky/python-chromedriver:3.6-alpine3.7
RUN mkdir packages
ADD requirements.txt packages
RUN pip install -r packages/requirements.txt
Then build the image:
docker build -t yourimage .
This differs from the official Selenium image in that selenium is installed as a pip package on a python base image. However, it is hosted by an individual, so it carries a higher risk of maintenance stopping.

Related

How to build a Python project for a specific version of Python?

I have an app that I would like to deploy to AWS Lambda and for this reason it has to have Python 3.9.
I have the following in the pyproject.toml:
name = "app"
readme = "README.md"
requires-python = "<=3.9"
version = "0.5.4"
If I try to pip install all the dependencies I get the following error:
ERROR: Package 'app' requires a different Python: 3.11.1 not in '<=3.9'
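The error comes from pip comparing the running interpreter's version against the requires-python specifier. A simplified sketch of that check (real pip uses full PEP 440 semantics via the packaging library; the helper below is an illustration only):

```python
# Illustrative only: pad both versions to the same length, then compare as
# tuples -- roughly what the "<=3.9" specifier check boils down to.
def version_tuple(version, width=3):
    parts = [int(p) for p in version.split(".")]
    return tuple(parts + [0] * (width - len(parts)))

def satisfies_le(current, bound):
    return version_tuple(current) <= version_tuple(bound)

print(satisfies_le("3.11.1", "3.9"))  # False: 3.11.1 is newer than 3.9
print(satisfies_le("3.9.0", "3.9"))   # True
```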
Is there a way to specify the Python version for this module?
I see there is a lot of confusion about this. I simply want to specify 3.9 "globally" for my build. So when I build the layer for the lambda with the following command it runs:
pip install . -t python/
Right now it has only Python 3.11 packaged. For example:
❯ ls -larth python/ | grep sip
siphash24.cpython-311-darwin.so
When I try to use the layer created this way it fails to load the required library.
There are multiple ways of solving this.
Option 1 (using pip's built-in facilities to restrict the Python version):
pip install . \
--python-version "3.9" \
--platform "manylinux2010" \
--only-binary=:all: -t python/
Another way of solving this is with Docker:
FROM python:3.9.16-bullseye
RUN useradd -m -u 5000 app || :
RUN mkdir -p /opt/app
RUN chown app /opt/app
USER app
WORKDIR /opt/app
RUN python -m venv venv
ENV PATH="/opt/app/venv/bin:$PATH"
RUN pip install pip --upgrade
RUN mkdir app
RUN touch app/__init__.py
COPY pyproject.toml README.md ./
RUN pip install . -t python/
This way there is no chance of creating a layer for AWS Lambda that targets anything newer than Python 3.9.
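Whichever route you take, it can be worth verifying that the compiled extensions in the layer actually target CPython 3.9 before zipping it. A hypothetical post-build check (the python/ folder name and the abi3 exception are assumptions, not from the original answer):

```python
# Hypothetical sanity check: every compiled extension in the layer folder
# should carry the cp39 tag (or the version-agnostic abi3 tag), not cp311.
from pathlib import Path

extensions = list(Path("python").rglob("*.so"))
mismatched = [p.name for p in extensions
              if "cp39" not in p.name and "abi3" not in p.name]
print(mismatched or "all extensions target CPython 3.9")
```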

How to install non Python software in a container built from python image

I have a Python application that I want to containerise with Docker. I'm using a Dockerfile such as:
FROM python:3.9
WORKDIR /app
COPY myapp ./myapp
COPY requires.txt .
RUN pip install --no-cache-dir -r requires.txt
Now, my problem is that I have to install certain dependencies that are not just Python packages, for example tesseract.
How do I install this in the built image? Is the python:3.9 image built on top of an OS, such as Ubuntu, where I can add software? If so, how? Or does it make more sense to start with a proper OS image, say debian for instance, install tesseract and Python from there (using RUN apt-get ...), and then use python -m pip ... to install the Python dependencies?
The python:3.9 image is based on the debian:bullseye image, so you can use apt to install tesseract like this
FROM python:3.9
RUN apt update && apt install -y tesseract-ocr
WORKDIR /app
COPY myapp ./myapp
COPY requires.txt .
RUN pip install --no-cache-dir -r requires.txt
As for starting from a base OS image or from a python image, I always try to get as much for free as I can and use the image that I have to modify the least.
Another way is to download the required .tar files for the specific package you need, or just use a precompiled binary.
As you can see in the documentation, they have binaries for ubuntu, debian and windows (load here, for the ubuntu docker image).
You can call them directly from a supported python library or from the command line.
You should also make sure to provide the required language data (add those files in the dockerfile), if needed.
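To illustrate the "call it from the command line" route, a hedged sketch that checks whether the binary installed by apt is visible to Python (the exact version banner format is not assumed):

```python
# Sketch: locate the tesseract binary installed via apt and query its version.
import shutil
import subprocess

tesseract = shutil.which("tesseract")
if tesseract:
    proc = subprocess.run([tesseract, "--version"],
                          capture_output=True, text=True)
    # tesseract prints its version banner on stdout or stderr depending on build
    print((proc.stdout or proc.stderr or "no output").splitlines()[0])
else:
    print("tesseract not found on PATH")
```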

how to successfully run docker image as container

Below my docker file,
FROM python:3.9.0
ARG WORK_DIR=/opt/quarter_1
RUN apt-get update && apt-get install cron -y && apt-get install -y default-jre
# Install python libraries
COPY requirements.txt /tmp/requirements.txt
RUN pip install --upgrade pip && pip install -r /tmp/requirements.txt
WORKDIR $WORK_DIR
EXPOSE 8888
VOLUME /home/data/quarter_1/
# Copy etl code
# copy code on container under your workdir "/opt/quarter_1"
COPY . .
I connected to the server, then built the image with docker build -t my-python-app .
When I tried to run the container from the built image I got nothing and was not able to access it:
docker run -p 8888:8888 -v /home/data/quarter_1/:/opt/quarter_1 image_id
The working directory here is /opt/quarter_1.
Update based on comments
If I understand everything you've posted correctly, my suggestion here is to use a base Docker Jupyter image, modify it to add your pip requirements, and then add your files to the work path. I've tested the following:
Start with a dockerfile like below
FROM jupyter/base-notebook:python-3.9.6
COPY requirements.txt /tmp/requirements.txt
RUN pip install --upgrade pip && pip install -r /tmp/requirements.txt
COPY ./quarter_1 /home/jovyan/quarter_1
Above assumes you are running the build from the folder containing dockerfile, "requirements.txt", and the "quarter_1" folder with your build files.
Note "/home/jovyan" is the default working folder in this image.
Build the image
docker build -t biwia-jupyter:3.9.6 .
Start the container with open port to 8888. e.g.
docker run -p 8888:8888 biwia-jupyter:3.9.6
Connect to the container to access the token. There are a few ways to do this, for example:
docker exec -it CONTAINER_NAME bash
jupyter notebook list
Copy the token in the URL and connect using your server IP and port. You should be able to paste the token there, and afterwards access the folder you copied into the build, as I did below.
Jupyter screenshot
If you are deploying the image to different hosts this is probably the best way to do it using COPY/ADD etc., but otherwise look at using Docker Volumes which give you access to a folder (for example quarter_1) from the host, so you don't constantly have to rebuild during development.
Second edit for Python 3.9.0 request
Using the method above, 3.9.0 is not immediately available from DockerHub. I doubt you'll have many compatibility issues between 3.9.0 and 3.9.6, but we'll build it anyway. We can download the dockerfile folder from GitHub, update a build argument, create our own variant with 3.9.0, and proceed as above.
Assuming you have git. Otherwise download the repo manually.
Download the Jupyter Docker stack repo
git clone https://github.com/jupyter/docker-stacks
change into the base-notebook directory of the cloned repo
cd ./base-notebook
Build the image with python 3.9.0 instead
docker build --build-arg PYTHON_VERSION=3.9.0 -t jupyter-base-notebook:3.9.0 .
Create the version with your copied folders and 3.9.0 version from the steps above, replacing the first line in the dockerfile instead with:
FROM jupyter-base-notebook:3.9.0
I've tested this and it works, running Python 3.9.0 without issue.
There are lots of ways to build Jupyter images, this is just one method. Check out docker hub for Jupyter to see their variants.

python cannot load en_core_web_lg module in azure app service with docker image

I have a flask python app that uses a spacy model (md or lg). I am running in a docker container in VSCode and all work correctly on my laptop.
When I push the image to my azure container registry the app restarts but it doesn't seem to get past this line in the log:
Initiating warmup request to the container.
If I comment out the line nlp = spacy.load('en_core_web_lg'), the website loads fine (of course it doesn't work as expected).
I am installing the model in the docker file after installing the requirements.txt:
RUN python -m spacy download en_core_web_lg
Docker file:
FROM python:3.6
EXPOSE 5000
# Keeps Python from generating .pyc files in the container
ENV PYTHONDONTWRITEBYTECODE 1
# Turns off buffering for easier container logging
ENV PYTHONUNBUFFERED 1
# steps needed for scipy
RUN apt-get update -y
RUN apt-get install -y python-pip python-dev libc-dev build-essential
RUN pip install -U pip
# Install pip requirements
ADD requirements.txt .
RUN python -m pip install -r requirements.txt
RUN python -m spacy download en_core_web_md
WORKDIR /app
ADD . /app
# During debugging, this entry point will be overridden. For more information, refer to https://aka.ms/vscode-docker-python-debug
CMD ["gunicorn", "--bind", "0.0.0.0:5000", "Application.webapp:app"]
Try using en_core_web_sm instead of en_core_web_lg.
You can install the module with python -m spacy download en_core_web_sm
Noticed you asked your question over on MSDN. If en_core_web_sm works but _md and _lg don't, increase your timeout by setting WEBSITES_CONTAINER_START_TIME_LIMIT to a value up to 1800 sec. The larger models may take a while to load and the warmup request simply times out.
If you already done that, email us at AzCommunity[at]microsoft[dot]com ATTN Ryan so we can take a closer look. Include your subscription id and app service name.

Python- Unable to Train Tensorflow Model Container in Sagemaker

I'm fairly new to Sagemaker and Docker. I am trying to train my own custom object detection algorithm in Sagemaker using an ECS container. I'm using this repo's files:
https://github.com/svpino/tensorflow-object-detection-sagemaker
I've followed the instructions exactly, and I'm able to run the image in a container perfectly fine on my local machine. But when I push the image to ECS to run in Sagemaker, I get the following message in Cloudwatch:
I understand that for some reason, when deployed to ECS the image suddenly can't find python. At the top of my training script is the line #!/usr/bin/env python. I've tried running the which python command and changed the shebang to point to #!/usr/local/bin python, but I just get additional errors. I don't understand why this image works on my local machine (tested with both Docker on Windows and Docker CE for WSL). Here's a snippet of the dockerfile:
ARG ARCHITECTURE=1.15.0-gpu
FROM tensorflow/tensorflow:${ARCHITECTURE}-py3
RUN apt-get update && apt-get install -y --no-install-recommends \
wget zip unzip git ca-certificates curl nginx python
# We need to install Protocol Buffers (Protobuf). Protobuf is Google's language and platform-neutral,
# extensible mechanism for serializing structured data. To make sure you are using the most updated code,
# replace the linked release below with the latest version available on the Git repository.
RUN curl -OL https://github.com/protocolbuffers/protobuf/releases/download/v3.10.1/protoc-3.10.1-linux-x86_64.zip
RUN unzip protoc-3.10.1-linux-x86_64.zip -d protoc3
RUN mv protoc3/bin/* /usr/local/bin/
RUN mv protoc3/include/* /usr/local/include/
# Let's add the folder that we are going to be using to install all of our machine learning-related code
# to the PATH. This is the folder used by SageMaker to find and run our code.
ENV PATH="/opt/ml/code:${PATH}"
RUN mkdir -p /opt/ml/code
WORKDIR /opt/ml/code
RUN pip install --upgrade pip
RUN pip install cython
RUN pip install contextlib2
RUN pip install pillow
RUN pip install lxml
RUN pip install matplotlib
RUN pip install flask
RUN pip install gevent
RUN pip install gunicorn
RUN pip install pycocotools
# Let's now download Tensorflow from the official Git repository and install Tensorflow Slim from
# its folder.
RUN git clone https://github.com/tensorflow/models/ tensorflow-models
RUN pip install -e tensorflow-models/research/slim
# We can now install the Object Detection API, also part of the Tensorflow repository. We are going to change
# the working directory for a minute so we can do this easily.
WORKDIR /opt/ml/code/tensorflow-models/research
RUN protoc object_detection/protos/*.proto --python_out=.
RUN python setup.py build
RUN python setup.py install
# If you are interested in using COCO evaluation metrics, you can tun the following commands to add the
# necessary resources to your Tensorflow installation.
RUN git clone https://github.com/cocodataset/cocoapi.git
WORKDIR /opt/ml/code/tensorflow-models/research/cocoapi/PythonAPI
RUN make
RUN cp -r pycocotools /opt/ml/code/tensorflow-models/research/
# Let's put the working directory back to where it needs to be, copy all of our code, and update the PYTHONPATH
# to include the newly installed Tensorflow libraries.
WORKDIR /opt/ml/code
COPY /code /opt/ml/code
ENV PYTHONPATH=${PYTHONPATH}:tensorflow-models/research:tensorflow-models/research/slim:tensorflow-models/research/object_detection
RUN chmod +x /opt/ml/code/train
CMD ["/bin/bash","-c","chmod +x /opt/ml/code/train && /opt/ml/code/train"]
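One way to see why the `#!/usr/bin/env python` shebang can fail inside the image while working locally is a small diagnostic like the one below. The interpretation that the base image may only ship a `python3` executable is an assumption worth verifying in the container itself.

```python
# Diagnostic sketch: '#!/usr/bin/env python' resolves 'python' through PATH,
# so if the image only provides 'python3', the shebang fails with
# "No such file or directory" even though the script itself is fine.
import shutil

python_path = shutil.which("python")
python3_path = shutil.which("python3")
print("python  ->", python_path)
print("python3 ->", python3_path)
```

Running this inside the container (e.g. via docker run IMAGE python3 check.py) shows which names actually resolve; a symlink such as ln -s /usr/bin/python3 /usr/local/bin/python in the Dockerfile is one possible fix if 'python' is missing.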
