How to create Dockerfile with R + Anaconda3 + non-root User - python

I need to create a Dockerfile that emulates a normal workspace.
We have a virtual machine where we train models.
We Use R and Python3.
I want to automate some of the processes without changing the codebase.
e.g. ~ must point to a /home/<some user>
Biggest problem is Anaconda3 in docker. because every RUN is a standalone login.

Basis for my answer: https://github.com/xychelsea/anaconda3-docker/blob/main/Dockerfile
I've created my own mini R package installer:
install_r_packages.sh
#!/bin/bash
input="r-requirements.txt"
Rscript -e "install.packages('remotes')"
IFS='='
while IFS= read -r line; do
read -r package version <<<$line
package=$(echo "$package" | sed 's/ *$//g')
version=$(echo "$version" | sed 's/ *$//g')
if ! [[ ($package =~ ^#.*) || (-z $package) ]]; then
Rscript -e "remotes::install_version('$package', version = '$version')"
fi
done <$input
r-requirement
# packages for rmarkdown
htmltools=0.5.2
jsonlite=1.7.2
...
rmarkdown=2.11
# more packages
...
Dockerfile
FROM debian:bullseye
RUN apt-get update
# install R
RUN apt-get install -y r-base r-base-dev libatlas3-base r-recommended libssl-dev openssl \
libcurl4-openssl-dev libfontconfig1-dev libxml2-dev xml2 pandoc lua5.3 clang
ENV ARROW_S3=ON \
LIBARROW_MINIMAL=false \
LIBARROW_BINARY=true \
RSTUDIO_PANDOC=/usr/lib/rstudio-server/bin/pandoc \
TZ=Etc/UTC
COPY r-requirements.txt .
COPY scripts/install_r_packages.sh scripts/install_r_packages.sh
RUN bash scripts/install_r_packages.sh
# create user
ENV REPORT_USER="reporter"
ENV PROJECT_HOME=/home/${REPORT_USER}/<project>
RUN useradd -ms /bin/bash ${REPORT_USER} \
&& mkdir /data \
&& mkdir /opt/mlflow \
&& chown -R ${REPORT_USER}:${REPORT_USER} /data \
&& chown -R ${REPORT_USER}:${REPORT_USER} /opt/mlflow
# copy project files
WORKDIR ${PROJECT_HOME}
COPY src src
... bla bla bla ...
COPY requirements.txt .
RUN chown -R ${REPORT_USER}:${REPORT_USER} ${PROJECT_HOME}
# Install python Anaconda env
ENV ANACONDA_PATH="/opt/anaconda3"
ENV PATH=${ANACONDA_PATH}/bin:${PATH}
ENV ANACONDA_INSTALLER=Anaconda3-2021.11-Linux-x86_64.sh
RUN mkdir ${ANACONDA_PATH} \
&& chown -R ${REPORT_USER}:${REPORT_USER} ${ANACONDA_PATH}
RUN apt-get install -y wget
USER ${REPORT_USER}
RUN wget https://repo.anaconda.com/archive/${ANACONDA_INSTALLER} \
&& /bin/bash ${ANACONDA_INSTALLER} -b -u -p ${ANACONDA_PATH} \
&& chown -R ${REPORT_USER} ${ANACONDA_PATH} \
&& rm -rvf ~/${ANACONDA_INSTALLER}.sh \
&& echo ". ${ANACONDA_PATH}/etc/profile.d/conda.sh" >> ~/.bashrc \
&& echo "conda activate base" >> ~/.bashrc
RUN pip3 install --upgrade pip \
&& pip3 install -r requirements.txt \
&& pip3 install awscli
# run training and report
ENV PYTHONPATH=/home/${REPORT_USER}/<project> \
MLFLOW_TRACKING_URI=... \
MLFLOW_EXPERIMENT_NAME=...
CMD dvc config core.no_scm true \
&& dvc repro

Related

Unable to install R 4.1.3 in fastapi Docker image

I'm using a dockerfile which uses tiangolo/uvicorn-gunicorn-fastapi:python3.8-slim-2021-06-09 as base image and installs the required linux package and also installs r-recommended and r-base. Earlier below dockerfile works fine. But when I tried to update the image with tiangolo/uvicorn-gunicorn-fastapi:python3.8-slim-2023-01-09 as base image, unable to install the r-recommended and r-base with version 4.1.2-1~bustercran.0.
Dockerfile :
# Download IspR from IspR project pipeline and extract the folder and rename it as r-scripts. Place the r-scripts directory in backend root directory.
FROM tiangolo/uvicorn-gunicorn-fastapi:python3.8-slim-2023-01-09
COPY key_gnupg.gpg /app/key_gnupg.gpg
COPY packages.txt /app/packages.txt
RUN echo "Acquire::Check-Valid-Until \"false\";\nAcquire::Check-Date \"false\";" | cat > /etc/apt/apt.conf.d/10no--check-valid-until
RUN apt-get update && apt-get install -y gnupg2=2.2.27-2+deb11u2 && \
echo "deb http://cloud.r-project.org/bin/linux/debian buster-cran40/" >> /etc/apt/sources.list.d/cran.list && \
apt-key adv --keyserver keyserver.ubuntu.com --recv-keys B8F25A8A73EACF41 && \
apt-get update
RUN apt-get install --no-install-recommends -y $(cat /app/packages.txt) && \
rm -rf /var/lib/apt/lists/* && \
apt-get purge --auto-remove && \
apt-get clean
COPY . /app
WORKDIR /app/r-scripts
RUN R -e "install.packages('remotes')"
RUN renv_version=`cat renv.lock | grep -A3 "renv" | grep -e "Version" | cut -d ':' -f2 | sed "s/\"//g" | sed "s/\,//g"|awk '{$1=$1};1'` && \
R -e "remotes::install_github('rstudio/renv#${renv_version}')" && \
rm -rf /app
CMD ["/bin/bash"]
packages.txt
git=1:2.20.1-2+deb10u3
pkg-config=0.29-6
liblapack-dev=3.8.0-2
gfortran=4:8.3.0-1
libxml2=2.9.4+dfsg1-7+deb10u3
libxml2-dev=2.9.4+dfsg1-7+deb10u3
libssl-dev=1.1.1n-0+deb10u1
libcurl4-openssl-dev=7.64.0-4+deb10u2
libnlopt-dev=2.4.2+dfsg-8+b1
libpcre2-8-0=10.32-5
build-essential=12.6
r-recommended=4.1.2-1~bustercran.0
r-base=4.1.2-1~bustercran.0
curl=7.64.0-4+deb10u2
postgresql=11+200+deb10u4
libpq-dev=11.14-0+deb10u1
libblas-dev=3.8.0-2
libgomp1=8.3.0-6
zlib1g-dev=1:1.2.11.dfsg-1
zlib1g=1:1.2.11.dfsg-1
Error MEssage :
E: Version '4.1.2-1~bustercran.0' for 'r-base-core' was not found
How to install the 4.1.2 version of r-base using this Dockerfile?

docker python:3.9 image gives error for scripts

We are using python:3.9 image for the base, and run some command on that.
Base Image
########################
# Base Image Section #
########################
#
# Creates an image with the common requirements for a flask app pre-installed
# Start with a smol OS
FROM python:3.9
# Install basic requirements
RUN apt-get -q update -o Acquire::Languages=none && apt-get -yq install --no-install-recommends \
apt-transport-https \
ca-certificates && \
apt-get autoremove -yq && apt-get clean && rm -rf "/var/lib/apt/lists"/*
# Install CA certs
# Prefer the mirror for package downloads
COPY ["ca_certs/*.crt", "/usr/local/share/ca-certificates/"]
RUN update-ca-certificates && \
mv /etc/apt/sources.list /etc/apt/sources.list.old && \
printf 'deb https://mirror.company.com/debian/ buster main contrib non-free\n' > /etc/apt/sources.list && \
cat /etc/apt/sources.list.old >> /etc/apt/sources.list
# Equivalent to `cd /app`
WORKDIR /app
# Fixes a host of encoding-related bugs
ENV LC_ALL=C.UTF-8
# Tells `apt` and others that no one is sitting at the keyboard
ENV DEBIAN_FRONTEND=noninteractive
# Set a more helpful shell prompt
ENV PS1='[\u#\h \W]\$ '
#####################
# ONBUILD Section #
#####################
#
# ONBUILD commands take effect when another image is built using this one as a base.
# Ref: https://docs.docker.com/engine/reference/builder/#onbuild
#
#
# And that's it! The base container should have all your dependencies and ssl certs pre-installed,
# and will copy your code over when used as a base with the "FROM" directive.
ONBUILD ARG BUILD_VERSION
ONBUILD ARG BUILD_DATE
# Copy our files into the container
ONBUILD ADD . .
# pre_deps: packages that need to be installed before code installation and remain in the final image
ONBUILD ARG pre_deps
# build_deps: packages that need to be installed before code installation, then uninstalled after
ONBUILD ARG build_deps
# COMPILE_DEPS: common packages needed for building/installing Python packages. Most users won't need to adjust this,
# but you could specify a shorter list if you didn't need all of these.
ONBUILD ARG COMPILE_DEPS="build-essential python3-dev libffi-dev libssl-dev python3-pip libxml2-dev libxslt1-dev zlib1g-dev g++ unixodbc-dev"
# ssh_key: If provided, writes the given string to ~/.ssh/id_rsa just before Python package installation,
# and deletes it before the layer is written.
ONBUILD ARG ssh_key
# If our python package is installable, install system packages that are needed by some python libraries to compile
# successfully, then install our python package. Finally, delete the temporary system packages.
ONBUILD RUN \
if [ -f setup.py ] || [ -f requirements.txt ]; then \
install_deps="$pre_deps $build_deps $COMPILE_DEPS" && \
uninstall_deps=$(python3 -c 'all_deps=set("'"$install_deps"'".split()); to_keep=set("'"$pre_deps"'".split()); print(" ".join(sorted(all_deps-to_keep)), end="")') && \
apt-get -q update -o Acquire::Languages=none && apt-get -yq install --no-install-recommends $install_deps && \
if [ -n "${ssh_key}" ]; then \
mkdir -p ~/.ssh && chmod 700 ~/.ssh && printf "%s\n" "${ssh_key}" > ~/.ssh/id_rsa && chmod 600 ~/.ssh/id_rsa && \
printf "%s\n" "StrictHostKeyChecking=no" > ~/.ssh/config && chmod 600 ~/.ssh/config || exit 1 ; \
fi ; \
if [ -f requirements.txt ]; then \
pip3 install --no-cache-dir --compile -r requirements.txt || exit 1 ; \
elif [ -f setup.py ]; then \
pip3 install --no-cache-dir --compile --editable . || exit 1 ; \
fi ; \
if [ -n "${ssh_key}" ]; then \
rm -rf ~/.ssh || exit 1 ; \
fi ; \
fi
We build this image last year, and it was working fine, but we decided to use latest changes and build new base image, once we build it, it start failing for last RUN command.
DEBU[0280] Deleting in layer: map[]
INFO[0281] Cmd: /bin/sh
INFO[0281] Args: [-c if [ -f setup.py ] || [ -f requirements.txt ]; then install_deps="$pre_deps $build_deps $COMPILE_DEPS" && uninstall_deps=$(python3 -c 'all_deps=set("'"$install_deps"'".split()); to_keep=set("'"$pre_deps"'".split()); print(" ".join(sorted(all_deps-to_keep)), end="")') && apt-get -q update -o Acquire::Languages=none && apt-get -yq install --no-install-recommends $install_deps && if [ -n "${ssh_key}" ]; then mkdir -p ~/.ssh && chmod 700 ~/.ssh && printf "%s\n" "${ssh_key}" > ~/.ssh/id_rsa && chmod 600 ~/.ssh/id_rsa && printf "%s\n" "StrictHostKeyChecking=no" > ~/.ssh/config && chmod 600 ~/.ssh/config || exit 1 ; fi ; if [ -f requirements.txt ]; then pip3 install --no-cache-dir --compile -r requirements.txt || exit 1 ; elif [ -f setup.py ]; then pip3 install --no-cache-dir --compile --editable . || exit 1 ; fi ; if [ -n "${ssh_key}" ]; then rm -rf ~/.ssh || exit 1 ; fi ; fi]
INFO[0281] Running: [/bin/sh -c if [ -f setup.py ] || [ -f requirements.txt ]; then install_deps="$pre_deps $build_deps $COMPILE_DEPS" && uninstall_deps=$(python3 -c 'all_deps=set("'"$install_deps"'".split()); to_keep=set("'"$pre_deps"'".split()); print(" ".join(sorted(all_deps-to_keep)), end="")') && apt-get -q update -o Acquire::Languages=none && apt-get -yq install --no-install-recommends $install_deps && if [ -n "${ssh_key}" ]; then mkdir -p ~/.ssh && chmod 700 ~/.ssh && printf "%s\n" "${ssh_key}" > ~/.ssh/id_rsa && chmod 600 ~/.ssh/id_rsa && printf "%s\n" "StrictHostKeyChecking=no" > ~/.ssh/config && chmod 600 ~/.ssh/config || exit 1 ; fi ; if [ -f requirements.txt ]; then pip3 install --no-cache-dir --compile -r requirements.txt || exit 1 ; elif [ -f setup.py ]; then pip3 install --no-cache-dir --compile --editable . || exit 1 ; fi ; if [ -n "${ssh_key}" ]; then rm -rf ~/.ssh || exit 1 ; fi ; fi]
error building image: error building stage: failed to execute command: starting command: fork/exec /bin/sh: exec format error
We label the image, based on date, to know when it was working, we have base image, build on 12-09-22 works fine.
Something new in python:3.9 cause this issue. Same script was working.

Jupyter notebook gui missing Java kernel

I'm using below docker to use IJava kernel to my jupyter notebook.
FROM ubuntu:18.04
ARG NB_USER="some-user"
ARG NB_UID="1000"
ARG NB_GID="100"
RUN apt-get update || true && \
apt-get install -y sudo && \
useradd -m -s /bin/bash -N -u $NB_UID $NB_USER && \
chmod g+w /etc/passwd && \
echo "${NB_USER} ALL=(ALL) NOPASSWD: ALL" >> /etc/sudoers && \
# Prevent apt-get cache from being persisted to this layer.
rm -rf /var/lib/apt/lists/*
ARG DEBIAN_FRONTEND=noninteractive
RUN apt-get update && \
apt-get install -y locales && \
sed -i -e 's/# en_US.UTF-8 UTF-8/en_US.UTF-8 UTF-8/' /etc/locale.gen && \
dpkg-reconfigure --frontend=noninteractive locales && \
update-locale LANG=en_US.UTF-8
RUN apt-get install -y \
openjdk-11-jdk-headless \
python3-pip git curl unzip
RUN ln -s /usr/bin/python3 /usr/bin/python & \
pip3 install --upgrade pip
RUN pip3 install packaging jupyter ipykernel awscli jaydebeapi
RUN python -m ipykernel install --sys-prefix
# Install Java kernel
RUN mkdir ijava-kernel/ && cd ijava-kernel && curl -LO https://github.com/SpencerPark/IJava/releases/download/v1.3.0/ijava-1.3.0.zip && \
unzip ijava-1.3.0.zip && \
python install.py --sys-prefix && \
rm -rf ijava-kernel/
RUN jupyter kernelspec list
ENV SHELL=/bin/bash
ENV LANG en_US.UTF-8
ENV LC_ALL en_US.UTF-8
ENV JAVA_HOME=/usr/lib/jvm/java-11-openjdk-arm64/
WORKDIR /home/$NB_USER
USER $NB_UID
As soon as I run the docker image, inside the container:
some-user#023f579253ec:~$ jupyter kernelspec list ─╯
Available kernels:
python3 /usr/local/share/jupyter/kernels/python3
java /usr/share/jupyter/kernels/java
some-user#023f579253ec:~$
As well as, the console with kernel java is also installed and working as per README.md
jupyter console --kernel java
In [2]: String helloWorld = "Hello world!"
In [3]: helloWorld
Out[3]: Hello world!
But as soon as I run open the jupyter notebook inside the container, I only see Python3 kernel not the Java. see attached image.
can anyone help me out to add the Java Kernel to Notebook GUI?
This is an open issue on IJava's GitHub. The discussion thread mentions a few Docker Images that address your issue: 1, 2, 3.

How to make a Dockerfile?

I need a Dockerfile to run my Python script. The script uses Selenium, so I need to load a driver for it to work. An ordinary .exe file - driver is not suitable, so according to the advice of the administrators of the hosting where the script is located I need to create a Dockerfile for the script to work properly.
The main problem is that I simply can not run my script, because I do not understand how to load the required driver on the server.
This is a sample code of what should be in the Dockerfile.
FROM python:3
RUN apt-get update -y
RUN apt-get install -y wget
RUN wget -O $HOME/geckodriver.tar.gz https://github.com/mozilla/geckodriver/releases/download/v0.23.0/geckodriver-v0.23.0-linux64.tar.gz
RUN tar xf $HOME/geckodriver.tar.gz -C $HOME
RUN cp $HOME/geckodriver /usr/local/bin/geckodriver
RUN chmod +x /usr/local/bin/geckodriver
RUN rm -f $HOME/geckodriver $HOME/geckodriver.tar.gz
This is the code used in the Python script
options = Options()
options.add_argument('headless')
options.add_argument('window-size=1920x935')
driver = webdriver.Chrome(options=options, executable_path=r"chromedriver.exe")
driver.get(f"https://www.wildberries.ru/catalog/{id}/feedbacks?imtId={imt_id}")
time.sleep(5)
big_stat = driver.find_element(by=By.CLASS_NAME, value="rating-product__numb")
I can redo this snippet of code to make it work on Firefox, if necessary.
This is what the directories of the hosting where all the files are located look like
The directories of the hosting
For getting Selenium to work with Python using a Dockerfile, here's an existing SeleniumBase Dockerfile.
For instructions on using it, see the README.
For building, it's basically this:
Non Apple M1 Mac:
docker build -t seleniumbase .
If running on an Apple M1 Mac, use this instead:
docker build --platform linux/amd64 seleniumbase .
Before building the Dockerfile, you'll need to clone SeleniumBase.
Here's what the Dockerfile currently looks like:
FROM ubuntu:18.04
#=======================================
# Install Python and Basic Python Tools
#=======================================
RUN apt-get -o Acquire::Check-Valid-Until=false -o Acquire::Check-Date=false update
RUN apt-get install -y python3 python3-pip python3-setuptools python3-dev python-distribute
RUN alias python=python3
RUN echo "alias python=python3" >> ~/.bashrc
#=================================
# Install Bash Command Line Tools
#=================================
RUN apt-get -qy --no-install-recommends install \
sudo \
unzip \
wget \
curl \
libxi6 \
libgconf-2-4 \
vim \
xvfb \
&& rm -rf /var/lib/apt/lists/*
#================
# Install Chrome
#================
RUN curl -sS -o - https://dl-ssl.google.com/linux/linux_signing_key.pub | apt-key add - && \
echo "deb http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google-chrome.list && \
apt-get -yqq update && \
apt-get -yqq install google-chrome-stable && \
rm -rf /var/lib/apt/lists/*
#=================
# Install Firefox
#=================
RUN apt-get -qy --no-install-recommends install \
$(apt-cache depends firefox | grep Depends | sed "s/.*ends:\ //" | tr '\n' ' ') \
&& rm -rf /var/lib/apt/lists/* \
&& cd /tmp \
&& wget --no-check-certificate -O firefox-esr.tar.bz2 \
'https://download.mozilla.org/?product=firefox-esr-latest&os=linux64&lang=en-US' \
&& tar -xjf firefox-esr.tar.bz2 -C /opt/ \
&& ln -s /opt/firefox/firefox /usr/bin/firefox \
&& rm -f /tmp/firefox-esr.tar.bz2
#===========================
# Configure Virtual Display
#===========================
RUN set -e
RUN echo "Starting X virtual framebuffer (Xvfb) in background..."
RUN Xvfb -ac :99 -screen 0 1280x1024x16 > /dev/null 2>&1 &
RUN export DISPLAY=:99
RUN exec "$#"
#=======================
# Update Python Version
#=======================
RUN apt-get update -y
RUN apt-get -qy --no-install-recommends install python3.8
RUN rm /usr/bin/python3
RUN ln -s python3.8 /usr/bin/python3
#=============================================
# Allow Special Characters in Python Programs
#=============================================
RUN export PYTHONIOENCODING=utf8
RUN echo "export PYTHONIOENCODING=utf8" >> ~/.bashrc
#=====================
# Set up SeleniumBase
#=====================
COPY sbase /SeleniumBase/sbase/
COPY seleniumbase /SeleniumBase/seleniumbase/
COPY examples /SeleniumBase/examples/
COPY integrations /SeleniumBase/integrations/
COPY requirements.txt /SeleniumBase/requirements.txt
COPY setup.py /SeleniumBase/setup.py
RUN find . -name '*.pyc' -delete
RUN find . -name __pycache__ -delete
RUN pip3 install --upgrade pip
RUN pip3 install --upgrade setuptools
RUN pip3 install --upgrade setuptools-scm
RUN cd /SeleniumBase && ls && pip3 install -r requirements.txt --upgrade
RUN cd /SeleniumBase && pip3 install .
#=====================
# Download WebDrivers
#=====================
RUN wget https://github.com/mozilla/geckodriver/releases/download/v0.31.0/geckodriver-v0.31.0-linux64.tar.gz
RUN tar -xvzf geckodriver-v0.31.0-linux64.tar.gz
RUN chmod +x geckodriver
RUN mv geckodriver /usr/local/bin/
RUN wget https://chromedriver.storage.googleapis.com/2.44/chromedriver_linux64.zip
RUN unzip chromedriver_linux64.zip
RUN chmod +x chromedriver
RUN mv chromedriver /usr/local/bin/
#==========================================
# Create entrypoint and grab example tests
#==========================================
COPY integrations/docker/docker-entrypoint.sh /
COPY integrations/docker/run_docker_test_in_firefox.sh /
COPY integrations/docker/run_docker_test_in_chrome.sh /
RUN chmod +x *.sh
COPY integrations/docker/docker_config.cfg /SeleniumBase/examples/
ENTRYPOINT ["/docker-entrypoint.sh"]
CMD ["/bin/bash"]

cronjob inside docker - can't capture logs

I'm trying to run a cronjob inside a docker container, and the logs (created with python logging) from docker logs my_container or from /var/log/cron.log. Neither is working. I tried a bunch of solutions I found in stackoverflow.
This is my Dockerfile:
FROM nvidia/cuda:10.0-cudnn7-devel-ubuntu18.04
ENV DEBIAN_FRONTEND=noninteractive
ENV TZ=Europe/Minsk
RUN ln -snf /usr/share/zoneinfo/$TZ /etc/localtime && echo $TZ > /etc/timezone
RUN apt-get update && apt-get install -y \
python3-dev \
python3-tk \
python3-pip \
libglib2.0-0\
libsm6 \
postgresql-server-dev-all \
postgresql-common \
openssh-client \
libxext6 \
nano \
pkg-config \
rsync \
cron \
&& \
apt-get clean && \
apt-get autoremove && \
rm -rf /var/lib/apt/lists/*
RUN pip3 install --upgrade setuptools
RUN pip3 install numpy
ADD requirements.txt /requirements.txt
RUN pip3 install -r /requirements.txt && rm /requirements.txt
RUN touch /var/log/cron.log
COPY crontab /etc/cron.d/cjob
RUN chmod 0644 /etc/cron.d/cjob
ENV LC_ALL=C.UTF-8
ENV LANG=C.UTF-8
ENV PYTHONUNBUFFERED 1
ADD . /code
WORKDIR /code
COPY ssh_config /etc/ssh/ssh_config
CMD cron -f
and this is how I run it:
nvidia-docker run -d \
-e DISPLAY=unix$DISPLAY \
-v /tmp/.X11-unix:/tmp/.X11-unix \
-v /media/storage:/opt/images/ \
-v /home/user/.aws/:/root/.aws/ \
--net host \
my_container
I tried different things such as:
Docker ubuntu cron tail logs not visible
See cron output via docker logs, without using an extra file
But I don't get any logs.
Change your chmod code to 755 if you're trying to execute something from there. You might also want to add an -R parameter while at that.
Next, add the following to your Dockerfile before chmod layer.
# Symlink the cron to stdout
RUN ln -sf /dev/stdout /var/log/cron.log
And add this as your final layer
# Run the command on container startup
CMD cron && tail -F /var/log/cron.log 2>&1
Referenced this from the first link that you mentioned. This should work.

Categories

Resources