How to run python files kept in separate folders from a Dockerfile? - python

I have a Dockerfile which looks like this:
FROM python:3.7-slim-stretch
ENV PIP pip
RUN \
$PIP install --upgrade pip && \
$PIP install scikit-learn && \
$PIP install scikit-image && \
$PIP install rasterio && \
$PIP install geopandas && \
$PIP install matplotlib
COPY sentools sentools
COPY data data
COPY vegetation.py .
Now, my project has two Python files, vegetation and forest, each kept in a separate folder. How can I create separate Docker images for both Python files and run their containers separately?

If the base code is the same and only the container is supposed to start with a different Python script, then I suggest using a single Docker image so you don't have to manage two of them.
Set vegetation.py as the default: when the container starts without the ENV being passed it will run vegetation.py, and if FILE_TO_RUN is overridden at run time, the specified file will run instead.
FROM python:3.7-alpine3.9
ENV FILE_TO_RUN="/vegetation.py"
COPY vegetation.py /vegetation.py
CMD ["sh", "-c", "python $FILE_TO_RUN"]
Now, if you want to run forest.py, you can just pass its path through the ENV.
docker run -it -e FILE_TO_RUN="/forest.py" --rm my_image
or
docker run -it -e FILE_TO_RUN="/anyfile_to_run.py" --rm my_image
Updated:
You can also manage this with ARG + ENV in your Docker image.
FROM python:3.7-alpine3.9
ARG APP="default_script.py"
ENV APP=$APP
COPY $APP /$APP
CMD ["sh", "-c", "python /$APP"]
Now build with the build argument:
docker build --build-arg APP="vegetation.py" -t app_vegetation .
or
docker build --build-arg APP="forest.py" -t app_forest .
Now it's ready to run:
docker run --rm -it app_forest
Or copy both scripts into one image:
FROM python:3.7-alpine3.9
# assign some default script name to args
ARG APP="default_script.py"
ENV APP=$APP
COPY vegetation.py /vegetation.py
COPY forest.py /forest.py
CMD ["sh", "-c", "python /$APP"]

If you insist on creating separate images, you can always use the ARG instruction.
FROM python:3.7-slim-stretch
ARG file_to_copy
ENV PIP pip
RUN \
$PIP install --upgrade pip && \
$PIP install scikit-learn && \
$PIP install scikit-image && \
$PIP install rasterio && \
$PIP install geopandas && \
$PIP install matplotlib
COPY sentools sentools
COPY data data
COPY $file_to_copy .
And then build the image like this:
docker build --build-arg file_to_copy=vegetation.py .
or like this:
docker build --build-arg file_to_copy=forest.py .

When you start a Docker container, you can specify what command to run at the end of the docker run command. So you can build a single image that contains both scripts and pick which one runs when you start the container.
The scripts should be "normally" executable: they need to have the executable permission bit set, and they need to start with a line like
#!/usr/bin/env python3
and you should be able to locally (outside of Docker) run
. some_virtual_environment/bin/activate
./vegetation.py
Once you've gotten through this, you can copy the content into a Docker image
FROM python:3.7-slim-stretch
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY sentools sentools
COPY data data
COPY vegetation.py forest.py ./
CMD ["./vegetation.py"]
Then you can build and run this image with either script.
docker build -t trees .
docker run --rm trees ./vegetation.py
docker run --rm trees ./forest.py
If you actually want this to be two separate images, you can create two separate Dockerfiles that differ only in their final COPY and CMD lines, and use the docker build -f option to pick which one to use.
$ tail -2 Dockerfile.vegetation
COPY vegetation.py ./
CMD ["./vegetation.py"]
$ docker build -t vegetation -f Dockerfile.vegetation .
$ docker run --rm vegetation
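The forest variant would then be the mirror image of this (a sketch; Dockerfile.forest is assumed to exist alongside Dockerfile.vegetation):
$ tail -2 Dockerfile.forest
COPY forest.py ./
CMD ["./forest.py"]
$ docker build -t forest -f Dockerfile.forest .
$ docker run --rm forest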

Related

/bin/sh python not found error when running docker base image

I am trying to run a docker base image but am encountering the error /bin/sh: 1: python: not found. I am first building a parent image and then modifying it using the bash script below
#!/usr/bin/env bash
docker build -t <image_name>:latest .
docker run <image_name>:latest
docker push <image_name>:latest
and the Dockerfile
FROM ubuntu:18.04
# Installing Python
RUN apt-get update \
&& apt-get install -y python3-pip python3-dev \
&& cd /usr/local/bin \
&& ln -s /usr/bin/python3 python \
&& pip3 install Pillow boto3
WORKDIR /app
After that, I run the following script to create and run the base image:
#!/usr/bin/env bash
docker build -t <base_image_name>:latest .
docker run -it <base_image_name>:latest
with the following Dockerfile:
FROM <image_name>:latest
COPY app.py /app
# Run app.py when the container launches
CMD python /app/app.py
I have also tried installing python through the Dockerfile of the base image, but I still get the same error.
IMHO a better solution would be to use one of the official python images.
FROM python:3.9-slim
RUN pip install --no-cache-dir Pillow boto3
WORKDIR /app
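With that base image (tagged, say, my-base-image, a hypothetical name), the second Dockerfile works essentially unchanged, since python is already on the PATH in the official images:
FROM my-base-image:latest
COPY app.py /app/
CMD ["python", "/app/app.py"]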
To fix the issue of python not being found: instead of
cd /usr/local/bin \
&& ln -s /usr/bin/python3 python
the OP should symlink to /usr/bin/python, not to /usr/local/bin/python as in the original post. Another way to do this is with an absolute symlink, as below:
ln -s /usr/bin/python3 /usr/bin/python
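Applied to the original Dockerfile, the fix might look like this (a sketch based on the answer above, not a drop-in replacement for the OP's full file):
FROM ubuntu:18.04
# Installing Python and creating an absolute symlink so "python" resolves
RUN apt-get update \
 && apt-get install -y python3-pip python3-dev \
 && ln -s /usr/bin/python3 /usr/bin/python \
 && pip3 install Pillow boto3
WORKDIR /app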

A more elegant docker run command

I have built a docker image using a Dockerfile that does the following:
FROM my-base-python-image
WORKDIR /opt/data/projects/project/
RUN mkdir files
RUN rm -rf /etc/yum.repos.d/*.repo
COPY rss-centos-7-config.repo /etc/yum.repos.d/
COPY files/ files/
RUN python -m venv /opt/venv && . /opt/venv/activate
RUN yum install -y unzip
WORKDIR files/
RUN unzip file.zip && rm -rf file.zip && . /opt/venv/bin/activate && python -m pip install *
WORKDIR /opt/data/projects/project/
That builds an image that allows me to run a custom command. In a terminal, for instance, here is the command I run after activating my project venv:
python -m pathA.ModuleB -a inputfile_a.json -b inputfile_b.json -c
Arguments a & b are custom tags to identify input files. -c calls a block of code.
So to run the built image successfully, I run the container and map local files to input files:
docker run --rm -it -v /local/inputfile_a.json:/opt/data/projects/project/inputfile_a.json -v /local/inputfile_b.json:/opt/data/projects/project/inputfile_b.json image-name:latest bash -c 'source /opt/venv/bin/activate && python -m pathA.ModuleB -a inputfile_a.json -b inputfile_b.json -c'
Besides shortening file paths, is there anything I can do to shorten the docker run command? I'm thinking that adding a CMD and/or ENTRYPOINT to the Dockerfile would help, but I cannot figure out how to do it, as I get errors.
There are a couple of things you can do to improve this.
The simplest is to run the application outside of Docker. You mention that you have a working Python virtual environment. A design goal of Docker is that programs in containers can't generally access files on the host, so if your application is all about reading and writing host files, Docker may not be a good fit.
Your file paths inside the container are fairly long, and this is bloating your -v mount options. You don't need an /opt/data/projects/project prefix; it's very typical just to use short paths like /app or /data.
You're also installing your application into a Python virtual environment, but inside a Docker image, which provides its own isolation. As you're seeing in your docker run command and elsewhere, the mechanics of activating a virtual environment in Docker are a little hairy. It's also not necessary; just skip the virtual environment setup altogether. (You can also directly run /opt/venv/bin/python and it knows it "belongs to" a virtual environment, without explicitly activating it.)
Finally, in your setup.py file, you can use a setuptools entry_points declaration to provide a script that runs your named module.
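For example, a minimal setup.py might declare the console script like this (a sketch; the main-script name and the pathA.ModuleB module match the names used below, and a main() function is assumed to exist in that module):
from setuptools import setup, find_packages

setup(
    name="project",
    version="0.1",
    packages=find_packages(),
    entry_points={
        "console_scripts": [
            # installs an executable named "main-script" that calls pathA.ModuleB.main()
            "main-script = pathA.ModuleB:main",
        ],
    },
)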
That can reduce your Dockerfile to more or less
FROM my-base-python-image
# OS-level setup
RUN rm -rf /etc/yum.repos.d/*.repo
COPY rss-centos-7-config.repo /etc/yum.repos.d/
RUN yum install -y unzip
# Copy the application in
WORKDIR /app/files
COPY files/ ./
RUN unzip file.zip \
&& rm file.zip \
&& pip install *
# Typical runtime metadata
WORKDIR /app
CMD main-script --help
And then when you run it, you can:
# map the entire local directory into the container as /data
docker run --rm -it \
  -v /local:/data \
  image-name:latest \
  main-script -a /data/inputfile_a.json -b /data/inputfile_b.json -c
You can also consider the docker run -w /data option to change the current directory, which would add a Docker-level argument but slightly shorten the script command line.
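With that option the run command could become (still a sketch, using the hypothetical main-script entry point from above):
docker run --rm -it -w /data \
  -v /local:/data \
  image-name:latest \
  main-script -a inputfile_a.json -b inputfile_b.json -c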

Problem with multi-stage Dockerfile (Python - venv)

I'm trying to create a Python webapp Docker image using a multi-stage build to shrink the image size... right now it's around 300 MB... it also uses a virtual environment.
The Docker image builds and runs fine up until the point where I add the multi-stage part, so I know something is going wrong after that... Could you help me identify what's wrong?
FROM python:3.8.3-alpine AS origin
RUN apk update && apk add git
RUN apk --no-cache add py3-pip build-base
RUN pip install -U pip
RUN pip install virtualenv
RUN virtualenv venv
RUN source venv/bin/activate
WORKDIR /opt/app
COPY . .
RUN pip install -r requirements.txt
## Works fine until this point
FROM alpine:latest
WORKDIR /opt/app
COPY --from=origin /opt/venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH" VIRTUAL_ENV="/opt/venv"
COPY . /opt/app/
CMD [ "file.py" ]
ENTRYPOINT ["python"]
Without the VENV it looks something like this (still throwing error "sh: python: not found"):
FROM python:3.8.3-alpine AS origin
WORKDIR /opt/app
RUN apk update && apk add git
RUN apk --no-cache add py3-pip build-base
RUN pip install -U pip
COPY . .
RUN pip install -r requirements.txt
FROM alpine:latest
WORKDIR /home
COPY --from=origin /opt/app .
CMD sh -c 'python file.py'
You still need Python in your runtime container; since you changed your last stage to plain alpine, it won't work. Just a tip: combine your CMD and ENTRYPOINT into one of them, as there is generally no need to have both. Prefer ENTRYPOINT only, since you can still pass a CMD easily at run time, for example to enable debug mode.
EDIT: Please stay away from alpine for Python apps, as you can run into some weird issues with it. You can use the "<python_version>-slim-buster" images instead; they are small enough.
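Putting those suggestions together, a corrected multi-stage sketch might look like this (assuming requirements.txt and file.py sit at the build context root; tags and paths are illustrative):
FROM python:3.8-slim-buster AS origin
WORKDIR /opt/app
COPY requirements.txt .
# build the virtual environment in the first stage
RUN python -m venv /opt/venv \
 && /opt/venv/bin/pip install -U pip \
 && /opt/venv/bin/pip install -r requirements.txt

FROM python:3.8-slim-buster
WORKDIR /opt/app
# copy only the ready-made virtual environment into the runtime stage
COPY --from=origin /opt/venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH" VIRTUAL_ENV="/opt/venv"
COPY . /opt/app/
ENTRYPOINT ["python", "file.py"]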

Building a docker image for several scripts

I have 2 Python scripts and one R script, but the main scripts to run are the Python ones (I call the R script from one of the Python scripts). I have to dockerize all these scripts. To do so I have made the Dockerfile below:
FROM python:3.7
WORKDIR /opt/app/
ADD ./ ./
RUN pip3.7 install -r ./requirements.txt
CMD python3.7 qc.py
CMD python3.7 cano.py
So, I have 2 questions:
1- shall I include the R script in the Dockerfile? (that is myscript.r)
2- Before running the docker image I need to build it. If I had only one script (qc.py) to run, I would use the following command to build the image:
sudo docker build -t qc .
but what would be the command to build the image for the Dockerfile with more than one script?
The docker image produced when calling docker build should stay separate from the execution of the scripts.
To execute something that's inside of an image, you can use docker run.
Using your example:
This is the directory with your Dockerfile in it:
$ tree .
├── Dockerfile
├── cano.py
├── myscript.r
├── qc.py
└── requirements.txt
0 directories, 5 files
We want to build a docker image that has all of the R and Python scripts in it, and all of the dependencies to execute those scripts, but we don't necessarily want to run them yet.
In your Dockerfile, you don't have the dependencies needed to run myscript.r because the base image (FROM python:3.7) doesn't have the required packages installed. I looked up what was required to run an R script in the r-base repo on docker hub and in the repo on github, and then added it to the Dockerfile.
FROM python:3.7
# Install the dependencies for R
RUN apt-get update && apt-get install -y r-base r-base-dev r-recommended
# Add all of the scripts to the /opt/app/ path inside of the image
ADD . /opt/app/
# Change the working directory inside of the image to /opt/app/
WORKDIR /opt/app/
# Install the python dependencies in /opt/app/requirements.txt using pip
RUN pip3.7 install -r ./requirements.txt
# This command just shows info about the contents of the image. It doesn't run any
# scripts, since that will be done _AFTER_ the image is built.
CMD pwd && ls -AlhF ./
Notice that the default CMD doesn't run any of the scripts. Instead we can do that using the docker run command from the terminal:
# The --rm removes the container after executing, and the -it makes the container interactive
$ docker run --rm -it qc python cano.py
Hello world! (from cano.py)
Now, putting it all together:
# Starting in the directory with your Dockerfile in it
$ ls .
Dockerfile cano.py myscript.r qc.py requirements.txt
# Build the docker image, and tag it as "qc"
$ docker build -t qc .
Sending build context to Docker daemon 6.656kB
Step 1/6 : FROM python:3.7
---> fbf9f709ca9f
Step 2/6 : RUN apt-get update && apt-get install -y r-base r-base-dev r-recommended
# ...lots of output...
Successfully tagged qc:latest
# Run the scripts
$ docker run --rm -it qc python cano.py
Hello world! (from cano.py)
$ docker run --rm -it qc python qc.py
Hello world! (from qc.py)
$ docker run --rm -it qc Rscript myscript.r
[1] "Hello world! (from myscript.r)"
I've collected all of the example code in this github gist to make it easier to see everything in one place.

Docker python output csv file

I have a Python script which should output a CSV file. I'm trying to get this file into the current working directory, but without success.
This is my Dockerfile
FROM python:3.6.4
RUN apt-get update && apt-get install -y libaio1 wget unzip
WORKDIR /opt/oracle
RUN wget https://download.oracle.com/otn_software/linux/instantclient/instantclient-basiclite-linuxx64.zip && \
    unzip instantclient-basiclite-linuxx64.zip && \
    rm -f instantclient-basiclite-linuxx64.zip && \
    cd /opt/oracle/instantclient* && \
    rm -f *jdbc* *occi* *mysql* *README *jar uidrvci genezi adrci && \
    echo /opt/oracle/instantclient > /etc/ld.so.conf.d/oracle-instantclient.conf && \
    ldconfig
RUN pip install --upgrade pip
COPY . /app
WORKDIR /app
RUN pip install --upgrade pip
RUN pip install pystan
RUN apt-get -y update && python3 -m pip install cx_Oracle --upgrade
RUN pip install -r requirements.txt
CMD [ "python", "Main.py" ]
And run the container with the following command
docker container run -v $pwd:/home/learn/rstudio_script/output image
It is bad practice to bind-mount a volume just to have one file from your container saved onto your host.
Instead, what you should leverage is the docker cp command:
docker cp <containerId>:/file/path/within/container /host/path/target
You can set this command to execute automatically with a bash script after your docker run.
So something like:
#!/bin/bash
# this stores the container id
CONTAINER_ID=$(docker run -dit img)
docker cp $CONTAINER_ID:/some_path host_path
If you are adamant about using a bind mount, then as others have pointed out, the issue is most likely that your Python script isn't writing the CSV to the correct path.
Your script Main.py is probably not trying to write to /home/learn/rstudio_script/output. The working directory in the container is /app because of the last WORKDIR directive in the Dockerfile. You can override that at runtime with --workdir but then the CMD would have to be changed as well.
One solution is to have your script write files to /output/ and then run it like this:
docker container run -v $PWD:/output/ image
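On the Python side, Main.py would then write into that directory, for example (a sketch; the file name and columns are purely illustrative):
import csv

# /output is the directory the host folder is bind-mounted onto at run time
with open("/output/result.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["id", "value"])  # illustrative header
    writer.writerow([1, 42])          # illustrative row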
