Using Jenkins and Docker to run Python scripts

I would like to run Python scripts in various stages of Jenkins (pipeline) jobs, across a wide range of agents. I want the same Python environment for all of these, so I'm considering using Docker for this purpose.
The idea is to build an image that contains the Python environment (with installed packages, etc.) and then runs an external Python script passed in as an argument:
docker run my_image my_python_file.py
My question now is: how should the infrastructure look? I see that the Python docker distribution is 688MB, and transferring this image to all steps would surely be an overhead? However, the agents are all on the same network, so perhaps it wouldn't be a big issue.
Update: my Dockerfile looks like this:
FROM python:3.6-slim-jessie
COPY requirements.txt ./
RUN pip install --no-cache-dir -r requirements.txt
CMD ["python3"]
Then I build the image using
> docker build ./ -t my-app
which successfully builds the image and installs my requirements. Then I want to start a container from the image as a daemon using
> docker run -dit my-app
Then I execute the process using
> docker exec -d {DAEMON_ID} my-script.py

Run your Docker container as a daemon process, and every time you need to run your Python script, call docker exec.
docker exec -d <your-container> <your-python-file.py>
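A minimal sketch of that workflow, assuming the script is available inside the container (e.g. copied into the image or bind-mounted) and using an illustrative container name; prefixing the script with python avoids relying on an executable bit and shebang line:
docker run -dit --name my-app-daemon my-app
docker exec my-app-daemon python my_python_file.py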

Using Docker agents for builds is an effective way to get distributed and reproducible builds.
I see that the Python docker distribution is 688MB, and transferring this image to all steps would surely be an overhead?
You should consider using smaller Docker images first. There are alpine and slim Docker images for Python; the alpine Python image, for example, is 89.2MB.
Also, most of the image layers will be cached by Docker, so you will only be pulling a few layers with significantly smaller sizes.
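On the Jenkins side, a minimal declarative pipeline sketch that runs the script inside such an image as a Docker agent (the image name my-app and the script name are taken from the question; the image is assumed to be reachable by the agent, e.g. via a shared registry):
pipeline {
    agent {
        docker { image 'my-app' }
    }
    stages {
        stage('Run Python script') {
            steps {
                sh 'python my_python_file.py'
            }
        }
    }
}
Because each stage runs inside the container, every agent gets exactly the same Python environment.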

Related

Dockerizing an API built with python on local machine

I have cloned a repository of an API built with python on my local machine and my goal is to be able to send requests and receive responses locally.
I'm not familiar with Python, but the code is very readable and easy to understand. However, the repository contains some dependencies and configuration files to Dockerise it (and I'm not familiar with Docker and containers either).
So what are the steps to follow in order to be able to interact with the API locally?
Here are some of the config and requirements files in the repository:
requirements.txt file :
fastapi==0.70.0
pytest==7.0.1
requests==2.27.1
uvicorn==0.15.0
Dockerfile file :
FROM tiangolo/uvicorn-gunicorn:python3.9
COPY ./requirements.txt /requirements.txt
RUN pip install -r /requirements.txt
COPY ./app /app
I already installed Python 3 and Docker, so what's next?
Adjust Dockerfile
Assuming all code is in the /app directory, you have already copied over all your code and installed all the dependencies the application requires.
But you are missing at least one essential line in the Dockerfile (see the disclaimer below). It is arguably the most important line, as it is the CMD instruction that tells Docker which command/process should be executed when the container starts.
I am not familiar with the particular base image you are using (defined via the FROM instruction), but after googling I found this repo, which suggests the following lines. They make a lot of sense to me, as they start a web server:
# expose port 80 on the container to make it accessible from the outside
EXPOSE 80
# line as described in repo to start the web server
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "80"]
This should start the web server on port 80 using the application stored in a variable app in your main.py when the container starts.
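For reference, here is a minimal sketch of what that CMD expects to find: an app/main.py containing a FastAPI instance named app (the routes of the actual cloned API will of course be different):
from fastapi import FastAPI

app = FastAPI()

@app.get("/")
def read_root():
    # illustrative endpoint only; the real routes live in the cloned repository
    return {"status": "ok"}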
Build and run container
Once you have added that, you need to build your image using the docker build command.
docker build -t asmoun/my-container .
This builds a container image asmoun/my-container using the Dockerfile in the current directory, hence the trailing .. So make sure you execute it from the directory containing the Dockerfile. This will take some time, as the base image has to be downloaded and the dependencies need to be installed.
You now have an image that you can run using the docker run command:
docker run --name my-fastapi-container -d -p 80:80 asmoun/my-container
This starts a container called my-fastapi-container from the image asmoun/my-container in detached mode (the -d option makes sure your TTY is not attached to the container). It also defines a port mapping, which maps port 80 on the host to port 80 in the container, the port we previously exposed in the Dockerfile (EXPOSE 80).
You should now see some ID getting printed to your console. This means the container has started. You can check its state using docker ps -a and you should see it marked as running. If it is, you should be able to connect to localhost:80 now. If it is not, use docker logs my-fastapi-container to view the container's logs and you'll hopefully learn more.
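Since requirements.txt lists FastAPI, one quick check is the docs page that FastAPI serves by default (assuming the application does not disable it):
curl http://localhost:80/docs
Opening that URL in a browser shows the interactive Swagger UI listing the available endpoints.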
Disclaimer
Please be aware that this is only a minimal guide on how to get a simple FastAPI container up and running. Some parameters could well be different depending on the application (e.g. main.py could be named server.py or similar), in which case you will need to adjust them, but the overall process (1. adjust the Dockerfile, 2. build the image, 3. run the container) should stay the same. It's also possible that your application expects some other things to be present in the container, which would need to be defined in the Dockerfile, but neither I, nor (presumably) you, know this, as the Dockerfile provided seems to be incomplete. This is just a best-effort answer.
I have tried to link all relevant resources and commands so you can have a look at what some of them do and which options/parameters might be of interest to you.

Python Script Execution with AWS Batch

I created a simple docker container to run a python script from within the container:
FROM python:3
WORKDIR /usr/src/app
COPY . .
CMD ["test.py"]
ENTRYPOINT ["python3"]
I build it with docker build -t hello-demo . and then run it with docker run -it hello-demo test.py, and I get my output.
What I want to do is to be able to rerun this not on my laptop but using AWS Batch. However, I am not sure how to identify the container name Batch creates. When I manually build the container I specify what I am calling it, but I am not sure how to reference the correct container when running my docker run command.
Any thoughts? Or am I going about this wrong?
Thanks!
D
First, distinguish the Docker image from the Docker container. You build the Docker image and the container is a running instance of the image (refer to the Docker overview for more details). When you run the Docker image locally you specify what Docker image to run and also (optionally) the name of the container.
When using AWS Batch, you create a job definition which defines what Docker image it will use. Thus, the Docker image already has to exist in some registry, be it Docker Hub, ECR or any other registry. You can also define other container properties like a command that will be run.
To actually run the Batch job, you submit it to the queue using the job definition (which specifies the Docker image as stated above).
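A sketch of that flow with the AWS CLI, using illustrative names for the ECR registry, job queue and job definition (the job definition is assumed to already point at the pushed image):
# authenticate Docker to ECR, then push the image (registry URL is illustrative)
aws ecr get-login-password | docker login --username AWS --password-stdin 123456789012.dkr.ecr.us-east-1.amazonaws.com
docker tag hello-demo 123456789012.dkr.ecr.us-east-1.amazonaws.com/hello-demo:latest
docker push 123456789012.dkr.ecr.us-east-1.amazonaws.com/hello-demo:latest
# submit a job using a job definition that references that image
aws batch submit-job --job-name hello-demo-run --job-queue my-queue --job-definition hello-demo-jobdef
Batch then starts the container for you, so you never name the container yourself; you only reference the image via the job definition.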

How to efficiently input files with docker

I am starting to get the hang of Docker and am trying to containerize some of the applications I use. Thanks to the tutorial I was able to create Docker images and containers, but now I am trying to think about the most efficient and practical ways to do things.
To present my use case: I have a Python script (let's call it process.py) that takes a single .jpg image as input, does some operations on this image, and then outputs the processed .jpg image.
Normally I would run it with:
python process.py -i path_of_the_input_image -o path_of_the_output_image
Then, the way I connect input/output with my Docker container is the following. First I create the Dockerfile:
FROM python:3.6.8
COPY . /app
WORKDIR /app
RUN pip install --upgrade pip
RUN pip install -r requirements.txt
CMD python ./process.py -i ./input_output/input.jpg -o ./input_output/output.jpg
And then, after building the image, I run docker run, mapping a local folder to the input_output folder of the container:
docker run -v C:/local_folder/:/app/input_output my_docker_image
This seems to work, but it is not really practical, as I have to create a specific local folder to mount into the Docker container. So here are the questions I am asking myself:
Is there a more practical way of doing things? To directly send a single input file to, and directly receive a single output file from, a Docker container?
When I run the Docker image, what happens (if I understand correctly) is that it creates a Docker container that runs process.py once and then just sits there doing nothing. Even after process.py finishes, the container is still listed by the command "docker ps -a". Is this behaviour expected? Is there a way to automatically delete finished containers? Am I using docker run the right way?
Is there a more practical way of having a container running continuously, which I can query to run process.py on demand with a given input?
I have a Python script (let's call it process.py) that takes a single .jpg image as input, does some operations on this image, and then outputs the processed .jpg image.
That's most efficiently done without Docker; just run the python command you already have. If your application has interesting Python library dependencies, you can install them in a virtual environment to avoid conflicts with the system Python installation.
When I run the Docker image...
...the container runs its main command (the docker run command arguments or the Dockerfile CMD, possibly combined with an entrypoint from the same sources), and when that command exits, the container exits. It will be listed in docker ps -a output, but as "Exited" (probably with status 0 for a successful completion). You can docker run --rm to have the container automatically delete itself.
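For example, the docker run command from the question only needs the extra flag:
docker run --rm -v C:/local_folder/:/app/input_output my_docker_image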
Is there a more practical way of having a container running continuously, which I can query to run process.py on demand with a given input?
Wrap it in a network service, like a Flask application. As long as this is running, you can use a tool like curl to do an HTTP POST with the input JPEG file as the body, and get the output JPEG file as the response. Avoid using local files and Docker together whenever that's an option (prefer network I/O for process inputs and outputs; prefer a database to local-file storage).
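A minimal sketch of such a service, assuming a hypothetical process_image(jpeg_bytes) helper that returns the processed JPEG bytes, refactored out of process.py:
# server.py - hypothetical Flask wrapper around process.py's logic
import io

from flask import Flask, request, send_file

from process import process_image  # hypothetical helper refactored out of process.py

app = Flask(__name__)

@app.route("/process", methods=["POST"])
def process():
    output_bytes = process_image(request.get_data())  # raw JPEG body in, processed JPEG bytes out
    return send_file(io.BytesIO(output_bytes), mimetype="image/jpeg")

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8000)
With the container started via docker run -p 8000:8000, you could then call it with something like curl -X POST --data-binary @input.jpg http://localhost:8000/process -o output.jpg.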
Why are volume mounts not practical?
I would argue that Dockerising your application is itself not that practical, but you've chosen to do so for presumably very good reasons. Volume mounts are simply an extension of this. If you want to get data in/out of your container, the 'normal' way to do this is by using volume mounts, as you have done. Sure, you could use docker cp to copy the files manually, but that's not really practical either.
As far as the process exiting goes: normally, once the main process exits, the container exits. docker ps -a shows stopped containers as well as running ones. You should see that it says Exited n minutes (hours, days, etc.) ago. This means that your container has run and exited correctly. You can remove it with docker rm <containerid>.
docker ps (no -a) will only show the running ones, btw.
If you use the --rm flag in your Docker run command, it will be removed when it exits, so you won't see it in the ps -a output. Stopped containers can be started again, but that's rather unusual.
Another solution might be to change your script to wait for incoming files and process them as they are received. Then you can leave the container running, and it will just process them as needed. If doing this, make sure that your idle loop has a sleep or something in it to ensure that you don't consume too many resources.
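A minimal sketch of that idea, assuming a hypothetical process_image(in_path, out_path) helper refactored out of process.py and directory names chosen for illustration:
# watcher.py - hypothetical polling loop that processes files dropped into a folder
import os
import shutil
import time

from process import process_image  # hypothetical helper refactored out of process.py

INPUT_DIR = "/app/input_output/incoming"    # bind-mounted from the host
OUTPUT_DIR = "/app/input_output/processed"
DONE_DIR = "/app/input_output/done"

while True:
    for name in os.listdir(INPUT_DIR):
        if name.lower().endswith(".jpg"):
            process_image(os.path.join(INPUT_DIR, name), os.path.join(OUTPUT_DIR, name))
            shutil.move(os.path.join(INPUT_DIR, name), os.path.join(DONE_DIR, name))
    time.sleep(5)  # idle sleep so the loop doesn't hog CPU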

Is there a way to modify files inside docker via PyCharm?

I want to modify files inside docker container with PyCharm. Is there possibility of doing such thing?
What you want is called a bind mount, and you can get it by adding the -v parameter to your run command. Here's an example with an nginx image:
docker run --name=nginx -d -v ~/nginxlogs:/var/log/nginx -p 5000:80 nginx
The specific parameter that achieves this is -v.
-v ~/nginxlogs:/var/log/nginx sets up a bindmount volume that links the /var/log/nginx directory from inside the Nginx container to the ~/nginxlogs directory on the host machine.
Docker uses a : to split the host’s path from the container path, and the host path always comes first.
In other words, the files that you edit on your local filesystem will be reflected inside the container immediately.
Source
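Applied to editing Python code with PyCharm, a sketch (the paths and image name are illustrative):
docker run -v ~/my_project:/app my_python_image
You then open ~/my_project in PyCharm on the host, and every file you edit is immediately the same file the container sees under /app.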
Yes. There are multiple ways to do this, and you will need to have PyCharm installed inside the container.
The following set of instructions should work:
docker ps - This will show you details of running containers
docker exec -it *<name of container>* /bin/bash
At this point you will have a shell inside the container. If PyCharm is not installed, you will need to install it. The following should work:
sudo apt-get install pycharm-community
Good to go!
Note: the installation does not persist across Docker image builds. You should add the PyCharm installation step to the Dockerfile if you need to access it regularly.
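A sketch of that Dockerfile step, assuming a Debian/Ubuntu-based image whose apt sources actually provide the pycharm-community package used above:
RUN apt-get update && apt-get install -y pycharm-community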

How to persist my notebooks and data in my Docker image/container

I am new to Docker and I am somewhat confused about containers and images. I want to use Docker for TensorFlow development. All I need is an easy way to write Jupyter Notebooks and use GPU-powered TensorFlow.
I already have the latest TensorFlow Jupyter Python 3 image. I run the image with
docker run --rm --runtime=nvidia -v -it -p 8888:8888 tensorflow/tensorflow:latest-gpu-py3-jupyter
How can I make it so that my data, i.e. the Jupyter Notebooks I add and edit while working in that image, won't get lost after I exit the process? I know that Docker images aren't meant to persist state, but I am so new to this that I just want something to work in with persistent data. Can someone guide me through this or point me to a resource which will answer all my prayers?
I would also like to move some things into the container that is going to be run, so that I can access some custom Python libs that contain things my notebooks need to import!
Side questions:
--rm removes the container or whatever; by default I run it without this flag, and still my data was lost
-v is for volumes? I tried -v Bachelor:/app to mount a volume like so. It apparently doesn't make any difference; I don't know how to use the volume Bachelor that I created. Instead, a multitude of unnamed volumes are created that are not usable whenever I run this
-it also does something, but I have no idea what
-p is the port number, right?
Use Docker volumes:
Volumes are the preferred mechanism for persisting data generated by and used by Docker containers
Example:
docker run --runtime=nvidia -v ${SOURCE_FOLDER}:${DEST_FOLDER} -p 8888:8888 tensorflow/tensorflow:latest-gpu-py3-jupyter
Change SOURCE_FOLDER and DEST_FOLDER accordingly (use absolute paths!).
Now if you navigate to localhost:8888 and create a notebook in DEST_FOLDER, it should also be available in SOURCE_FOLDER.
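A concrete sketch, assuming a host folder ~/Bachelor and mounting it at /tf/notebooks, the directory the TensorFlow Jupyter images typically serve notebooks from:
docker run --runtime=nvidia -it -p 8888:8888 -v ~/Bachelor:/tf/notebooks tensorflow/tensorflow:latest-gpu-py3-jupyter
Notebooks created in the Jupyter UI then end up in ~/Bachelor on the host and survive the container being removed.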
As for your side questions:
-it runs a container in interactive mode with a TTY attached. You generally add /bin/bash after the run command, so you can start an interactive bash session inside the container.
--rm cleans up the container after it exits.
Those options aren't really necessary for your use case. Just remember to use docker ps and docker rm <ID> to clean up your container after you're done.
