Is it possible to run a python script automatically upon starting a Docker container?
My command to start a container from my image is:
docker run -i -t --entrypoint /bin/bash myimage -s
Is there a way to add an additional command that runs a script upon launching it?
I would prefer not to use a Dockerfile as some of the python modules I use are from private repos and need to be downloaded manually, so a Dockerfile would not completely build the image I want.
As a matter of fact there is. Just don't use --entrypoint. Instead:
docker run -it myimage /bin/bash -c /run.sh
Obviously, this assumes that the image itself contains a simple Bash script at the location /run.sh.
#!/bin/bash
command1
command2
command3
...
If you don't want that, you can mount the current folder inside the running container and run a local script:
docker run -it -v $(pwd):/mnt myimage /bin/bash -c /mnt/run.sh
ENTRYPOINT vs. CMD seems to be a common cause of confusion.
Think about it this way:
ENTRYPOINT is a way to hard-code a fixed behaviour that is not meant to change from run to run (although docker run --entrypoint can still override it).
CMD is the default way to supply a command to be run.
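To make that concrete, here is a minimal illustrative Dockerfile sketch (the image contents and script name are made up for the example):
FROM python:3
COPY run.sh /run.sh
RUN chmod +x /run.sh
ENTRYPOINT ["/run.sh"]
CMD ["--default-arg"]
With this, docker run myimage --other-arg only replaces the CMD part, so the container runs /run.sh --other-arg; the ENTRYPOINT stays in place unless you explicitly pass --entrypoint.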
Docker containers can be set up to run as self-contained applications. If you're so inclined, you could create throwaway containers that accept command line arguments (a file for example), pull that in, work their magic and return you a processed file. Some people use this to set up build environments with different configurations and just run them on demand, not cluttering up their host machine.
However, your usage scenario feels tedious, since you are apparently doing the setup by hand. It would be easier to set the download credentials as environment variables, like this:
docker run -d -e "DEEP=purple" -e "LED=zeppelin" myimage /bin/bash -c /run.sh
You can then use those within the script as placeholders. This way, you get the best of both worlds. For added security, your run.sh should unset the environment variables once they have been used, like this:
#!/bin/bash
command1
command2
command3
...
unset DEEP
unset LED
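For example, a run.sh that uses those variables as placeholders might look like this (the index URL and package name are hypothetical; adapt them to your private repos):
#!/bin/bash
# use the injected credentials to fetch the private modules
pip install --index-url "https://${DEEP}:${LED}@pypi.example.com/simple" my-private-package
command1
command2
unset DEEP
unset LED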
Related
I'm using a Python script to send a websocket notification,
as suggested here.
The script is _wsdump.py and I have a script script.sh that is:
#!/bin/sh
set -o allexport
. /root/.env set
env
python3 /utils/_wsdump.py "wss://mywebsocketserver:3000/message" -t "message" &
If I try to dockerizing this script with this Dockerfile:
FROM python:3.8-slim-buster
RUN set -xe && \
    pip install --upgrade pip wheel && \
    pip3 install websocket-client
ENV TZ="Europe/Rome"
ADD utils/_wsdump.py /utils/_wsdump.py
ADD .env /root/.env
ADD script.sh /
ENTRYPOINT ["./script.sh"]
CMD []
I have a strange behaviour:
If I execute docker run -it --entrypoint=/bin/bash mycontainer and then run script.sh by hand, everything works fine and I receive the notification.
If I run the container with docker run mycontainer, I see no errors but the notification doesn't arrive.
What could be the cause?
Your script doesn't launch a long-running process; it tries to start something in the background and then completes. Since the script completes, and it's the container's ENTRYPOINT, the container exits as well.
The easy fix is to remove the & from the end of the last line of the script to cause the Python process to run in the foreground, and the container will stay alive until the process completes.
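With that change, script.sh would simply be:
#!/bin/sh
set -o allexport
. /root/.env set
env
python3 /utils/_wsdump.py "wss://mywebsocketserver:3000/message" -t "message"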
There's a more general pattern of an entrypoint wrapper script that I'd recommend adopting here. If you look at your script, it does two things: (1) set up the environment, then (2) run the actual main container command. I'd suggest using the Docker CMD for that actual command:
# end of Dockerfile
ENTRYPOINT ["./script.sh"]
CMD python3 /utils/_wsdump.py "wss://mywebsocketserver:3000/message" -t "message"
You can end the entrypoint script with the magic line exec "$@" to run the CMD as the actual main container process. (Technically, exec replaces the current shell script with the command built from the script's arguments; in a Docker context the CMD is passed as arguments to the ENTRYPOINT.)
#!/bin/sh
# script.sh
# set up the environment
. /root/.env set
# run the main container command
exec "$#"
With this setup you can debug the container by replacing just the command part, for example
docker run --rm your-image env
to print out its environment. The alternate command env will replace the Dockerfile CMD but the ENTRYPOINT will remain in place.
You install script.sh to the root dir /, but your ENTRYPOINT is defined to run the relative path ./script.sh.
Try changing ENTRYPOINT to reference the absolute path /script.sh instead.
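That is, in the Dockerfile:
ENTRYPOINT ["/script.sh"]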
I’m new to working on Linux. I apologize if this is a dumb question. Despite searching for more than a week, I was not able to derive a clear answer to my question.
I'm running a very long Python program on Nvidia GPUs. The output is several csv files. It takes a long time to compute the output, so I use nohup so I can leave the process running after I exit the shell.
Let’s say main.py file is this
import numpy as np
import pandas as pd

if __name__ == '__main__':
    a = np.arange(1, 1000)
    data = a * 2
    filename = 'results.csv'
    output = pd.DataFrame(data, columns=["Output"])
    output.to_csv(filename)
The calculations for data are more complicated, of course. I build a Docker container and run this program inside it. When I use python main.py for a smaller-sized example, there is no problem; it writes the csv files.
My question is this:
When I run nohup python main.py & and then check what's going on with tail -f nohup.out inside the Docker container, I can see what it is doing at that moment, but I cannot exit the tail and let the execution run its course; it just stops there. How can I safely exit the screen that tail -f nohup.out puts me in?
I tried not checking on the code and letting it run for two days, then returned. The output of tail -f nohup.out indicated that the execution had finished, but the csv files were nowhere to be seen. Is the output somehow bundled up inside nohup.out, or does this indicate something else is wrong?
If you're going to run this setup in a Docker container:
A Docker container runs only one process, as a foreground process; when that process exits, the container is done. That process is almost always the script or server you're trying to run, and not an interactive shell. But:
It's possible to use Docker constructs to run the container itself in the background, and collect its logs while it's running or after it completes.
A typical Dockerfile for a Python program like this might look like:
FROM python:3.10
# Create and use some directory; it can be anything, but do
# create _some_ directory.
WORKDIR /app
# Install Python dependencies as a separate step. Doing this first
# saves time if you repeat `docker build` without changing the
# requirements list.
COPY requirements.txt .
RUN pip install -r requirements.txt
# Copy in the rest of the application.
COPY . .
# Set the main container command to be the script.
CMD ["./main.py"]
The script should be executable (chmod +x main.py on your host) and begin with a "shebang" line (#!/usr/bin/env python3) so the system knows where to find the interpreter.
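For example, the top of main.py would look something like this (a sketch based on the script in the question):
#!/usr/bin/env python3
import numpy as np
import pandas as pd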
You will hear recommendations to use both CMD and ENTRYPOINT for the final line. It doesn't matter much to your immediate question. I prefer CMD for two reasons: it's easier to launch an alternate command to debug your container (docker run --rm your-image ls -l vs. docker run --rm --entrypoint ls your-image -l), and there's a very useful pattern of using ENTRYPOINT to do some initial setup (creating environment variables dynamically, running database migrations, ...) and then launching CMD.
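If you do adopt that ENTRYPOINT-plus-CMD pattern, a minimal hypothetical wrapper might look like:
#!/bin/sh
# one-time setup would go here (export variables, run migrations, ...)
# then hand off to the CMD as the main container process
exec "$@"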
Having built the image, you can use the docker run -d option to launch it in the background, and then run docker logs to see what comes out of it.
# Build the image.
docker build -t long-python-program .
# Run it, in the background.
docker run -d --name run1 long-python-program
# Review its logs.
docker logs run1
If you're running this to produce files that need to be read back from the host, you need to mount a host directory into your container at the time you start it. You need to make a couple of changes to do this successfully.
In your code, you need to write the results somewhere different than your application code. You can't mount a host directory over the /app directory since it will hide the code you're actually trying to run.
import os

# write the results into a directory named by $DATA_DIR (default "data")
data_dir = os.getenv('DATA_DIR', 'data')
filename = os.path.join(data_dir, 'results.csv')
Optionally, in your Dockerfile, create this directory and set a pointer to it. Since my sample code gets its location from an environment variable you can again use any path you want.
# Create the data directory.
RUN mkdir /data
ENV DATA_DIR=/data
When you launch the container, the docker run -v option mounts filesystems into the container. For this sort of output file you're looking for a bind mount that directly attaches a host directory to the container.
docker run -d --name run2 \
-v "$PWD/results:/data" \
long-python-program
In this example so far we haven't set the USER of the program, and it will run as root. You can change the Dockerfile to set up an alternate USER (which is good practice); you do not need to chown anything except the data directory to be owned by that user (leaving your code owned by root and not world-writeable is also good practice). If you do this, when you launch the container (on native Linux) you need to provide the host numeric user ID that can write to the host directory; you do not need to make other changes in the Dockerfile.
docker run -d --name run2 \
-u $(id -u) \
-v "$PWD/results:/data" \
long-python-program
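A hypothetical sketch of those Dockerfile additions (the user name is arbitrary):
# create a non-root user; only the data directory needs to be writable by it
RUN useradd -r appuser \
 && chown appuser /data
USER appuser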
1- A container runs as a foreground process. Use CMD or ENTRYPOINT in the Dockerfile.
2- Map a Docker volume to a directory on the Linux host.
A sample repo with the directory structure of what I'm working on is on GitHub here. To run the GitHub Action, you just need to go to the Actions tab of the repo and run the Action manually.
I have a custom GitHub Action I've written as well, with python as the base image of its Docker container, but I want the Python version to be an input for the GitHub Action. In order to do so, I am creating a second, intermediate Docker container to run with the Python version input argument.
The problem I'm running into is I don't have access to the original repo's files that is calling the GitHub Action. For example, say the repo is called python-sample-project and has folder structure:
python-sample-project
│ main.py
│ file1.py
│
└───folder1
│ │ file2.py
In the top-level entrypoint.sh I can see main.py, file1.py, and folder1/file2.py. However, in docker-action/entrypoint.sh I only see the Linux folder structure and the entrypoint.sh file copied over by docker-action/Dockerfile.
In the Alpine example I'm using, the action entrypoint.sh script looks like this:
#!/bin/sh -l
ALPINE_VERSION=$1
cd /docker-action
docker build -t docker-action --build-arg alpine_version="$ALPINE_VERSION" . && docker run docker-action
In docker-action/ I have a Dockerfile and an entrypoint.sh script that should run for the inner container with the dynamic version of Alpine (or Python).
The docker-action/Dockerfile is as follows:
# Container image that runs your code
ARG alpine_version
FROM alpine:${alpine_version}
# Copies your code file from your action repository to the filesystem path `/` of the container
COPY entrypoint.sh /entrypoint.sh
RUN ["chmod", "+x", "/entrypoint.sh"]
# Code file to execute when the docker container starts up (`entrypoint.sh`)
ENTRYPOINT ["/entrypoint.sh"]
In docker-action/entrypoint.sh I run ls, but I do not see the repository files.
Is it possible to access the main.py, file1.py, and folder1/file2.py in entrypoint.sh in the docker-action/entrypoint.sh?
There's generally two ways to get files from your repository available to a docker container you build and run. You either (1) add the files to the image when you build it or (2) mount the files into the container when you run it. There are some other ways, like specifying volumes, but that's probably out of scope for this case.
The Dockerfile docker-action/Dockerfile does not copy any files except for the entrypoint.sh script. Your entrypoint.sh also does not provide any mount points when running the container. Hence, the outcome you observe is the expected outcome based on these facts.
In order to resolve this, you must either (1) add COPY/ADD statements to your Dockerfile to copy files into the image (and set appropriate build context) OR (2) mount the files into the container when it runs by adding -v /source-path:/container-path to the docker run command in your entrypoint.sh.
See references:
COPY reference
Docker run reference
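For example, option (1) could be sketched like this in docker-action/Dockerfile (the path is illustrative, and you would also need to build with the repository root as the context, e.g. docker build -f docker-action/Dockerfile .):
# copy the calling repository's files into the image
COPY . /repo/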
Though, this approach of building another container just to get a user-provided python version is a highly questionable practice for GitHub Actions and should probably be avoided. Consider leaning on the setup-python action instead.
The docker-in-docker problem
Nevertheless, if you continue this route and want to go about mounting the directory, you'll have to keep in mind that, when invoking docker from within a docker action on GitHub, the filesystem in the mount specification refers to the filesystem of the docker host, not the filesystem of the container.
It works on my machine?!
Counter to what you might experience running docker on a local system for example, this does not work in GitHub -- the working directory is not mounted:
docker run -v $(pwd):/opt/workspace \
--workdir /opt/workspace \
--entrypoint /bin/ls \
my-container "-R"
This doesn't work either:
docker run -v $GITHUB_WORKSPACE:$GITHUB_WORKSPACE \
--workdir $GITHUB_WORKSPACE \
--entrypoint /bin/ls \
my-container "-R"
This kind of thing would work perfectly fine if you tried it on a system running docker locally. What gives?
Dealing with the devil (daemon)
In Actions, the starting working directory, where files are checked out, is $GITHUB_WORKSPACE. In docker actions, that's /github/workspace. The workspace files are populated into the workspace when your action runs, by the Actions runner mounting the workspace from the host where the docker daemon is running.
You can see that in the command run when your action starts:
/usr/bin/docker run --name f884202608aa2bfab75b6b7e1f87b3cd153444_f687df --label f88420 --workdir /github/workspace --rm -e INPUT_ALPINE-VERSION -e HOME -e GITHUB_JOB -e GITHUB_REF -e GITHUB_SHA -e GITHUB_REPOSITORY -e GITHUB_REPOSITORY_OWNER -e GITHUB_RUN_ID -e GITHUB_RUN_NUMBER -e GITHUB_RETENTION_DAYS -e GITHUB_RUN_ATTEMPT -e GITHUB_ACTOR -e GITHUB_WORKFLOW -e GITHUB_HEAD_REF -e GITHUB_BASE_REF -e GITHUB_EVENT_NAME -e GITHUB_SERVER_URL -e GITHUB_API_URL -e GITHUB_GRAPHQL_URL -e GITHUB_WORKSPACE -e GITHUB_ACTION -e GITHUB_EVENT_PATH -e GITHUB_ACTION_REPOSITORY -e GITHUB_ACTION_REF -e GITHUB_PATH -e GITHUB_ENV -e RUNNER_OS -e RUNNER_NAME -e RUNNER_TOOL_CACHE -e RUNNER_TEMP -e RUNNER_WORKSPACE -e ACTIONS_RUNTIME_URL -e ACTIONS_RUNTIME_TOKEN -e ACTIONS_CACHE_URL -e GITHUB_ACTIONS=true -e CI=true -v "/var/run/docker.sock":"/var/run/docker.sock" -v "/home/runner/work/_temp/_github_home":"/github/home" -v "/home/runner/work/_temp/_github_workflow":"/github/workflow" -v "/home/runner/work/_temp/_runner_file_commands":"/github/file_commands" -v "/home/runner/work/my-repo/my-repo":"/github/workspace" f88420:2608aa2bfab75b6b7e1f87b3cd153444 "3.9.5"
The important bits are this:
-v "/home/runner/work/my-repo/my-repo":"/github/workspace"
-v "/var/run/docker.sock":"/var/run/docker.sock"
/home/runner/work/my-repo/my-repo is the path on the host, where the repository files are. As mentioned, that first line is what gets it mounted into /github/workspace in your action container when it gets run.
The second line is mounting the docker socket from the host to the action container. This means any time you call docker within your action, you're actually talking to the docker daemon outside of your container. This is important because that means when you use the -v argument inside your action, the arguments need to reflect directories that exist outside of the container.
So, what you would actually need to do instead is this:
docker run -v /home/runner/work/my-repo/my-repo:/opt/workspace \
--workdir /opt/workspace \
--entrypoint /bin/ls \
my-container "-R"
Becoming useful to others
And that works, if you only use it for the project itself. However, you have (among other things) a remaining problem if you want this action to be consumable by other projects: how do you know where the workspace is on the host? This path will change for each repository, after all. GitHub does not guarantee these paths, either; they may be different on different platforms, or your action may be running on a self-hosted runner.
So how do you contend with that problem? There is no built-in environment variable that points specifically to the directory you need, unfortunately. However, by relying on an implementation detail, you might be able to get away with using the $RUNNER_WORKSPACE variable, which will point, in this case, to /home/runner/work/your-project. This is not the same place as the host origin of $GITHUB_WORKSPACE, but it's close. You can use the GITHUB_REPOSITORY variable to build the path, though this isn't guaranteed to always be the case afaik:
PROJECT_NAME="$(basename ${GITHUB_REPOSITORY})"
WORKSPACE="${RUNNER_WORKSPACE}/${PROJECT_NAME}"
You also have some other things to fix, like the working directory from which you build.
TL;DR
You need to mount the files into the container when you run it. In GitHub, you're running docker-in-docker, so the paths you use to mount files work differently, and you need to find the correct paths to pass to docker when it is called from within your action container.
A minimally working solution for the example project you linked: the entrypoint.sh in the root of the repo looks like this:
#!/usr/bin/env sh
ALPINE_VERSION=$1
docker build -t docker-action \
-f ./docker-action/Dockerfile \
--build-arg alpine_version="$ALPINE_VERSION" \
./docker-action
PROJECT_NAME="$(basename ${GITHUB_REPOSITORY})"
WORKSPACE="${RUNNER_WORKSPACE}/${PROJECT_NAME}"
docker run --workdir=$GITHUB_WORKSPACE \
-v $WORKSPACE:$GITHUB_WORKSPACE \
docker-action "$#"
There are probably further concerns with your action, depending on what it does, like making available all the default and user-defined environment variables for the action to the 'inner' container, if that's important.
So, is this possible? Sure. Is it reasonable just to get a dynamic version of alpine/python? I don't think so. There's probably better ways of accomplishing what you want to do, like using setup-python, but that sounds like a different question.
Suppose I have
ENTRYPOINT ["python", "myscript.py"]
and myscript.py has an argument --envvar
That is, if I were to run it locally, I would run python myscript.py --envvar $envvar
Is there any way to provide this argument in Docker, given that I've already chosen to make Python my entrypoint?
If it’s really an environment variable, use the docker run -e option.
docker run -e VAR=value myimage
Alternately, anything you specify as a “command” (either the words after the image name in the docker run command, or a Dockerfile CMD directive) gets passed as command-line arguments to the entrypoint.
# note: your local shell expands $envvar
docker run myimage --envvar "$envvar"
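For completeness, if myscript.py parses that flag with argparse, a minimal sketch (names assumed from the question) might be:
#!/usr/bin/env python3
import argparse

parser = argparse.ArgumentParser()
# the value arrives via the Docker "command" arguments shown above
parser.add_argument("--envvar")
args = parser.parse_args()
print(args.envvar)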
I have a Python script in my docker container that needs to be executed, but I also need to have interactive access to the container once it has been created ( with /bin/bash ).
I would like to be able to create my container, have my script executed and be inside the container to see the changes/results that have occurred (no need to manually execute my python script).
The current issue I am facing is that if I use the CMD or ENTRYPOINT commands in the Dockerfile, I am unable to get back into the container once it has been created. I tried using docker start and docker attach but I'm getting the error:
sudo docker start containerID
sudo docker attach containerID
"You cannot attach to a stepped container, start it first"
Ideally, something close to this:
sudo docker run -i -t image /bin/bash python myscript.py
Assume my python script contains something like this (it's irrelevant what it does; in this case it just creates a new file with text):
open('newfile.txt','w').write('Created new file with text\n')
When I create my container I want my script to execute and I would like to be able to see the content of the file. So something like:
root@66bddaa892ed# sudo docker run -i -t image /bin/bash
bash4.1# ls
newfile.txt
bash4.1# cat newfile.txt
Created new file with text
bash4.1# exit
root@66bddaa892ed#
In the example above my python script would have executed upon creation of the container to generate the new file newfile.txt. This is what I need.
My way of doing it is slightly different, with some advantages.
It is actually a multi-session server rather than a script, but it could be even more usable in some scenarios:
# Just create interactive container. No start but named for future reference.
# Use your own image.
docker create -it --name new-container <image>
# Now start it.
docker start new-container
# Now attach bash session.
docker exec -it new-container bash
The main advantage is that you can attach several bash sessions to a single container. For example, I can exec one session with bash for tailing the log and run actual commands in another session (see the example below).
BTW, when you detach the last 'exec' session your container is still running, so it can perform operations in the background.
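For example, two parallel sessions might look like this (the log path here is hypothetical):
# session 1: follow the application log
docker exec -it new-container tail -f /var/log/app.log
# session 2: run commands interactively
docker exec -it new-container bash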
You can run a docker image, perform a script and have an interactive session with a single command:
sudo docker run -it <image-name> bash -c "<your-script-full-path>; bash"
The second bash will keep the interactive terminal session open, irrespective of the CMD command in the Dockerfile the image has been created with, since the CMD command is overridden by the bash -c command above.
There is also no need to append a command like local("/bin/bash") to your Python script (or bash in the case of a shell script).
Assuming that the script has not yet been transferred from the Docker host to the docker image by an ADD Dockerfile command, we can map the volumes and run the script from there:
sudo docker run -it -v <host-location-of-your-script>:/scripts <image-name> bash -c "/scripts/<your-script-name>; bash"
Example: assuming that the python script in the original question is already on the docker image, we can omit the -v option and the command is as simple as follows:
sudo docker run -it image bash -c "python myscript.py; bash"
Why not this?
docker run --name="scriptPy" -i -t image /bin/bash python myscript.py
docker cp scriptPy:/path/to/newfile.txt /path/to/host
vim /path/to/host
Or if you want it to stay on the container
docker run --name="scriptPy" -i -t image /bin/bash python myscript.py
docker start scriptPy
docker attach scriptPy
Hope it was helpful.
I think this is what you mean.
Note: This uses Fabric (because I'm too lazy and/or don't have the time to work out how to wire up stdin/stdout/stderr to the terminal properly, but you could spend the time and use straight subprocess.Popen):
Output:
$ docker run -i -t test
Entering bash...
[localhost] local: /bin/bash
root@66bddaa892ed:/usr/src/python# cat hello.txt
Hello World!root@66bddaa892ed:/usr/src/python# exit
Goodbye!
Dockerfile:
# Test Docker Image
FROM python:2
ADD myscript.py /usr/bin/myscript
RUN pip install fabric
CMD ["/usr/bin/myscript"]
myscript.py:
#!/usr/bin/env python
from __future__ import print_function
from fabric.api import local
with open("hello.txt", "w") as f:
    f.write("Hello World!")
print("Entering bash...")
local("/bin/bash")
print("Goodbye!")
Sometimes your Python script may need other files from your folder, like other Python scripts, CSV files, JSON files, etc.
I think the best approach is sharing the directory with your container, which makes it easy to create one environment that has access to all the required files.
Create a text script
sudo nano /usr/local/bin/dock-folder
Add this script as its content
#!/bin/bash
echo "IMAGE = $1"
## image name is the first param
IMAGE="$1"
## container name is created combining the image and the folder address hash
CONTAINER="${IMAGE}-$(pwd | md5sum | cut -d ' ' -f 1)"
echo "${IMAGE} ${CONTAINER}"
# remove the container for this dir, if it exists
## rm remove container command
## pwd | md5 get the unique code for the current folder
## "${IMAGE}-$(pwd | md5sum)" create a unique name for the container based in the folder and image
## --force force the container be stopped and removed
if [[ "$2" == "--reset" || "$3" == "--reset" ]]; then
echo "## removing previous container ${CONTAINER}"
docker rm "${CONTAINER}" --force
fi
# create one special container for this folder based in the python image and let this folder mapped
## -it interactive mode
## pwd | md5 get the unique code for the current folder
## --name="${CONTAINER}" create one container with unique name based in the current folder and image
## -v "$(pwd)":/data create ad shared volume mapping the current folder to the /data inside your container
## -w /data define the /data as the working dir of your container
## -p 80:80 some port mapping between the container and host ( not required )
## python: name of the image used as the starting point
echo "## creating container ${CONTAINER} as ${IMAGE} image"
docker create -it --name="${CONTAINER}" -v "$(pwd)":/data -w /data -p 80:80 "${IMAGE}"
# start the container
docker start "${CONTAINER}"
# enter in the container, interactive mode, with the shared folder and running python
docker exec -it "${CONTAINER}" bash
# remove the container after exit
if [[ "$2" == "--remove" || "$3" == "--remove" ]]; then
echo "## removing container ${CONTAINER}"
docker rm "${CONTAINER}" --force
fi
Add execution permission
sudo chmod +x /usr/local/bin/dock-folder
Then you can call the script from your project folder:
# create a unique container for this folder and image if it does not already exist, and attach an interactive shell
dock-folder python
# destroy the container if it already exists and recreate it
dock-folder python --reset
# destroy the container after closing the interactive mode
dock-folder python --remove
This call will create a new python container sharing your folder, which makes all the files in the folder (CSVs, binary files, and so on) accessible.
Using this strategy, you can quickly test your project in a container and interact with the container to debug it.
One issue with this approach is reproducibility. That is, you may install something from your shell that is required for your application to run, but that change only happens inside your container. So anyone who tries to run your code will have to figure out what you did and repeat it.
So, if you can run your project without installing anything special, this approach may suit you well. But if you had to install or change things in your container to be able to run your project, you probably need to create a Dockerfile to save those commands. That will make all the steps (loading the container, making the required changes, and loading the files) easy to replicate.
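As a rough sketch of that idea (the package name and paths are placeholders):
FROM python:3
WORKDIR /data
# record the setup you would otherwise do by hand inside the container
RUN pip install some-required-package
COPY . /data
CMD ["bash"]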