Dockerfile WORKDIR distracts running program from layer?

I wrote a Dockerfile to build a Docker image that can run on AWS Batch. It pulls in multiple layers and copies their files to '/opt', which I set as WORKDIR.
I have to run a program called 'BLAST', a single executable that requires several parameters, including the location of the DB.
When I run the image, it fails because it cannot find the mounted DB location. The full error message is b'BLAST Database error: No alias or index file found for nucleotide database [/mnt/fsx/ntdb/nt] in search path [/opt:/fsx/ntdb:]\n'] where /mnt/fsx/ntdb/nt is the DB path.
My only guess is that, because I set WORKDIR in my Dockerfile, the default working directory is '/opt'.
How should I fix this? By removing WORKDIR, or something else?
My Dockerfile is below:
# Set Work dir
ARG FUNCTION_DIR="/opt"
# Get layers
FROM (aws-account).dkr.ecr.(aws-region).amazonaws.com/uclust AS layer_1
FROM (aws-account).dkr.ecr.(aws-region).amazonaws.com/blast AS layer_2
FROM public.ecr.aws/lambda/python:3.9
# Copy arg and set work dir
ARG FUNCTION_DIR
COPY . ${FUNCTION_DIR}
WORKDIR ${FUNCTION_DIR}
# Copy from layers
COPY --from=layer_1 /opt/ .
RUN true
COPY --from=layer_2 /opt/ .
RUN true
COPY . ${FUNCTION_DIR}/
RUN true
# Copy and Install required libraries
COPY requirements.txt .
RUN true
RUN pip3 install -r requirements.txt
# To run lambda handler
RUN pip install \
--target "${FUNCTION_DIR}" \
awslambdaric
# To run blast
RUN yum -y install libgomp
# See files inside image
RUN dir -s
# Get permissions for files
RUN chmod +x /opt/main.py
RUN chmod +x /opt/mode/submit/main.py
# Set Entrypoint and CMD
ENTRYPOINT [ "python3" ]
CMD [ "-m", "awslambdaric", "main.lambda_handler" ]
Edit: Further information. Looking at the error, the BLAST program is searching for the DB in /opt:/fsx/ntdb:, which is the combination of the path set as WORKDIR in the Dockerfile and the BLASTDB path set via os.environ['BLASTDB'].

I figured out the problem after many debugging attempts. The problem was neither WORKDIR nor os.environ['BLASTDB']. The paths were defined correctly, and searching [/opt:/fsx/ntdb:] is the expected behavior according to the documented search order:
Current working directory (*)
User's HOME directory (*)
Directory specified by the NCBI environment variable
The standard system directory (“/etc” on Unix-like systems, and given by the environment variable SYSTEMROOT on Windows)
The actual solution was to check whether the file system was mounted correctly and whether the files inside it had the right permissions. Initially I assumed the file system was mounted correctly, since I had already tested it from other Batch jobs many times, but only the mount folder had been created; the files did not exist. So even though the program looked for the index file, there was nothing to find, hence the error.
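A check along these lines could run at the start of the job to catch an empty mount before BLAST does. This is only a sketch: the helper name describe_blast_db is hypothetical, and it simply lists whatever files share the DB prefix rather than validating BLAST's own file formats.

```python
import glob
import os


def describe_blast_db(db_prefix):
    """Report what actually exists on disk for a BLAST DB prefix.

    db_prefix is the path passed to BLAST without an extension,
    e.g. /mnt/fsx/ntdb/nt from the error message above.
    """
    db_dir = os.path.dirname(db_prefix)
    # Every file that belongs to the DB shares the prefix plus an extension.
    files = sorted(glob.glob(glob.escape(db_prefix) + ".*"))
    return {
        "dir_exists": os.path.isdir(db_dir),
        # False here means the directory is a plain folder, not a mount point.
        "is_mount": os.path.ismount(db_dir),
        "files": files,
        "readable": [f for f in files if os.access(f, os.R_OK)],
    }
```

Calling describe_blast_db("/mnt/fsx/ntdb/nt") and logging the result before invoking BLAST would have shown an existing directory with an empty file list, which is exactly the situation described above.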

Related

Trying to supply PGPASS to Docker Image

New to Docker here. I'm trying to create a basic Dockerfile that runs a Python script, which in turn runs some queries in Postgres through psycopg2. I currently have a pgpass file set up through my environment variables so that I can run these tools without supplying a password in the code. I'm trying to replicate this in Docker. My local machine runs Windows.
FROM datascienceschool/rpython as base
RUN mkdir /test
WORKDIR /test
COPY requirements.txt requirements.txt
RUN pip install -r requirements.txt
COPY main_airflow.py /test
RUN cp C:\Users\user.name\AppData\Roaming\postgresql\pgpass.conf /test
ENV PGPASSFILE="test/pgpass.conf"
ENV PGUSER="user"
ENTRYPOINT ["python", "/test/main_airflow.py"]
This is what I've tried in my Dockerfile: I copy over my pgpass file and point an environment variable at it. Apologies if I have any slashes, backslashes, or syntax wrong. I'm very new to Docker, Linux, etc.
Any help or alternatives would be appreciated.

It's better to pass your secrets into the container at runtime than it is to include the secret in the image at build-time. This means that the Dockerfile doesn't need to know anything about this value.
For example
$ export PGPASSWORD=<postgres password literal>
$ docker run -e PGPASSWORD <image ref>
In that example I've used PGPASSWORD, which is an alternative to PGPASSFILE. Doing the same with a file is a little more complicated, but it would look something like this.
The plan is to mount the credentials as a volume at runtime.
FROM datascienceschool/rpython as base
RUN mkdir /test
WORKDIR /test
COPY requirements.txt requirements.txt
RUN pip install -r requirements.txt
COPY main_airflow.py /test
ENV PGPASSFILE="/credentials/pgpass.conf"
ENV PGUSER="user"
ENTRYPOINT ["python", "/test/main_airflow.py"]
As I said above, we don't want to include the secrets in the image. We are going to indicate where the file will be in the image, but we don't actually include it yet.
Now, when we start the image, we'll mount a volume containing the file at the location specified in the image, /credentials
$ docker run --mount src="<host path to directory>",target="/credentials",type=bind <image ref>
I haven't tested this, so you may need to adjust the exact paths, but this is the general idea for passing sensitive values to a Docker container.
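On the Python side nothing special is needed, because libpq (which psycopg2 wraps) reads PGPASSWORD and PGPASSFILE from the environment by itself. A small fail-fast check at startup can still make a missing mount easier to diagnose. The sketch below assumes a hypothetical helper named check_pg_credentials; it only inspects the environment and does not connect to the database.

```python
import os


def check_pg_credentials(env=None):
    """Return which credential source is available, or raise if none is.

    env defaults to os.environ; passing a dict makes the check easy to test.
    """
    env = os.environ if env is None else env
    if env.get("PGPASSWORD"):
        return "PGPASSWORD"
    passfile = env.get("PGPASSFILE")
    if passfile and os.path.isfile(passfile):
        return "PGPASSFILE"
    raise RuntimeError(
        "No Postgres credentials found: set PGPASSWORD, or mount the "
        "pgpass file at the path given by PGPASSFILE"
    )
```

Calling check_pg_credentials() at the top of main_airflow.py would turn a forgotten --mount or -e option into an explicit error message instead of a password prompt or an authentication failure.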

Docker - No such file or directory when running the image

FROM python:3.10
COPY requirements.txt .
RUN pip install -r requirements.txt
#Make a copy of the current directory
COPY / ./
#Display list of files in directory
RUN ls /
ENTRYPOINT ["python", "/main.py"]
This is my current Dockerfile, and the build prints the following file list:
[image: Directory List]
This is the code that causes the problem:
d1 = open(r"backend_resources\results\orlando_averaged_2019-01-01.geojson")
And throwing me this error when I run the image
FileNotFoundError: [Errno 2] No such file or directory: 'backend_resources\\results\\orlando_averaged_2019-01-01.geojson'
However, you will notice in the image with the list that backend_resources and the files inside it do exist and, to my knowledge, are in the correct directories for this code to run properly. I'm still new to Docker, so I could definitely be doing something wrong.

I think the problem is the path style: you are using a Windows-style path with backslashes.
If your image is based on a Unix-like system (Debian, Alpine, etc.), use a Unix-style path with forward slashes:
d1 = open(r"/backend_resources/results/orlando_averaged_2019-01-01.geojson")
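To see why the backslash version fails on Linux, it can help to compare how the same string is parsed under each convention. The snippet below is an illustration using the pure path classes from pathlib, which do not touch the file system; the variable names are only for the example.

```python
from pathlib import PurePosixPath, PureWindowsPath

win_style = r"backend_resources\results\orlando_averaged_2019-01-01.geojson"

# On Linux a backslash is an ordinary character, so the whole string is
# treated as one file name in the current directory.
posix_parts = PurePosixPath(win_style).parts

# On Windows the same string splits into directory, directory, file.
windows_parts = PureWindowsPath(win_style).parts

# Building the path from components avoids hard-coding either separator.
portable = PurePosixPath(
    "backend_resources", "results", "orlando_averaged_2019-01-01.geojson"
)
```

In application code, pathlib.Path("backend_resources") / "results" / "orlando_averaged_2019-01-01.geojson" picks the right separator for whichever system the code runs on.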

Python application in a container using Podman

I'd like to build a container using Podman that contains the following:
a Python application
the Python modules I developed, which are not stored in the same place as the Python application
the Python environment (made with miniconda/mambaforge)
a mounted folder for input data
a mounted folder for output data
To do that, I've added a Dockerfile in my home directory. Below is the content of this Dockerfile:
FROM python:3
# Add the Python application
ADD /path/to/my_python_app /my_python_app
# Add the Python modules used by the Python application
ADD /path/to/my_modules /my_modules
# Add the whole mambaforge folder (contains the virtual envs) at exactly the same path as the local one
ADD /path/to/mambaforge /exact/same/path/to/mambaforge
# Create a customized .bashrc (contains 'export PATH' to add mamba path and 'export PYTHONPATH' to add my_modules path)
ADD Dockerfile_bashrc /root/.bashrc
Then, I build the container with:
podman build -t python_app .
And run it with:
podman run -i -t -v /path/to/input/data:/mnt/input -v /path/to/output/data:/mnt/output python_app /bin/bash
Note that in the Dockerfile I add the whole mambaforge folder (it is like miniconda). Is it possible to add only the virtual environment? I found I needed the whole mambaforge because I have to activate the virtual environment with mamba/conda activate my_env. I do that in a .bashrc (along with the conda initialization) that I put in /root/.bashrc. In this file I also do export PYTHONPATH="/my_modules:$PYTHONPATH".
I'd also like to add the following line to my Dockerfile so that the Python application runs automatically when the container starts.
CMD ["python", "/path/to/my_python_app/my_python_app.py"]
However, this doesn't work, apparently because the container needs to be run interactively for the .bashrc to be loaded first.
All of this is a kludge, and I'd like to know whether there is a simpler and better way to do it.
Many thanks for your help!

How to not stop nohup and get output files

I’m new to working on Linux. I apologize if this is a dumb question. Despite searching for more than a week, I was not able to derive a clear answer to my question.
I’m running a very long Python program on Nvidia CPUs. The output is several csv files. It takes long to compute the output, so I use nohup to be able to exit the process.
Let’s say main.py file is this
import numpy as np
import pandas as pd

if __name__ == '__main__':
    a = np.arange(1, 1000)
    data = a * 2
    filename = 'results.csv'
    output = pd.DataFrame(data, columns=["Output"])
    output.to_csv(filename)
The calculations for data are more complicated, of course. I build a Docker container and run this program inside it. When I run python main.py on a smaller example, there is no problem: it writes the csv files.
My question is this:
When I run nohup python main.py & and then check progress with tail -f nohup.out inside the Docker container, I can see what the program is doing at that moment, but I cannot leave that screen and let the execution run its course. It just stays there. How can I safely exit the screen that comes with tail -f nohup.out?
I also tried not checking on the code at all and letting it run for two days before coming back. The output of tail -f nohup.out indicated that the execution had finished, but the csv files were nowhere to be seen. Are they somehow bundled up inside nohup.out, or does this indicate that something else is wrong?

If you're going to run this setup in a Docker container, keep two things in mind:
A Docker container runs only one process, as a foreground process; when that process exits, the container completes. That process is almost always the script or server you're trying to run, not an interactive shell.
However, it's possible to use Docker constructs to run the container itself in the background and to collect its logs while it's running or after it completes.
A typical Dockerfile for a Python program like this might look like:
FROM python:3.10
# Create and use some directory; it can be anything, but do
# create _some_ directory.
WORKDIR /app
# Install Python dependencies as a separate step. Doing this first
# saves time if you repeat `docker build` without changing the
# requirements list.
COPY requirements.txt .
RUN pip install -r requirements.txt
# Copy in the rest of the application.
COPY . .
# Set the main container command to be the script.
CMD ["./main.py"]
The script should be executable (chmod +x main.py on your host) and begin with a "shebang" line (#!/usr/bin/env python3) so the system knows where to find the interpreter.
You will hear recommendations to use both CMD and ENTRYPOINT for the final line. It doesn't matter much to your immediate question. I prefer CMD for two reasons: it's easier to launch an alternate command to debug your container (docker run --rm your-image ls -l vs. docker run --rm --entrypoint ls your-image -l), and there's a very useful pattern of using ENTRYPOINT to do some initial setup (creating environment variables dynamically, running database migrations, ...) and then launching CMD.
Having built the image, you can use the docker run -d option to launch it in the background, and then run docker logs to see what comes out of it.
# Build the image.
docker build -t long-python-program .
# Run it, in the background.
docker run -d --name run1 long-python-program
# Review its logs.
docker logs run1
If you're running this to produce files that need to be read back from the host, you need to mount a host directory into your container at the time you start it. You need to make a couple of changes to do this successfully.
In your code, you need to write the results somewhere different than your application code. You can't mount a host directory over the /app directory since it will hide the code you're actually trying to run.
import os
data_dir = os.getenv('DATA_DIR', 'data')
filename = os.path.join(data_dir, 'results.csv')
Optionally, in your Dockerfile, create this directory and set a pointer to it. Since my sample code gets its location from an environment variable you can again use any path you want.
# Create the data directory.
RUN mkdir /data
ENV DATA_DIR=/data
When you launch the container, the docker run -v option mounts filesystems into the container. For this sort of output file you're looking for a bind mount that directly attaches a host directory to the container.
docker run -d --name run2 \
-v "$PWD/results:/data" \
long-python-program
In this example so far we haven't set the USER of the program, and it will run as root. You can change the Dockerfile to set up an alternate USER (which is good practice); you do not need to chown anything except the data directory to be owned by that user (leaving your code owned by root and not world-writeable is also good practice). If you do this, when you launch the container (on native Linux) you need to provide the host numeric user ID that can write to the host directory; you do not need to make other changes in the Dockerfile.
docker run -d --name run2 \
-u $(id -u) \
-v "$PWD/results:/data" \
long-python-program
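Putting the code-side change together, a dependency-free sketch of the output step might look like the following. The function names results_path and write_results are hypothetical, and the standard csv module stands in for pandas only so that the example runs without extra packages; with pandas you would pass the returned path to to_csv instead.

```python
import csv
import os


def results_path(filename="results.csv"):
    """Build the output path under DATA_DIR, creating the directory if needed."""
    data_dir = os.getenv("DATA_DIR", "data")
    os.makedirs(data_dir, exist_ok=True)
    return os.path.join(data_dir, filename)


def write_results(rows, filename="results.csv"):
    """Write a single 'Output' column to a csv file and return its path."""
    path = results_path(filename)
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["Output"])
        writer.writerows([value] for value in rows)
    return path
```

With DATA_DIR=/data set in the image and the host directory mounted at /data, the file written by write_results shows up in the mounted host directory after the container exits.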

1- A container runs a single foreground process. Set it with CMD or ENTRYPOINT in the Dockerfile.
2- Map a Docker volume to a directory on the Linux host.

Exporting files from docker volume to another directory

I have a Python program that reads data from a file, does some calculations, and saves the result to an output file. The program also writes its logs to a log file. So in my current directory I have the files below:
1. code.py --> The main python application
2. input.json --> This json file is used to take input data
3. output.json --> The output data is saved in this file.
4. logfile.log --> This file saves the log.
All of the above files are inside the directory Application. The full path is /home/user/Projects/Application/. When I run code.py directly, I get the expected results. I then containerized the code using the Dockerfile below:
FROM python:3
ADD code.py /
ADD input.json /
ADD output.json /
ADD logfile.log /
CMD [ "python3", "./code.py" ]
When I run the Docker container, it runs fine, but I cannot see the output data and logs in output.json and logfile.log. I searched the file system for these files and found them in the directory below:
/var/lib/docker/overlay2/7c237c143f9f2e711832daccecdfb29abaf1e37a4714f34f34870e0ee4b1af07/diff/home/user/Projects/Application/
All my files were in that directory, and the logs and the data were there. I then understood that the files are saved inside Docker's own storage and not in my current directory.
Is there any way to keep the files and all the data in my current directory, /home/user/Projects/Application/, instead of inside Docker? That would make it much easier for me to check the outputs.
Thanks

The files end up under Docker's overlay storage because you didn't mount a volume. To fix this, you can modify your Dockerfile to look similar to this:
FROM python:3
RUN mkdir /app
ADD code.py /app
ADD input.json /app
ADD output.json /app
ADD logfile.log /app
WORKDIR /app
VOLUME /app
CMD [ "python3", "./code.py" ]
Then in your docker run command, make sure you pass this option:
-v /home/user/Projects/Application:/app
More information about container options can be found at https://www.aquasec.com/wiki/display/containers/Docker+Containers.
If you are using Docker Compose, add the following to the service definition:
volumes:
  - /home/user/Projects/Application:/app
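For the bind mount to be useful, code.py has to read and write its files inside the mounted directory. As a rough sketch of that idea, assuming a hypothetical run function, a placeholder calculation, and an input.json that contains a "values" list (none of which come from the question), the program could resolve every path against one application directory:

```python
import json
import logging
from pathlib import Path


def run(app_dir="."):
    """Read input.json, write output.json and logfile.log, all under app_dir."""
    app_dir = Path(app_dir)
    logger = logging.getLogger("code")
    logger.setLevel(logging.INFO)
    handler = logging.FileHandler(app_dir / "logfile.log")
    logger.addHandler(handler)
    try:
        data = json.loads((app_dir / "input.json").read_text())
        # Placeholder calculation; the real program does something else here.
        result = {"total": sum(data["values"])}
        (app_dir / "output.json").write_text(json.dumps(result))
        logger.info("wrote %s", app_dir / "output.json")
        return result
    finally:
        # Detach and close the handler so the log file is flushed.
        logger.removeHandler(handler)
        handler.close()
```

With WORKDIR /app and the host directory mounted at /app, calling run() with the default argument writes output.json and logfile.log straight into /home/user/Projects/Application/ on the host.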

Alternatively, you can run the container as follows (you may not need to build your own image):
docker run --rm -v /home/user/Projects/Application/:/home/user/Projects/Application/ -d python:3 python3 /home/user/Projects/Application/code.py
-v bind mounts the local folder into the container at /home/user/Projects/Application/.
Feel free to drop --rm if you don't need it.
Make sure code.py writes its logs to /home/user/Projects/Application/logfile.log.
To verify that the files and folders are there, run:
docker run --rm -it -v /home/user/Projects/Application/:/home/user/Projects/Application/ python:3 sh
This drops you into a shell where you can list the files and make sure the required files and configs are there.
