How to efficiently input files with docker

How to efficiently input files with docker - python

I am starting to get a hand on docker and try to containerized some of the applications I use. Thanks to the tutorial I was able to create docker images and containers but now I am trying to thing about the most efficient and practical ways to do things.
To present my use-case, I have a python code (let's call it process.py) that takes as an input a single .jpg image, does some operations on this image, and then output the processed .jpg image.
Normally I would run it through :
python process.py -i path_of_the_input_image -o path_of_the_output_image
Then, the way I do the connection input/output with my docker is the following. First I create the docker file :
FROM python:3.6.8
COPY . /app
WORKDIR /app
RUN pip install --upgrade pip
RUN pip install -r requirements.txt
CMD python ./process.py -i ./input_output/input.jpg -o ./input_output/output.jpg
And then after building the image, I run docker run mapping the a local folder with the input_output folder of docker:
docker run -v C:/local_folder/:/app/input_output my_docker_image
This seems to work, but is not really practical, as I have to create locally a specific folder to mount it to the docker container. So here are the questions I am asking myself :
Is there a more practical ways of doings things ? To directly send one single input file and directly receive one single output files from the output of a docker container ?
When I run the docker image, what happens (If I understand correctly) is that it will create a docker container that will run my program once process.py once and then just sits there doing nothing. Even after finishing running process.py it will still be there listed in the command "docker ps -a". Is this behaviour expected ? Is there a way to automatically delete finished container ? Am I using docker run the right way ?
Is there a more practical way of having a container running continuously and on which I can query to run the program process.py on demand with a given input ?

I have a python code (let's call it process.py) that takes as an input a single .jpg image, does some operations on this image, and then output the processed .jpg image.
That's most efficiently done without Docker; just run the python command you already have. If your application has interesting Python library dependencies, you can install them in a virtual environment to avoid conflicts with the system Python installation.
When I run the Docker image...
...the container runs its main command (docker run command arguments, Dockerfile CMD, possibly combined with an entrypoint from the some sources), and when that command exits, the container exits. It will be listed in docker ps -a output, but as "Stopped" (probably with status 0 for a successful completion). You can docker run --rm to have the container automatically delete itself.
Is there a more practical way of having a container running continuously and on which I can query to run the program process.py on demand with a given input ?
Wrap it in a network service, like a Flask application. As long as this is running, you can use a tool like curl to do an HTTP POST with the input JPEG file as the body, and get the output JPEG file as the response. Avoid using local files and Docker together whenever that's an option (prefer network I/O for process inputs and outputs; prefer a database to local-file storage).

Why are volume mounts not practical?
I would argue that Dockerising your application is not practical, but you've chosen to do so for, presumably very good, reasons. Volume mounts are simply an extension to this. If you want to get data in/out of your container, the 'normal' way to do this is by using volume mounts as you have done. Sure, you could use docker cp to copy the files manually, but that's not really practical either.
As far as the process exiting goes, normally, once the main process exits, the container exits. docker ps -a shows stopped containers as well as running ones. You should see that it says Exited n minutes(hours, days etc) ago. This means that your container has run and exited, correctly. You can remove it with docker rm <containerid>.
docker ps (no -a) will only show the running ones, btw.
If you use the --rm flag in your Docker run command, it will be removed when it exits, so you won't see it in the ps -a output. Stopped containers can be started again, but that's rather unusual.
Another solution might be to change your script to wait for incoming files and process them as they are received. Then you can leave the container running, and it will just process them as needed. If doing this, make sure that your idle loop has a sleep or something in it to ensure that you don't consume too many resources.

Related

Output from Python running on Docker back to host with docker-compose

Im working on a OpenCv project and as many know, the instalation of that on windows is iritating. So, what i want to do is to run the project in a docker container and store the output to a folder on the host computer. In simple terms it is something like this:
Program python / opencv code
Build Docker image
Run Docker image --> Saves the output data somewhere
In some way - get access to the output data on host.
Now, i have been trying to find many ways of dooing this, and i will probably at a later time send it using other means. However, for development i need this slightly more direct approach. It also has somthing to do with colaboration with others.
Simple Docker file that can be used as the base:
FROM python:3
WORKDIR /usr/src/app
COPY . .
CMD [ "python", "./script.py" ]
Lets say that script.py creates a file called output.txt. I want that output.txt stored at my E: drive.
How to do this automatically - without having to do multiple comandline operations?
TLDR; How to get files from Docker container to host? Goal: File physically stored on E:

There are two ways to do this. First is to mount a docker volume.
docker run --name=name -d -v /path/in/host:/path/in/docker name
By mounting a volume like this, whatever you write in the directory you've mounted will be written in the host automatically. Check this for more information on volumes.
The second way is to copy the files from a container to the host like this
docker cp <containerId>:/file/path/within/container /host/path/target
Check docs here for more info on this.

How can I run luigid and luigi task within docker? [duplicate]

I have built a base image from Dockerfile named centos+ssh. In centos+ssh's Dockerfile, I use CMD to run ssh service.
Then I want to build a image run other service named rabbitmq,the Dockerfile:
FROM centos+ssh
EXPOSE 22
EXPOSE 4149
CMD /opt/mq/sbin/rabbitmq-server start
To start rabbitmq container，run：
docker run -d -p 222:22 -p 4149:4149 rabbitmq
but ssh service doesn't work, it sense rabbitmq's Dockerfile CMD override centos's CMD.
How does CMD work inside docker image?
If I want to run multiple service, how to? Using supervisor?

You are right, the second Dockerfile will overwrite the CMD command of the first one. Docker will always run a single command, not more. So at the end of your Dockerfile, you can specify one command to run. Not more.
But you can execute both commands in one line:
FROM centos+ssh
EXPOSE 22
EXPOSE 4149
CMD service sshd start && /opt/mq/sbin/rabbitmq-server start
What you could also do to make your Dockerfile a little bit cleaner, you could put your CMD commands to an extra file:
FROM centos+ssh
EXPOSE 22
EXPOSE 4149
CMD sh /home/centos/all_your_commands.sh
And a file like this:
service sshd start &
/opt/mq/sbin/rabbitmq-server start

Even though CMD is written down in the Dockerfile, it really is runtime information. Just like EXPOSE, but contrary to e.g. RUN and ADD. By this, I mean that you can override it later, in an extending Dockerfile, or simple in your run command, which is what you are experiencing. At all times, there can be only one CMD.
If you want to run multiple services, I indeed would use supervisor. You can make a supervisor configuration file for each service, ADD these in a directory, and run the supervisor with supervisord -c /etc/supervisor to point to a supervisor configuration file which loads all your services and looks like
[supervisord]
nodaemon=true
[include]
files = /etc/supervisor/conf.d/*.conf
If you would like more details, I wrote a blog on this subject here: http://blog.trifork.com/2014/03/11/using-supervisor-with-docker-to-manage-processes-supporting-image-inheritance/

While I respect the answer from qkrijger explaining how you can work around this issue I think there is a lot more we can learn about what's going on here ...
To actually answer your question of "why" ... I think it would for helpful for you to understand how the docker stop command works and that all processes should be shutdown cleanly to prevent problems when you try to restart them (file corruption etc).
Problem: What if docker did start SSH from it's command and started RabbitMQ from your Docker file? "The docker stop command attempts to stop a running container first by sending a SIGTERM signal to the root process (PID 1) in the container." Which process is docker tracking as PID 1 that will get the SIGTERM? Will it be SSH or Rabbit?? "According to the Unix process model, the init process -- PID 1 -- inherits all orphaned child processes and must reap them. Most Docker containers do not have an init process that does this correctly, and as a result their containers become filled with zombie processes over time."
Answer: Docker simply takes that last CMD as the one that will get launched as the root process with PID 1 and get the SIGTERM from docker stop.
Suggested solution: You should use (or create) a base image specifically made for running more than one service, such as phusion/baseimage
It should be important to note that tini exists exactly for this reason, and as of Docker 1.13 and up, tini is officially part of Docker, which tells us that running more than one process in Docker IS VALID .. so even if someone claims to be more skilled regarding Docker, and insists that you absurd for thinking of doing this, know that you are not. There are perfectly valid situations for doing so.
Good to know:
https://blog.phusion.nl/2015/01/20/docker-and-the-pid-1-zombie-reaping-problem/
http://www.techbar.me/stopping-docker-containers-gracefully/
https://www.ctl.io/developers/blog/post/gracefully-stopping-docker-containers/
https://github.com/phusion/baseimage-docker#docker_single_process

The official docker answer to Run multiple services in a container.
It explains how you can do it with an init system (systemd, sysvinit, upstart) , a script (CMD ./my_wrapper_script.sh) or a supervisor like supervisord.
The && workaround can work only for services that starts in background (daemons) or that will execute quickly without interaction and release the prompt. Doing this with an interactive service (that keeps the prompt) and only the first service will start.

To address why CMD is designed to run only one service per container, let's just realize what would happen if the secondary servers run in the same container are not trivial / auxiliary but "major" (e.g. storage bundled with the frontend app). For starters, it would break down several important containerization features such as horizontal (auto-)scaling and rescheduling between nodes, both of which assume there is only one application (source of CPU load) per container. Then there is the issue of vulnerabilities - more servers exposed in a container means more frequent patching of CVEs...
So let's admit that it is a 'nudge' from Docker (and Kubernetes/Openshift) designers towards good practices and we should not reinvent workarounds (SSH is not necessary - we have docker exec / kubectl exec / oc rsh designed to replace it).
More info
https://devops.stackexchange.com/questions/447/why-it-is-recommended-to-run-only-one-process-in-a-container

How to persist my notebooks and data in my Docker image/container

I am new to Docker and I am confused about containers and images somehow. I want to sue Docker for Tensorflow development. All I need is to have an easy way to write Jupyter Notebooks and use GPU powered Tensorflow.
I have the latest Tensorflow Jupyter Python 3 Image already. I run the Image with
docker run --rm --runtime=nvidia -v -it -p 8888:8888 tensorflow/tensorflow:latest-gpu-py3-jupyter
How can I make it so that my data when I work in that Image and add and edit my Jupyter Notebooks won't get lost after I exit the process. I know that Docker Images aren't meant to persist state but I am so new to this I just want something to work in with persistent data. Can someone help me guide me through this or point to a resource which will answer all my prayers?
I would also like to move some stuff into the Container that is going to be run so that I can access some custom Python libs because they contain some things that my Notebooks need to import!
Side questions:
--rm removes the container or whatever by default I run it without this flag still my data was lost
-v is for volumes? I tried with -v Bachelor:/app to mount a volume like so. It apparently doesn't make any difference. I don't know how to use the volume Bachelor that I created. Instead there are a multitude of unnamed volumes being created that are not usable whenever I run this
-it does also something no idea what
-p is the port number right?

Use Docker volumes:
Volumes are the preferred mechanism for persisting data generated by and used by Docker containers
Example:
docker run --runtime=nvidia -v ${SOURCE_FOLDER}:${DEST_FOLDER} -p 8888:8888 tensorflow/tensorflow:latest-gpu-py3-jupyter
Change SOURCE_FOLDER and DEST_FOLDER accordingly (use absolute paths!).
Now if you navigate to localhost:8888 and create a notebook on DEST_FOLDER, it also should be available on SOURCE_FOLDER.
As for your side questions:
--it runs a container in interactive mode. You generally add /bin/bash after the run command, so you can start an interactive bash session inside the container.
--rm cleans the container after it exists.
Those options aren't really necessary for your use case. Just remember to use docker ps and docker rm <ID> to clean up your container after you're done.

How can I make Docker container shared files with host appeared in the container?

I'm trying to create a container to run a program. I'm using a pre-configured image and now I need to run the program. However, it's a machine learning program and I need a dataset from my computer to run.
The file is too large to be copied to the container. It would be best if the program running in the container searched the dataset in a local directory of my computer, but I don't know how I can do this.
Well, I have made the shared folder from my machine appeared using docker run -it -v ~/Volumes/Data/Studies/PhD\Work/gitlab/J2/ydk-py:/ydk-py ydkdev/ydk-py in the container, but all files in folder ydk-py are not shown. This is the safe, usually-desired behavior. But for development and instance setup, it would be immensely useful to have access to an existing file structure.

docker run with -v will automatically mount sub-directories. In your case you are using relative path, which you need to use absolute path as per this documentation.
So change your command from
docker run -it -v ~/Volumes/Data/Studies/PhD\Work/gitlab/J2/ydk-py:/ydk-py ydkdev/ydk-py
to
docker run -it -v /home/<what ever user>/Volumes/Data/Studies/PhD\Work/gitlab/J2/ydk-py:/ydk-py ydkdev/ydk-py
it will work.
Make sure you have enough permissions on directory that you are trying to mount.

Docker development workflow

What's the proper development workflow for code that runs in a Docker container?
Solomon Hykes said that the "official" workflow involves building and running a new Docker image for each Git commit. That makes sense, but what if I want to test a change before committing it to the Git repo?
I can think of two ways to do it:
Run the code on a local development server (e.g., the Django development server). Edit a file; test on the dev server; make a Git commit; rebuild the Docker image with the new code; test again on the local Docker container.
Don't run a local dev server. Instead, build and run a new Docker image each time I edit a file, and then test the change on local Docker container.
Both approaches are pretty inefficient. Is there a better way?

A more efficient way is to run a new container from the latest image that was built (which then has the latest code).
You could start that container starting a bash shell so that you will be able to edit files from inside the container:
docker run -it <some image> bash -l
You would then run the application in that container to test the new code.
Another way to alter files in that container is to start it with a volume. The idea is to alter files in a directory on the docker host instead of messing with files from the command line from the container itself:
docker run -it -v /home/joe/tmp:/data <some image>
Any file that you will put in /home/joe/tmp on your docker host will be available under /data/ in the container. Change /data to whatever path is suitable for your case and hack away.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.