I am new to Docker and still somewhat confused about containers and images. I want to use Docker for TensorFlow development. All I need is an easy way to write Jupyter notebooks and use GPU-powered TensorFlow.
I already have the latest TensorFlow Jupyter Python 3 image. I run the image with
docker run --rm --runtime=nvidia -v -it -p 8888:8888 tensorflow/tensorflow:latest-gpu-py3-jupyter
How can I make sure that the Jupyter notebooks I add and edit while working in that image don't get lost after I exit the process? I know Docker containers aren't meant to persist state, but I am so new to this that I just want something to work in with persistent data. Can someone guide me through this or point me to a resource that will answer all my prayers?
I would also like to move some files into the container that is going to be run, so that I can access some custom Python libraries that my notebooks need to import!
Side questions:
--rm removes the container (or whatever) when it exits; by default I ran without this flag and my data was still lost
-v is for volumes? I tried -v Bachelor:/app to mount a volume that way, but it apparently makes no difference. I don't know how to use the volume Bachelor that I created; instead, a multitude of unnamed, unusable volumes get created whenever I run this
-it also does something, but I have no idea what
-p is the port number, right?
Use Docker volumes:
Volumes are the preferred mechanism for persisting data generated by and used by Docker containers
Example:
docker run --runtime=nvidia -v ${SOURCE_FOLDER}:${DEST_FOLDER} -p 8888:8888 tensorflow/tensorflow:latest-gpu-py3-jupyter
Change SOURCE_FOLDER and DEST_FOLDER accordingly (use absolute paths!).
Now if you navigate to localhost:8888 and create a notebook in DEST_FOLDER, it should also be available in SOURCE_FOLDER.
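For example, assuming your notebooks live under /home/you/notebooks on the host (that path is just an illustration), and given that, if I remember correctly, the Jupyter server in these TensorFlow images works out of /tf:
docker run --runtime=nvidia -v /home/you/notebooks:/tf/notebooks -p 8888:8888 tensorflow/tensorflow:latest-gpu-py3-jupyter
Notebooks you create in the notebooks folder of the Jupyter UI then end up in /home/you/notebooks and survive the container being removed.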
As for your side questions:
-it runs a container in interactive mode. You generally add /bin/bash after the run command so that you can start an interactive bash session inside the container.
--rm automatically removes the container when it exits.
Those options aren't really necessary for your use case. Just remember to use docker ps and docker rm <ID> to clean up your container after you're done.
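For example, to find and remove a stopped container:
docker ps -a    # note the -a, so stopped containers are listed too
docker rm <ID>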
Related
I am starting to get the hang of Docker and am trying to containerize some of the applications I use. Thanks to the tutorials I was able to create Docker images and containers, but now I am trying to think about the most efficient and practical ways to do things.
To present my use case, I have a Python script (let's call it process.py) that takes as input a single .jpg image, does some operations on it, and then outputs the processed .jpg image.
Normally I would run it with:
python process.py -i path_of_the_input_image -o path_of_the_output_image
Then, the way I connect the input and output with my Docker container is the following. First I create the Dockerfile:
FROM python:3.6.8
COPY . /app
WORKDIR /app
RUN pip install --upgrade pip
RUN pip install -r requirements.txt
CMD python ./process.py -i ./input_output/input.jpg -o ./input_output/output.jpg
Then, after building the image, I run docker run, mapping a local folder to the input_output folder of the container:
docker run -v C:/local_folder/:/app/input_output my_docker_image
This seems to work, but it is not really practical, as I have to create a specific local folder just to mount it into the container. So here are the questions I am asking myself:
Is there a more practical way of doing things? Can I directly send a single input file to, and directly receive a single output file from, a Docker container?
When I run the Docker image, what happens (if I understand correctly) is that it creates a Docker container that runs process.py once and then just sits there doing nothing. Even after process.py has finished, the container is still listed by "docker ps -a". Is this behaviour expected? Is there a way to automatically delete finished containers? Am I using docker run the right way?
Is there a more practical way of having a container running continuously, which I can query to run process.py on demand with a given input?
I have a Python script (let's call it process.py) that takes as input a single .jpg image, does some operations on it, and then outputs the processed .jpg image.
That's most efficiently done without Docker; just run the python command you already have. If your application has interesting Python library dependencies, you can install them in a virtual environment to avoid conflicts with the system Python installation.
When I run the Docker image...
...the container runs its main command (the docker run command arguments or the Dockerfile CMD, possibly combined with an entrypoint from the same sources), and when that command exits, the container exits. It will still be listed in docker ps -a output, but as "Exited" (probably with status 0 for a successful completion). You can use docker run --rm to have the container automatically delete itself.
Is there a more practical way of having a container running continuously, which I can query to run process.py on demand with a given input?
Wrap it in a network service, like a Flask application. As long as this is running, you can use a tool like curl to do an HTTP POST with the input JPEG file as the body, and get the output JPEG file as the response. Avoid using local files and Docker together whenever that's an option (prefer network I/O for process inputs and outputs; prefer a database to local-file storage).
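A minimal sketch of that idea could look like the following (the file name app.py, the /process route, the port, and the way it shells out to process.py are all assumptions, not details from your project):
# app.py - rough sketch of wrapping process.py in a small HTTP service
import io
import os
import subprocess
import tempfile

from flask import Flask, request, send_file

app = Flask(__name__)

@app.route("/process", methods=["POST"])
def process():
    # Write the posted JPEG to a temp file, run the existing script on it,
    # and send the processed JPEG back as the response body.
    with tempfile.TemporaryDirectory() as tmp:
        in_path = os.path.join(tmp, "input.jpg")
        out_path = os.path.join(tmp, "output.jpg")
        with open(in_path, "wb") as f:
            f.write(request.get_data())
        subprocess.run(["python", "process.py", "-i", in_path, "-o", out_path], check=True)
        with open(out_path, "rb") as f:
            data = f.read()
    return send_file(io.BytesIO(data), mimetype="image/jpeg")

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
You would then point the Dockerfile CMD at this service instead of at process.py, add flask to requirements.txt, publish the port with -p 5000:5000, and call it with something like curl --data-binary @input.jpg http://localhost:5000/process -o output.jpg, which gives you one file in and one file out with no shared folder at all.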
Why are volume mounts not practical?
I would argue that Dockerising your application is not practical, but you've chosen to do so for, presumably very good, reasons. Volume mounts are simply an extension to this. If you want to get data in/out of your container, the 'normal' way to do this is by using volume mounts as you have done. Sure, you could use docker cp to copy the files manually, but that's not really practical either.
As far as the process exiting goes, normally, once the main process exits, the container exits. docker ps -a shows stopped containers as well as running ones. You should see that it says Exited n minutes(hours, days etc) ago. This means that your container has run and exited, correctly. You can remove it with docker rm <containerid>.
docker ps (no -a) will only show the running ones, btw.
If you use the --rm flag in your Docker run command, it will be removed when it exits, so you won't see it in the ps -a output. Stopped containers can be started again, but that's rather unusual.
Another solution might be to change your script to wait for incoming files and process them as they are received. Then you can leave the container running, and it will just process them as needed. If doing this, make sure that your idle loop has a sleep or something in it to ensure that you don't consume too many resources.
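A rough sketch of that approach, assuming the container has an input and an output folder mounted into it (the paths and the process_image stand-in are placeholders for however process.py actually works):
# watch.py - poll an input folder and process new .jpg files as they arrive
import os
import time

IN_DIR = "/app/input"    # mount host folders onto these paths with -v
OUT_DIR = "/app/output"

def process_image(in_path, out_path):
    # stand-in for the real work done by process.py; here it just copies the file
    with open(in_path, "rb") as src, open(out_path, "wb") as dst:
        dst.write(src.read())

os.makedirs(OUT_DIR, exist_ok=True)
while True:
    for name in os.listdir(IN_DIR):
        if not name.lower().endswith(".jpg"):
            continue
        process_image(os.path.join(IN_DIR, name), os.path.join(OUT_DIR, name))
        os.remove(os.path.join(IN_DIR, name))  # so the same file isn't picked up twice
    time.sleep(1)  # the sleep keeps the idle loop from burning CPU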
I want to modify files inside a Docker container with PyCharm. Is there a possibility of doing such a thing?
What you want is called bind mounting, and you can get it by adding the -v parameter to your run command. Here's an example with an nginx image:
docker run --name=nginx -d -v ~/nginxlogs:/var/log/nginx -p 5000:80 nginx
The specific parameter that obtains this result is -v.
-v ~/nginxlogs:/var/log/nginx sets up a bind-mount volume that links the /var/log/nginx directory from inside the Nginx container to the ~/nginxlogs directory on the host machine.
Docker uses a : to split the host's path from the container path, and the host path always comes first.
In other words, the files that you edit on your local filesystem are visible inside the container immediately.
Yes. There are multiple ways to do this, and you will need to have PyCharm installed inside the container.
The following set of instructions should work:
docker ps - This will show you details of running containers.
docker exec -it <name of container> /bin/bash
At this point you will have a shell inside the container. If PyCharm is not installed, you will need to install it. The following should work:
sudo apt-get install pycharm-community
Good to go!
Note: The installation is not persistent across Docker image builds. You should add the PyCharm installation step to your Dockerfile if you need to access it regularly.
I'm trying to create a container to run a program. I'm using a pre-configured image and now I need to run the program. However, it's a machine learning program and I need a dataset from my computer to run it.
The file is too large to be copied into the container. It would be best if the program running in the container looked for the dataset in a local directory on my computer, but I don't know how to do that.
Well, I made the shared folder from my machine appear in the container using docker run -it -v ~/Volumes/Data/Studies/PhD\Work/gitlab/J2/ydk-py:/ydk-py ydkdev/ydk-py, but none of the files in the ydk-py folder are shown. I understand this is the safe, usually-desired behavior, but for development and instance setup it would be immensely useful to have access to an existing file structure.
docker run with -v will automatically mount sub-directories. In your case you are using a relative path; you need to use an absolute path, as per the documentation.
So change your command from
docker run -it -v ~/Volumes/Data/Studies/PhD\Work/gitlab/J2/ydk-py:/ydk-py ydkdev/ydk-py
to
docker run -it -v /home/<what ever user>/Volumes/Data/Studies/PhD\Work/gitlab/J2/ydk-py:/ydk-py ydkdev/ydk-py
and it will work.
Make sure you have enough permissions on directory that you are trying to mount.
What's the proper development workflow for code that runs in a Docker container?
Solomon Hykes said that the "official" workflow involves building and running a new Docker image for each Git commit. That makes sense, but what if I want to test a change before committing it to the Git repo?
I can think of two ways to do it:
Run the code on a local development server (e.g., the Django development server). Edit a file; test on the dev server; make a Git commit; rebuild the Docker image with the new code; test again on the local Docker container.
Don't run a local dev server. Instead, build and run a new Docker image each time I edit a file, and then test the change on local Docker container.
Both approaches are pretty inefficient. Is there a better way?
A more efficient way is to run a new container from the latest image that was built (which then has the latest code).
You could start that container with a bash shell so that you will be able to edit files from inside the container:
docker run -it <some image> bash -l
You would then run the application in that container to test the new code.
Another way to alter files in that container is to start it with a volume. The idea is to alter files in a directory on the Docker host instead of messing with files from the command line inside the container itself:
docker run -it -v /home/joe/tmp:/data <some image>
Any file that you will put in /home/joe/tmp on your docker host will be available under /data/ in the container. Change /data to whatever path is suitable for your case and hack away.
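For example, for the Django case from the question (the paths, image name, and port are placeholders), you could mount the project and run the dev server straight from the mounted copy, so edits on the host are picked up without rebuilding the image:
docker run -it -v /home/joe/myproject:/app -w /app -p 8000:8000 <some image> python manage.py runserver 0.0.0.0:8000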
I have a Docker container running a Python server, with my local directory mounted into it (so it gets updated if I restart the container, for instance).
However, this gets tremendously hard to debug. I'm using the PyCharm Professional IDE.
I've tried following the guides on how to debug inside Docker containers, but they only show how to do it when you start the container from PyCharm. In my case I have a big Terraform setup that builds the whole environment, so I have to find a way of attaching to the container's Python interpreter or something like that.
Would anyone have any ideas or guides on this?
Thanks!
There are many details missing that would be needed to get a full view, but there are generally two ways to debug containers: 1) debug a running container and 2) debug a container image.
Debugging Container Images and Failed Builds
The latter is much easier because you can look at the history of a particular image and run any of its layers interactively.
First, we take a look at our locally built images:
$ docker images
REPOSITORY   TAG      IMAGE ID       CREATED        SIZE
<none>       <none>   77af4d6b9913   19 hours ago   1.089 GB
committ      latest   b6fa739cedf5   19 hours ago   1.089 GB
Next, we can pick a particular image and run docker history on it:
$ docker history 77af4d6b9913
IMAGE          CREATED       CREATED BY                                      SIZE       COMMENT
3e23a5875458   8 days ago    /bin/sh -c #(nop) ENV LC_ALL=C.UTF-8            0 B
8578938dd170   8 days ago    /bin/sh -c dpkg-reconfigure locales && loc      1.245 MB
be51b77efb42   8 days ago    /bin/sh -c apt-get update && apt-get install    338.3 MB
4b137612be55   6 weeks ago   /bin/sh -c #(nop) ADD jessie.tar.xz in /        121 MB
Then we can pick a layer anywhere in the history of the image and run that interactively:
$ docker run -it --rm 3e23a5875458 /bin/sh
This will dump you into a shell where you can run whatever the next command in the image-build process would be. This is super useful if your docker build command failed and you need to understand why, but it can also be useful if you just want to look at how things are set up inside a particular container (such as your Python interpreter, dependencies, PATH, etc.).
Attaching to a Running Container
This can be a little more confusing, but similarly, you can run a command inside a running container using exec. For instance, I often want to make sure my environment variables are set correctly, so I'll run something like this:
$ docker exec my_container env
You can use this to create a shell inside the running container as well:
$ docker exec -it my_container /bin/sh
This is generic stuff, but useful broadly for debugging containers.
Note: I am using /bin/sh above because a lot of small base images (like Alpine) don't have bash installed.