I have a Python program which runs perfectly as a standalone program.
Time taken - 5 days
I dockerized the program and executed it with 10% of the dataset.
Docker runs and the program executes successfully.
When I use the full dataset (108K records) and build and run the new Docker image,
the container runs for 4 hours and logs the steps perfectly.
After 4 hours, no more logging is done.
When I inspect it with htop, no resources are being used.
(htop screenshot - system resource use)
docker stats also shows no resource usage.
(docker stats screenshot)
docker ps shows the container is still running.
(docker ps screenshot)
Kindly let me know what I am doing wrong.
Does Docker have any limits on running a program or logging data?
Are you running Docker directly on Linux, or are you using OSX/Windows? If the latter, you might be hitting memory limits.
If running in the cloud (AWS...), check that the machine has no expiry or anything like that. I recommend trying to run it locally first.
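If memory is the suspect, Docker records whether the kernel OOM-killed the container's main process. A quick check, sketched here with `my_container` as a placeholder for your container name or ID:

```shell
# Did the kernel OOM-kill the container, and what was its exit code?
docker inspect -f '{{.State.OOMKilled}} {{.State.ExitCode}}' my_container

# On a Linux host, the kernel log also records OOM kills
dmesg | grep -i 'killed process'
```

An exit code of 137 (128 + SIGKILL) is another common sign of an out-of-memory kill.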
I am running a Python script inside a Kubernetes pod with kubectl exec -it bash. It's a long-running script which might take a day to complete. I executed the Python script from my laptop inside the Kubernetes pod.
If I close my laptop, will the script stop running inside the pod?
If you are running Kubernetes in the cloud, the script will continue until it finishes successfully or throws an error, even if you close your laptop.
Otherwise, if you are running a local Kubernetes cluster, for example with minikube, the cluster will shut down and so will your script.
It's not possible to know the answer without at least the following information:
laptop OS (including distribution and version)
whether your k8s is running directly on your laptop or on remote hardware
I'll assume you're running Linux. If you are running a production k8s locally on your laptop (in which case, why?), then you likely have to change the settings in your desktop environment, or temporarily disable acpid, or your virtualised cluster will cease to exist when the power turns off. All of this depends entirely on your hardware and software.
If the process is running remotely (on other hardware), turning off your laptop will not make a difference to the running script. Read the man page for kubectl-exec:
-i, --stdin=false
Pass stdin to the container
-t, --tty=false
Stdin is a TTY
The arguments for an interactive shell are just about mapping stdin to the container; kubectl won't kill your remote process if your laptop turns off, loses network connectivity, etc.
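That said, if you start the script from an interactive shell opened with kubectl exec, the script is a child of that shell and may receive SIGHUP when the session drops. To be safe regardless of how the session ends, you can launch it detached; a sketch (the pod name `mypod`, script path, and log path are placeholders):

```shell
# Start the script detached from the exec session;
# nohup makes it ignore SIGHUP when the session closes
kubectl exec mypod -- sh -c 'nohup python /app/long_job.py > /tmp/long_job.log 2>&1 &'

# Later, check on it from any machine with cluster access
kubectl exec mypod -- tail -n 20 /tmp/long_job.log
```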
I am trying to run my code within a docker container hosted on an AWS EC2 machine.
It seems that PyCharm can connect to the interpreter because it can show the list of installed packages when looking at the interpreter configuration.
However, when I try to open a Python console, or when I try to run a Python script, I have the error:
3987f6fc2476:/usr/bin/python3 /opt/.pycharm_helpers/pydev/pydevconsole.py --mode=server --port=55516
Couldn't connect to console process.
Process finished with exit code 137 (interrupted by signal 9: SIGKILL)
Happy to provide more information. What is possibly going wrong here? The error seems pretty generic.
EDIT: PyCharm can start the docker container but still the Python console won't work. On the server, docker ps returns:
ecd6a7220b55 9e1ad5b17633 "/usr/bin/python3 /o…" 1 second ago Up Less than a second 22/tcp, 0.0.0.0:50219->50219/tcp dreamy_matsumoto
Turns out the issue was that PyCharm uses a random port each time it starts a Python console when connecting to a remote Docker container. If we could open all the inbound ports on the EC2 instance, this feature would work. Of course, there is nothing worse from a security perspective. Do NOT do this. (But if you really want to do it, you'll need to set up Docker over TCP.)
I am starting to get a handle on Docker and am trying to containerize some of the applications I use. Thanks to the tutorial I was able to create Docker images and containers, but now I am trying to think about the most efficient and practical ways to do things.
To present my use case, I have a Python script (let's call it process.py) that takes as input a single .jpg image, does some operations on this image, and then outputs the processed .jpg image.
Normally I would run it with:
python process.py -i path_of_the_input_image -o path_of_the_output_image
Then, the way I connect input/output with my Docker container is the following. First I create the Dockerfile:
FROM python:3.6.8
COPY . /app
WORKDIR /app
RUN pip install --upgrade pip
RUN pip install -r requirements.txt
CMD python ./process.py -i ./input_output/input.jpg -o ./input_output/output.jpg
And then, after building the image, I run docker run, mapping a local folder to the input_output folder of the container:
docker run -v C:/local_folder/:/app/input_output my_docker_image
This seems to work, but it is not really practical, as I have to create a specific local folder and mount it to the container. So here are the questions I am asking myself:
Is there a more practical way of doing things? Can I directly send a single input file to, and directly receive a single output file from, the Docker container?
When I run the Docker image, what happens (if I understand correctly) is that it creates a container that runs process.py once and then just sits there doing nothing. Even after process.py has finished, the container is still listed by "docker ps -a". Is this behaviour expected? Is there a way to automatically delete finished containers? Am I using docker run the right way?
Is there a more practical way of having a container run continuously, which I can query to run process.py on demand with a given input?
I have a python code (let's call it process.py) that takes as an input a single .jpg image, does some operations on this image, and then output the processed .jpg image.
That's most efficiently done without Docker; just run the python command you already have. If your application has interesting Python library dependencies, you can install them in a virtual environment to avoid conflicts with the system Python installation.
When I run the Docker image...
...the container runs its main command (the docker run command arguments, the Dockerfile CMD, possibly combined with an entrypoint from the same sources), and when that command exits, the container exits. It will be listed in docker ps -a output, but as "Exited" (probably with status 0 for a successful completion). You can use docker run --rm to have the container automatically delete itself.
Is there a more practical way of having a container running continuously and on which I can query to run the program process.py on demand with a given input ?
Wrap it in a network service, like a Flask application. As long as this is running, you can use a tool like curl to do an HTTP POST with the input JPEG file as the body, and get the output JPEG file as the response. Avoid using local files and Docker together whenever that's an option (prefer network I/O for process inputs and outputs; prefer a database to local-file storage).
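A minimal sketch of that idea, assuming Flask is installed; `transform` is a stand-in for whatever process.py actually does to the image:

```python
from flask import Flask, Response, request

app = Flask(__name__)

def transform(jpeg_bytes: bytes) -> bytes:
    # Placeholder: a real version would decode the JPEG, apply the
    # processing from process.py, and re-encode the result.
    return jpeg_bytes

@app.route("/process", methods=["POST"])
def process_image():
    # The request body is the raw JPEG; the response is the processed JPEG.
    return Response(transform(request.get_data()), mimetype="image/jpeg")

# Inside the container, bind to all interfaces so `docker run -p 5000:5000`
# can reach it:
# app.run(host="0.0.0.0", port=5000)
```

With the container running, something like `curl -X POST --data-binary @input.jpg http://localhost:5000/process -o output.jpg` would send one input file and receive one output file, with no volume mounts involved.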
Why are volume mounts not practical?
I would argue that Dockerising your application is not practical in itself, but you've chosen to do so for, presumably, very good reasons. Volume mounts are simply an extension of this. If you want to get data in/out of your container, the 'normal' way to do it is by using volume mounts as you have done. Sure, you could use docker cp to copy the files manually, but that's not really practical either.
As far as the process exiting goes, normally, once the main process exits, the container exits. docker ps -a shows stopped containers as well as running ones. You should see that it says Exited n minutes (hours, days, etc.) ago. This means that your container has run and exited correctly. You can remove it with docker rm <containerid>.
docker ps (no -a) will only show the running ones, btw.
If you use the --rm flag in your Docker run command, it will be removed when it exits, so you won't see it in the ps -a output. Stopped containers can be started again, but that's rather unusual.
Another solution might be to change your script to wait for incoming files and process them as they are received. Then you can leave the container running, and it will just process them as needed. If doing this, make sure that your idle loop has a sleep or something in it to ensure that you don't consume too many resources.
I have a Docker container running a bunch of Python scripts. I am using Hyper-V as the backend virtualization engine for Docker, running Docker for Windows.
The container builds just fine but when I start the container with:
docker run --memory 10240mb -it container_name
It runs the few initial operations from the file, prints out the results and then exits without an error. When I run:
docker logs --tail=50 container_id
I see just the printouts, the same as when I ran docker run. Funnily enough, the moment it exits is pretty random operation-wise (it might exit after the first 2 ops or sometimes 1 op), but it usually ends at about the same time, as if there were a timer letting it run for only 5 minutes, for example. The script runs fine on a different machine running VirtualBox and Docker Machine.
Right-click on the Docker icon in the system tray.
Click on Advanced.
Increase the memory setting to what you need; if you're not sure, try setting it somewhere close to the middle, depending on your system. You might go ahead and increase the CPU setting as well if you can.
Save your changes; Docker will restart.
Once that's done, you should be able to run your app without the --memory option.
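To confirm the new limit actually took effect, `docker info` reports the total memory available to the Docker VM (the value you see will depend on your settings):

```shell
# Total memory available to Docker, in bytes
docker info --format '{{.MemTotal}}'
```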
I am trying to build a monitoring app that constantly gets a feed from the docker stats API. I quickly noticed that whenever I try to run something like docker stats 857ff7a0403b from within Python, it does not gather the stdout and waits forever. The example Python code is below.
commands.getoutput('docker stats 857ff7a0403b')
While the above code works for running commands like docker ps and docker images, it does not work for docker stats.
Is there a way in Python to quickly grab the results and terminate the utility so that it does not wait forever?
There is a docker stats option called --no-stream that grabs the stats only once and prints them to standard out.
docker stats --no-stream 857ff7a0403b
See https://docs.docker.com/engine/reference/commandline/stats/ for more details.
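From Python, pairing `--no-stream` with `subprocess` and a timeout keeps the call from ever blocking indefinitely. A sketch (the container ID is the one from the question; the timeout value is arbitrary):

```python
import subprocess

def run_with_timeout(cmd, timeout=5.0):
    """Run a command and return its stdout; kill it if it runs too long."""
    try:
        result = subprocess.run(cmd, capture_output=True, timeout=timeout)
        return result.stdout.decode()
    except subprocess.TimeoutExpired as exc:
        # The process was killed; return whatever it managed to print.
        partial = exc.stdout or b""
        return partial.decode() if isinstance(partial, bytes) else partial

# e.g. run_with_timeout(["docker", "stats", "--no-stream", "857ff7a0403b"])
```

The timeout is a backstop: with --no-stream the command normally exits on its own after one sample.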
In a Python file, write the command below and run it:
import os
os.system("docker stats 857ff7a0403b")
In a terminal, when you type docker stats container-id and hit enter, it will show you the stats of that specific container.
In Python, the os library lets you run terminal commands from a program, just as you would type them in the terminal (useful if you want to access them through a program).