I want to know how to control the number of processes and threads in OpenVINO.
I executed the following command, referring to the documentation.
docker run --name [my_server_name] --network [my_network] -d -u $(id -u):$(id -g) -p 9000:9000 -p 8000:8000 \
[my_repository_name] --model_path /models/model1 --model_name models --port 9000 --rest_port 8000 \
--plugin_config '{"CPU_THROUGHPUT_STREAMS": "2","CPU_BIND_THREAD": "NUMA","CPU_THREADS_NUM": "3"}' --shape "(1,3,704,576)"
Although I specified '--plugin_config', none of the parameters are adopted, and 1 process with 80 threads shows up in the output of the 'ps -efL' command.
Does anyone know the cause of this result?
Do not run the Docker image in detached mode, so that you can confirm the server is running successfully.
If the server runs successfully, the parameters are adopted and shown in the serving info.
The plugin config parameters define the number of threads used by the inference engine; those are the threads dedicated to inference execution. The OpenVINO™ Model Server process will use more threads than that because some of them belong to the gRPC and REST API servers.
Those servers use a number of threads that depends on the number of available CPU cores. The number of REST server threads can be tuned with the parameter --rest_workers.
gRPC threads are not configurable at the moment; they are set to the number of CPU cores, but it is possible to pass --grpc_channel_arguments to tune the gRPC server behavior.
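For illustration, here is a sketch of the original command with an explicit REST worker count added (the value 4 is only a placeholder, not a recommendation):
docker run --name [my_server_name] --network [my_network] -d -u $(id -u):$(id -g) -p 9000:9000 -p 8000:8000 \
[my_repository_name] --model_path /models/model1 --model_name models --port 9000 --rest_port 8000 --rest_workers 4 \
--plugin_config '{"CPU_THROUGHPUT_STREAMS": "2","CPU_BIND_THREAD": "NUMA","CPU_THREADS_NUM": "3"}' --shape "(1,3,704,576)"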
Related
Setup
I have Docker installed on 9 machines, 1 manager and 8 worker nodes, connected using Docker swarm. This arrangement has been used in our development servers for ~5 years now.
I'm using this to launch a task queue that uses Celery for Python. Celery is using RabbitMQ as its broker and Redis for the results backend.
I have created an overlay network in Docker so that all my Celery workers launched by Docker swarm can reference their broker and results backend by name; i.e., rabbitmq or redis, instead of by IP address. The network was created by running the following command:
docker network create -d overlay <network_name>
The RabbitMQ service and Redis service were launched on the manager node under this overlay network using the following commands:
docker service create --network <my_overlay_network> --name redis --constraint "node.hostname==manager" redis
docker service create --network <my_overlay_network> --name rabbitmq --constraint "node.hostname==manager" rabbitmq
Once both of these have been launched, I deploy my Celery workers, one per Docker swarm worker node, on the same overlay network using the following command:
docker service create --network <my_overlay_network> --name celery-worker --constraint "node.hostname!=manager" --replicas 8 --replicas-max-per-node 1 <my_celery_worker_image>
Before someone suggests it: yes, I know I should be using a Docker Compose file to launch all of this. I'm currently testing, and I'll write one up once I can get everything working.
The Problem
The Celery workers are configured to reference their broker and backend by the container name:
app = Celery('tasks', backend='redis://redis', broker='pyamqp://guest@rabbitmq//')
Once all the services have been launched and verified by Docker, 3 of the 8 start successfully, connect to the broker and backend, and allow me to begin running tasks on them. The other 5 continuously time out when attempting to connect to RabbitMQ and report the following message:
consumer: Cannot connect to amqp://guest:**@rabbitmq:5672//: timed out.
I'm at my wits' end trying to find out why only 3 of my worker nodes allow the connection to occur while the other 5 cause a continuous timeout. All launched services are connected over the same overlay network.
The issue persists when I attempt to use brokers other than RabbitMQ, leading me to think it's not specific to any one broker; I'd likely have issues connecting to any service by name on the overlay network from the machines that are reporting the timeout. Stopping the services and launching them again always produces the same results: the same 3 nodes work while the other 5 time out.
All nodes are running the same version of Docker (19.03.4, build 9013bf583a), and the machines were created from identical images. They're virtually identical; the only difference among them is their hostnames, e.g., manager, worker1, worker2, etc.
I have been able to run this setup outside of Docker swarm (all on one machine, using a bridge network instead of an overlay) without issue while developing my application on my personal computer. I didn't experience problems until I launched everything on our development server, using the steps detailed above, to test it before pushing it to production.
Any ideas on why this is occurring and how I can remedy it? Switching from Docker swarm to Kubernetes isn't an option for me currently.
It's not the answer I wanted, but this appears to be an ongoing bug in Docker swarm. For anyone who is interested, I'll include the issue page.
https://github.com/docker/swarmkit/issues/1429
There's a workaround listed by one user there that may work for some, but your mileage may vary. It didn't work for me. The workaround is listed in the bullets below:
Don't try to use Docker for Windows to get a multi-node mesh network (swarm) running. It's simply not (yet) supported. If you google around, you'll find some Microsoft blogs saying as much; the Docker documentation also mentions it somewhere. It would be nice if the docker command itself printed an error/warning when trying to set something up under Windows that simply doesn't work. It does work on a single node, though.
Don't try to use Linux in a VirtualBox under Windows and hope to work around it that way. It, of course, doesn't work, since it has the same limitations as the underlying Windows.
Make sure you open at least ports 7946 tcp/udp and 4789 udp on the worker nodes, and additionally 2377 tcp on the manager. Use e.g. netcat -vz -u for a UDP check, and the same without -u for TCP (see the sketch after this list).
Make sure to pass --advertise-addr on the docker worker node (!) when executing the join swarm command. Put the external IP address of the worker node here, the one which has the mentioned ports open. Double-check that the ports are really open!
Use ping to check that DNS resolution for container names works. Forgetting --advertise-addr or not opening port 7946 results in DNS resolution not working on the worker nodes!
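As a rough illustration, assuming the manager's external address is <manager_ip>, the port checks from a worker node could look like:
nc -vz <manager_ip> 2377
nc -vz <manager_ip> 7946
nc -vz -u <manager_ip> 7946
nc -vz -u <manager_ip> 4789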
I suggest attempting all of the above first if you encounter the same issue. To clarify a few things in the above bullet points: the --advertise-addr flag should be used on a worker node when joining it to the swarm. If your worker node doesn't have a static IP address, you can use the name of the network interface instead. Run ifconfig to view your interfaces; you'll need the one that has your external-facing IP address. For most people this will probably be eth0, but you should still check before running the command. Doing this, the command you would issue on the worker is:
docker swarm join --advertise-addr eth0:2377 --token <your_token> <manager_ip>:2377
With 2377 being the port Docker uses. Verify that you joined with your correct IP address by going into your manager node and running the following:
docker node inspect <your_node_name>
If you don't know your node name, it should be the host name of the machine which you joined as a worker node. You can see it by running:
docker node ls
If you joined on the right interface, you will see this at the bottom of the output when running inspect:
{
  "Status": {
    "State": "ready",
    "Addr": <your_workers_external_ip_addr>
  }
}
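As a shortcut, the same address can be read directly with a Go template (assuming the field layout of your Docker version matches):
docker node inspect --format '{{ .Status.Addr }}' <your_node_name>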
If you have verified that everything has joined correctly but the issue still persists, you can try launching your services with the additional flag --dns-option use-vc when running docker service create, like so:
docker service create --dns-option use-vc --network <my_overlay> ...
Lastly, if all of the above fails for you as it did for me, then you can publish the port of the running service you wish to connect to in the swarm. In my case, I wanted to connect the services on my worker nodes to RabbitMQ and Redis on my manager node, so I published their ports. You can do this at creation by running:
docker service create -p <port>:<port> ...
Or after the service has been launched by running:
docker service update --publish-add <port>:<port> <service_name>
After this, your worker node services can connect to the manager node service by the IP address of the worker node host and the port you exposed. For example, using RabbitMQ, this would be:
pyamqp://<user>:<pass>@<worker_host_ip_addr>:<port>/<vhost>
Hopefully this helps someone who stumbles on this post.
I am using Django with Nginx and want to serve multiple requests in parallel.
We have a Docker configuration, and one pod has 10 cores. I am trying to create multiple workers in uWSGI, like (uwsgi --socket /tmp/main.sock --module main.wsgi --enable-threads --master --processes=10 --threads=1 --chmod-socket=666).
A request first lands in the view, and from there it calls a service file which does the heavy work.
Specifically, I am using the OpenCV library in the service file, which loops over all pixels to remove the colored ones (pretty time consuming...).
I also tried 1 worker with multiple threads, as
(uwsgi --socket /tmp/main.sock --module main.wsgi --enable-threads --master --processes=1 --threads=10 --chmod-socket=666).
But performance still did not improve. I think it is due to the GIL, which is held while doing the heavy processing; I'm not sure how to work around it, or how to use all the cores in some other efficient way. TIA!
I have built a base image, named centos+ssh, from a Dockerfile. In centos+ssh's Dockerfile, I use CMD to run the ssh service.
Then I want to build an image that runs another service, named rabbitmq. The Dockerfile:
FROM centos+ssh
EXPOSE 22
EXPOSE 4149
CMD /opt/mq/sbin/rabbitmq-server start
To start the rabbitmq container, run:
docker run -d -p 222:22 -p 4149:4149 rabbitmq
but the ssh service doesn't work; it seems that rabbitmq's Dockerfile CMD overrides centos's CMD.
How does CMD work inside a docker image?
If I want to run multiple services, how can I do that? Using supervisor?
You are right, the second Dockerfile will overwrite the CMD command of the first one. Docker will always run a single command, not more. So at the end of your Dockerfile, you can specify one command to run. Not more.
But you can execute both commands in one line:
FROM centos+ssh
EXPOSE 22
EXPOSE 4149
CMD service sshd start && /opt/mq/sbin/rabbitmq-server start
To make your Dockerfile a little bit cleaner, you could also put your CMD commands into an extra file:
FROM centos+ssh
EXPOSE 22
EXPOSE 4149
CMD sh /home/centos/all_your_commands.sh
With all_your_commands.sh containing something like this:
service sshd start &
/opt/mq/sbin/rabbitmq-server start
Even though CMD is written down in the Dockerfile, it really is runtime information. Just like EXPOSE, but contrary to e.g. RUN and ADD. By this, I mean that you can override it later, in an extending Dockerfile, or simply in your run command, which is what you are experiencing. At all times, there can be only one CMD.
If you want to run multiple services, I would indeed use supervisor. You can make a supervisor configuration file for each service, ADD these into a directory, and run supervisor with supervisord -c /etc/supervisor to point to a supervisor configuration file which loads all your services and looks like:
[supervisord]
nodaemon=true
[include]
files = /etc/supervisor/conf.d/*.conf
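Each service then gets its own small file under /etc/supervisor/conf.d/. As a sketch, a hypothetical entry for sshd (the program name, path, and options are only illustrative) might be:
[program:sshd]
command=/usr/sbin/sshd -D
autorestart=true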
If you would like more details, I wrote a blog on this subject here: http://blog.trifork.com/2014/03/11/using-supervisor-with-docker-to-manage-processes-supporting-image-inheritance/
While I respect the answer from qkrijger explaining how you can work around this issue, I think there is a lot more we can learn about what's going on here ...
To actually answer your question of "why" ... I think it would be helpful for you to understand how the docker stop command works, and that all processes should be shut down cleanly to prevent problems when you try to restart them (file corruption, etc.).
Problem: What if docker did start SSH from its command and started RabbitMQ from your Dockerfile? "The docker stop command attempts to stop a running container first by sending a SIGTERM signal to the root process (PID 1) in the container." Which process is docker tracking as PID 1 that will get the SIGTERM? Will it be SSH or Rabbit? "According to the Unix process model, the init process -- PID 1 -- inherits all orphaned child processes and must reap them. Most Docker containers do not have an init process that does this correctly, and as a result their containers become filled with zombie processes over time."
Answer: Docker simply takes that last CMD as the one that will get launched as the root process with PID 1 and get the SIGTERM from docker stop.
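If you want to see for yourself which process ended up as PID 1 in a running container, a quick check (assuming a Linux image with cat available; the container name is a placeholder) is:
docker exec <your_container> cat /proc/1/cmdline
Whatever it prints is the root process that docker stop will send the SIGTERM to.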
Suggested solution: You should use (or create) a base image specifically made for running more than one service, such as phusion/baseimage
It is important to note that tini exists exactly for this reason, and as of Docker 1.13 and up, tini is officially part of Docker, which tells us that running more than one process in Docker IS VALID ... so even if someone claims to be more skilled regarding Docker and insists that you are absurd for thinking of doing this, know that you are not. There are perfectly valid situations for doing so.
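For reference, on Docker 1.13+ you can ask the daemon to inject tini as PID 1 with the --init flag; a minimal sketch reusing the run command from the question:
docker run --init -d -p 222:22 -p 4149:4149 rabbitmq
tini then runs as PID 1, reaps zombies, and forwards signals to the CMD process.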
Good to know:
https://blog.phusion.nl/2015/01/20/docker-and-the-pid-1-zombie-reaping-problem/
http://www.techbar.me/stopping-docker-containers-gracefully/
https://www.ctl.io/developers/blog/post/gracefully-stopping-docker-containers/
https://github.com/phusion/baseimage-docker#docker_single_process
The official Docker answer to "Run multiple services in a container":
It explains how you can do it with an init system (systemd, sysvinit, upstart), a script (CMD ./my_wrapper_script.sh), or a supervisor like supervisord.
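For the script option, a minimal sketch of such a wrapper for the sshd + rabbitmq case above (the paths come from the question and may need adjusting) could be:
#!/bin/bash
# start the secondary service in the background
service sshd start
# exec the main service in the foreground so it receives the signal from docker stop
exec /opt/mq/sbin/rabbitmq-server start
with CMD ./my_wrapper_script.sh in the Dockerfile, as the documentation shows.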
The && workaround only works for services that start in the background (daemons) or that execute quickly without interaction and release the prompt. If you try it with an interactive service (one that keeps the prompt), only the first service will start.
To address why CMD is designed to run only one service per container, consider what would happen if the secondary servers run in the same container were not trivial / auxiliary but "major" (e.g. storage bundled with the frontend app). For starters, it would break several important containerization features, such as horizontal (auto-)scaling and rescheduling between nodes, both of which assume there is only one application (source of CPU load) per container. Then there is the issue of vulnerabilities: more servers exposed in a container means more frequent patching of CVEs...
So let's admit that it is a 'nudge' from the Docker (and Kubernetes/OpenShift) designers towards good practices, and we should not reinvent workarounds: SSH is not necessary, since we have docker exec / kubectl exec / oc rsh, which are designed to replace it.
More info
https://devops.stackexchange.com/questions/447/why-it-is-recommended-to-run-only-one-process-in-a-container
Question
How do I specify the correct address of Dask workers on a remote resource to a Dask scheduler running locally?
Situation
I have a remote resource I can ssh into. There, I have a docker container that runs an image containing all the dependencies I need to run Dask Distributed.
When run, the container executes the following:
dask-worker --nprocs 14 --nthreads 1 {inet_addr_local}:878
In the same network, but on my laptop, I run another container of the same image. In this container, I run the Dask scheduler, like so:
dask-scheduler --port 8786
When I start up the scheduler, everything is fine. When I start up the container of workers, it seems to connect to the scheduler. In the status I see the following:
Waiting to connect to: tcp://{this_matches_inet_address_of_local}:8786
On the scheduler, I see the following logged repeatedly, in a loop as it continually tries to contact/respond to each of the workers:
distributed.scheduler - INFO - Remove worker tcp://172.18.0.10:41508
distributed.scheduler - INFO - Removed worker tcp://172.18.0.10:41508
distributed.scheduler - ERROR - Failed to connect to worker 'tcp://172.18.0.10:44590': Timed out trying to connect to 'tcp://172.18.0.10:44590' after 3 s: OSError: [Errno 113] No route to host
The issue (I think) can be seen here: tcp://172.18.0.10 is incorrect. The workers are running on a resource db.foo.net that I can ssh into via me@db.foo.net.
From the scheduler container, I can see that I am able to ping db.foo.net successfully. I think the workers are assuming their address is the local address of the container they are in, and not db.foo.net. I need to override this default via some sort of configuration for the workers. I thought the --host flag would do it, but that causes Tornado to throw the following error: OSError: [Errno 99] Cannot assign requested address.
Dask workers need to be able to contact the scheduler with the address given to them. It sounds like this isn't happening for you. This could be for many reasons associated to your network. A couple of possibilities:
You've mis-typed the address (for example I noticed that you used port 878 in one place in your question and port 8786 in another)
Your network doesn't allow communication on certain ports (check with your system administrator)
Your docker containers aren't set up to publish ports externally (you may need to do some docker-wiring or use the host network explicitly)
Unfortunately there isn't much that Dask itself can do to help you identify these network issues. You might try running other services on the relevant ports and seeing if you can recreate the lack of connectivity with common tools like ping or python -m http.server 8786.
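For example, a rough connectivity check between the two hosts, assuming the scheduler should be reachable on port 8786 (the hostname is a placeholder), could be:
# on the scheduler side, listen on the port in question
python -m http.server 8786
# from the worker side, see whether the port is reachable
curl http://<scheduler_host>:8786/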
In an environment where docker containers are running inside other docker containers (by mounting the docker socket, not running as privileged), is there any way to manipulate the network to do things like:
Introduce latency
Drop % of packets
Bandwidth caps
I am only interested in docker-to-docker traffic from containers I am starting myself with docker-py (inside the environment). I do not care about manipulating other traffic such as docker to localhost or docker to internet. In many regards it would be ideal to only manipulate docker-docker network traffic.
There are a lot of ways you can do this even within a docker container when it is run under one of the following situations:
Privileged mode
Passing the --cap-add=NET_ADMIN flag at runtime
A few utilities (iptables, tc, and all sorts of libraries implemented using them) allow this. But all require higher permissions than are available in my environment, since the "host" container is not started in privileged mode.
I cannot control the system configuration. I have to run these containers inside another container, not started in privileged mode. It would be straightforward if I could change this because I could just use any of the above listed utilities.
All the containers are attached to a network created simply by docker network create foobar.
My application, written in Python 3.4, uses docker-py on OS X.
Is there any way to manipulate the docker-to-docker networking characteristics to introduce latency, packet drops, etc.?
I know this is very late, but I found this while I was looking for an answer to my own question... (it does, however, need privileges, but I thought it might be useful nonetheless).
You can introduce latency between containers using the tc command. For example, if the ping time is 5 ms, then by running the command:
tc qdisc add dev eth0 root netem delay 1000ms
the ping will now be approx. 1005 ms.
To remove the delay run the command:
tc qdisc del dev eth0 root netem
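The question also asks about packet drops and bandwidth caps; assuming the netem and tbf qdiscs are available on your kernel, rough equivalents would be:
tc qdisc add dev eth0 root netem loss 5%
tc qdisc add dev eth0 root tbf rate 1mbit burst 32kbit latency 400ms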
It's possible to simulate a complete failure of the network using the iptables command; for example, the following command will drop all traffic coming from the IP address 192.168.1.202:
iptables -A INPUT -s 192.168.1.202/255.255.255.255 -j DROP
and to unblock it again use:
iptables -D INPUT -s 192.168.1.202/255.255.255.255 -j DROP
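Since these tc and iptables commands need elevated network privileges, one option short of full privileged mode is to grant only NET_ADMIN to the container you want to shape (a sketch; the image name is a placeholder and the network name comes from the question):
docker run --cap-add=NET_ADMIN --network foobar -it <my_image> sh
Inside that container, the commands above can then be applied to its eth0 interface.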