I'm using SQL Server and RabbitMQ as the result backend/broker for Celery workers. Everything works fine, but for future purposes we plan to use several remote workers on different machines that need to reach this broker/backend. The problem is that you need to give them direct access to your broker and database URLs, which opens up many security risks. Is there a way to give a remote Celery worker access to the remote broker/database via SSH?
SSH port forwarding seems to work, but I still have some reservations.
My plan works as follows:
Port-forward both the remote database and the broker to local ports (via autossh) on each remote Celery worker machine.
The Celery workers then consume tasks and write results to the remote database through those forwarded local ports.
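For concreteness, a rough sketch of the worker-side setup I have in mind; the hostnames, ports, and credentials below are placeholders, and the exact backend URL depends on the SQL Server driver:
# On the worker machine, forward the remote broker and database to local ports, e.g.:
#   autossh -M 0 -f -N -L 5672:localhost:5672 -L 1433:localhost:1433 me@broker-host
from celery import Celery

# The worker only ever talks to localhost; traffic travels through the SSH tunnels.
app = Celery(
    'tasks',
    broker='pyamqp://guest:guest@localhost:5672//',
    backend='db+mssql+pyodbc://user:password@localhost:1433/celery_results'
            '?driver=ODBC+Driver+17+for+SQL+Server',
)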
Is this implementation bad? Nobody seems to use remote Celery workers like this.
Any alternative approach would be appreciated.
Related
Setup
I have Docker installed on 9 machines, 1 manager and 8 worker nodes, connected using Docker swarm. This arrangement has been used on our development servers for ~5 years now.
I'm using this to launch a task queue that uses Celery for Python. Celery is using RabbitMQ as its broker and Redis for the results backend.
I have created an overlay network in Docker so that all my Celery workers launched by Docker swarm can reference their broker and results backend by name; i.e., rabbitmq or redis, instead of by IP address. The network was created by running the following command:
docker network create -d overlay <network_name>
The RabbitMQ service and Redis service were launched on the manager node under this overlay network using the following commands:
docker service create --network <my_overlay_network> --name redis --constraint "node.hostname==manager" redis
docker service create --network <my_overlay_network> --name rabbitmq --constraint "node.hostname==manager" rabbitmq
Once both of these have been launched, I deploy my Celery workers, one per Docker swarm worker node, on the same overlay network using the following command:
docker service create --network <my_overlay_network> --name celery-worker --constraint "node.hostname!=manager" --replicas 8 --replicas-max-per-node 1 <my_celery_worker_image>
Before someone suggests it: yes, I know I should be using a Docker Compose file to launch all of this. I'm currently testing, and I'll write one up once I can get everything working.
The Problem
The Celery workers are configured to reference their broker and backend by the container name:
app = Celery('tasks', backend='redis://redis', broker='pyamqp://guest@rabbitmq//')
Once all the services have been launched and verified by Docker, 3 of the 8 workers start successfully, connect to the broker and backend, and allow me to begin running tasks on them. The other 5 continuously time out when attempting to connect to RabbitMQ and report the following message:
consumer: Cannot connect to amqp://guest:**@rabbitmq:5672//: timed out.
I'm at my wits' end trying to find out why only 3 of my worker nodes allow the connection while the other 5 time out continuously. All launched services are connected over the same overlay network.
The issue persists when I attempt to use brokers other than RabbitMQ, leading me to think that it's not specific to any one broker. I'd likely have issues connecting to any service by name on the overlay network from the machines that are reporting the timeout. Stopping the services and launching them again always produces the same results - the same 3 nodes work while the other 5 time out.
All nodes are running the same version of Docker (19.03.4, build 9013bf583a), and the machines were created from identical images. They're virtually the same. The only difference among them is their hostnames, e.g., manager, worker1, worker2, etc.
I have been able to replicate this setup outside of Docker swarm (all on one machine) using a bridge network instead of an overlay network while developing my application on my personal computer, without issue. I didn't experience problems until I launched everything on our development server, using the steps detailed above, to test it before pushing it to production.
Any ideas on why this is occurring and how I can remedy it? Switching from Docker swarm to Kubernetes isn't an option for me currently.
It's not the answer I wanted, but this appears to be an ongoing bug in Docker swarm. For anyone who is interested, I'll include the issue page.
https://github.com/docker/swarmkit/issues/1429
There's a work-around listed by one user on there that may work for some, but your mileage may vary. It didn't work for me. The work-around is listed in the bullets below:
Don't try to use Docker for Windows to get a multi-node mesh network (swarm) running. It's simply not (yet) supported. If you google around, you'll find some Microsoft blogs telling about it. The Docker documentation also mentions it somewhere. It would be nice if the docker command itself printed an error/warning when trying to set something up under Windows - which simply doesn't work. It does work on a single node though.
Don't try to use Linux in a VirtualBox under Windows hoping to work around it that way. It, of course, doesn't work since it has the same limitations as the underlying Windows.
Make sure you open at least ports 7946 tcp/udp and 4789 udp for worker nodes. For the master, also 2377 tcp. Use e.g. netcat -vz -u for the udp check, and without -u for tcp.
Make sure to pass --advertise-addr on the docker worker node (!) when executing the swarm join command. Here, put the external IP address of the worker node which has the mentioned ports open. Double-check that the ports are really open!
Use ping to check that DNS resolution for container names works. Forgetting --advertise-addr or not opening port 7946 results in DNS resolution not working on worker nodes!
I suggest attempting all of the above first if you encounter the same issue. To clarify a few things in the above bullet points: the --advertise-addr flag should be used on a worker node when joining it to the swarm. If your worker node doesn't have a static IP address, you can use the interface name instead. Run ifconfig to view your interfaces. You'll need to use the interface that has your external-facing IP address. For most people this will probably be eth0, but you should still check before running the command. Doing this, the command you would issue on the worker is:
docker swarm join --advertise-addr eth0:2377 --token <your_token> <manager_ip>:2377
With 2377 being the port Docker uses. Verify that you joined with your correct IP address by going into your manager node and running the following:
docker node inspect <your_node_name>
If you don't know your node name, it should be the host name of the machine which you joined as a worker node. You can see it by running:
docker node ls
If you joined on the right interface, you will see this at the bottom of the output when running inspect:
{
    "Status": "ready",
    "Addr": <your_workers_external_ip_addr>
}
If you've verified that everything has joined correctly but the issue still persists, you can try launching your service with the additional flag --dns-option use-vc when running docker service create, as such:
docker service create --dns-option use-vc --network <my_overlay> ...
Lastly, if all of the above fails for you as it did for me, then you can publish the port of the running service you wish to connect to in the swarm. For me, I wished to connect the services on my worker nodes to RabbitMQ and Redis on my manager node, and I did so by publishing the services' ports. You can do this at creation time by running:
docker service create -p <port>:<port> ...
Or, after the service has been launched, by running:
docker service update --publish-add <port>:<port> <service_name>
After this, the services on your worker nodes can connect to the manager node's service via the IP address of the worker node's host and the port you published. For example, using RabbitMQ, this would be:
pyamqp://<user>:<pass>@<worker_host_ip_addr>:<port>/<vhost>
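As a minimal sketch (with a placeholder node IP, the default broker/backend ports published, and made-up credentials), the workers' Celery app could then be configured like this:
from celery import Celery

# 10.0.0.5 is a placeholder for a swarm node's IP; the routing mesh forwards the
# published ports 5672 (RabbitMQ) and 6379 (Redis) to the services on the manager.
app = Celery(
    'tasks',
    broker='pyamqp://guest:guest@10.0.0.5:5672//',
    backend='redis://10.0.0.5:6379/0',
)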
Hopefully this helps someone who stumbles on this post.
I have a Python application served behind a Node.js frontend. I am running it on a Linux VM on Google Cloud Platform (GCP).
The command node appname runserver 8080 starts a local server within the VM, but I am wondering what the step-by-step process would be to access it via a DNS name from the outside world.
Or, if there is a better approach to hosting Python ML applications behind a web interface, please suggest it.
You need to use forever for this.
Forever moves the Node process to the background, and the service will keep running even if you log out of the server. To access it from outside, point a DNS domain at the machine's IP address and then proxy-pass requests from port 80 to the port your service is running on.
Then you will be able to access it via the domain name.
Look for the ProxyPass directive in the Apache HTTP server. That would work for you. :D
If I'm using Luigi on a server I am connected to with ssh, is it possible to see the progress of tasks (as I can when I use Luigi locally, by looking up "localhost" in the browser)?
Any help appreciated
Short answer: yes
When you run luigid "locally", a server starts on your system which, as you mentioned, you can access at http://localhost:8082 (or whatever port you specify). To make that work on a remote server, all you need to do is run luigid on that server, then point your browser at http://<server_address>:8082 instead of http://localhost:8082 (or whatever port you configure luigid to listen on).
Question
How do I specify the correct address of Dask workers on a remote resource to a Dask scheduler running locally?
Situation
I have a remote resource I can ssh into. There, I have a Docker container that runs an image containing all the dependencies I need to run Dask Distributed.
When run, the container executes the following:
dask-worker --nprocs 14 --nthreads 1 {inet_addr_local}:878
In the same network, but on my laptop, I run another container of the same image. In this container, I run the Dask scheduler, like so:
dask-scheduler --port 8786
When I start up the scheduler, everything is fine. When I start up the container of workers, it seems to connect to the scheduler. In the status I see the following:
Waiting to connect to: tcp://{this_matches_inet_address_of_local}:8786
On the scheduler, I see the following logged repeatedly, in a loop as it continually tries to contact/respond to each of the workers:
distributed.scheduler - INFO - Remove worker tcp://172.18.0.10:41508
distributed.scheduler - INFO - Removed worker tcp://172.18.0.10:41508
distributed.scheduler - ERROR - Failed to connect to worker 'tcp://172.18.0.10:44590': Timed out trying to connect to 'tcp://172.18.0.10:44590' after 3 s: OSError: [Errno 113] No route to host
The issue (I think) can be seen here: tcp://172.18.0.10 is incorrect. The workers are running on a resource db.foo.net that I can ssh into via me@db.foo.net.
From the scheduler container, I can see that I am able to ping db.foo.net successfully. I think that the workers are assuming their address is the local address of the container they are in, and not db.foo.net. I need to override this default via some sort of configuration for the workers. I thought the --host flag would do it, but that causes Tornado to throw the following error: OSError: [Errno 99] Cannot assign requested address.
Dask workers need to be able to contact the scheduler at the address given to them. It sounds like this isn't happening for you. This could be for many reasons associated with your network. A couple of possibilities:
You've mis-typed the address (for example I noticed that you used port 878 in one place in your question and port 8786 in another)
Your network doesn't allow communication on certain ports (check with your system administrator)
Your docker containers aren't set up to publish ports externally (you may need to do some docker-wiring or use the host network explicitly)
Unfortunately there isn't much that Dask itself can do to help you identify these network issues. You might try running other services on the relevant ports and seeing if you can recreate the lack of connectivity with common tools like ping or python -m http.server 8786.
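For example (a rough sketch, with laptop.foo.net standing in for whatever scheduler address the workers were given), you could serve something else on the scheduler port and probe it from the worker's container:
# On the scheduler's machine/container, occupy the scheduler port with a dummy server:
#   python -m http.server 8786
# Then run this from the worker's container to test raw reachability of that port:
import urllib.request

try:
    urllib.request.urlopen("http://laptop.foo.net:8786", timeout=5)
    print("port 8786 is reachable from the worker container")
except OSError as exc:
    print(f"cannot reach port 8786: {exc}")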
I'm new to Python and Fabric, and I've modified a script that pings hosts on our LAN (to determine which machines are alive; we have a lot) so that it logs into the hosts and lists the running processes back to the client. While this works on servers, it seems there are other devices in the subnets that don't permit SSH logins, and the connection is refused, causing Fabric to exit with a fatal error. Is there any way to make Fabric skip any host that refuses a connection?
Using
with settings(warn_only=True)
doesn't seem to help.
Thanks.
You can set this env var or use this flag. Searching the docs is your best bet if you can't find it under a heading.
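For reference, this is most likely Fabric 1.x's skip_bad_hosts option; a minimal sketch, assuming Fabric 1.x:
# fabfile.py - sketch assuming Fabric 1.x
from fabric.api import env, run, task

env.skip_bad_hosts = True  # skip hosts whose connections fail instead of aborting
env.timeout = 3            # optionally shorten the connection timeout (seconds)

@task
def list_processes():
    run('ps aux')

# The command-line equivalent of the env setting:
#   fab --skip-bad-hosts --timeout=3 -H host1,host2 list_processes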