How to run RabbitMQ and Celery Service in Docker? - python

I am having issues running my Python Flask application after pulling its image from a remote Docker registry.
In my app I use RabbitMQ as the message broker and Celery as the task queue. Everything works as expected when I run it locally, but when I put the application in Docker and pull the image on a remote system, the app itself runs fine while Celery and RabbitMQ are not running alongside it, so every task invoked with .delay() hangs indefinitely and the HTTP request is never completed.
I need help putting my Python Flask application into Docker, since the application has asynchronous tasks that must be processed by Celery. I am not sure how to modify docker-compose.yml to include a Celery service.
Thanks in advance.

I think you need to link the Celery container with the RabbitMQ container.
From https://docs.docker.com/compose/compose-file/#links
Link to containers in another service. Either specify both the service name and a link alias (SERVICE:ALIAS), or just the service name.
links:
  - rabbitmq
or:
links:
  - rabbitmq:rabbitmq
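To directly address the original question, here is a minimal docker-compose.yml sketch, assuming the Flask app and the Celery worker are built from the same Dockerfile and read the broker URL from an environment variable. The service names web and worker, the startup commands, the port, and the credentials are illustrative placeholders, not taken from the question:
version: "3.8"
services:
  rabbitmq:
    image: rabbitmq:3-management
    environment:
      - RABBITMQ_DEFAULT_USER=myuser      # placeholder credentials
      - RABBITMQ_DEFAULT_PASS=mypass
  web:
    build: .
    command: python run.py                # however you normally start your Flask app
    ports:
      - "5000:5000"                       # adjust to the port your app listens on
    environment:
      - CELERY_BROKER_URL=pyamqp://myuser:mypass@rabbitmq:5672//
    depends_on:
      - rabbitmq
  worker:
    build: .
    command: celery -A app.celery worker --loglevel=info   # adjust -A to your Celery app module
    environment:
      - CELERY_BROKER_URL=pyamqp://myuser:mypass@rabbitmq:5672//
    depends_on:
      - rabbitmq
Note that with the version 2/3 Compose file formats, all services on the default network can already reach each other by service name, so pointing Celery at the rabbitmq hostname (plus depends_on) is usually enough even without links.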

Related

Docker Swarm Failing to Resolve DNS by Service Name With Python Celery Workers Connecting to RabbitMQ Broker Resulting in Connection Timeout

Setup
I have Docker installed and connected 9 machines, 1 manager and 8 worker nodes, using Docker swarm. This arrangement has been used in our development servers for ~5 years now.
I'm using this to launch a task queue that uses Celery for Python. Celery is using RabbitMQ as its broker and Redis for the results backend.
I have created an overlay network in Docker so that all my Celery workers launched by Docker swarm can reference their broker and results backend by name; i.e., rabbitmq or redis, instead of by IP address. The network was created by running the following command:
docker network create -d overlay <network_name>
The RabbitMQ service and Redis service were launched on the manager node under this overlay network using the following commands:
docker service create --network <my_overlay_network> --name redis --constraint "node.hostname==manager" redis
docker service create --network <my_overlay_network> --name rabbitmq --constraint "node.hostname==manager" rabbitmq
Once both of these services have been launched, I deploy my Celery workers, one on each Docker swarm worker node, on the same overlay network using the following command:
docker service create --network <my_overlay_network> --name celery-worker --constraint "node.hostname!=manager" --replicas 8 --replicas-max-per-node 1 <my_celery_worker_image>
Before someone suggests it: yes, I know I should be using a Docker Compose file to launch all of this. I'm currently testing, and I'll write one up once I can get everything working.
The Problem
The Celery workers are configured to reference their broker and backend by the container name:
app = Celery('tasks', backend='redis://redis', broker='pyamqp://guest@rabbitmq//')
Once all the services have been launched and verified by Docker, 3 of the 8 workers start successfully, connect to the broker and backend, and allow me to begin running tasks on them. The other 5 continuously time out when attempting to connect to RabbitMQ and report the following message:
consumer: Cannot connect to amqp://guest:**@rabbitmq:5672//: timed out.
I'm at my wits' end trying to find out why only 3 of my worker nodes can connect while the other 5 time out continuously. All launched services are connected over the same overlay network.
The issue persists when I attempt to use brokers other than RabbitMQ, leading me to think that it's not specific to any one broker; I'd likely have issues connecting to any service by name on the overlay network from the machines that report the timeout. Stopping the services and launching them again always produces the same result: the same 3 nodes work while the other 5 time out.
All nodes are running the same version of Docker (19.03.4, build 9013bf583a), and the machines were created from identical images. They're virtually the same; the only difference among them is their hostnames, e.g., manager, worker1, worker2, etc.
I have been able to replicate this setup outside of Docker swarm (all on one machine) by using a bridge network instead of overlay when developing my application on my personal computer without issue. I didn't experience problems until I launched everything on our development server, using the steps detailed above, to test it before pushing it to production.
Any ideas on why this is occurring and how I can remedy it? Switching from Docker swarm to Kubernetes isn't an option for me currently.
It's not the answer I wanted, but this appears to be an ongoing bug in Docker swarm. For anyone who is interested, I'll include the issue page.
https://github.com/docker/swarmkit/issues/1429
There's a workaround listed by one user there that may work for some, but your mileage may vary. It didn't work for me. The workaround is summarized in the bullets below:
- Don't try to use Docker for Windows to get a multi-node mesh network (swarm) running. It's simply not (yet) supported. If you google around, you'll find some Microsoft blogs telling about it. The Docker documentation also mentions it somewhere. It would be nice if the docker command itself printed an error/warning when trying to set something up under Windows that simply doesn't work. It does work on a single node, though.
- Don't try to use Linux in a VirtualBox VM under Windows hoping to work around this. It doesn't work, of course, since it has the same limitations as the underlying Windows host.
- Make sure you open at least ports 7946 tcp/udp and 4789 udp on the worker nodes, and 2377 tcp on the manager as well. Use e.g. netcat -vz for the tcp checks and netcat -vz -u for the udp checks (see the example checks after this list).
- Make sure to pass --advertise-addr on the docker worker node (!) when executing the swarm join command. Here, put the external IP address of the worker node that has the mentioned ports open. Double-check that the ports are really open!
- Use ping to check that DNS resolution of container names works. Forgetting --advertise-addr or not opening port 7946 results in DNS resolution not working on the worker nodes!
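As a quick sketch of those port checks, run from a worker toward the manager (the IP address is a placeholder; nc and netcat are the same tool on most distributions):
nc -vz <manager_ip> 2377      # swarm management traffic, tcp (manager only)
nc -vz <manager_ip> 7946      # node-to-node communication, tcp
nc -vzu <manager_ip> 7946     # node-to-node communication, udp
nc -vzu <manager_ip> 4789     # overlay (VXLAN) data traffic, udp
Keep in mind that udp checks with -z are not fully reliable, since udp is connectionless; checking the firewall rules on the target host is a useful complement.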
I suggest attempting all of the above first if you encounter the same issue. To clarify a few things in those bullet points: the --advertise-addr flag should be used on a worker node when joining it to the swarm. If your worker node doesn't have a static IP address, you can pass the interface name instead. Run ifconfig to view your interfaces; you'll need the interface that carries your external-facing IP address. For most people this will probably be eth0, but you should still check before running the command. Doing this, the command you would issue on the worker is:
docker swarm join --advertise-addr eth0:2377 --token <your_token> <manager_ip>:2377
With 2377 being the port Docker uses. Verify that you joined with your correct IP address by going into your manager node and running the following:
docker node inspect <your_node_name>
If you don't know your node name, it should be the host name of the machine which you joined as a worker node. You can see it by running:
docker node ls
If you joined on the right interface, you will see something like this near the bottom of the inspect output:
"Status": {
    "State": "ready",
    "Addr": "<your_workers_external_ip_addr>"
}
If you verified that everything has joined correctly but the issue still persists, you can try launching your services with the additional flag --dns-option use-vc (which forces DNS lookups over TCP) when running docker service create, like so:
docker service create --dns-option use-vc --network <my_overlay> ...
Lastly, if all of the above fails for you, as it did for me, you can publish the port of the running service you wish to connect to in the swarm. In my case I wanted the services on my worker nodes to connect to RabbitMQ and Redis on my manager node, so I published those services' ports. You can do this at creation time by running:
docker service create -p <port>:<port> ...
Or, after the service has been launched, by running:
docker service update --publish-add <port>:<port> <service_name>
After this, your worker node services can connect to the manager node service using the IP address of the worker node's host and the port you published. For example, using RabbitMQ, this would be:
pyamqp://<user>:<pass>@<worker_host_ip_addr>:<port>/<vhost>
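Put in terms of the Celery configuration shown earlier, that would look roughly like this, assuming both the RabbitMQ port 5672 and the Redis port 6379 were published (the host IP and credentials are placeholders):
from celery import Celery

app = Celery(
    'tasks',
    backend='redis://<host_ip>:6379/0',
    broker='pyamqp://<user>:<pass>@<host_ip>:5672//',
)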
Hopefully this helps someone who stumbles on this post.

How can I debug `Error while fetching server API version` when running docker?

Context
I am running Apache Airflow and trying to run a sample Docker container using Airflow's DockerOperator. I am testing with docker-compose and deploying to Kubernetes (EKS). Whenever I run my task, I receive the error ERROR - Error while fetching server API version. The error happens both with docker-compose and on EKS (Kubernetes).
I guess your Airflow Docker container is trying to launch a worker container on the same Docker host where it is itself running. To do so, you need to give Airflow's container special permissions and, as you said, access to the Docker socket. This is called Docker in Docker (DinD). There is more than one way to do it; in this tutorial three different ways are explained. It also depends on where those containers are run: Kubernetes, Docker machines, external services (like GitLab or GitHub), etc.
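As a hedged sketch of the socket approach, assuming the host's /var/run/docker.sock has been mounted into the Airflow scheduler/worker container (for docker-compose, a volume entry such as /var/run/docker.sock:/var/run/docker.sock), the task might look like this; the DAG id, image, and command are made up for illustration:
from datetime import datetime

from airflow import DAG
from airflow.providers.docker.operators.docker import DockerOperator

with DAG(
    dag_id='docker_sample',
    start_date=datetime(2021, 1, 1),
    schedule_interval=None,
    catchup=False,
) as dag:
    run_sample = DockerOperator(
        task_id='run_sample_container',
        image='alpine:3',
        command='echo hello',
        docker_url='unix://var/run/docker.sock',  # talk to the host daemon through the mounted socket
        api_version='auto',                       # version negotiation is what fails when the daemon is unreachable
    )
If the daemon is not reachable at docker_url, the docker Python client cannot negotiate the API version, which is exactly the "Error while fetching server API version" message above.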

Python celery backend/broker access via ssh

I'm using SQL Server and RabbitMQ as the result backend/broker for Celery workers. Everything works fine, but for future purposes we plan to run several remote workers on different machines that need to reach this broker/backend. The problem is that you then need to provide direct access to your broker and database URLs, which opens up many security risks. Is there a way to give a remote Celery worker access to the remote broker/database via SSH?
It seems like SSH port forwarding is working, but I still have some reservations.
My plan works as follows:
Port-forward both the remote database and the broker to local ports (using autossh) on each remote Celery worker's machine.
The Celery workers then consume tasks and write to the remote database through those forwarded local ports.
Is this implementation bad? No one seems to use remote Celery workers like this.
Any different approach will be appreciated.
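A minimal sketch of that arrangement, assuming RabbitMQ on port 5672 and SQL Server on port 1433; the hosts, users, database name, and ODBC driver string are placeholders and depend on your environment:
# On each worker machine, forward the broker and the result database to local ports first, e.g.:
#   autossh -M 0 -f -N -L 5672:127.0.0.1:5672 tunnel-user@broker-host
#   autossh -M 0 -f -N -L 1433:127.0.0.1:1433 tunnel-user@db-host
from celery import Celery

app = Celery(
    'tasks',
    # RabbitMQ reached through the forwarded local port
    broker='pyamqp://user:pass@127.0.0.1:5672//',
    # SQL Server result backend reached through its forwarded local port
    backend='db+mssql+pyodbc://user:pass@127.0.0.1:1433/celery_results'
            '?driver=ODBC+Driver+17+for+SQL+Server',
)
The main trade-offs are that each worker now depends on its tunnels staying up (hence autossh rather than plain ssh) and that the SSH accounts used for the tunnels should be restricted to port forwarding only.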

How do I tell Jenkins to run a Python Flask application, then Selenium tests, then close the Flask application?

I have a Flask application that spins up my web application to a specific port (e.g., 8080). It launches by running python run.py. I also have a Python file that runs Selenium end-to-end tests for the web app.
This is part of a GitHub repo, and I have it hooked up so that whenever I push/commit to the GitHub repo, it initiates a Jenkins test. Jenkins is using my machine as the agent to run the commands. I have a series of commands in a Pipeline on Jenkins to execute.
Ideally, the order of commands should be:
Spin up the Flask web application
Run the Selenium tests
Terminate the Flask web application
Now, my issue is this: I can use Jenkins to clone the repo to the local workspace on my computer (the agent), and then run the Python command to spin up my web app. I can use Jenkins to run a task in parallel to do the Selenium tests on the port used by the web app.
BUT I cannot figure out how to end the Flask web app once the Selenium tests are done. If I don't end the run.py execution, it is stuck in an infinite loop, endlessly serving the application on that port. I can have Jenkins automatically abort after some specified amount of time, but that flags the entire process as a failure.
Does anyone have a good idea of how to end the Flask hosting process once the Selenium tests are done? Or, if I am doing this in a block-headed way (I admit I am speaking out of copious amounts of ignorance and inexperience), is there a better way I should be handling this?
I'm not aware of an official way to cause the development server to exit under Selenium control, but the following route works with Flask 1.1.2:
from flask import request
...
if settings.DEBUG:  # only expose this endpoint in debug/test runs
    @app.route('/quit')
    def quit():
        shutdown_hook = request.environ.get('werkzeug.server.shutdown')
        if shutdown_hook is not None:
            shutdown_hook()
            return "Bye"
        return "No shutdown hook"

How to persist UWSGI server on AWS after console closed

I'm a newcomer to web applications and AWS, so please forgive me if the answer is a bit trivial!
I'm hosting a Python web application on an AWS EC2 server using nginx + uWSGI. This all works perfectly, except that when I terminate my connection (using PuTTY), my uWSGI application stops running and nginx starts returning a "502 Bad Gateway" error.
I'm aware of adding "&" to the uwsgi startup command (below), but that does not keep it running after I close my connection.
uwsgi --socket 127.0.0.1:8000 --master -s /tmp/uwsgi.sock --chmod-socket=666 -w wsgi2 &
How do I persist my uWSGI application to continue hosting my web app after I log out/terminate my connection?
Thanks in advance!
You'll typically want to start uwsgi from an init script or a process manager. There are several ways to do this, and the exact procedure depends on the Linux distribution you're using:
SystemV init script
upstart script (Ubuntu)
supervisor (Python-based process manager)
For the purposes of my use case, I will not have to reboot my machine in the near future, so Linux's nohup (no hangup) command works perfectly for this. It's a quick and dirty hack that's super powerful.
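Using the command from the question, that looks roughly like the following (output redirected to a log file so the SSH session can be closed cleanly; the flags themselves are the asker's, not a recommendation):
nohup uwsgi --socket 127.0.0.1:8000 --master -s /tmp/uwsgi.sock --chmod-socket=666 -w wsgi2 > uwsgi.log 2>&1 &
Unlike an init script or a process manager, nohup only detaches the process from the terminal; it will not restart uwsgi after a crash or a reboot.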
