I have a Celery setup running fine, using RabbitMQ as the broker. I also have CELERY_SEND_TASK_ERROR_EMAILS=True in my settings, and I receive emails whenever an exception is thrown while executing a task, which is fine.
My question is: is there a way, with either Celery or RabbitMQ, to receive an error notification, either from Celery if the broker connection cannot be established, or from RabbitMQ itself if the running rabbitmq-server dies?
I think the right tool for this job is a process control system like supervisord, which launches/watches processes and can trigger events when those processes die or restart. More specifically, using the plugin superlance, you can send an email when a process dies.
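As a rough sketch, a supervisord configuration using superlance's crashmail listener might look like the following (the program name, command, and email address are placeholders for your own setup):

[program:celeryworker]
command=celery -A proj worker -l info
autorestart=true

[eventlistener:crashmail]
command=crashmail -a -m ops@example.com
events=PROCESS_STATE_EXITED

With this in place, crashmail sends a mail whenever a supervised process exits unexpectedly, which covers a dying worker and, if you also run rabbitmq-server under supervisord, a crashed broker as well.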
Related
I have a Docker container that listens on RabbitMQ and processes the messages it receives. I have a code pipeline that kicks off rebuilding the image and updating the tasks whenever there is a code commit.
My problem is that the container is killed abruptly while it is still processing a message. Is there any way to keep the container from being killed until the current message is finished, and only then let it stop so that a new container is created automatically? I am fine with the current container processing that message with the old code. My container runs Python code inside.
ECS by default sends a SIGTERM:
StopTask
Stops a running task.
When StopTask is called on a task, the equivalent of docker stop is
issued to the containers running in the task. This results in a
SIGTERM and a default 30-second timeout, after which SIGKILL is sent
and the containers are forcibly stopped. If the container handles the
SIGTERM gracefully and exits within 30 seconds from receiving it, no
SIGKILL is sent.
Note
The default 30-second timeout can be configured on the Amazon ECS
container agent with the ECS_CONTAINER_STOP_TIMEOUT variable. For more
information, see Amazon ECS Container Agent Configuration in the
Amazon Elastic Container Service Developer Guide.
Knowing this, you can add a simple check in your app to catch the SIGTERM, and react appropriately.
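For example, a minimal sketch in Python (consume_one_message() is a placeholder for your existing processing step):

import signal
import sys

shutting_down = False

def handle_sigterm(signum, frame):
    # ECS asked us to stop: only set a flag, the in-flight work finishes below
    global shutting_down
    shutting_down = True

signal.signal(signal.SIGTERM, handle_sigterm)

while not shutting_down:
    consume_one_message()  # placeholder for your existing processing loop

sys.exit(0)

As long as one iteration of the loop finishes within the stop timeout, the container exits cleanly and never receives the SIGKILL.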
Two ways to achieve a graceful stop in Docker:
Using docker stop
When issuing a docker stop command to a container, Docker fires a SIGTERM signal to the process inside the container, and waits for 10 seconds before cleaning up the container.
You can specify a timeout other than the default 10 seconds:
docker stop --time 30 <CONTAINER>
You need to ensure that the process handles the SIGTERM signal properly. Otherwise it will be rudely killed by a SIGKILL signal.
Using docker kill
By default the docker kill command sends a SIGKILL signal to the process. But you can specify another signal to be used:
docker kill --signal SIGQUIT <CONTAINER>
Also ensure that the process handles the specified signal properly. Unlike docker stop, the docker kill command has no timeout behavior at all.
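Applied to the original question (a container whose Python code processes RabbitMQ messages), a rough sketch of a graceful consumer using the pika client could look like this; the host, queue name, and process_message() are placeholders:

import signal
import pika

stop_requested = False

def request_stop(signum, frame):
    # only set a flag here; the in-flight message is still finished below
    global stop_requested
    stop_requested = True

signal.signal(signal.SIGTERM, request_stop)

connection = pika.BlockingConnection(pika.ConnectionParameters("rabbitmq-host"))
channel = connection.channel()

# the generator form of consume() lets us check the flag between messages
for method, properties, body in channel.consume("tasks", inactivity_timeout=1):
    if method is not None:
        process_message(body)                  # placeholder for your handler
        channel.basic_ack(method.delivery_tag)
    if stop_requested:
        break                                  # stop only between messages

channel.cancel()
connection.close()

Because the message is acknowledged only after it has been processed, even a hard kill simply means RabbitMQ redelivers the message to the next container.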
I would like to run APScheduler as part of a WSGI webapp (via Apache's mod_wsgi with 3 workers). I am new to the WSGI world, so I would appreciate it if you could resolve my doubts:
If APScheduler is part of the webapp, does it only come alive after the first request (the first one after Apache is started/restarted) handled by at least one worker? In other words, starting/restarting Apache won't start it; at least one request is needed.
What about concurrent requests: would every worker run its own copy of the APScheduler tasks, or would there be only one set shared between all workers?
Would a process (the webapp run via a worker), once running, stay alive (so APScheduler's tasks keep executing), or could it terminate after some idle time (with the consequence that APScheduler's tasks would no longer execute)?
Thank you!
You're right -- the scheduler won't start until the first request comes in.
Therefore running a scheduler in a WSGI worker is not a good idea. A better idea would be to run the scheduler in a separate process and connect to the scheduler when necessary via some RPC mechanism like RPyC or Execnet.
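As a rough sketch of that idea using RPyC (the port, module paths, and job reference below are placeholders, not a definitive implementation):

# scheduler_service.py -- runs as its own long-lived process, outside Apache
import rpyc
from rpyc.utils.server import ThreadedServer
from apscheduler.schedulers.background import BackgroundScheduler

scheduler = BackgroundScheduler()

class SchedulerService(rpyc.Service):
    def exposed_add_job(self, func, trigger, **kwargs):
        # func is passed as a textual reference, e.g. "myapp.jobs:send_report"
        return scheduler.add_job(func, trigger, **kwargs).id

    def exposed_remove_job(self, job_id):
        scheduler.remove_job(job_id)

if __name__ == "__main__":
    scheduler.start()
    ThreadedServer(SchedulerService, port=12345).start()

Inside the WSGI workers you would then connect to that one process instead of running a scheduler of your own, for example:

conn = rpyc.connect("localhost", 12345)
conn.root.add_job("myapp.jobs:send_report", "interval", minutes=30)

This keeps exactly one scheduler alive regardless of how many mod_wsgi workers Apache spawns or recycles.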
I use Celery and RabbitMQ for my project.
I have 3 servers (Main, A, B). A and B calculate the tasks from the Main server and then post the response back to it.
This is an organizational question: where do I need to install Celery and RabbitMQ?
As I understand it, RabbitMQ must be installed on the Main server (create a rabbitmq user, etc.) and Celery on the A and B servers. Or do A and B also need RabbitMQ installed?
Thanks!
There is no need to install RabbitMQ on all servers. Installing it on one server is sufficient. You just need to route tasks to the A and B servers (a sketch follows the list below).
Also, remember that AMQP is a network protocol; the producers, consumers and the broker can all reside on the same machine or on different machines. The roles are as follows.
Producer: A producer is a user application that sends messages.
Broker: A broker receives messages from producers and routes them to consumers. A broker consists of exchanges and one or more queues.
Consumer: A consumer is an application that receives messages and processes them.
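Here is a rough sketch of what that routing could look like (the task names, queue names, and broker URL are placeholders for your own project):

from celery import Celery

# the broker URL points at the Main server, where RabbitMQ is installed
app = Celery("proj", broker="amqp://user:password@main-server:5672//")

# route each task to a dedicated queue; workers on A and B consume only theirs
app.conf.task_routes = {
    "tasks.heavy_calculation": {"queue": "server_a"},
    "tasks.other_calculation": {"queue": "server_b"},
}

On server A you would then start a worker with something like celery -A proj worker -Q server_a, and on server B with -Q server_b; both use the Main server's address in their broker URL.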
We're using Celery ETA tasks to schedule tasks far (months) in the future.
We're now using the RabbitMQ backend because the Mongo backend lost such tasks on a worker restart.
Tasks with the RabbitMQ backend do seem to persist across Celery and RabbitMQ restarts, BUT revoke messages seem to be lost on RabbitMQ restarts.
I guess that if revoke messages are lost, the ETA tasks that should have been killed will execute anyway.
This may be helpful from the documentation (Persistent Revokes):
The list of revoked tasks is in-memory so if all workers restart the
list of revoked ids will also vanish. If you want to preserve this
list between restarts you need to specify a file for these to be
stored in by using the --statedb argument to celery worker:
$ celery -A proj worker -l info --statedb=/var/run/celery/worker.state
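If you run more than one worker on the same host, each needs its own state file; if I recall correctly, the same node-name expansion used for log and pid files works here too:
$ celery -A proj worker -l info --statedb=/var/run/celery/%n.state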
I've been experimenting with several WSGI servers and am unable to find a way for them to gracefully shut down. What I mean by graceful is that the server stops listen()'ing for new requests, but finishes processing all connections that have been accept()'ed. The server process then exits.
So far I have spent some time with FAPWS, Cherrypy, Tornado, and wsgiref. It seems like no matter what I do, some of the clients receive a "Connection reset by peer".
Can someone direct me to a WSGI server that handles this properly? Or does anyone know of a way to configure one of these servers to do a clean shutdown? I think my next step is to mock up a simple HTTP server that does what I want.
Apache HTTPd has a graceful-stop argument for -k that will bring down its workers only after they have completed their current requests. mod_wsgi is required to make it a WSGI container.
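For example (assuming the standard apachectl wrapper; the directive below, which caps how long Apache waits for in-flight requests, goes in your server config):

apachectl -k graceful-stop

# optional, in httpd.conf:
GracefulShutdownTimeout 30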