I'm a bit new to redis and celery. Do I need to restart celeryd and redis every time I restart apache? I'm using celery and redis with a django project hosted on webfaction.
Thanks for the info in advance.
Provided you're running Redis and Celery as daemon processes, you do not need to restart them when you restart Apache.
Generally, you will need to restart them when you make configuration changes to either Redis or Celery, since the applications depend on each other.
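For example, after a configuration change on a typical Linux box, something like the following would restart both (the service names are assumptions and vary by distro; on shared hosts like WebFaction you would use your own start/stop scripts instead):

sudo service redis-server restart
sudo service celeryd restart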
I have a Django app running on Elastic Beanstalk in a Multicontainer Docker platform. So each EC2 instance has Docker containers for Django, Celery, RabbitMQ and Nginx.
I have concerns regarding celery tasks when an EC2 instance is removed due to an auto-scale event or an immutable deployment.
Will current tasks in the celery queue be lost when an instance is removed?
Can a running celery task be interrupted on an instance removal?
Celery beat schedule (cron) would be called from every new instance launched, causing duplicated calls.
I'm curious if anyone else has experience solving the above? Here's a list of some of the solutions I'm thinking about:
Change the celery broker to a remote ElastiCache Redis instance. Not sure that would work, though.
Use another library to replace Celery which can store the tasks in the DB (e.g. huey or apscheduler).
Migrate from celery to AWS SQS + Elastic Beanstalk Worker. That would mean duplicating the same codebase to be deployed to both the current web as well as a worker Elastic Beanstalk.
Any other ideas or concerns?
Do you need a separate Celery/RabbitMQ instance for every EC2 instance? Removing the RabbitMQ instance will lose the Celery queue unless you store it externally.
(1) and (2) can be solved with ElastiCache (because Redis is a database) plus task_acks_late and prefetch = 1 (this is what we do when we use autoscaling + celery). A singleton celery beat is generally a harder problem to solve, but there are packages that already provide this functionality (e.g., https://pypi.org/project/celery-redbeat/).
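A minimal sketch of what that might look like in the Celery config (the app name, module layout, and Redis URLs here are assumptions, not part of the original setup):

# celery_app.py -- sketch assuming Celery 4+ lowercase settings and an ElastiCache Redis endpoint
from celery import Celery

app = Celery('myproject', broker='redis://my-elasticache-endpoint:6379/0')

app.conf.task_acks_late = True            # task is only acked after it finishes, so a killed instance re-queues it
app.conf.worker_prefetch_multiplier = 1   # workers don't hoard tasks they may never get to run

# celery-redbeat keeps the beat schedule (and a lock) in Redis, so only one beat runs cluster-wide:
app.conf.redbeat_redis_url = 'redis://my-elasticache-endpoint:6379/1'
# start beat with:  celery -A celery_app beat -S redbeat.RedBeatScheduler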
I have a follow-on / clarification question related to an older question
I have 2 servers (for now). 1 server runs a django web application. The other server runs pure python scripts that are CRON-scheduled data acquisition & processing jobs for the web app.
There is a use case where user activity in the web application (updating a certain field) should trigger a series of actions by the backend server. I could stick with CRON but as we scale up, I can imagine running into trouble. Celery seems like a good solution except I'm unclear how to implement it. (Yes, I did read the getting started guide).
I want the web application to send tasks to a specific queue but the backend server to actually execute the work.
Assuming that both servers are using the same broker URL,
Do I need to define stub tasks in Django or can I just use the celery.send_task method?
Should I still be using django-celery?
Meanwhile the backend server will be running Celery with the full implementation of the tasks and workers?
I decided to try it and work through any issues that came up.
On my django server, I did not use django-celery. I installed celery and redis (via pip) and followed most of the instructions in the First Steps with Django:
updated the proj/proj/settings.py file to include the bare minimum of configuration for Celery, such as the BROKER_URL
created the proj/proj/celery.py file, but without the task defined at the bottom
updated the proj/proj/__init__.py file as documented (these files are sketched below)
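A minimal sketch of that bare-minimum setup (the broker URL below is a placeholder):

# proj/proj/settings.py (only the Celery-related bit)
BROKER_URL = 'redis://my-redis-host:6379/0'

# proj/proj/celery.py -- no task defined at the bottom, since this server only sends tasks
import os
from celery import Celery

os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'proj.settings')

app = Celery('proj')
app.config_from_object('django.conf:settings')

# proj/proj/__init__.py
from proj.celery import app as celery_app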
Since the server running django wasn't actually going to execute any Celery tasks, in the view that would trigger a task, I added the following:
from proj.celery import app as celery_app

try:
    # send it to celery for backend processing
    celery_app.send_task('tasks.mytask', kwargs={'some_id': obj.id, 'another_att': obj.att}, queue='my-queue')
except Exception as err:
    print('Issue sending task to Celery')
    print(err)
The other server had the following installed: celery and redis (I used an AWS Elasticache redis instance for this testing).
This server had the following files:
celeryconfig.py with all of my Celery configuration and queues defined, pointing to the same BROKER_URL as the django server
tasks.py with the actual code for all of my tasks (both files are sketched below)
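A rough sketch of how those two files fit together (the broker URL, queue names, and task body are placeholders):

# celeryconfig.py -- plain Celery config, pointing at the same broker as the django server
from kombu import Queue

BROKER_URL = 'redis://my-elasticache-endpoint:6379/0'
CELERY_QUEUES = (
    Queue('my-queue1'),
    Queue('my-queue2'),
)

# tasks.py -- the actual task implementations live here
from celery import Celery

app = Celery('tasks')
app.config_from_object('celeryconfig')

# the explicit name matches the string the django server passes to send_task
@app.task(name='tasks.mytask')
def mytask(some_id, another_att):
    ...  # real backend processing goes here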
The celery workers were then started on this server, using the standard command: celery -A tasks worker -Q my-queue1,my-queue2
For testing, the above worked. Now I just need to make celery run in the background and optimize the number of workers/queue.
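For the running-in-the-background part, one option (node name and log/pid paths are just examples) is celery multi:

celery multi start worker1 -A tasks -Q my-queue1,my-queue2 \
    --logfile=/var/log/celery/%n.log --pidfile=/var/run/celery/%n.pid

Alternatively, a process manager such as supervisord can keep the worker running and restart it if it dies.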
If anyone has additional comments or improvements, I'd love to hear them!
I decided I need to use an asynchronous queue system and am setting up Redis/RQ/django-rq. I am wondering how I can start workers in my project.
django-rq provides a management command which is great, it looks like:
python manage.py rqworker high default low
But is it possible to start the worker when you start the Django instance, or is it something I will always have to start manually?
Thanks.
Django operates inside the request-response cycle and is started by a request, so it is a bad idea to attach such a command to Django startup.
Instead, I would recommend you look at supervisord, a process manager that can automate launching services at system start, among other things.
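A minimal supervisord program entry for that worker might look like this (all paths are assumptions about your layout):

; /etc/supervisor/conf.d/rqworker.conf
[program:rqworker]
command=/path/to/venv/bin/python manage.py rqworker high default low
directory=/path/to/project
autostart=true
autorestart=true
stopsignal=TERM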
When I host a Django project on Heroku, Heroku provides a Procfile where you can specify what to start with the project.
Here is my Procfile:
web: gunicorn RestApi.wsgi
worker: python manage.py rqworker default
So I have a Django app that occasionally sends a task to Celery for asynchronous execution. I've found that as I work on my code in development, the Django development server knows how to automatically detect when code has changed and then restart the server so I can see my changes. However, the RabbitMQ/Celery section of my app doesn't pick up on these sorts of changes in development. If I change code that will later be run in a Celery task, Celery will still keep running the old version of the code. The only way I can get it to pick up on the change is to:
stop the Celery worker
stop RabbitMQ
reset RabbitMQ
start RabbitMQ
add the user to RabbitMQ that my Django app is configured to use
set appropriate permissions for this user
restart the Celery worker
This seems like a far more drastic approach than I should have to take, however. Is there a more lightweight approach I can use?
I've found that as I work on my code in development, the Django development server knows how to automatically detect when code has changed and then restart the server so I can see my changes. However, the RabbitMQ/Celery section of my app doesn't pick up on these sorts of changes in development.
What you've described here is exactly correct and expected. Keep in mind that Python will use a module cache, so you WILL need to restart the Python interpreter before you can use the new code.
The question is "Why doesn't Celery pick up the new version?", but this is how most libraries work. The Django development server, however, is an exception: it has special code that automatically reloads Python code as necessary, effectively restarting the web server for you whenever the code changes.
Note that when you run Django in production, you probably WILL have to restart/reload your server, since you won't be using the development server in production, and most production servers don't take on the hassle of detecting file changes and auto-reloading themselves.
Finally, you shouldn't need to restart RabbitMQ. You should only have to restart the Celery worker to use the new version of the Python code. You might have to clear the queue if the new version of the code is changing the data in the message, however. For example, the Celery worker might be receiving version 1 of the message when it is expecting to receive version 2.
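In development, one lightweight option is to let the watchdog package restart the worker for you whenever the code changes (Celery's docs mention this pattern; 'proj' below stands in for your actual app):

pip install watchdog
watchmedo auto-restart -- celery -A proj worker -l info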
I am considering using celery in my project. I found a lot of information about how to use it etc. What I am interested in is how to deploy/package my solution.
I need to run two components - a django app and then a celeryd worker (a component that sends emails). For example, I would like my django app to use an email_ticket task that would email support tickets. I create tasks.py in the django app.
from celery import task

@task
def email_ticket(sender, message):  # 'from' is a reserved word in Python, so the argument is renamed here
    ...
Do I deploy my django app and then just run celeryd as separate process from the same path?
./manage.py celeryd ...
What about workers on different servers? Deploy the whole django application and run only celeryd? I understand I could use celery only for the worker, but I would like to use celerycam and celerybeat.
Any feedback is appreciated. Thanks
This is covered in the documentation here. The gist is you need to download some init scripts and set up some config. Once that's done, celeryd will start on boot and you'll be off and running.
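With the generic celeryd init script, the configuration it reads typically lives in /etc/default/celeryd and looks something like this (paths, user, and concurrency are assumptions):

# /etc/default/celeryd
CELERYD_NODES="worker1"
CELERYD_CHDIR="/path/to/django/project"
CELERYD_OPTS="--concurrency=4"
CELERYD_LOG_FILE="/var/log/celery/%n.log"
CELERYD_PID_FILE="/var/run/celery/%n.pid"
CELERYD_USER="celery"
CELERYD_GROUP="celery"

A matching celerybeat init script exists in the same script family, so the scheduler can be daemonized the same way.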