I have a follow-on / clarification question related to an older question
I have 2 servers (for now). 1 server runs a django web application. The other server runs pure python scripts that are CRON-scheduled data acquisition & processing jobs for the web app.
There is a use case where user activity in the web application (updating a certain field) should trigger a series of actions by the backend server. I could stick with CRON but as we scale up, I can imagine running into trouble. Celery seems like a good solution except I'm unclear how to implement it. (Yes, I did read the getting started guide).
I want the web application to send tasks to a specific queue but the backend server to actually execute the work.
Assuming that both servers are using the same broker URL,
Do I need to define stub tasks in Djando or can I just use the celery.send_task method?
Should I still be using django-celery?
Meanwhile the backend server will be running Celery with the full implementation of the tasks and workers?
I decided to try it and work through any issues that came up.
On my django server, I did not use django-celery. I installed celery and redis (via pip) and followed most of the instructions in the First Steps with Django:
updated proj/proj/settings.py file to include the bare minimum of
configuration for Celery such as the BROKER_URL
created the proj/proj/celery.py file but without the task defined
at the bottom
updated the proj/proj/__init__.py file as documented
Since the server running django wasn't actually going to execute any
Celery tasks, in the view that would trigger a task, I added the
following:
from proj.celery import app as celery_app
try:
# send it to celery for backend processing
celery_app.send_task('tasks.mytask', kwargs={'some_id':obj.id,'another_att':obj.att}, queue='my-queue')
except Exception as err:
print('Issue sending task to Celery')
print err
The other server had the following installed: celery and redis (I used an AWS Elasticache redis instance for this testing).
This server had the following files:
celeryconfig.py will all of my Celery configuration and queues
defined, pointing to the same BROKER_URL as the django server
tasks.py with the actual code for all of my tasks
The celery workers were then started on this server, using the standard command: celery -A tasks worker -Q my-queue1,my-queue2
For testing, the above worked. Now I just need to make celery run in the background and optimize the number of workers/queue.
If anyone has additional comments or improvements, I'd love to hear them!
Related
I am hoping to gain a basic understanding of scheduled task processes and why things like Celery are recommended for Flask.
My situation is a web-based tool which generates spreadsheets based on user input. I save those spreadsheets to a temp directory, and when the user clicks the "download" button, I use Flask's "send_from_directory" function to serve the file as an attachment. I need a background service to run every 15 minutes or so to clear the temp directory of all files older than 15 minutes.
My initial plan was a basic python script running in a while(True) loop, but I did some research to find what people normally do, and everything recommends Celery or other task managers. I looked into Celery and found that I also need to learn about redis, and I need to apparently host redis in a unix environment. This is a lot of trouble for a script that just deletes files every 15 minutes.
I'm developing my Flask app locally in Windows with the built-in development server and deploying to a virtual machine on company intranet with IIS. I'm learning as I go, so please explain why this much machinery is needed to regularly call a script that simply deletes things. It seems like a vast overcomplication, but as I said, I'm trying to learn as I go so I want to do/learn it correctly.
Thanks!
You wouldn't use Celery or redis for this. A cron job would be perfectly appropriate.
Celery is for jobs that need to be run asynchronously but in response to events in the main server processes. For example, if a sign up form requires sending an email notification, that would be scheduled and run via Celery so as not to block the main web response.
I am implementing an online server using:
Flask
NGINX
Celery
Celery uses:
RabbitMQ as a broker
Redis as a Result backend.
I would like to know if it is possible to use Redis as a cache to avoid doing big calculations if I receive the same request. For example, I want to answer a cache result if I receive a POST containing the same body.
If it is possible, do I have to configure it in Celery or in Redis? And how should I do it?
There are many existing extensions in the flask eco-system that let you do this easily. Including Flask-Redis
So I have a Django app that occasionally sends a task to Celery for asynchronous execution. I've found that as I work on my code in development, the Django development server knows how to automatically detect when code has changed and then restart the server so I can see my changes. However, the RabbitMQ/Celery section of my app doesn't pick up on these sorts of changes in development. If I change code that will later be run in a Celery task, Celery will still keep running the old version of the code. The only way I can get it to pick up on the change is to:
stop the Celery worker
stop RabbitMQ
reset RabbitMQ
start RabbitMQ
add the user to RabbitMQ that my Django app is configured to use
set appropriate permissions for this user
restart the Celery worker
This seems like a far more drastic approach than I should have to take, however. Is there a more lightweight approach I can use?
I've found that as I work on my code in development, the Django
development server knows how to automatically detect when code has
changed and then restart the server so I can see my changes. However,
the RabbitMQ/Celery section of my app doesn't pick up on these sorts
of changes in development.
What you've described here is exactly correct and expected. Keep in mind that Python will use a module cache, so you WILL need to restart the Python interpreter before you can use the new code.
The question is "Why doesn't Celery pick up the new version", but this is how most libraries will work. The Django development server, however, is an exception. It has special code that helps it automatically reload Python code as necessary. It basically restarts the web server without you needing to restart the web server.
Note that when you run Django in production, you probably WILL have to restart/reload your server (since you won't be using the development server in production, and most production servers don't try to take on the hassle of implementing a problematic feature of detecting file changes and auto-reloading the server).
Finally, you shouldn't need to restart RabbitMQ. You should only have to restart the Celery worker to use the new version of the Python code. You might have to clear the queue if the new version of the code is changing the data in the message, however. For example, the Celery worker might be receiving version 1 of the message when it is expecting to receive version 2.
I have a Django application which uses django-celery, celery and rabbitmq for offline, distributed processing.
Now the setup is such that I need to run the celery tasks (and in turn celery workers) in other nodes in the network (different from where the Django web app is hosted).
To do that, as I understand I will need to place all my Django code in these separate servers. Not only that, I will have to install all the other python libraries which the Django apps require.
This way I will have to transfer all the django source code to all possible servers in the network, install dependencies and run some kind of an update system which will sync all the sources across nodes.
Is this the right way of doing things? Is there a simpler way of
making the celery workers run outside the web application server
where the Django code is hosted ?
If indeed there is no way other than to copy code and replicate in
all servers, is there a way to copy only the source files which the
celery task needs (which will include all models and views - not so
small a task either)
For this type of situation I have in the past made a egg of all of my celery task code that I can simply rsync or copy in some fashion to my worker nodes. This way you can edit your celery code in a single project that can be used in your django and on your work nodes.
So in summary create a web-app-celery-tasks project and make it into an installable egg and have a web-app package that depends on the celery tasks egg.
I am considering using celery in my project. I found a lot of information about how to use it etc. What I am interested in is how to deploy/package my solution.
I need to run two components - django app and then celeryd worker (component that sends emails). For example I would like my django app to use email_ticket task that would email support tickets. I create tasks.py in the django app.
#task
def email_ticket(from, message):
...
Do I deploy my django app and then just run celeryd as separate process from the same path?
./manage.py celeryd ...
What about workers on different servers? Deploy whole django application and run only celeryd? I understand I could use celery only for the worker, but I would like to use celerycam and celerybeat.
Any feedback is appreciated. Thanks
Thanks for any feedback.
This is covered in the documentation here. The gist is you need to download some init scripts and setup some config. Once that's done celeryd will start on boot and you'll be off and running.