I have several applications that communicate using RabbitMQ. I need to process messages from the queue in Python application, these messages are created and added to the queue by different application. Is there a way to configure Celery to process tasks that were not created in Python? Or is there some way to achieve this without using Celery?
Related
How do I get logs from different workers in a single file in Celery.
I have 3 workers running python tasks and a master node which runs the broker. I want to consolidate logs from these worker machines and store it in the master machine. How can I do that?
We have all our Celery workers on AWS EC2 instances configured to upload Celery logs to CloudWatch. This is one way to achieve what you want. It is not difficult to implement this kind of system even if you are not on AWS - all you need is an agent running on each worker machine that periodically uploads Celery logs to central place. It can even be a cron-job running a script that does the job for you.
In my Django project I implemented Celery, which is running using RabbitMQ backend.
What celery does:
Celery puts my tasks into queue, and then under certain conditions runs them. That being said, I basically interact with RabbitMQ message queue exclusively using Celery python interface.
Problem:
I want just to push simple string message into RabbitMQ queue, which should be consumed by 3rd party application.
What I tried:
There is a way how to directly connect to RabbitMQ using Pika library. However I would find it a little clunky - If I have already Celery connected to RabbitMQ, why not use it (if possible) to send simple messages to a specific queue, instead opening another connection using mentioned Pika library.
Any insights appreciated.
You cannot use Celery to send arbitrary messages to your RabbitMQ server.
However, considering that you already use RabbitMQ as a broker, which means you already have all the necesary RabbitMQ support (py-amqp supports it either directly, or via librabbitmq), you can easily send messages to the MQ server from your Celery tasks. If you for whatever reason do not like py-amqp, you may use Pika as you mentioned already.
I have a follow-on / clarification question related to an older question
I have 2 servers (for now). 1 server runs a django web application. The other server runs pure python scripts that are CRON-scheduled data acquisition & processing jobs for the web app.
There is a use case where user activity in the web application (updating a certain field) should trigger a series of actions by the backend server. I could stick with CRON but as we scale up, I can imagine running into trouble. Celery seems like a good solution except I'm unclear how to implement it. (Yes, I did read the getting started guide).
I want the web application to send tasks to a specific queue but the backend server to actually execute the work.
Assuming that both servers are using the same broker URL,
Do I need to define stub tasks in Djando or can I just use the celery.send_task method?
Should I still be using django-celery?
Meanwhile the backend server will be running Celery with the full implementation of the tasks and workers?
I decided to try it and work through any issues that came up.
On my django server, I did not use django-celery. I installed celery and redis (via pip) and followed most of the instructions in the First Steps with Django:
updated proj/proj/settings.py file to include the bare minimum of
configuration for Celery such as the BROKER_URL
created the proj/proj/celery.py file but without the task defined
at the bottom
updated the proj/proj/__init__.py file as documented
Since the server running django wasn't actually going to execute any
Celery tasks, in the view that would trigger a task, I added the
following:
from proj.celery import app as celery_app
try:
# send it to celery for backend processing
celery_app.send_task('tasks.mytask', kwargs={'some_id':obj.id,'another_att':obj.att}, queue='my-queue')
except Exception as err:
print('Issue sending task to Celery')
print err
The other server had the following installed: celery and redis (I used an AWS Elasticache redis instance for this testing).
This server had the following files:
celeryconfig.py will all of my Celery configuration and queues
defined, pointing to the same BROKER_URL as the django server
tasks.py with the actual code for all of my tasks
The celery workers were then started on this server, using the standard command: celery -A tasks worker -Q my-queue1,my-queue2
For testing, the above worked. Now I just need to make celery run in the background and optimize the number of workers/queue.
If anyone has additional comments or improvements, I'd love to hear them!
I'm using RabbitMQ to make my tasks pool run sequentially one by one. But how can add a time parameter to make a task only run at the defined time in the future (like a scheduled tasks).
RabbitMQ is not a task scheduler, even though the documentation talks about "scheduling" a task. You might consider using something like cron. You could also use a library like sched to build a scheduler in a Python process.
FYI It looks like this question has already been answered:
Delayed message in RabbitMQ
RabbitMQ has a plugin for delayed messages.
Using this plugin, the messages can be delivered to the respective queues after a certain delay. Thanks to this plugin, you can use RabbitMQ as a scheduler, even though it's not a task scheduler by nature.
You can use celery along with rabbitmq as broker for task scheduling. Here is the celery documentation http://docs.celeryproject.org/en/master/index.html
I have a webapp that's built on python/Flask and it has a corresponding background job that runs continuously, periodically polling for data for each registered user.
I would like this background job to start when the system starts and keep running til it shuts down. Instead of setting up /etc/rc.d scripts, I just had the flask app spawn a new process (using the multiprocessing module) when the app starts up.
So with this setup, I only have to deploy the Flask app and that will get the background worker running as well.
What are the downsides of this? Is this a complete and utter hack that is fragile in some way or a nice way to set up a webapp with corresponding background task?
The downside of your approach is that there are many ways it could fail especially around stopping and restarting your flask application.
You will have to deal with graceful shutdown to give your worker a chance to finish its current task.
Sometime your worker won't stop on time and might linger while you start another one when you reboot your flask application.
Here are some approches I would suggest depending on your constraints:
script + crontab
You only have to write a script that does whatever task you want and cron will take care of running it for you every few minutes. Advantages: cron will run it for you periodically and will start when the system starts. Disadvantages: if the task takes too long, you might have multiple instances of your script running at the same time. You can find some solutions for this problem here.
supervisord
supervisord is a neat way to deal with different daemons. You can set it to run your app, your background script or both and have them start with the server. Only downside is that you have to install supervisord and make sure its daemon is running when the server starts.
uwsgi
uwsgi is a very common way for deploying flask applications. It has few features that might come in handy for managing background workers.
Celery
Celery is an asynchronous task queue/job queue based on distributed message passing. It is focused on real-time operation, but supports scheduling as well. I think this is the best solution for scheduling background tasks for a flask application or any other python based application. But using it comes with some extra bulk. You will be introducing at least the following processes:
- a broker (rabbitmq or redis)
- a worker
- a scheduler
You can also get supervisord to manage all of the processes above and get them to start when the server starts.
Conclusion
In your quest of reducing the number of processes, I would highly suggest the crontab based solution as it can get you a long way. But please make sure your background script leaves an execution trace or logs of some sort.