How to consolidate Celery logs from different computers (Celery / Python)

How do I get logs from different workers into a single file in Celery?
I have three workers running Python tasks and a master node which runs the broker. I want to consolidate the logs from these worker machines and store them on the master machine. How can I do that?

We have all our Celery workers on AWS EC2 instances configured to upload their Celery logs to CloudWatch. This is one way to achieve what you want. It is not difficult to implement this kind of system even if you are not on AWS: all you need is an agent running on each worker machine that periodically uploads the Celery logs to a central place. It can even be a cron job running a script that does the work for you.
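As a rough illustration, such an agent could be as small as the sketch below. The log path, master hostname, and remote directory are placeholders, and it assumes rsync plus SSH access from each worker to the master.

# ship_celery_logs.py -- minimal sketch of a log-shipping agent, run from cron.
# Assumptions: the worker writes its Celery log to /var/log/celery/worker.log,
# the master accepts SSH as logs@master.example.com, and rsync is installed.
import socket
import subprocess

LOCAL_LOG = "/var/log/celery/worker.log"
REMOTE_DIR = "logs@master.example.com:/var/log/celery-consolidated/"

def ship_logs():
    # Prefix the uploaded file with this worker's hostname so the master
    # ends up with one consolidated file per worker machine.
    remote_name = socket.gethostname() + "-worker.log"
    subprocess.run(["rsync", "-az", LOCAL_LOG, REMOTE_DIR + remote_name], check=True)

if __name__ == "__main__":
    ship_logs()

A crontab entry such as */5 * * * * /usr/bin/python3 /opt/ship_celery_logs.py would then push the log to the master every five minutes; on the master you can tail or merge the per-worker files however you like.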

Related

Are celery tasks and queues interrupted when elastic beanstalk EC2 instances are removed?

I have a Django app running on Elastic Beanstalk in a Multicontainer Docker platform. So each EC2 instance has Docker containers for Django, Celery, RabbitMQ and Nginx.
I have concerns regarding celery tasks when an EC2 instance is removed due to an auto-scale event or an immutable deployment.
Will current tasks in the Celery queue be lost when an instance is removed?
Can a running Celery task be interrupted on an instance removal?
Celery beat schedules (cron) would be called from every new instance launched, causing duplicated calls.
I'm curious if anyone else has experience solving the above? Here's a list of some of the solutions I'm thinking about:
Change the Celery broker to a remote ElastiCache Redis instance. Not sure that would work, though.
Use another library to replace Celery which can store the tasks in the DB (e.g. huey or apscheduler).
Migrate from Celery to AWS SQS + an Elastic Beanstalk worker environment. That would mean duplicating the same codebase to be deployed to both the current web Elastic Beanstalk and a worker Elastic Beanstalk.
Any other ideas or concerns?
Do you need a separate Celery/RabbitMQ instance for every EC2 instance? Removing the RabbitMQ instance will lose the queued tasks unless you store the queue externally.
(1) and (2) can be solved with ElastiCache (because Redis is a database) and task_acks_late + prefetch = 1 (this is what we do when we use autoscaling + Celery). A singleton celery beat is generally a harder problem to solve, but there are packages that already provide this functionality (e.g., https://pypi.org/project/celery-redbeat/).
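For reference, here is a minimal sketch of those two settings (the Redis URL is a placeholder for your ElastiCache endpoint; this is not a complete production config):

# celery_app.py -- sketch of the settings mentioned above; the Redis URL is a
# placeholder for your ElastiCache endpoint.
from celery import Celery

app = Celery("myapp", broker="redis://my-elasticache-endpoint:6379/0")

# Acknowledge messages only after the task finishes, so a task running on an
# instance that gets terminated is re-delivered to another worker.
app.conf.task_acks_late = True

# Fetch one message at a time per worker process, so a scaled-in instance is
# not sitting on a prefetched backlog of unacknowledged tasks.
app.conf.worker_prefetch_multiplier = 1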

How can I run a function asynchronously to do calculations in parallel on Heroku with a Django app?

I have to run a function 500 times with different arguments at a specific time, in my Django app on the Heroku hobby plan. I need to do it in the shortest period of time possible. I noticed that when I use Heroku Scheduler, every task runs in parallel and asynchronously and each of them has its own worker. So, for example, 10 functions run this way finish in about the same time as a single run. As I mentioned, I need to run 500 functions with different arguments. I could create 500 Heroku Scheduler jobs and run them separately, but that doesn't seem to be supported by Heroku, or maybe I am wrong? If so, does anyone know how it could be solved another way?
Heroku doesn't support running that many workers at the same time on the hobby plan.
You can use Celery to run this asynchronously with the number of workers you want. The Heroku hobby plan only supports 1 worker, but Celery will at least run your tasks in the background (if that helps).
If you want to go with Celery, there's a guide to getting started with Celery on Django.
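As a rough sketch of what that can look like (the broker URL, task body, and arguments are placeholders, and REDIS_URL is assumed to come from a Heroku Redis add-on), you would define the calculation as a Celery task and enqueue all 500 calls at once; the single worker dyno then works through the queue in the background:

# tasks.py -- sketch of fanning the 500 calls out as Celery tasks.
import os
from celery import Celery

app = Celery("calc", broker=os.environ.get("REDIS_URL", "redis://localhost:6379/0"))

@app.task
def run_calculation(argument):
    # ... the real calculation for one argument goes here ...
    return argument

def enqueue_all(arguments):
    # Queue one task per argument; the worker dyno processes them with
    # whatever concurrency it was started with.
    for argument in arguments:
        run_calculation.delay(argument)

A single Heroku Scheduler job (or a Celery beat entry) could call enqueue_all(arguments) at the specific time, and the worker dyno drains the queue from there.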

Celery: run tasks created by another app

I have several applications that communicate using RabbitMQ. I need to process messages from the queue in a Python application; these messages are created and added to the queue by a different application. Is there a way to configure Celery to process tasks that were not created in Python? Or is there some way to achieve this without using Celery?

Using Celery with a Django app and a backend server

I have a follow-on / clarification question related to an older question
I have 2 servers (for now). One server runs a Django web application. The other server runs pure Python scripts that are cron-scheduled data acquisition and processing jobs for the web app.
There is a use case where user activity in the web application (updating a certain field) should trigger a series of actions by the backend server. I could stick with CRON but as we scale up, I can imagine running into trouble. Celery seems like a good solution except I'm unclear how to implement it. (Yes, I did read the getting started guide).
I want the web application to send tasks to a specific queue but the backend server to actually execute the work.
Assuming that both servers are using the same broker URL,
Do I need to define stub tasks in Django, or can I just use the celery.send_task method?
Should I still be using django-celery?
Meanwhile the backend server will be running Celery with the full implementation of the tasks and workers?
I decided to try it and work through any issues that came up.
On my django server, I did not use django-celery. I installed celery and redis (via pip) and followed most of the instructions in the First Steps with Django:
updated the proj/proj/settings.py file to include the bare minimum of configuration for Celery, such as the BROKER_URL
created the proj/proj/celery.py file, but without the task defined at the bottom (a minimal sketch of this is shown below)
updated the proj/proj/__init__.py file as documented
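As an illustration, a minimal version of those two files could look roughly like this, following the older First Steps with Django layout; the module paths, settings module, and broker settings are assumptions based on the description above:

# proj/proj/celery.py -- minimal sketch, no tasks defined on this server.
from __future__ import absolute_import
import os
from celery import Celery

os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'proj.settings')

app = Celery('proj')
# Pick up the Celery settings (such as BROKER_URL) from Django settings.
app.config_from_object('django.conf:settings')

# proj/proj/__init__.py -- as documented in the guide.
from __future__ import absolute_import
from .celery import app as celery_app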
Since the server running Django wasn't actually going to execute any Celery tasks, in the view that would trigger a task I added the following:
from proj.celery import app as celery_app

try:
    # send it to celery for backend processing
    celery_app.send_task('tasks.mytask',
                         kwargs={'some_id': obj.id, 'another_att': obj.att},
                         queue='my-queue')
except Exception as err:
    print('Issue sending task to Celery')
    print(err)
The other server had the following installed: celery and redis (I used an AWS ElastiCache Redis instance for this testing).
This server had the following files:
celeryconfig.py with all of my Celery configuration and queues defined, pointing to the same BROKER_URL as the Django server
tasks.py with the actual code for all of my tasks
The Celery workers were then started on this server using the standard command: celery -A tasks worker -Q my-queue1,my-queue2
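As an illustration, the two worker-side files could look roughly like this; the broker URL and queue names are placeholders and must match the queue used in the send_task call and the -Q option above:

# celeryconfig.py -- sketch of the worker-side configuration; BROKER_URL must
# point at the same broker the Django server uses.
from kombu import Queue

BROKER_URL = 'redis://my-elasticache-endpoint:6379/0'
CELERY_QUEUES = (
    Queue('my-queue1'),
    Queue('my-queue2'),
)

# tasks.py -- the actual task implementations live only on this server.
from celery import Celery

app = Celery('tasks')
app.config_from_object('celeryconfig')

@app.task(name='tasks.mytask')
def mytask(some_id=None, another_att=None):
    # ... the real processing for the updated object goes here ...
    return some_id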
For testing, the above worked. Now I just need to make celery run in the background and optimize the number of workers/queue.
If anyone has additional comments or improvements, I'd love to hear them!

Background Worker with Flask

I have a webapp that's built on python/Flask and it has a corresponding background job that runs continuously, periodically polling for data for each registered user.
I would like this background job to start when the system starts and keep running until it shuts down. Instead of setting up /etc/rc.d scripts, I just had the Flask app spawn a new process (using the multiprocessing module) when the app starts up.
So with this setup, I only have to deploy the Flask app and that will get the background worker running as well.
What are the downsides of this? Is this a complete and utter hack that is fragile in some way or a nice way to set up a webapp with corresponding background task?
The downside of your approach is that there are many ways it could fail, especially around stopping and restarting your Flask application.
You will have to deal with graceful shutdown to give your worker a chance to finish its current task.
Sometimes your worker won't stop in time and might linger while you start another one when you restart your Flask application.
Here are some approaches I would suggest, depending on your constraints:
script + crontab
You only have to write a script that does whatever task you want and cron will take care of running it for you every few minutes. Advantages: cron will run it for you periodically and will start when the system starts. Disadvantages: if the task takes too long, you might have multiple instances of your script running at the same time. You can find some solutions for this problem here.
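If overlapping runs are a concern, one common pattern (a sketch assuming a Linux host; the paths and schedule are placeholders) is to guard the script with a non-blocking file lock so a new cron invocation exits immediately while the previous one is still running:

# poll_users.py -- sketch of a cron-driven polling script guarded by a
# non-blocking file lock so overlapping runs exit early.
# Example crontab entry (every 5 minutes):
#   */5 * * * * /usr/bin/python3 /opt/app/poll_users.py
import fcntl
import sys

LOCK_FILE = "/tmp/poll_users.lock"

def main():
    # ... poll for data for each registered user here ...
    pass

if __name__ == "__main__":
    lock = open(LOCK_FILE, "w")
    try:
        fcntl.flock(lock, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        # A previous run is still holding the lock; let it finish.
        sys.exit(0)
    main()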
supervisord
supervisord is a neat way to deal with different daemons. You can set it to run your app, your background script, or both, and have them start with the server. The only downside is that you have to install supervisord and make sure its daemon is running when the server starts.
uwsgi
uwsgi is a very common way of deploying Flask applications. It has a few features that might come in handy for managing background workers.
Celery
Celery is an asynchronous task queue/job queue based on distributed message passing. It is focused on real-time operation, but supports scheduling as well. I think this is the best solution for scheduling background tasks for a Flask application or any other Python-based application. But using it comes with some extra bulk. You will be introducing at least the following processes:
- a broker (rabbitmq or redis)
- a worker
- a scheduler
You can also get supervisord to manage all of the processes above and get them to start when the server starts.
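As an illustration of what that minimal set of processes looks like in code (the broker URL and polling interval are placeholders), the Flask app would hand work to a Celery app like the following, with one worker process and one beat process running alongside it:

# celery_worker.py -- minimal sketch of Celery alongside a Flask app.
# Assumes a Redis broker at the placeholder URL below; run one worker process
# ("celery -A celery_worker worker") and one beat process
# ("celery -A celery_worker beat") next to the Flask app.
from celery import Celery

celery = Celery("webapp", broker="redis://localhost:6379/0")

# Poll every 60 seconds (placeholder interval) instead of spawning a
# long-running process from the Flask app itself.
celery.conf.beat_schedule = {
    "poll-registered-users": {
        "task": "celery_worker.poll_all_users",
        "schedule": 60.0,
    },
}

@celery.task
def poll_all_users():
    # ... poll for data for each registered user here ...
    pass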
Conclusion
In your quest to reduce the number of processes, I would highly suggest the crontab-based solution, as it can get you a long way. But please make sure your background script leaves an execution trace or logs of some sort.
