I am trying to build a notification service for a list of events. The event data lives in a database and is refreshed every few minutes by a separate mechanism. Two minutes before the next event, I need to read this database and send the data to my subscribers as a reminder that the event is about to start. These times are not fixed; they depend on the start time of the next event.
Right now I am creating a Celery worker for every user who subscribes. I make that worker sleep until the next event, at which point it wakes up and sends out the message.
Something like this:
nextEventDelay = events.getTimeToNextEventInSeconds()
sleep(nextEventDelay)
SendEventNotification()
But I know this is not a good design. It works for one or two people, but for 1000 users it would spawn 1000 workers, and that will not scale.
My proposed solution: a single worker process that monitors the database for subscribers and, when a notification is due, reads from the database and sends it out. But that takes care of only one event. Should I wrap it in an infinite loop so it keeps notifying about each next event?
I am using Celery for async task management with Redis as the broker. The application is a Python Flask application. Let me know if you need any more info. Thanks.
Using Celery beat you could run a job every x seconds that checks whether any events are within two minutes of starting, and trigger your 'reminder' jobs from that task.
Here is the documentation for periodic celery tasks.
http://docs.celeryproject.org/en/latest/userguide/periodic-tasks.html
I would suggest you stay far away from long-running Celery tasks, as I have not had a great experience with them.
Here is some untested pseudo code to get you started.
from celery import Celery

app = Celery()

@app.on_after_configure.connect
def setup_periodic_tasks(sender, **kwargs):
    # Check for upcoming events every 20 seconds
    sender.add_periodic_task(20.0, trigger_reminders.s(), name='check for upcoming events')

@app.task
def trigger_reminders(*args, **kwargs):
    upcoming_events = get_upcoming_events()
    for event in upcoming_events:
        send_notification.delay(event)

@app.task
def send_notification(*args, **kwargs):
    # Send the user notification here
    ...
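The get_upcoming_events() helper above is left to you. A minimal sketch, assuming a Flask-SQLAlchemy Event model with a start_time column (both names are placeholders for your actual schema):

from datetime import datetime, timedelta

def get_upcoming_events():
    # Events starting within the next two minutes
    now = datetime.utcnow()
    cutoff = now + timedelta(minutes=2)
    return Event.query.filter(Event.start_time.between(now, cutoff)).all()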
To access the info of a Celery task, I need the task_id. When a Celery task is started manually, I can easily get its id with task.id (and then write it to the DB or do something else).
If I use celery beat, which periodically sends tasks to the worker, that does not seem to be possible.
So my question is: how do I get the id of a task at the moment beat sends it to Celery's worker?
The moment the worker receives the task, the console shows the task id. So my worry is that at the moment the task is sent by beat to the worker, it has no task id yet.
Manual case to get task_id:
task = tasks.LongRunningTask.delay(username_from_formTargetsLaden, password_from_formTargetsLaden, url_from_formTargetsLaden)
task_id = task.id
Maybe some of you have an idea?
I found an answer for that little issue:
If you need the task id of a task that was initially sent by beat, you can simply add an inspect call to your (scheduled) worker task.
Configure Periodic-Task
This is the schedule which "reminds" Celery every day at 11:08 am (UTC) to kick off the task.
@celery.on_after_configure.connect
def setup_periodic_tasks(sender, **kwargs):
    sender.add_periodic_task(crontab(minute=8, hour=11), CheckLists.s(app.config['USR'], app.config['PWD']))
Task to be executed periodically
This is the scheduled task which will be executed by Celery after the worker receives the "reminder" from beat.
@celery.task(bind=True)
def CheckLists(self, arg1, arg2):
    # Get the task_id of the scheduled task 'CheckLists'
    i = celery.control.inspect()
    activetasks = i.active()
    list_of_tasks = {'activetasks': activetasks}
    # Adapt the worker node name ('celery@HOSTNAME') to your environment (local, webserver, etc.)
    task_id = list_of_tasks['activetasks']['celery@DESKTOP-XXXXX'][0]['id']
    task_type = "CHECK_LISTS"
    task_id_to_db = Tasks(task_id, task_type)
    db.session.add(task_id_to_db)
    db.session.commit()
    # long_running_task
    # [...more task relevant code here...]
So I'm making use of app.control.inspect, which lets you inspect running workers. It uses remote control commands under the hood.
With i.active() you will get a dictionary, which you can easily parse.
Until I find documentation on an easier way to get the task_id of a periodic task, I'll stick to that solution.
After you have saved the task id, you can easily poll the task status, e.g. via AJAX.
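For example, a minimal polling endpoint in Flask might look like this (the /task-status route is a hypothetical name; AsyncResult is standard Celery API):

from celery.result import AsyncResult

@app.route('/task-status/<task_id>')
def task_status(task_id):
    # Look up the current state of a previously stored task id
    result = AsyncResult(task_id, app=celery)
    return {'task_id': task_id, 'state': result.state}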
Hope that helps you guys :)
I need to create a periodic task that will run every five minutes to check an Azure Service Bus for new messages. Every time it runs, I would like the worker process to remain active for five minutes, whether there are new messages waiting or not. I need to specify a new queue for these tasks as well.
In a Django project with Celery installed, where do I configure this new queue and the configuration mentioned above?
I have created a new Python file containing a class that inherits from PeriodicTask, setting run_every = 300 so the task runs every 5 minutes. I would like to assign it to azure_queue, but I don't know where to configure that:
class CheckForProfileUpdates(PeriodicTask):
    run_every = 300

    def run(self, queue_name='azure_queue'):
        result = check_service_bus_for_profile_update()
        if update_was_found(result):
            update = json.loads(result.body)
            # do business logic here
I would like these tasks to execute every 5 minutes and, each time, the worker must remain active throughout those five minutes, in a dedicated queue called 'azure_queue'. Again, where do I specify these settings using Django and Celery?
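For reference, queue routing is normally declared in the Django settings module; a minimal sketch in the Celery 3.x-style uppercase settings that match the class-based PeriodicTask above (the myapp.tasks module path is a placeholder):

# settings.py
CELERY_ROUTES = {
    'myapp.tasks.CheckForProfileUpdates': {'queue': 'azure_queue'},
}

A dedicated worker can then be started against just that queue with celery -A myproject worker -Q azure_queue.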
Currently, we are using Celery & RabbitMQ to perform repeatable tasks on Ubuntu 14.04 servers and everything is working great. Celery picks up tasks from RMQ and executes the correct method. We have 12 Celery workers constantly monitoring RMQ queues. We have a new requirement where we want to execute one method in Celery only once, say once a day. Is this possible to do? I don't want to look at other technologies, as we are invested in Celery/RMQ at the moment.
Thanks in advance.
For every task, you can store a boolean value that keeps track of whether it has already been executed for the day. This data can live in the DB or in some file store.
Maintain a cron job that runs daily and resets every task's value to false (false meaning the task has not been executed that day); a sketch of such a reset task is shown after the code below.
Connect a task_prerun signal handler that returns early if the task is already done for the day, and otherwise lets task processing continue:
from django.db import models

class TaskModel(models.Model):
    task = models.CharField(max_length=200)
    is_executed = models.BooleanField(default=False)

from celery.signals import task_prerun

@task_prerun.connect
def task_setup(signal=None, sender=None, task_id=None, task=None, args=None, kwargs=None):
    # This handler runs before every Celery task
    task_obj = TaskModel.objects.get(task=task.name)
    if task_obj.is_executed:
        return
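The daily reset itself is not shown above; a minimal sketch, assuming it is triggered once a day by cron or a beat schedule (the task name is a placeholder):

from celery import shared_task

@shared_task
def reset_daily_flags():
    # Mark every tracked task as not yet executed for the new day
    TaskModel.objects.all().update(is_executed=False)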
Celery beat is made exactly for this requirement: http://docs.celeryproject.org/en/latest/userguide/periodic-tasks.html
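For example, a daily run at midnight can be declared in the beat schedule; a minimal sketch (the entry name and task path are placeholders):

from celery.schedules import crontab

app.conf.beat_schedule = {
    'run-once-a-day': {
        'task': 'tasks.my_daily_task',
        'schedule': crontab(minute=0, hour=0),  # every day at midnight
    },
}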
I found that most of the docs regarding Django Channels are about WebSockets, but I want to use Channels in a different way, and I believe that is possible.
How do I run an async periodic task using Django Channels? For example, I want to check the temperature on some website (through its API) every 15 seconds, and I need a notification when it goes above 20.
It also means that this task will live for a long time (maybe even for 3 months); is Django capable of keeping consumers alive for that long?
Thank you.
Channels is easily put to work for background tasks - see here for notes on doing so with the new version of Channels:
https://github.com/jayhale/channels-examples-bg-task
With a background task in place, you could set up a management command, called from cron, that puts your tasks in the queue at whatever period you like. The downside of cron is that it doesn't do sub-minute scheduling out of the box; some hacking is required. See: Running a cron every 30 seconds
E.g.:
Add a job to the app user's crontab:
# crontab
*/5 * * * * python manage.py cron_every_5_minutes
Your custom management command can spawn the tasks for channels:
# myapp/management/commands/cron_every_5_minutes.py
from asgiref.sync import async_to_sync
from channels.layers import get_channel_layer
from django.core.management.base import BaseCommand

class Command(BaseCommand):
    help = 'Periodic task to be run every 5 minutes'
    channel_layer = get_channel_layer()

    def handle(self, *args, **options):
        async_to_sync(self.channel_layer.send)('background-tasks', {'type': 'task_5_min'})
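The worker side that consumes the 'background-tasks' channel is covered in the linked example; for completeness, a minimal sketch assuming Channels 2 (the consumer class name is a placeholder):

# myapp/consumers.py
from channels.consumer import SyncConsumer

class BackgroundTaskConsumer(SyncConsumer):
    def task_5_min(self, message):
        # Called for each {'type': 'task_5_min'} message sent by the command;
        # e.g. query the temperature API here and notify if it exceeds 20
        ...

# myproject/routing.py
from channels.routing import ProtocolTypeRouter, ChannelNameRouter
from myapp.consumers import BackgroundTaskConsumer

application = ProtocolTypeRouter({
    'channel': ChannelNameRouter({
        'background-tasks': BackgroundTaskConsumer,
    }),
})

The worker process is then started with python manage.py runworker background-tasks.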
Regarding the expected reliability of a worker: workers can run indefinitely, but you should expect them to fail occasionally. Managing the workers is more a question of how you intend to supervise the processes (or containers, or however you are architecting things).
I have a task that runs every night and updates users from an external system. How can I prevent my server from flooding the external system with requests?
My code:
@task()
def update_users():
    # Get all users
    users = User.objects.all()
    for userobject in users:
        # Send each one to the update task
        update_user.apply_async(args=[userobject.username], countdown=15)
Is there any way to "slow down" the for loop, or is it possible to make Celery not execute a task if one is already running?
First, you need to specify what exactly you mean with "flooding" the service. Is it the fact that many requests end up being fired concurrently to one server? If that is the case, a common pattern really is to apply a pool of workers, with a fixed size N. Using this approach, there is a guarantee that your service is queried with at most N requests concurrently. That is, at any point in time, there will be no more than N outstanding requests. This effectively throttles your request rate.
You can then play with N and do some benchmarking and see which number is reasonable in your specific case.
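With Celery specifically, one way to get such a fixed-size pool is to route the per-user update task to a dedicated queue and run a single worker with concurrency N against it; a minimal sketch (the queue name, module path, and N=4 are placeholders):

# Django settings or celeryconfig
CELERY_ROUTES = {
    'myapp.tasks.update_user': {'queue': 'external_api'},
}

The worker is then started with celery -A myproject worker -Q external_api -c 4, so at most four requests are in flight against the external system at any time.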
You could use a lock on your task, forcing it to execute only once at a time across the whole pool of workers. You can check this Celery recipe; a sketch follows below.
Putting in a time.sleep won't help you, because there's a chance those tasks will execute at the same time, say if there is some delay on the queue.
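A minimal sketch of such a lock, using Django's cache in the spirit of that recipe (the lock key and timeout are placeholders):

from django.core.cache import cache

LOCK_EXPIRE = 60 * 10  # seconds; the lock expires on its own if the task dies

@task()
def update_users():
    # cache.add is atomic: it only sets the key if it does not exist yet
    if not cache.add('update_users_lock', 'locked', LOCK_EXPIRE):
        return  # another worker already holds the lock
    try:
        for userobject in User.objects.all():
            update_user.apply_async(args=[userobject.username])
    finally:
        cache.delete('update_users_lock')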
You could use time.sleep to delay your task:
import time

@task()
def update_users():
    # Get all users
    users = User.objects.all()
    for userobject in users:
        # Send each one to the update task, then pause before the next
        update_user.apply_async(args=[userobject.username], countdown=15)
        time.sleep(1)
This will pause the for loop for one second between each dispatched task.
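An alternative that avoids blocking the worker is to stagger each task's countdown instead of sleeping; a minimal sketch (the one-second spacing is arbitrary):

@task()
def update_users():
    # Schedule each update one second later than the previous,
    # without blocking this worker
    for i, userobject in enumerate(User.objects.all()):
        update_user.apply_async(args=[userobject.username], countdown=i)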