Django channels for asynchronous periodic tasks - python

I found that most of the docs regarding Django Channels are about WebSockets. But I want to use them in a different way, and I believe it is possible.
How do I run an async periodic task using Django Channels? For example, I want to check the temperature on some website (through its API) every 15 seconds, and I need a notification when it goes above 20.
It also means that this task will live for a long time (maybe even for 3 months). Is Django capable of keeping consumers alive that long?
Thank you.

Channels is easily put to work for background tasks - see here for notes on doing so with the new version of Channels:
https://github.com/jayhale/channels-examples-bg-task
With a background task in place, you could set up a cron job that puts your tasks in the queue at whatever period you would like. The downside to cron is that it doesn't do sub-minute scheduling out of the box - some hacking is required. See: Running a cron every 30 seconds
E.g.:
Add a job to the app user's crontab:
# crontab
*/5 * * * * python manage.py cron_every_5_minutes
Your custom management command can spawn the tasks for channels:
# myapp/management/commands/cron_every_5_minutes.py
from asgiref.sync import async_to_sync
from channels.layers import get_channel_layer
from django.core.management.base import BaseCommand

class Command(BaseCommand):
    help = 'Periodic task to be run every 5 minutes'

    def handle(self, *args, **options):
        channel_layer = get_channel_layer()
        async_to_sync(channel_layer.send)('background-tasks', {'type': 'task_5_min'})
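For completeness, here is a minimal sketch of the worker side that would receive that message. It is not from the linked example; the consumer name BackgroundTaskConsumer and the placeholder body are assumptions, using the Channels 2 SyncConsumer and ChannelNameRouter APIs:

# myapp/consumers.py (sketch)
from channels.consumer import SyncConsumer

class BackgroundTaskConsumer(SyncConsumer):
    def task_5_min(self, message):
        # Invoked for every {'type': 'task_5_min'} message sent to the
        # 'background-tasks' channel; the actual temperature check and
        # notification logic would go here.
        print('running periodic task', message)

# routing.py (sketch, Channels 2-style routing; Channels 3+ would need .as_asgi())
from channels.routing import ProtocolTypeRouter, ChannelNameRouter

application = ProtocolTypeRouter({
    'channel': ChannelNameRouter({
        'background-tasks': BackgroundTaskConsumer,
    }),
})

The worker process consuming the 'background-tasks' channel is then started with python manage.py runworker background-tasks.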
Regarding the expected reliability of a worker - they can run indefinitely, but you should expect them to occasionally fail. Managing the workers is more of a question of how you intend to supervise the processes (or containers, or however you are architecting).

Related

How to Inspect the Queue Processing a Celery Task

I'm currently leveraging Celery for periodic tasks. I am new to Celery. I have two workers running two different queues: one for slow background jobs and one for jobs users queue up in the application.
I am monitoring my tasks on Datadog because it's an easy way to confirm my workers are running appropriately.
What I want to do is after each task completes, record which queue the task was completed on.
@after_task_publish.connect()
def on_task_publish(sender=None, headers=None, body=None, **kwargs):
    statsd.increment("celery.on_task_publish.start.increment")

    task = celery.tasks.get(sender)
    queue_name = task.queue

    statsd.increment("celery.on_task_publish.increment", tags=[f"{queue_name}:{task}"])
The above function is something that I implemented after researching the Celery docs and some StackOverflow posts, but it's not working as intended: I get the first statsd increment, but the remaining code does not execute.
I am wondering if there is a simpler way to inspect inside/after each task completes, what queue processed the task.
Since your question asks whether there is a way to inspect, inside or after each task, which queue processed it - I'm assuming you haven't tried the result backend yet. You could check out this feature, which is provided by Celery itself: the result backend (task result backend).
It is very useful for storing the results of your Celery tasks.
Read through this => https://docs.celeryproject.org/en/stable/userguide/configuration.html#task-result-backend-settings
Once you have an idea of how to set up the result backend, search for the result_extended key (in the same link) to be able to include queue names in your stored task results.
A number of backends are available - you can have the results go to any of these:
SQL DB / NoSQL DB / S3 / Azure / Elasticsearch / etc.
I have made use of this result-backend feature with Elasticsearch to store my task results.
It is just a matter of adding a few configuration entries to your settings.py file as per your requirements. It worked really well for my application. I also have a weekly cron that clears only the successful task results - since we don't need those anymore - so that only failed results remain visible.
These were the main keys for my requirement: task_track_started and task_acks_late, along with result_backend.
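As a rough sketch of what those settings.py entries can look like (assuming the project loads Celery settings from Django settings with the CELERY_ namespace; the Elasticsearch URL and index name below are placeholders, not taken from the answer):

# settings.py - result-backend sketch; values are placeholders
CELERY_RESULT_BACKEND = 'elasticsearch://localhost:9200/celery_results/task'
CELERY_RESULT_EXTENDED = True      # store task name, args and routing/queue info with each result
CELERY_TASK_TRACK_STARTED = True   # record the STARTED state as well
CELERY_TASK_ACKS_LATE = True       # acknowledge only after the task has run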

How do you create a new celery queue for a periodic task for a django application?

I need to create a periodic task that will run every five minutes to check an azure service bus for new messages. Every time it runs, I would like the worker process to remain active for five minutes, whether there are new messages waiting or not. I need to specify a new queue for these tasks as well.
In a Django project with Celery installed, where do I configure this new queue and the configuration described above?
I have created a new Python file containing a class that inherits from PeriodicTask, setting run_every = 300 to run the task every 5 minutes. I would like to assign it to azure_queue, but I don't know where to configure that.
class CheckForProfileUpdates(PeriodicTask):
    run_every = 300

    def run(self, queue_name='azure_queue'):
        result = check_service_bus_for_profile_update()
        if update_was_found(result):
            update = json.loads(result.body)
            # do business logic here
I would like these tasks to execute every 5 minutes, and each time, the worker must remain active throughout those five minutes, in a dedicated queue called 'azure_queue'. Again, where do I specify these settings using django and celery?
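For reference, here is a hedged sketch of how a task is usually pointed at a dedicated queue in Celery 3.x Django settings; the task path myapp.tasks.CheckForProfileUpdates and the queue name azure_queue follow the question's wording, and this is an illustration rather than a verified configuration:

# settings.py - sketch only
CELERY_ROUTES = {
    'myapp.tasks.CheckForProfileUpdates': {'queue': 'azure_queue'},
}

A worker bound to that queue can then be started with celery -A proj worker -Q azure_queue, and celery beat enqueues the task according to its run_every attribute.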

Is this the right approach to run long running async tasks?

I am trying to come up with a notification service for a list of events for which the data is available in the database every few minutes and gets updated using some mechanism. Two minutes before the next event, I need to read this database and send out the data to my subscribers as a reminder that the event is about to start. These times are not fixed; they depend on the event time of the next event.
Right now I am creating a Celery worker for every user who subscribes. I make that specific Celery worker go to sleep until the next event, at which point it resumes and sends out the message.
Something like this:
nextEventDelay = events.getTimeToNextEventInSeconds()
sleep(nextEventDelay)
SendEventNotification()
But I know it is not good. For one or two people it works, but for 1000 users, spawning 1000 workers will not scale.
So my solution? I am thinking of creating a single worker process that monitors the database for subscribers and, once a notification is due, reads from the database and sends it to them. But this takes care of only one event. Should I keep this in an infinite loop to notify about the next event?
I am using Celery for async task management with Redis. The application is a Python Flask application. Let me know if you need any more info. Thanks.
Using celery beat you could run a job every x seconds to check whether any events are within two minutes of starting. You could then trigger your 'reminder' jobs from that task.
Here is the documentation for periodic celery tasks.
http://docs.celeryproject.org/en/latest/userguide/periodic-tasks.html
I would suggest you stay far away from long running celery tasks as I have not had a great experience with them.
Here is some untested pseudo code to get you started.
from celery import Celery
from celery.schedules import crontab

app = Celery()

@app.on_after_configure.connect
def setup_periodic_tasks(sender, **kwargs):
    # Check for upcoming events every 20 seconds
    sender.add_periodic_task(20.0, trigger_reminders.s(), name='check for upcoming events')

@app.task
def trigger_reminders(*args, **kwargs):
    upcoming_events = get_upcoming_events()
    for event in upcoming_events:
        send_notification.delay(event)

@app.task
def send_notification(event):
    # Send the user notification for this event
    ...

External API RabbitMQ and Celery rate limit

I'm using an external REST API which limits my API requests to 1 call per second (CPS).
This is the architecture:
Versions:
Flask
RabbitMQ 3.6.4
amqp 1.4.9
kombu 3.0.35
Celery 3.1.23
Python 2.7
The API client sends a web request to the internal API, which processes the request and controls the rate at which it is sent to RabbitMQ. These tasks can take from 5 to 120 seconds, and there are situations in which tasks queue up and get sent to the external API at a higher rate than the one defined, resulting in numerous failed requests (around 5% of requests fail).
Possible solutions:
Increase External API limit
Add more workers
Keep track of failed tasks and retry them later
Although those solutions may work, they don't really address the implementation of my rate limiter or control the actual rate at which my workers can process the API requests, which is what I ultimately need to do.
I believe that if I could control the rate at which RabbitMQ delivers messages to the workers, that would be a better option. I found the RabbitMQ prefetch option, but can anyone recommend other options to control the rate at which messages are sent to consumers?
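For context, the prefetch knob mentioned above is exposed through Celery's worker settings; a minimal sketch using Celery 3.x setting names (an assumption based on the versions listed, not part of the question):

# celeryconfig.py - sketch only (Celery 3.x setting names)
CELERYD_PREFETCH_MULTIPLIER = 1   # each worker process reserves only one message at a time
CELERY_ACKS_LATE = True           # ack after the task finishes, so prefetch tracks in-flight work

This smooths out bursts but is still not a true global rate limit.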
You will need to create your own rate limiter as Celery's rate limit only works per-worker and "does not work as you expect it to".
I have personally found it completely breaks when trying to add new tasks from another task.
I think the range of requirements for rate limiting is too wide and depends on the application itself, so Celery's implementation is intentionally simple.
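For reference, this is the built-in per-worker option being discussed - a minimal sketch, with the app name and the call_external_api task being hypothetical placeholders:

from celery import Celery

app = Celery('api_client', broker='amqp://localhost')

@app.task(rate_limit='1/s')   # enforced per worker instance, not globally across workers
def call_external_api(payload):
    ...  # the external REST call would go here

With several workers, the effective rate against the external API becomes a multiple of this, which is why a shared store such as Redis is used in the example below.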
Here is an example I've created using Celery + Django + Redis.
Basically it adds a convenience method to your app.Task class which keeps track of your task execution rate in Redis. If the rate is too high, the task will retry at a later time.
This example uses sending an email over SMTP, but it can easily be adapted to API calls.
The algorithm is inspired by Figma https://www.figma.com/blog/an-alternative-approach-to-rate-limiting/
https://gist.github.com/Vigrond/2bbea9be6413415e5479998e79a1b11a
# Rate limiting with Celery + Django + Redis
# Multiple Fixed Windows Algorithm inspired by Figma https://www.figma.com/blog/an-alternative-approach-to-rate-limiting/
# and Celery's sometimes ambiguous, vague, and one-paragraph documentation
#
# Celery's Task is subclassed and the is_rate_okay function is added
# celery.py or however your App is implemented in Django
import os
import math
import time
from celery import Celery, Task
from django_redis import get_redis_connection
from django.conf import settings
from django.utils import timezone
app = Celery('your_app')
# Get Redis connection from our Django 'default' cache setting
redis_conn = get_redis_connection("default")
# We subclass the Celery Task
class YourAppTask(Task):
    def is_rate_okay(self, times=30, per=60):
        """
        Checks to see if this task is hitting our defined rate limit too much.
        This example sets a rate limit of 30/minute.

        times (int): The "30" in "30 times per 60 seconds".
        per (int): The "60" in "30 times per 60 seconds".

        The Redis structure we create is a Hash of timestamp keys with counter values:
        {
            '1560649027.515933': '2',  # unlikely to have more than 1
            '1560649352.462433': '1',
        }
        The Redis key is expired after the amount of 'per' has elapsed.
        The algorithm totals the counters and checks against 'times'.

        This algorithm currently does not implement the "leniency" described
        at the bottom of the Figma article referenced at the top of this code.
        This is left up to you and depends on the application.

        Returns True if under the limit, otherwise False.
        """
        # Get a timestamp accurate to the microsecond
        timestamp = timezone.now().timestamp()

        # Set our Redis key to our task name
        key = f"rate:{self.name}"

        # Create a pipeline to execute redis code atomically
        pipe = redis_conn.pipeline()

        # Increment our current task hit in the Redis hash
        pipe.hincrby(key, timestamp)

        # Grab the current expiration of our task key
        pipe.ttl(key)

        # Grab all of our task hits in our current frame (of 'per' seconds)
        pipe.hvals(key)

        # This returns a list of our command results: [current task hits, expiration, list of all task hits]
        result = pipe.execute()

        # If our expiration is not set, set it. This is not part of the atomicity of the pipeline above.
        if result[1] < 0:
            redis_conn.expire(key, per)

        # We must convert bytes to int before adding up the counters and comparing to our limit
        if sum([int(count) for count in result[2]]) <= times:
            return True
        else:
            return False
app.Task = YourAppTask
app.config_from_object('django.conf:settings', namespace='CELERY')
app.autodiscover_tasks()
...
# SMTP Example
import random
from YourApp.celery import app
from django.core.mail import EmailMessage
# We set infinite max_retries so backlogged email tasks do not disappear
@app.task(name='smtp.send-email', max_retries=None, bind=True)
def send_email(self, to_address):
    if not self.is_rate_okay():
        # We implement a random countdown between 30 and 60 seconds
        # so tasks don't come flooding back at the same time
        raise self.retry(countdown=random.randint(30, 60))

    message = EmailMessage(
        'Hello',
        'Body goes here',
        'from@yourdomain.com',
        [to_address],
    )
    message.send()

How to dynamically add a scheduled task to Celery beat

Using Celery ver.3.1.23, I am trying to dynamically add a scheduled task to celery beat. I have one celery worker and one celery beat instance running.
Triggering a standard Celery task by running task.delay() works OK. When I define a scheduled periodic task as a setting in the configuration, celery beat runs it.
However, what I need is to be able to add, at runtime, a task that runs on a specified crontab schedule. After adding a task to the persistent scheduler, celery beat doesn't seem to detect the newly added task. I can see that the celerybeat-schedule file does have an entry for the new task.
Code:
scheduler = PersistentScheduler(app=current_app, schedule_filename='celerybeat-schedule')
scheduler.add(name="adder",
              task="app.tasks.add",
              schedule=crontab(minute='*/1'),
              args=(1, 2))
scheduler.close()
When I run:
print(scheduler.schedule)
I get:
{'celery.backend_cleanup': <Entry: celery.backend_cleanup celery.backend_cleanup() <crontab: 0 4 * * * (m/h/d/dM/MY)>,
'adder': <Entry: adder app.tasks.add(1, 2) <crontab: */1 * * * * (m/h/d/dM/MY)>}
Note that app.tasks.add has the @celery.task decorator.
Instead of trying to find a good workaround, I suggest you switch to the Celery Redbeat.
You may solve your problem by enabling autoreloading.
However, I'm not 100% sure it will work for your config file, but it should if the file is in the CELERY_IMPORTS paths.
Note, however, that this feature is experimental and should not be used in production.
If you really want to have dynamic celerybeat scheduling you can always use another scheduler like the django-celery one to manage periodic tasks on db via a django admin.
I'm having a similar problem, and a solution I thought about is to pre-define some generic periodic tasks (every 1 s, every 5 min, etc.) and then have them fetch, from the DB, a list of functions to be executed.
Every time you want to add a new task you just add an entry in your DB.
Celery beat stores all the periodically scheduled tasks in the PeriodicTask model. A beat task can be scheduled in different ways, including crontab, interval, or solar, and each of these schedules is a foreign key in the PeriodicTask model.
In order to dynamically add a scheduled task, just populate the relevant models; the beat scheduler will detect the changes. Changes are detected when either the number of entries changes or save() is called.
from django_celery_beat.models import PeriodicTask, CrontabSchedule

# -- Inside the function where you want to add the task dynamically
schedule = CrontabSchedule.objects.create(minute='*/1')
task = PeriodicTask.objects.create(name='adder',
                                   task='apps.task.add',
                                   crontab=schedule)
task.save()
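One assumption worth making explicit (it is not stated in the answer): beat only reads these models when it runs with django-celery-beat's database scheduler rather than the default file-based one, e.g.:

# settings.py (assuming django_celery_beat is in INSTALLED_APPS and the app uses
# config_from_object('django.conf:settings', namespace='CELERY'))
CELERY_BEAT_SCHEDULER = 'django_celery_beat.schedulers:DatabaseScheduler'

Equivalently, beat can be started with --scheduler django_celery_beat.schedulers:DatabaseScheduler.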
