Currently, we are using Celery & RabbitMQ to perform repeatable tasks on Ubuntu 14.04 servers and everything is working great. Celery picks up tasks from RMQ and executes the correct method. We have 12 Celery workers constantly monitoring RMQ queues. We have a new requirement where we want to execute 1 method in Celery only once or say once a day. Is this possible to do? I don't want to look at possibly other technologies as we are invested in Celery/RMQ at the moment.
Thanks in advance.
For every task, you can store a boolean value which will keep track whether that is executed for the day for not, this data you can store in db or some file store.
Maintain a cron that executes daily that sets every task value to false(assuming false as task not executed for that day).
Create a celery pre_run signal that will return if the task is already done for the day else continues task processing
from django.db import models
class TaskModel(models.Model)
task = models.CharField(max_length=200)
is_executed = models.BooleanField(default=False)
from celery.signals import task_prerun
#task_prerun.connect()
def task_setup(signal=None, sender=None, task_id=None, task=None, args=None, kwargs=None):
# this method executes before every celery task
task_obj = TaskModel.objects.get(task=task.name)
if task_obj.is_executed:
return
Celery beat is made exactly for this requirement: http://docs.celeryproject.org/en/latest/userguide/periodic-tasks.html
Related
To access the info of a celery task, i need the task_id. When a celery task gets manually started i can easily get the id of this task with task.id (and the write it to DB or do something else).
If i use celery-beat, which periodically sends tasks to the worker, that seems not to be possible.
So my question is, how to get the id from the task in the moment beat sends the task to celery's worker?
In the moment the worker receives the task, the console shows the task-id. So my worry is, that in the moment the task got sent by beat to the worker, it has no task id.
Manual case to get task_id:
task = tasks.LongRunningTask.delay(username_from_formTargetsLaden, password_from_formTargetsLaden, url_from_formTargetsLaden)
task_id = task.id
Mayhaps some of you got an idea?
i found an answer for that little issue:
If you need the task id of the tasks which was initially sent by beat, you can simply add an inspect function to your (schedulded) worker task.
Configure Periodic-Task
This is the schedule which "reminds" celery every day at 11:08 am (UTC) to kick off the task.
#celery.on_after_configure.connect
def setup_periodic_tasks(sender, **kwargs):
test = sender.add_periodic_task(crontab(minute=8, hour=11), CheckLists.s(app.config['USR'], app.config['PWD']))
Task to be executed periodically
This is the scheduled task which will be executed by celery after the worker received the "reminder" from beat.
#celery.task(bind=True)
def CheckLists(self, arg1, arg2):
#get task_id von scheduled Task 'Check-List'
i = inspect()
activetasks = i.active()
list_of_tasks = {'activetasks': activetasks}
task_id = list_of_tasks['activetasks']['celery#DESKTOP-XXXXX'][0]['id'] #adapt this section depending on environment (local, webserver, etc...)
task_type = "CHECK_LISTS"
task_id_to_db = Tasks(task_id, task_type)
db.session.add(task_id_to_db)
db.session.commit()
long_runnning_task
[...more task relevant code here...]
So i'm making use out of app.control.inspect which lets you inspect running workers. It uses remote control commands under the hood.
With i.active() you will get a dictionary, which you can easily parse.
As long as i don't find any documentation how to get the task_id from a periodic task more easier, i'll stick to that solution.
After you saved the task id you can easily poll task status etc. via AJAX for instance.
Hope that helps you guys :)
I have a django app with celery 4.1.0 and celery beat with database scheduler. What I want is to run periodic tasks from admin site and set expiration time for each of this tasks. expire property in PeriodicTask is a time scheduler stops creating new messages for that task but i want the expiration to revoke tasks which are scheduled but are older than some value e.g. one hour. how to do this?
I am really confused with celery documentation and differences between different versions of it.
Sounds like you need to use a custom scheduler class.
I solved it by running an scheduled task which runs defined task with desired expiry time:
#shared_task(bind=True, queue='q1', max_retries=3)
def parent_task(self, arg1):
child_task.apply_async(kwargs={'arg1': arg1}, expires=86400)
#shared_task(bind=True, queue='q1', max_retries=3)
def child_task(self, arg1):
pass
I am trying to come up with a notification service for a list of events for which the data is available in the database every few minutes and gets updated using some mechanism. 2 minutes before the next event, I need to read this database and send out the data to my subscribers as a reminder that the event is about to start. This times are not fixed. They depend on the event time of the next event.
Right now I am creating a celery worker for every user who subscribes. I make the specific celery worker go to sleep till the next event, at which point it resumes and sends out the messge.
Something like this:
nextEventDelay = events.getTimeToNextEventInSeconds()
sleep(nextEventDelay)
SendEventNotification()
But I know, it is not good. For a single person/ 2 people it's working. But for 1000 users, if it spawns 1000 workers, it will not be good.
So my solution? I am thinking of creating a single worker process which will monitor the database for subscribers and once the notification is to be sent out will read from database and send to them. But, this takes care of only one event. Should I keep this in an infinite for loop to notify about the next event?
I am using Celery for async task management with redis. The appplication is Python flask application. Let me know if you need any more info. Thanks.
Using celery beats you could run a job every x seconds to check if any events are within two minutes of starting. You could then trigger your 'reminder' jobs from that task.
Here is the documentation for periodic celery tasks.
http://docs.celeryproject.org/en/latest/userguide/periodic-tasks.html
I would suggest you stay far away from long running celery tasks as I have not had a great experience with them.
Here is some untested pseudo code to get you started.
from celery import Celery
from celery.schedules import crontab
app = Celery()
#app.on_after_configure.connect
def setup_periodic_tasks(sender, **kwargs):
# check for events every 20 seconds
sender.add_periodic_task(20.0, trigger_reminders.s(), name='check for upcoming events')
#app.task
def trigger_reminders(*args, **kwargs):
upcoming_events = get_upcoming_events()
for event in upcoming_events:
send_notification.delay(event)
#app.task
def send_event(*args, **kwargs):
#Send the user notification
Using Celery ver.3.1.23, I am trying to dynamically add a scheduled task to celery beat. I have one celery worker and one celery beat instance running.
Triggering a standard celery task y running task.delay() works ok. When I define a scheduled periodic task as a setting in configuration, celery beat runs it.
However what I need is to be able to add a task that runs at specified crontab at runtime. After adding a task to persistent scheduler, celery beat doesn't seem to detect the newly added new task. I can see that the celery-schedule file does have an entry with new task.
Code:
scheduler = PersistentScheduler(app=current_app, schedule_filename='celerybeat-schedule')
scheduler.add(name="adder",
task="app.tasks.add",
schedule=crontab(minute='*/1'),
args=(1,2))
scheduler.close()
When I run:
print(scheduler.schedule)
I get:
{'celery.backend_cleanup': <Entry: celery.backend_cleanup celery.backend_cleanup() <crontab: 0 4 * * * (m/h/d/dM/MY)>,
'adder': <Entry: adder app.tasks.add(1, 2) <crontab: */1 * * * * (m/h/d/dM/MY)>}
Note that app.tasks.add has the #celery.task decorator.
Instead of trying to find a good workaround, I suggest you switch to the Celery Redbeat.
You may solve your problem by enabling autoreloading.
However I'm not 100% sure it will work for your config file but it should if is in the CELERY_IMPORTS paths.
Hoverer note that this feature is experimental and to don't be used in production.
If you really want to have dynamic celerybeat scheduling you can always use another scheduler like the django-celery one to manage periodic tasks on db via a django admin.
I'm having a similar problem and a solution I thought about is to pre-define some generic periodic tasks (every 1s, every 5mins, etc) and then have them getting, from DB, a list of function to be executed.
Every time you want to add a new task you just add an entry in your DB.
Celery beat stores all the periodically scheduled tasks in the model PeriodicTask . As a beat task can be scheduled in different ways including crontab, interval or solar. All these fields are a foreign key in the PeriodicTask model.
In order to dynamically add a scheduled task, just populate the relevant models in celery beat, the scheduler will detect changes. The changes are detected when either the count of tuple changes or save() function is called.
from django_celery_beat.models import PeriodicTask, CrontabSchedule
# -- Inside the function you want to add task dynamically
schedule = CrontabSchedule.objects.create(minute='*/1')
task = PeriodicTask.objects.create(name='adder',
task='apps.task.add', crontab=schedule)
task.save()
I'm using Celery (3.0.15) with Redis as a broker.
Is there a straightforward way to query the number of tasks with a given name that exist in a Celery queue?
And, as a followup, is there a way to cancel all tasks with a given name that exist in a Celery queue?
I've been through the Monitoring and Management Guide and don't see a solution there.
# Retrieve tasks
# Reference: http://docs.celeryproject.org/en/latest/reference/celery.events.state.html
query = celery.events.state.tasks_by_type(your_task_name)
# Kill tasks
# Reference: http://docs.celeryproject.org/en/latest/userguide/workers.html#revoking-tasks
for uuid, task in query:
celery.control.revoke(uuid, terminate=True)
There is one issue that earlier answers have not addressed and may throw off people if they are not aware of it.
Among those solutions already posted, I'd use Danielle's with one minor modification: I'd import the task into my file and use its .name attribute to get the task name to pass to .tasks_by_type().
app.control.revoke(
[uuid for uuid, _ in
celery.events.state.State().tasks_by_type(task.name)])
However, this solution will ignore those tasks that have been scheduled for future execution. Like some people who commented on other answers, when I checked what .tasks_by_type() return I had an empty list. And indeed my queues were empty. But I knew that there were tasks scheduled to be executed in the future and these were my primary target. I could see them by executing celery -A [app] inspect scheduled but they were unaffected by the code above.
I managed to revoke the scheduled tasks by doing this:
app.control.revoke(
[scheduled["request"]["id"] for scheduled in
chain.from_iterable(app.control.inspect().scheduled()
.itervalues())])
app.control.inspect().scheduled() returns a dictionary whose keys are worker names and values are lists of scheduling information (hence, the need for chain.from_iterable which is imported from itertools). The task information is in the "request" field of the scheduling information and "id" contains the task id. Note that even after revocation, the scheduled task will still show among the scheduled tasks. Scheduled tasks that are revoked won't get removed from the list of scheduled tasks until their timers expire or until Celery performs some cleanup operation. (Restarting workers triggers such cleanup.)
You can do this in one request:
app.control.revoke([
uuid
for uuid, _ in
celery.events.state.State().tasks_by_type(task_name)
])
As usual with Celery, none of the answers here worked for me at all, so I did my usual thing and hacked together a solution that just inspects redis directly. Here we go...
# First, get a list of tasks from redis:
import redis, json
r = redis.Redis(
host=settings.REDIS_HOST,
port=settings.REDIS_PORT,
db=settings.REDIS_DATABASES['CELERY'],
)
l = r.lrange('celery', 0, -1)
# Now import the task you want so you can get its name
from my_django.tasks import my_task
# Now, import your celery app and iterate over all tasks
# from redis and nuke the ones that have a matching name.
from my_django.celery_init import app
for task in l:
task_headers = json.loads(task)['headers']
task_name = task_headers["task"]
if task_name == my_task.name:
task_id = task_headers['id']
print("Terminating: %s" % task_id)
app.control.revoke(task_id, terminate=True)
Note that revoking in this way might not revoke prefetched tasks, so you might not see results immediately.
Also, this answer doesn't support prioritized tasks. If you want to modify it to do that, you'll want some of the tips in my other answer that hacks redis.
It looks like flower provides monitoring:
https://github.com/mher/flower
Real-time monitoring using Celery Events
Task progress and history Ability to show task details (arguments,
start time, runtime, and more) Graphs and statistics Remote Control
View worker status and statistics Shutdown and restart worker
instances Control worker pool size and autoscale settings View and
modify the queues a worker instance consumes from View currently
running tasks View scheduled tasks (ETA/countdown) View reserved and
revoked tasks Apply time and rate limits Configuration viewer Revoke
or terminate tasks HTTP API
OpenID authentication