Celery/Django: Get result of periodic task execution - python

I have a Django 1.7 project using Celery (latest). I have a REST API that receives some parameters, and creates, programmatically, a PeriodicTask. For testing, I'm using a period of seconds:
periodic_task, _ = PeriodicTask.objects.get_or_create(name=task_label, task=task_name, interval=interval_schedule)
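For reference, interval_schedule here is built from django-celery's IntervalSchedule model; a minimal sketch of how it might be created (the 10-second value is purely illustrative):
from djcelery.models import IntervalSchedule

# Example only: a schedule that fires every 10 seconds.
interval_schedule, _ = IntervalSchedule.objects.get_or_create(every=10, period='seconds')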
I store a reference to this task somewhere. I start celery beat:
python manage.py celery beat
and a worker:
python manage.py celery worker --loglevel=info
and my task runs as I can see in the worker's output.
I've set the result backend:
CELERY_RESULT_BACKEND = 'djcelery.backends.database:DatabaseBackend'
and with that, I can check the task results using the TaskMeta model. The objects there contain the task_id (the same one I would get if I called the task with .delay() or .apply_async()), the status, the result, everything, beautiful.
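For example, looking up a stored result from the Django shell might look roughly like this (the task_id value is just a placeholder):
from djcelery.models import TaskMeta

# Fetch the stored result row for a given task id (placeholder id shown).
meta = TaskMeta.objects.get(task_id='6f8a9c1e-0000-0000-0000-000000000000')
print(meta.status, meta.result)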
However, I can't find a connection between the PeriodicTask object and TaskMeta.
PeriodicTask has a task property, but it's just the task name/path. Its id is just a consecutive number, not the task_id from TaskMeta, and I really need to be able to match the task that was executed as a PeriodicTask with its TaskMeta row so I can offer some monitoring of the status. TaskMeta doesn't have any other value that would let me identify which task ran (since I will have several), so that I could at least report the status of the last execution.
I've checked all over the Celery docs and on here, but found no solution so far.
Any help is highly appreciated.
Thanks

You can run a service to monitor the tasks that have been performed by using the command line:
python manage.py celerycam --frequency=10.0
More detail at:
http://www.lexev.org/en/2014/django-celery-setup/

Related

How to make sure a Celery task was enqueued in Pytest?

I have an API endpoint to register a new user. The "welcome email" task is enqueued and runs asynchronously. I have 2 unit tests that check:
The API saves the user's information to the DB correctly
The Celery task sends the email with the right content and template
I want to add a 3rd unit test to ensure "the endpoint enqueues the email-sending task after saving the user form to the DB".
I tried celery.AsyncResult, but it requires me to run the worker. Furthermore, even if the worker is running, we still can't verify whether the task was enqueued because of the ambiguous PENDING state:
Task exists in the queue but hasn't executed yet: PENDING
Task doesn't exist in the queue: PENDING
Has anyone faced this problem? How do I solve it?
A common way to solve this problem in testing environments is to use the task_always_eager configuration setting, which basically instructs Celery to run the task like a regular function. Instead of an AsyncResult, Celery will return an object of the EagerResult type that behaves the same but has completely different execution logic.
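A minimal sketch of how that setting can be flipped in a pytest fixture (myproject.celery and send_welcome_email are made-up names; with eager mode on, .delay() runs the task in-process and returns an EagerResult you can assert on directly):
import pytest

from myproject.celery import app as celery_app  # hypothetical Celery app module
from myproject.tasks import send_welcome_email  # hypothetical task

@pytest.fixture
def eager_celery():
    # Run tasks locally and synchronously instead of routing them to a worker.
    celery_app.conf.task_always_eager = True
    celery_app.conf.task_eager_propagate = True  # re-raise task exceptions in the test
    yield
    celery_app.conf.task_always_eager = False

def test_welcome_email_task_runs_eagerly(eager_celery):
    # .delay() executes the task immediately and returns an EagerResult.
    result = send_welcome_email.delay("new@example.com")
    assert result.state == "SUCCESS"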

How to get task id from celery in case of scheduled tasks (beat)

To access the info of a celery task, I need the task_id. When a celery task is started manually, I can easily get the id of that task with task.id (and then write it to the DB or do something else with it).
If I use celery beat, which periodically sends tasks to the worker, that doesn't seem to be possible.
So my question is: how do I get the id of the task at the moment beat sends it to celery's worker?
At the moment the worker receives the task, the console shows the task id. So my worry is that at the moment the task is sent by beat to the worker, it has no task id yet.
Manual case to get the task_id:
task = tasks.LongRunningTask.delay(username_from_formTargetsLaden, password_from_formTargetsLaden, url_from_formTargetsLaden)
task_id = task.id
Maybe some of you have an idea?
I found an answer for that little issue:
If you need the task id of the tasks that were initially sent by beat, you can simply add an inspect call to your (scheduled) worker task.
Configure Periodic-Task
This is the schedule which "reminds" celery every day at 11:08 am (UTC) to kick off the task.
from celery.schedules import crontab

@celery.on_after_configure.connect
def setup_periodic_tasks(sender, **kwargs):
    test = sender.add_periodic_task(crontab(minute=8, hour=11), CheckLists.s(app.config['USR'], app.config['PWD']))
Task to be executed periodically
This is the scheduled task which will be executed by celery after the worker receives the "reminder" from beat.
@celery.task(bind=True)
def CheckLists(self, arg1, arg2):
    # get the task_id of the scheduled task 'Check-List'
    i = celery.control.inspect()  # app.control.inspect() lets you inspect the running workers
    activetasks = i.active()
    list_of_tasks = {'activetasks': activetasks}
    task_id = list_of_tasks['activetasks']['celery@DESKTOP-XXXXX'][0]['id']  # adapt this section depending on environment (local, webserver, etc...)
    task_type = "CHECK_LISTS"
    task_id_to_db = Tasks(task_id, task_type)
    db.session.add(task_id_to_db)
    db.session.commit()

    # long_running_task
    # [...more task relevant code here...]
So I'm making use of app.control.inspect, which lets you inspect running workers. It uses remote control commands under the hood.
With i.active() you get a dictionary which you can easily parse.
As long as I don't find any documentation on an easier way to get the task_id of a periodic task, I'll stick to this solution.
After you have saved the task id, you can easily poll the task status, via AJAX for instance.
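A rough sketch of such a polling endpoint could look like this (Flask is assumed here to match the app/celery objects above; the route itself is made up):
from celery.result import AsyncResult
from flask import jsonify

@app.route('/task-status/<task_id>')
def task_status(task_id):
    # AsyncResult looks the task up in the configured result backend.
    result = AsyncResult(task_id, app=celery)
    return jsonify({'task_id': task_id, 'state': result.state})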
Hope that helps you guys :)

How to dynamically add a scheduled task to Celery beat

Using Celery ver.3.1.23, I am trying to dynamically add a scheduled task to celery beat. I have one celery worker and one celery beat instance running.
Triggering a standard celery task by running task.delay() works ok. When I define a scheduled periodic task as a setting in the configuration, celery beat runs it.
However, what I need is to be able to add a task that runs at a specified crontab at runtime. After adding a task to the persistent scheduler, celery beat doesn't seem to detect the newly added task. I can see that the celery-schedule file does have an entry with the new task.
Code:
from celery import current_app
from celery.beat import PersistentScheduler
from celery.schedules import crontab

scheduler = PersistentScheduler(app=current_app, schedule_filename='celerybeat-schedule')
scheduler.add(name="adder",
              task="app.tasks.add",
              schedule=crontab(minute='*/1'),
              args=(1, 2))
scheduler.close()
When I run:
print(scheduler.schedule)
I get:
{'celery.backend_cleanup': <Entry: celery.backend_cleanup celery.backend_cleanup() <crontab: 0 4 * * * (m/h/d/dM/MY)>,
'adder': <Entry: adder app.tasks.add(1, 2) <crontab: */1 * * * * (m/h/d/dM/MY)>}
Note that app.tasks.add has the @celery.task decorator.
Instead of trying to find a good workaround, I suggest you switch to Celery Redbeat.
You may solve your problem by enabling autoreloading.
However, I'm not 100% sure it will work for your config file, but it should if the file is in the CELERY_IMPORTS paths.
Note, however, that this feature is experimental and shouldn't be used in production.
If you really want dynamic celerybeat scheduling, you can always use another scheduler like the django-celery one to manage periodic tasks in the DB via the Django admin.
I'm having a similar problem, and a solution I thought about is to pre-define some generic periodic tasks (every 1s, every 5 mins, etc.) and then have them fetch, from the DB, a list of functions to be executed.
Every time you want to add a new task, you just add an entry to your DB.
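A rough sketch of that idea, with made-up model and field names (ScheduledJob, dotted_path):
import importlib

from celery import shared_task

@shared_task
def run_every_five_minutes():
    # ScheduledJob is a hypothetical model whose rows hold dotted paths of callables to run.
    from myapp.models import ScheduledJob
    for job in ScheduledJob.objects.filter(interval='5min', enabled=True):
        module_path, func_name = job.dotted_path.rsplit('.', 1)
        func = getattr(importlib.import_module(module_path), func_name)
        func()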
Celery beat stores all the periodically scheduled tasks in the PeriodicTask model. A beat task can be scheduled in different ways, including crontab, interval or solar, and each of these schedules is a foreign key on the PeriodicTask model.
In order to dynamically add a scheduled task, just populate the relevant models of django-celery-beat and the scheduler will detect the changes. Changes are detected when either the number of entries changes or save() is called on one of them.
from django_celery_beat.models import PeriodicTask, CrontabSchedule

# -- Inside the function where you want to add the task dynamically
schedule = CrontabSchedule.objects.create(minute='*/1')
task = PeriodicTask.objects.create(name='adder',
                                   task='apps.task.add',
                                   crontab=schedule)
task.save()
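A similar sketch using an interval instead of a crontab (the 10-second value is just an example; note that args is stored as a JSON-encoded string):
from django_celery_beat.models import PeriodicTask, IntervalSchedule

# Example only: run apps.task.add(1, 2) every 10 seconds.
schedule, _ = IntervalSchedule.objects.get_or_create(every=10, period=IntervalSchedule.SECONDS)
PeriodicTask.objects.create(name='adder-every-10s',
                            task='apps.task.add',
                            interval=schedule,
                            args='[1, 2]')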

Can I use luigi with Python celery

I am using celery for my web application.
Celery executes parent tasks which then execute a further pipeline of tasks.
The issues with celery:
I can't get the dependency graph and visualizer I get with luigi to see what the status of my parent task is.
Celery does not provide a mechanism to restart a failed pipeline from where it failed.
These two things I can easily get from luigi.
So I was thinking that once celery runs the parent task, inside that task I execute the luigi pipeline.
Is there going to be any issue with that? I.e., I need to autoscale the celery workers based on queue size; will that affect any luigi workers across multiple machines?
Never tried it, but I think it should be possible to call a luigi task from inside a celery task, the same way you would do it from Python code in general:
from foobar import MyTask
from luigi import scheduler, worker

task = MyTask(123, 'another parameter value')
sch = scheduler.CentralPlannerScheduler()
w = worker.Worker(scheduler=sch)
w.add(task)
w.run()
About scaling your queue and celery workers: if you have too many celery workers calling luigi tasks, of course it will require you to scale your luigi scheduler/daemon so it can handle the number of API requests (every time you call a task to be executed you hit the luigi scheduler API, every N seconds (it depends on your config) your tasks hit the scheduler API to say "I'm alive", every time a task finishes with error or success you hit the scheduler API, and so on).
So yes, take a close look at your scheduler to see if it's receiving too many HTTP requests or if its database is becoming a bottleneck (luigi uses sqlite by default, but you can easily change it to mysql or postgres).
UPDATE:
Since version 2.7.0, luigi.scheduler.CentralPlannerScheduler has been renamed to luigi.scheduler.Scheduler, so the above code should now be:
from foobar import MyTask
from luigi import scheduler, worker

task = MyTask(123, 'another parameter value')
sch = scheduler.Scheduler()
w = worker.Worker(scheduler=sch)
w.add(task)
w.run()

How to inspect and cancel Celery tasks by task name

I'm using Celery (3.0.15) with Redis as a broker.
Is there a straightforward way to query the number of tasks with a given name that exist in a Celery queue?
And, as a followup, is there a way to cancel all tasks with a given name that exist in a Celery queue?
I've been through the Monitoring and Management Guide and don't see a solution there.
# Retrieve tasks
# Reference: http://docs.celeryproject.org/en/latest/reference/celery.events.state.html
query = celery.events.state.tasks_by_type(your_task_name)

# Kill tasks
# Reference: http://docs.celeryproject.org/en/latest/userguide/workers.html#revoking-tasks
for uuid, task in query:
    celery.control.revoke(uuid, terminate=True)
There is one issue that earlier answers have not addressed and may throw off people if they are not aware of it.
Among those solutions already posted, I'd use Danielle's with one minor modification: I'd import the task into my file and use its .name attribute to get the task name to pass to .tasks_by_type().
app.control.revoke(
    [uuid for uuid, _ in
     celery.events.state.State().tasks_by_type(task.name)])
However, this solution will ignore tasks that have been scheduled for future execution. Like some people who commented on other answers, when I checked what .tasks_by_type() returned, I got an empty list. And indeed my queues were empty. But I knew that there were tasks scheduled to be executed in the future, and those were my primary target. I could see them by executing celery -A [app] inspect scheduled, but they were unaffected by the code above.
I managed to revoke the scheduled tasks by doing this:
from itertools import chain

app.control.revoke(
    [scheduled["request"]["id"] for scheduled in
     chain.from_iterable(app.control.inspect().scheduled()
                         .itervalues())])
app.control.inspect().scheduled() returns a dictionary whose keys are worker names and values are lists of scheduling information (hence, the need for chain.from_iterable which is imported from itertools). The task information is in the "request" field of the scheduling information and "id" contains the task id. Note that even after revocation, the scheduled task will still show among the scheduled tasks. Scheduled tasks that are revoked won't get removed from the list of scheduled tasks until their timers expire or until Celery performs some cleanup operation. (Restarting workers triggers such cleanup.)
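On Python 3, where dict.itervalues() no longer exists, the same revocation would look roughly like this:
from itertools import chain

app.control.revoke(
    [scheduled["request"]["id"] for scheduled in
     chain.from_iterable(app.control.inspect().scheduled().values())])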
You can do this in one request:
app.control.revoke([
    uuid
    for uuid, _ in
    celery.events.state.State().tasks_by_type(task_name)
])
As usual with Celery, none of the answers here worked for me at all, so I did my usual thing and hacked together a solution that just inspects redis directly. Here we go...
# First, get a list of tasks from redis:
import redis, json
from django.conf import settings  # the REDIS_* values below come from Django settings

r = redis.Redis(
    host=settings.REDIS_HOST,
    port=settings.REDIS_PORT,
    db=settings.REDIS_DATABASES['CELERY'],
)
l = r.lrange('celery', 0, -1)

# Now import the task you want so you can get its name
from my_django.tasks import my_task

# Now, import your celery app and iterate over all tasks
# from redis and nuke the ones that have a matching name.
from my_django.celery_init import app
for task in l:
    task_headers = json.loads(task)['headers']
    task_name = task_headers["task"]
    if task_name == my_task.name:
        task_id = task_headers['id']
        print("Terminating: %s" % task_id)
        app.control.revoke(task_id, terminate=True)
Note that revoking in this way might not revoke prefetched tasks, so you might not see results immediately.
Also, this answer doesn't support prioritized tasks. If you want to modify it to do that, you'll want some of the tips in my other answer that hacks redis.
It looks like flower provides monitoring:
https://github.com/mher/flower
Real-time monitoring using Celery Events
Task progress and history
Ability to show task details (arguments, start time, runtime, and more)
Graphs and statistics
Remote Control
View worker status and statistics
Shutdown and restart worker instances
Control worker pool size and autoscale settings
View and modify the queues a worker instance consumes from
View currently running tasks
View scheduled tasks (ETA/countdown)
View reserved and revoked tasks
Apply time and rate limits
Configuration viewer
Revoke or terminate tasks
HTTP API
OpenID authentication
