Add dynamic task to celery - python

I am trying to find a way to add a new task to Celery after the worker has started and the app has been instantiated. I basically want to add a new task with a dynamic name based on user input, and I will also want to set a rate limit for that new task.
I have not been able to find any documentation on this, and no examples in my Google searches. All I have been able to find is dynamically adding periodic tasks with Celery beat.
Is there any way to do what I am looking to do?

What you want to achieve is not trivial. I am not aware of any distributed system similar to Celery that allows such a thing.
The only way to do it, perhaps, is to dynamically create and run a new Celery worker with the new task added and configured the way you prefer...

You can register a task as follows:
def dynamic_task():
    return "Hi"

dynamic_task = app.task(dynamic_task, name='my_name')
Note that you can register a list of functions this way. You will then need to restart the worker, although you can also use the signals (https://docs.celeryproject.org/en/stable/userguide/signals.html).
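For instance, here is a minimal sketch of a helper that registers a task under a runtime-chosen name with a rate limit. The broker URL, task name, and rate limit are all assumptions, not values from the question:

from celery import Celery

app = Celery('proj', broker='redis://localhost:6379/0')  # hypothetical broker URL

def register_dynamic_task(func, name, rate_limit=None):
    # Wraps app.task() so the name and rate limit can come from user input.
    return app.task(func, name=name, rate_limit=rate_limit)

def dynamic_task():
    return "Hi"

# hypothetical user-supplied values
my_task = register_dynamic_task(dynamic_task, name='user.task.hi', rate_limit='10/m')

Tasks registered this way are only known to workers that executed the same registration code, which is why a worker restart (or a signal hook) is still needed.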

Related

Django celery redis remove a specific periodic task from queue

There is a specific periodic task that needs to be removed from the message queue. I am using Redis and Celery here.
tasks.py
@periodic_task(run_every=crontab(minute='*/6'))
def task_abcd():
    """
    some operations here
    """
There are other periodic tasks also in the project but I need to stop this specific task to stop from now on.
As explained in this answer, will the following code work?
@periodic_task(run_every=crontab(minute='*/6'))
def task_abcd():
    pass
In this example the periodic task schedule is defined directly in code, meaning it is hard-coded and cannot be altered dynamically without a code change and an app re-deploy.
The provided code, with the task logic deleted or with a simple return at the beginning, will "work", but it is not really an answer to the question: the task will still run, there is just no code executed when it does.
Also, it is recommended NOT to use @periodic_task, as its own docstring says:
"Deprecated decorator, please use :setting:`beat_schedule`."
First, change the method from a @periodic_task to a regular Celery @task, and because you are using Django, it is better to go straight for @shared_task:
from celery import shared_task

@shared_task
def task_abcd():
    ...
Now this is just an ordinary Celery task that needs to be called explicitly. Alternatively, it can be run periodically if added to the Celery beat schedule.
For production, and if you are using multiple workers, it is not recommended to run the Celery worker with embedded beat (-B); run a separate instance of the celery beat scheduler instead.
The schedule can be specified in celery.py or in the Django project settings (settings.py).
This is still not very dynamic, as the app needs to be reloaded to re-read the settings.
Then, use the Database Scheduler from django-celery-beat, which allows creating schedules dynamically: which tasks need to run, when, and with what arguments. It even provides nice Django admin web views for administration!
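As a sketch, creating such a schedule dynamically with django-celery-beat's models might look like this (the task's dotted path is an assumption; the interval mirrors the original crontab(minute='*/6') cadence):

from django_celery_beat.models import PeriodicTask, IntervalSchedule

# every 6 minutes, matching the original schedule
schedule, _ = IntervalSchedule.objects.get_or_create(
    every=6,
    period=IntervalSchedule.MINUTES,
)

PeriodicTask.objects.create(
    interval=schedule,
    name='run-task-abcd',           # human-readable, must be unique
    task='myapp.tasks.task_abcd',   # hypothetical dotted path to the task
)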
That code will work but I'd go for something that doesn't force you to update your code every time you need to disable/enable the task.
What you could do is use a configurable flag whose value could come from an admin panel, a configuration file, or wherever you want, and return early before your code runs if the task is disabled.
For instance:
@periodic_task(run_every=crontab(minute='*/6'))
def task_abcd():
    config = load_config_for_task_abcd()
    if not config.is_enabled:
        return
    # some operations here
In this way, even if your task is scheduled, its operations won't be executed.
If you simply want to remove the periodic task, have you tried removing the function and then restarting your Celery service? You can restart your Redis service as well as your Django server for good measure.
Make sure that the function you removed is not referenced anywhere else.

Appropriate method to define a pool of workers in Python

I've started a new Python 3 project in which my goal is to download tweets and analyze them. As I'll be downloading tweets on different subjects, I want to have a pool of workers that download Twitter statuses matching given keywords and store them in a database. I call these workers fetchers.
The other kind of worker is the analyzers, whose function is to analyze tweet contents and extract information from them, also storing the results in a database. As I'll be analyzing a lot of tweets, it would be a good idea to have a pool of these workers too.
I've been thinking of using RabbitMQ and Celery for this, but I have some questions:
General question: Is this really a good approach to solve this problem?
I need at least one fetcher worker per downloading task, and this could be running for a whole year (actually it is a 15-minute cycle that repeats and lasts for a year). Is it appropriate to define an "infinite" task?
I've been trying Celery and I used delay to launch some example tasks. The thing is that I don't want to call the ready() method constantly to check if the task is completed. Is it possible to define a callback? I'm not talking about a Celery task callback, just a function defined by myself. I've been searching for this and I can't find anything.
I want to have a single RabbitMQ + Celery server with workers in different networks. Is it possible to define remote workers?
Yeah, it looks like a good approach to me.
There is no such thing as an infinite task, but you can reschedule a task to run once in a while. Celery has periodic tasks, so you can schedule a task to run at particular times. You don't necessarily need Celery for this; you could also use a cron job.
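For example, a minimal sketch of a beat schedule entry for the 15-minute fetch cycle; the task name, module, and keyword argument are assumptions:

# assumes `app` is your Celery instance
app.conf.beat_schedule = {
    'fetch-tweets-every-15-minutes': {
        'task': 'tasks.fetch_tweets',   # hypothetical fetcher task
        'schedule': 15 * 60.0,          # interval in seconds
        'args': ('some keyword',),      # hypothetical search keyword
    },
}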
You can call a function once a task has successfully completed, using the task_success signal:
from celery.signals import task_success

@task_success.connect
def call_me_when_my_task_is_done(sender=None, result=None, **kwargs):
    # sender is the task that succeeded; filter by name if needed
    if sender.name == 'task_i_am_waiting_to_complete':
        pass
Yes, you can have remote workers on different networks.
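As a sketch: workers become "remote" simply by connecting to the same central broker. The host and credentials below are assumptions:

from celery import Celery

# all workers, wherever they run, connect to the same central broker
app = Celery('tweets', broker='amqp://user:password@central-host:5672//')

# On any machine that can reach the broker, start a worker with:
#   celery -A tweets worker --loglevel=info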

Is there a possible way to create dynamic periodic tasks in Celery?

I would like to add and remove periodic tasks from the celery scheduler, however I do not find an easy way to do that. Is it possible?
I found the following question, but I wonder whether there is an alternative that does not use Django.
How to dynamically add / remove periodic tasks to Celery (celerybeat)
For discarding all pending tasks you can use:
from celery.task.control import discard_all

discard_all()
Note that this purges waiting task messages from the queue rather than removing entries from the beat schedule.
And you can see a detailed explanation here on how to append tasks to a scheduler.
Yes, it is possible. There is a package, django-celery-beat, that provides a PeriodicTask database model. It can be used to dynamically add/remove/update/pause periodic tasks.
Some time ago, I needed dynamic periodic tasks, and I wrote an example project, which is available on GitHub, and wrote an article about dynamic periodic tasks in Celery and Django.
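For instance, pausing an existing entry might look like this sketch (the task name here is an assumption):

from django_celery_beat.models import PeriodicTask

task = PeriodicTask.objects.get(name='my-dynamic-task')  # hypothetical name
task.enabled = False   # pause without deleting; set back to True to resume
task.save()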

Execute code after some time in django

I have code which deletes an API token when executed. Now I want it to execute after some time, let's say two weeks. Any ideas or directions on how to implement this?
My code:
authtoken = models.UserApiToken.objects.get(api_token=token)
authtoken.delete()
This is inside a function and executed when a request is made.
There are two main ways to get this done:
1. Make it a custom management command, and trigger it through crontab.
2. Use Celery: make it a Celery task, and use celerybeat to trigger the job after two weeks.
I would recommend Celery, as it provides better control over your task queues and jobs; a sketch of that route follows.
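A minimal sketch of the Celery route, scheduling the deletion from the request handler with apply_async and a countdown; the task module path is an assumption:

from datetime import timedelta
from celery import shared_task

@shared_task
def delete_api_token(token):
    # hypothetical import path for the models module from the question
    from myapp import models
    models.UserApiToken.objects.get(api_token=token).delete()

# in the request handler: run two weeks from now
delete_api_token.apply_async(
    args=(token,),
    countdown=timedelta(weeks=2).total_seconds(),
)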

Register a celery PeriodicTask after it's created and at runtime

My application creates PeriodicTask objects according to user-defined schedules; that is, the schedule for a PeriodicTask can change at any time. I have spent the past couple of days in frustration trying to figure out how to get Celery to support this. Ultimately, the issue is that, for something to run as a PeriodicTask, it first has to be created and second, it has to be registered (I have no idea why this is required).
So, for dynamic tasks to work, I need
to register all the tasks when the celery server starts
to register a task when it is newly created.
#1 should be solved easily enough by running a startup script (i.e., something that gets run after ./manage.py celerybeat gets called). Unfortunately, I don't think there's a convenient place to put this. If there were, the script would go something like this:
from djcelery.models import PeriodicTask
from celery.registry import tasks

for task in PeriodicTask.objects.filter(name__startswith='scheduler.'):
    tasks.register(task)
I'm filtering for 'scheduler.' because the names of all my dynamic tasks begin that way.
#2 I have no idea about. The issue, as far as I see it, is that celery.registry.tasks is kept in memory and there's no way, barring some coding magic, to access celerybeat's task registry once it has started running.
Thanks in advance for your help.
