Django & Celery - Signalling Across Multiple Methods & Files


Problem
I have a Django + Celery app, and am trying to get a lengthy method to run in a Celery Task.
Goal
I want to
1) Put an entire line of execution - that calls across multiple classes & files - into a Celery Task. The application should continue serving users in the meantime.
2) Signal back when the Celery Task is done so I can update the UI.
Issue
Despite scouring docs, examples, and help sites I can't get this working.
I don't know which signals to use, how to register them, how to write the methods involved to receive those signals properly, and respond back when done.
It isn't clear to me from the Celery documentation how to accomplish all of this: there are many, many signals and decorators available but little clarification on which ones are appropriate for this sequence of events.
Help Requested
Either
A point in the direction of some resources that can help (better documentation, links to examples that fit my use case, etc...),
or
Some first-hand help with which signals are appropriate to use, where to use them, and why.
This is my first post on StackOverflow, so thanks in advance for your kindness and patience.
The desired execution path:
1) views.py: Get input via GET or POST
SomeMethod: Send Signal to staticmethod in TaskManager (housed in TaskManager.py) to start a task with arguments
2) TaskManager.py: Process Signals, Keep Track of Tasks Running
SignalProcessor: receive signal, pull arguments out of kwargs, call appropriate method in tasks.py.
TaskList: Make note of the task in a TaskList so I know it's running, and where.
3) tasks.py
@shared_task
HandleTheTask(arg1, arg2):
Call the appropriate sequence of methods in other files/classes in sequence, which ultimately writes to the database many times in many methods. These other methods are not inside tasks.py.
Send a Signal to TaskManager when the method in tasks.py completes the lengthy task.
4) TaskManager.py
SignalProcessor: receive task_complete signal, remove the appropriate task from the TaskList.
What I've already tried
Putting HandleTheTask(arg1, arg2) with a @task or @shared_task decorator into tasks.py.
I've tried calling HandleTheTask.delay(arg1, arg2) from the SignalProcessor method.
Result: Nothing happens. The app continues to execute, but the lengthy task doesn't. Celery doesn't notice the call and never executes it.
This is probably because the method I want to run is in another process?
Putting an intermediary method into tasks.py, which calls back to
TaskManager.py to run the lengthy task.
Result: As above.
Looking up Celery Signals to properly signal across processes: This
looks like the right solution but the documentation is big on info,
small on guidance.
Resources I've Consulted
I've already looked at a bunch of docs and help sites. Here are just a few of the resources I've consulted:
https://docs.celeryproject.org/en/stable/userguide/signals.html
https://medium.com/analytics-vidhya/integrating-django-signals-and-celery-cb2876ebd494
How will I know if celery background process is successful inside django code. If it is successful I want render a html page
Catch django signal sent from celery task
Django Signals in celery
Any help is welcome. Thank you!

Related

Django celery redis remove a specific periodic task from queue

There is a specific periodic task that needs to be removed from message queue. I am using the configuration of Redis and celery here.
tasks.py
from celery.schedules import crontab
from celery.task import periodic_task

@periodic_task(run_every=crontab(minute='*/6'))
def task_abcd():
    """
    some operations here
    """
There are other periodic tasks in the project as well, but I need to stop this specific task from running from now on.
As explained in this answer, will the following code work?
@periodic_task(run_every=crontab(minute='*/6'))
def task_abcd():
    pass

In this example the periodic task schedule is defined directly in code, meaning it is hard-coded and cannot be altered dynamically without a code change and an app re-deploy.
The provided code, with the task logic deleted or with a simple return at the beginning, will work, but it does not really answer the question: the task will still be scheduled and run, there just won't be any code for it to execute.
Also, using @periodic_task is NOT recommended. Its docstring says:
"""Deprecated decorator, please use :setting:beat_schedule."""
First, change the method from a @periodic_task to a regular Celery @task. Because you are using Django, it is better to go straight for @shared_task:
from celery import shared_task

@shared_task
def task_abcd():
    ...
Now this is just an ordinary Celery task, which needs to be called explicitly. Or it can be run periodically if it is added to the celery beat schedule.
In production, and when using multiple workers, it is not recommended to run the Celery worker with embedded beat (-B); run a separate instance of the celery beat scheduler instead.
The schedule can be specified in celery.py or in the Django project settings (settings.py).
That is still not very dynamic, because the app needs to be reloaded to re-read the settings.
Then, use a database scheduler (django-celery-beat provides one), which allows schedules to be created dynamically: which tasks need to run, when, and with what arguments. It even provides nice Django admin views for administration!
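As a rough illustration of the settings-based schedule, here is a minimal sketch. It assumes the Celery app loads Django settings with namespace="CELERY", so that the beat_schedule setting is spelled CELERY_BEAT_SCHEDULE; the entry name and the dotted task path are hypothetical.

```python
# settings.py
# Assumption: the Celery app reads Django settings with namespace="CELERY".
CELERY_BEAT_SCHEDULE = {
    "task-abcd-every-6-minutes": {
        "task": "myapp.tasks.task_abcd",  # hypothetical dotted task name
        "schedule": 6 * 60.0,  # seconds between runs
    },
}
```

Deleting this entry and restarting the beat process stops the task from being scheduled, while the other periodic tasks keep running. A crontab(minute='*/6') object from celery.schedules can be used in place of the number of seconds if the runs should line up with the clock.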

That code will work but I'd go for something that doesn't force you to update your code every time you need to disable/enable the task.
What you could do is to use a configurable variable whose value could come from an admin panel, a configuration file, or whatever you want, and use that to return before your code runs if the task is in disabled mode.
For instance:
@periodic_task(run_every=crontab(minute='*/6'))
def task_abcd():
    config = load_config_for_task_abcd()
    if not config.is_enabled:
        return
    # some operations here
In this way, even if your task is scheduled, its operations won't be executed.
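One possible shape for the config loader, as a minimal sketch: the JSON file name, the is_enabled key, and the choice to treat a missing file as "enabled" are all assumptions made for illustration. The file is read on every run, so flipping the flag takes effect without a restart.

```python
import json
from pathlib import Path


def load_config_for_task_abcd(path):
    """Read the flag file on every run so edits apply without a restart."""
    try:
        return json.loads(Path(path).read_text())
    except FileNotFoundError:
        # Assumption: a missing file means the task stays enabled.
        return {"is_enabled": True}


def task_abcd(config_path="task_abcd.json"):
    config = load_config_for_task_abcd(config_path)
    if not config.get("is_enabled", True):
        return "skipped"
    # some operations here
    return "ran"
```

The return values are only there to make the behaviour visible; a real task would simply return early.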

If you simply want to remove the periodic task, have you tried removing the function and then restarting your Celery service? You can restart your Redis service as well as your Django server for good measure.
Make sure that the function you removed is not referenced anywhere else.

Appropriate method to define a pool of workers in Python

I've started a new Python 3 project in which my goal is to download tweets and analyze them. As I'll be downloading tweets on different subjects, I want a pool of workers that download statuses matching given keywords from Twitter and store them in a database. I call these workers fetchers.
The other kind of worker is the analyzer, whose job is to analyze tweet contents and extract information from them, also storing the result in a database. As I'll be analyzing a lot of tweets, it would be a good idea to have a pool of these workers too.
I've been thinking of using RabbitMQ and Celery for this, but I have some questions:
General question: is this really a good approach to the problem?
I need at least one fetcher worker per download task, and this could be running for a whole year (actually it is a 15-minute cycle that repeats and lasts for a year). Is it appropriate to define an "infinite" task?
I've been trying Celery and used delay to launch some example tasks. The thing is that I don't want to call the ready() method constantly to check whether the task is completed. Is it possible to define a callback? I'm not talking about a Celery task callback, just a function I define myself. I've been searching for this and haven't found anything.
I want to have a single RabbitMQ + Celery server with workers on different networks. Is it possible to define remote workers?

Yeah, it looks like a good approach to me.
There is no such thing as an infinite task. You can instead schedule a task to run at intervals. Celery has periodic tasks, so you can schedule a task to run at particular times. You don't necessarily need Celery for this; you can also use a cron job if you want.
You can call a function once a task is successfully completed.
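from celery.signals import task_success

@task_success.connect
def call_me_when_my_task_is_done(sender=None, result=None, **kwargs):
    # sender is the task that just succeeded; compare sender.name to
    # 'task_i_am_waiting_to_complete' to react to that task only
    pass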
Yes, you can have remote workers on different networks.

Execute code after some time in django

I have some code that deletes an API column when executed. Now I want it to execute after some time, let's say two weeks. Any ideas or directions on how to implement this?
My code:
authtoken = models.UserApiToken.objects.get(api_token=token)
authtoken.delete()
This is inside a function and executed when a request is made.

There are two main ways to get this done:
Make it a custom management command, and trigger it through crontab.
Use celery, make it a celery task, and use celerybeat to trigger the job after 2 weeks.
I would recommend Celery, as it gives you better control over your task queues and jobs.
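Whichever trigger is used, the job itself has to decide which tokens are due for deletion. A minimal sketch of that selection step follows, using plain dictionaries in place of model rows; the created_at field is an assumption, since the question does not show the model.

```python
from datetime import datetime, timedelta, timezone

TOKEN_LIFETIME = timedelta(weeks=2)


def split_expired(tokens, now):
    """Partition tokens into (expired, live) by their created_at timestamp."""
    cutoff = now - TOKEN_LIFETIME
    expired = [t for t in tokens if t["created_at"] <= cutoff]
    live = [t for t in tokens if t["created_at"] > cutoff]
    return expired, live
```

Inside a management command or a periodic task, the same cutoff could be applied with a queryset filter on the hypothetical created_at field, followed by delete(). Running this check regularly avoids having to hold a single job in a queue for two weeks.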

trigger function after returning HttpResponse from django view

I am developing a django webserver on which another machine (with a known IP) can upload a spreadsheet to my webserver. After the spreadsheet has been updated, I want to trigger some processing/validation/analysis on the spreadsheet (which can take >5 minutes --- too long for the other server to reasonably wait for a response) and then send the other machine (with a known IP) a HttpResponse indicating that the data processing is finished.
I realize that you can't do processing.data() after returning an HttpResponse, but functionally I want code that looks something like this:
# processing.py
def spreadsheet(*args, **kwargs):
    print("[robot voice] processing spreadsheet.........")
    views.finished_processing_spreadsheet()

# views.py
def upload_spreadsheet(request):
    print("save the spreadsheet somewhere")
    return HttpResponse("started processing spreadsheet")
    processing.data()  # never reached: this is what I want to run after the response

def finished_processing_spreadsheet():
    print("send good news to other server (with known IP)")
I know how to write each function individually, but how can I effectively call processing.data() after views.upload_spreadsheet has returned a response?
I tried using django's request_finished signaling framework but this does not trigger the processing.spreadsheet() method after returning the HttpResponse. I tried using a decorator on views.upload_spreadsheet with the same problem.
I have an inkling that this might have something to do with writing middleware or possibly a custom class-based view, neither of which I have any experience with so I thought I would pose the question to the universe in search of some help.
Thanks for your help!

In fact, Django has a synchronous model. If you want to do real asynchronous processing, you need a message queue. The one most used with Django is Celery; it may look a bit like overkill, but it's a good answer.
Why do we need this? Because in a WSGI app, Apache hands the request to the executable, and the executable returns text. Only once the executable has finished executing does Apache acknowledge the end of the request.

The problem with your implementation is that if the number of spreadsheets being processed equals the number of workers, your website will stop responding.
You should use a background task queue, basically have 2 processes: your server and a background task manager. The server should delegate the processing of the spreadsheet to the background task manager. When the background task is done, it should inform the server somehow. For example, it can do model_with_spreadsheet.processed = datetime.datetime.now().
You should use a background job manager like django-ztask (very easy setup), celery (very powerful, probably overkill in your case) or even uwsgi spooler (which obviously requires uwsgi deployment).
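The shape of that delegation can be sketched with the standard library alone. This is only a stand-in: a thread pool inside the web process is used here in place of a real task queue such as Celery, and queued jobs are lost if the process restarts. The function bodies and the notifications list are placeholders.

```python
from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor(max_workers=2)
notifications = []  # stand-in for "send good news to the other server"


def process_spreadsheet(path):
    # Placeholder for the slow validation/analysis step.
    return "processed " + path


def finished_processing_spreadsheet(future):
    # Called once the background job has completed.
    notifications.append(future.result())


def upload_spreadsheet(path):
    # Hand the work off and return a response immediately.
    future = executor.submit(process_spreadsheet, path)
    future.add_done_callback(finished_processing_spreadsheet)
    return "started processing spreadsheet", future
```

The view returns as soon as the job is submitted, and the callback fires when processing ends. With a real task queue the same two roles are played by the enqueue call and by whatever the task does as its last step.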

Django, Signals and another process

I've got my Django project running well, and a separate background process which will collect data from various sources and store that data in an index.
I've got a model in a Django app called Sources which contains, essentially, a list of sources that data can come from! I've successfully managed to create a signal that is activated/called when a new entry is put in the Sources model.
My question is, is there a simple way that anybody knows of whereby I can send some form of signal/message to my background process indicating that the Sources model has been changed? Or should I just resort to polling for changes every x seconds, because it's so much simpler?
Many thanks for any help received.

It's unclear how you are running the background process you're talking about.
Anyway, I'd suggest that your background task use the Sources model directly. There are convenient ways to run the task without leaving the realm of Django (so as to have access to your models). You can use Celery [1], for example, or RQ [2].
Then you won't need to pass any messages, any changes to Sources model will take effect the next time your task is run.
[1] Celery is an open source asynchronous task queue/job queue, it isn't hard to set up and integrates with Django well.
Celery: general introduction
Django with celery introduction
[2] RQ means "Redis Queue", it is ‘a simple Python library for queueing jobs and processing them in the background with workers’.
Introductory post
GitHub repository

Polling is probably the easiest if you don't need split-second latency.
If you do, however, then you'll probably want to look into either, say,
sending a UNIX signal (or other methods of IPC, depending on platform) to the process
having the background process have a simple listening socket that you just send, say, a byte to (which is, admittedly, a form of IPC), and that triggers the action you want to trigger
or some sort of task/message queue. Celery or ZeroMQ come to mind.
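The listening-socket option can be sketched roughly as follows. It is a minimal illustration, assuming both processes run on the same machine; the function names and the "q" shutdown byte are invented for the example.

```python
import socket
import threading


def start_listener(on_change):
    """Listen on a localhost port; call on_change() for each byte received."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    server.listen()
    port = server.getsockname()[1]

    def serve():
        while True:
            conn, _ = server.accept()
            with conn:
                if conn.recv(1) == b"q":  # hypothetical shutdown byte
                    break
                on_change()
        server.close()

    thread = threading.Thread(target=serve, daemon=True)
    thread.start()
    return port, thread


def notify(port, payload=b"x"):
    # Connect, send a single byte, and close.
    with socket.create_connection(("127.0.0.1", port)) as conn:
        conn.sendall(payload)
```

The background process would call start_listener() with a function that re-reads the Sources table, and the Django signal handler would call notify() with the agreed port.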
