Celery: update state of task when using classes - python

Hi, I'm trying to update the state of a method that is executed as a task, as described in http://docs.celeryproject.org/en/latest/reference/celery.contrib.methods.html:
from celery import Celery
from celery.contrib.methods import task_method

celery = Celery()

class A(object):
    def __init__(self):
        self.a = 0

    @celery.task(filter=task_method)
    def add(self):
        self.a += 10
        for i in range(10):
            self.update_state(state="PROGRESS", meta={
                "current": i, "total": 10, "status": "Sleeping"
            })
        return {"current": 100, "total": 100, "status": "Complete."}

a = A()
a.add.delay()
This gives an error:
AttributeError: 'A' object has no attribute 'update_state'
That seems logical to me, since A does not inherit from Task and therefore has no update_state method.
Question: how do I update the state of a task when using method-based tasks?
Update:
As described in the comments below, updating the status of a task that is not bound is impossible, so the celery.contrib.methods way of defining methods as tasks is not usable in my example.

You can probably do it like this:
from celery import Celery, current_task
from celery.contrib.methods import task_method

celery = Celery()

class A:
    @celery.task(filter=task_method)
    def add(self):
        # ...
        current_task.update_state(state='PROGRESS', meta={...})

a = A()
a.add.delay()
Notice that I use the current_task proxy instead of the self variable (in bound Celery tasks, unlike in class methods, self denotes the current task).
Alternatively (I didn't check this, but it should probably work as well), you may be able to bind the method-based task too:
class A:
    @celery.task(filter=task_method, bind=True)
    def add(self, task):
        task.update_state(state='PROGRESS', meta={...})
You may have to swap the self and task arguments for this to work properly; I'm not sure about that.
BTW, it seems that celery.contrib.methods was removed in Celery 4.0.
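Since celery.contrib.methods is gone in modern Celery, a minimal sketch of the usual alternative, assuming Celery 4+ and an illustrative broker URL, is to move the work into a module-level bound task and have the class delegate to it; inside a bound task, self is the task instance, so update_state is available:

from celery import Celery

app = Celery('demo', broker='redis://localhost:6379/0')  # broker URL is just a placeholder

@app.task(bind=True)
def add_task(self, start):
    # 'self' is the task instance here, so update_state can be called directly
    total = 10
    for i in range(total):
        self.update_state(state='PROGRESS',
                          meta={'current': i, 'total': total, 'status': 'Sleeping'})
    return {'current': start + total, 'total': total, 'status': 'Complete.'}

class A:
    def __init__(self):
        self.a = 0

    def add(self):
        # Delegate the long-running work to the task instead of decorating the method
        return add_task.delay(self.a)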

Related

Pass complex object instance to class that subclasses process

I have a large Python 3.6 system where multiple processes and threads interact with each other and the user. Simplified, there is a Scheduler instance (subclasses threading.Thread) and a Worker instance (subclasses multiprocessing.Process). Both objects run for the entire duration of the program.
The user interacts with the Scheduler by adding Task instances and the Scheduler passes the task to the Worker at the correct moment in time. The worker uses the information contained in the task to do its thing.
Below is some stripped-down and simplified code from the project:
import threading
import multiprocessing

class Task:
    def __init__(self, name: str):
        self.name = name
        self.state = 'idle'

class Scheduler(threading.Thread):
    def __init__(self, worker: 'Worker'):
        super().__init__()
        self.worker = worker
        self.start()

    def run(self):
        while True:
            # Do stuff until the user schedules a new task
            task = Task('example')  # <-- In reality the Task instance is not created here; the thread gets it from elsewhere
            task.state = 'scheduled'
            self.worker.change_task(task)
            # Do stuff until task.state == 'finished'

class Worker(multiprocessing.Process):
    def __init__(self):
        super().__init__()
        self.current_task = None
        self.start()

    def change_task(self, new_task: Task):
        self.current_task = new_task
        self.current_task.state = 'accepted-idle'

    def run(self):
        while True:
            # Do stuff until the current task is updated
            self.current_task.state = 'accepted-running'
            # Task is running
            self.current_task.state = 'finished'
The system used to be structured so that the task contained multiple multiprocessing.Event objects, one for each of its possible states. Back then, not the whole Task instance was passed to the worker, but each of the task's attributes individually. As they were all multiprocessing-safe, it worked, with a caveat: the events changed in worker.run had to be created in worker.run and passed back to the task object for it to work. Not only is this a less-than-ideal solution, it no longer works with some changes I am making to the project.
Back to the current state of the project, as described by the Python code above. As is, this will never work because nothing makes it multiprocessing-safe at the moment. So I implemented a Proxy/BaseManager structure so that when a new Task is needed, the system gets it from the multiprocessing manager. I use this structure in a slightly different way elsewhere in the project as well. The issue is that worker.run never notices that self.current_task has been updated; it remains None. I expected this to be fixed by using the proxy, but clearly I am mistaken.
import types
import typing
from multiprocessing.managers import BaseManager, NamespaceProxy

def Proxy(target: typing.Type) -> typing.Type:
    """
    Normally a Manager exposes only object methods. A NamespaceProxy can be used when registering the object with
    the manager to expose all the attributes. This also works for attributes created at runtime.
    https://stackoverflow.com/a/68123850/8353475
    1. Instead of exposing all the attributes manually, we effectively override __getattr__ to do it dynamically.
    2. Instead of defining a class that subclasses NamespaceProxy for each specific object class that needs to be
    proxied, this method is used to do it dynamically. The target parameter should be the class of the object you want
    to generate the proxy for. The generated proxy class will be returned.
    Example usage: FooProxy = Proxy(Foo)
    :param target: The class of the object to build the proxy class for
    :return The generated proxy class
    """
    # __getattr__ is called when an attribute 'bar' is requested from 'foo' and is not found, e.g. 'foo.bar'. 'bar' can
    # be a class method as well as a variable. The call gets rerouted from the base object to this proxy, where it is
    # processed.
    def __getattr__(self, key):
        result = self._callmethod('__getattribute__', (key,))
        # If the attr call was for a method we need some further processing
        if isinstance(result, types.MethodType):
            # A wrapper around the method that passes the arguments, actually calls the method and returns the result.
            # Note that at this point wrapper() does not get called, just defined.
            def wrapper(*args, **kwargs):
                # Call the method and pass the return value along
                return self._callmethod(key, args, kwargs)
            # Return the wrapper method (not the result, but the method itself)
            return wrapper
        else:
            # If the attr call was for a variable it can be returned as is
            return result

    dic = {'types': types, '__getattr__': __getattr__}
    proxy_name = target.__name__ + "Proxy"
    ProxyType = type(proxy_name, (NamespaceProxy,), dic)
    # This is a tuple of all the attributes that are/will be exposed. We copy all of them from the base class
    ProxyType._exposed_ = tuple(dir(target))
    return ProxyType

class TaskManager(BaseManager):
    pass

TaskProxy = Proxy(Task)
TaskManager.register('get_task', callable=Task, proxytype=TaskProxy)
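For context, a hypothetical usage sketch of the manager and proxy above (the task name 'demo' and the printing are illustrative assumptions):

if __name__ == '__main__':
    manager = TaskManager()
    manager.start()

    task = manager.get_task('demo')  # returns a TaskProxy wrapping a managed Task
    task.state = 'scheduled'         # attribute writes are routed through the proxy to the managed object

    # Any process that receives this proxy talks to the same underlying Task,
    # so state changes made in Worker.run become visible to the Scheduler.
    print(task.state)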

How to get the celery.Task object from the @app.task

I have a task function:
@app.task(base=ProcessingTask)
def do_processing_task(args):
    ...

ProcessingTask inherits from celery.Task:

class ProcessingTask(celery.Task):
    def on_success(self, res, task_id, args, kwargs):
        ...
I start the task remotely with
result = app.send_task("workerTasks.do_processing_task", args=[args])
(I don't have access to the workerTasks file from the server file which calls this, so send_task is the route I need to take)
Within do_processing_task I'd like to get the instance of the ProcessingTask object so I can add some data to it that I can use in do_processing_task::on_success.
Is this possible? Thanks.
Bound tasks
A task being bound means the first argument to the task will always be the task instance (self), just like Python bound methods:
from celery.utils.log import get_task_logger

logger = get_task_logger(__name__)

@task(bind=True)
def add(self, x, y):
    logger.info(self.request.id)
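Applied to the question above, a hedged sketch of combining base= and bind= so the task body can reach the ProcessingTask instance (the my_data attribute is illustrative, and note that the task instance is reused across invocations within a worker process):

class ProcessingTask(celery.Task):
    def on_success(self, res, task_id, args, kwargs):
        # Read back whatever the task body stored on the instance
        print('extra data:', getattr(self, 'my_data', None))

@app.task(base=ProcessingTask, bind=True)
def do_processing_task(self, args):
    # 'self' is the ProcessingTask instance, so data can be attached to it
    self.my_data = {'input': args}
    return args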

Celery with continuous deployment

I have a service that exposes an API which then feeds tasks; it is implemented with Falcon (API) and Celery (task management).
Specifically, my workers take a long time to load and their code looks something like this:
class HeavyOp(celery.Task):
    def __init__(self):
        self._asset = get_heavy_asset()  # <-- takes long time

    @property
    def asset(self):
        return self._asset

@app.task(base=HeavyOp)
def my_task(data):
    return my_task.asset.do_something(data)
What actually goes on is that in the __init__ function some object is being read from disk and held in memory for as long as the worker lives.
Sometimes, I want to update that object.
Is there a way to reload the worker, without downtime? As this is all behind an API, I don't wish to have those few minutes of loading the heavy object as downtime.
We can assume the host has more than 1 core, but the solution must be a single host solution.
I don't think you need a custom base task class. What you want to achieve is a single-instance asset class that gets loaded after the worker has initialised and that you can reload from a task.
This approach works:
# worker.py
import time

from celery import Celery
from celery.signals import worker_ready

app = Celery(include=('tasks',))

class Asset:
    def __init__(self):
        self.time = time.time()

class AssetLoader:
    __shared_state = {}

    def __init__(self):
        self.__dict__ = self.__shared_state
        if '_value' not in self.__dict__:
            self.get_heavy_asset()

    def get_heavy_asset(self):
        self._value = Asset()

    @property
    def value(self):
        return self._value

@worker_ready.connect
def after_worker_ready(sender, **kwargs):
    AssetLoader()
Here, I made AssetLoader a Borg class, but you can choose any other pattern/strategy to share a single instance of Asset. For illustrative purposes, I just capture the timestamp when executing get_heavy_asset.
# tasks.py
from worker import app, AssetLoader

@app.task(bind=True)
def load(self):
    AssetLoader().get_heavy_asset()
    return AssetLoader().value.time

@app.task(bind=True)
def my_task(self):
    return AssetLoader().value.time
Bear in mind that Asset is shared per worker process but not across workers. If you run with concurrency=1, it doesn't make a difference, but for anything else it does. But from what I gather in your use case, it should be fine either way.
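A possible way to trigger the reload from the API side, sketched under the assumption that the worker/tasks modules are laid out as above (broker and result-backend configuration omitted):

# hypothetical caller, e.g. inside a Falcon resource
from tasks import load, my_task

load.delay()     # the worker process that picks this up rebuilds its in-memory Asset
my_task.delay()  # later tasks in that process use the refreshed asset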

How to call a function when a QObject is about to be destroyed?

I'd like to do some cleanup operations inside the object just before its destruction. In this case that would be closing the connection to the database.
Here is what I'm already doing:
Worker class:
from PyQt5 import QtCore
from pymongo import MongoClient, ASCENDING
from time import sleep

class StatusWidgetWorker(QtCore.QObject):
    ongoing_conversions_transmit = QtCore.pyqtSignal([list])

    def __init__(self, mongo_settings):
        super().__init__()
        print("StatusWidget Worker init")
        mongo_client = MongoClient([mongo_settings["server_address"]])
        self.log_database = mongo_client[mongo_settings["database"]]
        self.ongoing_conversions = mongo_settings["ongoing_conversions"]

    def status_retriever(self):
        print("mongo bridge()")
        while True:
            ongoing_conversions_list = []
            for doc in self.log_database[self.ongoing_conversions].find({}, {'_id': False}).sort([("start_date", ASCENDING)]):
                ongoing_conversions_list.append(doc)
            self.ongoing_conversions_transmit.emit(ongoing_conversions_list)
            sleep(2)
And the function that calls the worker from another class:
def status_worker(self):
    mongo_settings = "dict parameter"
    self.worker_thread_status = QtCore.QThread()
    self.worker_object_status = StatusWidgetWorker(mongo_settings)
    self.worker_object_status.moveToThread(self.worker_thread_status)
    self.worker_thread_status.started.connect(self.worker_object_status.status_retriever)
    self.worker_object_status.ongoing_conversions_transmit.connect(self.status_table_auto_updater)
    self.worker_thread_status.start()
Here is what I have already tried:
Defining a __del__ method in the Worker class; this function is never called.
Defining a function in the Worker class and connecting it to the destroyed signal with self.destroyed.connect(self.function); this function is again never called. I think this happens because the signal is emitted after the object has already been destroyed, not before its destruction.
I'm really wondering how to do this. Here is a partial answer:
http://www.riverbankcomputing.com/pipermail/pyqt/2014-November/035049.html
That approach seems a bit hacky to me (no offense to the author; there is probably no simple answer), because you have to set up a ThreadController class to do the Worker class's job, and I have signals and parameters to pass to the worker, which would make the ThreadController class messier.
If nobody has an answer, I'll probably use the ThreadController class and post the result here.
Thank you for reading :-)
The usual rule in Python applies: there is a module for that.
The solution is to use the atexit module and register the cleanup function in the __init__ function.
Example:
import atexit
from time import sleep

from PyQt5.QtCore import QObject

class StatusWidgetWorker(QObject):
    def __init__(self):
        super().__init__()
        # code here
        atexit.register(self.cleanup)

    def cleanup(self):
        print("Doing some long cleanup")
        sleep(2)
        self.bla = "Done !"
        print(self.bla)

Own params to PeriodicTask run() method in Celery

I am writing a small Django application and I need to be able to create, for each model object, a periodic task that will be executed at a certain interval. I'm using Celery for this, but I can't understand one thing:
class ProcessQueryTask(PeriodicTask):
    run_every = timedelta(minutes=1)

    def run(self, query_task_pk, **kwargs):
        logging.info('Process celery task for QueryTask %d' % query_task_pk)
        task = QueryTask.objects.get(pk=query_task_pk)
        task.exec_task()
        return True
Then I do the following:
>>> from tasks.tasks import ProcessQueryTask
>>> result1 = ProcessQueryTask.delay(query_task_pk=1)
>>> result2 = ProcessQueryTask.delay(query_task_pk=2)
The first call succeeds, but subsequent periodic calls return the error TypeError: run() takes exactly 2 non-keyword arguments (1 given) in the celeryd server.
Can I pass own params to PeriodicTask run()?
This was answered wonderfully by Ask Solem in his response to your question on the celery-users Google group.
Periodic tasks don't take arguments, so you need to make several classes or make one periodic task that processes more than one "model".
E.g.:
from datetime import timedelta

from celery.task import PeriodicTask
from celery.decorators import task, periodic_task

# base class
class BaseProcessQueryTask(PeriodicTask):
    abstract = True
    run_every = timedelta(minutes=1)
    query_task_pk = None

    def run(self):
        task = QueryTask.objects.get(pk=self.query_task_pk)
        task.exec_task()

class ProcessQueryTask1(BaseProcessQueryTask):
    query_task_pk = 1

class ProcessQueryTask2(BaseProcessQueryTask):
    query_task_pk = 2
but it's more likely you want something like this:
@task(ignore_result=True)
def execute_query_task(task):
    task.exec_task()

@periodic_task(run_every=timedelta(minutes=1))
def process_query_tasks():
    for task in QueryTask.objects.all():
        execute_query_task.delay(task)
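One caveat worth adding (not part of the original answer): task arguments are serialized, so it is usually safer to pass the primary key rather than the model instance, roughly like this:

@task(ignore_result=True)
def execute_query_task(query_task_pk):
    # Re-fetch the model inside the worker instead of serializing the instance
    QueryTask.objects.get(pk=query_task_pk).exec_task()

@periodic_task(run_every=timedelta(minutes=1))
def process_query_tasks():
    for pk in QueryTask.objects.values_list('pk', flat=True):
        execute_query_task.delay(pk)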
