I have a task function:
@app.task(base=ProcessingTask)
def do_processing_task(args):
ProcessingTask inherits from celery.Task:
class ProcessingTask(celery.Task):
    def on_success(self, res, task_id, args, kwargs):
I start the task remotely with
result = app.send_task("workerTasks.do_processing_task", args=[args])
(I don't have access to the workerTasks file from the server file which calls this, so send_task is the route I need to take)
Within do_processing_task I'd like to get the instance of the ProcessingTask object so I can attach some data to it for use in ProcessingTask.on_success.
Is this possible? Thanks.
Bound tasks
A task being bound means the first argument to the task will always be the task instance (self), just like Python bound methods:
from celery.utils.log import get_task_logger

logger = get_task_logger(__name__)

@app.task(bind=True)
def add(self, x, y):
    logger.info(self.request.id)
Related
I have a large Python 3.6 system where multiple processes and threads interact with each other and the user. Simplified, there is a Scheduler instance (subclasses threading.Thread) and a Worker instance (subclasses multiprocessing.Process). Both objects run for the entire duration of the program.
The user interacts with the Scheduler by adding Task instances and the Scheduler passes the task to the Worker at the correct moment in time. The worker uses the information contained in the task to do its thing.
Below is some stripped-down, simplified code from the project:
import multiprocessing
import threading


class Task:
    def __init__(self, name: str):
        self.name = name
        self.state = 'idle'


class Scheduler(threading.Thread):
    def __init__(self, worker: 'Worker'):
        super().__init__()
        self.worker = worker
        self.start()

    def run(self):
        while True:
            # Do stuff until the user schedules a new task
            task = Task('example')  # <-- In reality the Task instance is not created here; the thread gets it from elsewhere
            task.state = 'scheduled'
            self.worker.change_task(task)
            # Do stuff until the task.state == 'finished'


class Worker(multiprocessing.Process):
    def __init__(self):
        super().__init__()
        self.current_task = None
        self.start()

    def change_task(self, new_task: Task):
        self.current_task = new_task
        self.current_task.state = 'accepted-idle'

    def run(self):
        while True:
            # Do stuff until the current task is updated
            self.current_task.state = 'accepted-running'
            # Task is running
            self.current_task.state = 'finished'
The system used to be structured so that the task contained multiple multiprocessing.Event objects, one for each of its possible states. The whole Task instance was not passed to the worker; each of the task's attributes was passed instead. As they were all multiprocessing safe, it worked, with a caveat: the events changed in worker.run had to be created in worker.run and passed back to the task object for it to work. Not only is this a less than ideal solution, it no longer works with some changes I am making to the project.
Back to the current state of the project, as described by the Python code above. As is, this will never work because nothing makes it multiprocessing safe at the moment. So I implemented a Proxy/BaseManager structure so that when a new Task is needed, the system gets it from the multiprocessing manager. I use this structure in a slightly different way elsewhere in the project as well. The issue is that worker.run never sees that self.current_task has been updated; it remains None. I expected this to be fixed by using the proxy, but clearly I am mistaken.
import types
import typing
from multiprocessing.managers import BaseManager, NamespaceProxy


def Proxy(target: typing.Type) -> typing.Type:
    """
    Normally a Manager exposes only an object's methods. A NamespaceProxy can be used when registering the object with
    the manager to expose all the attributes. This also works for attributes created at runtime.
    https://stackoverflow.com/a/68123850/8353475

    1. Instead of exposing all the attributes manually, we effectively override __getattr__ to do it dynamically.
    2. Instead of defining a class that subclasses NamespaceProxy for each specific object class that needs to be
    proxied, this method is used to do it dynamically. The target parameter should be the class of the object you want
    to generate the proxy for. The generated proxy class will be returned.
    Example usage: FooProxy = Proxy(Foo)

    :param target: The class of the object to build the proxy class for
    :return: The generated proxy class
    """

    # __getattr__ is called when an attribute 'bar' is requested from 'foo' and it is not found, e.g. 'foo.bar'. 'bar' can
    # be a class method as well as a variable. The call gets rerouted from the base object to this proxy, where it is
    # processed.
    def __getattr__(self, key):
        result = self._callmethod('__getattribute__', (key,))
        # If the attribute lookup was for a method we need some further processing
        if isinstance(result, types.MethodType):
            # A wrapper around the method that passes the arguments, actually calls the method and returns the result.
            # Note that at this point wrapper() does not get called, just defined.
            def wrapper(*args, **kwargs):
                # Call the method and pass the return value along
                return self._callmethod(key, args, kwargs)

            # Return the wrapper method (not the result, but the method itself)
            return wrapper
        else:
            # If the attribute lookup was for a variable it can be returned as is
            return result

    dic = {'types': types, '__getattr__': __getattr__}
    proxy_name = target.__name__ + "Proxy"
    ProxyType = type(proxy_name, (NamespaceProxy,), dic)
    # This is a tuple of all the attributes that are/will be exposed. We copy all of them from the base class
    ProxyType._exposed_ = tuple(dir(target))
    return ProxyType


class TaskManager(BaseManager):
    pass


TaskProxy = Proxy(Task)
TaskManager.register('get_task', callable=Task, proxytype=TaskProxy)
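One likely reason worker.run keeps seeing None, sketched here under assumptions: once start() has run, the child process works on its own copy of the Worker object, while change_task is called from the Scheduler thread in the parent and only rebinds the attribute on the parent's copy. Proxying the Task does not change that, because it is the attribute on the Worker that never reaches the child. The sketch below imitates the two copies with copy.deepcopy in a single process. It is an illustration of the copy semantics, not real multiprocessing code, and the Worker here is a plain stand-in class.

```python
import copy


class Task:
    def __init__(self, name: str):
        self.name = name
        self.state = 'idle'


class Worker:
    """Plain stand-in for the multiprocessing.Process subclass above."""

    def __init__(self):
        self.current_task = None

    def change_task(self, new_task: Task):
        self.current_task = new_task


parent_worker = Worker()

# Imitate what the child process holds after start(): an independent copy.
child_worker = copy.deepcopy(parent_worker)

# The Scheduler thread lives in the parent, so this only touches the parent's copy.
parent_worker.change_task(Task('demo'))

print(parent_worker.current_task.name)  # demo
print(child_worker.current_task)        # None
```

If that is indeed the cause, the new task has to be handed over through something both processes share, for example a multiprocessing.Queue created before the worker is started.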
I have a service that exposes an API which in turn feeds tasks to workers; it is implemented with Falcon (API) and Celery (task management).
Specifically, my workers take a long time to load, and their code looks something like this:
class HeavyOp(celery.Task):
    def __init__(self):
        self._asset = get_heavy_asset()  # <-- takes a long time

    @property
    def asset(self):
        return self._asset

@app.task(base=HeavyOp)
def my_task(data):
    return my_task.asset.do_something(data)
What actually goes on is that in the __init__ function some object is being read from disk and held in memory for as long as the worker lives.
Sometimes, I want to update that object.
Is there a way to reload the worker, without downtime? As this is all behind an API, I don't wish to have those few minutes of loading the heavy object as downtime.
We can assume the host has more than 1 core, but the solution must be a single host solution.
I don't think you need a custom base task class. What you want to achieve is a single instance asset class which gets loaded after the worker has initialised and you can reload from a task.
This approach works:
# worker.py
import os
import sys
import time

from celery import Celery
from celery.signals import worker_ready

app = Celery(include=('tasks',))

class Asset:
    def __init__(self):
        self.time = time.time()

class AssetLoader:
    __shared_state = {}

    def __init__(self):
        self.__dict__ = self.__shared_state
        if '_value' not in self.__dict__:
            self.get_heavy_asset()

    def get_heavy_asset(self):
        self._value = Asset()

    @property
    def value(self):
        return self._value

@worker_ready.connect
def after_worker_ready(sender, **kwargs):
    AssetLoader()
Here, I made AssetLoader a Borg class, but you can choose any other pattern/strategy to share a single instance of Asset. For illustrative purposes, I just capture the timestamp when executing get_heavy_asset.
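To make the Borg behaviour concrete, here is a small Celery-free sketch of the same loader. The counter standing in for the expensive load and the version attribute are hypothetical and exist only so the reload is visible.

```python
import itertools

_versions = itertools.count(1)  # hypothetical stand-in for "read from disk"


class Asset:
    def __init__(self):
        self.version = next(_versions)


class AssetLoader:
    _shared_state = {}

    def __init__(self):
        # Every instance shares one __dict__, so they all see the same _value.
        self.__dict__ = self._shared_state
        if '_value' not in self.__dict__:
            self.get_heavy_asset()

    def get_heavy_asset(self):
        self._value = Asset()

    @property
    def value(self):
        return self._value


first = AssetLoader()
second = AssetLoader()
print(first.value is second.value)  # True
print(first.value.version)          # 1

# Reloading through any instance replaces the asset for all of them.
second.get_heavy_asset()
print(first.value.version)          # 2
```

Note that the old asset stays in use until the new one has finished loading and the attribute is rebound, which is what keeps the reload from interrupting callers in that process.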
# tasks.py
from worker import app, AssetLoader

@app.task(bind=True)
def load(self):
    AssetLoader().get_heavy_asset()
    return AssetLoader().value.time

@app.task(bind=True)
def my_task(self):
    return AssetLoader().value.time
Bear in mind that Asset is shared per worker process but not across workers. If you run with concurrency=1, it doesn't make a difference, but for anything else it does. But from what I gather in your use case, it should be fine either way.
Hi, I'm trying to update the state of a method which is executed as a task,
as described in: http://docs.celeryproject.org/en/latest/reference/celery.contrib.methods.html
from celery import Celery
from celery.contrib.methods import task_method

celery = Celery()

class A(object):
    def __init__(self):
        self.a = 0

    @celery.task(filter=task_method)
    def add(self):
        self.a += 10
        for i in range(10):
            self.update_state(state="PROGRESS", meta={
                "current": i, "total": 10, "status": "Sleeping"
            })
        return {"current": 100, "total": 100, "status": "Complete."}

a = A()
a.add.delay()
Which gives an error:
AttributeError: 'A' object has no attribute 'update_state'
This seems logical to me, since A does not inherit from Task and so has no update_state method.
Question: how do I update the state of a task when using method-based tasks?
Update:
As described in the comments below, updating the status of a task which is
not bound is impossible, therefore the celery.contrib.methods way of defining methods as tasks is not usable in my example.
You can probably do it like this:
from celery import Celery, current_task
from celery.contrib.methods import task_method

celery = Celery()

class A:
    @celery.task(filter=task_method)
    def add(self):
        # ...
        current_task.update_state(state='PROGRESS', meta={...})

a = A()
a.add.delay()
Notice that I use the current_task proxy instead of the self variable (which, unlike in class methods, denotes the current task in bound Celery tasks).
Alternatively (I haven't checked this, but it should probably work as well), you may be able to bind the class method task:
class A:
    @celery.task(filter=task_method, bind=True)
    def add(self, task):
        task.update_state(state='PROGRESS', meta={...})
You will probably have to swap the self and task arguments for this to work properly; I'm not sure about that.
BTW, it seems that celery.contrib.methods was removed in Celery 4.0.
Using python 2.7, celery 3.0.24 and mock 1.0.1. I have this:
class FancyTask(celery.Task):
    @classmethod
    def helper_method1(cls, name):
        """do some remote request depending on name"""
        return 'foo' + name + 'bar'

    def __call__(self, *args, **kwargs):
        funcname = self.name.split()[-1]
        bigname = self.helper_method1(funcname)
        return bigname

@celery.task(base=FancyTask)
def task1(*args, **kwargs):
    pass

@celery.task(base=FancyTask)
def task2(*args, **kwargs):
    pass

How can I patch helper_method1 while testing either task?
I've tried something like:
import unittest

import mock
from mymodule import tasks

class TestTasks(unittest.TestCase):
    def test_task1(self):
        task = tasks.task1
        task.helper_method1 = mock.MagicMock(return_value='42')
        res = task.delay('blah')
        task.helper_method1.assert_called_with('blah')
and the test is failing. The original function is the one being called. And no, this question didn't help me.
(I don't have a celery instance up and running so it's difficult for me to test this)
The target function in your application code is a classmethod. The function your test code is mocking is an instance method.
Does changing test_task1 like this help?
def test_task1(self):
    FancyTask.helper_method1 = mock.MagicMock(return_value='42')
    task = tasks.task1
    res = task.delay('blah')
    task.helper_method1.assert_called_with('blah')
You probably also need to change the assert_called_with so it is called from the class level instead of the instance level.
change
task.helper_method1.assert_called_with('blah')
to
FancyTask.helper_method1.assert_called_with('blah')
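A variation that restores the original method after the test is mock.patch.object. The sketch below uses a plain class as a stand-in for the celery.Task subclass so it runs without Celery, and the name attribute is a hypothetical value. It also shows that the helper is called with the name derived from self.name, not with the task's own arguments, which affects what assert_called_with should check.

```python
from unittest import mock


class FancyTask:
    """Plain stand-in for the celery.Task subclass; `name` is a hypothetical value."""
    name = 'task1'

    @classmethod
    def helper_method1(cls, name):
        return 'foo' + name + 'bar'

    def __call__(self, *args, **kwargs):
        funcname = self.name.split()[-1]
        return self.helper_method1(funcname)


task = FancyTask()

# Patch on the class, and only for the duration of the with block.
with mock.patch.object(FancyTask, 'helper_method1', return_value='42') as fake_helper:
    patched_result = task('blah')

# The helper received the value derived from self.name, not 'blah'.
fake_helper.assert_called_once_with('task1')
print(patched_result)  # 42
print(task('blah'))    # footask1bar (the original classmethod is back)
```

With a real Celery task, delay() hands the call to a worker process where a patch applied in the test process is not active, so calling the task directly or through apply() is the safer way to exercise it in a unit test.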
I use Celery in my application to run periodic tasks. Consider the simple example below:
from myqueue import Queue

@periodic_task(run_every=timedelta(minutes=1))
def process_queue():
    queue = Queue()
    uid, questions = queue.pop()
    if uid is None:
        return

    job = group(do_stuff(q) for q in questions)
    job.apply_async()

def do_stuff(question):
    try:
        ...
    except:
        ...
        raise

As you can see in the example above, I use Celery to run async tasks, but (since it's a queue) I need to call queue.fail(uid) if do_stuff raises an exception, or queue.ack(uid) otherwise. In this situation it would be clear and useful to have a callback from my task for both cases: on_failure and on_success.
I have seen some documentation, but I have never seen callbacks used in practice with apply_async. Is it possible to do that?
Subclass the Task class and override the on_success and on_failure methods:
from celery import Task

class CallbackTask(Task):
    def on_success(self, retval, task_id, args, kwargs):
        '''
        retval – The return value of the task.
        task_id – Unique id of the executed task.
        args – Original arguments for the executed task.
        kwargs – Original keyword arguments for the executed task.
        '''
        pass

    def on_failure(self, exc, task_id, args, kwargs, einfo):
        '''
        exc – The exception raised by the task.
        task_id – Unique id of the failed task.
        args – Original arguments for the task that failed.
        kwargs – Original keyword arguments for the task that failed.
        '''
        pass
Use:
@celery.task(base=CallbackTask)  # this does the trick
def add(x, y):
    return x + y
You can specify success and error callbacks via the link and link_error kwargs when you call apply_async. The Celery docs include a clear example: http://docs.celeryproject.org/en/latest/userguide/calling.html#linking-callbacks-errbacks