I have a post_save hook that triggers a task to run in celery. The task also updates the model, which causes the post_save hook to run. The catch is I do not want to .delay() the call in this instance, I just want to run it synchronously because it's already being run in a worker.
Is there an environment variable or something else I can use to detect when the code is being run in Celery?
To clarify: I'm aware that Celery tasks can still be called as normal functions, that's exactly what I'm trying to take advantage of. I want to do something like this:
import os
if os.environ.get('IS_CELERY'):
    my_task(1, 2, 3)
else:
    my_task.delay(1, 2, 3)
Usually you'd have settings modules such as common.py, production.py, test.py and local.py/dev.py. You could add a celery_settings.py alongside them with the following content:
from .production import *  # reuse everything from the production settings
IS_CELERY = True
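Then in your celery.py (I'm assuming you have one), set the settings module before the Celery app is created:
import os
os.environ.setdefault("DJANGO_SETTINGS_MODULE", "config.settings.celery_settings")
Note that setdefault leaves DJANGO_SETTINGS_MODULE untouched if it is already set in the worker's environment.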
Then in your code you can do:
from django.conf import settings
if getattr(settings, 'IS_CELERY', False):
    my_task(1, 2, 3)
else:
    my_task.delay(1, 2, 3)
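A minimal sketch of the same dispatch logic wrapped in a helper, so each call site does not repeat the if/else. It assumes a hypothetical IS_CELERY environment variable (instead of the settings flag) that you export only when starting the worker, e.g. `IS_CELERY=1 celery -A proj worker`; child processes inherit it. `run_task` and `in_celery_worker` are hypothetical names, not part of Celery.

```python
import os


def in_celery_worker() -> bool:
    """Return True if the hypothetical IS_CELERY variable is set to a truthy value."""
    # Assumption: the worker is started with IS_CELERY=1 in its environment,
    # and the web/submitter processes never set it.
    return os.environ.get("IS_CELERY", "").strip().lower() in {"1", "true", "yes"}


def run_task(task, *args, **kwargs):
    """Run `task` inline when already inside a worker, otherwise enqueue it.

    `task` is anything with Celery's calling conventions: calling it runs the
    body in the current process, and `.delay(*args, **kwargs)` enqueues it.
    """
    if in_celery_worker():
        return task(*args, **kwargs)
    return task.delay(*args, **kwargs)
```

In the post_save handler you would then write `run_task(my_task, 1, 2, 3)` instead of branching by hand.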
I have a Celery task that I want to test with unittest.
I'm doing something very similar to:
import unittest
class TestMe(unittest.TestCase):
    def test_celery_task(self):
        self.assertRaises(ValueError, celery_task.apply, args)
What is strange to me:
this assertion fails because ValueError is not raised, yet while the process runs I can see ValueError as the result of this Celery task.
I'm not sure, but it looks like the assertion is checked before the ValueError is raised.
Is it possible to check the result of an executed Celery task?
Or how else can it be tested?
That can't possibly work. When you enqueue a Celery task, all that happens is that you put a message into the queue for a separate process to pick up; it is that process that runs the task and, potentially, raises the exception.
If you want to check that the task itself raises ValueError, then you should call the task directly, not its delay (or apply) method:
self.assertRaises(ValueError, celery_task, args)
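A minimal, self-contained sketch of both assertRaises forms. A plain function stands in for the task (a hypothetical `celery_task` that rejects negative input); calling a real Celery task object directly runs its body in-process the same way, so the assertions behave identically.

```python
import unittest


def celery_task(value):
    """Hypothetical stand-in for the task body: rejects negative input."""
    if value < 0:
        raise ValueError("value must be non-negative")
    return value * 2


class TestMe(unittest.TestCase):
    def test_callable_form(self):
        # Pass the callable and its arguments separately; assertRaises
        # performs the call itself, inside its own try/except.
        self.assertRaises(ValueError, celery_task, -1)

    def test_context_manager_form(self):
        # Equivalent, and easier to read when the call is more than a name.
        with self.assertRaises(ValueError):
            celery_task(-1)

    def test_happy_path(self):
        self.assertEqual(celery_task(3), 6)
```

The context-manager form also avoids a common slip: writing `self.assertRaises(ValueError, f(), args)` evaluates `f()` before assertRaises is even called, so the exception escapes the assertion.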
I see 3 options here.
1) Call get() on the result of apply(); get() re-raises the exception the task raised. The call has to happen inside the assertion:
class TestMe(unittest.TestCase):
    def test_celery_task(self):
        with self.assertRaises(ValueError):
            celery_task.apply(args).get()
2) You can enable eager mode by setting task_always_eager to True (together with task_eager_propagates = True so that exceptions are re-raised); however, eager mode runs the task in the calling process, so it does not guarantee that your code behaves the same way it would in a real worker.
3) A better option would be to mock the Celery tasks. From a unit-testing point of view, it is not really correct to test a unit of code against a live part of the system such as Celery.
Here is a sample taken from the Celery testing documentation.
from decimal import Decimal
# for python 2: use mock.patch from `pip install mock`.
from unittest.mock import patch
from django.db import OperationalError  # the error send_order retries on
from pytest import raises
from celery.exceptions import Retry
from proj.models import Product
from proj.tasks import send_order
class test_send_order:
    @patch('proj.tasks.Product.order')  # < patching Product in module above
    def test_success(self, product_order):
        product = Product.objects.create(
            name='Foo',
        )
        send_order(product.pk, 3, Decimal(30.3))
        product_order.assert_called_with(3, Decimal(30.3))
    @patch('proj.tasks.Product.order')
    @patch('proj.tasks.send_order.retry')
    def test_failure(self, send_order_retry, product_order):
        product = Product.objects.create(
            name='Foo',
        )
        # Set side effects on the patched methods
        # so that they raise the errors we want.
        send_order_retry.side_effect = Retry()
        product_order.side_effect = OperationalError()
        with raises(Retry):
            send_order(product.pk, 3, Decimal(30.6))
Is there a way to determine programmatically that the current module is being imported or run in the context of a Celery worker?
We've settled on setting an environment variable before running the Celery worker, and checking this environment variable in the code, but I wonder if there's a better way?
A simple approach is to inspect sys.argv:
import sys
IN_CELERY_WORKER_PROCESS = (
    bool(sys.argv)
    and sys.argv[0].endswith('celery')
    and 'worker' in sys.argv
)
if IN_CELERY_WORKER_PROCESS:
    print("I'm in a Celery worker")
http://percentl.com/blog/django-how-can-i-detect-whether-im-running-celery-worker/
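The same check, wrapped in a function that takes argv explicitly so it can be unit-tested (`is_celery_worker` is a hypothetical name). It is only a heuristic: it looks for a program name ending in celery followed by a worker argument, so it may not recognise workers started in other ways, for example via `python -m celery`, where sys.argv[0] is the path to the module file.

```python
import os
import sys
from typing import Optional, Sequence


def is_celery_worker(argv: Optional[Sequence[str]] = None) -> bool:
    """Heuristic: does the command line look like `celery ... worker ...`?

    Mirrors the sys.argv check above, but takes argv as a parameter so it
    can be tested, and compares only the basename of the program path.
    """
    if argv is None:
        argv = sys.argv
    if not argv:
        return False
    program = os.path.basename(argv[0])
    # Only the arguments after the program name are searched for "worker".
    return program.endswith("celery") and "worker" in argv[1:]
```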
As of Celery 4.2 you can also do this by setting a flag from a worker_ready signal handler,
in celery.py:
from celery import Celery
from celery.signals import worker_ready
app = Celery(...)
app.running = False
@worker_ready.connect
def set_running(*args, **kwargs):
    # Called once the worker has started and is ready to accept tasks.
    app.running = True
Now, within your code, you can check app.running on the global app instance
to see whether you are running inside a worker. This can be very useful, for example, to decide which logger to use.
Depending on what your use-case scenario is exactly, you may be able to detect it by checking whether the request id is set:
@app.task(bind=True)
def foo(self):
    print(self.request.id)
If you invoke the above as foo.delay(), then the task will be sent to a worker and self.request.id will be set to a unique task ID. If you invoke it as foo(), then it will be executed in your current process and self.request.id will be None.
You can use the current_worker_task property of the Celery application instance (see the Celery application API reference).
With the following task defined:
# whatever_app/tasks.py
from celery import Celery
celery_app = Celery('whatever_app')
@celery_app.task
def test_task():
    if celery_app.current_worker_task:
        return 'running in a celery worker'
    return 'just running'
You can run the following on a python shell:
In [1]: from whatever_app.tasks import test_task
In [2]: test_task()
Out[2]: 'just running'
In [3]: r = test_task.delay()
In [4]: r.result
Out[4]: u'running in a celery worker'
Note: Obviously for test_task.delay() to succeed, you need to have at least one celery worker running and configured to load tasks from whatever_app.tasks.
Adding an environment variable is a good way to check whether the module is being run by a Celery worker. Alternatively, the task-submitter process can set the environment variable to mark that it is not running in the context of a Celery worker.
But a better way may be to use Celery signals, which can tell you whether the module is running in a worker or in the task submitter. For example, the worker_process_init signal is sent to each child task-executor process (in prefork mode), and its handler can set a global variable indicating that this is a worker process.
It is good practice to start workers with names, so that they are easier to manage (stop/kill/restart). You can use -n to name a worker.
celery -A test worker -l info -n foo
Now, in your script you can use app.control.inspect to see if that worker is running.
In [22]: import test
In [23]: i = test.app.control.inspect(['foo'])
In [24]: i.app.control.ping()
Out[24]: [{'celery@foo': {'ok': 'pong'}}]
You can read more about this in the Celery workers guide.
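A small helper for interpreting a ping reply like the one above, assuming the reply is a list of one-entry dicts mapping node names to {'ok': 'pong'} (as in the output, the node name includes the celery@ prefix). `worker_replied` is a hypothetical name; in practice you would pass it the return value of app.control.ping().

```python
from typing import Any, Dict, List


def worker_replied(replies: List[Dict[str, Any]], nodename: str) -> bool:
    """Return True if `nodename` answered a ping with {'ok': 'pong'}.

    `replies` is assumed to have the shape shown above, e.g.
    [{'celery@foo': {'ok': 'pong'}}]; workers that did not answer
    within the timeout do not appear in the list at all.
    """
    for reply in replies:
        answer = reply.get(nodename)
        if isinstance(answer, dict) and answer.get("ok") == "pong":
            return True
    return False
```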
I am writing a small test task for django-celery, in which I would like to set a custom state (and some data, but let's start with a custom state first).
I use django as the messaging backend. My version of python is 2.6.
Here's the content of tasks.py:
import time
from djcelery import celery
@celery.task
def generate():
    generate.update_state(state="PROGRESS")
    time.sleep(10)
    return True
And here's what happens when I give it a try:
>>> import tasks
>>> result = tasks.generate.delay()
>>> result
<AsyncResult: f72574aa-f8c5-49dc-89d4-47d2012a4d6d>
# status and state are the same, but just to make sure
>>> result.status
u'PENDING'
>>> result.state
u'PENDING'
>>> result.result
# empty, as in None
# wait a few seconds
>>> result.status
u'SUCCESS'
>>> result.state
u'SUCCESS'
>>> result.result
True
I can't figure out why the state is PENDING while it should be PROGRESS. Any idea?
I've already looked at the documentation, and here's the relevant link: http://docs.celeryproject.org/en/latest/userguide/tasks.html#custom-states
I do the exact same thing (minus the meta, but I also tried with it, without success), so it should work.
UPDATE: I found out why: it looks like you have to restart the Celery daemon whenever you update your tasks so that the changes are picked up.
I have been reading the docs and searching but cannot seem to find a straight answer:
Can you cancel an already executing task? (as in the task has started, takes a while, and half way through it needs to be cancelled)
I found this in the Celery FAQ:
>>> result = add.apply_async(args=[2, 2], countdown=120)
>>> result.revoke()
But I am unclear if this will cancel queued tasks or if it will kill a running process on a worker. Thanks for any light you can shed!
revoke cancels the task execution. If a task is revoked, the workers ignore the task and do not execute it. If you don't use persistent revokes, your task can still be executed after a worker restart.
https://docs.celeryq.dev/en/stable/userguide/workers.html#worker-persistent-revokes
revoke has a terminate option, which is False by default. If you need to kill the executing task, you need to set terminate to True.
>>> from celery.task.control import revoke
>>> revoke(task_id, terminate=True)
https://docs.celeryq.dev/en/stable/userguide/workers.html#revoke-revoking-tasks
In Celery 3.1, the API for revoking tasks changed.
According to the Celery FAQ, you should use result.revoke:
>>> result = add.apply_async(args=[2, 2], countdown=120)
>>> result.revoke()
or if you only have the task id:
>>> from proj.celery import app
>>> app.control.revoke(task_id)
@0x00mh's answer is correct; however, recent Celery docs say that using the terminate option is "a last resort for administrators" because you may accidentally terminate another task which started executing in the meantime. Possibly a better solution is combining terminate=True with signal='SIGUSR1' (which causes the SoftTimeLimitExceeded exception to be raised in the task).
Per the 5.2.3 documentation, the following command can be run:
celery.control.revoke(task_id, terminate=True, signal='SIGKILL')
where
celery = Celery(app.name, broker=app.config['CELERY_BROKER_URL'])
Link to the doc: https://docs.celeryq.dev/en/stable/reference/celery.app.control.html?highlight=revoke#celery.app.control.Control.revoke
There is also another, less satisfactory way to stop a task (aborting it), but it has several reliability caveats. For details, see:
http://docs.celeryproject.org/en/latest/reference/celery.contrib.abortable.html
Define the Celery app with a broker and backend, something like:
from celery import Celery
celeryapp = Celery('app', broker=redis_uri, backend=redis_uri)
When you send a task, send_task returns an AsyncResult whose id is the task's unique ID:
task_id = celeryapp.send_task('run.send_email', queue="demo").id
To revoke task you need celery app and task id:
celeryapp.control.revoke(task_id, terminate=True)
from celery.app import default_app
revoked = default_app.control.revoke(task_id, terminate=True, signal='SIGKILL')
print(revoked)
See the task options time_limit and soft_time_limit (which can also be set for workers). If you want to control not only how long a task runs but also how late it may start, see the expires argument of the apply_async method.
# tasks.py
from celery.decorators import task
@task()
def add(x, y):
    add.delay(1, 9)
    return x + y
>>> import tasks
>>> res = tasks.add.delay(5, 2)
>>> res.get()
7
If I run this code, I expect tasks to be continuously added to the queue, but they aren't: only the first task (5, 2) gets added to the queue and processed.
Tasks should be added continuously because of this line: add.delay(1, 9)
Note: I need each task to execute another task.
As far as I can see, the periodic_task decorator creates periodic tasks, while task creates just one task, and delay just executes it asynchronously.
You should just use periodic_task, instead of recursion.
add inside function body refers to original function, not its decorated version.
If you just need to run a task repeatedly, use @periodic_task instead. You only need recursion if the delay is different each time. In that case, subclass Task instead of using the decorator and you'll be able to use recursion without a problem.
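In current Celery versions, the usual way to run a task repeatedly is a beat schedule rather than the periodic_task decorator. A minimal config sketch, assuming the task is registered under the name tasks.add (the real name depends on the module it is defined in); you would assign it with app.conf.beat_schedule = beat_schedule and run a celery beat process alongside the worker.

```python
from datetime import timedelta

# Hypothetical schedule: enqueue tasks.add(1, 9) every 10 seconds.
# Celery beat reads this from app.conf.beat_schedule; each entry names the
# registered task, how often to send it, and the positional args to pass.
beat_schedule = {
    "add-every-10-seconds": {
        "task": "tasks.add",
        "schedule": timedelta(seconds=10),
        "args": (1, 9),
    },
}
```

With a schedule like this, add no longer needs to call add.delay() on itself; beat keeps enqueueing it.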
You should look at subtasks and callbacks; they might give you the answer you are looking for.
http://celeryproject.org/docs/userguide/tasksets.html