django celery: update_state not doing anything - python

I am writing a small test task for django-celery, in which I would like to set a custom state (and some data, but let's start with a custom state first).
I use django as the messaging backend. My version of python is 2.6.
Here's the content of tasks.py
import time
from djcelery import celery
@celery.task
def generate():
    generate.update_state(state="PROGRESS")
    time.sleep(10)
    return True
And here's what happens when I give it a try:
>>> import tasks
>>> result = tasks.generate.delay()
>>> result
<AsyncResult: f72574aa-f8c5-49dc-89d4-47d2012a4d6d>
# status and state are the same, but just to make sure
>>> result.status
u'PENDING'
>>> result.state
u'PENDING'
>>> result.result
# empty, as in None
# wait a few seconds
>>> result.status
u'SUCCESS'
>>> result.state
u'SUCCESS'
>>> result.result
True
I can't figure out why the state is PENDING while it should be PROGRESS. Any idea?
I've already looked at the documentation, and here's the relevant link: http://docs.celeryproject.org/en/latest/userguide/tasks.html#custom-states
I do the exact same thing (minus the meta, though I also tried with it and had no success), so it should work.
UPDATE: I found out why: it looks like you have to restart the celery daemon whenever you update your tasks, so that the changes are taken into account.
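For reference, a minimal sketch of the same kind of task reporting a custom PROGRESS state, written against a current Celery API with a bound task (the meta payload is illustrative):

import time

from celery import shared_task

@shared_task(bind=True)
def generate(self):
    # report a custom state (and optional metadata) before the slow part
    self.update_state(state="PROGRESS", meta={"step": "sleeping"})
    time.sleep(10)
    return True

And, as noted above, remember to restart the worker after editing tasks.py, otherwise it keeps running the old code.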


Related

Detect if Django function is running in a celery worker

I have a post_save hook that triggers a task to run in celery. The task also updates the model, which causes the post_save hook to run. The catch is I do not want to .delay() the call in this instance, I just want to run it synchronously because it's already being run in a worker.
Is there an environmental variable or something else I can use to detect when the code is being run in celery?
To clarify: I'm aware that Celery tasks can still be called as normal functions, that's exactly what I'm trying to take advantage of. I want to do something like this:
if os.environ['is_celery']:
    my_task(1, 2, 3)
else:
    my_task.delay(1, 2, 3)
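For context, the post_save hook being described might look roughly like this once some kind of check is available (a sketch; MyModel, my_task and in_celery_worker() are illustrative names):

from django.db.models.signals import post_save
from django.dispatch import receiver

@receiver(post_save, sender=MyModel)  # MyModel is illustrative
def trigger_task(sender, instance, **kwargs):
    if in_celery_worker():  # illustrative check, see the answers below
        # already inside a worker: run synchronously to avoid re-queueing
        my_task(instance.pk)
    else:
        my_task.delay(instance.pk)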
Usually you'd have your settings split into common.py, production.py, test.py and local.py/dev.py. You could just add a celery_settings.py with the following content:
from production import *
IS_CELERY = True
Then in your celery.py (I'm assuming you have one) you'll do
os.environ.setdefault("DJANGO_SETTINGS_MODULE", "config.settings.celery_settings")
Then in your script you can now do:
from django.conf import settings

if getattr(settings, 'IS_CELERY', None):
    my_task(1, 2, 3)
else:
    my_task.delay(1, 2, 3)

How can I detect whether I'm running in a Celery worker?

Is there a way to determine, programmatically, whether the current module is being imported/run in the context of a celery worker?
We've settled on setting an environment variable before running the Celery worker, and checking this environment variable in the code, but I wonder if there's a better way?
Simple,
import sys

IN_CELERY_WORKER_PROCESS = sys.argv and sys.argv[0].endswith('celery') \
    and 'worker' in sys.argv

if IN_CELERY_WORKER_PROCESS:
    print('Im in Celery worker')
http://percentl.com/blog/django-how-can-i-detect-whether-im-running-celery-worker/
As of celery 4.2 you can also do this by setting a flag from a worker_ready signal handler, in celery.py:
from celery import Celery
from celery.signals import worker_ready

app = Celery(...)
app.running = False

@worker_ready.connect
def set_running(*args, **kwargs):
    app.running = True
Now you can check within your task by using the global app instance
to see whether or not you are running. This can be very useful to determine which logger to use.
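For example, picking a logger based on that flag might look like this (a sketch; assumes the app defined above is importable as proj.celery.app):

import logging

from proj.celery import app  # illustrative import path

def get_logger():
    # choose a different logger depending on whether we are in a worker
    if app.running:
        return logging.getLogger('celery.worker')
    return logging.getLogger('web')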
Depending on what your use-case scenario is exactly, you may be able to detect it by checking whether the request id is set:
@app.task(bind=True)
def foo(self):
    print(self.request.id)
If you invoke the above as foo.delay() then the task will be sent to a worker and self.request.id will be set to a unique number. If you invoke it as foo(), then it will be executed in your current process and self.request.id will be None.
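So one way to branch on this inside the task body is (a sketch; assumes app is your Celery instance):

@app.task(bind=True)
def foo(self):
    if self.request.id is None:
        return 'running synchronously in the calling process'
    return 'running inside a celery worker'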
You can use the current_worker_task property from the Celery application instance (see the celery.app API docs).
With the following task defined:
# whatever_app/tasks.py
from celery import Celery

celery_app = Celery(app)

@celery_app.task
def test_task():
    if celery_app.current_worker_task:
        return 'running in a celery worker'
    return 'just running'
You can run the following on a python shell:
In [1]: from whatever_app.tasks import test_task
In [2]: test_task()
Out[2]: 'just running'
In [3]: r = test_task.delay()
In [4]: r.result
Out[4]: u'running in a celery worker'
Note: Obviously for test_task.delay() to succeed, you need to have at least one celery worker running and configured to load tasks from whatever_app.tasks.
Adding an environment variable is a good way to check whether the module is being run by a celery worker. Alternatively, the task-submitter process can set the environment variable to mark that it is not running in the context of a celery worker.
But a better way may be to use celery signals, which can tell you whether the module is running in a worker or in the task submitter. For example, the worker_process_init signal is sent to each child task-executor process (in prefork mode), and its handler can set a global variable indicating that this is a worker process, as sketched below.
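A sketch of that signal-based approach (the flag and handler names are illustrative):

from celery.signals import worker_process_init

IN_CELERY_WORKER = False

@worker_process_init.connect
def mark_worker_process(*args, **kwargs):
    # runs once in every prefork child process, never in the task submitter
    global IN_CELERY_WORKER
    IN_CELERY_WORKER = True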
It is good practice to start workers with names, so that it becomes easier to manage (stop/kill/restart) them. You can use -n to name a worker:
celery worker -l info -A test -n foo
Now, in your script you can use app.control.inspect to see if that worker is running.
In [22]: import test
In [23]: i = test.app.control.inspect(['foo'])
In [24]: i.app.control.ping()
Out[24]: [{'celery@foo': {'ok': 'pong'}}]
You can read more about this in celery worker docs
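If you want this as a reusable check rather than a shell session, something like the following works (a sketch; assumes app is your Celery application instance and node_name is the worker's full node name, e.g. 'celery@foo'):

def worker_is_alive(app, node_name, timeout=1.0):
    # ping only the named worker; an empty reply list means it did not respond
    replies = app.control.ping(destination=[node_name], timeout=timeout)
    return bool(replies)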

Controlling a local redis server from python code

I'm looking for a way to check if a redis instance (on a local machine with default port) is running or not. If not, I want to start it from my python code.
If you start a redis client you can first try to ping -- if you get a redis.exceptions.ConnectionError then the service is probably not running. Below is an example of such a function. There are other ways to get a similar or more robust result -- this one is just an easy approach. Also note that this doesn't tell you if a particular key is setup or anything about the redis setup. It only tells you if there is a live redis server on localhost or not.
import redis
from redis.exceptions import ConnectionError

def redisLocalhostLive():
    redtest = redis.StrictRedis()  # non-default host/port could go here
    try:
        return redtest.ping()
    except ConnectionError:
        return False
gah, Pyrce beat me to a similar answer. posting anyway:
import redis

server = redis.Redis()
try:
    server.ping()
except redis.exceptions.ConnectionError:
    pass  # your redis start command here
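If you actually want to start the server from Python at that point, one option is to spawn it with subprocess (a sketch; assumes redis-server is installed and on your PATH):

import subprocess
import time

import redis

server = redis.Redis()
try:
    server.ping()
except redis.exceptions.ConnectionError:
    subprocess.Popen(['redis-server'])  # add a config file path here if you need one
    time.sleep(1)  # give the server a moment to come up
    server.ping()  # will raise again if it still is not reachable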
One approach is to use psutil, which is a Python module that provides a cross-platform way to retrieve info on running processes.
>>> import psutil
>>> processes = psutil.process_iter()  # Get all running processes
>>> if any(process.name == 'redis-server' for process in processes):
...     print "redis is running"
...
redis is running
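Note that on newer psutil versions Process.name is a method rather than an attribute, so the same check looks more like this (a sketch):

import psutil

def redis_process_running():
    # psutil >= 5.3: process_iter can prefetch attributes into process.info
    for process in psutil.process_iter(['name']):
        if process.info['name'] == 'redis-server':
            return True
    return False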

Cancel an already executing task with Celery?

I have been reading the doc and searching but cannot seem to find a straight answer:
Can you cancel an already executing task? (as in the task has started, takes a while, and half way through it needs to be cancelled)
I found this in the Celery FAQ:
>>> result = add.apply_async(args=[2, 2], countdown=120)
>>> result.revoke()
But I am unclear if this will cancel queued tasks or if it will kill a running process on a worker. Thanks for any light you can shed!
revoke cancels the task execution. If a task is revoked, the workers ignore the task and do not execute it. If you don't use persistent revokes, your task can be executed after a worker restart.
https://docs.celeryq.dev/en/stable/userguide/workers.html#worker-persistent-revokes
revoke has a terminate option which is False by default. If you need to kill the executing task you need to set terminate to True.
>>> from celery.task.control import revoke
>>> revoke(task_id, terminate=True)
https://docs.celeryq.dev/en/stable/userguide/workers.html#revoke-revoking-tasks
In Celery 3.1, the API for revoking tasks changed.
According to the Celery FAQ, you should use result.revoke:
>>> result = add.apply_async(args=[2, 2], countdown=120)
>>> result.revoke()
or if you only have the task id:
>>> from proj.celery import app
>>> app.control.revoke(task_id)
@0x00mh's answer is correct, however recent celery docs say that using the terminate option is "a last resort for administrators" because you may accidentally terminate another task which started executing in the meantime. Possibly a better solution is combining terminate=True with signal='SIGUSR1' (which causes the SoftTimeLimitExceeded exception to be raised in the task).
Per the 5.2.3 documentation, the following command can be run:
celery.control.revoke(task_id, terminate=True, signal='SIGKILL')
where
celery = Celery(app.name, broker=app.config['CELERY_BROKER_URL'])
Link to the doc: https://docs.celeryq.dev/en/stable/reference/celery.app.control.html?highlight=revoke#celery.app.control.Control.revoke
In addition, though it is not entirely satisfactory, there is another way to stop a task (abortable tasks), but it has a number of reliability caveats; for more details, see:
http://docs.celeryproject.org/en/latest/reference/celery.contrib.abortable.html
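For completeness, the abortable-task approach from that page looks roughly like this (a sketch; it needs a result backend, and the work loop is illustrative):

from celery.contrib.abortable import AbortableTask, AbortableAsyncResult

@app.task(bind=True, base=AbortableTask)
def long_running(self, items):
    for item in items:
        if self.is_aborted():
            # the task must cooperatively check for abort requests
            return 'aborted'
        process(item)  # illustrative work unit

# from the caller's side:
result = long_running.delay(range(100))
AbortableAsyncResult(result.id).abort()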
You define your Celery app with a broker and backend, something like:
from celery import Celery
celeryapp = Celery('app', broker=redis_uri, backend=redis_uri)
When you send the task, it returns a unique id for the task:
task_id = celeryapp.send_task('run.send_email', queue = "demo")
To revoke the task you need the Celery app and the task id:
celeryapp.control.revoke(task_id, terminate=True)
from celery.app import default_app
revoked = default_app.control.revoke(task_id, terminate=True, signal='SIGKILL')
print(revoked)
See the following options for tasks: time_limit and soft_time_limit (or you can set them for the worker). If you want to control not only the execution time, see the expires argument of the apply_async method.
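A sketch of those options (the values and the work function are illustrative):

from celery.exceptions import SoftTimeLimitExceeded

@app.task(bind=True, soft_time_limit=60, time_limit=120)
def crunch(self, data):
    try:
        return heavy_work(data)  # illustrative work function
    except SoftTimeLimitExceeded:
        # raised inside the task at the soft limit;
        # the hard time_limit kills the worker child process instead
        return 'gave up after the soft time limit'

# expire the task if it has not started within 10 minutes of being queued
crunch.apply_async(args=[data], expires=600)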

Task state not updating when using custom state

I have a task like this:
@task
def test():
    time.sleep(10)
    test.update_state(state="PROGRESS")
    time.sleep(10)
    return "done"
I then run this:
>>> from celery.execute import send_task
>>> t = send_task("testcelery.test")
>>> t.state
'PENDING'
>>> t.state
'PROGRESS'
I can see in the worker that the task has completed:
[2011-02-19 20:18:43,851: INFO/MainProcess] Task testcelery.test[7598b170-2877-4d76-89a0-9bcc4c9f877e] succeeded in 20.0225799084s: 'done'
But t.state never changes from PROGRESS to SUCCESS. What am I doing wrong?
You should upgrade to Celery 2.2.4 (released yesterday) as it fixes the bug that causes this.
See http://celeryq.org/docs/changelog.html
It looks to me like having CELERY_IGNORE_RESULT set would cause this behavior. What is t.ignore_result? If it is True, then either change it on the task or change the default. If you always want to inspect results, changing CELERY_IGNORE_RESULT makes more sense to me, but setting ignore_result on each task makes your intentions more obvious.
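In other words, make sure results are actually stored, either globally or per task (a sketch using the older-style setting name from this era):

# settings.py: store results for all tasks
CELERY_IGNORE_RESULT = False

# or per task, which makes the intent explicit:
@task(ignore_result=False)
def test():
    time.sleep(10)
    test.update_state(state="PROGRESS")
    time.sleep(10)
    return "done"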
