I'm using the Django database transport instead of RabbitMQ, for concurrency reasons.
But I can't solve the problem of revoking a task before it executes.
I found some answers about this, but they don't seem complete, or I can't get enough help from them.
first answer
second answer
How can I extend the Celery task table with a model, adding a boolean field (revoked) that I can set when I don't want the task to execute?
Thanks.
Since Celery tracks tasks by an ID, all you really need is to be able to tell which IDs have been canceled. Rather than modifying kombu internals, you can create your own table (or memcached etc) that just tracks canceled IDs, then check whether the ID for the current cancelable task is in it.
This is what the transports that support a remote revoke command do internally:
All worker nodes keeps a memory of revoked task ids, either in-memory
or persistent on disk (see Persistent revokes). (from Celery docs)
When you use the django transport, you are responsible for doing this yourself. In this case it's up to each task to check whether it has been canceled.
So the basic form of your task (logging added in place of an actual operation) becomes:
from celery import shared_task
from celery.exceptions import Ignore
from celery.utils.log import get_task_logger

from .models import task_canceled

logger = get_task_logger(__name__)

@shared_task
def my_task():
    if task_canceled(my_task.request.id):
        raise Ignore()
    logger.info("Doing my stuff")
You can extend & improve this in various ways, such as by creating a base CancelableTask class as in one of the other answers you linked to (see the sketch at the end of this answer), but this is the basic form. What you're missing now is the model and the function to check it.
Note that the ID in this case will be a string ID like a5644f08-7d30-43ff-a61e-81c165ad9e19, not an integer. Your model can be as simple as this:
from django.db import models

class CanceledTask(models.Model):
    task_id = models.CharField(max_length=200)

def cancel_task(request_id):
    CanceledTask.objects.create(task_id=request_id)

def task_canceled(request_id):
    return CanceledTask.objects.filter(task_id=request_id).exists()
You can now check the behavior by watching your celery service's debug logs while doing things like:
my_task.delay()
models.cancel_task(my_task.delay().id)
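For completeness, here is a rough sketch of the base-class variant mentioned above; CancelableTask and my_other_task are illustrative names, and the task_canceled() helper shown above is reused:

from celery import Task, shared_task
from celery.exceptions import Ignore

from .models import task_canceled

class CancelableTask(Task):
    def __call__(self, *args, **kwargs):
        # Skip the task body entirely if this id has been marked as canceled.
        if task_canceled(self.request.id):
            raise Ignore()
        return super().__call__(*args, **kwargs)

@shared_task(base=CancelableTask)
def my_other_task():
    ...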
I have a DRF application with a Python queue that I'm writing tests for. Somehow:
My queue thread cannot find an object that exists in the test database.
The main thread cannot destroy the db as it's in use by 1 other session.
To explain the use case a bit further: I use Django's user model and have a table for metadata of files which you can upload. One of these fields is created_by, a ForeignKey to django.conf.settings.AUTH_USER_MODEL. As shown below, I create a user in the TestCase's setUp(), which I then use to create an entry in the Files table. The creation of this entry happens in a queue, however. During testing, this results in the error DETAIL: Key (created_by_id)=(4) is not present in table "auth_user".
When the tests are completed and tearDown tries to destroy the test DB, I get another error: DETAIL: There is 1 other session using the database. The two seem related, and I'm probably handling the queue incorrectly.
The tests are written with Django's TestCase and run with python manage.py test.
from django.contrib.auth.models import User
from rest_framework.test import APIClient
from django.test import TestCase

class MyTest(TestCase):
    def setUp(self):
        self.client = APIClient()
        self.client.force_authenticate()
        user = User.objects.create_user('TestUser', 'test@test.test', 'testpass')
        self.client.force_authenticate(user)

    def test_failing(self):
        self.client.post('/totestapi', data={'files': [open('tmp.txt', 'rt')]})
The queue is defined in a separate file, app/queue.py.
from app.models import FileMeta
from queue import Queue
from threading import Thread

def queue_handler():
    while True:
        user, files = queue.get()
        for file in files:
            upload(file)
            FileMeta(user=user, filename=file.name).save()
        queue.task_done()

queue = Queue()
thread = Thread(target=queue_handler, daemon=True)

def start_upload_thread():
    thread.start()

def put_upload_thread(*args):
    queue.put(args)
Finally, the queue is started from app/views.py, which is always called when Django is started, and contains all the APIs.
from rest_framework.views import APIView

from app.queue import start_upload_thread, put_upload_thread

start_upload_thread()

class ToTestAPI(APIView):
    def post(self, request):
        put_upload_thread(request.user, request.FILES.getlist('files'))
Apologies that this is not a "real" answer but it was getting longer than a comment would allow.
The new ticket looks good. I did notice that the background thread is never stopped; that is probably what is causing the issue with the DB still being in use.
You use TestCase, which runs each test inside a database transaction and rolls back all database changes when the test function ends. That means you won't be able to see data from the test case in another thread, which uses a different connection to the database. You can see it inside your tests and views, since they share a connection.
Celery and RQ are the standard job queues - Celery is more flexible, but RQ is simpler. Start with RQ and keep things simple and isolated.
Some notes:
Pass in the PK of objects, not the whole object.
Read up on pickle if you do need to pass larger data.
Set the queues to async=False (run like normal code) in tests.
Queue consumers are a separate process running anywhere in the system, so data needs to get to them somehow. If you use full objects, those need to be pickled (serialized) and saved in the queue itself (i.e. Redis) to be retrieved and processed. Just be careful and don't pass large objects this way - pass the PK, store the file somewhere like S3 or another object store, etc. (see the sketch below).
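As a rough illustration of the PK-passing advice (django_rq and all names below are assumptions - adapt them to your queue and models):

import django_rq

from app.models import FileMeta

def process_file_meta(file_meta_pk):
    # The job re-fetches the row by primary key instead of receiving
    # a pickled model instance through the queue.
    meta = FileMeta.objects.get(pk=file_meta_pk)
    # ... do the long-running upload / post-processing here ...

# In the view, enqueue only the primary key:
def enqueue_processing(meta):
    django_rq.enqueue(process_file_meta, meta.pk)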
For Django-RQ I use this snippet to set the queues to sync mode when in testing, and then just run things as normal.
if IS_TESTING:
    for q in RQ_QUEUES.keys():
        RQ_QUEUES[q]['ASYNC'] = False
Good luck!
I'm using signals for post-processing data. Because a lot needs to happen, and because I later want to run that logic in the background so the user doesn't have to wait for it, I want to run this code in a separate class.
I want to run the code in my Post Save event
But I get the following error:
ImportError: cannot import name 'ActivityDetail' from 'ryf_app.models'
The model definitely exists in my models.py file
What am I missing here?
If you want to run a task asynchronously or in the background, you can use a task queue like Celery. For the broker or cache DB there are options such as Redis, RabbitMQ, or Amazon SQS. Celery has good documentation on using RabbitMQ as the broker; you can follow the link here.
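A minimal sketch of how the post-processing from the question could be handed off to Celery (module paths and names here are assumptions; importing the model inside the task is a common way to sidestep circular imports like the ImportError above):

# ryf_app/tasks.py (hypothetical location)
from celery import shared_task

@shared_task
def post_process_activity(activity_id):
    # Import inside the task rather than at module level to avoid
    # circular imports between models.py and the signal module.
    from ryf_app.models import ActivityDetail

    detail = ActivityDetail.objects.get(pk=activity_id)
    # ... long-running post-processing on `detail` goes here ...

# In the post_save handler, pass only the primary key:
#   post_process_activity.delay(instance.pk)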
There is a specific periodic task that needs to be removed from the message queue. I am using Redis with Celery here.
tasks.py
@periodic_task(run_every=crontab(minute='*/6'))
def task_abcd():
    """
    some operations here
    """
There are other periodic tasks in the project as well, but I need this specific task to stop from now on.
As explained in this answer, will the following code work?
@periodic_task(run_every=crontab(minute='*/6'))
def task_abcd():
    pass
In this example the periodic task schedule is defined directly in code, meaning it is hard-coded and cannot be altered dynamically without a code change and app re-deploy.
The provided code - with the task logic deleted, or with a simple return at the beginning - will work, but it is not really an answer to the question: the task will still run, there just won't be any code executed when it does.
Also, it is recommended NOT to use @periodic_task, as it is deprecated:
"""Deprecated decorator, please use :setting:beat_schedule."""
First, change the method from a @periodic_task to a regular Celery @task - and since you are using Django, it is better to go straight for @shared_task:
from celery import shared_task

@shared_task
def task_abcd():
    ...
Now this is just an ordinary Celery task which needs to be called explicitly - or it can be run periodically if added to the Celery beat schedule.
For production, and if using multiple workers, it is not recommended to run the Celery worker with an embedded beat (-B) - run a separate instance of the celery beat scheduler.
The schedule can be specified in celery.py or in the Django project settings (settings.py).
It is still not very dynamic, as the app needs to be reloaded to re-read the settings.
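For example, a minimal hard-coded schedule could look like this (assuming your Celery app object is app in celery.py and the task lives in myapp/tasks.py - both names are placeholders):

from celery.schedules import crontab

app.conf.beat_schedule = {
    'run-task-abcd-every-6-minutes': {
        'task': 'myapp.tasks.task_abcd',  # dotted path to the shared_task above
        'schedule': crontab(minute='*/6'),
    },
}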
Then, use the Database Scheduler, which allows you to create schedules dynamically - which tasks need to be run, when, and with what arguments. It even provides nice Django admin web views for administration!
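With django-celery-beat installed, the beat process is typically started with its DatabaseScheduler, along these lines (proj is a placeholder for your project's Celery app):

celery -A proj beat -l info --scheduler django_celery_beat.schedulers:DatabaseScheduler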
That code will work but I'd go for something that doesn't force you to update your code every time you need to disable/enable the task.
What you could do is to use a configurable variable whose value could come from an admin panel, a configuration file, or whatever you want, and use that to return before your code runs if the task is in disabled mode.
For instance:
@periodic_task(run_every=crontab(minute='*/6'))
def task_abcd():
    config = load_config_for_task_abcd()
    if not config.is_enabled:
        return
    # some operations here
In this way, even if your task is scheduled, its operations won't be executed.
If you simply want to remove the periodic task, have you tried removing the function and then restarting your Celery service? You can restart your Redis service as well as your Django server for good measure.
Make sure that the function you removed is not referenced anywhere else.
I have a Django 1.11 + MySQL + Celery 4.1 project where a view creates a new user record, and then kicks off a Celery task to perform additional long-running actions in relation to it.
The typical problem in this case is ensuring that the user creation is committed to the database before the Celery task executes. Otherwise, there's a race condition, and the task may try to access a record that doesn't exist if it executes before the transaction commits.
The way I had learned to fix this was to always wrap the record creation in a manual transaction or atomic block, and then trigger the Celery task after that. e.g.
def create_user():
    with transaction.atomic():
        user = User.objects.create(username='blah')
    mytask.apply_async(args=[user.id])

@task
def mytask(user_id):
    user = User.objects.get(id=user_id)
    do_stuff(user)
However, I still occasionally see the error DoesNotExist: User matching query does not exist in my Celery worker logs, implying my task is sometimes executing before the user record gets committed.
Is this not the correct strategy or am I not implementing it correctly?
I believe a post_save signal would be more appropriate for what you're trying to do: https://docs.djangoproject.com/en/1.11/ref/signals/#post-save. This signal sends a created argument as a boolean, making it easy to operate only on object creation.
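A rough sketch of that suggestion (the receiver location and names are illustrative, not prescribed):

from django.contrib.auth.models import User
from django.db.models.signals import post_save
from django.dispatch import receiver

from .tasks import mytask  # the task from the question

@receiver(post_save, sender=User)
def user_created(sender, instance, created, **kwargs):
    # `created` is True only on the initial INSERT, not on later saves.
    if created:
        mytask.apply_async(args=[instance.id])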
I'm using Celery (3.0.15) with Redis as a broker.
Is there a straightforward way to query the number of tasks with a given name that exist in a Celery queue?
And, as a followup, is there a way to cancel all tasks with a given name that exist in a Celery queue?
I've been through the Monitoring and Management Guide and don't see a solution there.
# Retrieve tasks
# Reference: http://docs.celeryproject.org/en/latest/reference/celery.events.state.html
query = celery.events.state.tasks_by_type(your_task_name)
# Kill tasks
# Reference: http://docs.celeryproject.org/en/latest/userguide/workers.html#revoking-tasks
for uuid, task in query:
    celery.control.revoke(uuid, terminate=True)
There is one issue that earlier answers have not addressed, and it may throw people off if they are not aware of it.
Among those solutions already posted, I'd use Danielle's with one minor modification: I'd import the task into my file and use its .name attribute to get the task name to pass to .tasks_by_type().
app.control.revoke(
    [uuid for uuid, _ in
     celery.events.state.State().tasks_by_type(task.name)])
However, this solution will ignore tasks that have been scheduled for future execution. Like some people who commented on other answers, when I checked what .tasks_by_type() returns, I got an empty list. And indeed my queues were empty. But I knew that there were tasks scheduled to be executed in the future, and these were my primary target. I could see them by executing celery -A [app] inspect scheduled, but they were unaffected by the code above.
I managed to revoke the scheduled tasks by doing this:
app.control.revoke(
    [scheduled["request"]["id"] for scheduled in
     chain.from_iterable(app.control.inspect().scheduled()
                         .itervalues())])
app.control.inspect().scheduled() returns a dictionary whose keys are worker names and values are lists of scheduling information (hence, the need for chain.from_iterable which is imported from itertools). The task information is in the "request" field of the scheduling information and "id" contains the task id. Note that even after revocation, the scheduled task will still show among the scheduled tasks. Scheduled tasks that are revoked won't get removed from the list of scheduled tasks until their timers expire or until Celery performs some cleanup operation. (Restarting workers triggers such cleanup.)
You can do this in one request:
app.control.revoke([
    uuid
    for uuid, _ in
    celery.events.state.State().tasks_by_type(task_name)
])
As usual with Celery, none of the answers here worked for me at all, so I did my usual thing and hacked together a solution that just inspects redis directly. Here we go...
# First, get a list of tasks from redis:
import redis, json

r = redis.Redis(
    host=settings.REDIS_HOST,
    port=settings.REDIS_PORT,
    db=settings.REDIS_DATABASES['CELERY'],
)
l = r.lrange('celery', 0, -1)

# Now import the task you want so you can get its name
from my_django.tasks import my_task

# Now, import your celery app and iterate over all tasks
# from redis and nuke the ones that have a matching name.
from my_django.celery_init import app

for task in l:
    task_headers = json.loads(task)['headers']
    task_name = task_headers["task"]
    if task_name == my_task.name:
        task_id = task_headers['id']
        print("Terminating: %s" % task_id)
        app.control.revoke(task_id, terminate=True)
Note that revoking in this way might not revoke prefetched tasks, so you might not see results immediately.
Also, this answer doesn't support prioritized tasks. If you want to modify it to do that, you'll want some of the tips in my other answer that hacks redis.
It looks like flower provides monitoring:
https://github.com/mher/flower
Real-time monitoring using Celery Events:
- Task progress and history
- Ability to show task details (arguments, start time, runtime, and more)
- Graphs and statistics

Remote control:
- View worker status and statistics
- Shutdown and restart worker instances
- Control worker pool size and autoscale settings
- View and modify the queues a worker instance consumes from
- View currently running tasks
- View scheduled tasks (ETA/countdown)
- View reserved and revoked tasks
- Apply time and rate limits
- Configuration viewer
- Revoke or terminate tasks

HTTP API

OpenID authentication
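If it helps, flower is typically installed and started along these lines (proj is a placeholder for your Celery app):

pip install flower
celery -A proj flower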