Celery subqueues (implementing sequential message delivery) - python

I'm trying to implement a chatbot system, and I need to have my celery tasks processed sequentially per user. That means each user needs to have their messages sent as FIFO, but the users need to be processed randomly or in round-robin.
I've been reading about task chains, groups and trees, but all of these celery features seem to require providing all tasks at once, whereas I need to add tasks dynamically.
My reasoning was to have a dedicated queue per user, and enable concurrency on the queues. That way I can assure the delivery order and avoid one user blocking the rest of the chats.
Is there a way I can route tasks in Celery so I get the desired behavior? Ideally, I'd set up a worker to process the messages queue, and then route tasks to messages.contact.<contact_id> or the like.
There's no explicit mention to this behavior in the docs. Is it possible? Thanks!

Related

Delete messages in a queue on a RabbitMQ (AMQP) server

I have 1 big task which consists out of 200 sub-tasks (messages) which will be published onto a queue. If I want to cancel this 1 task, the 200 messages (or the ones that are left and not processed yet) should be deleted. Is there any way to delete these published messages in a queue?
One solution I could think of is to create a queue (Q) which where I publish the name of a new queue (X). Each consumer connects then to this new dynamically created queue (X) and process the 200 published messages. If I want to abort the entire task I delete only that queue (X) from the publisher side. Is that a common approach?
I see few issues with your suggested approach.
The first problem is due to RMQ consumer prefetch which is intended to improve performance by reducing the amount of requests to the broker. If your consumers have retrieved a batch of tasks they will process them all before they ask for new ones, only then they will realize the queue was cancelled. Therefore, your cancellation request would not be handled properly most of the times. You could reduce the prefetch count to 1 to avoid this side effect but this would increase the pressure over the network and reduce overall speed.
The second issue is that the AMQP protocol does not provide mechanisms for gracefully dealing with queue deletion. Therefore your consumers would need to carefully deal with queues disappearing as they would otherwise crash. By doing so, you would loose visibility over bugs and issues. How can you distinguish when a queue was explicitly deleted from a case where it actually crashed?
What I would recommend in this case is marking all your tasks with an identifier of their parent job. Each time a consumer starts consuming a new task, it would check if the parent job is valid or has been cancelled. In the latter case, it would simply ignore the task and move to the next one. You need a supporting service for that. A Redis instance should be more then enough for example.
This mechanism would be way simpler and robust. You can spin as many consumers as you want without the need of orchestrating their connection to the right queue. Also out-of-order or interleaved tasks would not be a problem.

Celery: Perform tasks one-at-a-time

Is there a way to get all tasks being added to Celery to perform one-after-the-next?
I have a bunch of celery tasks and they can happen at any time (they're triggered by users) and I would like for them to not all run at the same time as to lighten the load on my server.
The simplest way to achieve this is to have a dedicated worker with concurrency set to 1, subscribed to a "special" queue. Then you send your tasks that you want to run sequentially to this queue. celery multi (creates multiple workers on the same node) is especially useful for such use-cases.

Celery python synchronous tasks

In my application, I have python celery tasks that connect to a rest API.. simple.
The problem I have is that the API does not allow multiple resuests with the same credentials.
Is there a way to have these api tasks blocking in the queue? Meaning, If multiple requests are made around the same time, can I have the tasks sit in the queue and execute one by one, waiting for the first in the queue to finish?
Currently, in the rabbitmq message queue (with one worker), i see the tasks go through (spawned) and not wait.
I looked over documentation but could not find a simple solution.
Thanks.
With one worker it's impossible for celery to do more than one task at a time. what you may be seeing is called prefetching which allows the worker to reserve tasks.
http://docs.celeryproject.org/en/latest/userguide/optimizing.html#prefetch-limits
The default prefetch value is 4, turn it down to one and see if that fixes it.

Celery - Can a message in RabbitMQ be consumed by two or more workers at the same time?

Perhaps I'm being silly asking the question but I need to wrap my head around the basic concepts before I do further work.
I am processing a few thousand RSS feeds, using multiple Celery worker nodes and a RabbitMQ node as the broker. The URL of each feed is being written as a message in the queue. A worker just reads the URL from the queue and starts processing it. I have to ensure that a single RSS feed does not get processed by two workers at the same time.
The article Ensuring a task is only executed one at a time suggests a Memcahced-based solution for locking the feed when it's being processed.
But what I'm trying to understand is that why do I need to use Memcached (or something else) to ensure that a message on a RabbitMQ queue not be consumed by multiple workers at the same time. Is there some configuration change in RabbitMQ (or Celery) that I can do to achieve this goal?
A single MQ message will certainly not be seen by multiple consumers in a normal working setup. You'll have to do some work for the cases involving failing/crashing workers, read up on auto-acks and message rejections, but the basic case is sound.
I don't see a synchronized queue (read: MQ) in the article you've linked, so (as far as I can tell) they're using the lock mechanism (read: memcache) to synchronize, as an alternative. And I can think of a few problems which wouldn't be there in a proper MQ setup.
As noted by others you are mixing apples and oranges.
Being a celery task and a MQ message.
You can ensure that a message will be processed by only one worker at the same time.
eg.
#task(...)
def my_task(
my_task.apply(1)
the .apply publishes a message to the message broker you are using (rabbit, redis...).
Then the message will get routed to a queue and consumed by one worker at time. you dont need locking for this, you have it for free :)
The example on the celery cookbook shows how to prevent two messages like that (my_task.apply(1)) from running at the same time, this is something you need to ensure within the task itself.
You need something which you can access from all workers of course (memcached, redis ...) as they might be running on different machines.
Mentioned example typically used for other goal: it prevents you from working with different messages with the same meaning (not the same message). Eg, I have two processes: first one puts to queue some URLs, and second one - takes URL from queue and fetch them. What will be if first process puts to queue one URL twice (or even more times)?
P.S. I use for this purpose Redis storage and setnx operation (which can set key only once).

Celery design help: how to prevent concurrently executing tasks

I'm fairly new to Celery/AMQP and am trying to come up with a task/queue/worker design to meet the following requirements.
I have multiple types of "per-user" tasks: e.g., TaskA, TaskB, TaskC. Each of these "per-user" tasks read/write data for one particular user in the system. So at any given time, I might need to create tasks User1_TaskA, User1_TaskB, User1_TaskC, User2_TaskA, User2_TaskB, etc. I need to ensure that, for each user, no two tasks of any task type execute concurrently. I want a system in which no worker can execute User1_TaskA at the same time as any other worker is executing User1_TaskB or User1_TaskC, but while User1_TaskA is executing, other workers shouldn't be blocked from concurrently executing User2_TaskA, User3_TaskA, etc.
I realize this could be implemented using some sort of external locking mechanism (e.g., in the DB), but I'm hoping there's a more elegant task/queue/worker design that would work.
I suppose one possible solution is to implement queues as user buckets such that, when the workers are launched there's config that specifies how many buckets to create, and each "bucket worker" is bound to exactly one bucket. Then an "intermediate worker" would pull off tasks from the main task queue and assign them into the bucketed queues via, say, a hash/mod scheme. So UserA's tasks would always end up in the same queue, and multiple tasks for UserA would back up behind each other. I don't love this approach, as it would require the number of buckets to be defined ahead of time, and would seem to prevent (easily) adding workers dynamically. Seems to me there's got to be a better way -- suggestions would be greatly appreciated.
What's so bad in using an external locking mechanism? It's simple, straightforward, and efficient enough. You can find an example of distributed task locking in Celery here. Extend it by creating a lock per user, and you're done!

Categories

Resources