I have a Django application that uses Celery with a Redis broker for asynchronous task execution. Currently, the app has 3 queues (and 3 workers) that connect to a single Redis instance for communication. The first two workers are prefork-based and the third one is gevent-based.
The Celery setting variables regarding the broker and backend look like this:
CELERY_BROKER_URL="redis://localhost:6379/0"
CELERY_RESULT_BACKEND="redis://localhost:6379/1"
Since Celery implements its FIFO queues with Redis list operations (LPUSH/BRPOP in the Redis transport), I was wondering whether it would be correct, or even possible, to use different Redis databases for different queues, e.g. q1 uses database .../1 and q2 uses database .../2 for messaging. This way each worker would only listen to its dedicated database and pick up tasks from its queue with less competition.
Does this even make any sense?
If so, how do you implement something like this in Celery?
First, if you are worried about the load, please specify your expected numbers/rates.
In my opinion, you shouldn't be concerned about the Redis capability to handle your load.
Redis has its own scale-out/scale-in capabilities for whenever you need them.
You can also use RabbitMQ as your broker (running RabbitMQ in Docker is dead simple as well), which again has its own scale-out capabilities to support high load, so I don't think you should be worried about this point.
As far as I know, there's no way to use different Redis databases for different queues on the same broker. You can create separate Celery applications with different databases, but then you cannot set dependencies between tasks (canvas: group, chain, etc.). I wouldn't recommend that option.
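For comparison, the usual way to get one worker per queue keeps a single Redis database and separates the work by queue name. A minimal sketch of the Django settings, assuming the `CELERY_` settings namespace implied by the `CELERY_BROKER_URL` setting in the question; the task module paths (`app1.tasks.*` and so on) are hypothetical:

```python
# settings.py (sketch): one Redis database for all queues, routing by queue name.
# The task module paths below are hypothetical placeholders.
CELERY_BROKER_URL = "redis://localhost:6379/0"
CELERY_RESULT_BACKEND = "redis://localhost:6379/1"

# Map tasks (glob patterns are allowed) to the queue their worker consumes.
CELERY_TASK_ROUTES = {
    "app1.tasks.*": {"queue": "q1"},
    "app2.tasks.*": {"queue": "q2"},
    "app3.tasks.*": {"queue": "q3"},
}

# Then start one worker per queue, e.g.:
#   celery -A proj worker -Q q1 -P prefork
#   celery -A proj worker -Q q2 -P prefork
#   celery -A proj worker -Q q3 -P gevent
```

With the Redis transport each queue is already a separate Redis list keyed by the queue name, so workers consuming different queues are not waiting on the same key.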
Hi dear ladies and guys,
So I've been struggling today to figure out how to make Flower use the Redis backend to get the historical tasks. I've read that Flower has the --persistent flag, but this creates its own file.
Why does it need this file? Why doesn't it just pull the records from redis?
I don't get it. (I have RabbitMQ as the broker and Redis as the backend, configured in the Celery() constructor.)
The short answer is that Flower won't know which task results to look for in the backend. Since Redis databases can be shared with other processes, Flower can't guarantee that a key that looks a certain way will contain a Celery result that it "should" be monitoring. The persistent flag lets Flower keep track of the task results it "should" be monitoring by saving a copy of any tasks that it sees going through the broker queue and, thus, keep track of relevant results.
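A minimal sketch of enabling that persistence through Flower's config file, assuming Flower is started from the directory containing `flowerconfig.py` (its default config file name); the database filename is a placeholder:

```python
# flowerconfig.py (sketch): Flower loads this file by default from its working dir.
# Keep Flower's own record of the tasks it has seen, across restarts.
persistent = True
# File where that state is stored (placeholder name).
db = "flower.db"
# Upper bound on how many tasks Flower keeps in memory.
max_tasks = 10000

# Equivalent command line:
#   celery -A proj flower --persistent=True --db=flower.db --max_tasks=10000
```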
I'm building a web application (using Python/Django) that is hosted on two machines connected to a load balancer.
I have a central storage server, a central Redis server, a single celery beat, and two celery workers on each hosting machine.
I receive files from an API endpoint (on any of the hosting machines) and then schedule a task to copy to the storage server.
The problem is that the task is scheduled using:
task.delay(args)
and then any worker can receive it, while the received files exist only on one of the 2 machines, and have to be copied from it.
I tried finding if there's a unique id for the worker that I can assign the task to but didn't find any help in the docs.
Is there any solution to this, given that the number of hosting machines can scale beyond 2?
The best solution is to put the task onto a named queue and have each worker look for jobs from their specific queue. So if you have Machine A and Machine B you could have Queue A, Queue B and Queue Shared. Machine A would watch for jobs on Queue A and Queue Shared while Machine B looked for jobs on Queue B and Queue Shared.
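A minimal sketch of the per-machine naming, assuming each machine's hostname is unique; the `files.<hostname>` queue naming scheme and the `copy_to_storage` task are hypothetical:

```python
import socket

# Queue that every machine's workers consume (work that can run anywhere).
SHARED_QUEUE = "shared"


def local_queue_name(hostname=None):
    """Name of the queue dedicated to this machine, e.g. 'files.web-a'."""
    return f"files.{hostname or socket.gethostname()}"


def worker_queues(hostname=None):
    """Comma-separated queue list to pass to `celery worker -Q` on this machine."""
    return ",".join([local_queue_name(hostname), SHARED_QUEUE])


# In the upload view, send the copy job to the machine that received the file
# (copy_to_storage is a hypothetical task):
#   copy_to_storage.apply_async(args=[path], queue=local_queue_name())
#
# On each machine, start its workers with that machine's queues, e.g.:
#   celery -A proj worker -Q files.web-a,shared
```

Because Celery creates missing queues by default (`task_create_missing_queues`), adding a third machine only requires starting its workers with its own queue name.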
The best way to do this is to have a dedicated queue for each worker.
When I was learning Celery I did exactly this, and after a few years I completely abandoned this approach, as it creates more problems than it actually solves.
Instead, I would recommend the following: any resource that you may need to share among tasks should be on a shared filesystem (NFS), or in some sort of in-memory caching service like Redis, KeyDB or memcached. We use a combination of S3 and Redis, depending on the type of resource.
Sure, if you do not really care about scalability the queue-per-worker approach will work fine.
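A minimal sketch of the shared-storage approach, assuming every machine mounts the same directory; `SHARED_ROOT`, its default path, and both function names are hypothetical. The view writes the upload to shared storage and passes only a key to the task, so any worker on any machine can run it:

```python
import os
import uuid
from pathlib import Path

# Hypothetical mount point shared by every web and worker machine (e.g. NFS).
SHARED_ROOT = os.environ.get("SHARED_ROOT", "/mnt/shared/uploads")


def stage_upload(data: bytes, filename: str, shared_root: str = SHARED_ROOT) -> str:
    """API view side: save the upload where every worker can read it."""
    key = f"{uuid.uuid4().hex}/{filename}"
    target = Path(shared_root) / key
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(data)
    # Return a plain string so it serializes cleanly as a task argument.
    return key


def copy_to_storage(key: str, dest_root: str, shared_root: str = SHARED_ROOT) -> str:
    """Task body (would carry @app.task in a real project): runs on any worker."""
    source = Path(shared_root) / key
    dest = Path(dest_root) / key
    dest.parent.mkdir(parents=True, exist_ok=True)
    dest.write_bytes(source.read_bytes())
    source.unlink()  # the staged copy is no longer needed
    return str(dest)


# Usage from a Django view, once copy_to_storage is a real Celery task:
#   key = stage_upload(uploaded_file.read(), uploaded_file.name)
#   copy_to_storage.delay(key, "/mnt/storage")
```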
There's a lot of info out there and honestly it's a bit too much to digest and I'm a bit lost.
My web app has to do some very resource-intensive tasks. The standard setup right now is the app on one server and static/media hosted on another. What I would like to do is set up Celery so I can call task.delay for these resource-intensive tasks.
I'd like to dedicate the resources of entire separate servers to these resource-intensive tasks.
Here's the question: how do I set up Celery so that the .delay calls made by the apps on my main server (where the app is hosted) are sent to these servers?
Note: these functions will be kicking data back to the database / affecting models, so data integrity is important here. Assuming the above is possible, how does the data produced on the separate servers get sent back to the database while preserving integrity?
Is this possible, and if so, where do I begin? It's information overload.
If not what should I be doing / what am I doing wrong?
The whole point of Celery is to work in exactly this way, i.e. as a distributed task server. You can spin up workers on as many machines as you like, and the broker (i.e. RabbitMQ) will distribute tasks among them as necessary.
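A minimal sketch of the wiring, assuming the Django `CELERY_` settings namespace; the broker hostname, credentials and concurrency value are placeholders. The web server and every worker server deploy the same code and point at the same broker:

```python
# settings.py (sketch), deployed identically on the web server and on every
# worker server. "broker.internal" and the credentials are placeholders.
CELERY_BROKER_URL = "amqp://app_user:app_password@broker.internal:5672//"

# The web server runs no worker; calling some_task.delay(...) there only
# publishes a message to the broker above.
# Each dedicated server runs workers against the same broker and code base:
#   celery -A proj worker --concurrency=8
```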
I'm not sure what you're asking about data integrity, though. Data doesn't get "sent back" to the database; the workers connect directly to the database in exactly the same way as the rest of your Django code.
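To illustrate, here is a standard-library sketch in which `sqlite3` stands in for the Django database; the `reports` table and `finish_report` function are hypothetical. The task receives only an id and writes its result inside a transaction, exactly as any other database client would. In Django the equivalent would be loading the row by primary key inside `transaction.atomic()`.

```python
import sqlite3


def finish_report(conn: sqlite3.Connection, report_id: int, result: str) -> bool:
    """Task body (sketch): store the result for one row inside a transaction."""
    with conn:  # commits on success, rolls back if an exception is raised
        cur = conn.execute(
            "UPDATE reports SET result = ?, status = 'done' "
            "WHERE id = ? AND status = 'pending'",
            (result, report_id),
        )
    # False means the row is missing or another worker already finished it.
    return cur.rowcount == 1
```

The `status = 'pending'` guard makes a retried or duplicated task a no-op instead of overwriting an earlier result.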
Can someone explain a bit the statement that Redis is single-threaded?
What I want to do...
I'm writing a Flask website. There will be a lot of background work, so I thought to separate it into multiple threads. I read that it's best to use Celery, and I would like to use Redis as the broker, because I would also like to use Redis for some key-value storage.
So my questions are:
Can multiple threads connect to the Redis DB at the same time (in a thread-safe way) to retrieve and store data?
Also, can redis be used for site caching?
Multiple threads can connect to Redis in a thread-safe way (assuming that the Redis client is thread-safe and that the code itself is as well).
Because Redis is (mostly) single-threaded, every request to it blocks all others while it is executed. However, because Redis is so fast - requests are usually returned in under a millisecond - it can still serve a considerable number of concurrent requests, so having multiple connections to it isn't an issue.
As for whether it can be used for caching a website, that's definitely so (just Google it ;)).
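A minimal cache-aside sketch. With Redis, `cache` would be a `redis.Redis` client; one instance can be shared by all threads, since redis-py takes a connection from its pool only for the duration of each command. The dict-backed `DictCache` class is a hypothetical stand-in exposing the same `get`/`setex` calls, so the example runs without a server; the key name is also hypothetical.

```python
import json
import time


class DictCache:
    """In-memory stand-in for the two redis-py methods used below."""

    def __init__(self):
        self._data = {}

    def get(self, name):
        value, expires_at = self._data.get(name, (None, 0.0))
        return value if time.monotonic() < expires_at else None

    def setex(self, name, ttl_seconds, value):
        self._data[name] = (value, time.monotonic() + ttl_seconds)


def cached_page(cache, key, render, ttl=60):
    """Return the cached page, or render it and store it for `ttl` seconds."""
    hit = cache.get(key)
    if hit is not None:
        return json.loads(hit)  # redis-py returns bytes; json.loads accepts them
    page = render()
    cache.setex(key, ttl, json.dumps(page))
    return page
```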
We have an application that uses a Celery instance in two ways: the instance's .task attribute is used as our task decorator, and when we invoke celery workers, we pass the instance as the -A (--app) argument. This workflow has worked, but it means we are using the same Celery instance for both producers (the code that sends tasks) and consumers (the celery workers).
Now, we are considering using Bigwig RabbitMQ, which is an AMQP service provider, and they publish two different URLs, one optimized for message producers, the other optimized for message consumers.
What's the best way for us to modify our setup in order to take advantage of the separate broker endpoints? I'm assuming a single Celery instance can only use a single broker URL (via the BROKER_URL setting). Should we use two distinct Celery instances configured identically except for the BROKER_URL setting?
This feature will be available in Celery 4.0 as the broker_read_url and broker_write_url settings: http://docs.celeryproject.org/en/master/whatsnew-4.0.html#configure-broker-url-for-read-write-separately
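A minimal sketch for Celery 4.0+, assuming a `celeryconfig.py` module loaded into the single shared app; both URLs are placeholders for the consumer- and producer-optimized endpoints Bigwig publishes:

```python
# celeryconfig.py (sketch, Celery 4.0+). Both URLs are placeholders.
# Connections used for consuming (workers) go to the read URL...
broker_read_url = "amqp://user:password@consumer.example.com:5672/vhost"
# ...and connections used for publishing tasks go to the write URL.
broker_write_url = "amqp://user:password@producer.example.com:5672/vhost"

# Load it into the one app used by both producers and workers, e.g.:
#   app = Celery("proj")
#   app.config_from_object("celeryconfig")
```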
Yes, you are right: one Celery instance can use only one broker URL. As you said, the only way is to use two Celery instances that differ only in BROKER_URL, one for consuming and one for producing.
Technically this is trivial; you can take advantage of config_from_object (http://celery.readthedocs.org/en/latest/reference/celery.html#celery.Celery.config_from_object). Of course you will then have two instances running, but I don't think that introduces any problem.
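A minimal sketch for Celery versions before 4.0: two config objects that differ only in `BROKER_URL`, each passed to `config_from_object` on its own app. The class names and endpoint URLs are placeholders:

```python
class BaseConfig:
    """Settings shared by the producer and consumer instances."""
    CELERY_TASK_SERIALIZER = "json"


class ProducerConfig(BaseConfig):
    # Placeholder for the producer-optimized endpoint.
    BROKER_URL = "amqp://user:password@producer.example.com:5672/vhost"


class ConsumerConfig(BaseConfig):
    # Placeholder for the consumer-optimized endpoint.
    BROKER_URL = "amqp://user:password@consumer.example.com:5672/vhost"


# Producer side (web app):   app = Celery("proj"); app.config_from_object(ProducerConfig)
# Worker side (-A target):   app = Celery("proj"); app.config_from_object(ConsumerConfig)
```

One thing to watch: the worker's instance must have the same task names registered as the ones the producer sends.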
There is also another option explained here, but I would avoid it.