I'm using Python RQ (backed by Redis) to feed tasks to a bunch of worker processes.
I accidentally sent a tuple when adding a job to a queue, so now I have queues like this:
high
medium
low
('low',)
default
I can't seem to figure out how to get rid of the ('low',) queue. The queue also seems to cause some issues due to its name (for instance, I couldn't view it or clear it in rq-dashboard as the page would refuse to load).
There is some discussion here: RQ - Empty & Delete Queues, but that only covers emptying a queue. I am able to empty the queue just fine from within Python, but I can't seem to actually delete the queue from the Redis server.
The RQ documentation doesn't seem to provide any information on getting rid of a queue you don't want.
I want to actually delete that queue (not just empty it) instead of carrying it around forever.
RQ stores all of its queues in the rq:queues set in Redis. This can be inspected with redis-cli:
smembers rq:queues
I also stumbled upon Destroying / removing a Queue() in Redis Queue (rq) programmatically. This might help!
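For reference, a minimal sketch of removing the stray entry directly with redis-py (the key string is an assumption based on the default rq:queue: prefix; check your own smembers output first):

import redis

r = redis.Redis()
stray_key = "rq:queue:('low',)"   # exact member string as listed by: smembers rq:queues
r.srem('rq:queues', stray_key)    # remove it from RQ's registry of known queues
r.delete(stray_key)               # drop the queue key itself, if it still exists

Recent RQ versions also expose a Queue.delete() method from Python that is supposed to empty the queue and remove it, but I haven't checked how it copes with a tuple-derived name like this one.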
Related
I have a simple Celery setup in Python, but messages on the Celery queue seem to be getting lost, so I'm looking into it.
For example, when a process finishes task A and enqueues a message for task B, the data seen in A and B should match, but it often doesn't.
The message is probably being lost somewhere on its way to the Celery task B.
When I set the '--pool=solo' option, nothing is lost, but processing seems to be slow.
Previously, no 'pool' option was set at all.
What might be the reason for this?
I have 3 servers in the same network. On each of those servers a redis service and some sort of producer are running. The producer enqueues jobs to a local rq queue named tasks.
So each server has its own tasks queue.
Also, there's one more server that is running an rq worker. Is it possible to have that worker check the tasks queue on each of the 3 servers?
I have tried creating a list of connections
import redis
from rq import Queue, Worker
from rq import push_connection
# urls = [url1, url2, url3]
connections = list(map(redis.from_url, urls))
which I then use to create a list of queues.
queues = list(map(lambda c: Queue('tasks', connection=c), connections))
Afterwards I push all the connections
for connection in connections:
    push_connection(connection)
and pass the queues to Worker
Worker(queues=queues).work()
This results in the worker only listening on tasks on whatever connection was pushed last.
I've been looking into the code on rq and I think I could write a custom worker class that does this but before I do that I wanted to ask if there's another way. Maybe even another queueing framework entirely?
Okay, I solved the problem. I'm still unsure whether I have permission to post the actual source code here, so I will outline my solution instead.
I had to override register_birth(self), register_death(self), and dequeue_job_and_maintain_ttl(self, timeout). The original implementations of these functions can be found here.
register_birth
Basically, you have to iterate over all connections, push_connection(connection), complete the registration process, and pop_connection().
Be careful to only list the queues corresponding to that connection in the mapping variable. The original implementation uses queue_names(self) to get a list of queue names. You'll have to do the same thing queue_names(self) does but only for the relevant queues.
register_death
Essentially the same as register_birth. Iterate over all connections, push_connection(connection), complete the same steps as the original implementation, and pop_connection().
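A rough sketch of the shape of these two overrides (not my actual code). It assumes a hypothetical self.connection_queues attribute holding (connection, queues) pairs, and instead of copying the original bodies it narrows self.queues / self.connection and delegates to the stock implementation, which gives the same "only the relevant queues" effect because queue_names(self) reads from self.queues:

from rq import Worker, push_connection, pop_connection

class MultiConnectionWorker(Worker):
    # self.connection_queues: hypothetical list of (connection, [queues]) pairs,
    # prepared in __init__ from the queues passed to the worker

    def register_birth(self):
        original_queues, original_connection = self.queues, self.connection
        for connection, queues in self.connection_queues:
            push_connection(connection)
            self.connection = connection   # the stock implementation writes to self.connection
            self.queues = queues           # only advertise the queues living on this connection
            super().register_birth()
            pop_connection()
        self.queues, self.connection = original_queues, original_connection

    def register_death(self):
        original_connection = self.connection
        for connection, _queues in self.connection_queues:
            push_connection(connection)
            self.connection = connection   # record the death on every server we registered with
            super().register_death()
            pop_connection()
        self.connection = original_connection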
dequeue_job_and_maintain_ttl
Let's take a look at the original implementation of this function. We'll want to keep everything the same until we get to the try block. Here we want to iterate over all connections endlessly. You can do this by using itertools.cycle.
Inside the loop push_connection(connection), and set self.connection to the current connection. If self.connection = connection is missing, the result of the job may not be properly returned.
Now we'll proceed to call self.queue_class.dequeue_any similar to the original implementation. But we'll set the timeout to 1 so we can proceed to check another connection if the current one doesn't have any jobs for the worker.
Make sure self.queue_class.dequeue_any is called with a list of queues corresponding to the current connection. In this case queues contains only the relevant queues.
result = self.queue_class.dequeue_any(
queues, 1, connection=connection, job_class=self.job_class)
Afterwards pop_connection(), and do the same check on result as the original implementation. If result is not None we've found a job to do and need to break out of the loop.
Keep everything else from the original implementation. Don't forget the break at the end of the try block. It breaks out of the while True loop.
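Putting that together, a rough sketch of the override (again not my exact code; it assumes a self.connection_cycle iterator built from the (connection, queues) pairs described under "Another thing" below, and drops the logging from the stock implementation):

from rq import Worker, push_connection, pop_connection
from rq.exceptions import DequeueTimeout
from rq.worker import WorkerStatus

class MultiConnectionWorker(Worker):
    # continuing the sketch from register_birth/register_death above;
    # self.connection_cycle = itertools.cycle(self.connection_queues) is set up in __init__

    def dequeue_job_and_maintain_ttl(self, timeout):
        result = None
        self.set_state(WorkerStatus.IDLE)
        while True:
            self.heartbeat()
            connection, queues = next(self.connection_cycle)
            push_connection(connection)
            self.connection = connection   # so the job result is written back to the right Redis
            try:
                # timeout=1 so an idle connection doesn't block us from checking the others
                result = self.queue_class.dequeue_any(
                    queues, 1, connection=connection, job_class=self.job_class)
                if result is not None:
                    break                  # found a job; stop cycling over connections
            except DequeueTimeout:
                pass
            finally:
                pop_connection()
        self.heartbeat()
        return result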
Another thing
Queues contain a reference to their connection. You could use this to create a list of (connection, queues) pairs, where queues contains all the queues that use that connection.
If you pass the resulting list to itertools.cycle you get the endless iterator you need in overriding dequeue_job_and_maintain_ttl.
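For example, a small sketch of building those pairs from the queues variable in the question:

import itertools
from collections import defaultdict

# group the Queue objects by the connection they were created with
by_connection = defaultdict(list)
for queue in queues:
    by_connection[queue.connection].append(queue)

connection_queues = list(by_connection.items())        # [(connection, [queues...]), ...]
connection_cycle = itertools.cycle(connection_queues)  # endless iterator for dequeue_job_and_maintain_ttl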
I'm pretty new to multiprocessing in Python and I've done a lot of digging around, but can't seem to find exactly what I'm looking for. I have a bit of a consumer/producer problem where I have a simple server with an endpoint that consumes from a queue and a function that produces onto the queue. The queue can be full, so the producer doesn't always need to be running.
While the queue isn't full, I want the producer task to run but I don't want it to block the server from receiving or servicing requests. I tried using multithreading but this producing process is very slow and the GIL slows it down too much. I want the server to be running all the time, and whenever the queue is no longer full (something has been consumed), I want to kick off this producer task as a separate process and I want it to run until the queue is full again. What is the best way to share the queue so that the producer process can access the queue used by the main process?
What is the best way to share the queue so that the producer process can access the queue used by the main process?
If this is the important part of your question (which seems like it's actually several questions), then multiprocessing.Queue seems to be exactly what you need. I've used this in several projects to have multiple processes feed a queue for consumption by a separate process, so if that's what you're looking for, this should work.
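For what it's worth, a minimal sketch of that pattern (names are just illustrative): the producer runs in its own process, so the GIL isn't an issue, and a bounded multiprocessing.Queue makes put() block whenever the queue is full.

import multiprocessing as mp

def producer(queue):
    item = 0
    while True:
        queue.put(item)   # blocks while the queue is full, resumes once something is consumed
        item += 1

if __name__ == '__main__':
    queue = mp.Queue(maxsize=10)   # bounded, so the producer pauses when it's full
    proc = mp.Process(target=producer, args=(queue,), daemon=True)
    proc.start()

    # the main process (e.g. your server endpoint) consumes from the same queue
    for _ in range(5):
        print(queue.get())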
Perhaps I'm being silly asking the question but I need to wrap my head around the basic concepts before I do further work.
I am processing a few thousand RSS feeds, using multiple Celery worker nodes and a RabbitMQ node as the broker. The URL of each feed is being written as a message in the queue. A worker just reads the URL from the queue and starts processing it. I have to ensure that a single RSS feed does not get processed by two workers at the same time.
The article Ensuring a task is only executed one at a time suggests a Memcached-based solution for locking the feed while it's being processed.
But what I'm trying to understand is why I need to use Memcached (or anything else) to ensure that a message on a RabbitMQ queue is not consumed by multiple workers at the same time. Is there some configuration change in RabbitMQ (or Celery) that I can make to achieve this?
A single MQ message will certainly not be seen by multiple consumers in a normal working setup. You'll have to do some work for the cases involving failing/crashing workers, read up on auto-acks and message rejections, but the basic case is sound.
I don't see a synchronized queue (read: MQ) in the article you've linked, so (as far as I can tell) they're using the lock mechanism (read: memcache) to synchronize, as an alternative. And I can think of a few problems which wouldn't be there in a proper MQ setup.
As noted by others, you are mixing apples and oranges: a Celery task and an MQ message.
A message will be processed by only one worker at a time.
e.g.:
@task(...)
def my_task(arg):
    ...

my_task.delay(1)
The .delay() call (a shortcut for .apply_async()) publishes a message to the message broker you are using (RabbitMQ, Redis, ...).
The message will then get routed to a queue and consumed by one worker at a time. You don't need locking for this; you get it for free :)
The example in the Celery cookbook shows how to prevent two messages like that (my_task.delay(1)) from running at the same time; this is something you need to ensure within the task itself.
You need something which you can access from all workers, of course (Memcached, Redis, ...), as they might be running on different machines.
The example mentioned above is typically used for a different goal: it prevents you from working with different messages that have the same meaning (not with the same message). E.g., I have two processes: the first puts some URLs into a queue, and the second takes URLs from the queue and fetches them. What happens if the first process puts the same URL into the queue twice (or even more times)?
P.S. For this purpose I use Redis and its SETNX operation (which sets a key only if it does not already exist).
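A minimal sketch of that SETNX idea with redis-py (the key prefix and TTL are just illustrative): SET ... NX succeeds only for the first process that claims a given URL, so duplicates can be skipped.

import redis

r = redis.Redis()

def try_claim(url, ttl=3600):
    # SET key value NX EX ttl -- True only for the first claimant, falsy afterwards
    return bool(r.set('feed-lock:' + url, 1, nx=True, ex=ttl))

if try_claim('http://example.com/feed.xml'):
    print('enqueue / process the feed')
else:
    print('already queued or being processed; skip it')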
I have a queue, e.g.:
online_queue = self._channel.queue_declare(
    durable=True,
    queue='online'
)
At the moment, I need to flush all content in this queue.
But another process may be publishing to this queue at the same time.
If I use channel.queue_purge(queue='online'), what will happen to messages published while queue_purge is still running?
Depending on your ultimate goal, you might be able to solve this issue by using a temporary queue.
To make things clearer, let's give things some names. Call your current queue (the one you want to purge) Queue A, and assume it is 1-1 bound to Exchange A.
If you create a new queue (Queue B) and bind it to Exchange A in the same way that Queue A is bound, Queue B will now get all of the messages (from the time of binding) that Queue A gets.
You can now safely purge Queue A without losing any of the messages that got sent in after Queue B was bound.
Re-bind Queue A to Exchange A and you are back up and running.
You can then deal with the "interim" messages in Queue B however you might need to.
This has the advantage of having a very well defined behavior and doesn't get you into any race conditions because you can completely blow Queue A away and re-create it instead of purging.
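For illustration, a rough sketch of the trick with pika (the exchange name, routing key, and temporary queue name are made up; it assumes Queue A is the 'online' queue bound to a direct exchange):

import pika

connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
channel = connection.channel()

# Queue B catches everything published to Exchange A from this moment on
channel.queue_declare(queue='online_tmp', durable=True)
channel.queue_bind(queue='online_tmp', exchange='exchange_a', routing_key='online')

# now it is safe to purge (or delete and re-declare) Queue A
channel.queue_purge(queue='online')

# ... handle the interim messages sitting in 'online_tmp' as needed, then clean up
channel.queue_unbind(queue='online_tmp', exchange='exchange_a', routing_key='online')
channel.queue_delete(queue='online_tmp')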
You're describing a race condition. Some might remain in the queue and some others might get purged. Or all of them will get purged. Or none of them will get purged.
There's just no way to tell, because it's a time-dependent situation. You should re-examine your need to purge a queue which is still active, or build a more robust consumer that can live with the fact that there might be messages in the queue it is connecting to (which is basically what consumers have to live with, anyway).