I new to Python and am developing an application in Python 2.7. I am using a thread pool provided by the concurrent.futures library. Once a thread from ThreadPool is started, it needs to wait for some message from RabbitMQ.
How can I implement this logic in Python to make this thread from the pool wait for event messages? Basically I need to wake up a waiting thread once I receive message from RabbitMQ (i.e wait and notify implementation on ThreadPool).
First you define a Queue:
from Queue import Queue
q = Queue()
then, in your thread, you attempt to get an item from that queue:
msg = q.get()
this will block the entire thread until there is something to be found in the queue.
Now, at the same time, assuming your incoming events are notified by means of triggering callbacks, you register a callback that simply puts the received RabbitMQ message in the queue:
def on_message(msg):
q.put(msg)
rabbitmq_channel.register_callback(on_message)
or if you like shorter code:
rabbitmq_channel.register_callback(lambda msg: q.put(msg))
(the above is pseudocode because I've not used RabbitMQ nor whatever Python bindings for RabbitMQ, but you should be able to easily figure out how to adapt the snippet to your real application code; the key part to pay attention to is q.put(msg)—just make sure that part gets invoked as soon as a new message is notified.)
as soon as this happens, the thread is awakened and is free to process the message. In order to reuse the same thread for multiple messages, just use a while loop:
while True:
msg = q.get()
process_message(msg)
P.S. I would suggest looking into Gevent and how to combine it with RabbitMQ in your Python application so as to be able to get rid of threads and use more lightweight and scalable green threading mechanism instead without ever having to manage a threadpool (because you can just have tens of thousands of greenlets spawned and killed on the fly):
# this thing always called in a green thread; forget about pools and queues.
def on_message(msg):
# you're in a green thread now; just process away!
benefit_from("all the gevent goodness!")
spawn_and_join_10_sub_greenlets()
rabbitmq_channel.register_callback(lambda msg: gevent.spawn(on_message, msg))
Related
I have one program that collects data from a websocket, processes the data and if some conditions apply I want to call another function that does something with the data.
This is easy enough, but I want the program that collects the data from the websocket to keep running.
I have 'fixed' this quite ugly by writing the data in a database and letting the second program check the database every few seconds. But I don't want to use this solution, since I occasionally get database is locked errors.
Is there a way to start program B from program A while program A keeps running?
I have looked at multi threading and multi processing, and I feel this could be a way to solve it, but while I grasp the basic of that, it is still a bit too difficult for me to use.
Is there an easier way? and if not should I study multi threading or multi processing more?
(or if anyone knows a good guide/video, that would be great too!)
I suggest launching a worker thread, waiting for data to process. Main thread listen to websocket, and send data to worker through pipe.
The logic of worker is:
while True:
data = peek_data_or_sleep(pipe)
process_data(data)
This way you won't get thousands of workers when incoming traffic is high.
So the key point is how to send data to worker, usually a pipe or message queue.
I've used Celery with RabbitMQ as message queue. Send data to Celery from Django server, and Celery call your function from another process.
Here is an example assuming you are using asyncio for WebSockets:
import asyncio
from time import sleep
async def web_socket(queue: asyncio.Queue):
for i in range(5):
await asyncio.sleep(1.0)
await queue.put(f"Here is message n°{i}!")
await queue.put(None)
def expensive_work(message: str):
sleep(0.5)
print(message)
async def worker(queue: asyncio.Queue):
while True:
message = await queue.get()
if message is None: break
await asyncio.to_thread(expensive_work, message)
async def main():
queue = asyncio.Queue()
await asyncio.gather(
web_socket(queue),
worker(queue)
)
if __name__ == "__main__":
asyncio.run(main())
The web_socket() function simulates a websocket listener which receives messages. For each received message, it put it in a queue that will be shared with another task running concurrently and processing the message.
The expensive_work() function simulates the processing task to apply to each message.
The worker() function will be running concurrently to the websocket listener. It reads values from the shared queue and process them. If the processing is really expensive (for instance a CPU-bound task) consider running it in a ProcessPoolExecutor (see here how to do that) to avoid blocking the event loop.
Finally, the main() function creates the shared queue, launches the two tasks concurrently with asyncio.gather() and then awaits the completion of both tasks.
If you are using threads and blocking IO, the solution is essentially similar but using threading Threads and queue.Queue. Beware not to mix multithreading and asyncio concurrency, or search on how to do it properly.
I am working on a project where I have a pool of workers. I am not using the built-in multiprocessing.Pool, but have created my own process pool.
The way it works is that I have created two instances of multiprocessing.Queue - one for sending work tasks to the workers and another to receive the results back.
Each worker just sits in a permanently running loop like this:
while True:
try:
request = self.request_queue.get(True, 5)
except Queue.Empty:
continue
else:
result = request.callable(*request.args, **request.kwargs)
self.results_queue.put((request, result))
There is also some error-handling code, but I have left it out for brewity. Each worker process has daemon set to 1.
I wish to properly shutdown the main process and all child worker processes. My experiences so far (doing Ctrl+C):
With no special implementations, each child process stops/crashes with a KeyboardInterrupt traceback, but the main process does not exist and have to be killed (sudo kill -9).
If I implement a signal handler for the child processes, set to ignore SIGINT's, the main thread shows the KeyboardInterrupt tracebok but nothing happens either way.
If I implement a signal handler for the child processes and the main process, I can see that the signal handler is called in the main process, but calling sys.exit() does not seem to have any effect.
I am looking for a "best practice" way of handling this. I also read somewhere that shutting down processes that were interacting with Queues and Pipes might cause them to deadlock with other processes (due to the Semaphores and other stuff used internally).
My current approach would be the following:
- Find a way to send an internal signal to each process (using a seperate command queue or similar) that will terminate their main loop.
- Implement a signal handler for the main loop that sends the shutdown command. The child processes will have a child handler that sets them to ignore the signal.
Is this the right approach?
The thing you need to watch out for is to deal with the possibility that there are messages in the queues at the time that you want to shutdown so you need a way for your processes to drain their input queues cleanly. Assuming that your main process is the one that will recognize that it is time to shutdown, you could do this.
Send a sentinel to each worker process. This is a special message (frequently None) that can never look like a normal message. After the sentinel, flush and close the queue to each worker process.
In your worker processes use code similar to the following pseudocode:
while True: # Your main processing loop
msg = inqueue.dequeue() # A blocking wait
if msg is None:
break
do_something()
outqueue.flush()
outqueue.close()
If it is possible that several processes could be sending messages on the inqueue you will need a more sophisticated approach. This sample taken from the source code for the monitor method in logging.handlers.QueueListener in Python 3.2 or later shows one possibility.
"""
Monitor the queue for records, and ask the handler
to deal with them.
This method runs on a separate, internal thread.
The thread will terminate if it sees a sentinel object in the queue.
"""
q = self.queue
has_task_done = hasattr(q, 'task_done')
# self._stop is a multiprocessing.Event object that has been set by the
# main process as part of the shutdown processing, before sending
# the sentinel
while not self._stop.isSet():
try:
record = self.dequeue(True)
if record is self._sentinel:
break
self.handle(record)
if has_task_done:
q.task_done()
except queue.Empty:
pass
# There might still be records in the queue.
while True:
try:
record = self.dequeue(False)
if record is self._sentinel:
break
self.handle(record)
if has_task_done:
q.task_done()
except queue.Empty:
break
I would like to check if a Consumer/Worker is present to consume a Message I am about to send.
If there isn't any Worker, I would start some workers (both consumers and publishers are on a single machine) and then go about publishing Messages.
If there is a function like connection.check_if_has_consumers, I would implement it somewhat like this -
import pika
import workers
# code for publishing to worker queue
connection = pika.BlockingConnection(pika.ConnectionParameters(host='localhost'))
channel = connection.channel()
# if there are no consumers running (would be nice to have such a function)
if not connection.check_if_has_consumers(queue="worker_queue", exchange=""):
# start the workers in other processes, using python's `multiprocessing`
workers.start_workers()
# now, publish with no fear of your queues getting filled up
channel.queue_declare(queue="worker_queue", auto_delete=False, durable=True)
channel.basic_publish(exchange="", routing_key="worker_queue", body="rockin",
properties=pika.BasicProperties(delivery_mode=2))
connection.close()
But I am unable to find any function with check_if_has_consumers functionality in pika.
Is there some way of accomplishing this, using pika? or maybe, by talking to The Rabbit directly?
I am not completely sure, but I really think RabbitMQ would be aware of the number of consumers subscribed to different queues, since it does dispatch messages to them and accepts acks
I just got started with RabbitMQ 3 hours ago... any help is welcome...
here is the workers.py code I wrote, if its any help....
import multiprocessing
import pika
def start_workers(num=3):
"""start workers as non-daemon processes"""
for i in xrange(num):
process = WorkerProcess()
process.start()
class WorkerProcess(multiprocessing.Process):
"""
worker process that waits infinitly for task msgs and calls
the `callback` whenever it gets a msg
"""
def __init__(self):
multiprocessing.Process.__init__(self)
self.stop_working = multiprocessing.Event()
def run(self):
"""
worker method, open a channel through a pika connection and
start consuming
"""
connection = pika.BlockingConnection(
pika.ConnectionParameters(host='localhost')
)
channel = connection.channel()
channel.queue_declare(queue='worker_queue', auto_delete=False,
durable=True)
# don't give work to one worker guy until he's finished
channel.basic_qos(prefetch_count=1)
channel.basic_consume(callback, queue='worker_queue')
# do what `channel.start_consuming()` does but with stopping signal
while len(channel._consumers) and not self.stop_working.is_set():
channel.transport.connection.process_data_events()
channel.stop_consuming()
connection.close()
return 0
def signal_exit(self):
"""exit when finished with current loop"""
self.stop_working.set()
def exit(self):
"""exit worker, blocks until worker is finished and dead"""
self.signal_exit()
while self.is_alive(): # checking `is_alive()` on zombies kills them
time.sleep(1)
def kill(self):
"""kill now! should not use this, might create problems"""
self.terminate()
self.join()
def callback(channel, method, properties, body):
"""pika basic consume callback"""
print 'GOT:', body
# do some heavy lifting here
result = save_to_database(body)
print 'DONE:', result
channel.basic_ack(delivery_tag=method.delivery_tag)
EDIT:
I have to move forward so here is a workaround that I am going to take, unless a better approach comes along,
So, RabbitMQ has these HTTP management apis, they work after you have turned on the management plugin and at middle of HTTP apis page there is
/api/connections - A list of all open connections.
/api/connections/name - An individual connection. DELETEing it will close the connection.
So, if I connect my Workers and my Produces both by different Connection names / users, I'll be able to check if the Worker Connection is open... (there might be issues when worker dies...)
will be waiting for a better solution...
EDIT:
just found this in the rabbitmq docs, but this would be hacky to do in python:
shobhit#oracle:~$ sudo rabbitmqctl -p vhostname list_queues name consumers
Listing queues ...
worker_queue 0
...done.
so i could do something like,
subprocess.call("echo password|sudo -S rabbitmqctl -p vhostname list_queues name consumers | grep 'worker_queue'")
hacky... still hope pika has some python function to do this...
Thanks,
I was just looking into this as well. After reading through the source and docs I came across the following in channel.py:
#property
def consumer_tags(self):
"""Property method that returns a list of currently active consumers
:rtype: list
"""
return self._consumers.keys()
My own testing was successful. I used the following where my channel object is self._channel:
if len(self._channel.consumer_tags) == 0:
LOGGER.info("Nobody is listening. I'll come back in a couple of minutes.")
...
I actually found this on accident looking for a different issue, but one thing that may help you is on the Basic_Publish function, there is a parameter "Immediate" which is defaulted to False.
One idea you could do is to set the Immediate Flag to True, which will require it to be consumed by a consumer immediately, instead of sitting in a queue. If a worker is not available to consume the message, it will kick back an error, telling you to start another worker.
Depending on the throughput of your system, this would either be spawning a lot of extra workers, or spawning workers to replace dead workers. For the former issue you can write an admin-like system that simply tracks workers via a control queue, where you can tell a "Runner" like process to kill processes of workers that are now no longer necessary.
I have a program which spawns 4 threads, these threads need to stay running indefinitely and if one of them crashes I need to know so I can restart.
If I use a list with 4 numbers and pass it to each thread through using a queue. Then all each thread has to do is reset its section in the timer while the main thread counts it down.
So the queue will never be empty, only a single value could go to 0, and then if this happens then the main thread knows its child hasn't responded and it can act accordingly.
But every time I .get() from the queue, it makes it empty, so I have to get from the queue, store into a variable, modify the variable and put it back in the queue.
Is this fine using the queue like this for a watchdog.
If you're using Threads, you could regularly check through threading.enumerate to make sure that you have the correct number and kind of threads running.
But, also, passing things into a Queue that gets returned from a thread is a technique that I have at least seen used to make sure that threads are still running. So, if I'm understanding you correctly, what you're doing isn't completely crazy.
Your "thread must re-set its sentinal occasionally" might make more sense to have as a list of Queues that each Thread is expected to respond to asap. This depends on if your Threads are actually doing process-intensive stuff, or if they're just backgrounded for interface reasons. If they're not spending all their time doing math, you could do something like:
def guarded_thread(sentinal_queue, *args):
while True:
try:
sentinal_queue.get_nowait()
sentinal_queue.put('got it')
except Queue.Empty:
# we just want to make sure that we respond if we have been
# pinged
pass
# do actual work with other args
def main(arguments):
queues = [Queue() for q in range(4)]
threads = [(Thread(target=guarded_thread, args=(queue, args)), queue)
for queue, args in zip(queues, arguments)]
for thread, queue in threads:
thread.start()
while True:
for thread, queue in threads:
queue.put(True)
for thread, queue in threads:
try:
response = queue.get(True, MAX_TIMEOUT)
if response != 'got it':
# either re-send or restart the thread
except Queue.Empty:
# restart the thread
time.sleep(PING_INTERVAL)
Note that you could also use different request/response queues to avoid having different kinds of sentinal values, it depends on your actual code which one would look less crazy.
I have a queue which can contain max 4 queued objects. These objects are threads running web service requests. The thread part is OK.
I have followed many tutorials which talk about consumer and producer threads used to fill in and out the queue object.
My question is about the consumer part. In all these tutorials and regarding the python doc, the only way I have found to pile out objects from the queue is :
while len(requltArray) < amountOfThreads:
thread = q.get(True)
thread.join()
Imagine the q.get(True) piles out a thread with an invalid web service request. And imagine this thread have to wait for urllib timeout to end. My consumer will be blocked for some seconds. As my queue is limited to 4 threads and maybe the 3 others have ended yet, I waste time until consumer can continue the pile-outs (and producer can fill the queue).
Is there any way or well-known design pattern to avoid this waste of time ?
Thanks for your help
Maybe you could use conditions
Imagine you want to put a new thread into your queue but it is full. Then you could wait() for the condition object (pseudo code):
condition.acquire()
while not queue.has_free_place():
condition.wait()
add_new_thread_to_queue()
condition.release()
And inside your queued threads you could place something like the following code at the end of execution:
condition.acquire()
remove_myself_from_queue()
condition.notify()
condition.release()