Synchronous and blocking consumption in RabbitMQ using pika

I want to consume from a RabbitMQ queue synchronously, blocking until a message arrives.
Note: full, ready-to-run code is included below.
Our system uses RabbitMQ as its queuing system, but one of our modules does not need asynchronous consumption.
I've tried using basic_get on top of a BlockingConnection, but it doesn't block; on an empty queue it returns (None, None, None) immediately:
# declare queue
get_connection().channel().queue_declare(TEST_QUEUE)

def blocking_get_1():
    channel = get_connection().channel()
    # get from an empty queue (prints immediately)
    print channel.basic_get(TEST_QUEUE)
I've also tried the consume generator, but it fails with "Connection Closed" after a long period without consuming:
def blocking_get_2():
    channel = get_connection().channel()
    # put messages in TEST_QUEUE
    for i in range(4):
        channel.basic_publish(
            '',
            TEST_QUEUE,
            'body %d' % i
        )
    consume_generator = channel.consume(TEST_QUEUE)
    print next(consume_generator)
    time.sleep(14400)
    print next(consume_generator)
Is there a way to use RabbitMQ with the pika client the way I would use a Queue.Queue in Python, or anything similar?
My only option at the moment is to busy-wait (using basic_get), but I would rather rely on the existing system and avoid busy-waiting if possible.
Full code:
#!/usr/bin/env python
import pika
import time

TEST_QUEUE = 'test'

def get_connection():
    # define connection
    connection = pika.BlockingConnection(
        pika.ConnectionParameters(
            host=YOUR_IP,
            port=YOUR_PORT,
            credentials=pika.PlainCredentials(
                username=YOUR_USER,
                password=YOUR_PASSWORD,
            )
        )
    )
    return connection

# declare queue
get_connection().channel().queue_declare(TEST_QUEUE)

def blocking_get_1():
    channel = get_connection().channel()
    # get from an empty queue (prints immediately)
    print channel.basic_get(TEST_QUEUE)

def blocking_get_2():
    channel = get_connection().channel()
    # put messages in TEST_QUEUE
    for i in range(4):
        channel.basic_publish(
            '',
            TEST_QUEUE,
            'body %d' % i
        )
    consume_generator = channel.consume(TEST_QUEUE)
    print next(consume_generator)
    time.sleep(14400)
    print next(consume_generator)

print "blocking_get_1"
blocking_get_1()
print "blocking_get_2"
blocking_get_2()
get_connection().channel().queue_delete(TEST_QUEUE)

A common problem with Pika is that it does not currently handle incoming events in the background. In many scenarios this means you need to call connection.process_data_events() periodically so that heartbeats are not missed.
It also means that if you sleep for an extended period, Pika will not process incoming data, and the connection will eventually die because it is not responding to heartbeats. One option here is to disable heartbeats.
I usually solve this by having a background thread check for new events, as seen in this example.
If you want to block completely, I would do something like this (based on my own library, AMQPStorm):
# Requires: from time import sleep; `channel` is an open AMQPStorm channel.
while True:
    message = channel.basic.get(queue='simple_queue', no_ack=False)
    if message:
        print("Message:", message.body)
        message.ack()
    else:
        print("Channel Empty.")
        sleep(1)
This is based on the example found here.
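The periodic process_data_events() call described above can be wrapped in a small helper. This is a rough sketch, not pika's own recipe: wait_keeping_alive and its step parameter are hypothetical names, and it assumes a pika BlockingConnection whose process_data_events() accepts a time_limit argument. FakeConnection is only a stand-in so the sketch runs without a broker.

```python
import time


def wait_keeping_alive(connection, seconds, step=1.0):
    """Wait for `seconds`, handing control to the connection in slices of
    at most `step` seconds so it can service I/O such as heartbeats."""
    deadline = time.monotonic() + seconds
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return
        connection.process_data_events(time_limit=min(step, remaining))


class FakeConnection:
    """Stand-in for pika.BlockingConnection (assumption: same method name)."""

    def __init__(self):
        self.calls = []

    def process_data_events(self, time_limit=0):
        self.calls.append(time_limit)
        time.sleep(time_limit)  # a real connection would do socket I/O here


if __name__ == "__main__":
    conn = FakeConnection()
    wait_keeping_alive(conn, 0.3, step=0.1)
    print(len(conn.calls), "slices processed")
```

In the question's code, wait_keeping_alive(connection, 14400) would take the place of time.sleep(14400), provided the connection object is kept around instead of being discarded. pika's BlockingConnection also has a sleep() method intended for the same purpose; check that your version provides it.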

Related

Python Pika how to use GeventConnection

We have several tasks that we consume from a message queue. The runtime of those tasks depends on fetching data from a database, so we would like to use Gevent to avoid blocking the program when a database request takes a long time. We are trying to couple it with the Pika client, which has some asynchronous adapters, one of them for gevent: pika.adapters.gevent_connection.GeventConnection.
I set up some toy code that consumes tasks consisting of integers from a message queue and publishes them to another queue, sleeping for 4 seconds on each odd number:
# from gevent import monkey
# # Monkeypatch core python libraries to support asynchronous operations.
# monkey.patch_time()
import pika
from pika.adapters.gevent_connection import GeventConnection
from datetime import datetime
import time

def handle_delivery(unused_channel, method, header, body):
    """Called when we receive a message from RabbitMQ"""
    print(f"Received: {body} at {datetime.now()}")
    channel.basic_ack(method.delivery_tag)
    num = int(body)
    print(num)
    if num % 2 != 0:
        time.sleep(4)
    channel.basic_publish(
        exchange='my_test_exchange2',
        routing_key='my_test_queue2',
        body=body
    )
    print("Finished processing")

def on_connected(connection):
    """Called when we are fully connected to RabbitMQ"""
    # Open a channel
    connection.channel(on_open_callback=on_channel_open)

def on_channel_open(new_channel):
    """Called when our channel has opened"""
    global channel
    channel = new_channel
    channel.basic_qos(prefetch_count=1)
    channel.queue_declare(queue="my_queue_gevent5")
    channel.exchange_declare("my_test_exchange2")
    channel.queue_declare(queue="my_test_queue2")
    channel.queue_bind(exchange="my_test_exchange2", queue="my_test_queue2")
    channel.basic_consume("my_queue_gevent5", handle_delivery)

def start_loop(i):
    conn = GeventConnection(pika.ConnectionParameters('localhost'), on_open_callback=on_connected)
    conn.ioloop.start()

start_loop(1)
If I run it without the monkey.patch_time() call, it works and publishes results to my_test_queue2, but it processes messages sequentially. I expected that after adding the monkey.patch_time() patch it would still work, but concurrently. Instead, the code gets stuck (nothing happens anymore) once it reaches the call to time.sleep(4). It processes and publishes the first integer, which is 0, and then gets stuck at 1, when the if clause is triggered. What am I doing wrong?

With the help of ChatGPT I managed to make it work. There was a gevent.spawn() call missing:
import gevent  # needed for gevent.spawn()

def handle_delivery(unused_channel, method, header, body):
    print("Handling delivery")
    # Hand the work to a new greenlet so this callback returns immediately.
    gevent.spawn(process_message, method, body)

def process_message(method, body):
    print(f"Received: {body} at {datetime.now()}")
    channel.basic_ack(method.delivery_tag)
    num = int(body)
    print(num)
    if num % 2 != 0:
        time.sleep(4)
    channel.basic_publish(
        exchange='my_test_exchange2',
        routing_key='my_test_queue2',
        body=body
    )
    print("Finished processing")

Non-blocking multiprocessing.connection.Listener?

I use multiprocessing.connection.Listener for communication between processes, and it works like a charm for me. Now I would really love my main loop to do something else between commands from the client. Unfortunately, listener.accept() blocks execution until a connection from the client process is established.
Is there a simple way to do a non-blocking check for a multiprocessing.connection? A timeout? Or should I use a dedicated thread?
# Simplified code:
from multiprocessing.connection import Listener

def mainloop():
    listener = Listener(address=('localhost', 6000), authkey=b'secret')
    while True:
        conn = listener.accept() # <--- This blocks!
        msg = conn.recv()
        print ('got message: %r' % msg)
        conn.close()

One solution that I found (although it might not be the most "elegant" one) is to use conn.poll (documentation). poll returns True if the connection has data available to read and, most importantly, does not block when called with no argument. I'm not 100% sure that this is the best way to do this, but I've had success calling listener.accept() only once and then using the following pattern to repeatedly read input whenever any is available:
from multiprocessing.connection import Listener

def mainloop():
    running = True
    listener = Listener(address=('localhost', 6000), authkey=b'secret')
    conn = listener.accept()
    msg = ""
    while running:
        while conn.poll():
            msg = conn.recv()
            print (f"got message: {msg}")
            if msg == "EXIT":
                running = False
        # Other code can go here
        print(f"I can run too! Last msg received was {msg}")
    conn.close()
The 'while' in the inner loop can be replaced with 'if' if you only want to read at most one message per iteration. Use with caution, as it seems somewhat 'hacky', and I haven't found references to using conn.poll for this purpose elsewhere.
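To make the poll-based pattern above concrete, here is a small self-contained sketch that runs the client in a thread of the same process. The names send_messages, serve_once and demo are hypothetical, and binding to port 0 (letting the OS pick a free port) is a choice made for this sketch rather than something from the answer. It passes a short timeout to poll so the loop does not spin at full speed.

```python
import threading
import time
from multiprocessing.connection import Client, Listener

AUTHKEY = b"secret"


def send_messages(address, messages):
    """Client side: connect, send each message, then close."""
    with Client(address, authkey=AUTHKEY) as conn:
        for message in messages:
            conn.send(message)
            time.sleep(0.05)


def serve_once(listener, poll_timeout=0.01):
    """Accept one client, then alternate between reading and other work."""
    received, idle_ticks = [], 0
    with listener.accept() as conn:  # still blocks until a client connects
        while True:
            if conn.poll(poll_timeout):  # waits at most poll_timeout seconds
                try:
                    message = conn.recv()
                except EOFError:  # client closed the connection
                    break
                if message == "EXIT":
                    break
                received.append(message)
            else:
                idle_ticks += 1  # other main-loop work would go here
    return received, idle_ticks


def demo():
    # Port 0 asks the OS for a free port; listener.address reports it.
    with Listener(("localhost", 0), authkey=AUTHKEY) as listener:
        client = threading.Thread(
            target=send_messages, args=(listener.address, ["a", "b", "EXIT"])
        )
        client.start()
        result = serve_once(listener)
        client.join()
    return result


if __name__ == "__main__":
    print(demo())
```

Note that the single accept() call still blocks until the first client connects; only the reads after that are non-blocking.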

You can run the blocking function in a thread:
conn = await loop.run_in_executor(None, listener.accept)
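A fuller version of that one-liner might look like the sketch below, assuming an asyncio program on Python 3.7 or later. The connect_later client, the port-0 address and the tick counter are illustrative additions so the example can run on its own.

```python
import asyncio
import threading
import time
from multiprocessing.connection import Client, Listener

AUTHKEY = b"secret"


def connect_later(address, delay):
    """Hypothetical client that connects after `delay` seconds."""
    time.sleep(delay)
    with Client(address, authkey=AUTHKEY) as conn:
        conn.send("hello")


async def main():
    loop = asyncio.get_running_loop()
    ticks = 0
    with Listener(("localhost", 0), authkey=AUTHKEY) as listener:
        client = threading.Thread(
            target=connect_later, args=(listener.address, 0.2)
        )
        client.start()
        # accept() blocks a worker thread of the default executor, not the loop.
        accept_future = loop.run_in_executor(None, listener.accept)
        while not accept_future.done():
            ticks += 1  # the event loop is free to do other work here
            await asyncio.sleep(0.01)
        conn = await accept_future
        message = await loop.run_in_executor(None, conn.recv)
        conn.close()
        client.join()
    return message, ticks


if __name__ == "__main__":
    print(asyncio.run(main()))
```

The same idea works without asyncio by calling listener.accept() in a plain threading.Thread and passing the resulting connection back to the main loop.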

I've not used the Listener object myself. For this task I normally use multiprocessing.Queue; documentation at the following link:
https://docs.python.org/2/library/queue.html#Queue.Queue
That object can be used to send and receive any picklable object between Python processes with a nice API; I think you'll be most interested in:
# in process A
some_queue.put('some message')
# in process B
some_queue.get_nowait() # will raise Queue.Empty if nothing is available; handle that to move on with your execution
The only limitation is that you need control of both Process objects at some point in order to hand the queue to them, something like this:
import time
from Queue import Empty
from multiprocessing import Queue, Process

def receiver(q):
    while 1:
        try:
            message = q.get_nowait()
            print 'receiver got', message
        except Empty:
            print 'nothing to receive, sleeping'
            time.sleep(1)

def sender(q):
    while 1:
        message = 'some message'
        q.put(message)
        print 'sender sent', message
        time.sleep(1)

some_queue = Queue()
process_a = Process(
    target=receiver,
    args=(some_queue,)
)
process_b = Process(
    target=sender,
    args=(some_queue,)
)
process_a.start()
process_b.start()
print 'ctrl + c to exit'
try:
    while 1:
        time.sleep(1)
except KeyboardInterrupt:
    pass
process_a.terminate()
process_b.terminate()
process_a.join()
process_b.join()
Queues are nice because you can actually have as many consumers and as many producers for that exact same Queue object as you like (handy for distributing tasks).
I should point out that just calling .terminate() on a Process is bad form; you should use your shiny new messaging system to pass a shutdown message or something of that nature.
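The shutdown-message idea can be sketched as follows. To keep the example runnable anywhere it uses threads and queue.Queue (Python 3 naming) instead of processes; multiprocessing.Queue offers the same put() and get(timeout=...) calls, although a plain list like handled would not be shared between processes. The SHUTDOWN sentinel and the function names are choices made for this sketch.

```python
import queue
import threading

SHUTDOWN = None  # sentinel chosen for this sketch


def receiver(q, handled, timeout=0.1):
    """Consume messages until the SHUTDOWN sentinel arrives."""
    while True:
        try:
            message = q.get(timeout=timeout)  # blocks, but only up to `timeout`
        except queue.Empty:
            continue  # nothing yet; a real loop could do other work here
        if message is SHUTDOWN:
            break
        handled.append(message)


def demo():
    q = queue.Queue()
    handled = []
    worker = threading.Thread(target=receiver, args=(q, handled))
    worker.start()
    for i in range(3):
        q.put("message %d" % i)
    q.put(SHUTDOWN)  # ask the worker to stop instead of terminating it
    worker.join()
    return handled


if __name__ == "__main__":
    print(demo())
```

Using get(timeout=...) instead of get_nowait() plus sleep lets the consumer wake up as soon as a message arrives while still regaining control periodically.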

The multiprocessing module comes with a nice feature called Pipe(). It is a nice way to pass data between two processes (I have never tried more than two). Python 3.8 added shared memory support to the multiprocessing module, but I have not really tested it, so I cannot vouch for it.
You would use the Pipe function something like this:
from multiprocessing import Pipe, Process
.....
def sending(conn):
    message = 'some message'
    #perform some code
    conn.send(message)
    conn.close()

receiver, sender = Pipe()
p = Process(target=sending, args=(sender,))
p.start()
print receiver.recv() # prints "some message"
p.join()
With this you should be able to have separate processes running independently until you reach the point where you need input from one of them. If reading fails because the other process has not sent its data yet, you can sleep, or use a while loop to keep checking until the other process finishes its task and sends the result:
while not receiver.poll():
    time.sleep(5)
This keeps looping until the other process is done running and has sent the result, after which receiver.recv() returns immediately. This is also about 2-3 times faster than Queue. Queue is also a good option, although personally I do not use it.
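A runnable variant of that waiting loop might look like this. It is only a sketch: the sender runs in a thread rather than a separate process so the example is self-contained, and wait_for_result with its interval and max_wait parameters is a hypothetical helper. It relies on Connection.poll(timeout), which waits up to the given number of seconds for data.

```python
import threading
import time
from multiprocessing import Pipe


def sending(conn, delay):
    """Simulate slow work, then send the result through the pipe."""
    time.sleep(delay)
    conn.send("some message")
    conn.close()


def wait_for_result(conn, interval=0.05, max_wait=5.0):
    """Check the pipe every `interval` seconds, giving up after `max_wait`."""
    deadline = time.monotonic() + max_wait
    while not conn.poll(interval):  # True as soon as data is ready
        if time.monotonic() >= deadline:
            raise TimeoutError("no result within %.1f seconds" % max_wait)
        # other work could be done here between checks
    return conn.recv()


def demo():
    receiver, sender = Pipe()
    worker = threading.Thread(target=sending, args=(sender, 0.2))
    worker.start()
    result = wait_for_result(receiver)
    worker.join()
    return result


if __name__ == "__main__":
    print(demo())
```

Passing the interval to poll() instead of calling time.sleep() means the loop wakes up as soon as the message arrives rather than at the end of the sleep.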

Pika channel.stop_consuming doesn't stop start_consuming loop

I have this piece of code; basically, it runs channel.start_consuming().
I want it to stop after a while.
I think that channel.stop_consuming() is the right method:
def stop_consuming(self, consumer_tag=None):
    """ Cancels all consumers, signalling the `start_consuming` loop to
    exit.
But it doesn't work: start_consuming() never ends (execution doesn't exit from this call, "end" is never printed).
import unittest
import pika
import threading
import time

_url = "amqp://user:password@xxx.rabbitserver.com/aaa"

class Consumer_test(unittest.TestCase):
    def test_startConsuming(self):
        def callback(channel, method, properties, body):
            print("callback")
            print(body)

        def connectionTimeoutCallback():
            print("connecionClosedCallback")

        def _closeChannel(channel_):
            print("_closeChannel")
            time.sleep(1)
            print("close")
            if channel_.is_open:
                channel_.stop_consuming()
                print("stop_cosuming")
            else:
                print("channel is closed")
            #channel_.close()

        params = pika.URLParameters(_url)
        params.socket_timeout = 5
        connection = pika.BlockingConnection(params)
        #connection.add_timeout(2, connectionTimeoutCallback)
        channel = connection.channel()
        channel.basic_consume(callback,
                              queue='test',
                              no_ack=True)
        t = threading.Thread(target=_closeChannel, args=[channel])
        t.start()
        print("start_consuming")
        channel.start_consuming() # start consuming (loop never ends)
        connection.close()
        print("end")
connection.add_timeout solves my problem, and maybe calling basic_cancel would too, but I want to use the right method.
Thanks
Note:
I can't respond or add a comment to this (pika, stop_consuming does not work) due to my low reputation points.
Note 2:
I think that I'm not sharing a channel or connection across threads (Pika doesn't support this) because I use "channel_" passed as a parameter and not the "channel" instance of the class (am I wrong?).

I was having the same problem, because pika is not thread safe: connections and channels can't be safely shared across threads.
So I used a separate connection to send a shutdown message, then stopped consuming on the original channel from within the callback function.
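The approach described in this answer could look roughly like the sketch below. The SHUTDOWN_BODY control message and make_callback are hypothetical names, and FakeChannel is only a stand-in so the callback can be exercised without a broker; a real pika channel would be passed in by start_consuming().

```python
SHUTDOWN_BODY = b"__shutdown__"  # hypothetical control message


def make_callback(handle):
    """Build a consumer callback that stops on the shutdown message."""

    def on_message(channel, method, properties, body):
        if body == SHUTDOWN_BODY:
            # Runs on the consuming thread, so nothing is shared across threads.
            channel.stop_consuming()
            return
        handle(body)

    return on_message


class FakeChannel:
    """Stand-in for a pika channel so the sketch runs without a broker."""

    def __init__(self):
        self.consuming = True

    def stop_consuming(self):
        self.consuming = False


if __name__ == "__main__":
    seen = []
    callback = make_callback(seen.append)
    channel = FakeChannel()
    callback(channel, None, None, b"work")
    callback(channel, None, None, SHUTDOWN_BODY)
    print(seen, channel.consuming)
```

The other thread then publishes SHUTDOWN_BODY to the same queue over its own connection. Newer pika releases also document BlockingConnection.add_callback_threadsafe() for requesting work on the connection's thread from another thread; check whether your version provides it.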

Pika: how to consume messages synchronously

I would like to run a process periodically (for example once every 10 minutes, or once per hour) that gets all the messages from a queue, processes them and then exits. Is there any way to do this with pika, or should I use a different Python library?

I think an ideal solution here would be to use the basic_get method. It fetches a single message, and returns None if the queue is already empty. The advantage is that you can clear the queue with a simple loop and break out of it once None is returned; it is also safe to run basic_get with multiple consumers.
This example is based on my own library, amqpstorm, but you could easily implement the same with pika as well.
from amqpstorm import Connection

connection = Connection('127.0.0.1', 'guest', 'guest')
channel = connection.channel()
channel.queue.declare('simple_queue')
while True:
    result = channel.basic.get(queue='simple_queue', no_ack=False)
    if not result:
        print("Channel Empty.")
        # We are done, lets break the loop and stop the application.
        break
    print("Message:", result['body'])
    channel.basic.ack(result['method']['delivery_tag'])
channel.close()
connection.close()
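With pika, the same drain loop might be sketched as below. It assumes a pika BlockingChannel, whose basic_get() returns a (method, properties, body) tuple and (None, None, None) when the queue is empty, as the first question on this page observed. drain_queue is a hypothetical helper, and FakeChannel only imitates those two calls so the sketch can run without a broker.

```python
from collections import deque, namedtuple

Method = namedtuple("Method", "delivery_tag")


def drain_queue(channel, queue_name, handle):
    """Fetch and ack messages until the queue is empty; return the count."""
    processed = 0
    while True:
        method, properties, body = channel.basic_get(queue_name)
        if method is None:  # queue is empty, we are done
            break
        handle(body)
        channel.basic_ack(method.delivery_tag)
        processed += 1
    return processed


class FakeChannel:
    """Stand-in imitating basic_get/basic_ack of a pika channel."""

    def __init__(self, bodies):
        self._pending = deque(enumerate(bodies, start=1))
        self.acked = []

    def basic_get(self, queue):
        if not self._pending:
            return None, None, None
        tag, body = self._pending.popleft()
        return Method(tag), None, body

    def basic_ack(self, delivery_tag):
        self.acked.append(delivery_tag)


if __name__ == "__main__":
    channel = FakeChannel([b"a", b"b"])
    print(drain_queue(channel, "simple_queue", print))
```

Acknowledging only after handle() returns means a message is redelivered if processing raises an exception before the ack.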

Would this work for you:
Measure the current queue length as N = queue.method.message_count
Make the callback count the processed messages and as soon as N are processed, call channel.stop_consuming.
So, client code would be something like this:
class CountCallback(object):
    def __init__(self, count):
        self.count = count

    def __call__(self, ch, method, properties, body):
        # process the message here
        self.count -= 1
        if not self.count:
            ch.stop_consuming()

channel = conn.channel()
queue = channel.queue_declare('tasks')
callback = CountCallback(queue.method.message_count)
channel.basic_consume(callback, queue='tasks')
channel.start_consuming()

@eandersson
This example is based on my own library, amqpstorm, but you could easily implement the same with pika as well.
Updated for amqpstorm 2.6.1:
from amqpstorm import Connection

connection = Connection('127.0.0.1', 'guest', 'guest')
channel = connection.channel()
channel.queue.declare('simple_queue')
while True:
    result = channel.basic.get(queue='simple_queue', no_ack=False)
    if not result:
        print("Channel Empty.")
        # We are done, lets break the loop and stop the application.
        break
    print("Message:", result.body)
    channel.basic.ack(result.method['delivery_tag'])
channel.close()
connection.close()

Consuming rabbitmq queue from inside python threads

This is a long one.
I have a list of usernames and passwords. For each one I want to log in to the account and do some things. I want to use several machines to do this faster. My plan is to have a main machine whose only job is to run a cron job that checks from time to time whether the rabbitmq queue is empty. If it is, it reads the list of usernames and passwords from a file and sends it to the rabbitmq queue. Then a bunch of machines subscribed to that queue each receive a user/pass, do stuff with it, acknowledge it, and move on to the next one until the queue is empty, at which point the main machine fills it up again. So far I think I have everything down.
Now comes my problem. I have checked that the things to be done with each user/passes aren't so intensive and so I could have each machine doing three of them simultaneously using python's threading. In fact for a single machine I have implemented this where I load the user/passes into a python Queue() and then have three threads consume that Queue(). Now I want to do something similar, but instead of consuming from a python Queue(), each thread of each machine should consume from a rabbitmq queue. This is where I'm stuck. To run tests I started by using rabbitmq's tutorial.
send.py:
import pika, sys
connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
channel = connection.channel()
channel.queue_declare(queue='hello')
message = ' '.join(sys.argv[1:])
channel.basic_publish(exchange='',
                      routing_key='hello',
                      body=message)
connection.close()
worker.py
import time, pika

connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
channel = connection.channel()
channel.queue_declare(queue='hello')

def callback(ch, method, properties, body):
    print ' [x] received %r' % (body,)
    time.sleep( body.count('.') )
    ch.basic_ack(delivery_tag = method.delivery_tag)

channel.basic_qos(prefetch_count=1)
channel.basic_consume(callback, queue='hello', no_ack=False)
channel.start_consuming()
For the above you can run two worker.py which will subscribe to the rabbitmq queue and consume as expected.
My threading without rabbitmq is something like this:
runit.py
import threading
import Queue

class Threaded_do_stuff(threading.Thread):
    def __init__(self, user_queue):
        threading.Thread.__init__(self)
        self.user_queue = user_queue

    def run(self):
        while True:
            login = self.user_queue.get()
            # `pass` is a reserved word, so the keyword is spelled `password`
            do_stuff(user=login[0], password=login[1])
            self.user_queue.task_done()

user_queue = Queue.Queue()
for i in range(3):
    td = Threaded_do_stuff(user_queue)
    td.setDaemon(True)
    td.start()
## fill up the queue
for user in list_users:
    user_queue.put(user)
## go!
user_queue.join()
This also works as expected: you fill up the queue and have 3 threads subscribe to it. Now what I want to do is something like runit.py but instead of using a python Queue(), using something like worker.py where the queue is actually a rabbitmq queue.
Here's something which I tried and didn't work (and I don't understand why)
rabbitmq_runit.py
import time, threading, pika

class Threaded_worker(threading.Thread):
    def callback(self, ch, method, properties, body):
        print ' [x] received %r' % (body,)
        time.sleep( body.count('.') )
        ch.basic_ack(delivery_tag = method.delivery_tag)

    def __init__(self):
        threading.Thread.__init__(self)
        self.connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
        self.channel = self.connection.channel()
        self.channel.queue_declare(queue='hello')
        self.channel.basic_qos(prefetch_count=1)
        self.channel.basic_consume(self.callback, queue='hello')

    def run(self):
        print 'start consuming'
        self.channel.start_consuming()

for _ in range(3):
    print 'launch thread'
    td = Threaded_worker()
    td.setDaemon(True)
    td.start()
I would expect this to launch three threads, each of which is blocked by .start_consuming(), which just stays there waiting for the rabbitmq queue to send them something. Instead, this program starts, does some prints, and exits. The pattern of the exits is weird too:
launch thread
launch thread
start consuming
launch thread
start consuming
In particular notice there is one "start consuming" missing.
What's going on?
EDIT: One answer I found to a similar question is here
Consuming a rabbitmq message queue with multiple threads (Python Kombu)
and the answer is to "use celery", whatever that means. I don't buy it, I shouldn't need anything remotely as sophisticated as celery. In particular, I'm not trying to set up an RPC and I don't need to read replies from the do_stuff routines.
EDIT 2: The print pattern that I expected would be the following. I do
python send.py first message......
python send.py second message.
python send.py third message.
python send.py fourth message.
and the print pattern would be
launch thread
start consuming
[x] received 'first message......'
launch thread
start consuming
[x] received 'second message.'
launch thread
start consuming
[x] received 'third message.'
[x] received 'fourth message.'

The problem is that you're making the thread daemonic:
td = Threaded_worker()
td.setDaemon(True) # Shouldn't do that.
td.start()
Daemonic threads will be terminated as soon as the main thread exits:
A thread can be flagged as a “daemon thread”. The significance of this
flag is that the entire Python program exits when only daemon threads
are left. The initial value is inherited from the creating thread. The
flag can be set through the daemon property.
Leave out setDaemon(True) and you should see it behave the way you expect.
Also, the pika FAQ has a note about how to use it with threads:
Pika does not have any notion of threading in the code. If you want to
use Pika with threading, make sure you have a Pika connection per
thread, created in that thread. It is not safe to share one Pika
connection across threads.
This suggests you should move everything you're doing in __init__() into run(), so that the connection is created in the same thread you're actually consuming from the queue in.
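A sketch of that restructuring is shown below. It assumes the pika 1.x keyword names for basic_consume (queue, on_message_callback), which differ from the older positional form used in the question. The connect factory is a hypothetical parameter, and FakeConnection and FakeChannel are stand-ins so the thread structure can be exercised without a broker.

```python
import threading


class Worker(threading.Thread):
    def __init__(self, connect, queue_name, on_message):
        super().__init__()  # non-daemon: the program waits for this thread
        self._connect = connect
        self._queue_name = queue_name
        self._on_message = on_message

    def run(self):
        # Everything that touches the connection happens in this thread.
        connection = self._connect()
        channel = connection.channel()
        channel.queue_declare(queue=self._queue_name)
        channel.basic_qos(prefetch_count=1)
        channel.basic_consume(queue=self._queue_name,
                              on_message_callback=self._on_message)
        channel.start_consuming()


class FakeChannel:
    """Stand-in for a pika channel; start_consuming returns immediately."""

    def queue_declare(self, queue): pass
    def basic_qos(self, prefetch_count): pass
    def basic_consume(self, queue, on_message_callback): pass
    def start_consuming(self): pass


class FakeConnection:
    def channel(self):
        return FakeChannel()


def demo():
    created_in = []

    def connect():
        # Record which thread the connection is created in.
        created_in.append(threading.current_thread().name)
        return FakeConnection()

    workers = [Worker(connect, "hello", print) for _ in range(3)]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return created_in, [worker.name for worker in workers]


if __name__ == "__main__":
    print(demo())
```

With real pika, the factory could be something like lambda: pika.BlockingConnection(pika.ConnectionParameters('localhost')), so each thread gets its own connection, created inside run().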
