I need to consume JSON messages from RabbitMQ and perform a long-running task on each of them. I am using a job queue with maxsize 1 and a channel prefetch count of 1, so that my code handles one message from RabbitMQ at a time. This part seems to be working. The code fails when the ack callback is called in the thread in which the pika connection was made, after the long-running task has run: an internal library throws the error:
required argument is not an integer.
I have discovered that the error occurs when the encode function in an internal library is called. It is raised at the line that packs the delivery_tag with the '>Q' format string. I am not sure what this string is, though. The following is the piece of code where it fails:
def encode(self):
    pieces = list()
    pieces.append(struct.pack('>Q', self.delivery_tag))
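For reference, '>Q' is a format string from Python's standard struct module: '>' selects big-endian (network) byte order, which AMQP uses on the wire, and 'Q' an unsigned 64-bit integer, the width of an AMQP delivery tag. This small standard-library sketch reproduces the same error message when the value being packed is not an integer:

```python
import struct

# '>' = big-endian byte order, 'Q' = unsigned 64-bit integer (always 8 bytes)
packed = struct.pack('>Q', 1)
print(packed)  # b'\x00\x00\x00\x00\x00\x00\x00\x01'

# Packing something that is not an integer (for example a channel object
# passed where a delivery tag was expected) raises struct.error
error_message = None
try:
    struct.pack('>Q', object())
except struct.error as exc:
    error_message = str(exc)
print(error_message)  # required argument is not an integer
```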
My ack method and the code block where the callback is defined are as follows:
def __ack_message(self, delivery_tag, ack):
    print("Inside the __ack_message function")
    if self.channel.is_open:
        try:
            if ack:
                self.channel.basic_ack(delivery_tag)
            else:
                self.channel.basic_nack(delivery_tag, requeue=False)
        except Exception as e:
            self.channel.basic_nack(delivery_tag, requeue=False)
            print("Failure inside __ack_message consumer.py report: {}".format(e), "error")
            pass
    else:
        pass
def do_work(self, ch, delivery_tag, body):
    thread_id = threading.get_ident()
    print(f'Thread id:{thread_id} Delivery tag: {delivery_tag} Message body: {body}')
    # long-running work begins here
    print(f"I am inside the message_consume")
    if body != b'':
        message = json.loads(body)
        self.slave_object.push_to_job_queue(self.connection, message)
    cb = functools.partial(self.__ack_message, ch, delivery_tag)
    ch.connection.add_callback_threadsafe(cb)
def on_message(self, ch, method_frame, _header_frame, body, args):
    print("Inside on_message function")
    thrds = args
    delivery_tag = method_frame.delivery_tag
    t = threading.Thread(target=self.do_work, args=(ch, delivery_tag, body))
    t.start()
    thrds.append(t)
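One thing worth checking, offered as a likely cause rather than a confirmed diagnosis: __ack_message takes (delivery_tag, ack), but the functools.partial call in do_work binds (ch, delivery_tag). The channel object therefore lands in delivery_tag, and basic_ack() eventually tries to pack it with '>Q'. The sketch below reproduces this with a hypothetical FakeChannel and a stripped-down hypothetical Consumer class (neither is pika's API; only the basic_ack name mirrors it) and shows a binding that matches the method signature:

```python
import functools
import struct

class FakeChannel:
    """Hypothetical stand-in for a pika channel that only records acks."""

    def __init__(self):
        self.acked = []

    def basic_ack(self, delivery_tag):
        # Mimics the library packing the tag as an unsigned 64-bit integer
        struct.pack('>Q', delivery_tag)
        self.acked.append(delivery_tag)

class Consumer:
    """Hypothetical, stripped-down version of the consumer class in the question."""

    def __init__(self, channel):
        self.channel = channel

    def __ack_message(self, delivery_tag, ack):
        if ack:
            self.channel.basic_ack(delivery_tag)

    def buggy_callback(self, ch, delivery_tag):
        # Same binding as do_work: ch -> delivery_tag, delivery_tag -> ack
        return functools.partial(self.__ack_message, ch, delivery_tag)

    def fixed_callback(self, delivery_tag):
        # Positional arguments now line up with __ack_message(delivery_tag, ack)
        return functools.partial(self.__ack_message, delivery_tag, True)

channel = FakeChannel()
consumer = Consumer(channel)

buggy_error = None
try:
    consumer.buggy_callback(channel, 7)()
except struct.error as exc:
    buggy_error = str(exc)
print(f"buggy: {buggy_error}")  # buggy: required argument is not an integer

consumer.fixed_callback(7)()
print(channel.acked)  # [7]
```

In do_work itself, the corresponding change would be functools.partial(self.__ack_message, delivery_tag, True) before passing cb to ch.connection.add_callback_threadsafe(cb); whether to pass True or False would depend on whether the task succeeded.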
We have several tasks that we consume from a message queue. The runtimes of those tasks depend on fetching some data from a database. Therefore we would like to use gevent so that the program is not blocked when some database requests take a long time. We are trying to couple it with the Pika client, which has several asynchronous adapters, one of them for gevent: pika.adapters.gevent_connection.GeventConnection.
I set up some toy code that consumes integer tasks from one queue and publishes them to another queue, sleeping for 4 seconds for each odd number:
# from gevent import monkey
# # Monkeypatch core python libraries to support asynchronous operations.
# monkey.patch_time()
import pika
from pika.adapters.gevent_connection import GeventConnection
from datetime import datetime
import time
def handle_delivery(unused_channel, method, header, body):
    """Called when we receive a message from RabbitMQ"""
    print(f"Received: {body} at {datetime.now()}")
    channel.basic_ack(method.delivery_tag)
    num = int(body)
    print(num)
    if num % 2 != 0:
        time.sleep(4)
    channel.basic_publish(
        exchange='my_test_exchange2',
        routing_key='my_test_queue2',
        body=body
    )
    print("Finished processing")
def on_connected(connection):
    """Called when we are fully connected to RabbitMQ"""
    # Open a channel
    connection.channel(on_open_callback=on_channel_open)
def on_channel_open(new_channel):
    """Called when our channel has opened"""
    global channel
    channel = new_channel
    channel.basic_qos(prefetch_count=1)
    channel.queue_declare(queue="my_queue_gevent5")
    channel.exchange_declare("my_test_exchange2")
    channel.queue_declare(queue="my_test_queue2")
    channel.queue_bind(exchange="my_test_exchange2", queue="my_test_queue2")
    channel.basic_consume("my_queue_gevent5", handle_delivery)
def start_loop(i):
    conn = GeventConnection(pika.ConnectionParameters('localhost'), on_open_callback=on_connected)
    conn.ioloop.start()
start_loop(1)
If I run it without the monkey.patch_time() call, it works and publishes results to my_test_queue2, but it processes the messages sequentially. The expected behaviour after adding the monkey.patch_time() patch would be that it still works, but concurrently. However, the code gets stuck (nothing happens anymore) when it reaches the call time.sleep(4). It processes and publishes the first integer, 0, and then gets stuck at 1, when the if clause is triggered. What am I doing wrong?
With the help of ChatGPT I managed to make it work. A gevent.spawn() call was missing:
import gevent
def handle_delivery(unused_channel, method, header, body):
    print("Handling delivery")
    # Hand the work to a new greenlet so this callback returns to the ioloop at once
    gevent.spawn(process_message, method, body)
def process_message(method, body):
    print(f"Received: {body} at {datetime.now()}")
    channel.basic_ack(method.delivery_tag)
    num = int(body)
    print(num)
    if num % 2 != 0:
        time.sleep(4)
    channel.basic_publish(
        exchange='my_test_exchange2',
        routing_key='my_test_queue2',
        body=body
    )
    print("Finished processing")
I am using websocket-client to talk to a websocket server, and there appears to be a race condition in the on_message function below:
def on_message(self, ws, message):
"""
Called automatically by the `websocket.WebSocketApp`
Args:
message: message received by the websocket
"""
try:
message = json.loads(message)
# True when the message received is the Websocket connection confirmation
if 'result' in message.keys():
if type(message['result']) is str:
self.subscription_id = message['result']
return
msg_id = message['params']['result']['id']
# Do some stuff
self.last_msg_id = msg_id
except Exception as e:
print(f'Error processing message: {e}')
The issue seems to be that last_msg_id is not updated in the order in which the messages are received. Since I expect a message every few seconds, I was wondering whether this function is executed in a new thread each time, added to some sort of task queue, or something else entirely.
How can I ensure that last_msg_id is always updated in the order in which the messages come in? I thought about using a lock/mutex to ensure mutual exclusion in the body of the function.
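A lock alone gives mutual exclusion but not ordering: when several threads are blocked on a threading.Lock, which one acquires it next is not defined. One pattern that does preserve order, sketched here under the assumption that on_message itself is invoked in the order the frames arrive, is to let the callback only enqueue the raw message and have a single worker thread process a FIFO queue. The class name OrderedTracker and its seen_ids list are hypothetical, added for illustration:

```python
import json
import queue
import threading

class OrderedTracker:
    """Hypothetical handler: on_message only enqueues; one worker processes FIFO."""

    def __init__(self):
        self.subscription_id = None
        self.last_msg_id = None
        self.seen_ids = []  # illustration only: records processing order
        self._inbox = queue.Queue()
        threading.Thread(target=self._drain, daemon=True).start()

    def on_message(self, ws, message):
        # Cheap and non-blocking; queue.Queue is FIFO and thread-safe
        self._inbox.put(message)

    def _drain(self):
        # A single consumer thread handles items strictly in put() order
        while True:
            raw = self._inbox.get()
            try:
                msg = json.loads(raw)
                if 'result' in msg:
                    # Subscription confirmation: remember the id, nothing else to do
                    if isinstance(msg['result'], str):
                        self.subscription_id = msg['result']
                else:
                    msg_id = msg['params']['result']['id']
                    self.seen_ids.append(msg_id)
                    self.last_msg_id = msg_id
            except (ValueError, KeyError, TypeError) as e:
                print(f'Error processing message: {e}')
            finally:
                self._inbox.task_done()

tracker = OrderedTracker()
tracker.on_message(None, json.dumps({'result': 'sub-1'}))
for i in range(5):
    tracker.on_message(None, json.dumps({'params': {'result': {'id': i}}}))
tracker._inbox.join()  # wait until the worker has processed everything
print(tracker.subscription_id, tracker.last_msg_id, tracker.seen_ids)
# sub-1 4 [0, 1, 2, 3, 4]
```

If websocket-client already calls on_message sequentially from the thread running run_forever, the queue mainly keeps slow processing from delaying the next frame; either way, the single worker updates last_msg_id in arrival order.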
I am using a third-party API that provides websocket functionality for a continuous stream of messages.
I want to use variables that are defined outside the on_message() function. The structure of the code is as follows:
soc = WebSocket()
Live_Data = {}
Prev_Data = {}
def on_message(ws, message):
    if <check condition about Prev_Data>:
        <Do something on Prev_Data and Live_Data>
def on_open(ws):
    print("on open")
def on_error(ws, error):
    print(error)
def on_close(ws):
    print("Close")
# Assign the callbacks.
soc._on_open = on_open
soc._on_message = on_message
soc._on_error = on_error
soc._on_close = on_close
soc.connect()
But when I run this, it throws an error saying that
Prev_Data is referenced before it is assigned
I think this is because the on_message() method is asynchronous, and the next on_message() call tries to access Prev_Data before the first one has finished writing it.
So what is the mechanism for synchronized access to Prev_Data and other such variables here?
P.S.: When I don't use Prev_Data at all, the code runs fine.
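A plausible explanation, assuming the hidden body of on_message assigns to Prev_Data somewhere (for example Prev_Data = ...): Python treats any name assigned inside a function as local to that whole function, so reading it before the assignment raises UnboundLocalError (the exact wording of the message varies between Python versions), even if only one callback ever runs at a time. Declaring the name global tells Python to rebind the module-level variable instead. In this sketch the messages are assumed to be already-parsed dicts with a hypothetical 'price' key, and on_message_broken is a hypothetical name for the failing variant:

```python
# Module-level state, as in the question
Live_Data = {}
Prev_Data = {}

def on_message_broken(ws, message):
    # The assignment on the last line makes Prev_Data local to this whole
    # function, so the read on the next line fails before anything is assigned
    if Prev_Data.get('price') != message.get('price'):
        Live_Data.update(message)
    Prev_Data = dict(message)

def on_message(ws, message):
    global Prev_Data  # rebind the module-level name instead of creating a local
    if Prev_Data.get('price') != message.get('price'):
        Live_Data.update(message)  # mutating a dict in place needs no global statement
    Prev_Data = dict(message)

broken_error = None
try:
    on_message_broken(None, {'price': 1})
except UnboundLocalError as exc:
    broken_error = type(exc).__name__
print(broken_error)  # UnboundLocalError

on_message(None, {'price': 1})
on_message(None, {'price': 2})
print(Prev_Data, Live_Data)  # {'price': 2} {'price': 2}
```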
I have the following code snippet running on a separate thread:
#Starts listening at the defined port on a separate thread. Terminates when 'stop' is received.
def start(self):
    try:
        if not self.is_running:
            self.is_running = True
            while self.is_running:
                self.socket.listen(1)
                conn, addr = self.socket.accept()
                #Messages are split with $ symbol to indicate end of command in the stream.
                jStrs = [jStr for jStr in conn.recv(self.buffer_size).decode().split('$') if jStr != '']
                DoSomethingWith(jStrs)
    except Exception as ex:
        raise SystemExit(f"Server raised error: {ex}")
On the sender part I have something like this:
#Sends a string message to the desired socket.
##param message: The string message to send.
def send(self, message):
    if not self.connected:
        self.connect()
    self.socket.send(message.encode())
    #self.close()
    #self.socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
What I send over the socket and how I use it do not seem relevant to the problem, so I left them out for clarity. When I call my send method the first time, everything works as intended: the debugger runs through the whole while loop and stops at self.socket.accept(). When I do the same send after, say, time.sleep(2), nothing happens. I checked, and my send method does not block.
Notice the commented-out lines in the sender. When I close the connection and construct a new socket after every send, I don't have this issue. Why?
When I do both sends right after one another, with no time in between, both arrive at once, which is the expected behaviour. Why does self.socket.accept() never get called a second time if there is a delay between the two calls (even one as short as the time it takes to print something)?
Create your socket object once, then send messages across it.
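import socket
HOST, PORT = 'localhost', 5000  # hypothetical address of the listening server
sender = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sender.connect((HOST, PORT))  # connect once, then reuse this connection
def send(sender, message):
    sender.sendall(message.encode())
You could wrap this in a class with an __init__ method if need be.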
https://pythontic.com/modules/socket/introduction
I managed to fix the issue thanks to @Alex F in the comments. It seems I had put the loop in the wrong place: you don't want to loop over self.socket.accept(), but over the conn.recv() part once the connection has been accepted. Simply moving the while loop below self.socket.accept() worked for me.
#Starts listening at the defined port on a separate thread. Terminates when 'stop' is received.
def start(self):
    try:
        if not self.is_running:
            self.is_running = True
            self.socket.listen(1)
            #<---- moved from here
            conn, addr = self.socket.accept()
            while self.is_running: #<---- to here
                data = conn.recv(self.buffer_size)
                if not data:  # recv() returns b'' once the client has closed the connection
                    break
                #Messages are split with $ symbol to indicate end of command in the stream.
                jStrs = [jStr for jStr in data.decode().split('$') if jStr != '']
                #Loads the arguments and passes them to the remotes dictionary
                #which holds all the methods
                for jStr in jStrs:
                    jObj = json.loads(jStr)
                    func_name = jObj["name"]
                    del jObj["name"]
                    remote.all[func_name](**jObj)
    except Exception as ex:
        raise SystemExit(f"Server raised error: {ex}")
I stumbled upon this problem while documenting Kombu for the new SO Documentation project.
Consider the following Kombu code for a ConsumerMixin:
from kombu import Connection, Queue
from kombu.mixins import ConsumerMixin
from kombu.exceptions import MessageStateError
import datetime
# Send a message to the 'test_queue' queue
with Connection('amqp://guest:guest@localhost:5672//') as conn:
    with conn.SimpleQueue(name='test_queue') as queue:
        queue.put('String message sent to the queue')
# Callback functions
def print_upper(body, message):
    print(body.upper())
    message.ack()
def print_lower(body, message):
    print(body.lower())
    message.ack()
# Attach the callback functions to a queue consumer
class Worker(ConsumerMixin):
    def __init__(self, connection):
        self.connection = connection
    def get_consumers(self, Consumer, channel):
        return [
            Consumer(queues=Queue('test_queue'), callbacks=[print_upper, print_lower]),
        ]
# Start the worker
with Connection('amqp://guest:guest@localhost:5672//') as conn:
    worker = Worker(conn)
    worker.run()
The code fails with:
kombu.exceptions.MessageStateError: Message already acknowledged with state: ACK
because the message is ACK-ed twice, in print_upper() and print_lower().
A simple solution that works would be to ACK only in the last callback function, but that breaks modularity if I want to use the same functions on other queues or connections.
How can I ACK a queued Kombu message that is sent to more than one callback function?
Solutions
1 - Checking message.acknowledged
The message.acknowledged flag indicates whether the message has already been ACK-ed:
def print_upper(body, message):
    print(body.upper())
    if not message.acknowledged:
        message.ack()
def print_lower(body, message):
    print(body.lower())
    if not message.acknowledged:
        message.ack()
Pros: Readable, short.
Cons: Breaks the Python EAFP idiom.
2 - Catching the exception
def print_upper(body, message):
    print(body.upper())
    try:
        message.ack()
    except MessageStateError:
        pass
def print_lower(body, message):
    print(body.lower())
    try:
        message.ack()
    except MessageStateError:
        pass
Pros: Readable, Pythonic.
Cons: A little long - 4 lines of boilerplate code per callback.
3 - ACKing in the last callback
The documentation guarantees that the callbacks are called in order. Therefore, we can simply call .ack() in the last callback only:
def print_upper(body, message):
    print(body.upper())
def print_lower(body, message):
    print(body.lower())
    message.ack()
Pros: Short, readable, no boilerplate code.
Cons: Not modular: the callbacks cannot be reused on another queue unless the ACK-ing callback is always the last one. This implicit assumption can break the calling code.
This can be solved by moving the callback functions into the Worker class. We give up some modularity (these functions will not be called from outside) but gain safety and readability.
Summary
The difference between 1 and 2 is merely a matter of style.
Solution 3 should be picked if the order of execution matters and a message should not be ACK-ed before it has gone through all the callbacks successfully.
1 or 2 should be picked if the message should always be ACK-ed, even if one or more callbacks failed.
Note that there are other possible designs; this answer refers to callback functions that reside outside the worker.
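One such design, sketched here as a hypothetical helper rather than anything Kombu provides: keep the callbacks free of ACK logic and compose them with a small wrapper that calls each one in order and ACKs exactly once at the end. This keeps the functions reusable (like solutions 1 and 2) while still ACK-ing only after every callback has succeeded (like solution 3). FakeMessage is a stand-in for Kombu's message object so the example runs without a broker; it raises RuntimeError on a second ack where Kombu raises MessageStateError:

```python
class FakeMessage:
    """Stand-in for a Kombu message that refuses a second ack."""

    def __init__(self):
        self.acknowledged = False
        self.ack_count = 0

    def ack(self):
        if self.acknowledged:
            raise RuntimeError('Message already acknowledged')
        self.acknowledged = True
        self.ack_count += 1

def ack_after(*callbacks):
    """Hypothetical helper: run callbacks in order, then ACK once."""
    def combined(body, message):
        for callback in callbacks:
            callback(body, message)  # an exception here skips the ack
        message.ack()
    return combined

# ACK-free versions of the callbacks from the question
def print_upper(body, message):
    print(body.upper())

def print_lower(body, message):
    print(body.lower())

msg = FakeMessage()
ack_after(print_upper, print_lower)('String message', msg)
# STRING MESSAGE
# string message
print(msg.ack_count)  # 1

def fail(body, message):
    raise ValueError('boom')

msg2 = FakeMessage()
try:
    ack_after(print_upper, fail)('x', msg2)
except ValueError:
    pass
print(msg2.acknowledged)  # False
```

With the Worker above, this would mean passing callbacks=[ack_after(print_upper, print_lower)] to Consumer, using the ACK-free versions of the callbacks.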