I was going through the Pub/Sub pull docs. Here is the sample code:
from google.cloud import pubsub_v1

# TODO project_id = "Your Google Cloud Project ID"
# TODO subscription_name = "Your Pub/Sub subscription name"
# TODO timeout = 5.0  # "How long the subscriber should listen for
# messages in seconds"

subscriber = pubsub_v1.SubscriberClient()
# The `subscription_path` method creates a fully qualified identifier
# in the form `projects/{project_id}/subscriptions/{subscription_name}`
subscription_path = subscriber.subscription_path(
    project_id, subscription_name
)

def callback(message):
    print("Received message: {}".format(message))
    message.ack()

streaming_pull_future = subscriber.subscribe(
    subscription_path, callback=callback
)
print("Listening for messages on {}..\n".format(subscription_path))

# result() in a future will block indefinitely if `timeout` is not set,
# unless an exception is encountered first.
try:
    streaming_pull_future.result(timeout=timeout)
except:  # noqa
    streaming_pull_future.cancel()
In the above example, the message is acked as soon as it is received. But I want to acknowledge it only when my local Celery workers finish processing the message, so that Pub/Sub can redeliver it if the worker fails. So I take the ack_id of the message and pass it on to the worker.
params["ack_id"] = message._ack_id
start_aggregation.delay(params)
I just can't figure out how to use the ack_id in the worker to acknowledge the message. I know that you can use a Pub/Sub endpoint to ack a message, as shown here. But I can't figure out how to use service account credentials to do the same; they do it using OAuth in that doc. Any pointers are appreciated. Thanks.
Acking messages received from the client library with a direct call to the acknowledge API would cause issues in the client. The client has flow control limits, which determine the maximum number of messages that can be outstanding (delivered, but not acked). The removal of messages from the count occurs when one calls message.ack() or message.nack(). If you were to call the acknowledge API directly, then this count would not change, resulting in messages no longer flowing once the limit is reached.
If you are trying to use celery to get more parallelism in your processing, you can probably do it directly without this intermediate step. One option is to start up instances of the subscriber client with the same subscription in different processes. The messages will be distributed among the subscribers. Alternatively, you could replace the scheduler with one that is process-based instead of thread-based, though that would be some more work.
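As a rough sketch of the first option (the project and subscription names below are placeholders), each process runs its own streaming pull on the same subscription, and Pub/Sub distributes messages among them:

import multiprocessing

from google.cloud import pubsub_v1

def run_subscriber(project_id, subscription_name):
    subscriber = pubsub_v1.SubscriberClient()
    subscription_path = subscriber.subscription_path(project_id, subscription_name)

    def callback(message):
        # Do the processing here, then ack through the same client
        # so the flow control accounting stays correct.
        message.ack()

    streaming_pull_future = subscriber.subscribe(subscription_path, callback=callback)
    streaming_pull_future.result()  # Block this process on the pull.

if __name__ == "__main__":
    # Several processes on the same subscription: messages are
    # load-balanced among them.
    workers = [
        multiprocessing.Process(target=run_subscriber, args=("my-project", "my-subscription"))
        for _ in range(4)
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()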
I am using nameko.messaging.consume for consuming messages from a queue. Here's a sample:
from kombu import Queue
from nameko.messaging import consume

class Service:
    name = "sample_service"

    QUEUE = Queue("queue_name", no_declare=True)

    @consume(QUEUE)
    def process_message(self, payload):
        # Some long running code ...
        return result
By default, the ACK is sent to the RabbitMQ broker after the process_message function returns (here, the return result statement). I want to send the ACK as soon as the consumer receives the message. How can I do that?
*In the pika library, the consumer can acknowledge as soon as a message is consumed. That behaviour is a good example of what I want to replicate with nameko's consumer; see the sketch below.
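For reference, the pika behaviour described above looks roughly like this (connection details and queue name are placeholders); with auto_ack=True the broker considers the message acknowledged on delivery, before the callback even runs:

import pika

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()

def on_message(ch, method, properties, body):
    # The message was already acked when it was delivered.
    print(body)

channel.basic_consume(queue="queue_name", on_message_callback=on_message, auto_ack=True)
channel.start_consuming()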
Thanks :)
I have a problem with Pika, the RabbitMQ client library for Python.
I want to consume 1 message from a queue, work with it and acknowledge it when the work is done. Then the next message should be received.
I used the prefetch_count=1 option to tell RabbitMQ that this consumer only wants one message at a time and doesn't want a new message until the current one is acknowledged.
Here is my (very simple) code:
import time

import pika

credentials = pika.PlainCredentials("username", "password")
connection = pika.BlockingConnection(
    pika.ConnectionParameters(host='1.2.3.4', credentials=credentials))
channel = connection.channel()

def consume(ch, method, properties, body):
    time.sleep(5)  # Here is the work, now just hold 5 seconds
    ch.basic_ack(method.delivery_tag)

def init():
    channel.basic_consume(
        queue="raw.archive", on_message_callback=consume, auto_ack=False)
    channel.basic_qos(prefetch_count=1)
    channel.start_consuming()

if __name__ == "__main__":
    init()
So my question is: why does RabbitMQ deliver more messages (40/sec) than are acknowledged (0.20/sec, which is correct given the 5-second pause)? Shouldn't these two rates be equal?
Furthermore, the Unacked value (1650) should never be greater than 1, because no further message should be delivered until the current one is acknowledged.
The second view shows that the consumer has no prefetch count, even though the prefetch count is set on the connection. Maybe I must set it on the consumer instead, but I don't know how.
What am I doing wrong?
Thanks in advance.
As confirmed by Marcel, the issue is related to when basic_qos is set on the channel: it should be set prior to basic_consume.
def init():
    channel.basic_qos(prefetch_count=1)
    channel.basic_consume(
        queue="raw.archive", on_message_callback=consume, auto_ack=False)
    channel.start_consuming()
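On the question of where the prefetch limit applies: in pika 1.x, basic_qos limits each new consumer individually by default, and the global_qos flag switches it to a shared limit for the whole channel. A sketch, not specific to this setup:

# Default: every consumer started after this call gets its own limit of 1.
channel.basic_qos(prefetch_count=1)

# Shared: all consumers on this channel together may have 1 unacked message.
channel.basic_qos(prefetch_count=1, global_qos=True)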
I am trying to reliably send a message from a publisher to multiple consumers using RabbitMQ topic exchange.
I have configured durable queues (one per consumer) and I am sending persistent messages (delivery_mode=2). I am also setting the channel in confirm_delivery mode, and have added the mandatory=True flag to publish.
Right now the service is pretty reliable, but messages get lost to one of the consumers if it stays down during a broker restart followed by a message publication.
It seems that the broker can recover queues and messages on restart, but it doesn't seem to keep the bindings between consumers and queues. So messages only reach one of the consumers and get lost for the one that is down.
Note: Messages do reach the queue and the consumer if the broker doesn't suffer a restart during the time a consumer is down. They accumulate properly on the queue and they are delivered to the consumer when it is up again.
Edit - adding the consumer code:
import pika

class Consumer(object):

    def __init__(self, queue_name):
        self.queue_name = queue_name

    def consume(self):
        credentials = pika.PlainCredentials(
            username='myuser', password='mypassword')
        connection = pika.BlockingConnection(
            pika.ConnectionParameters(host='myhost', credentials=credentials))
        channel = connection.channel()
        channel.exchange_declare(exchange='myexchange', exchange_type='topic')
        channel.queue_declare(queue=self.queue_name, durable=True)
        channel.queue_bind(
            exchange='myexchange', queue=self.queue_name, routing_key='my.route')
        channel.basic_consume(
            consumer_callback=self.message_received, queue=self.queue_name)
        channel.start_consuming()

    def message_received(self, channel, basic_deliver, properties, body):
        print(f'Message received: {body}')
        channel.basic_ack(delivery_tag=basic_deliver.delivery_tag)
You can assume each consumer server does something similar to:
c = Consumer('myuniquequeue') # each consumer has a permanent queue name
c.consume()
Edit - adding publisher code:
def publish(message):
    credentials = pika.PlainCredentials(
        username='myuser', password='mypassword')
    connection = pika.BlockingConnection(
        pika.ConnectionParameters(host='myhost', credentials=credentials))
    channel = connection.channel()
    channel.exchange_declare(exchange='myexchange', exchange_type='topic')
    channel.confirm_delivery()
    success = channel.basic_publish(
        exchange='myexchange',
        routing_key='my.route',
        body=message,
        properties=pika.BasicProperties(
            delivery_mode=2,  # make message persistent
        ),
        mandatory=True
    )
    if success:
        print("Message sent")
    else:
        print("Could not send message")
        # Save for sending later
It is worth saying that I am handling the error case on my own, and that is not the part I would like to improve. When my messages get lost to some of the consumers, the flow goes through the success branch.
Use basic_ack(delivery_tag=basic_deliver.delivery_tag) in your consumer callback method. This acknowledgement tells the broker whether the consumer has received and processed the message. If it is a negative acknowledgement, the message will be requeued.
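A negative acknowledgement with pika would look roughly like this inside the same callback (process is a hypothetical handler; requeue=True asks the broker to redeliver the message):

def message_received(self, channel, basic_deliver, properties, body):
    try:
        process(body)  # hypothetical processing function
        channel.basic_ack(delivery_tag=basic_deliver.delivery_tag)
    except Exception:
        # Negative acknowledgement: the broker requeues the message.
        channel.basic_nack(delivery_tag=basic_deliver.delivery_tag, requeue=True)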
Edit #1
In order to keep receiving messages through a broker crash, the broker needs to be distributed. RabbitMQ calls this concept Mirrored Queues. Mirrored Queues let your queues be replicated across the nodes in your cluster. If one of the nodes containing the queue goes down, another node containing a mirror of the queue will act as your broker.
For a complete understanding, refer to the Mirrored Queues documentation.
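Mirroring is enabled through a broker-side policy rather than in the consumer code. As an illustrative sketch (assuming the RabbitMQ management plugin is enabled on its default port, and reusing the placeholder host and credentials from above), a policy mirroring all queues can be set via the management HTTP API:

import requests

# '%2F' is the URL-encoded default vhost; host and credentials are placeholders.
resp = requests.put(
    "http://myhost:15672/api/policies/%2F/ha-all",
    auth=("myuser", "mypassword"),
    json={"pattern": "^", "definition": {"ha-mode": "all"}},
)
resp.raise_for_status()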
I implemented an asynchronous pull subscriber using Python. This is the basic code:
import time

from google.cloud import pubsub_v1

def receive_messages(project, subscription_name):
    subscriber = pubsub_v1.SubscriberClient()
    subscription_path = subscriber.subscription_path(
        project, subscription_name)

    def callback(message):
        print("A")
        time.sleep(2)
        print('Received message: {}'.format(message))
        message.ack()
        print("B")

    subscriber.subscribe(subscription_path, callback=callback)
    print('Listening for messages on {}'.format(subscription_path))
    while True:
        time.sleep(60)
I need the output to look like:
A
message
B
A
message
B
(I need the callbacks to run sequentially), or I need to receive messages on a given number of threads. I can't find a way to limit the number of threads, and my program gives a segmentation fault because too many are created.
How do I control the number of threads used to receive messages?
The problem can be solved using Policy:
import threading

from concurrent import futures
from google.cloud import pubsub_v1

subscriber = pubsub_v1.SubscriberClient()
subscription_path = subscriber.subscription_path(project, subscription_name)

def callback(message):
    print(str(message.data) + " " + str(threading.current_thread()))
    message.ack()

# Note: the Policy class is only available in older versions of
# google-cloud-pubsub; newer releases replaced it (see below).
flow_control = pubsub_v1.types.FlowControl(max_messages=10)
executor = futures.ThreadPoolExecutor(max_workers=5)
policy = pubsub_v1.subscriber.policy.thread.Policy(
    subscriber, subscription_path, executor=executor, flow_control=flow_control)
policy.open(callback)
We can set the maximum thread count using max_workers, and flow control settings can be configured as well.
If you need your processing callbacks to run sequentially, you would be better off using a message passing model than modifying the subscriber internals. If you push the received messages to an explicit queue.Queue, you can ensure that only one worker is pulling off of this queue, and only one is being processed at a time. Note however, that while this provides you a ‘one at a time’ guarantee for processing if there is only one subscribing job, it does not provide you with any ordering guarantees. Messages may still be processed in any arbitrary order relative to the order that they were published.
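A minimal sketch of that message passing model, assuming one worker thread and placeholder project/subscription names:

import queue
import threading

from google.cloud import pubsub_v1

work_queue = queue.Queue()

def callback(message):
    # Hand the message off; the subscriber's own threads return immediately.
    work_queue.put(message)

def worker():
    while True:
        message = work_queue.get()
        # ... process sequentially, strictly one message at a time ...
        message.ack()
        work_queue.task_done()

threading.Thread(target=worker, daemon=True).start()

subscriber = pubsub_v1.SubscriberClient()
subscription_path = subscriber.subscription_path("my-project", "my-subscription")
streaming_pull_future = subscriber.subscribe(subscription_path, callback=callback)
streaming_pull_future.result()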
If someone is looking for a newer version:
from concurrent import futures
from google.cloud import pubsub_v1

executor = futures.ThreadPoolExecutor(max_workers=1)
scheduler = pubsub_v1.subscriber.scheduler.ThreadScheduler(executor)

with pubsub_v1.SubscriberClient() as subscriber:
    # `subscription_name` is the fully qualified subscription path and
    # `callback` is defined as in the earlier examples.
    streaming_pull_future = subscriber.subscribe(
        subscription_name, callback, scheduler=scheduler,
        await_callbacks_on_shutdown=True)
    timeout = 5 * 60  # seconds
    try:
        streaming_pull_future.result(timeout=timeout)
    except Exception:
        streaming_pull_future.cancel()  # Trigger the shutdown.
        streaming_pull_future.result()  # Block until the shutdown is complete.
I am currently writing two scripts that talk to a message server using the stomp client library: write.py to send data and read.py to receive data.
If I start read.py first and then run write.py, read.py receives the messages correctly.
However, if I run write.py first and then run read.py, read.py does not retrieve any messages previously sent to the server.
Below are relevant parts of the scripts.
How can I ensure that messages put into the queue by write.py are retained until read.py subscribes and retrieves them?
write.py
def writeMQ(msg):
    queue = '/topic/test'
    conn = stomp.Connection(host_and_ports=[(MQ_SERVER, MQ_PORT)])
    try:
        conn.start()
        conn.connect(MQ_USER, MQ_PASSWD, wait=True)
        conn.send(body=msg, destination=queue, persistent=True)
    except:
        traceback.print_exc()
    finally:
        conn.disconnect()
    return
read.py
class MyListener(stomp.ConnectionListener):

    def on_error(self, headers, message):
        print('received an error {0}'.format(message))

    def on_message(self, headers, message):
        print('received a message {0}'.format(message))

def readMQ():
    queue = '/topic/test'
    conn = stomp.Connection(host_and_ports=[(MQ_SERVER, MQ_PORT)])
    try:
        conn.set_listener("", MyListener())
        conn.start()
        conn.connect(MQ_USER, MQ_PASSWD, wait=True)
        conn.subscribe(destination=queue, ack="auto", id=1)
        stop = raw_input()
    except:
        traceback.print_exc()
    finally:
        conn.disconnect()
    return
The problem is that the messages are being sent to a topic.
The Apollo documentation describes the difference between topics and queues as follows:
Queues hold on to unconsumed messages even when there are no subscriptions attached, while a topic will drop messages when there are no connected subscriptions.
Thus, when read.py is started first and listening, the topic recognizes the subscription and forwards the message. But when write.py is started first, the message is dropped because there is no subscribed client.
So you can use a queue instead of a topic. If the server is able to create a queue silently, simply set queue = '/queue/test'.
I don't know which version of stomp is being used, but I cannot find the parameter send(..., persistent=True). In any case, persistence is not the right way to go: it does not make messages be retained for a later connection; it only saves them in case of a broker failure.
You can use the retain:set header for topic messages instead.
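With stomp.py that would mean passing the header on send, roughly like this (assuming a broker such as Apollo that supports retained topic messages):

conn.send(body=msg, destination='/topic/test', headers={'retain': 'set'})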