I am pulling Pub/Sub messages through a subscription and need to acknowledge them before processing, because I am doing multiprocessing and the grpc module otherwise throws an SSL-corruption error.
I want to ack all messages beforehand and unack any message whose processing fails. I am aware that we can do this with an asynchronous pull, but is there a way to implement unack in a synchronous pull as well?
I am using the official Python module to pull from the subscription.
I suppose that by unack you mean nack, as explained in the Python API reference:
In Pub/Sub, the term ack stands for “acknowledge”.
...
It is also possible to nack a message, which is the opposite...
The same documentation contains the section Pulling a Subscription Synchronously,
in which it is explained how to nack with modify_ack_deadline():
If you want to nack some of the received messages (...), you can use the modify_ack_deadline() method and set their
acknowledge deadlines to zero. This will cause them to be dropped by
this client and the backend will try to re-deliver them.
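A minimal sketch of that approach with a synchronous pull, assuming a recent google-cloud-pubsub client and placeholder project/subscription names:

from google.cloud import pubsub_v1

subscriber = pubsub_v1.SubscriberClient()
subscription_path = subscriber.subscription_path("my-project", "my-subscription")

# Synchronous pull.
response = subscriber.pull(subscription=subscription_path, max_messages=10)
ack_ids = [msg.ack_id for msg in response.received_messages]

# "Nack" everything we just pulled: a deadline of zero tells the
# backend to redeliver these messages.
subscriber.modify_ack_deadline(
    subscription=subscription_path,
    ack_ids=ack_ids,
    ack_deadline_seconds=0)

To ack instead, call subscriber.acknowledge() with the same ack_ids.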
I need to interact with the Apache Qpid C++ broker using Qpid Proton in Python. The objective is to create a receiver that hooks to a topic exchange and receives only messages marked with a given topic. I am aware that this can be done easily with the Qpid API, but I need to use Qpid Proton.
I have a horrible hack that inspects every message that arrives and checks whether the topic (subject) of the message matches some pattern, similar to what AMQP 0-10 exchange bindings do with routing keys. My idea is to move this into a Filter on the receiver, but after hours of googling I have found no way to proceed.
My current implementation is using blocking connections, both to sender and receiver.
Any suggestion?
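For reference, a stripped-down version of the client-side hack described above, using Qpid Proton's Python bindings (the broker URL, address, and subject pattern are placeholders, and this is the workaround, not the receiver-side Filter being asked about):

import re
from proton.handlers import MessagingHandler
from proton.reactor import Container

class TopicReceiver(MessagingHandler):
    def __init__(self, url, address, pattern):
        super(TopicReceiver, self).__init__()
        self.url = url
        self.address = address
        self.pattern = re.compile(pattern)

    def on_start(self, event):
        conn = event.container.connect(self.url)
        event.container.create_receiver(conn, self.address)

    def on_message(self, event):
        # Client-side filtering: inspect the subject and drop anything
        # that does not match, mimicking an exchange binding.
        subject = event.message.subject or ""
        if self.pattern.match(subject):
            print(event.message.body)

Container(TopicReceiver("localhost:5672", "amq.topic", r"^sensor\.")).run()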
I have a unique id in my data and I am sending it to Kafka with the kafka-python library. When I send the same data to the Kafka topic again, it consumes the same data anyway. Is there a way to make Kafka skip previously seen messages and continue from new messages?
from kafka import KafkaConsumer

def consume_from_kafka():
    consumer = KafkaConsumer(
        TOPIC,
        bootstrap_servers=["localhost"],
        group_id='my-group')
    # Iterate over incoming records.
    for message in consumer:
        print(message.value)
Ok, I finally got your question. Avoiding a message that has been sent multiple times by a producer (accidentally) can be quite complicated.
There are generally 2 cases:
The simple one is where you have a single instance that consumes the messages. In that case your producer can add a uuid to the message payload and your consumer can keep the ids of the processed messages in an in-memory cache (see the sketch after this list).
The complicated one is where you have multiple instances that consume messages (which is usually why you need a message broker - a distributed system). In this scenario you need an external service to play the role of the distributed cache. Redis is a good choice. Alternatively, you can use a relational database (which you probably already have in your stack) and record the processed message ids there.
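A minimal sketch of the simple case, assuming the producer puts the uuid in the message key (the names here are illustrative):

from kafka import KafkaConsumer

# Single-instance dedup: keep processed ids in a set. In the
# multi-instance case, replace the set with a shared store such as
# Redis (e.g. SET with NX and a TTL).
seen_ids = set()

consumer = KafkaConsumer(
    "TOPIC",
    bootstrap_servers=["localhost"],
    group_id="my-group")

for message in consumer:
    msg_id = message.key  # assumes the producer put the uuid in the key
    if msg_id in seen_ids:
        continue  # duplicate delivery: skip it
    seen_ids.add(msg_id)
    print(message.value)  # process the message here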
Hope that helps.
Someone might need this here. I solved the duplicate-message problem using the code below; I am using the kafka-python lib.

from kafka import KafkaConsumer

consumer = KafkaConsumer(
    'TOPIC',
    bootstrap_servers=KAFKA,
    auto_offset_reset='earliest',
    enable_auto_commit=True,
    auto_commit_interval_ms=1000,
    group_id='my-group')
I am using the Python client (the one that comes as part of google-cloud 0.30.0) to process messages.
Sometimes (about 10% of the time) my messages are duplicated: I will get the same message again and again, up to 50 instances within a few hours.
My subscription is set up with a 600-second ack deadline, but a message may be resent a minute after its predecessor.
While running, I occasionally get 503 errors (which I log with my policy_class).
Has anybody experienced this behavior? Any ideas?
My code looks like:

from google.cloud import pubsub_v1

def callback_func(msg):
    try:
        log.info('got %s', msg.data)
        ...
    finally:
        msg.ack()

c = pubsub_v1.SubscriberClient(policy_class)
subscription = c.subscribe(c.subscription_path(my_proj, my_topic))
res = subscription.open(callback=callback_func)
res.result()
The client library you are using uses a new Pub/Sub API for subscribing called StreamingPull. One effect of this is that the acknowledgement deadline you set on the subscription is no longer used; instead, a deadline calculated by the client library is. The client library also automatically extends the deadlines of messages for you.
When you get these duplicate messages - have you already ack'd the message when it is redelivered, or is this while you are still processing it? If you have already ack'd, are there some messages you have avoided acking? Some messages may be duplicated if they were ack'd but messages in the same batch needed to be sent again.
Also keep in mind that some duplicates are currently expected if you take more than half an hour to process a message.
This seems to be an issue with the google-cloud-pubsub Python client; I upgraded to version 0.29.4 and ack() works as expected.
In general, duplicates can happen given that Google Cloud Pub/Sub offers at-least-once delivery. Typically, this rate should be very low. A rate of 10% would be very high. In this particular instance, it was likely an issue in the client libraries that resulted in excessive duplicates, which was fixed in April 2018.
For the general case of excessive duplicates there are a few things to check to determine if the problem is on the user side or not. There are two places where duplication can happen: on the publish side (where there are two distinct messages that are each delivered once) or on the subscribe side (where there is a single message delivered multiple times). The way to distinguish the cases is to look at the messageID provided with the message. If the same ID is repeated, then the duplication is on the subscribe side. If the IDs are unique, then duplication is happening on the publish side. In the latter case, one should look at the publisher to see if it is getting errors that are resulting in publish retries.
If the issue is on the subscriber side, then one should check to ensure that messages are being acknowledged before the ack deadline. Messages that are not acknowledged within this time will be redelivered. If this is the issue, then the solution is to either acknowledge messages faster (perhaps by scaling up with more subscribers for the subscription) or by increasing the acknowledgement deadline. For the Python client library, one sets the acknowledgement deadline by setting the max_lease_duration in the FlowControl object passed into the subscribe method.
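A short sketch of that setting, with placeholder project and subscription names:

from google.cloud import pubsub_v1

subscriber = pubsub_v1.SubscriberClient()
subscription_path = subscriber.subscription_path("my-project", "my-subscription")

# Let the client library keep extending a message's lease for up to
# an hour before the message is considered expired and redelivered.
flow_control = pubsub_v1.types.FlowControl(max_lease_duration=3600)

def callback(message):
    message.ack()

future = subscriber.subscribe(
    subscription_path, callback=callback, flow_control=flow_control)
future.result()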
How can I send one XMPP message to all connected clients/resources using a Python library such as
xmpppy, jabber.py, or jabberbot? Any other command-line solution is welcome too.
So far I have only been able to send an echo or a single message to a single client.
The purpose is to send a message to all connected resources/clients, not to a group.
This might be triggered by a command, but that is not 'really' necessary.
Thank you.
I cannot give you a specific Python example, but I can explain how the logic works.
When you send a message to a bare JID, how it is routed depends on the server software or configuration. Some servers send the message to the "most available resource", and some send it to all resources. E.g. Google Talk sends it to all resources.
If you control the server software and it allows you to route messages sent to a bare JID to all connected resources, then this is the easiest way.
If your code must work on any server, then you should collect all available resources of your contacts. You get them with presence; most libraries have a callback for this. Then you can send the message to the full JIDs (with resources) in a loop, as in the sketch below.
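A rough sketch of that approach with xmpppy (the handler wiring and names are illustrative, not a drop-in solution):

import xmpp

resources = {}  # bare JID -> set of connected resources, learned from presence

def on_presence(conn, pres):
    jid = pres.getFrom()
    bare, res = jid.getStripped(), jid.getResource()
    if pres.getType() == 'unavailable':
        resources.get(bare, set()).discard(res)
    else:
        resources.setdefault(bare, set()).add(res)

def send_to_all_resources(conn, bare, text):
    # Address each full JID (bare JID plus resource) in a loop.
    for res in resources.get(bare, set()):
        conn.send(xmpp.Message('%s/%s' % (bare, res), text))

# After connecting: client.RegisterHandler('presence', on_presence)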
I think if you set the same priority for all connected resources it would work, but I have not actually tried it.
However, in ejabberd there is a module named Message Carbons which does this for you; the same feature is also available in Openfire under the name "route.all-resource".
Hint: if Message Carbons is used, the XMPP client library must support it too for this to work.
Django and Flask make use of signals — the latter uses the Blinker library. In the context of Python, Blinker and the Python pubsub library, how do signals and pubsub compare? When would I use one or the other?
Blinker docs and PubSub docs.
As far as Blinker and PubSub go, they accomplish the same thing. The difference is in how they go about it:
With Blinker when you subscribe to a signal you give the name of the signal, and when you activate the signal you pass the activating object.
With PubSub, when you subscribe to a listener you give the name (same as Blinker), but when you notify the listener you pass the data directly as keyword arguments. Because the data is passed as keyword arguments, PubSub can perform many more safety checks.
Personally, I would go with Blinker as it matches my way of thinking better, but PubSub certainly has a place also.
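A side-by-side sketch of the two APIs (the topic and handler names are made up):

from blinker import signal
from pubsub import pub

# Blinker: the receiver gets the sender object, plus optional kwargs.
def on_blinker_saved(sender, **extra):
    print('blinker:', sender, extra)

saved = signal('document-saved')
saved.connect(on_blinker_saved)
saved.send('editor', filename='notes.txt')

# PyPubSub: the listener receives the data directly as keyword
# arguments, which lets the library validate them against the topic.
def on_pubsub_saved(filename):
    print('pubsub:', filename)

pub.subscribe(on_pubsub_saved, 'document.saved')
pub.sendMessage('document.saved', filename='notes.txt')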
This might clear up exactly how Pubsub relates to signals: http://pubsub.sourceforge.net/apidocs/concepts.html
Pubsub facilitates the decoupling of components (callables, modules, packages) within an application. It does this by:
Allowing parts of the application to send messages to “the rest of the application” without having to know:
  - if the messages will be handled:
    - perhaps the message will be ignored completely,
    - or handled by many different parts of the application;
  - how the messages will be handled:
    - what will be done with the message and its contents;
  - in what order any given message will be sent to the rest of the application;
Allowing parts of the application to receive and handle messages from “the rest of the application” without having to know who sent the messages.
A listener is “a part of the application that wants to receive messages”. A listener subscribes to one or more topics. A sender is any part of the application that asks Pubsub to send a message of a given topic. The sender provides data, if any. Pubsub will send the message, including any data, to all listeners of the message’s topic.