Here's what I've tried so far:
from confluent_kafka import Consumer

c = Consumer({  # ... several security/server settings skipped ...
    'auto.offset.reset': 'beginning',
    'group.id': 'my-group'})
c.subscribe(['my.topic'])
msg = c.poll(30.0)  # msg is None
msg almost always ends up being None though. I think the issue might be that 'my-group' has already consumed all the messages for 'my.topic'... but I don't care whether a message has already been consumed or not - I still need the latest message. Specifically, I need the timestamp from that latest message.
I tried a bit more, and from this it looks like 25 messages have been produced to the topic, but I have no idea how to get at them:
a = c.assignment()
print(a) # Outputs [TopicPartition{topic=my.topic,partition=0,offset=-1001,error=None}]
offsets = c.get_watermark_offsets(a[0])
print(offsets) # Outputs: (25, 25)
If there are no messages because the topic has never had anything written to it at all, how can I determine that? And if that's the case, how can I determine how long the topic has existed for? I'm looking to write a script that automatically deletes any topics that haven't been written to in the past X days (14 initially; I'll probably tweak it over time).
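From the watermark API, I think I can at least distinguish a never-written topic from one whose messages have aged out. A sketch of what I mean (untested; I'm assuming a high watermark of 0 means nothing was ever produced):

# Hypothetical check for the cleanup script, using only watermark offsets.
from confluent_kafka import Consumer, TopicPartition

c = Consumer({'bootstrap.servers': 'my.kafka.server', 'group.id': 'my-group'})
low, high = c.get_watermark_offsets(TopicPartition('my.topic', 0), timeout=10.0)
if high == 0:
    print('Topic has never been written to.')
elif low == high:
    print('%d messages were produced, but all have since been deleted.' % high)
else:
    print('Offsets %d..%d are currently available.' % (low, high - 1))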
I ran into the same issue, and there was no example for this. In my case there is one partition, and I need to read the last message to get some information from it, which I use to set up a consumer/producer component I have.
The logic is: start the Consumer, subscribe to the topic, and poll for a message. The poll triggers on_assign, where the rewinding happens by assigning the modified partitions back. After on_assign finishes, the poll continues and reads the last message from the topic.
from confluent_kafka import Consumer

settings = {
    "bootstrap.servers": "my.kafka.server",
    "group.id": "my-work-group",
    "client.id": "my-work-client-1",
    "enable.auto.commit": False,
    "session.timeout.ms": 6000,
    "default.topic.config": {"auto.offset.reset": "largest"},
}
consumer = Consumer(settings)

def on_assign(a_consumer, partitions):
    # get_watermark_offsets() returns a (low, high) tuple for the partition
    last_offset = a_consumer.get_watermark_offsets(partitions[0])
    # index [1] is the high watermark; the last message sits one before it
    partitions[0].offset = last_offset[1] - 1
    a_consumer.assign(partitions)

consumer.subscribe(["test-topic"], on_assign=on_assign)

msg = consumer.poll(6.0)
Now msg holds the last message from the topic.
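The original question also needed the timestamp of that last message; in confluent-kafka-python that is available from the Message object. A small sketch:

# msg.timestamp() returns a (type, value) tuple, where value is
# milliseconds since the epoch and type tells you whether it is a
# producer create time or broker log-append time.
if msg is not None and not msg.error():
    ts_type, ts_ms = msg.timestamp()
    print("Last message timestamp: %d ms" % ts_ms)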
If anyone still needs an example for the case with multiple partitions, this is how I did it:
from confluent_kafka import OFFSET_END, Consumer

settings = {
    'bootstrap.servers': "my.kafka.server",
    'group.id': "my-work-group",
    'auto.offset.reset': "latest"
}

def on_assign(consumer, partitions):
    for partition in partitions:
        partition.offset = OFFSET_END
    consumer.assign(partitions)

consumer = Consumer(settings)
consumer.subscribe(["test-topic"], on_assign=on_assign)

msg = consumer.poll(1.0)
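One caveat: OFFSET_END positions each partition at its end, so this poll() only returns messages produced after the assignment. If you need to re-read the last message already stored in each partition, you can combine this with the watermark trick from the previous answer; a sketch (untested):

def on_assign(consumer, partitions):
    # Rewind each partition to just before its high watermark so the
    # last existing message in each partition is delivered again.
    for partition in partitions:
        low, high = consumer.get_watermark_offsets(partition)
        partition.offset = high - 1 if high > low else OFFSET_END
    consumer.assign(partitions)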
The whole reason I use Pub/Sub is so that my Cloud Function doesn't have to wait and things happen automatically. To publish to a topic from a Function, the code the Google docs show is:
import json

from google.cloud import pubsub_v1

# Publishes a message to a Cloud Pub/Sub topic.
def publish(topic_name, message):
    # Instantiates a Pub/Sub client
    publisher = pubsub_v1.PublisherClient()

    if not topic_name or not message:
        return ('Missing "topic" and/or "message" parameter.', 400)

    # References an existing topic (PROJECT_ID is defined elsewhere)
    topic_path = publisher.topic_path(PROJECT_ID, topic_name)

    message_json = json.dumps({
        'data': {'message': message},
    })
    message_bytes = message_json.encode('utf-8')

    # Publishes a message
    try:
        publish_future = publisher.publish(topic_path, data=message_bytes)
        publish_future.result()  # Verify the publish succeeded
        return 'Message published.'
    except Exception as e:
        print(e)
        return (e, 500)
This means the Function waits for the publish to complete, but I want my Function to spend zero seconds on this. How can I publish and forget, without waiting for a response (and without adding more dependencies)?
As you can see from the comments in the code, it is waiting to make sure that the publish succeeded. It's not waiting for any sort of response from the subscribers on that topic. It's extremely important that the code waits until the publish succeeds; otherwise the message might not actually be sent at all, and you risk losing that data entirely. This is because Cloud Functions terminates the code and locks down CPU and I/O after the function returns.
If you really want to risk it, you could try removing the call to result(), but I don't think it's a good idea if you want a reliable system.
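If the concern is only the blocking call rather than reliability, one compromise is to attach a done-callback to the future instead of calling result() inline. This is a sketch, and it still carries the risk described above, because Cloud Functions may freeze the instance before the callback fires:

# Sketch: log the publish outcome asynchronously instead of blocking.
# publisher, topic_path, and message_bytes come from the snippet above.
def on_publish_done(future):
    try:
        print('Published message ID:', future.result())
    except Exception as e:
        print('Publish failed:', e)

publish_future = publisher.publish(topic_path, data=message_bytes)
publish_future.add_done_callback(on_publish_done)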
You can schedule your functions to run at certain times of day or at a fixed interval. In this example, the code would go into your index.js file and be deployed with your functions.
The code would run every minute in the background. Any error would simply end up in your logs in the Google Cloud console.
If you are using Firestore and need to manage documents, you can make the function run on specific events instead, such as document create or update:
https://firebase.google.com/docs/functions/firestore-events
EDIT: Not exactly sure if this example matches your use case, but I hope it helps.
exports.scheduledFx = functions.pubsub.schedule('every minute').onRun(async (context) => {
  // Cron time string    Description
  // 30 * * * *          Execute a command at 30 minutes past the hour, every hour.
  // 0 13 * * 1          Execute a command at 1:00 p.m. UTC every Monday.
  // */5 * * * *         Execute a command every five minutes.
  // 0 */2 * * *         Execute a command every second hour, on the hour.
  try {
    // your code here
  } catch (error) {
    return error;
  }
});
I have a plain, simple Python function which should dead-letter a message if it does not match a few constraints. Currently I'm raising an exception and everything works fine (I mean the message is being dead-lettered), but I would like to understand whether there is a "clean" way to dead-letter the message without raising an exception.
import json

import azure.durable_functions as df
import azure.functions as func

async def function_handler(message: func.ServiceBusMessage, starter: str):
    client = df.DurableOrchestrationClient(starter)
    message_body = message.get_body().decode("utf-8")
    msg = json.loads(message_body)
    if 'valid' in msg:
        instance_id = await client.start_new('orchestrator', None, json.dumps(message_body))
    else:
        raise Exception(f'not found valid {msg["id"]}')
This is part of host.json, which should indicate I'm working with version 2.0 of Azure Functions:
"extensionBundle": {
"id": "Microsoft.Azure.Functions.ExtensionBundle",
"version": "[2.*, 3.0.0)"
},
Suggestions are welcome.
At the time of writing, it is not possible in Python to programmatically dead-letter a message.
I found out that autocomplete=false is only supported for C#.
This basically means that the only way to dead-letter a message is to raise an exception, just as I was doing in my code.
Thanks to @GauravMantri for pointing me the right way (i.e. have a look at how to use the autocomplete configuration parameter).
Azure Service Bus queues have a Max Delivery Count property that you can make use of. Considering you only want to process a message exactly once and then dead-letter it in case the Function is unable to process it, you can set the max delivery count to 1. That way the message will be automatically dead-lettered after the first delivery attempt.
By default, the Functions runtime tries to auto-complete the message if there is no exception while processing it. You do not want the runtime to do that, so you need to set the auto-complete setting to false. However, if the message is processed successfully you still want to remove it from the queue, so you will need to complete the message manually when processing succeeds.
Something like:
if 'valid' in msg:
    instance_id = await client.start_new('orchestrator', None, json.dumps(message_body))
    # complete the message manually here...
else:
    pass  # do nothing and the message will be dead-lettered
I have created a proxy between QGC (Ground Control Station) and a vehicle in Python. Here is the code:
from pymavlink import mavutil

gcs_conn = mavutil.mavlink_connection('tcpin:localhost:15795')
gcs_conn.wait_heartbeat()
print("Heartbeat from system (system %u component %u)" % (gcs_conn.target_system, gcs_conn.target_component))

vehicle = mavutil.mavlink_connection('tcp:localhost:5760')
vehicle.wait_heartbeat()  # receiving heartbeat from the vehicle
print("Heartbeat from system (system %u component %u)" % (vehicle.target_system, vehicle.target_component))

while True:
    gcs_msg = gcs_conn.recv_match()
    if gcs_msg is None:
        pass
    else:
        vehicle.mav.send(gcs_msg)
        print(gcs_msg)

    vcl_msg = vehicle.recv_match()
    if vcl_msg is None:
        pass
    else:
        gcs_conn.mav.send(vcl_msg)
        print(vcl_msg)
I need to receive the messages from the QGC and then forward them to the vehicle and also receive the messages from the vehicle and forward them to the QGC.
When I run the code I get this error.
Is there anyone who can help me?
If you print your message before sending, you'll notice it always fails when you try to send a BAD_DATA message type.
So this should fix it (same for vcl_msg):
if gcs_msg and gcs_msg.get_type() != 'BAD_DATA':
    vehicle.mav.send(gcs_msg)
PS: I noticed that you don't specify the tcp connection as input or output; it defaults to input, which means both of your connections are inputs. I recommend setting up the GCS connection as an output:
gcs_conn = mavutil.mavlink_connection('tcp:localhost:15795', input=False)
https://mavlink.io/en/mavgen_python/#connection_string
For forwarding MAVLink successfully, a few things need to happen. I'm assuming you need a usable connection to a GCS, like QGroundControl or Mission Planner. I use QGC, and my design has had basic testing with it.
Note that this is written for Python 3. This snippet is not tested, but I have a (much more complex) version tested and working.
from pymavlink import mavutil
import time

# PyMAVLink has an issue that received messages which contain strings
# cannot be resent, because they become Python strings (not bytestrings).
# This converts those messages so your code doesn't crash when
# you try to send the message again.
def fixMAVLinkMessageForForward(msg):
    msg_type = msg.get_type()
    if msg_type in ('PARAM_VALUE', 'PARAM_REQUEST_READ', 'PARAM_SET'):
        if type(msg.param_id) == str:
            msg.param_id = msg.param_id.encode()
    elif msg_type == 'STATUSTEXT':
        if type(msg.text) == str:
            msg.text = msg.text.encode()
    return msg

# Modified from the snippet in your question
# UDP will work just as well or better
gcs_conn = mavutil.mavlink_connection('tcp:localhost:15795', input=False)
gcs_conn.wait_heartbeat()
print(f'Heartbeat from system (system {gcs_conn.target_system} component {gcs_conn.target_component})')

vehicle = mavutil.mavlink_connection('tcp:localhost:5760')
vehicle.wait_heartbeat()
print(f'Heartbeat from system (system {vehicle.target_system} component {vehicle.target_component})')

while True:
    # Don't block for a GCS message - we have messages
    # from the vehicle to get too
    gcs_msg = gcs_conn.recv_match(blocking=False)
    if gcs_msg is None:
        pass
    elif gcs_msg.get_type() != 'BAD_DATA':
        # We now have a message we want to forward. Now we need to
        # make it safe to send
        gcs_msg = fixMAVLinkMessageForForward(gcs_msg)

        # Finally, in order to forward this, we actually need to
        # hack PyMAVLink so the message has the right source
        # information attached.
        vehicle.mav.srcSystem = gcs_msg.get_srcSystem()
        vehicle.mav.srcComponent = gcs_msg.get_srcComponent()

        # Only now is it safe to send the message
        vehicle.mav.send(gcs_msg)
        print(gcs_msg)

    vcl_msg = vehicle.recv_match(blocking=False)
    if vcl_msg is None:
        pass
    elif vcl_msg.get_type() != 'BAD_DATA':
        # We now have a message we want to forward. Now we need to
        # make it safe to send
        vcl_msg = fixMAVLinkMessageForForward(vcl_msg)

        # Finally, in order to forward this, we actually need to
        # hack PyMAVLink so the message has the right source
        # information attached.
        gcs_conn.mav.srcSystem = vcl_msg.get_srcSystem()
        gcs_conn.mav.srcComponent = vcl_msg.get_srcComponent()

        gcs_conn.mav.send(vcl_msg)
        print(vcl_msg)

    # Don't abuse the CPU by running the loop at maximum speed
    time.sleep(0.001)
Notes
Make sure your loop isn't being blocked
The loop must quickly check if a message is available from one connection or the other, instead of waiting for a message to be available from a single connection. Otherwise a message on the other connection will not go through until the blocking connection has a message.
Check message validity
Check that you actually got a valid message, as opposed to a BAD_DATA message. Attempting to send a BAD_DATA message will crash the program.
Make sure the recipient gets the correct information about the sender
By default, when sending a message, PyMAVLink will encode YOUR system and component IDs (usually left at zero) instead of the IDs from the message being forwarded. A GCS receiving this may be confused (QGC, for instance) and not properly connect to the vehicle (despite showing the messages in the MAVLink inspector).
This is fixed by hacking PyMAVLink so that your system and component IDs match the forwarded message. This can be reverted after the message is sent, if necessary. See the example for how I did it.
Loop update rate
It's important that the update rate is fast enough to handle high-traffic conditions (especially, say, when downloading params), but it shouldn't peg the CPU either. I find that a 1000 Hz update rate works well enough.
I was trying to use the Confluent Kafka consumer's pause and resume functionality, but I couldn't find any examples on the internet apart from the main documentation link:
https://docs.confluent.io/5.0.0/clients/confluent-kafka-python/index.html
I couldn't understand the parameters to be passed to them. Is it a list of partitions, or topic names, or what?
As OneCricketeer mentioned, pause() and resume() take a list of TopicPartition, and to initialize the TopicPartition class you need the topic, partition, and offset, all of which you can get from the message object.
This is how you can achieve it through Confluent-Kafka-Python:
import time
from confluent_kafka import Consumer, TopicPartition

conf = {
    'bootstrap.servers': "localhost:9092",
    'group.id': "test-consumer-group",
    'auto.offset.reset': 'earliest',
    'enable.auto.commit': False
}
topics = ['topic1']

consumer = Consumer(conf)
consumer.subscribe(topics)

while True:
    try:
        msg = consumer.poll(1.0)
        if msg is None:
            print("Waiting for message or event/error in poll()...")
            continue
        if msg.error():
            print("Error: {}".format(msg.error()))
            continue
        else:
            # Call your processing function and pause the consumer
            consumer.pause([TopicPartition(msg.topic(), msg.partition(), msg.offset())])
            time.sleep(60)  # Think of it as processing time
            # Once the processing is done, resume the consumer and commit the message
            consumer.resume([TopicPartition(msg.topic(), msg.partition(), msg.offset())])
            consumer.commit()
    except Exception as e:
        print(e)
This is just an example and you might want to modify it based on your use case.
pause() and resume() take a list of TopicPartition.
class confluent_kafka.TopicPartition
TopicPartition is a generic type to hold a single partition and various information about it.
It is typically used to provide a list of topics or partitions for various operations, such as Consumer.assign().
TopicPartition(topic[, partition][, offset])
Instantiate a TopicPartition object.
Parameters:
- topic (string): topic name
- partition (int): partition id
- offset (int): initial partition offset
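Putting that together, a minimal pause/resume call looks like the sketch below (topic and partition values are illustrative; the offset argument can be omitted since pause() and resume() don't use it):

from confluent_kafka import TopicPartition

# Pause, then later resume, a single partition of 'topic1'.
consumer.pause([TopicPartition('topic1', 0)])
consumer.resume([TopicPartition('topic1', 0)])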
I am using PyAPNs to send notifications to iOS devices. I often send groups of notifications at once. If any of the tokens is bad for any reason, the process stops. As a result I am using the enhanced setup and the following method:
apns.gateway_server.register_response_listener
I use this to track which token was the problem, and then I pick up from there, sending to the rest. The issue is that the only way to trap these errors while sending is to use a sleep timer between token sends. For example:
for x in self.retryAPNList:
    apns.gateway_server.send_notification(x, payload, identifier=token)
    time.sleep(0.5)
If I don't use a sleep timer, no errors are caught, and thus my entire APN list is not sent to, since the process stops when it hits a bad token. However, this sleep timer is somewhat arbitrary: sometimes 0.5 seconds is enough, while other times I have had to set it to 1. In no case has it worked without some sleep delay being added. Doing this slows down web calls, and it feels less than bulletproof to enter random sleep times.
Any suggestions for how this can work without a delay between APN calls or is there a best practice for the delay needed?
Adding more code due to the request made below. Here are 3 methods inside of a class that I use to control this:
class PushAdmin(webapp2.RequestHandler):
    retryAPNList = []
    channelID = ""
    channelName = ""
    userName = ""
    apns = APNs(use_sandbox=True, cert_file="mycert.pem", key_file="mykey.pem", enhanced=True)

    def devChannelPush(self, channel, name, sendAlerts):
        ucs = UsedChannelStore()
        pus = PushUpdateStore()
        channelName = ""
        refreshApnList = pus.getAPN(channel)
        if sendAlerts:
            alertApnList, channelName = ucs.getAPN(channel)
            if not alertApnList: alertApnList = []
            if not refreshApnList: refreshApnList = []
            pushApnList = list(set(alertApnList + refreshApnList))
        elif refreshApnList:
            pushApnList = refreshApnList
        else:
            pushApnList = []
        self.retryAPNList = pushApnList
        self.channelID = channel
        self.channelName = channelName
        self.userName = name
        self.retryAPNPush()

    def retryAPNPush(self):
        token = -1
        payload = Payload(alert="A message from " + self.userName + " posted to " + self.channelName,
                          sound="default", badge=1, custom={"channel": self.channelID})
        if len(self.retryAPNList) > 0:
            token += 1
            for x in self.retryAPNList:
                self.apns.gateway_server.send_notification(x, payload, identifier=token)
                time.sleep(0.5)
Below is the calling class (abbreviated to remove non-related items):
class ChannelStore(ndb.Model):
    def writeMessage(self, ID, name, message, imageKey, fileKey):
        notify = PushAdmin()
        notify.devChannelPush(ID, name, True)
Below is the slight change I made to the placement of the sleep timer that seems to have resolved the issue. I am, however, still concerned about whether the time given will be the right amount in all circumstances.
def retryAPNPush(self):
    identifier = 1
    token = -1
    payload = Payload(alert="A message from " + self.userName + " posted to " + self.channelName,
                      sound="default", badge=1, custom={"channel": self.channelID})
    if len(self.retryAPNList) > 0:
        token += 1
        for x in self.retryAPNList:
            self.apns.gateway_server.send_notification(x, payload, identifier=token)
            time.sleep(0.5)
Resolution:
As noted in the comments at the bottom, the resolution to this problem was to move the following statement to the module level, outside the class. By doing this there is no need for any sleep statements.
apns = APNs(use_sandbox=True,cert_file="mycert.pem", key_file="mykey.pem", enhanced=True)
In fact, PyAPNs will automatically resend dropped notifications for you; please see PyAPNs.
So you don't have to retry by yourself; you can just record which notifications have bad tokens.
The behavior of your code might result from the APNs object being kept in local scope (within if len(self.retryAPNList) > 0:).
I suggest you pull the APNs object out to class or module level so that it can complete its error-handling procedure and reuse the TCP connection.
Please kindly let me know if it helps, thanks :)
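For reference, a sketch of that arrangement (the certificate file names are placeholders, and the listener body is just an example of recording failures):

from apns import APNs, Payload

# Module level: the APNs connection outlives each request, so error
# responses from Apple can arrive and be handled without sleep() hacks.
apns = APNs(use_sandbox=True, cert_file="mycert.pem", key_file="mykey.pem", enhanced=True)

failed_identifiers = []

def response_listener(error_response):
    # Record the identifier of the failed notification; PyAPNs resends
    # the notifications queued after it automatically.
    failed_identifiers.append(error_response.get('identifier'))
    print("APNs error response:", error_response)

apns.gateway_server.register_response_listener(response_listener)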