Was trying to use the confluent kafka consumer's pause and resume functionality but couldnt find any examples over the internet except the main link.
https://docs.confluent.io/5.0.0/clients/confluent-kafka-python/index.html
Couldn't understand the parameters to be passed to it. Either its list of patitions or topic names or what?
As OneCricketeer mentioned pause() and resume() takes list of TopicPartition
and to initialize TopicPartition class you need topic, partition, and offset which you can get from the message object
This is how you can achieve it through Confluent-Kafka-Python:
import time
from confluent_kafka import Consumer, Producer, TopicPartition
conf = {
'bootstrap.servers': "localhost:9092",
'group.id': "test-consumer-group",
'auto.offset.reset': 'earliest',
'enable.auto.commit': False
}
topics = ['topic1']
consumer = Consumer(conf)
consumer.subscribe(topics)
while True:
try:
msg = consumer.poll(1.0)
if msg is None:
print("Waiting for message or event/error in poll()...")
continue
if msg.error():
print("Error: {}".format(msg.error()))
continue
else:
# Call to your processing function and pause the consumer
consumer.pause([TopicPartition(msg.topic(),msg.partition(),msg.offset())])
time.sleep(60) # Think of it as processing time
# Once the processing is done resume the consumer and commit the message
consumer.resume([TopicPartition(msg.topic(),msg.partition(),msg.offset())])
consumer.commit()
except Exception as e:
print(e)
This is just an example and you might want to modify it based on your use case.
Pause and resume take a list of TopicPartition
class confluent_kafka.TopicPartition
TopicPartition is a generic type to hold a single partition and various information about it.
It is typically used to provide a list of topics or partitions for various operations, such as Consumer.assign().
TopicPartition(topic[, partition][, offset])
Instantiate a TopicPartition object.
Parameters:
topic (string) – Topic name
partition (int) – Partition id
offset (int) – Initial partition offset
Related
I'm using kafka-python library for my fastapi consumer app and I'm consuming messages in batch with maximum of 100 records. Since the topic has huge traffic and have only one partition, consuming, processing and committing should be as quick as possible hence I want to use commit_async(), instead of synchronous commit().
But I'm not able to find a good example of commit_async(). I'm looking for an example for commit_async() with callback so that I can log in case of commit failure. But I'm not sure what does that callback function takes as argument and what field those arguments contain.
However the docs related to commit_async mentions the arguments, I'm not completely sure how to use them.
I need help in completing my callback function on_commit(), someone please help here
Code
import logging as log
from kafka import KafkaConsumer
from message_handler_impl import MessageHandlerImpl
def on_commit():
pass
class KafkaMessageConsumer:
def __init__(self, bootstrap_servers: str, topic: str, group_id: str):
self.bootstrap_servers = bootstrap_servers
self.topic = topic
self.group_id = group_id
self.consumer = KafkaConsumer(topic, bootstrap_servers=bootstrap_servers, group_id=group_id, enable_auto_commit=False, auto_offset_reset='latest')
def consume_messages(self, max_poll_records: int,
message_handler: MessageHandlerImpl = MessageHandlerImpl()):
try:
while True:
try:
msg_pack = self.consumer.poll(max_records=max_poll_records)
for topic_partition, messages in msg_pack.items():
message_handler.process_messages(messages)
self.consumer.commit_async(callback=on_commit)
except Exception as e:
log.error("Error while consuming message due to: %s", e, exc_info=True)
finally:
log.error("Something went wrong, closing consumer...........")
self.consumer.close()
if __name__ == "__main__":
kafka_consumer = KafkaMessageConsumer("localhost:9092", "test-topic", "test-group")
kafka_consumer.consume_messages(100)
The docs are fairly clear.
Called as callback(offsets, response) with response as either an Exception or an OffsetCommitResponse struct.
def on_commit(offsets, response):
# or maybe try checking type(response)
if hasattr(response, '<some attribute unique to OffsetCommitResponse>'):
print('committed ' + str(offsets))
else:
print(response) # exception
I'm sure you could look at the source code an maybe find a unit test that covers a full example
I have a topics_subscriber.py that continuously checks for any new topics and subscribes to them.
def _subscribe_topics(consumer: KafkaConsumer) -> None:
topics_to_subscribe = {
topic
for topic in consumer.topics()
if topic.startswith("topic-")
}
subscribed_topics = consumer.subscription()
print(subscribed_topics)
new_topics = (
topics_to_subscribe - subscribed_topics
if subscribed_topics
else topics_to_subscribe
)
if new_topics:
print("new topics detected:")
for topic in new_topics:
print(topic)
print("\nsubscribing to old+new topics\n")
consumer.subscribe(topics=list(topics_to_subscribe)) // subscribe() is not incremental, hence subscribing to old+new topics
print(consumer.subscription())
else:
print("\nno new topics detected: exiting normally\n")
def main() -> None:
consumer = kafka.KafkaConsumer(
client_id="client_1",
group_id="my_group",
bootstrap_servers=['localhost:9092']
)
while True:
_subscribe_topics(consumer)
print("\nsleeping 10 sec\n")
time.sleep(10)
Now, in another script kafka_extractor.py, I want to create a new consumer and join the my_group consumer group and start consuming messages from the topics that are subscribed by the group. i.e without specifically subscribing to topics for this new consumer.
def main() -> None:
consumer = kafka.KafkaConsumer(
client_id="client_2",
group_id="my_group",
bootstrap_servers=['localhost:9092']
)
print("created consumer")
print(consumer.subscription())
for msg in consumer:
print(msg.topic)
Two things to note in the output of kafka_extractor.py:
print(consumer.subscription()) outputs as None
for msg in consumer: -> is stuck and does not move forward nor exits the program.
Any directions would be appreciated here.
Joining a consumer to a group does not gain the group's existing topics because groups can contain multiple topics, and consumer instances decide which topics to consume by subscribing
Your loop is stuck because it's defaulted to read from the end of the subscribed topics, but there are none
If you wanted to consume a list of topics that's refreshed periodically, use a regex subscription
Here's what I've tried so far:
from confluent_kafka import Consumer
c = Consumer({... several security/server settings skipped...
'auto.offset.reset': 'beginning',
'group.id': 'my-group'})
c.subscribe(['my.topic'])
msg = poll(30.0) # msg is of None type.
msg almost always ends up being None though. I think the issue might be that 'my-group' has already consumed all the messages for 'my.topic'... but I don't care whether a message has already been consumed or not - I still need the latest message. Specifically, I need the timestamp from that latest message.
I tried a bit more, and from this it looks like there are probably 25 messages in the topic, but I have no idea how to get at them:
a = c.assignment()
print(a) # Outputs [TopicPartition{topic=my.topic,partition=0,offset=-1001,error=None}]
offsets = c.get_watermark_offsets(a[0])
print(offsets) # Outputs: (25, 25)
If there are no messages because the topic has never had anything written to it at all, how can I determine that? And if that's the case, how can I determine how long the topic has existed for? I'm looking to write a script that automatically deletes any topics that haven't been written to in the past X days (14 initially - will probably tweak it over time.)
I run into the same issue, and no example on this. In my case there is one partition, and I need to read the last message, to know the some info from that message to setup the consumer/producer component I have.
Logic is that start Consumer, subscribe to topic, poll for message -> this triggers on_assign, where the rewinding happens, by assigning the modified partitions back. After on_assign finishes, the poll for msg continues and reads the last message from topic.
settings = {
"bootstrap.servers": "my.kafka.server",
"group.id": "my-work-group",
"client.id": "my-work-client-1",
"enable.auto.commit": False,
"session.timeout.ms": 6000,
"default.topic.config": {"auto.offset.reset": "largest"},
}
consumer = Consumer(settings)
def on_assign(a_consumer, partitions):
# get offset tuple from the first partition
last_offset = a_consumer.get_watermark_offsets(partitions[0])
# position [1] being the last index
partitions[0].offset = last_offset[1] - 1
consumer.assign(partitions)
consumer.subscribe(["test-topic"], on_assign=on_assign)
msg = consumer.poll(6.0)
Now msg is having the last message inside.
If anyone still needs an example for case with multiple partitions; this is how I did it:
from confluent_kafka import OFFSET_END, Consumer
settings = {
'bootstrap.servers': "my.kafka.server",
'group.id': "my-work-group",
'auto.offset.reset': "latest"
}
def on_assign(consumer, partitions):
for partition in partitions:
partition.offset = OFFSET_END
consumer.assign(partitions)
consumer = Consumer(settings)
consumer.subscribe(["test-topic"], on_assign=on_assign)
msg = consumer.poll(1.0)
I have a Google Cloud Function triggered by a PubSub. The doc states messages are acknowledged when the function end with success.
link
But randomly, the function retries (same execution ID) exactly 10 minutes after execution. It is the PubSub ack max timeout.
I also tried to get message ID and acknowledge it programmatically in Function code but the PubSub API respond there is no message to ack with that id.
In StackDriver monitoring, I see some messages not being acknowledged.
Here is my code : main.py
import base64
import logging
import traceback
from google.api_core import exceptions
from google.cloud import bigquery, error_reporting, firestore, pubsub
from sql_runner.runner import orchestrator
logging.getLogger().setLevel(logging.INFO)
def main(event, context):
bigquery_client = bigquery.Client()
firestore_client = firestore.Client()
publisher_client = pubsub.PublisherClient()
subscriber_client = pubsub.SubscriberClient()
logging.info(
'event=%s',
event
)
logging.info(
'context=%s',
context
)
try:
query_id = base64.b64decode(event.get('data',b'')).decode('utf-8')
logging.info(
'query_id=%s',
query_id
)
# inject dependencies
orchestrator(
query_id,
bigquery_client,
firestore_client,
publisher_client
)
sub_path = (context.resource['name']
.replace('topics', 'subscriptions')
.replace('function-sql-runner', 'gcf-sql-runner-europe-west1-function-sql-runner')
)
# explicitly ack message to avoid duplicates invocations
try:
subscriber_client.acknowledge(
sub_path,
[context.event_id] # message_id to ack
)
logging.warning(
'message_id %s acknowledged (FORCED)',
context.event_id
)
except exceptions.InvalidArgument as err:
# google.api_core.exceptions.InvalidArgument: 400 You have passed an invalid ack ID to the service (ack_id=982967258971474).
logging.info(
'message_id %s already acknowledged',
context.event_id
)
logging.debug(err)
except Exception as err:
# catch all exceptions and log to prevent cold boot
# report with error_reporting
error_reporting.Client().report_exception()
logging.critical(
'Internal error : %s -> %s',
str(err),
traceback.format_exc()
)
if __name__ == '__main__': # for testing
from collections import namedtuple # use namedtuple to avoid Class creation
Context = namedtuple('Context', 'event_id resource')
context = Context('666', {'name': 'projects/my-dev/topics/function-sql-runner'})
script_to_start = b' ' # launch the 1st script
script_to_start = b'060-cartes.sql'
main(
event={"data": base64.b64encode(script_to_start)},
context=context
)
Here is my code : runner.py
import logging
import os
from retry import retry
PROJECT_ID = os.getenv('GCLOUD_PROJECT') or 'my-dev'
def orchestrator(query_id, bigquery_client, firestore_client, publisher_client):
"""
if query_id empty, start the first sql script
else, call the given query_id.
Anyway, call the next script.
If the sql script is the last, no call
retrieve SQL queries from FireStore
run queries on BigQuery
"""
docs_refs = [
doc_ref.get() for doc_ref in
firestore_client.collection(u'sql_scripts').list_documents()
]
sorted_queries = sorted(docs_refs, key=lambda x: x.id)
if not bool(query_id.strip()) : # first execution
current_index = 0
else:
# find the query to run
query_ids = [ query_doc.id for query_doc in sorted_queries]
current_index = query_ids.index(query_id)
query_doc = sorted_queries[current_index]
bigquery_client.query(
query_doc.to_dict()['request'], # sql query
).result()
logging.info(
'Query %s executed',
query_doc.id
)
# exit if the current query is the last
if len(sorted_queries) == current_index + 1:
logging.info('All scripts were executed.')
return
next_query_id = sorted_queries[current_index+1].id.encode('utf-8')
publish(publisher_client, next_query_id)
#retry(tries=5)
def publish(publisher_client, next_query_id):
"""
send a message in pubsub to call the next query
this mechanism allow to run one sql script per Function instance
so as to not exceed the 9min deadline limit
"""
logging.info('Calling next query %s', next_query_id)
future = publisher_client.publish(
topic='projects/{}/topics/function-sql-runner'.format(PROJECT_ID),
data=next_query_id
)
# ensure publish is successfull
message_id = future.result()
logging.info('Published message_id = %s', message_id)
It looks like the pubsub message is not ack on success.
I do not think I have background activity in my code.
My question : why my Function is randomly retrying even when success ?
Cloud Functions does not guarantee that your functions will run exactly once. According to the documentation, background functions, including pubsub functions, are given an at-least-once guarantee:
Background functions are invoked at least once. This is because of the
asynchronous nature of handling events, in which there is no caller
that waits for the response. The system might, in rare circumstances,
invoke a background function more than once in order to ensure
delivery of the event. If a background function invocation fails with
an error, it will not be invoked again unless retries on failure are
enabled for that function.
Your code will need to expect that it could possibly receive an event more than once. As such, your code should be idempotent:
To make sure that your function behaves correctly on retried execution
attempts, you should make it idempotent by implementing it so that an
event results in the desired results (and side effects) even if it is
delivered multiple times. In the case of HTTP functions, this also
means returning the desired value even if the caller retries calls to
the HTTP function endpoint. See Retrying Background Functions for more
information on how to make your function idempotent.
I could not find in the documentation how to set the retention time when creating a producer using confluent-kafka.
If I just specify 'bootstrap-servers' the default retention time is 1 day. I would like to be able to change that.
(I want to do this in the python API not on the command line.)
Retention time is set when you create a topic, not on the producer configuration.
If your server.properties allows for auto-topic creation, then you will get the defaults set in there.
Otherwise, you can use the AdminClient API to send a NewTopic request which supports a config attribute of dict<str,str>
from confluent_kafka.admin import AdminClient, NewTopic
# a = AdminClient(...)
topics = list()
t = NewTopic(topic, num_partitions=3, replication_factor=1, config={'log.retention.hours': '168'})
topics.append(t)
# Call create_topics to asynchronously create topics, a dict
# of <topic,future> is returned.
fs = a.create_topics(topics)
# Wait for operation to finish.
# Timeouts are preferably controlled by passing request_timeout=15.0
# to the create_topics() call.
# All futures will finish at the same time.
for topic, f in fs.items():
try:
f.result() # The result itself is None
print("Topic {} created".format(topic))
except Exception as e:
print("Failed to create topic {}: {}".format(topic, e))
In the same link, you can find an alter topic request
the retention time is not a property of the producer.
The default retention time is set in broker configfile server.properties and properties like log.retention.hours, e.g. /etc/kafka/server.properties ...depending on your installation.
You can alter the retention time on a per topic base via e.g.
$ <path-to-kafka>/bin/kafka-topics.sh --zookeeper <zookeeper-quorum> --alter --topic <topic-name> --config retention.ms=<your-desired-retention-in-ms>
HTH....