Kafka consumer implementation not working in Python

First time working with Kafka and I've run into a problem.
I have the following implementation of my consumer:
from kafka import KafkaConsumer
import json
import config

class KafkaMessageConsumer:
    def __init__(self):
        self.consumer = KafkaConsumer(
            bootstrap_servers=config.KAFKA_BOOTSTRAP_SERVER,
            security_protocol=config.KAFKA_SECURITY_PROTOCOL,
            sasl_mechanism=config.KAFKA_SASL_MECHANISM,
            sasl_plain_username=config.KAFKA_USERNAME,
            sasl_plain_password=config.KAFKA_PASSWORD,
            value_deserializer=lambda x: json.loads(x.decode("utf-8")),
        )

    def receive_messages(self, topic):
        self.consumer.subscribe(topics=[topic])
        print(f"Subscribed to topics: {self.consumer.subscription()}")
        for msg in self.consumer:
            yield msg.value

if __name__ == "__main__":
    consumer = KafkaMessageConsumer()
    for message in consumer.receive_messages(config.KAFKA_TOPIC):
        print("Received message:", message)
The credentials should be set correctly. I get the message about subscribing to the topic without error, but no messages are yielded, even though I know for sure that there are messages to be consumed on the topic. Am I missing some necessary config here?

I'm no expert in Python, but it looks like you haven't consumed any messages. You have subscribed to the topic, but you would need to poll() for messages: https://kafka-python.readthedocs.io/en/master/apidoc/KafkaConsumer.html#kafka.KafkaConsumer.poll
Also, where have you set the topic name? [topic]
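For reference, here is a minimal sketch of an explicit poll() loop with kafka-python, reusing the config module from the question; the group id is a hypothetical addition, not something from the original code:

from kafka import KafkaConsumer
import json
import config

consumer = KafkaConsumer(
    bootstrap_servers=config.KAFKA_BOOTSTRAP_SERVER,
    security_protocol=config.KAFKA_SECURITY_PROTOCOL,
    sasl_mechanism=config.KAFKA_SASL_MECHANISM,
    sasl_plain_username=config.KAFKA_USERNAME,
    sasl_plain_password=config.KAFKA_PASSWORD,
    group_id="my-consumer-group",      # hypothetical group id
    auto_offset_reset="earliest",      # read existing messages if no committed offset exists
    value_deserializer=lambda x: json.loads(x.decode("utf-8")),
)
consumer.subscribe(topics=[config.KAFKA_TOPIC])

while True:
    # poll() returns a dict of {TopicPartition: [ConsumerRecord, ...]}; empty means no new data
    records = consumer.poll(timeout_ms=1000)
    for tp, messages in records.items():
        for msg in messages:
            print("Received message:", msg.value)

Note that kafka-python's auto_offset_reset defaults to 'latest', so a consumer with no previously committed offsets only sees messages produced after it starts; that is a common reason why a topic that already holds data appears empty.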

Related

Slack app.message is reacting to old messages

I'm trying to create a Slack bot with the slack_bolt package that searches the channel's Slack history for matches whenever a new message arrives. For some reason app.message is triggered for messages that were sent before the app was running. Is this preventable, or do I have to handle this case myself? It would be best to prevent it, since using this on a channel with a big history might cause problems. I don't want it to check all the X thousand messages in the history.
wanted result:
ME: Hi!
-- starting slack app --
ME: Hi!
SlackBOT in thread: A similar message has been posted: (link to first message)
result:
ME: Hi!
SlackBOT in thread: A similar message has been posted (link to itself) (this only arrives after the slackbot message to the second Hi)
-- starting slack app --
ME: Hi!
SlackBOT in thread: A similar message has been posted (with link to first message)
import os
import json
import ssl
import certifi
from slack_bolt import App
from slack_sdk.web import WebClient
from slack_bolt.adapter.socket_mode import SocketModeHandler
from utils import get_list_of_same_message_indexes, get_ts_of_message_for_link

history_limit = 50
ssl_context = ssl.create_default_context(cafile=certifi.where())
slack_client = WebClient(token=os.environ['BOT_TOKEN'], ssl=ssl_context)
app = App(client=slack_client)

@app.message(".*")
def check_history(message, say):
    # say() sends a message to the channel where the event was triggered
    history = app.client.conversations_history(channel=slack_channel_ID, limit=history_limit).get("messages")
    # if the message is not coming from a thread
    print(message.get("text"))
    if not message.get("thread_ts"):
        print("new message")
        thread = message.get("ts")
        list_of_same_messages = get_list_of_same_message_indexes(message, history)
        if len(list_of_same_messages) > 0:
            print(list_of_same_messages)
            example_ts = get_ts_of_message_for_link(history[list_of_same_messages[0]])
            print(example_ts)
            # channel = history[list_of_same_messages[0]].get("channel")
            message_text = f"A similar message has been posted: https://test-dev.slack.com/archives/{slack_channel_ID}/p{example_ts}"
            say({"text": message_text, "thread_ts": thread})
        else:
            print("Nothing")

if __name__ == "__main__":
    SocketModeHandler(app, os.environ["APP_TOKEN"]).start()

Python produce to different Kafka partition

I am trying to learn Kafka by taking the classic Twitter streaming example. I am trying to use my producer to stream Twitter data based on 2 filters to different partitions of the same topic. For example, Twitter data with track='Google' to one partition and track='Apple' to another.
class Producer(StreamListener):
    def __init__(self, producer):
        self.producer = producer

    def on_data(self, data):
        self.producer.send(topic_name, value=data)
        return True

    def on_error(self, error):
        print(error)

twitter_stream = Stream(auth, Producer(producer))
twitter_stream.filter(track=["Google"])
How do I add another track and stream that data to another partition?
Likewise, how do I make my consumer consume from a specific partition?
consumer = KafkaConsumer(
    topic_name,
    bootstrap_servers=['localhost:9092'],
    auto_offset_reset='latest',
    enable_auto_commit=True,
    auto_commit_interval_ms=5000,
    max_poll_records=100,
    value_deserializer=lambda x: json.loads(x.decode('utf-8')))
After some research, I was able to resolve this issue:
On the producer side, specify the partition:
self.producer.send(topic_name, value=data, partition=0)
On the consumer side:
from kafka import KafkaConsumer, TopicPartition
import json

consumer = KafkaConsumer(
    bootstrap_servers=['localhost:9092'],
    auto_offset_reset='latest',
    enable_auto_commit=True,
    auto_commit_interval_ms=5000,
    max_poll_records=100,
    value_deserializer=lambda x: json.loads(x.decode('utf-8')))
consumer.assign([TopicPartition('trial', 0)])
Kafka partitions data on the key of the message. In the given code, you are only passing a value to the producer, so the key will be null, and messages will therefore round-robin between all partitions.
Refer to the documentation for your Kafka library to see how you can give a key for each message.
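As an illustration, here is a hedged sketch using kafka-python where the track is used as the message key; the 'tweets' topic name and the payloads are hypothetical:

from kafka import KafkaProducer
import json

producer = KafkaProducer(
    bootstrap_servers=['localhost:9092'],
    key_serializer=lambda k: k.encode('utf-8'),
    value_serializer=lambda v: json.dumps(v).encode('utf-8'),
)

# Messages with the same key always hash to the same partition, so each track's
# data stays together (though two different keys can still land in the same partition).
producer.send('tweets', key='Google', value={'text': 'some google tweet'})
producer.send('tweets', key='Apple', value={'text': 'some apple tweet'})
producer.flush()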

Faust example of publishing to a kafka topic

I'm curious about how you are supposed to express that you want a message delivered to a Kafka topic in Faust. The example in their README doesn't seem to write to a topic:
import faust

class Greeting(faust.Record):
    from_name: str
    to_name: str

app = faust.App('hello-app', broker='kafka://localhost')
topic = app.topic('hello-topic', value_type=Greeting)

@app.agent(topic)
async def hello(greetings):
    async for greeting in greetings:
        print(f'Hello from {greeting.from_name} to {greeting.to_name}')

@app.timer(interval=1.0)
async def example_sender(app):
    await hello.send(
        value=Greeting(from_name='Faust', to_name='you'),
    )

if __name__ == '__main__':
    app.main()
I would expect hello.send in the above code to publish a message to the topic, but it doesn't appear to.
There are many examples of reading from topics, and many examples of using the CLI to push an ad-hoc message. After combing through the docs, I don't see any clear examples of publishing to topics in code. Am I just being crazy and the above code should work?
You can use a sink to tell Faust where to deliver the results of an agent function. You can also use multiple topics as sinks at once if you want.
@app.agent(topic_to_read_from, sink=[destination_topic])
async def fetch(records):
    async for record in records:
        result = do_something(record)
        yield result
The send() function is the correct one to call to write to topics. You can even specify a particular partition, just like the equivalent Java API call.
Here is the reference for the send() method:
https://faust.readthedocs.io/en/latest/reference/faust.topics.html#faust.topics.Topic.send
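For instance, here is a minimal sketch of calling Topic.send with an explicit partition; the app name, timer interval, and payload are assumptions for illustration, not taken from the original answer:

import faust

app = faust.App('producer-only-app', broker='kafka://localhost')
greetings_topic = app.topic('hello-topic')

@app.timer(interval=5.0)
async def produce_greeting():
    # Topic.send accepts key, value and, optionally, an explicit partition
    await greetings_topic.send(key='greeting', value='hello world', partition=0)

if __name__ == '__main__':
    app.main()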
If you want a Faust producer only (not combined with a consumer/sink), the original question actually has the right bit of code. Here's a fully functional script that publishes messages to a 'faust_test' Kafka topic, consumable by any Kafka/Faust consumer.
Run the code below like this: python faust_producer.py worker
"""Simple Faust Producer"""
import faust
if __name__ == '__main__':
"""Simple Faust Producer"""
# Create the Faust App
app = faust.App('faust_test_app', broker='localhost:9092')
topic = app.topic('faust_test')
# Send messages
#app.timer(interval=1.0)
async def send_message(message):
await topic.send(value='my message')
# Start the Faust App
app.main()
So we just ran into the need to send a message to a topic other than the sink topics.
The easiest way we found was: foo = await my_topic.send_soon(value="wtfm8").
You can also use send directly, driving it with the asyncio event loop:
loop = asyncio.get_event_loop()
loop.run_until_complete(ttopic.send(value="wtfm8??"))
Don't know how relevant this is anymore, but I came across this issue when trying to learn Faust. From what I read, here is what is happening:
topic = app.topic('hello-topic', value_type=Greeting)
The misconception here is that the topic you have created is the topic you are trying to consume/read from. The topic you created currently does not do anything.
await hello.send(
    value=Greeting(from_name='Faust', to_name='you'),
)
This essentially creates an intermediate kstream which sends the values to your hello(greetings) function. def hello(...) will be called when there is a new message on the stream and will process the message being sent.
@app.agent(topic)
async def hello(greetings):
    async for greeting in greetings:
        print(f'Hello from {greeting.from_name} to {greeting.to_name}')
This is receiving the Kafka stream from hello.send(...) and simply printing it to the console (no output to the 'topic' created). This is where you can send a message to a new topic. So instead of printing, you can do:
await topic.send(value="my message!")
Alternatively:
Here is what you are doing:
example_sender() sends a message to hello(...) (through an intermediate kstream)
hello(...) picks up the message and prints it
NOTICE: no sending of messages to the correct topic
Here is what you can do:
example_sender() sends a message to hello(...) (through an intermediate kstream)
hello(...) picks up the message and prints it
hello(...) ALSO sends a new message to the topic created (assuming you are trying to transform the original data)
app = faust.App('hello-app', broker='kafka://localhost')
topic = app.topic('hello-topic', value_type=Greeting)
output_topic = app.topic('test_output_faust', value_type=str)

@app.agent(topic)
async def hello(greetings):
    async for greeting in greetings:
        new_message = f'Hello from {greeting.from_name} to {greeting.to_name}'
        print(new_message)
        await output_topic.send(value=new_message)
I found a solution for sending data to Kafka topics using Faust, but I don't really understand how it works.
There are several methods for this in Faust: send(), cast(), ask_nowait(), ask(). In the documentation they are called RPC operations.
After creating the sending task, you need to run the Faust application in Client-Only Mode (start_client(), maybe_start_client()).
The following code (the produce() function) demonstrates their use (pay attention to the comments):
import asyncio
import faust

class Greeting(faust.Record):
    from_name: str
    to_name: str

app = faust.App('hello-app', broker='kafka://localhost')
topic = app.topic('hello-topic', value_type=Greeting)
result_topic = app.topic('result-topic', value_type=str)

@app.agent(topic)
async def hello(greetings):
    async for greeting in greetings:
        s = f'Hello from {greeting.from_name} to {greeting.to_name}'
        print(s)
        yield s

async def produce(to_name):
    # send - universal method for sending data to a topic
    await hello.send(value=Greeting(from_name='SEND', to_name=to_name), force=True)
    await app.maybe_start_client()
    print('SEND')

    # cast - allows you to send data without waiting for a response from the agent
    await hello.cast(value=Greeting(from_name='CAST', to_name=to_name))
    await app.maybe_start_client()
    print('CAST')

    # ask_nowait - it seems to be similar to cast
    p = await hello.ask_nowait(
        value=Greeting(from_name='ASK_NOWAIT', to_name=to_name),
        force=True,
        reply_to=result_topic
    )
    # without this line, ask_nowait will not work; taken from the ask implementation
    await app._reply_consumer.add(p.correlation_id, p)
    await app.maybe_start_client()
    print(f'ASK_NOWAIT: {p.correlation_id}')

    # blocks the execution flow
    # p = await hello.ask(value=Greeting(from_name='ASK', to_name=to_name), reply_to=result_topic)
    # print(f'ASK: {p.correlation_id}')

if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    loop.run_until_complete(produce('Faust'))
Start the Faust worker with the command faust -A <example> worker.
Then we can launch the client part of the application and check that everything is working: python <example.py>
<example.py> output:
SEND
CAST
ASK_NOWAIT: bbbe6795-5a99-40e5-a7ad-a9af544efd55
It is worth noting that you will also see a traceback of some error that occurred after delivery; it does not seem to interfere with the program.
Faust worker output:
[2022-07-19 12:06:27,959] [1140] [WARNING] Hello from SEND to Faust
[2022-07-19 12:06:27,960] [1140] [WARNING] Hello from CAST to Faust
[2022-07-19 12:06:27,962] [1140] [WARNING] Hello from ASK_NOWAIT to Faust
I don't understand why it works this way, why it's so difficult, and why so little is written about it in the documentation 😓.

producer based on confluent_kafka: message seems to never get delivered although Kafka is reachable

I'm trying to create a simple Kafka producer based on confluent_kafka. My code is the following:
#!/usr/bin/env python
from confluent_kafka import Producer
import json

def delivery_report(err, msg):
    """Called once for each message produced to indicate delivery result.
    Triggered by poll() or flush().
    See https://github.com/confluentinc/confluent-kafka-python/blob/master/README.md"""
    if err is not None:
        print('Message delivery failed: {}'.format(err))
    else:
        print('Message delivered to {} [{}]'.format(
            msg.topic(), msg.partition()))

class MySource:
    """Kafka producer"""
    def __init__(self, kafka_hosts, topic):
        """
        :kafka_host list(str): hostnames or 'host:port' of Kafka
        :topic str: topic to produce messages to
        """
        self.topic = topic
        # see https://github.com/edenhill/librdkafka/blob/master/CONFIGURATION.md
        config = {
            'metadata.broker.list': ','.join(kafka_hosts),
            'group.id': 'mygroup',
        }
        self.producer = Producer(config)

    @staticmethod
    def main():
        topic = 'my-topic'
        message = json.dumps({
            'measurement': [1, 2, 3]})
        mys = MySource(['kafka'], topic)
        mys.producer.produce(
            topic, message, on_delivery=delivery_report)
        mys.producer.flush()

if __name__ == "__main__":
    MySource.main()
The first time I use a topic (here: "my-topic"), Kafka does react with "Auto creation of topic my-topic with 1 partitions and replication factor 1 is successful (kafka.server.KafkaApis)". However, the callback function (on_delivery=delivery_report) is never called, and it hangs at flush() (it terminates if I set a timeout for flush), neither the first time nor on subsequent runs. The Kafka logs do not show anything if I use an existing topic.

Retrieve messages from a topic on a message hub

I am trying to get messages from a topic on Message Hub on Bluemix using Confluent Kafka Python. My code is below, but something is not working. The topic and the Message Hub are up and running, so the problem is probably in the code.
import sys
from confluent_kafka import Producer, KafkaError, Consumer

consumer_settings = {
    'bootstrap.servers': 'broker-url-here',
    'group.id': 'mygroup',
    'default.topic.config': {'auto.offset.reset': 'smallest'},
    'sasl.mechanisms': 'PLAIN',
    'security.protocol': 'ssl',
    'sasl.username': 'username-here',
    'sasl.password': 'password-here',
}
c = Consumer(**consumer_settings)
c.subscribe(['topic-here'])

running = True
while running:
    msg = c.poll()
    if msg.error():
        print("Error while retrieving message")
        c.close()
        sys.exit(10)
    elif (msg is not None):
        for x in msg:
            print(x)
    else:
        sys.exit(10)
When I run the code, it seems to get stuck at msg = c.poll(). So I guess it is either failing to connect, or failing to retrieve messages. The credentials themselves are correct.
The consume logic looks fine, but the configuration for the consumer is incorrect.
security.protocol needs to be set to sasl_ssl
ssl.ca.location needs to point to a PEM file containing trusted certificates. The location of that file varies for each OS, but for the most common ones it's:
Bluemix/Ubuntu: /etc/ssl/certs
Red Hat: /etc/pki/tls/cert.pem
macOS: /etc/ssl/certs.pem
We also have a sample app using this client that can easily be started or deployed to Bluemix: https://github.com/ibm-messaging/message-hub-samples/tree/master/kafka-python-console-sample
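Applied to the settings from the question, the corrected configuration would look roughly like this; the broker URL, group id, credentials, and the Ubuntu CA path are placeholders carried over from the question and the list above:

from confluent_kafka import Consumer

consumer_settings = {
    'bootstrap.servers': 'broker-url-here',
    'group.id': 'mygroup',
    'default.topic.config': {'auto.offset.reset': 'smallest'},
    'sasl.mechanisms': 'PLAIN',
    'security.protocol': 'sasl_ssl',       # SASL authentication over TLS
    'ssl.ca.location': '/etc/ssl/certs',   # trusted CA certificates; path varies per OS
    'sasl.username': 'username-here',
    'sasl.password': 'password-here',
}
c = Consumer(**consumer_settings)
c.subscribe(['topic-here'])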
