I am stuck on an issue with a Kafka consumer built with the confluent-kafka Python library.
CONTEXT
I have a Kafka topic on AWS EC2 that I need to consume.
SCENARIO
Consumer Script (my_topic_consumer.py) uses confluent-kafka-python to create a consumer (shown below) and subscribe to the 'my_topic' topic. The issue is that the consumer is not able to read messages from the Kafka cluster.
All required security steps are met:
1. SSL is the security protocol for both the consumer and the broker.
2. The consumer EC2 IP block has been added to the Security Group on the cluster.
# my_topic_consumer.py
from confluent_kafka import Consumer, KafkaError

c = Consumer({
    'bootstrap.servers': 'my_host:my_port',
    'group.id': 'my_group',
    'auto.offset.reset': 'latest',
    'security.protocol': 'SSL',
    'ssl.ca.location': '/path/to/certificate.pem'
})
c.subscribe(['my_topic'])
try:
    while True:
        msg = c.poll(5)  # wait up to 5 seconds for a message
        if msg is None:
            print('None')
            continue
        if msg.error():
            print(msg.error())  # print the error itself, not the message object
            continue
        else:
            # process_msg(msg) - Writes messages to a data file.
            pass
except KeyboardInterrupt:
    print('Aborted by user\n')
finally:
    c.close()
URLS
Broker Host: my_host
Port: my_port
Group ID: my_group
CONSOLE COMMANDS
Working: running the console consumer script, I am able to see the data:
kafka-console-consumer --bootstrap-server my_host:my_port --consumer.config client-ssl.properties --skip-message-on-error --topic my_topic | jq
Note: client-ssl.properties: points to the JKS file which has the certs.
Debugging further on the Kafka cluster (a separate EC2 instance from the consumer), I could not see my consumer registered under its group ID (my_group):
kafka-consumer-groups --bootstrap-server my_host:my_port --command-config client-ssl.properties --describe --group my_group
This leads me to believe the consumer is not getting registered on the cluster, so maybe the SSL handshake is failing? How do I check this from the consumer side in Python?
Note
- The cluster is behind a (corporate) proxy, but I do run the proxy on the consumer EC2 before testing.
- I ran the process via pm2, yet didn't see any error logs such as request timeouts.
Is there a definite way to check whether the consumer creation is failing and to find the root cause? Any help and feedback is appreciated.
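One way to separate a TLS problem from a Kafka client problem is to attempt the handshake directly with Python's standard ssl module, using the same CA file the consumer is configured with. The sketch below is only an illustration: check_tls_handshake is a hypothetical helper, not part of confluent-kafka, and it assumes the broker listener speaks TLS directly on my_port and is reachable from the consumer host.

```python
import socket
import ssl


def check_tls_handshake(host, port, cafile=None, timeout=5.0):
    """Try a TLS handshake with host:port and report (ok, detail).

    Hypothetical diagnostic helper; it does not speak the Kafka protocol.
    """
    # Verify the broker certificate against the given CA file
    # (falls back to the system trust store when cafile is None).
    context = ssl.create_default_context(cafile=cafile)
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host) as tls:
                return True, "handshake ok, protocol=%s" % tls.version()
    except ssl.SSLError as exc:
        # TCP connected, but the handshake or certificate check failed.
        return False, "TLS error: %s" % exc
    except OSError as exc:
        # DNS failure, refused connection, timeout, or a blocked port.
        return False, "connection error: %s" % exc
```

It could be called as check_tls_handshake('my_host', my_port, '/path/to/certificate.pem'). The client can also report on this itself: adding 'debug': 'security,broker' to the Consumer config makes librdkafka log its connection and handshake steps.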
Related
I would like to know if it's possible to configure two different Kafka clusters in a single Kafka producer.
Currently I'm trying to have my producers and consumers fail over automatically to a passive cluster without reconfiguring bootstrap.servers and restarting their applications.
I'm using Apache Kafka 2.8 and the confluent_kafka==1.8.2 package with Python 3.7.
Below is the producer code:
from csv import reader
from time import sleep
from confluent_kafka import Producer

p = Producer({'bootstrap.servers': 'clusterA:32531, clusterB:30804'})

def delivery_report(err, msg):
    """ Called once for each message produced to indicate delivery result.
    Triggered by poll() or flush(). """
    if err is not None:
        print('Message delivery failed: {}'.format(err))
    else:
        print(f'Message delivered to {msg.offset()}')

with open('test_data.csv', 'r') as read_obj:
    csv_reader = reader(read_obj)
    header = next(csv_reader, None)
    # Check that the file is not empty
    if header is not None:
        # Iterate over each row after the header in the csv
        for row in csv_reader:
            sleep(0.02)
            p.produce(topic='demo', key=row[5], value=str(row), callback=delivery_report)
p.flush()
When I killed clusterB I got the following error message.
%4|1643837239.074|CLUSTERID|rdkafka#producer-1| [thrd:main]: Broker clusterA:32531/bootstrap reports different ClusterId "MLWCRsVXSxOf2YGPRIivjA" than previously known "6ZtcQCRPQ5msgeD3r7I11w": a client must not be simultaneously connected to multiple clusters
%3|1643837240.995|FAIL|rdkafka#producer-1| [thrd:clusterB:30804/bootstrap]: 172.27.176.222:30804/bootstrap: Connect to ipv4#clusterB:30804 failed: Unknown error (after 2044ms in state CONNECT)
At the moment, you have to update the bootstrap information to point at the secondary cluster manually, and the client must be restarted to fail over.
Programmatically, in order to connect to a separate cluster you have to stop the current producer instance and start a new one with the new bootstrap server config. However, this can get quite complicated.
Other options are:
- Put Kafka behind a load balancer or a VIP (not recommended, because by nature the client needs a direct connection to the broker).
- Configure a shared store (memcached or redis) that holds the bootstrap server config. Your clients fetch the bootstrap servers from it during startup. On failure, you change the value in the store and restart your clients. (This makes the operation quite easy.)
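The shared-store option, combined with recreating the producer in process, could look roughly like the sketch below. It rests on several assumptions: a plain dict stands in for memcached or redis, and FailoverClient, make_client and the key name kafka.bootstrap are all hypothetical. In real use make_client might be something like lambda servers: Producer({'bootstrap.servers': servers}).

```python
class FailoverClient:
    """Rebuilds the Kafka client whenever the stored bootstrap list changes.

    Hypothetical helper: `store` is any mapping-like object (a stand-in for
    redis/memcached) and `make_client` builds a client from a bootstrap string.
    """

    def __init__(self, store, make_client, key="kafka.bootstrap"):
        self._store = store
        self._make_client = make_client
        self._key = key
        self._servers = None
        self._client = None

    def client(self):
        """Return a client bound to the cluster currently named in the store."""
        servers = self._store[self._key]
        if servers != self._servers:
            # The active cluster changed: build a fresh client, because a
            # single client must not be connected to two clusters at once.
            self._client = self._make_client(servers)
            self._servers = servers
        return self._client


# Demo with a dict as the store and a tuple as a fake "client".
store = {"kafka.bootstrap": "clusterA:32531"}
failover = FailoverClient(store, make_client=lambda servers: ("client", servers))
first = failover.client()
store["kafka.bootstrap"] = "clusterB:30804"  # operator switches clusters
second = failover.client()
print(first, second)
```

The store always holds exactly one cluster's address, which avoids the "different ClusterId" warning shown in the question.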
I have been asked to use Azure Service Bus instead of Celery in a Django application.
I read the documentation provided but didn't get a clear picture of how to use Service Bus in place of a Celery task. Any advice would be a great help.
Before getting into it, I would like to highlight the differences between Azure Service Bus and Celery.
Azure Service Bus:
Microsoft Azure Service Bus is a fully managed enterprise integration message broker.
You could refer to the Azure Service Bus documentation to learn more about it.
Celery:
A distributed task queue. Celery is an asynchronous task queue/job queue based on distributed message passing.
I can think of 2 possibilities in your case:
1. You would like to use Service Bus with Celery in place of other message brokers.
2. Replace Celery with the Service Bus.
1 : You would like to use Service Bus with Celery in place of other message brokers.
You could refer to the Celery documentation to understand why Celery needs a message broker.
I am not sure which message broker you are using currently, but you could use the Kombu library to meet your requirement.
Reference for Azure Service Bus : https://docs.celeryproject.org/projects/kombu/en/stable/reference/kombu.transport.azureservicebus.html
Reference for others : https://docs.celeryproject.org/projects/kombu/en/stable/reference/index.html
2 : Replace Celery with the Service Bus completely
To meet your requirement, consider:
- Message senders are producers.
- Message receivers are consumers.
These are two different applications that you will have to work on.
You could refer to the samples below for more code to build on.
https://github.com/Azure/azure-sdk-for-python/tree/master/sdk/servicebus/azure-servicebus/samples
Explanation:
Every time you would like to execute an action, you send a message to a topic from the producer client.
The consumer client, the application that is listening, receives the message and processes it. You can attach your custom process to it, so that your custom process is executed whenever a message is received at the consumer end.
Below is a sample of the receiving client:
from azure.servicebus.aio import SubscriptionClient
import asyncio
import nest_asyncio

nest_asyncio.apply()
Receiving = True

# Topic 1 receiver:
conn_str = "<>"
name = "Allmessages1"
SubsClient = SubscriptionClient.from_connection_string(conn_str, name)
receiver = SubsClient.get_receiver()

async def receive_message_from1():
    await receiver.open()
    print("Opening the Receiver for Topic1")
    async with receiver:
        while Receiving:
            msgs = await receiver.fetch_next()
            for m in msgs:
                print("Received the message from topic 1.....")
                ##### - Your code to execute when a message is received - ########
                print(str(m))
                ##### - Your code to execute when a message is received - ########
                await m.complete()

loop = asyncio.get_event_loop()
topic1receiver = loop.create_task(receive_message_from1())
The section between the two marker lines shown below holds the instructions that are executed every time a message is received:
##### - Your code to execute when a message is received - ########
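If the goal is to replace Celery tasks, the custom code in the receiver usually has to work out which function to run. Here is a minimal sketch of that idea, assuming each message body is a JSON document carrying a task name and its arguments. The names task, encode_task and run_task are hypothetical helpers, not part of the Azure SDK or Celery, and the actual sending and receiving through Service Bus is left out.

```python
import json

# Registry mapping task names to plain Python functions.
TASKS = {}


def task(func):
    """Register a function so it can be invoked by name from a message."""
    TASKS[func.__name__] = func
    return func


def encode_task(name, *args, **kwargs):
    """Producer side: build the message body to send to the topic."""
    return json.dumps({"task": name, "args": list(args), "kwargs": kwargs})


def run_task(body):
    """Consumer side: decode a message body and call the matching function."""
    payload = json.loads(body)
    func = TASKS[payload["task"]]  # KeyError means an unknown task name
    return func(*payload["args"], **payload["kwargs"])


@task
def add(x, y):
    return x + y


body = encode_task("add", 2, y=3)  # what the Django side would send
result = run_task(body)            # what the receiver would do with the body
print(result)
```

Keeping the registry on the consumer side means the producer only needs to know task names, which is roughly how a Celery task call is decoupled from the worker.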
First, I would like to say that I am a newbie to Kafka and also to Stack Overflow, so I am sorry if I am not asking this in the right way.
I am trying to implement a producer and consumer using kafka-python,
but it is not working properly.
I have ZooKeeper installed and it is up and running. I have also started the Kafka server. But when I run the consumer and producer through PyCharm, the messages are not received by the consumer. The consumer keeps running but the producer stops.
consumer.py
from kafka import KafkaConsumer

consumer = KafkaConsumer('test', group_id='test-consumer-group',
                         bootstrap_servers=['my_ip:9092'], api_version=(0, 10, 1),
                         auto_offset_reset='earliest')
print("Consuming messages")
for msg in consumer:
    print(msg)
producer.py
from kafka import KafkaProducer

print('above producer')
producer = KafkaProducer(bootstrap_servers=['my_ip:9092'], api_version=(0, 10, 1),
                         compression_type=None
                         )
print('after producer')
for _ in range(100):
    producer.send('test', b'HELLO NITHIN chandran')
print('after sending messages')
In place of my_ip, I have provided my system IP address from ipconfig.
consumer.py Output -
Consuming messages
consumer.py doesn't stop running.
producer.py Output -
above producer
after producer
after sending messages
Process finished with exit code 0
producer.py stops running and the process finishes, as shown in the output.
Please help me resolve this issue.
All help is appreciated.
Your code is OK; the problem is in your broker configuration. Please reset it to the initial configuration and change only log.dirs to the path where you want to store Kafka data.
After changing the config file, follow these steps:
1. Stop ZooKeeper and Kafka.
2. Clear both the Kafka and ZooKeeper data directories.
3. Run ZooKeeper and Kafka.
4. Start the consumer and producer.
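Independently of the broker configuration, note that KafkaProducer.send() is asynchronous: it queues the record and returns at once, so a short script can exit before anything reaches the broker. Calling flush() before exiting blocks until the queued records have been sent. A minimal sketch follows, in which send_all is a hypothetical helper and FakeProducer only stands in for KafkaProducer so the snippet runs without a broker.

```python
def send_all(producer, topic, payloads):
    """Queue every payload, then block until the producer has sent them."""
    for payload in payloads:
        producer.send(topic, payload)  # asynchronous: only queues the record
    producer.flush()                   # wait for queued records to be sent
    return len(payloads)


class FakeProducer:
    """Stand-in with the same send()/flush() shape as kafka.KafkaProducer."""

    def __init__(self):
        self.queued, self.sent = [], []

    def send(self, topic, value):
        self.queued.append((topic, value))

    def flush(self):
        self.sent.extend(self.queued)
        self.queued.clear()


fake = FakeProducer()
count = send_all(fake, "test", [b"HELLO"] * 3)
print(count, len(fake.sent))
```

With the real client, the same helper would be called with the KafkaProducer instance from producer.py in place of the fake.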
I have got a successful connection between a Kafka producer and consumer on a Google Cloud Platform cluster established by:
$ cd /usr/lib/kafka
$ bin/kafka-console-producer.sh config/server.properties --broker-list \
PLAINTEXT://[project-name]-w-0.c.[cluster-id].internal:9092 --topic test
and executing in a new shell
$ cd /usr/lib/kafka
$ bin/kafka-console-consumer.sh --bootstrap-server \
PLAINTEXT://[project-name]-w-0.c.[cluster-id].internal:9092 --topic test \
--from-beginning
Now, I want to send messages to the Kafka producer server using the following python script:
from kafka import *
topic = 'test'
producer = KafkaProducer(bootstrap_servers='PLAINTEXT://[project-name]-w-0.c.[cluster-id].internal:9092',
api_version=(0,10))
producer.send(topic, b"Test test test")
However, this results in a KafkaTimeoutError:
"Failed to update metadata after %.1f secs." % (max_wait,))
kafka.errors.KafkaTimeoutError: KafkaTimeoutError: Failed to update metadata after 60.0 secs.
Looking around online, I was told to consider:
uncommenting listeners=... and advertised.listeners=... in the /usr/lib/kafka/config/server.properties file.
However, listeners=PLAINTEXT://:9092 does not work, and this post suggests setting PLAINTEXT://<external-ip>:9092.
So I started wondering about accessing the Kafka server through an external (static) IP address of the GCP cluster. We then set up a firewall rule to access the port (?) and allow https access to the cluster. But I am unsure whether this is overkill for the problem.
I definitely need some guidance to connect successfully to the Kafka server from the python script.
You need to set advertised.listeners to the address that your client connects to.
More info: https://rmoff.net/2018/08/02/kafka-listeners-explained/
Thanks Robin! The link you posted was very helpful to find the below working configurations.
Despite the fact that SimpleProducer seems to be a deprecated approach, the following settings finally worked for me:
Python script:
from kafka import *
topic = 'test'
kafka = KafkaClient('[project-name]-w-0.c.[cluster-id].internal:9092')
producer = SimpleProducer(kafka)
message = "Test"
producer.send_messages(topic, message.encode('utf-8'))
and uncomment in the /usr/lib/kafka/config/server.properties file:
listeners=PLAINTEXT://[project-name]-w-0.c.[cluster-id].internal:9092
advertised.listeners=PLAINTEXT://[project-name]-w-0.c.[cluster-id].internal:9092
I have Kafka v1.0.1 running on a single node and I am able to push messages to the topic, but somehow I am unable to consume them from another node using the Python code below.
from json import loads
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    'kotak-test',
    bootstrap_servers=['kmblhdpedge:9092'],
    auto offset reset='earliest',
    enable auto commit=True,
    group id=' test1',
    value_deserializer=lambda x: loads(x.decode('utf-8')))
for message in consumer:
    message = message.value
    print(message)
I am constantly pushing the messages from the console using the below command:
bin/kafka-console-producer --zookeeper <zookeeper-node>:<port> --topic <topic_name>
and I can also read them via the console.
You're using the old ZooKeeper-based producer, but the newer Kafka-based consumer. The logic for how these work and store offsets is not the same.
You need to use --broker-list on the console producer.
Similarly, with the console consumer, use --bootstrap-server, not --zookeeper.
Also, these properties should not have spaces in them; the keyword arguments are auto_offset_reset, enable_auto_commit and group_id:
auto offset reset='earliest',
enable auto commit=True,
group id=' test1',
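With the spaces replaced by underscores, the consumer settings from the question would look roughly like this. The sketch keeps the question's topic and host names. build_consumer is a hypothetical wrapper, and dropping the leading space from the group id is an assumption about what was intended.

```python
from json import loads

# Keyword arguments for kafka.KafkaConsumer; names use underscores, not spaces.
consumer_kwargs = {
    "bootstrap_servers": ["kmblhdpedge:9092"],
    "auto_offset_reset": "earliest",
    "enable_auto_commit": True,
    "group_id": "test1",
    "value_deserializer": lambda raw: loads(raw.decode("utf-8")),
}


def build_consumer(topic="kotak-test"):
    """Create the consumer; imported lazily so the config can be checked alone."""
    from kafka import KafkaConsumer  # requires the kafka-python package
    return KafkaConsumer(topic, **consumer_kwargs)
```

Iterating over build_consumer() then yields records whose value attribute is already decoded from JSON.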