I would like to know if it's possible to configure two different Kafka clusters in one Kafka producer.
Currently, I'm trying to make my producers and consumers fail over automatically to a passive cluster without reconfiguring bootstrap.servers or restarting their applications.
I'm using Apache Kafka 2.8 and the confluent_kafka==1.8.2 package with Python 3.7.
Below is the producer code:
from csv import reader
from time import sleep
from confluent_kafka import Producer
p = Producer({'bootstrap.servers': 'clusterA:32531, clusterB:30804'})
def delivery_report(err, msg):
    """ Called once for each message produced to indicate delivery result.
        Triggered by poll() or flush(). """
    if err is not None:
        print('Message delivery failed: {}'.format(err))
    else:
        print(f'Message delivered to {msg.offset()}')
with open('test_data.csv', 'r') as read_obj:
    csv_reader = reader(read_obj)
    # next() with a default returns None instead of raising StopIteration on an empty file
    header = next(csv_reader, None)
    # Only continue if the file is not empty
    if header is not None:
        # Iterate over each row after the header in the csv
        for row in csv_reader:
            sleep(0.02)
            p.produce(topic='demo', key=row[5], value=str(row), callback=delivery_report)
            p.poll(0)  # serve delivery callbacks for messages already sent
p.flush()
When I killed clusterB, I got the following messages:
%4|1643837239.074|CLUSTERID|rdkafka#producer-1| [thrd:main]: Broker clusterA:32531/bootstrap reports different ClusterId "MLWCRsVXSxOf2YGPRIivjA" than previously known "6ZtcQCRPQ5msgeD3r7I11w": a client must not be simultaneously connected to multiple clusters
%3|1643837240.995|FAIL|rdkafka#producer-1| [thrd:clusterB:30804/bootstrap]: 172.27.176.222:30804/bootstrap: Connect to ipv4#clusterB:30804 failed: Unknown error (after 2044ms in state CONNECT)
At the moment, you have to update the bootstrap information to point at the secondary cluster manually, and the client has to be restarted to fail over. A single client instance can only be connected to one cluster, which is what the CLUSTERID warning above reports: all entries in bootstrap.servers must belong to the same cluster.
Programmatically, to connect to a separate cluster you have to stop the current producer instance and start a new instance with the new bootstrap server config. However, this can get quite complicated.
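The stop-and-recreate approach can be sketched as a small wrapper. It keeps one producer for the active cluster and rebuilds it against the next cluster on demand. This is a minimal sketch under several assumptions:

- FailoverProducer, fail_over() and the clusters/factory parameters are hypothetical names.
- The caller decides when to fail over, for example from a confluent_kafka error_cb that reports all brokers down.
- The factory is injected, so the wrapper itself does not import confluent_kafka.

```python
from typing import Any, Callable, List


class FailoverProducer:
    """Hypothetical wrapper: one producer instance for the active cluster.

    Each instance only ever talks to a single cluster; fail_over() discards
    the current instance and builds a new one for the next cluster in the list.
    """

    def __init__(self, clusters: List[str], factory: Callable[[str], Any]) -> None:
        # clusters: one bootstrap.servers string per cluster, in failover order
        # factory:  builds a producer for one bootstrap string
        if not clusters:
            raise ValueError("at least one cluster is required")
        self._clusters = list(clusters)
        self._factory = factory
        self._index = 0
        self._producer = factory(self._clusters[0])

    @property
    def active_cluster(self) -> str:
        return self._clusters[self._index]

    def produce(self, topic: str, **kwargs: Any) -> None:
        # Delegates to the current producer (confluent_kafka's Producer.produce
        # takes the topic plus keyword arguments such as key, value, callback)
        self._producer.produce(topic, **kwargs)

    def flush(self, *args: Any) -> Any:
        return self._producer.flush(*args)

    def fail_over(self, flush_timeout: float = 5.0) -> str:
        # Best effort: let queued messages drain from the old instance first.
        # Anything still queued after flush_timeout seconds is not re-sent.
        self._producer.flush(flush_timeout)
        self._index = (self._index + 1) % len(self._clusters)
        self._producer = self._factory(self.active_cluster)
        return self.active_cluster
```

With confluent_kafka, it could be built as FailoverProducer(['clusterA:32531', 'clusterB:30804'], lambda servers: Producer({'bootstrap.servers': servers})). The sketch does not replay messages that were stuck in the old instance. Consumers would need similar handling, plus a plan for offsets, since independent clusters do not share committed offsets.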
Other options are:
Put Kafka behind a load balancer (LB) or a virtual IP (VIP). This is not recommended, because Kafka clients by design need a direct connection to each broker.
Configure a shared store (for example memcached or Redis) that holds the bootstrap server config. Each client fetches bootstrap.servers from the store at startup. On failure, you change the value in the store and restart your clients. This makes the operation quite easy.
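The shared-store approach can be sketched as a helper that reads bootstrap.servers from the store when the client starts. The sketch makes two assumptions:

- The key name kafka:bootstrap.servers is hypothetical.
- The store can be any object with a .get(key) method. A redis-py client fits, since its get() returns bytes, or None for a missing key. A plain dict also fits, which keeps the helper testable without a server.

```python
from typing import Any, Dict, Optional

# Hypothetical key under which operators store the active cluster's bootstrap list
BOOTSTRAP_KEY = "kafka:bootstrap.servers"


def load_bootstrap_servers(store: Any, key: str = BOOTSTRAP_KEY,
                           default: Optional[str] = None) -> str:
    """Read the bootstrap.servers string from a key-value store."""
    value = store.get(key)
    if value is None:
        if default is None:
            raise KeyError(f"no bootstrap servers stored under {key!r}")
        return default
    if isinstance(value, bytes):  # redis-py returns bytes by default
        value = value.decode("utf-8")
    return value.strip()


def build_client_config(store: Any, extra: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Client config whose bootstrap.servers comes from the store."""
    config: Dict[str, Any] = {"bootstrap.servers": load_bootstrap_servers(store)}
    config.update(extra or {})
    return config
```

At startup, a client would then be created with something like Producer(build_client_config(redis.Redis())). Failing over still means changing the stored value and restarting the clients.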
How to use kafka-python to send a customized payload?
I have two Ubuntu machines on the same Wi-Fi network (one at 172.20.10.2, the other at 172.20.10.7). Using those IPs, I can successfully run the DeepStream test4 Python script, which transmits the detected bounding-box info through Kafka. But I want a customized payload...
So I tried some kafka-python scripts.
For producer:
from time import sleep
from json import dumps
from kafka import KafkaProducer
producer = KafkaProducer(bootstrap_servers=['172.20.10.7:9092'])
for e in range(100):
    data = {'number' : e}
    producer.send('demo01', value=data)
    sleep(1)
For consumer:
from kafka import KafkaConsumer
from json import loads
consumer = KafkaConsumer(
    'demo01',
    bootstrap_servers=['172.20.10.7:9092'],
    auto_offset_reset='earliest',
    enable_auto_commit=True,
    group_id='my-group',
    value_deserializer=lambda x: loads(x.decode('utf-8')))
for message in consumer:
    print(message.value)
It did not work, so I need some suggestions, or executable code if possible!
You need to serialize the producer data as JSON bytes if you want to json.loads it on the consumer side. Without a value_serializer, kafka-python's send() expects the value as bytes:
producer.send('demo01', value=dumps(data).encode('utf-8'))
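Alternatively, kafka-python's KafkaProducer accepts a value_serializer, and KafkaConsumer a value_deserializer, so the dict can be passed to send() unchanged. Below is a minimal sketch of a matching pair; the function names are illustrative:

```python
import json
from typing import Any


def serialize_json(value: Any) -> bytes:
    """For KafkaProducer(value_serializer=...): Python object -> UTF-8 JSON bytes."""
    return json.dumps(value).encode("utf-8")


def deserialize_json(raw: bytes) -> Any:
    """For KafkaConsumer(value_deserializer=...): UTF-8 JSON bytes -> Python object."""
    return json.loads(raw.decode("utf-8"))
```

With KafkaProducer(bootstrap_servers=['172.20.10.7:9092'], value_serializer=serialize_json), the original loop can keep calling producer.send('demo01', value=data). The consumer's existing lambda is equivalent to deserialize_json. Because send() is asynchronous, calling producer.flush() after the loop ensures nothing is left in the buffer when the script exits.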
I am stuck on an issue with a Kafka consumer that uses confluent-kafka's Python library.
CONTEXT
I have a Kafka topic on AWS EC2 that I need to consume.
SCENARIO
The consumer script (my_topic_consumer.py) uses confluent-kafka-python to create a consumer (shown below) and subscribe to the 'my_topic' topic. The issue is that the consumer is not able to read messages from the Kafka cluster.
All required security steps have been taken:
1. SSL is the security protocol for both the consumer and the broker.
2. The consumer EC2 instance's IP block has been added to the cluster's Security Group.
# my_topic_consumer.py
from confluent_kafka import Consumer, KafkaError
c = Consumer({
    'bootstrap.servers': 'my_host:my_port',
    'group.id': 'my_group',
    'auto.offset.reset': 'latest',
    'security.protocol': 'SSL',
    'ssl.ca.location': '/path/to/certificate.pem'
})
c.subscribe(['my_topic'])
try:
    while True:
        msg = c.poll(5)
        if msg is None:
            print('None')
            continue
        if msg.error():
            # Print the KafkaError itself rather than the message object
            print(msg.error())
            continue
        else:
            # process_msg(msg) - Writes messages to a data file.
            pass
except KeyboardInterrupt:
    print('Aborted by user\n')
finally:
    c.close()
URLS
Broker Host: my_host
Port: my_port
Group ID: my_group
CONSOLE COMMANDS
Working: running the console consumer, I am able to see the data:
kafka-console-consumer --bootstrap-server my_host:my_port --consumer.config client-ssl.properties --skip-message-on-error --topic my_topic | jq
Note: client-ssl.properties points to the JKS file that contains the certs.
Debugging further on the Kafka cluster (a separate EC2 instance from the consumer), I couldn't see my consumer registered under my group_id (my_group):
kafka-consumer-groups --bootstrap-server my_host:my_port --command-config client-ssl.properties --describe --group my_group
This leads me to believe the consumer is not getting registered on the cluster, so maybe the SSL handshake is failing? How do I check this from the consumer side in Python?
Note
- The cluster is behind a corporate proxy, but I do start the proxy on the consumer EC2 instance before testing.
- I ran the process via pm2, yet didn't see any error logs (request timeouts, etc.).
Is there any way to check definitively whether consumer creation is failing and find the root cause? Any help and feedback is appreciated.
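One way to get a definite answer from the Python side is to turn on librdkafka's debug logging and register an error callback. Both are standard confluent-kafka configuration properties.

The sketch assumes that the debug contexts security (SSL handshake), broker (connection attempts) and cgrp (consumer-group join) are the relevant ones. The helper names are hypothetical. librdkafka writes debug lines to stderr by default, and error_cb is served from inside poll().

```python
import sys
from typing import Any, Dict


def with_debugging(base: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of a confluent-kafka config with diagnostics switched on."""

    def on_error(err: Any) -> None:
        # Receives a KafkaError for client-level problems, e.g. SSL handshake
        # failures or all brokers being unreachable
        print(f"kafka client error: {err}", file=sys.stderr)

    config = dict(base)
    # librdkafka debug contexts: 'security' logs the SSL handshake,
    # 'broker' logs connection attempts, 'cgrp' logs the consumer-group join
    config["debug"] = "security,broker,cgrp"
    config["error_cb"] = on_error
    return config


def consume_with_debugging(base: Dict[str, Any], topic: str) -> None:
    # Imported here so with_debugging() can be used without confluent_kafka installed
    from confluent_kafka import Consumer

    consumer = Consumer(with_debugging(base))
    consumer.subscribe([topic])
    try:
        while True:
            msg = consumer.poll(5)  # also serves error_cb
            if msg is None:
                continue
            if msg.error():
                print(f"message error: {msg.error()}", file=sys.stderr)
                continue
            print(msg.value())
    finally:
        consumer.close()
```

Calling consume_with_debugging with the same dictionary as in my_topic_consumer.py should show whether the TLS handshake completes and whether the consumer ever joins my_group.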
First, I'd like to say that I am new to Kafka and to Stack Overflow, so I'm sorry if I am not asking this the right way.
I am trying to implement a producer and consumer using kafka-python, but it's not working properly.
I have ZooKeeper installed, and it's up and running. I have started the Kafka server as well. But when I run the consumer and producer through PyCharm, the consumer does not receive the messages. The consumer keeps running, but the producer stops.
consumer.py
from kafka import KafkaConsumer
consumer = KafkaConsumer('test', group_id='test-consumer-group',
                         bootstrap_servers=['my_ip:9092'], api_version=(0, 10, 1),
                         auto_offset_reset='earliest')
print("Consuming messages")
for msg in consumer:
    print(msg)
producer.py
from kafka import KafkaProducer
print('above producer')
producer = KafkaProducer(bootstrap_servers=['my_ip:9092'], api_version=(0, 10, 1),
                         compression_type=None)
print('after producer')
for _ in range(100):
    producer.send('test', b'HELLO NITHIN chandran')
print('after sending messages')
In place of my_ip, I used my system's IP address from ipconfig.
consumer.py Output -
Consuming messages
consumer.py doesn't stop running.
producer.py Output -
above producer
after producer
after sending messages
Process finished with exit code 0
The producer.py stops running and the process is finished as shown in the output.
Please help me resolve this issue.
All help is appreciated.
Your code is OK; the problem is your broker configuration. Reset it to the initial configuration and change only log.dirs to the path where you want to store Kafka data.
After changing the config file, follow these steps:
1. Stop ZooKeeper and Kafka.
2. Clear both the Kafka and ZooKeeper data directories.
3. Start ZooKeeper and Kafka.
4. Start the consumer and producer.
I am having trouble making KafkaConsumer read from the beginning, or from any other explicit offset.
Running the command-line consumer tool on the same topic, I do see messages with the --from-beginning option; without it, it hangs:
$ ./kafka-console-consumer.sh --zookeeper {localhost:port} --topic {topic_name} --from-beginning
If I run it through Python, it hangs, which I suspect is caused by incorrect consumer configs:
consumer = KafkaConsumer(topic_name,
                         bootstrap_servers=['localhost:9092'],
                         group_id=None,
                         auto_commit_enable=False,
                         auto_offset_reset='smallest')
print "Consuming messages from the given topic"
for message in consumer:
    print "Message", message
    if message is not None:
        print message.offset, message.value
print "Quit"
Output:
Consuming messages from the given topic
(hangs after that)
I am using kafka-python 0.9.5, and the broker runs Kafka 0.8.2. I'm not sure what the exact problem is.
Set group_id=None, as suggested by dpkp, to emulate the behavior of the console consumer.
The difference between the console consumer and the Python consumer code you posted is that the Python consumer uses a consumer group to save offsets: group_id="test-consumer-group". If you instead set group_id=None, you should see the same behavior as the console consumer.
I ran into the same problem: I could receive messages in the Kafka console consumer but not with a Python script using the kafka-python package.
Finally, I figured out that the reason was that I didn't call producer.flush() and producer.close() in my producer.py, which is not mentioned in its documentation.
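Below is a minimal sketch of that fix for kafka-python. send() only queues the record and returns a future, so the script should flush before it exits. The helper name send_all is hypothetical. The producer is passed in, so the sketch does not open a connection itself.

```python
from typing import Any, Iterable, List


def send_all(producer: Any, topic: str, payloads: Iterable[bytes],
             timeout: float = 10.0) -> List[Any]:
    """Send every payload, then block until each one has been acknowledged."""
    # KafkaProducer.send() is asynchronous: it only appends to a local buffer
    futures = [producer.send(topic, payload) for payload in payloads]
    producer.flush()  # block until all buffered records have been sent
    # future.get() returns the RecordMetadata, or raises if that record failed
    return [future.get(timeout=timeout) for future in futures]
```

In producer.py, that would be send_all(producer, 'test', [b'HELLO NITHIN chandran'] * 100) followed by producer.close().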
auto_offset_reset='earliest' solved it for me.
auto_offset_reset='earliest' and group_id=None solved it for me.
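The settings from the answers above can be collected into one set of keyword arguments. This sketch targets kafka-python 1.x and later; the 0.9.x version used in the question spells these auto_commit_enable and 'smallest'. consumer_timeout_ms is an added assumption: it makes iteration stop after a quiet period instead of appearing to hang. The helper name is hypothetical.

```python
from typing import Any, Dict, List


def console_like_consumer_kwargs(servers: List[str],
                                 idle_timeout_ms: int = 10_000) -> Dict[str, Any]:
    """kafka-python KafkaConsumer kwargs that mimic kafka-console-consumer --from-beginning."""
    return {
        "bootstrap_servers": servers,
        "group_id": None,                 # no consumer group, so no committed offsets to resume from
        "auto_offset_reset": "earliest",  # with no committed offset, start at the oldest message
        "enable_auto_commit": False,      # nothing to commit without a group
        "consumer_timeout_ms": idle_timeout_ms,  # end the for-loop after this much silence
    }
```

These would be used as KafkaConsumer(topic_name, **console_like_consumer_kwargs(['localhost:9092'])).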
My take: print the offsets and make sure they are what you expect, using position() and seek_to_beginning(); see the comments in the code.
What I can't explain:
Why, after instantiating KafkaConsumer, are the partitions not assigned? Is this by design? The workaround is to call poll() once before seek_to_beginning().
Why, sometimes after seek_to_beginning(), does the first call to poll() return no data and not change the offset?
Code:
import kafka
print(kafka.__version__)
from kafka import KafkaProducer, KafkaConsumer
from time import sleep
KAFKA_URL = 'localhost:9092' # kafka broker
KAFKA_TOPIC = 'sida3_sdtest_topic' # topic name
# ASSUMING THAT the topic exist
# write to the topic
producer = KafkaProducer(bootstrap_servers=[KAFKA_URL])
for i in range(20):
    producer.send(KAFKA_TOPIC, ('msg' + str(i)).encode())
producer.flush()
# read from the topic
# auto_offset_reset='earliest', # auto_offset_reset is needed when offset is not found, it's NOT what we need here
consumer = KafkaConsumer(KAFKA_TOPIC,
                         bootstrap_servers=[KAFKA_URL],
                         max_poll_records=2,
                         group_id='sida3'
                         )
# (!?) why do we need this to get partitions assigned?
# AssertionError: No partitions are currently assigned if poll() is not called
consumer.poll()
consumer.seek_to_beginning()
# also AssertionError: No partitions are currently assigned if poll() is not called
print('partitions of the topic: ', consumer.partitions_for_topic(KAFKA_TOPIC))
from kafka import TopicPartition
print('before poll() x2: ')
print(consumer.position(TopicPartition(KAFKA_TOPIC, 0)))
print(consumer.position(TopicPartition(KAFKA_TOPIC, 1)))
# (!?) sometimes the first call to poll() returns nothing and doesnt change the offset
messages = consumer.poll()
sleep(1)
messages = consumer.poll()
print('after poll() x2: ')
print(consumer.position(TopicPartition(KAFKA_TOPIC, 0)))
print(consumer.position(TopicPartition(KAFKA_TOPIC, 1)))
print('messages: ', messages)
Output:
2.0.1
partitions of the topic: {0, 1}
before poll() x2:
0
0
after poll() x2:
0
2
messages: {TopicPartition(topic='sida3_sdtest_topic', partition=1): [ConsumerRecord(topic='sida3_sdtest_topic', partition=1, offset=0, timestamp=1600335075864, timestamp_type=0, key=None, value=b'msg0', headers=[], checksum=None, serialized_key_size=-1, serialized_value_size=4, serialized_header_size=-1), ConsumerRecord(topic='sida3_sdtest_topic', partition=1, offset=1, timestamp=1600335075864, timestamp_type=0, key=None, value=b'msg1', headers=[], checksum=None, serialized_key_size=-1, serialized_value_size=4, serialized_header_size=-1)]}
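On the first question: passing a topic to the KafkaConsumer constructor subscribes through the consumer group. The group coordinator hands out partitions only during poll(), so nothing is assigned right after construction.

If the goal is simply to read a topic from offset 0, the sketch below assigns partitions manually and so avoids the group join entirely. It assumes kafka-python, and read_from_beginning is a hypothetical helper.

On the second question: poll() can return an empty dict when no fetch has completed within timeout_ms, which may explain the empty first poll. The loop therefore keeps polling until several consecutive polls come back empty.

```python
from typing import Any, Dict, List


def read_from_beginning(consumer: Any, topic: str, timeout_ms: int = 1000,
                        max_empty_polls: int = 3) -> List[Any]:
    """Manually assign every partition of `topic`, rewind, and drain it."""
    from kafka import TopicPartition  # kafka-python

    partition_ids = consumer.partitions_for_topic(topic) or set()
    partitions = [TopicPartition(topic, p) for p in sorted(partition_ids)]
    if not partitions:
        return []  # unknown topic: seek_to_beginning() would have nothing to seek
    consumer.assign(partitions)  # takes effect immediately, no group join needed
    consumer.seek_to_beginning(*partitions)

    records: List[Any] = []
    empty_polls = 0
    while empty_polls < max_empty_polls:
        batch: Dict[Any, List[Any]] = consumer.poll(timeout_ms=timeout_ms)
        if not batch:
            empty_polls += 1  # an empty poll does not necessarily mean the end
            continue
        empty_polls = 0
        for partition_records in batch.values():
            records.extend(partition_records)
    return records
```

Create the consumer without a topic argument, for example KafkaConsumer(bootstrap_servers=[KAFKA_URL], group_id=None). kafka-python does not allow mixing subscribe() and assign() on the same consumer.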
I faced the same issue before, so I ran kafka-topics on the machine running the code as a test and got an UnknownHostException. After I added the IP and host name to the hosts file, both kafka-topics and the code worked fine.
It seems that KafkaConsumer was trying to fetch the messages but failed without raising any exceptions.
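The same check can be done from Python on the client machine, without the Kafka CLI: resolve the host name the broker advertises and try a plain TCP connection to it. This sketch uses only the standard library, and check_broker_address is a hypothetical name. The host and port you pass in should come from the broker's advertised listeners. A successful check does not prove that the Kafka protocol or TLS setup is correct.

```python
import socket
from typing import Optional


def check_broker_address(host: str, port: int, timeout: float = 3.0) -> Optional[str]:
    """Return None if host:port resolves and accepts TCP, else a short reason."""
    try:
        socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        # Python-side equivalent of the UnknownHostException:
        # fix DNS or add the host name to the hosts file
        return f"cannot resolve {host!r}: {exc}"
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return None
    except OSError as exc:
        return f"cannot connect to {host}:{port}: {exc}"
```

Run on the client machine with each advertised host and port, a non-None result points at name resolution or network reachability rather than the consumer code.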
In my case, I had to specify the router's IP in Kafka's PLAINTEXT listener configuration.
Get the router's IP with:
echo $(ifconfig | grep -E "([0-9]{1,3}\.){3}[0-9]{1,3}" | grep -v 127.0.0.1 | awk '{ print $2 }' | cut -f2 -d: | head -n1)
Then add PLAINTEXT_HOST://<router_ip>:9092 to Kafka's advertised listeners. For a Confluent Docker service, the configuration is as follows:
kafka:
  image: confluentinc/cp-kafka:7.0.1
  container_name: kafka
  depends_on:
    - zookeeper
  ports:
    - 9092:9092
    - 29092:29092
  environment:
    - KAFKA_BROKER_ID=1
    - KAFKA_ZOOKEEPER_CONNECT=zookeeper:2181
    - KAFKA_ADVERTISED_LISTENERS=PLAINTEXT://kafka:29092,PLAINTEXT_HOST://172.28.0.1:9092
    - KAFKA_LISTENER_SECURITY_PROTOCOL_MAP=PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT
    - KAFKA_INTER_BROKER_LISTENER_NAME=PLAINTEXT
    - KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR=1
And finally, the Python consumer is:
from kafka import KafkaConsumer
from json import loads
consumer = KafkaConsumer(
    'my-topic',
    bootstrap_servers=['172.28.0.1:9092'],
    auto_offset_reset='earliest',
    group_id=None,
)
print('Listening')
for msg in consumer:
    print(msg)