Kafka producer automatic fail-over/fail-back - python

I would like to know if it's possible to configure 2 different Kafka clusters in a Kafka producer.
Currently I'm trying to have my producers and consumers fail over and fail back automatically to a passive cluster without reconfiguring bootstrap.servers and restarting their applications.
I'm using Apache Kafka 2.8 and the confluent_kafka==1.8.2 package with Python 3.7.
Below is the producer code:
from csv import reader  # needed for reader() below
from time import sleep
from confluent_kafka import Producer

p = Producer({'bootstrap.servers': 'clusterA:32531, clusterB:30804'})

def delivery_report(err, msg):
    """Called once for each message produced to indicate delivery result.
    Triggered by poll() or flush()."""
    if err is not None:
        print('Message delivery failed: {}'.format(err))
    else:
        print(f'Message delivered to {msg.offset()}')

with open('test_data.csv', 'r') as read_obj:
    csv_reader = reader(read_obj)
    header = next(csv_reader)
    # Check whether the file is empty
    if header is not None:
        # Iterate over each row after the header in the csv
        for row in csv_reader:
            sleep(0.02)
            p.produce(topic='demo', key=row[5], value=str(row), callback=delivery_report)

p.flush()
When I killed clusterB, I got the following error messages:
%4|1643837239.074|CLUSTERID|rdkafka#producer-1| [thrd:main]: Broker clusterA:32531/bootstrap reports different ClusterId "MLWCRsVXSxOf2YGPRIivjA" than previously known "6ZtcQCRPQ5msgeD3r7I11w": a client must not be simultaneously connected to multiple clusters
%3|1643837240.995|FAIL|rdkafka#producer-1| [thrd:clusterB:30804/bootstrap]: 172.27.176.222:30804/bootstrap: Connect to ipv4#clusterB:30804 failed: Unknown error (after 2044ms in state CONNECT)

At the moment, you will have to update the bootstrap configuration to point to the secondary cluster manually, and this requires restarting the client to fail over.
Programmatically, in order to connect to a separate cluster you will have to stop the current producer instance and start a new instance with the new bootstrap server config. However, this can get quite complicated.
Other options are:
Configure Kafka behind a load balancer or a VIP (not recommended, because Kafka clients by nature need direct connections to the individual brokers).
Configure a shared store (memcached or Redis) where you keep the bootstrap server config. Your clients fetch the bootstrap servers from the store at startup; during a failure you change the value in the store and restart your clients, which makes the operation quite easy. A sketch of this approach is shown below.
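For illustration, a minimal sketch of the shared-store option using Redis; the key name and hosts are assumptions, not part of the original answer:
import redis
from confluent_kafka import Producer

BOOTSTRAP_KEY = 'kafka:bootstrap.servers'  # hypothetical key an operator updates on failover

def build_producer():
    # The Redis host below is a placeholder.
    r = redis.Redis(host='redis-host', port=6379, decode_responses=True)
    bootstrap = r.get(BOOTSTRAP_KEY)  # e.g. 'clusterA:32531' or 'clusterB:30804'
    if bootstrap is None:
        raise RuntimeError('No bootstrap servers configured in the shared store')
    return Producer({'bootstrap.servers': bootstrap})

p = build_producer()
On failover, an operator (or an automated health check) sets the key to the passive cluster's address and the clients are restarted; each client then bootstraps against a single cluster, which also avoids the "different ClusterId" error shown above.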

Related

confluent-kafka-python consumer unable to read messages

I am stuck with an issue related to a Kafka consumer using confluent-kafka's Python library.
CONTEXT
I have a Kafka topic on AWS EC2 that I need to consume.
SCENARIO
Consumer Script (my_topic_consumer.py) uses confluent-kafka-python to create a consumer (shown below) and subscribe to the 'my_topic' topic. The issue is that the consumer is not able to read messages from the Kafka cluster.
All required security steps are met:
1. SSL - security protocol for the consumer and broker.
2. The consumer EC2 instance's IP block has been added to the Security Group on the cluster.
#my_topic_consumer.py
from confluent_kafka import Consumer, KafkaError

c = Consumer({
    'bootstrap.servers': 'my_host:my_port',
    'group.id': 'my_group',
    'auto.offset.reset': 'latest',
    'security.protocol': 'SSL',
    'ssl.ca.location': '/path/to/certificate.pem'
})
c.subscribe(['my_topic'])

try:
    while True:
        msg = c.poll(5)
        if msg is None:
            print('None')
            continue
        if msg.error():
            print(msg)
            continue
        else:
            pass  # process_msg(msg) - writes messages to a data file
except KeyboardInterrupt:
    print('Aborted by user\n')
finally:
    c.close()
URLS
Broker Host: my_host
Port: my_port
Group ID: my_group
CONSOLE COMMANDS
Working: running the console consumer, I am able to see the data:
kafka-console-consumer --bootstrap-server my_host:my_port --consumer.config client-ssl.properties --skip-message-on-error --topic my_topic | jq
Note: client-ssl.properties points to the JKS file that has the certs.
Debugging further on the Kafka cluster (a separate EC2 instance from the consumer), I couldn't see any registration of my consumer under its group id (my_group):
kafka-consumer-groups --bootstrap-server my_host:my_port --command-config client-ssl.properties --describe --group my_group
This leads me to believe the consumer is not getting registered on the cluster, so maybe the SSL handshake is failing? How do I check this from the consumer side in Python?
Notes:
- The cluster is behind a (corporate) proxy, but I do run the proxy on the consumer EC2 instance before testing.
- I ran the process via pm2, yet didn't see any error logs such as request timeouts.
Is there any way I can check definitively whether the consumer creation is failing and find out the root cause? Any help and feedback is appreciated.
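One way to surface SSL handshake and broker connection failures from the Python side is to enable librdkafka's debug logging and register an error callback when constructing the consumer. A minimal sketch reusing the question's configuration (the chosen debug categories are just a suggestion):
from confluent_kafka import Consumer

def on_error(err):
    # Called for global client errors, e.g. broker transport or SSL failures.
    print('Consumer error: {}'.format(err))

c = Consumer({
    'bootstrap.servers': 'my_host:my_port',
    'group.id': 'my_group',
    'auto.offset.reset': 'latest',
    'security.protocol': 'SSL',
    'ssl.ca.location': '/path/to/certificate.pem',
    'debug': 'security,broker',  # verbose librdkafka logs for TLS and broker state
    'error_cb': on_error,
})
With debug enabled, librdkafka prints the TLS negotiation and broker state changes to stderr, which usually makes a failing handshake (wrong CA file, hostname mismatch, a proxy terminating the connection) visible immediately.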

Kafka Consumer not reading messages

I have Kafka v1.0.1 running on a single node and I am able to push messages to the topic, but I am somehow unable to consume them from another node using the Python code below.
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    'kotak-test',
    bootstrap_servers=['kmblhdpedge:9092'],
    auto offset reset='earliest',
    enable auto commit=True,
    group id=' test1',
    value_deserializer-lambda x: loads (x.decode('utf-8')))

for message in consumer:
    message = message.value
    print (message)
I am constantly pushing the messages from the console using the below command:
bin/kafka-console-producer --zookeeper <zookeeper-node>:<port> --topic <topic_name>
and I can also read the messages via the console consumer.
You're using the old Zookeeper-based producer, but the newer Kafka-based consumer. The logic for how these work and store offsets is not the same.
You need to use --broker-list on the Console Producer
Similarly with Console Consumer, use --bootstrap-server, not --zookeeper
Also, these properties should not have spaces in them:
auto offset reset='earliest',
enable auto commit=True,
group id=' test1',
In kafka-python they are written with underscores: auto_offset_reset, enable_auto_commit, group_id.
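Putting that together, a minimal sketch of what the corrected consumer could look like (topic and broker names are taken from the question; the json import for loads and the removal of the stray space in the group id are assumptions):
from json import loads
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    'kotak-test',
    bootstrap_servers=['kmblhdpedge:9092'],
    auto_offset_reset='earliest',
    enable_auto_commit=True,
    group_id='test1',
    value_deserializer=lambda x: loads(x.decode('utf-8')))

for message in consumer:
    print(message.value)
The console producer would then be started with --broker-list <broker>:9092 instead of --zookeeper.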

How to connect to a RabbitMQ cluster with a Python Client using pika?

I have a Python client that uses the Pika package (0.9.13) and retrieves data from one node in a RabbitMQ cluster. The cluster is composed of two nodes placed on two different hosts (url_1 and url_2). How can I make my Python client subscribe to both nodes?
That is the main structure of my code:
import pika

credentials = pika.PlainCredentials(user, password)
connection = pika.BlockingConnection(pika.ConnectionParameters(host=url_1,
                                                               credentials=credentials, ssl=ssl, port=port))
channel = connection.channel()
channel.exchange_declare(exchange=exchange.name,
                         type=exchange.type, durable=exchange.durable)
result = channel.queue_declare(queue=queue.name, exclusive=queue.exclusive,
                               durable=queue.durable, auto_delete=queue.autoDelete)
channel.queue_bind(exchange=exchange.name, queue=queue.name,
                   routing_key=binding_key)
channel.basic_consume(callback,
                      queue=queue.name,
                      no_ack=True)
channel.start_consuming()
If you have a cluster, it does not matter which node you connect to.
Usually, to solve this typical problem, it is enough to configure a simple load balancer and connect the clients to the LB:
clients -----> LB ------> rabbitmq(s) instances cluster
EDIT:
Use a load balancer such as HAProxy or, for example, http://crossroads.e-tunity.com/; they are light and simple to use.
I'd also like to add this: RabbitMQ Client connect to several hosts
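If a load balancer is not an option, a common client-side alternative is to try each node in turn and reconnect to the other on failure. A minimal sketch, keeping the pika 0.9.x ConnectionParameters signature used in the question (url_1, url_2, credentials, port and ssl come from the question's code; the helper itself is just an illustration):
import pika
from pika.exceptions import AMQPConnectionError

HOSTS = [url_1, url_2]  # the two cluster nodes from the question

def connect(credentials, port, use_ssl):
    # Try each node in order and return the first connection that succeeds.
    for host in HOSTS:
        try:
            params = pika.ConnectionParameters(host=host, port=port,
                                               credentials=credentials, ssl=use_ssl)
            return pika.BlockingConnection(params)
        except AMQPConnectionError:
            continue  # this node is down, try the next one
    raise RuntimeError('No RabbitMQ node reachable')
Newer pika releases also accept a list of ConnectionParameters in BlockingConnection and try them in order, which achieves the same effect.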

Setting up Rabbit MQ Heartbeat with Kombu

Edit:
The main issue is that the 3rd-party RabbitMQ machine seems to kill idle connections every now and then. That's when I start getting "Broken Pipe" exceptions. The only way to get communication back to normal is for me to kill the processes and restart them. I assume there's a better way?
--
I'm a little lost here. I am connecting to a 3rd-party RabbitMQ server to push messages to. Every now and then all the sockets on their machine get dropped and I end up getting a "Broken Pipe" exception.
I've been told to implement a heartbeat check in my code but I'm not sure how exactly. I've found some info here: http://kombu.readthedocs.org/en/latest/changelog.html#version-2-3-0 but no real example code.
Do I only need to add "?heartbeat=x" to the connection string? Does Kombu do the rest? I see I need to call "Connection.heartbeat_check()" at "x/2". Should I create a periodic task to call this? How does the connection get re-established?
I'm using:
celery==3.0.12
kombu==2.5.4
My code currently looks like this. A simple Celery task gets called to send the message through to the 3rd-party RabbitMQ server (logging and comments removed to keep it short):
# Imports implied by the snippet (celery 3.0 / kombu 2.5 era)
from celery import Task
from kombu import BrokerConnection
from kombu.pools import producers

class SendMessageTask(Task):
    name = "campaign.backends.send"
    routing_key = "campaign.backends.send"
    ignore_result = True
    default_retry_delay = 60  # 1 minute.
    max_retries = 5

    def run(self, send_to, message, **kwargs):
        payload = "Testing message"
        try:
            conn = BrokerConnection(
                hostname=HOSTNAME,
                port=PORT,
                userid=USER_ID,
                password=PASSWORD,
                virtual_host=VHOST
            )
            with producers[conn].acquire(block=True) as producer:
                publish = conn.ensure(producer, producer.publish, errback=sending_errback, max_retries=3)
                publish(
                    body=payload,
                    routing_key=OUT_ROUTING_KEY,
                    delivery_mode=2,
                    exchange=EXCHANGE,
                    serializer=None,
                    content_type='text/xml',
                    content_encoding='utf-8'
                )
        except Exception, ex:
            print ex
Thanks for any and all help.
While you certainly can add heartbeat support to a producer, it makes more sense for consumer processes.
Enabling heartbeats means that you have to send heartbeats regularly; e.g. if the heartbeat is set to 1 second, then you have to send a heartbeat at least every second, or the remote will close the connection.
This means that you have to use a separate thread or async I/O to reliably send heartbeats in time, and since a connection cannot be shared between threads, this leaves us with async I/O.
The good news is that you probably won't get much benefit from adding heartbeats to a produce-only connection.
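To make the mechanics concrete, here is a rough sketch of heartbeat handling on a consume-only connection with kombu; the broker URL, queue name and intervals are assumptions, not taken from the question:
import socket
from kombu import Connection, Queue

# Hypothetical broker URL; the heartbeat interval is negotiated with the server in seconds.
conn = Connection('amqp://user:password@broker-host:5672//', heartbeat=10)
conn.connect()

queue = Queue('example-queue')

def on_message(body, message):
    print(body)
    message.ack()

with conn.Consumer(queue, callbacks=[on_message]):
    while True:
        try:
            # Wake up well within heartbeat/2 so heartbeats can be sent and
            # checked even when no messages arrive.
            conn.drain_events(timeout=4)
        except socket.timeout:
            conn.heartbeat_check()
If the server drops the connection anyway, Connection.ensure() / ensure_connection() (the question's snippet already uses conn.ensure() for publishing) is the usual way to have it re-established on the next operation.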

tornado - transferring a file to cdn without blocking

I have the nginx upload module handling site uploads, but I still need to transfer files (say 3-20 MB each) to our CDN, and would rather not delegate that to a background job.
What is the best way to do this with Tornado without blocking other requests? Can I do this in an async callback?
You may find it useful in the overall architecture of your site to add a message queuing service such as RabbitMQ.
This would let you complete the upload via the nginx module, then in the tornado handler, post a message containing the uploaded file path and exit. A separate process would be watching for these messages and handle the transfer to your CDN. This type of service would be useful for many other tasks that could be handled offline ( sending emails, etc.. ). As your system grows, this also provides you a mechanism to scale by moving queue processing to separate machines.
I am using an architecture very similar to this. Just make sure to add your message consumer process to supervisord or whatever you are using to manage your processes.
In terms of implementation, if you are on Ubuntu installing RabbitMQ is a simple:
sudo apt-get install rabbitmq-server
On CentOS with the EPEL repositories:
yum install rabbitmq-server
There are a number of Python bindings to RabbitMQ. Pika is one of them and it happens to be created by an employee of LShift, who is responsible for RabbitMQ.
Below is a bit of sample code from the Pika repo. You can easily imagine how the handle_delivery method would accept a message containing a filepath and push it to your CDN.
import sys
import pika
import asyncore

conn = pika.AsyncoreConnection(pika.ConnectionParameters(
    sys.argv[1] if len(sys.argv) > 1 else '127.0.0.1',
    credentials=pika.PlainCredentials('guest', 'guest')))
print 'Connected to %r' % (conn.server_properties,)

ch = conn.channel()
ch.queue_declare(queue="test", durable=True, exclusive=False, auto_delete=False)

should_quit = False

def handle_delivery(ch, method, header, body):
    print "method=%r" % (method,)
    print "header=%r" % (header,)
    print "  body=%r" % (body,)
    ch.basic_ack(delivery_tag=method.delivery_tag)
    global should_quit
    should_quit = True

tag = ch.basic_consume(handle_delivery, queue='test')
while conn.is_alive() and not should_quit:
    asyncore.loop(count=1)
if conn.is_alive():
    ch.basic_cancel(tag)
conn.close()
print conn.connection_close
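For completeness, the publish side in the Tornado handler could look roughly like this; the queue name and the request argument holding the uploaded file's path are assumptions about your nginx upload module setup, not part of the original answer:
import pika
import tornado.web

class UploadHandler(tornado.web.RequestHandler):
    def post(self):
        # The nginx upload module passes the on-disk path of the uploaded file.
        file_path = self.get_argument("file_path")
        conn = pika.BlockingConnection(pika.ConnectionParameters('127.0.0.1'))
        ch = conn.channel()
        ch.queue_declare(queue="test", durable=True, exclusive=False, auto_delete=False)
        # Hand the path to the background consumer that uploads to the CDN.
        ch.basic_publish(exchange='', routing_key='test', body=file_path)
        conn.close()
        self.write("queued")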
Advice on the Tornado Google group points to using an async callback (documented at http://www.tornadoweb.org/documentation#non-blocking-asynchronous-requests) to move the file to the CDN.
The nginx upload module writes the file to disk and then passes parameters describing the upload(s) back to the view. The file therefore isn't in memory, and the time it takes to read it from disk (which blocks that request's own process, but not other Tornado processes, as far as I know) is negligible.
That said, anything that doesn't need to be processed online shouldn't be, and should be deferred to a task queue like celeryd or similar.
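A rough sketch of that async-callback approach using Tornado's AsyncHTTPClient; the CDN endpoint and the argument name carrying the uploaded file's path are assumptions:
import tornado.web
from tornado.httpclient import AsyncHTTPClient

CDN_UPLOAD_URL = "https://cdn.example.com/upload"  # hypothetical CDN endpoint

class TransferHandler(tornado.web.RequestHandler):
    @tornado.web.asynchronous
    def post(self):
        # Path written to disk by the nginx upload module (the argument name
        # depends on your upload_set_form_field configuration).
        file_path = self.get_argument("file_path")
        with open(file_path, "rb") as f:
            body = f.read()
        AsyncHTTPClient().fetch(CDN_UPLOAD_URL, method="PUT", body=body,
                                callback=self.on_cdn_response)

    def on_cdn_response(self, response):
        if response.error:
            self.set_status(502)
        self.finish()
The synchronous open()/read() of the uploaded file is the same negligible disk read mentioned above; only the transfer to the CDN happens without blocking the process.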
