I have been asked to use Azure Service Bus instead of Celery in a Django application.
I read the documentation provided but didn't get a clear picture of how to use Service Bus in place of a Celery task. Any advice would be of great help.
Before getting into it, I would like to highlight the differences between Azure Service Bus and Celery.
Azure Service Bus :
Microsoft Azure Service Bus is a fully managed enterprise integration message broker.
You could refer to this to learn more about Service Bus.
Celery :
Distributed task queue. Celery is an asynchronous task queue/job queue based on distributed message passing.
I can think of two possibilities in your case:
1. You would like to use Service Bus with Celery in place of other message brokers.
2. You would like to replace Celery with Service Bus completely.
1 : You would like to use Service Bus with Celery in place of other message brokers.
You could refer to this to understand why Celery needs a message broker.
I am not sure which messaging broker you are using currently, but you could use the Kombu library to meet your requirement.
Reference for Azure Service Bus : https://docs.celeryproject.org/projects/kombu/en/stable/reference/kombu.transport.azureservicebus.html
Reference for others : https://docs.celeryproject.org/projects/kombu/en/stable/reference/index.html
2 : Replace Celery with the Service Bus completely
To meet your requirement :
Consider that message senders are producers and message receivers are consumers.
These are two different applications that you will have to work on.
You could refer to the samples below for more code to build on.
https://github.com/Azure/azure-sdk-for-python/tree/master/sdk/servicebus/azure-servicebus/samples
Explanation :
Every time you would like to execute the actions, you could send a message to a topic from the producer client.
The consumer client, the application that is listening, will receive the message and process it. You could attach your custom process to it, so that your custom process gets executed whenever a message is received at the consumer end.
Below is a sample of the receiving client:
# Note: SubscriptionClient belongs to the legacy azure-servicebus 0.50.x API.
from azure.servicebus.aio import SubscriptionClient
import asyncio
import nest_asyncio

nest_asyncio.apply()

Receiving = True

# Topic 1 receiver:
conn_str = "<>"
name = "Allmessages1"
SubsClient = SubscriptionClient.from_connection_string(conn_str, name)
receiver = SubsClient.get_receiver()

async def receive_message_from1():
    await receiver.open()
    print("Opening the Receiver for Topic1")
    async with receiver:
        while Receiving:
            msgs = await receiver.fetch_next()
            for m in msgs:
                print("Received the message from topic 1.....")
                ##### - Your code to execute when a message is received - ########
                print(str(m))
                ##### - Your code to execute when a message is received - ########
                await m.complete()

loop = asyncio.get_event_loop()
topic1receiver = loop.create_task(receive_message_from1())
The section between the two marker lines shown below is the code that will be executed every time a message is received.
##### - Your code to execute when a message is received - ########
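To make the "replace Celery completely" option more concrete: where Celery would serialize a task call for you, the producer can put a task name and its arguments into the message body, and the consumer can look that name up in a registry and run it. The following is a minimal, transport-agnostic sketch of that idea using only the standard library. The names `task`, `encode_task`, `handle_message` and `send_welcome_email` are hypothetical, not part of Celery or the Azure SDK, and the actual sending and receiving over Service Bus is left out.

```python
import json

# Registry mapping task names to plain Python callables (hypothetical helper).
TASKS = {}

def task(func):
    """Register a function so the consumer can look it up by name."""
    TASKS[func.__name__] = func
    return func

def encode_task(name, **kwargs):
    """Producer side: build the message body to send to the topic."""
    return json.dumps({"task": name, "kwargs": kwargs})

def handle_message(body):
    """Consumer side: decode a message body and run the matching task."""
    payload = json.loads(body)
    func = TASKS[payload["task"]]  # raises KeyError for unknown task names
    return func(**payload["kwargs"])

@task
def send_welcome_email(user_id):
    # Placeholder for the work a Celery task would have done.
    return "welcome email queued for user {}".format(user_id)

# The producer (e.g. a Django view) would send `body` to the topic;
# the consumer would call handle_message() on each received message body.
body = encode_task("send_welcome_email", user_id=42)
print(handle_message(body))
```

In the receiving client, the call to `handle_message` would sit between the two marker comments, before the message is completed. Retries, result storage and scheduling, which Celery provides out of the box, would have to be built separately.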
I am stuck on an issue with a Kafka consumer that uses the confluent-kafka Python library.
CONTEXT
I have a Kafka topic on AWS EC2 that I need to consume.
SCENARIO
Consumer Script (my_topic_consumer.py) uses confluent-kafka-python to create a consumer (shown below) and subscribe to the 'my_topic' topic. The issue is that the consumer is not able to read messages from the Kafka cluster.
All required security steps are met:
1. SSL - security protocol for the consumer and broker.
2. The consumer EC2 IP block has been added to the Security Group on the cluster.
# my_topic_consumer.py
from confluent_kafka import Consumer, KafkaError

c = Consumer({
    'bootstrap.servers': 'my_host:my_port',
    'group.id': 'my_group',
    'auto.offset.reset': 'latest',
    'security.protocol': 'SSL',
    'ssl.ca.location': '/path/to/certificate.pem'
})
c.subscribe(['my_topic'])

try:
    while True:
        msg = c.poll(5)
        if msg is None:
            print('None')
            continue
        if msg.error():
            print(msg)
            continue
        else:
            # process_msg(msg) - Writes messages to a data file.
            pass
except KeyboardInterrupt:
    print('Aborted by user\n')
finally:
    c.close()
URLS
Broker Host: my_host
Port: my_port
Group ID: my_group
CONSOLE COMMANDS
working - Running the console-consumer script, I am able to see the data:
kafka-console-consumer --bootstrap-server my_host:my_port --consumer.config client-ssl.properties --skip-message-on-error --topic my_topic | jq
Note: client-ssl.properties: points to the JKS file which has the certs.
Further debugging on the Kafka cluster (separate EC2 instance from consumer), I couldn't see any registration of my consumer by my group_id (my_group):
kafka-consumer-groups --bootstrap-server my_host:my_port --command-config client-ssl.properties --describe --group my_group
This leads me to believe the consumer is not getting registered on the cluster, so maybe the SSL handshake is failing? How do I check this from the consumer side in Python?
Note
- the cluster is behind a proxy (corporate), but I do run the proxy on the consumer EC2 before testing.
- ran the process via pm2, yet didn't see any error logs like req timeouts etc.
Is there any way I can check that the Consumer creation is failing in a definite way and find out the root cause? Any help and feedback is appreciated.
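One way to get more visibility from the Python side is to turn on the client's debug logging and register an error callback. The sketch below assumes that the installed confluent-kafka version accepts the `debug` and `error_cb` configuration keys and that `Consumer.list_topics()` is available; `on_error`, `build_debug_config` and `check_connection` are hypothetical helper names. The broker address and certificate path are the placeholders from the question.

```python
def on_error(err):
    """Called by the client for global errors such as failed broker connections."""
    print("Kafka client error: {}".format(err))

def build_debug_config(base_config):
    """Return a copy of base_config with debug logging and an error callback added."""
    config = dict(base_config)
    config["debug"] = "security,broker"  # log TLS and broker connection details
    config["error_cb"] = on_error
    return config

def check_connection(config, timeout=10):
    """Try to fetch cluster metadata; expected to raise if no broker can be reached."""
    # Imported here so the helpers above can be used without the library installed.
    from confluent_kafka import Consumer
    consumer = Consumer(build_debug_config(config))
    try:
        metadata = consumer.list_topics(timeout=timeout)
        print("Connected; topics visible:", sorted(metadata.topics))
    finally:
        consumer.close()

base = {
    "bootstrap.servers": "my_host:my_port",
    "group.id": "my_group",
    "security.protocol": "SSL",
    "ssl.ca.location": "/path/to/certificate.pem",
}
debug_config = build_debug_config(base)
```

Calling `check_connection(base)` on the consumer EC2 instance should either print the visible topics or fail within the timeout, and the debug output on stderr typically shows whether the TLS handshake was reached. Note that pm2 may capture stderr in a separate log file.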
I have Kafka v1.0.1 running on a single node and I am able to push messages to the topic, but I am somehow unable to consume them from another node using the Python code below.
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    'kotak-test',
    bootstrap_servers=['kmblhdpedge:9092'],
    auto offset reset='earliest',
    enable auto commit=True,
    group id=' test1',
    value_deserializer=lambda x: loads(x.decode('utf-8')))

for message in consumer:
    message = message.value
    print(message)
I am constantly pushing the messages from the console using the below command:
bin/kafka-console-producer --zookeeper <zookeeper-node>:<port> --topic <topic_name>
and I can also read the messages via the console.
You're using the old Zookeeper-based producer, but the newer Kafka-based consumer. The logic for how these work and store offsets is not the same.
You need to use --broker-list on the Console Producer
Similarly with Console Consumer, use --bootstrap-server, not --zookeeper
Also, these property names should not have spaces in them:
auto offset reset='earliest',
enable auto commit=True,
group id=' test1',
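For reference, here is a sketch of how those keyword arguments could look with underscores, based on the consumer from the question. It assumes the kafka-python package and that the leading space in the group name was unintentional; `CONSUMER_KWARGS` and `consume` are illustrative names.

```python
from json import loads

# Keyword arguments for kafka-python's KafkaConsumer, written with underscores.
CONSUMER_KWARGS = dict(
    bootstrap_servers=['kmblhdpedge:9092'],
    auto_offset_reset='earliest',
    enable_auto_commit=True,
    group_id='test1',  # assumption: the leading space in ' test1' was a typo
    value_deserializer=lambda x: loads(x.decode('utf-8')),
)

def consume(topic='kotak-test'):
    """Print every decoded message value from the topic (blocks until interrupted)."""
    from kafka import KafkaConsumer  # requires the kafka-python package
    consumer = KafkaConsumer(topic, **CONSUMER_KWARGS)
    for message in consumer:
        print(message.value)
```

Calling `consume()` starts the blocking read loop. Note that the deserializer expects every message value to be valid JSON.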
The code for my Kafka consumer looks like this
from time import sleep
from kafka import KafkaConsumer, TopicPartition

# kafka_config and log are defined elsewhere in the application.

def read_messages_from_kafka():
    topic = 'my-topic'
    consumer = KafkaConsumer(
        bootstrap_servers=['my-host1', 'my-host2'],
        client_id='my-client',
        group_id='my-group',
        auto_offset_reset='earliest',
        enable_auto_commit=False,
        api_version=(0, 8, 2)
    )
    consumer.assign([TopicPartition(topic, 0), TopicPartition(topic, 1)])
    messages = consumer.poll(timeout_ms=kafka_config.poll_timeout_ms, max_records=kafka_config.poll_max_records)
    for partition in messages.values():
        for message in partition:
            log.info("read {}".format(message))
    if messages:
        consumer.commit()
    next_offset0, next_offset1 = consumer.position(TopicPartition(topic, 0)), consumer.position(TopicPartition(topic, 1))
    log.info("next offset0={} and offset1={}".format(next_offset0, next_offset1))

while True:
    read_messages_from_kafka()
    sleep(kafka_config.poll_sleep_ms / 1000.0)
I have realised that this consumer setup is not able to read all the messages, and I am not able to reproduce the problem because it is intermittent.
When I compare the last 100 messages from kafkacat with what this consumer read, I find that my consumer intermittently misses a few messages at random. What's wrong with my consumer?
kafkacat -C -b my-host1 -X broker.version.fallback=0.8.2.1 -t my-topic -o -100
There are just too many ways to consume messages in python. There should be one and preferably only one obvious way to do it.
There is a problem with missing messages in your Kafka client.
I found a solution here:
while True:
    raw_messages = consumer.poll(timeout_ms=1000, max_records=5000)
    for topic_partition, messages in raw_messages.items():
        for message in messages:
            application_message = json.loads(message.value.decode())
There is also another Kafka client: confluent_kafka. It has no such problem.
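The shape of the value returned by `poll()` is easy to misread: in kafka-python it is a dict that maps each partition to a list of records, so every list has to be walked in full. The sketch below shows that traversal as a small generator and runs it on stand-in data. `FakePartition`, `FakeRecord` and `iter_payloads` are hypothetical names used only for illustration, and the JSON payloads are made up.

```python
import json
from collections import namedtuple

# Stand-ins for kafka-python's TopicPartition and ConsumerRecord, for illustration only.
FakePartition = namedtuple("FakePartition", ["topic", "partition"])
FakeRecord = namedtuple("FakeRecord", ["offset", "value"])

def iter_payloads(raw_messages):
    """Yield the decoded JSON payload of every record in a poll() result."""
    for topic_partition, records in raw_messages.items():
        for record in records:
            yield json.loads(record.value.decode())

raw = {
    FakePartition("my-topic", 0): [FakeRecord(0, b'{"id": 1}'), FakeRecord(1, b'{"id": 2}')],
    FakePartition("my-topic", 1): [FakeRecord(0, b'{"id": 3}')],
}
payloads = list(iter_payloads(raw))
print(payloads)
```

With a real consumer, the same generator could be fed the result of `consumer.poll(timeout_ms=1000, max_records=5000)` inside the loop.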
I am having trouble getting KafkaConsumer to read from the beginning, or from any other explicit offset.
Running the command line consumer tool for the same topic, I do see messages with the --from-beginning option, and it hangs otherwise.
$ ./kafka-console-consumer.sh --zookeeper {localhost:port} --topic {topic_name} --from-beginning
If I run it through python, it hangs, which I suspect to be caused by incorrect consumer configs
consumer = KafkaConsumer(topic_name,
                         bootstrap_servers=['localhost:9092'],
                         group_id=None,
                         auto_commit_enable=False,
                         auto_offset_reset='smallest')
print "Consuming messages from the given topic"
for message in consumer:
    print "Message", message
    if message is not None:
        print message.offset, message.value
print "Quit"
Output:
Consuming messages from the given topic
(hangs after that)
I am using kafka-python 0.9.5 and the broker runs kafka 8.2. Not sure what the exact problem is.
Set group_id=None, as suggested by dpkp, to emulate the behavior of the console consumer.
The difference between the console-consumer and the python consumer code you have posted is the python consumer uses a consumer group to save offsets: group_id="test-consumer-group" . If instead you set group_id=None, you should see the same behavior as the console consumer.
I ran into the same problem: I could receive messages in the Kafka console but couldn't get any with a Python script using the kafka-python package.
In the end I figured out that the reason was that I didn't call producer.flush() and producer.close() in my producer.py, which is not mentioned in its documentation.
auto_offset_reset='earliest' solved it for me.
auto_offset_reset='earliest' and group_id=None solved it for me.
My take: print the offset and make sure it is what you expect it to be, using position() and seek_to_beginning(). Please see the comments in the code.
I can't explain:
Why the partitions are not assigned right after instantiating KafkaConsumer. Is this by design? A workaround is to call poll() once before seek_to_beginning().
Why, sometimes, the first call to poll() after seek_to_beginning() returns no data and doesn't change the offset.
Code:
import kafka
print(kafka.__version__)
from kafka import KafkaProducer, KafkaConsumer
from time import sleep

KAFKA_URL = 'localhost:9092'  # kafka broker
KAFKA_TOPIC = 'sida3_sdtest_topic'  # topic name
# ASSUMING THAT the topic exist

# write to the topic
producer = KafkaProducer(bootstrap_servers=[KAFKA_URL])
for i in range(20):
    producer.send(KAFKA_TOPIC, ('msg' + str(i)).encode())
producer.flush()

# read from the topic
# auto_offset_reset='earliest', # auto_offset_reset is needed when offset is not found, it's NOT what we need here
consumer = KafkaConsumer(KAFKA_TOPIC,
                         bootstrap_servers=[KAFKA_URL],
                         max_poll_records=2,
                         group_id='sida3'
                         )

# (!?) wtf, why we need this to get partitions assigned
# AssertionError: No partitions are currently assigned if poll() is not called
consumer.poll()
consumer.seek_to_beginning()

# also AssertionError: No partitions are currently assigned if poll() is not called
print('partitions of the topic: ', consumer.partitions_for_topic(KAFKA_TOPIC))

from kafka import TopicPartition
print('before poll() x2: ')
print(consumer.position(TopicPartition(KAFKA_TOPIC, 0)))
print(consumer.position(TopicPartition(KAFKA_TOPIC, 1)))

# (!?) sometimes the first call to poll() returns nothing and doesnt change the offset
messages = consumer.poll()
sleep(1)
messages = consumer.poll()

print('after poll() x2: ')
print(consumer.position(TopicPartition(KAFKA_TOPIC, 0)))
print(consumer.position(TopicPartition(KAFKA_TOPIC, 1)))

print('messages: ', messages)
Output:
2.0.1
partitions of the topic: {0, 1}
before poll() x2:
0
0
after poll() x2:
0
2
messages: {TopicPartition(topic='sida3_sdtest_topic', partition=1): [ConsumerRecord(topic='sida3_sdtest_topic', partition=1, offset=0, timestamp=1600335075864, timestamp_type=0, key=None, value=b'msg0', headers=[], checksum=None, serialized_key_size=-1, serialized_value_size=4, serialized_header_size=-1), ConsumerRecord(topic='sida3_sdtest_topic', partition=1, offset=1, timestamp=1600335075864, timestamp_type=0, key=None, value=b'msg1', headers=[], checksum=None, serialized_key_size=-1, serialized_value_size=4, serialized_header_size=-1)]}
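On the first open point: as far as I understand, when a consumer subscribes with a group_id, kafka-python only joins the group and receives its partitions inside poll(), so an empty assignment right after construction appears to be by design. Instead of a single blind poll(), one option is to poll until assignment() is non-empty. The sketch below assumes a consumer object exposing poll(timeout_ms=...) and assignment(), as kafka-python's KafkaConsumer does; `wait_for_assignment` and `FakeConsumer` are hypothetical names, and the fake only exists so the helper can be run without a broker.

```python
import time

def wait_for_assignment(consumer, timeout_s=10.0, poll_ms=100):
    """Poll until partitions have been assigned, then return the assignment."""
    deadline = time.monotonic() + timeout_s
    while not consumer.assignment():
        if time.monotonic() > deadline:
            raise TimeoutError("no partitions assigned within {}s".format(timeout_s))
        # poll() drives the group join; any records it returns here are discarded.
        consumer.poll(timeout_ms=poll_ms)
    return consumer.assignment()

class FakeConsumer:
    """Tiny stand-in that receives its assignment on the second poll()."""
    def __init__(self):
        self.polls = 0

    def assignment(self):
        return {("sida3_sdtest_topic", 0)} if self.polls >= 2 else set()

    def poll(self, timeout_ms=0):
        self.polls += 1
        return {}

fake = FakeConsumer()
assigned = wait_for_assignment(fake, timeout_s=1.0)
print(assigned, "after", fake.polls, "polls")
```

With a real consumer, `consumer.seek_to_beginning()` would be called after `wait_for_assignment(consumer)` returns. Records fetched while waiting are thrown away, which is harmless here because the position is reset to the beginning afterwards.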
I faced the same issue before, so I ran kafka-topics locally on the machine running the code to test, and I got an UnknownHostException. I added the IP and the host name to the hosts file, and after that both kafka-topics and the code worked fine.
It seems that KafkaConsumer was trying to fetch the messages but failed without raising any exceptions.
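Because the consumer may fail silently in this situation, a quick check that the broker host names resolve on the client machine can save time. The following is a minimal standard-library sketch; `split_host_port` and `can_resolve` are hypothetical helper names, the default port of 9092 is an assumption, and IPv6 literals are not handled.

```python
import socket

def split_host_port(address, default_port=9092):
    """Split 'host:port' into (host, port); use default_port if no port is given."""
    host, sep, port = address.rpartition(":")
    if not sep:
        return address, default_port
    return host, int(port)

def can_resolve(address):
    """Return True if the host part of a 'host:port' address resolves on this machine."""
    host, port = split_host_port(address)
    try:
        socket.getaddrinfo(host, port)
        return True
    except socket.gaierror:
        return False

# Replace the list with your bootstrap servers and the brokers' advertised host names.
for broker in ["127.0.0.1:9092"]:
    print(broker, "resolves" if can_resolve(broker) else "does NOT resolve")
```

Keep in mind that the host names the brokers advertise in their metadata also need to resolve on the client, not only the bootstrap address.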
For me, I had to specify the router's IP in the kafka PLAINTEXT configuration.
Get the router's IP with:
echo $(ifconfig | grep -E "([0-9]{1,3}\.){3}[0-9]{1,3}" | grep -v 127.0.0.1 | awk '{ print $2 }' | cut -f2 -d: | head -n1)
and then add PLAINTEXT_HOST://<router_ip>:9092 to the Kafka advertised listeners. In the case of a Confluent Docker service, the configuration is as follows:
kafka:
  image: confluentinc/cp-kafka:7.0.1
  container_name: kafka
  depends_on:
    - zookeeper
  ports:
    - 9092:9092
    - 29092:29092
  environment:
    - KAFKA_BROKER_ID=1
    - KAFKA_ZOOKEEPER_CONNECT=zookeeper:2181
    - KAFKA_ADVERTISED_LISTENERS=PLAINTEXT://kafka:29092,PLAINTEXT_HOST://172.28.0.1:9092
    - KAFKA_LISTENER_SECURITY_PROTOCOL_MAP=PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT
    - KAFKA_INTER_BROKER_LISTENER_NAME=PLAINTEXT
    - KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR=1
and finally the python consumer is:
from kafka import KafkaConsumer
from json import loads

consumer = KafkaConsumer(
    'my-topic',
    bootstrap_servers=['172.28.0.1:9092'],
    auto_offset_reset='earliest',
    group_id=None,
)

print('Listening')
for msg in consumer:
    print(msg)