I am just getting started with Kafka and kafka-python. In the code below, I am trying to read messages as they arrive, but for some reason the consumer seems to wait until a certain number of messages accrue before fetching them.
I initially thought it was because the producer was publishing in batches, but when I ran "kafka-console-consumer --bootstrap-servers --topic ", I could see every message on the consumer console as soon as it was published.
The Python script, however, is not able to receive the messages in the same way.
from kafka import KafkaConsumer

def run():
    success_consumer = KafkaConsumer('success_logs',
                                     bootstrap_servers=KAFKA_BROKER_URL,
                                     group_id=None,
                                     fetch_min_bytes=1,
                                     fetch_max_bytes=10,
                                     enable_auto_commit=True)
    # dummy poll
    success_consumer.poll()

    for msg in success_consumer:
        print(msg)

    success_consumer.close()
Can someone point out what configuration needs to change on the KafkaConsumer? Why is it not able to read messages the way kafka-console-consumer does?
The KafkaConsumer class also has a fetch_max_wait_ms parameter: the maximum time the broker will wait for fetch_min_bytes of data to accumulate before answering a fetch request (500 ms by default). You should set it to 0:
success_consumer = KafkaConsumer(..., fetch_max_wait_ms=0)
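For context, a minimal sketch of the question's consumer with that setting applied (kafka-python; 'localhost:9092' stands in for KAFKA_BROKER_URL here):

from kafka import KafkaConsumer

# With fetch_max_wait_ms=0 the broker answers every fetch request
# immediately instead of waiting for more data to accumulate.
success_consumer = KafkaConsumer('success_logs',
                                 bootstrap_servers='localhost:9092',
                                 group_id=None,
                                 fetch_min_bytes=1,
                                 fetch_max_wait_ms=0,
                                 enable_auto_commit=True)

for msg in success_consumer:
    print(msg)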
I am trying to understand how a Kafka producer works. Below is the Python producer code I wrote to send a message. I started the Kafka console consumer first, and then I ran the Python code.
from confluent_kafka import Producer
from Product import Product
from faker import Faker

if __name__ == '__main__':
    config = {
        "bootstrap.servers": "localhost:9092"
    }
    producer = Producer(config)
    fake = Faker()
    product = Product(fake.name())
    print(product.serial())
    producer.produce(topic="first_topic", value=product.serial())
The issue I am facing is that if I call the flush() method after calling produce(), the message appears on the console consumer; without flush(), the message never appears. As per the Kafka documentation, flush() makes the producer synchronous. Is there a way to avoid using flush() and still make sure the message gets consumed? Thank you.
You either need to flush, or you need to fill the producer's batch by sending more messages, or reduce the batch size to below the size of a single one of your messages via the relevant librdkafka configuration setting exposed by the confluent_kafka package.
You can either fill the batch as #OneCricteer suggested, or you can set linger.ms to 0 so that the producer does not wait for the batch to fill up before sending.
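As a rough sketch of that second option with confluent_kafka (the broker address and topic are carried over from the question; the payload is made up):

from confluent_kafka import Producer

# linger.ms=0 tells librdkafka not to hold messages back waiting for a
# batch to fill, so each message is handed to the broker as soon as possible.
producer = Producer({
    "bootstrap.servers": "localhost:9092",
    "linger.ms": 0,
})

producer.produce(topic="first_topic", value=b"hello")
# Serve delivery report callbacks without blocking.
producer.poll(0)

Note that a short-lived script that exits immediately after produce() may still need a final flush() (or a poll() with a timeout) to guarantee the message actually leaves the process before it terminates.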
I'm writing a Python project which needs to send messages over MQTT. I have found that when I send a command that requires the subscriber to download a big file, which takes a few minutes, and the subscriber's on_connect function is called again afterwards, the subscriber can no longer receive any of the messages it subscribed to. This bug happens occasionally.
After many tests, I found that as long as the on_connect function is called after downloading a large file, the subscriber cannot receive other messages.
The subscriber can still publish messages, so MQTT itself is working, and a subscription opened from another terminal does receive the messages.
So, I guess the subscription was dropped after downloading the large file. I need to check the broker's internal subscriptions to verify my guess.
But I don't know how to check this. Please tell me how to inspect the broker, and how to fix the problem if the guess turns out to be correct.
Because there is too much code to post, here is an outline:
The cloud sends a series of commands over MQTT, including downloading a file, modifying it, and the like.
Devices receive the messages over MQTT, carry out the commands, and send feedback.
After a device downloads a big file, there is a chance that no further MQTT messages are received, even though the terminal prints "wait handle : Connected with result code 0" from the on_connect function, which is:
def on_connect(client, userdata, flags, rc):
    print("wait handle : Connected with result code " + str(rc))
The problem is most likely that you are doing long-running tasks in either the on_connect or on_message callbacks.
These callbacks run on the MQTT client's network thread, which handles the sending and receiving of network packets. If it blocks for too long, the keep-alive (the maximum time between MQTT packets) will expire and the broker will disconnect the client.
If you have long-running tasks, they need to run on a separate thread.
If you use subprocess and wait for it to finish so you can collect the output, you are still blocking for however long the process takes to run, so you might as well be running it on the same thread.
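A minimal sketch of that pattern with paho-mqtt (the download_file function, topic, and broker address are placeholders, not taken from the original code; the callbacks use the paho-mqtt 1.x signatures, matching the question's on_connect):

import threading
import paho.mqtt.client as mqtt

def download_file(payload):
    # Placeholder for the long-running download; it runs on a worker
    # thread, away from the MQTT network loop.
    pass

def on_connect(client, userdata, flags, rc):
    print("wait handle : Connected with result code " + str(rc))
    # (Re)subscribe here so subscriptions are restored after a reconnect.
    client.subscribe("orders/#")

def on_message(client, userdata, msg):
    # Hand the long job off to a worker thread so the network loop keeps
    # servicing keep-alives and incoming messages.
    threading.Thread(target=download_file, args=(msg.payload,), daemon=True).start()

client = mqtt.Client()
client.on_connect = on_connect
client.on_message = on_message
client.connect("localhost", 1883, 60)
client.loop_forever()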
In my Python 3 code, when I get a message I first need to do a long-running job, so I want to acknowledge the message only when the job is done. But if I don't acknowledge the message right away, the same message is consumed again after about a minute. Can I make this time longer?
My code:
import datetime
import time
from threading import Thread

# (connection / channel setup omitted in the original question)

def do_work(body, tag):
    print(datetime.datetime.now(), 'I get body:', body, tag)
    # simulate a long-running job
    time.sleep(70)
    # ack the message
    channel.basic_ack(tag)

if __name__ == '__main__':
    for method, properties, body in channel.consume('dolphin'):
        t = Thread(target=do_work, args=(body, method.delivery_tag))
        t.start()
""" console output:
2019-04-16 17:32:02.200645 I get body: b'2019-04-16 17:31:31.440033' 1
2019-04-16 17:33:05.879708 I get body: b'2019-04-16 17:31:31.440033' 2
2019-04-16 17:34:10.885120 I get body: b'2019-04-16 17:31:31.440033' 3
"""
When you ask a question, you must provide the name and version of all software you are using. In this case, that means the version of RabbitMQ, Erlang, and Python, and which Python library you are using.
But if I don't acknowledge the message right away, the same message is consumed again after about a minute.
This is probably because your sleep call blocks heartbeat messages and RabbitMQ thinks your client application has died. RabbitMQ will close the connection and re-enqueue the message.
If you are using the Pika library, your code is not correct. You can't acknowledge a message from another thread.
Please see this example code for how to correctly acknowledge a message from another thread.
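The usual pattern (a sketch, assuming the Pika library, a local broker, and the 'dolphin' queue from the question) is to do the work on a worker thread and hand the acknowledgement back to the connection's own thread with add_callback_threadsafe:

import datetime
import functools
import time
from threading import Thread

import pika

def do_work(connection, channel, delivery_tag, body):
    print(datetime.datetime.now(), 'I get body:', body, delivery_tag)
    # The long-running job runs on a worker thread, away from pika's I/O loop.
    time.sleep(70)
    # basic_ack must run on the connection's thread, so schedule it there.
    connection.add_callback_threadsafe(
        functools.partial(channel.basic_ack, delivery_tag))

connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
channel = connection.channel()

for method, properties, body in channel.consume('dolphin'):
    t = Thread(target=do_work,
               args=(connection, channel, method.delivery_tag, body))
    t.start()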
NOTE: the RabbitMQ team monitors the rabbitmq-users mailing list and only sometimes answers questions on StackOverflow.
I have two microservices.
MProducer - sends messages to a Kafka topic
MConsumer - reads messages from the Kafka topic
When the consumer crashes and restarts, I want it to continue consuming from the last message.
from kafka import KafkaConsumer

consumer = KafkaConsumer(bootstrap_servers='localhost:9092',
                         auto_offset_reset='latest',
                         enable_auto_commit=False)
It looks like you are using kafka-python, so you'll need to pass the group_id argument to your Consumer. See the description for this argument in the KafkaConsumer documentation.
By setting a group id, the Consumer will periodically commit its position to Kafka and will automatically retrieve it upon restarting.
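For example, a minimal sketch with kafka-python (the topic and group names are placeholders, and enable_auto_commit is turned back on so the position is committed automatically):

from kafka import KafkaConsumer

# With a group_id set and auto-commit enabled, the committed offsets are
# stored in Kafka and the consumer resumes from them after a restart.
consumer = KafkaConsumer('my_topic',                      # placeholder topic
                         bootstrap_servers='localhost:9092',
                         group_id='mconsumer-group',      # placeholder group
                         auto_offset_reset='latest',
                         enable_auto_commit=True)

for msg in consumer:
    print(msg.value)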
You do that by using a consumer group. Assuming you're using the confluent library, just add 'group.id': 'your-group' to the consumer configuration.
When the service goes down and comes back up, it will start from the last committed point.
The information about each consumer group is saved in a special Kafka topic (since v0.9) called __consumer_offsets. More info in the Kafka docs: https://kafka.apache.org/intro#intro_consumers
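A sketch of what that looks like with confluent_kafka (the topic name is a placeholder; 'your-group' is carried over from the answer above):

from confluent_kafka import Consumer

# group.id makes the consumer part of a consumer group, so its committed
# offsets are stored in __consumer_offsets and picked up again on restart.
consumer = Consumer({
    'bootstrap.servers': 'localhost:9092',
    'group.id': 'your-group',
    'auto.offset.reset': 'latest',
    'enable.auto.commit': True,
})
consumer.subscribe(['my_topic'])    # placeholder topic name

try:
    while True:
        msg = consumer.poll(1.0)
        if msg is None:
            continue
        if msg.error():
            print(msg.error())
            continue
        print(msg.value())
finally:
    consumer.close()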
I am trying to launch a consumer dynamically whenever a new topic is created in Kafka, but the dynamically launched consumer always misses the first message and only consumes the messages from there on. I am using the kafka-python module with the current KafkaConsumer and KafkaProducer.
The code for the producer is
producer = KafkaProducer(bootstrap_servers='localhost:9092')
record_metadata = producer.send(topic, data)
and the code for the consumer is
consumer = KafkaConsumer(topic, group_id="abc", bootstrap_servers='localhost:9092', auto_offset_reset='earliest')
Please suggest how to overcome this problem, or any configuration I have to include in my producer and consumer instances.
Can you set auto_offset_reset to earliest?
When a new consumer stream is created, it starts from the latest offset (the default value for auto_offset_reset), and you will miss messages that were sent while the consumer wasn't running.
You can read about it in the kafka-python docs. The relevant portion is below:
auto_offset_reset (str) – A policy for resetting offsets on OffsetOutOfRange errors: 'earliest' will move to the oldest available message, 'latest' will move to the most recent. Any other value will raise the exception. Default: 'latest'.