I am trying to understand how a Kafka producer works. Below is the Python producer code I wrote to send a message. I start the Kafka console consumer first and then run the Python code.
from confluent_kafka import Producer
from Product import Product
from faker import Faker
if __name__ == '__main__':
    config = {
        "bootstrap.servers": "localhost:9092"
    }
    producer = Producer(config)
    fake = Faker()
    product = Product(fake.name())
    print(product.serial())
    producer.produce(topic="first_topic", value=product.serial())
The issue I am facing is that if I call the flush method after calling produce, the message appears on the console consumer; without flush, the message doesn't appear. As per the Kafka documentation, flush makes the producer synchronous. Is there a way to avoid using flush and still make sure the message gets consumed? Thank you.
You either need to flush, or you need to fill the producer's batch with more messages, or reduce the batch size below the size of a single one of your messages via a librdkafka configuration setting within the confluent_kafka package.
You can either fill the batch as #OneCricteer suggested, or you can set linger.ms to 0 so that the producer does not wait to fill the batch before sending.
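For illustration, a minimal sketch of that idea applied to the producer above (the payload string and the delivery callback are my additions, not from the original code); linger.ms=0 tells librdkafka not to hold messages back waiting for a batch to fill, and poll() serves the delivery callback without blocking the way flush() does:
from confluent_kafka import Producer

def on_delivery(err, msg):
    # Called from poll()/flush() once the broker acknowledges (or rejects) the message.
    if err is not None:
        print(f"Delivery failed: {err}")
    else:
        print(f"Delivered to {msg.topic()} [{msg.partition()}] at offset {msg.offset()}")

config = {
    "bootstrap.servers": "localhost:9092",
    "linger.ms": 0,          # do not wait for the batch to fill before sending
}
producer = Producer(config)
producer.produce(topic="first_topic", value="example payload", callback=on_delivery)
producer.poll(0)             # serve delivery callbacks; non-blocking, unlike flush()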
Related
I have a unique id in my data and I am sending it to Kafka with the kafka-python library. When I send the same data to the Kafka topic, it consumes the same data anyway. Is there a way to make Kafka skip previously seen messages and continue from new messages?
from kafka import KafkaConsumer

def consume_from_kafka():
    consumer = KafkaConsumer(
        TOPIC,
        bootstrap_servers=["localhost"],
        group_id='my-group')
OK, I finally got your question. Avoiding a message that has (accidentally) been sent multiple times by a producer can be very complicated.
There are generally 2 cases:
The simple one is where you have a single instance that consumes the messages. In that case your producer can add a uuid to the message payload and your consumer can keep the ids of the processed messages in an in-memory cache (a sketch of this follows after the list).
The complicated one is where you have multiple instances that consume messages (which is usually why you'd need a message broker in the first place: a distributed system). In this scenario you would need an external service to play the role of a distributed cache. Redis is a good choice. Alternatively you can use a relational database (which you probably already have in your stack) and record processed message ids there.
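As a rough sketch of the simple single-instance case (the uuid field and the JSON payload format are assumptions, not something from your code):
import json
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    TOPIC,
    bootstrap_servers=["localhost"],
    group_id='my-group')

seen_ids = set()  # in-memory cache of processed message ids

for msg in consumer:
    payload = json.loads(msg.value)
    msg_id = payload.get("uuid")   # assumes the producer adds a uuid to each payload
    if msg_id in seen_ids:
        continue                   # duplicate, skip it
    seen_ids.add(msg_id)
    # ... process the message here ...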
Hope that helps.
Someone might need this here. I solved the duplicate message problem using the code below; I am using the Kafka-python lib.
from kafka import KafkaConsumer

consumer = KafkaConsumer('TOPIC', bootstrap_servers=KAFKA,
                         auto_offset_reset='earliest', enable_auto_commit=True,
                         auto_commit_interval_ms=1000, group_id='my-group')
I'm currently trying to figure out how to send OSC messages from Python to Max/MSP. I'm currently using osc4py3 to do so, and I have a sample code from the documentation that should hypothetically be working, written out here:
from osc4py3.as_eventloop import *
from osc4py3 import oscbuildparse
# Start the system.
osc_startup()
# Make client channels to send packets.
osc_udp_client("127.0. 0.1", 5000, "tester")
msg = oscbuildparse.OSCMessage("/test/me", ",sif", ["text", 672, 8.871])
osc_send(msg, "tester")
The receiver in Max is just a udpreceive object listening on port 5000. I managed to get Processing to send OSC messages to Max fairly simply using the oscP5 library, but I can't seem to have the same luck in Python.
What is it I'm missing? Moreover, I don't entirely understand the structure for building OSC messages in osc4py3, even after doing my best with the documentation; if someone would be willing to explain what exactly is going on (namely, the arguments) in something like
msg = oscbuildparse.OSCMessage("/test/me", ",sif", ["text", 672, 8.871])
then I would be forever grateful.
I'm entirely open to using another OSC library, but all I ask is a run-through on how to send a message (I've attempted using pyOSC but that too proved too confusing for me).
Maybe you already solved it, but in the posted code there are two problems. One is the IP address format (there is a space before the second "0"). The other is that you need to call osc_process() at the end. So the following should work:
from osc4py3.as_eventloop import *
from osc4py3 import oscbuildparse
# Start the system.
osc_startup()
# Make client channels to send packets.
osc_udp_client("127.0.0.1", 5000, "tester")
msg = oscbuildparse.OSCMessage("/test/me", ",sif", ["text", 672, 8.871])
osc_send(msg, "tester")
osc_process()
Hope it will work out
There are different possible scheduling policies in osc4py3. The documentation uses the event-loop model with as_eventloop, where user code must periodically call osc_process() to have osc4py3 deal with its internal message queues and communications.
The client example for sending OSC messages wraps the osc_process() call in a loop (generally it sits inside an event-processing loop).
You can drop the osc_process() call entirely by importing the names from the full multithreading scheduling policy at the beginning of your code:
from osc4py3.as_allthreads import *
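For example, a minimal sketch under the as_allthreads policy, reusing the address, port, and message from your post (the short sleep before osc_terminate() is my assumption, to give the background sender a moment in a short-lived script):
import time
from osc4py3.as_allthreads import *
from osc4py3 import oscbuildparse

osc_startup()
osc_udp_client("127.0.0.1", 5000, "tester")

msg = oscbuildparse.OSCMessage("/test/me", ",sif", ["text", 672, 8.871])
osc_send(msg, "tester")   # handled by the background threads, no osc_process() needed

time.sleep(0.1)           # let the sender thread finish before shutting down
osc_terminate()           # stop the background threads cleanly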
The third scheduling policy is as_comthreads, where communications are processed in background threads, but received messages (on the server side) are processed synchronously at the osc_process() call.
(by the author of osc4py3)
I am just getting started with Kafka and kafka-python. In the code below, I am trying to read messages as they arrive. But for some reason, the consumer seems to wait until a certain number of messages accrue before fetching them.
I initially thought it was because the producer was publishing in batches. But when I ran "kafka-console-consumer --bootstrap-servers --topic ", I could see every message as soon as it was published (as seen on the consumer console).
The Python script, however, is not able to receive the messages in the same way.
from kafka import KafkaConsumer

def run():
    success_consumer = KafkaConsumer('success_logs',
                                     bootstrap_servers=KAFKA_BROKER_URL,
                                     group_id=None,
                                     fetch_min_bytes=1,
                                     fetch_max_bytes=10,
                                     enable_auto_commit=True)
    # dummy poll
    success_consumer.poll()
    for msg in success_consumer:
        print(msg)
    success_consumer.close()
Can someone point out what configuration needs to change on the KafkaConsumer? Why is it not able to read messages the way "kafka-console-consumer" does?
The KafkaConsumer class also has a fetch_max_wait_ms parameter. You should set it to 0:
success_consumer = KafkaConsumer(..., fetch_max_wait_ms=0)
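Applied to the consumer from the question, a sketch might look like this (keeping the other settings as they were; fetch_max_wait_ms=0 tells the broker not to hold the fetch request open waiting for fetch_min_bytes to accumulate):
from kafka import KafkaConsumer

success_consumer = KafkaConsumer('success_logs',
                                 bootstrap_servers=KAFKA_BROKER_URL,  # defined elsewhere, as in the question
                                 group_id=None,
                                 fetch_min_bytes=1,
                                 fetch_max_wait_ms=0,
                                 enable_auto_commit=True)

for msg in success_consumer:
    print(msg)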
I am trying to launch a consumer dynamically whenever a new topic is created in Kafka, but the dynamically launched consumer always misses the starting/first message, although it consumes messages from there on. I am using the kafka-python module with the updated KafkaConsumer and KafkaProducer.
The code for the producer is
producer = KafkaProducer(bootstrap_servers='localhost:9092')
record_metadata = producer.send(topic, data)
and the code for the consumer is
consumer = KafkaConsumer(topic, group_id="abc", bootstrap_servers='localhost:9092', auto_offset_reset='earliest')
Please suggest how to overcome this problem, or any configuration I have to include in my producer and consumer instances.
Can you set auto_offset_reset to earliest?
When a new consumer stream is created, it starts from the latest offset (which is the default value for auto_offset_reset), and you will miss messages that were sent while the consumer wasn't started.
You can read about it in the kafka-python docs. The relevant portion is below:
auto_offset_reset (str) – A policy for resetting offsets on OffsetOutOfRange errors: ‘earliest’ will move to the oldest available message, ‘latest’ will move to the most recent. Any other value will raise the exception. Default: ‘latest’.
This project is for a real-time search engine with a focus on log-analysis performance.
I have live streaming data coming out of Spark processing into Kafka.
Now, with that Kafka output, I want to get the data from Kafka using Flask and visualize it with Chart.js or some other visualization library.
How do I get the live streaming data from Kafka using Python and Flask?
Any idea how I should start?
Any help would be greatly appreciated!
Thanks!
I would check out the kafka-python package:
http://kafka-python.readthedocs.org/en/master/usage.html
This should get you set up to stream data from Kafka. Additionally, I might check out this project: https://github.com/travel-intelligence/flasfka which has to do with using Flask and Kafka together (I just found it in a Google search).
I'm working on a similar problem (small Flask app with live streaming data coming out of Kafka).
You have to do a couple things to set this up. First, you need a KafkaConsumer to grab messages:
from kafka import KafkaConsumer

consumer = KafkaConsumer(group_id='groupid', bootstrap_servers=kafkaserver)
consumer.subscribe(topics=['topicid'])
try:
    # This loop should auto-commit offsets as you consume them.
    # If it doesn't, turn on logging.DEBUG to see why auto-commit gets turned off.
    # Not assigning a group_id can be one cause.
    for msg in consumer:
        pass  # TODO: process the kafka messages.
finally:
    # Always close your producers/consumers when you're done.
    consumer.close()
This is about the most basic KafkaConsumer. The for loop blocks the thread and keeps looping over messages as they arrive, committing offsets as it goes. There is also the consumer.poll() method to just grab whatever messages are available within a given time, depending on how you want to architect the data flow. Kafka was designed with long-running consumer processes in mind, but if you're committing messages properly you can open and close consumers on an as-needed basis as well.
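For example, a hedged sketch of the poll() style, where you grab whatever is available within a timeout instead of blocking in the for loop (the topic, group, and broker names are placeholders):
from kafka import KafkaConsumer

consumer = KafkaConsumer(group_id='groupid', bootstrap_servers='localhost:9092')
consumer.subscribe(topics=['topicid'])

# poll() returns a dict mapping TopicPartition -> list of records; it waits at
# most timeout_ms and may return an empty dict if nothing arrived in time.
records = consumer.poll(timeout_ms=1000)
for tp, messages in records.items():
    for message in messages:
        print(tp.topic, message.offset, message.value)

consumer.close()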
Now that you have the data, you can stream it to the browser with Flask. I'm not familiar with Chart.js, but live streaming from Flask centers on a Python view function that yields inside a loop instead of returning once at the end of processing.
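A rough sketch of that pattern, assuming a Flask app wired to the consumer above (the endpoint name, topic, and plain-text formatting are placeholders):
from flask import Flask, Response
from kafka import KafkaConsumer

app = Flask(__name__)

@app.route("/stream")
def stream():
    def generate():
        consumer = KafkaConsumer('topicid', group_id='groupid',
                                 bootstrap_servers='localhost:9092')
        try:
            for msg in consumer:
                # yield each Kafka message as a line; Flask streams the response
                # to the browser instead of returning it all at once
                yield msg.value.decode("utf-8", errors="replace") + "\n"
        finally:
            consumer.close()
    return Response(generate(), mimetype="text/plain")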
Check out Miguel Grinberg's blog and his follow-up on streaming as practical examples of streaming with Flask. (Note: anyone actually streaming video in a serious web app will probably want to encode it with a video codec like the widely used H.264 using ffmpy and wrap it in MPEG-DASH ...or maybe choose a framework that does more of this stuff for you.)