This project is for a real-time search engine focused on log analysis performance.
I have live streaming data flowing out of Spark processing into Kafka.
Now, with the Kafka output, I want to get the data from Kafka using Flask and visualize it with Chart.js or some other visualization library.
How do I get the live streaming data from Kafka using Python Flask?
Any idea how to get started?
Any help would be greatly appreciated!
Thanks!
I would check out the Kafka package for Python:
http://kafka-python.readthedocs.org/en/master/usage.html
This should get you set up to stream data from Kafka. Additionally, I might check out this project: https://github.com/travel-intelligence/flasfka which has to do with using Flask and Kafka together (just found it on a Google search).
I'm working on a similar problem (small Flask app with live streaming data coming out of Kafka).
You have to do a couple things to set this up. First, you need a KafkaConsumer to grab messages:
from kafka import KafkaConsumer

consumer = KafkaConsumer(group_id='groupid', bootstrap_servers=kafka_server)
consumer.subscribe(topics=['topicid'])
try:
    # This loop should auto-commit offsets as you consume them.
    # If it doesn't, turn on logging.DEBUG to see why auto-commit got turned off.
    # Not assigning a group_id can be one cause.
    for msg in consumer:
        pass  # TODO: process the Kafka messages.
finally:
    # Always close your producers/consumers when you're done.
    consumer.close()
This is about the most basic KafkaConsumer. The for loop blocks the thread and keeps looping as messages arrive, committing offsets as it goes. There is also the consumer.poll() method to just grab whatever messages you can within a given time window, depending on how you want to architect the data flow. Kafka was designed with long-running consumer processes in mind, but if you're committing messages properly you can also open and close consumers on an as-needed basis.
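For the poll()-style flow, here's a minimal sketch continuing from the consumer above (the timeout and record limit are arbitrary values I picked):

# Grab whatever is available within one second, up to 500 records,
# instead of blocking indefinitely in a for loop.
records = consumer.poll(timeout_ms=1000, max_records=500)
for topic_partition, messages in records.items():
    for msg in messages:
        print(msg.offset, msg.value)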
Now you have the data, so you can stream it to the browser with Flask. I'm not familiar with Chart.js, but live streaming from Flask centers on a Python generator function that yields inside a loop, instead of just returning once at the end of processing.
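For example, here's a minimal sketch of a Flask route that streams Kafka messages to the client as they arrive (the topic name, broker address, route, and group id are placeholders I made up):

from flask import Flask, Response
from kafka import KafkaConsumer

app = Flask(__name__)

@app.route('/stream')
def stream_logs():
    def generate():
        consumer = KafkaConsumer('logs', bootstrap_servers='localhost:9092',
                                 group_id='flask-stream')
        try:
            for msg in consumer:
                # Each yield is flushed to the client as a chunk.
                yield msg.value.decode('utf-8') + '\n'
        finally:
            consumer.close()
    return Response(generate(), mimetype='text/plain')

On the browser side you'd read the chunks as they arrive (or use server-sent events) and feed them into Chart.js.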
Check out Miguel Grinberg's blog and his follow-up on streaming as practical examples of streaming with Flask. (Note: anyone actually streaming video in a serious web app will probably want to encode it with a video codec like the widely used H.264 using ffmpy and wrap it in MPEG-DASH ...or maybe choose a framework that does more of this stuff for you.)
Related
I am going to use Kafka as a message broker in my application. This application is written entirely using Python. For a part of this application (Login and Authentication), I need to implement a request-reply messaging system. In other words, the producer needs to get the response of the produced message from the consumer, synchronously.
Is it feasible using Kafka and its Python libraries (kafka-python, ...) ?
I'm facing the same issue (request-reply for an HTTP hit in my case).
My first bet was (100% Python):
start a consumer thread,
publish the request message (including a request_id),
join the consumer thread,
get the answer from the consumer thread.
The consumer thread subscribes to the reply topic (seeked to the end) and deals with received messages until it finds the request_id (modulo a timeout).
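Roughly, that approach looks like this (a sketch only; the topic names, JSON reply format, and broker address are my assumptions):

import json
import time
import threading
from kafka import KafkaConsumer, KafkaProducer

def request_reply(request_id, payload, timeout_s=10):
    result = {}

    def consume_replies():
        # Creating this consumer is the ~300ms cost mentioned below.
        consumer = KafkaConsumer('replies', bootstrap_servers='localhost:9092')
        consumer.poll(0)         # force partition assignment...
        consumer.seek_to_end()   # ...so we can seek to the end of the topic
        deadline = time.time() + timeout_s
        while time.time() < deadline:
            for records in consumer.poll(timeout_ms=500).values():
                for record in records:
                    reply = json.loads(record.value)
                    if reply.get('request_id') == request_id:
                        result['reply'] = reply
                        consumer.close()
                        return
        consumer.close()

    # 1. start the consumer thread
    thread = threading.Thread(target=consume_replies)
    thread.start()
    # 2. publish the request message (including the request_id)
    producer = KafkaProducer(bootstrap_servers='localhost:9092',
                             value_serializer=lambda v: json.dumps(v).encode('utf-8'))
    producer.send('requests', {'request_id': request_id, 'payload': payload})
    producer.flush()
    # 3. join the consumer thread and 4. get the answer
    thread.join()
    return result.get('reply')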
While this works for basic testing, unfortunately, creating a KafkaConsumer object is a slow process (~300 ms), so it's not an option for a system with massive traffic.
In addition, if your system deals with parallel request-reply (for example, multi-threaded like a web server is), you'll need a KafkaConsumer dedicated to each request_id (basically by using the request_id as the consumer group) to avoid having the reply to a request published by thread A consumed (and ignored) by thread B.
So you can't recycle your KafkaConsumer here, and you have to pay the creation time for each request (in addition to the processing time on the backend).
If your request-reply processing is not parallelizable, you can try to keep the KafkaConsumer object available for the threads started to get answers.
The only solution I can see at this point is to use a DB (relational/NoSQL):
the requestor stores the request_id in a DB (as local as possible) and publishes the request to Kafka,
the requestor polls the DB until it finds the answer to the request_id,
in parallel, a consumer process receives messages from the reply topic and stores the results in the DB.
But I don't like polling... it will generate heavy load on the DB in a massive-traffic system.
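For what it's worth, the requestor side of that design might look like this (a sketch with SQLite standing in for the shared DB; all names are made up, and the consumer process that fills the replies table is separate):

import json
import time
import sqlite3
from kafka import KafkaProducer

db = sqlite3.connect('replies.db')
db.execute('CREATE TABLE IF NOT EXISTS replies (request_id TEXT PRIMARY KEY, body TEXT)')

def request_and_poll(request_id, payload, timeout_s=10):
    # Publish the request to Kafka...
    producer = KafkaProducer(bootstrap_servers='localhost:9092',
                             value_serializer=lambda v: json.dumps(v).encode('utf-8'))
    producer.send('requests', {'request_id': request_id, 'payload': payload})
    producer.flush()
    # ...then poll the DB until the separate consumer process has stored the reply.
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        row = db.execute('SELECT body FROM replies WHERE request_id = ?',
                         (request_id,)).fetchone()
        if row:
            return json.loads(row[0])
        time.sleep(0.05)  # this interval is the DB-load vs. latency trade-off
    return None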
My 2 cents.
I have an IoT device that is streaming data to a Python server every 15 ms. The Python server uploads the data to Kafka, and another server consumes it.
I have a partition key for the data based on sensor ID. The latency for the first few messages is sub-30 ms, but then it skyrockets to 500 ms before slowly coming down and then repeating (image below).
My assumption here is that the producer is batching the data before sending it. I can't seem to find a setting to turn this off so that my latency is consistent. The issue seems to happen even if I send a blank message.
Here's the code for my producer:
producer = KafkaProducer(
    bootstrap_servers=get_kafka_brokers(),
    security_protocol='SSL',
    ssl_context=get_kafka_ssl_context(),
    value_serializer=lambda v: json.dumps(v).encode('utf-8'),
    batch_size=0,  # a batch size of zero should disable batching
    acks=1         # only wait for the leader to acknowledge
)

message = {}
producer.send(app.config['NEW_LOG_TOPIC'], message, key=str(device.id).encode('utf-8'))
I have been reading the documentation up and down and have tried several different configurations, but nothing has helped. My servers and Kafka instance are running on Heroku.
Any help is appreciated.
I am a newbie to Kafka and PyKafka. I know that a producer and a consumer are made in PyKafka via the code below.
from pykafka import KafkaClient
client = KafkaClient("localhost:9092")
topic = client.topics["topicname"]
producer = topic.get_producer()
consumer = topic.get_simple_consumer()
I want to know what KafkaClient is, and how it helps in creating the producer and consumer.
I have read that we can also create a cluster and broker using client.cluster and client.broker, but I can't understand the use of client here.
To make terms simpler, replace Kafka with "server".
You interact with servers with clients.
To interact with Kafka, in particular, you send messages to topics via producers, and get messages with consumers.
I don't know this library off-hand, but .broker and .cluster aren't actually "making a Kafka broker/cluster", only establishing a connection to an existing one, from which you can issue later commands.
You need the client. prefix on those function calls because the client is a wrapper around both.
To know why it is structured this way, you'd have to ask the developers themselves.
pykafka.KafkaClient is the root object of the PyKafka API, providing an interface to Kafka brokers as well as the ability to instantiate consumer and producer instances. The KafkaClient can be thought of as a representation of the totality of one Python process' interaction with a given Kafka cluster. There is no direct comparison between KafkaClient and any of the concepts mentioned in the official Kafka documentation.
It's totally possible in theory to design a Python Kafka client library that doesn't have a "client" class like KafkaClient. We decided not to, since in our opinion a single root class provides a cleaner, more learnable interface than a bag of various classes.
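To make that concrete, here's a minimal sketch of the root-object flow (the broker address and topic name are placeholders):

from pykafka import KafkaClient

# The client is the single entry point; everything else hangs off it.
client = KafkaClient(hosts="localhost:9092")
topic = client.topics[b"topicname"]  # topics are looked up through the client

# Produce a message through the topic...
with topic.get_producer() as producer:
    producer.produce(b"hello")

# ...and read it back with a consumer obtained from the same topic object.
consumer = topic.get_simple_consumer()
message = consumer.consume()
print(message.offset, message.value)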
I'm thinking about the architecture of a service with video streaming.
Two people call each other in their browsers with webcams (like group talks in Google Plus, but only two people). Their conversation is dumped to the server. The server knows when the conversation started, when it finished, and which clients communicated.
On the backend I plan to use Python/Django. I don't have any ideas about what to use for the streaming/frontend (HTML5, Flash). I want to control the streaming process (dumping, starting/stopping the conversation) using Python.
What can you recommend to me?
I think you need to look at the following two:
http://videocapture.sourceforge.net/ #For video recording purpose
and
http://twistedmatrix.com/trac/ #For transmission over the network
Following example using Twisted might be of some help -
http://www.rtmpy.org/index.html
I am looking to hack together a Kafka consumer in Python or R (preferably R).
Using the Kafka console consumer I can grep for a string and retrieve the relevant data, but I am at a loss when it comes to parsing it suitably in R.
There are Kafka clients available in other languages (for example, PHP, C++), but one in R would be helpful from a data analytics point of view.
It would be great if the expert R developers on this forum could hint at/suggest resources that would allow me to make headway in this direction.
Apache Kafka : incubator.apache.org/kafka/
Kafka Consumer Client(s) : https://github.com/kafka-dev/kafka/tree/master/clients
[2015 update] There is a library that allows you to connect to Kafka from R: rkafka
http://cran.r-project.org/web/packages/rkafka/rkafka.pdf
As there is a C++ API for Kafka, you could use Rcpp to bring it to R.
Edit in response to a comment on an R-only solution: I do not know Kafka well enough to answer, but generally speaking, middleware runs fast, connecting multiple clients, streams, etc. So you would have to simplify something somewhere to get R (single-threaded as it is) to play with it.