I am looking to hack together a kafka consumer in Python or R (preferably R).
Using the kafka console consumer I can grep for a string and retrieve the relevant data but I am at a loss when it comes to parsing it suitably in R.
There are kafka clients available in other languages (for example: PHP, CPP) but one in R would be helpful from a data analytics point of view.
It would be great if the expert R developers on this forum could hint at/suggest resources that would allow me to make headway in this direction.
Apache Kafka : incubator.apache.org/kafka/
Kafka Consumer Client(s) : https://github.com/kafka-dev/kafka/tree/master/clients
[2015 Update] there is a library that allows you to connect to kafka - rkafka
http://cran.r-project.org/web/packages/rkafka/rkafka.pdf
As there is a C++ API for Kafka, you could use Rcpp to bring it to R.
Edit in response to comment on an R-only solution: I do not know Kafka well enough to answer, but generally speaking, middleware runs fast, connecting multiple clients, streams, etc. So you would have to simplify something somewhere to get R (single-threaded as it is) to play with it.
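Since the question also allows Python, a minimal consumer sketch with the kafka-python package might look like the following (the package choice, topic name, and broker address are assumptions for illustration, not part of the original question):

from kafka import KafkaConsumer  # pip install kafka-python

# Connect to a local broker and subscribe to a hypothetical topic
consumer = KafkaConsumer(
    "mytopic",
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",
)

for message in consumer:
    # message.value is raw bytes; decode before parsing further
    print(message.value.decode("utf-8"))

From there the decoded strings could be written out and read into R for analysis.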
Since this stack is new to me, I figured it wouldn't hurt to ask the community for input.
I'm exploring a new type of project and have been brainstorming how/if it's possible to communicate between an apollo-graphql server (NodeJS) and a client Python program. I have a web development background and I intend to keep my app and server in Typescript/Node.
However ...
As a project and exercise, I would like to play around with connecting a watering system via the GPIO package in Python on a Raspberry Pi and then send the updates to my apollo-graphql server. Conceptually, with a traditional RESTful API I think it would be straightforward ... However, with GraphQL things are a bit more nebulous to me. It would be nice to build out hardware-connected services in various languages as appropriate rather than being locked into having everything in TypeScript/Node, and to that end Python is the most widely used language within the Raspberry Pi community.
Has anyone done this before, or something similar, and have insights and experience to share?
edit: since posting I have become more familiar with GraphQL and played around with it in Postman. The server simply exposes an endpoint that I can POST queries to (and, I assume, mutations and everything else), and the request is handled by my NodeJS Apollo GraphQL server. I ended up using this Python package to query against my Node API and it seems to work well.
Your server is node and your client is python. The server doesn't care at all whether your client is python or java or C#, it only wants to see GraphQL requests.
I suggest using a python-based GraphQL client package such as python-graphql-client
You're right that in the end everything looks like a POST. Using an actual client might make life a little bit easier for you though.
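To illustrate the "everything is a POST" point, here is a minimal sketch using the requests library; the endpoint URL and the mutation name/fields are hypothetical placeholders, not from the original post:

import requests

# Hypothetical Apollo Server endpoint
GRAPHQL_URL = "http://localhost:4000/graphql"

# A hypothetical mutation reporting a soil-moisture reading
mutation = """
mutation ReportMoisture($level: Float!) {
  reportMoisture(level: $level) {
    ok
  }
}
"""

response = requests.post(
    GRAPHQL_URL,
    json={"query": mutation, "variables": {"level": 0.42}},
)
print(response.json())

A dedicated client package such as python-graphql-client wraps this same POST for you.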
You can try calling your GraphQL service using a RESTful wrapper: https://graphql.org/blog/rest-api-graphql-wrapper/. There's a similar question asked before: link.
Secondly, instead of calling the server directly, you could also try using a message broker in the middle. In short: the Raspberry Pi sends messages to a queue that is polled by your server. There are several solutions available, such as RabbitMQ.
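As a sketch of that broker-in-the-middle idea, the Raspberry Pi could publish updates to a queue with the pika library; the queue name and payload below are made up for illustration:

import json
import pika

# Connect to a RabbitMQ broker running locally
connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()
channel.queue_declare(queue="watering-updates", durable=True)

# Publish one update; the server consumes this queue on its own schedule
channel.basic_publish(
    exchange="",
    routing_key="watering-updates",
    body=json.dumps({"zone": 1, "moisture": 0.42}),
)
connection.close()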
I could not find any answer to this: what is the difference between Faust and kafka-python?
Is there any pros/cons on preferring any one of them?
As I understand it:
Kafka is written in Java, and Kafka-python is a Python client to communicate with "Java stream"
Faust is a pure "Python stream"
So, if I plan to use only Python then Faust should be better choice and if I want to have wider compatibility (Go, .NET, C/C#, Java, Python) then use Kafka + Kafka-python?
Note: I am new to using Kafka and I am trying to understand the pros/cons of different solutions.
I would highly appreciate any advice!!
As I understand it you use both with Kafka, and both from Python, but with the difference that:
Faust is for stream processing (filtering, joining, aggregating, etc)
kafka-python (just like confluent-kafka-python also) is a client library providing Consumer, Producer, and Admin APIs for Kafka.
So you could easily use both, for different purposes, from Python.
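For contrast, a minimal Faust stream processor might look like the sketch below; the app name, topic, and record fields are assumptions for illustration. kafka-python, by comparison, would only give you the plain consume/produce loop shown earlier.

import faust

# A Faust app is a standalone stream-processing worker
app = faust.App("demo-app", broker="kafka://localhost:9092")

class Trade(faust.Record):
    symbol: str
    price: float

trades_topic = app.topic("trades", value_type=Trade)

@app.agent(trades_topic)
async def process(trades):
    # Stream processing: filter and act on each event as it arrives
    async for trade in trades:
        if trade.price > 100:
            print(f"High-priced trade: {trade.symbol} @ {trade.price}")

if __name__ == "__main__":
    app.main()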
I'm building a Django webapp where I need to stream some stock market trades on a web page in real time. In order to do that, I'm looking at various approaches, and I found Pusher and RabbitMQ.
With RabbitMQ I would just send the messages to RabbitMQ and consume them from Django in order to get them onto the web page. While looking for other solutions, I also came across Pusher. What's not clear to me is the technical difference between the two. I don't understand where I would use RabbitMQ and where I would use Pusher; can someone explain how they are different? Thanks in advance!
You may be thinking of data delivery, non-blocking operations or push notifications. Or you want to use publish / subscribe, asynchronous processing, or work queues. All these are patterns, and they form part of messaging.
RabbitMQ is a messaging broker - an intermediary for messaging. It gives your applications a common platform to send and receive messages, and your messages a safe place to live until received.
Pusher is a hosted service that makes it super-easy to add real-time data and functionality to web and mobile applications.
Pusher sits as a real-time layer between your servers and your clients. Pusher maintains persistent connections to the clients - over WebSocket if possible and falling back to HTTP-based connectivity - so that as soon as your servers have new data that they want to push to the clients they can do so, instantly, via Pusher.
Pusher offers libraries to integrate into all the main runtimes and frameworks: PHP, Ruby, Python, Java, .NET, Go and Node on the server, and JavaScript, Objective-C (iOS) and Java (Android) on the client.
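To make the difference concrete, pushing a trade to browsers from Django through Pusher would look roughly like this sketch, assuming the official pusher Python library; the channel/event names and credentials are placeholders. With RabbitMQ you would instead consume the trades from a queue in Django and then deliver them to the browser yourself (for example over WebSockets).

import pusher

# Credentials come from your Pusher dashboard (placeholders here)
pusher_client = pusher.Pusher(
    app_id="APP_ID",
    key="APP_KEY",
    secret="APP_SECRET",
    cluster="eu",
)

# Push one trade event; subscribed browsers receive it in real time
pusher_client.trigger("trades", "new-trade", {"symbol": "AAPL", "price": 187.3})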
I am a newbie to Kafka and PyKafka. I know that a producer and a consumer are created in PyKafka via the code below.
from pykafka import KafkaClient

client = KafkaClient("localhost:9092")    # connect to the Kafka cluster via one bootstrap broker
topic = client.topics["topicname"]        # look up the topic by name
producer = topic.get_producer()           # producer for writing messages to the topic
consumer = topic.get_simple_consumer()    # consumer for reading messages from the topic
I want to know what KafkaClient is, and how it helps in creating the producer and consumer.
I have read that we can also create a cluster and a broker using client.cluster and client.broker, but I can't understand the use of client here.
To make terms simpler, replace Kafka with "server".
You interact with servers with clients.
To interact with Kafka, in particular, you send messages to topics via producers, and get messages with consumers.
I don't know this library off-hand, but .broker and .cluster aren't actually "making a Kafka broker / cluster"; they only establish a connection to an existing one, from which you can issue later commands.
You need the client. prefix on those calls because the client is a wrapper around both.
To know why it is structured this way, you'd have to ask the developers themselves.
pykafka.KafkaClient is the root object of the PyKafka API, providing an interface to Kafka brokers as well as the ability to instantiate consumer and producer instances. The KafkaClient can be thought of as a representation of the totality of one Python process' interaction with a given Kafka cluster. There is no direct comparison between KafkaClient and any of the concepts mentioned in the official Kafka documentation.
It's totally possible in theory to design a python Kafka client library that doesn't have a "client" class like KafkaClient. We decided not to since in our opinion a single root class provides a cleaner, more learnable interface than a bag of various classes.
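Building on the snippet from the question, a rough sketch of actually sending and reading a message through those objects might look like this; the topic name and payload are placeholders:

from pykafka import KafkaClient

client = KafkaClient("localhost:9092")   # entry point to the whole cluster
topic = client.topics["topicname"]       # look up the topic through the client

# Produce one message, then read messages back
with topic.get_producer() as producer:
    producer.produce(b"hello from pykafka")

consumer = topic.get_simple_consumer()
for message in consumer:
    if message is not None:
        print(message.offset, message.value)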
I want to be able to schedule delivery of a lightweight message from a server to a client. This is new territory to me so I'd appreciate some advice on the possible approaches available.
The client is running on a Raspberry Pi using node.js (because I'm using node libraries to control a piece of attached hardware). Eventually there will be multiple clients like it.
The server could be anything, though I'm most familiar with Python, django and node.
I want to be able to access the server from a browser and cause it to schedule a future message to the client, effectively a push notification with a tiny bit of data.
I'm looking at pub-sub and messaging systems to do this; I started writing a system that uses node on both ends and sockets, but the approach I want is more fire-and-forget occasional messages, not constant realtime data exchange. I'm also not a huge fan of the node-cron style scheduling, I'd like to be able to retrieve and alter scheduled events and it felt somewhat heavy-handed to layer this on top of a cron system.
My current solution uses python on the server (so I can write a django web interface) with celery and rabbitmq, using a named queue per client. The client subscribes to that specific queue using node-amqp, and off we go. This also allows me to create queues that multiple clients can be interested in, which is a neat bonus.
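A rough sketch of that Celery setup is below; the task, queue name, and broker URL are placeholders rather than a definitive design:

from datetime import datetime, timedelta
from celery import Celery

# RabbitMQ as the broker, with one named queue per client
app = Celery("scheduler", broker="amqp://guest@localhost//")

@app.task
def notify_client(client_id, payload):
    # In the real system the node client consumes this from its own queue
    print(f"notify {client_id}: {payload}")

# Schedule a one-off message 10 minutes from now, routed to that client's queue
notify_client.apply_async(
    args=["pi-01", {"action": "water", "zone": 2}],
    queue="client.pi-01",
    eta=datetime.utcnow() + timedelta(minutes=10),
)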
This answer makes me think I'm doing the right thing -- but as I'm new to this stuff, it feels like I might be missing something. Are there alternatives I should consider in the world of server-client messaging?
Since you are already using Python, you could take a look at Python Remote Objects (Pyro).