Difference between Faust and kafka-python - python

I could not find any answer to this: what is the difference between Faust and kafka-python?
Are there any pros/cons to preferring one over the other?
As I understand it:
Kafka is written in Java, and kafka-python is a Python client to communicate with a "Java stream"
Faust is a pure "Python stream"
So, if I plan to use only Python, Faust should be the better choice, and if I want wider compatibility (Go, .NET, C/C#, Java, Python), I should use Kafka + kafka-python?
Note: I am new to using Kafka and I am trying to understand the pros/cons of different solutions.
Any advice would be highly appreciated!

As I understand it, you use both with Kafka and both from Python, with the difference that:
Faust is for stream processing (filtering, joining, aggregating, etc.)
kafka-python (like confluent-kafka-python) is a client library providing Consumer, Producer, and Admin APIs for Kafka.
So you could easily use both from Python, for different purposes.
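To make the distinction concrete: with a plain client such as kafka-python you iterate over raw records and write any filtering or aggregation yourself, which is the part a stream-processing library like Faust takes over with its agents and tables. The sketch below is hypothetical: the function name and the message format (key = service name, value = `b"<LEVEL> <text>"`) are made up for illustration. It accepts any iterable of objects with `.key` and `.value`, so a kafka-python `KafkaConsumer` (which yields `ConsumerRecord` objects when iterated) or a list of stand-ins both work.

```python
from collections import Counter, namedtuple
from itertools import islice
from typing import Any, Iterable, Optional


def count_errors_by_service(records: Iterable[Any], limit: Optional[int] = None) -> Counter:
    """Hand-written filter + aggregate over raw Kafka-style records.

    Assumes (hypothetically) that each record's key is the service name and
    its value looks like b"<LEVEL> <text>"; every record must have a key.
    """
    counts: Counter = Counter()
    # islice(records, None) reads everything; a number caps how many records are read.
    for rec in islice(records, limit):
        level = rec.value.split(b" ", 1)[0]
        if level == b"ERROR":              # filter step
            counts[rec.key.decode()] += 1  # aggregate step
    return counts


# Stand-in for kafka-python's ConsumerRecord, so the sketch runs without a broker.
FakeRecord = namedtuple("FakeRecord", "key value")

sample = [
    FakeRecord(b"auth", b"ERROR bad token"),
    FakeRecord(b"auth", b"INFO login ok"),
    FakeRecord(b"db", b"ERROR timeout"),
]
print(count_errors_by_service(sample))  # Counter({'auth': 1, 'db': 1})
```

Against a real broker you could pass something like `KafkaConsumer("logs", bootstrap_servers="localhost:9092")` instead of `sample` (topic and address are placeholders). Iterating a consumer blocks waiting for new messages by default, which is why the sketch has a `limit`; keeping running state, windows, and recovery on top of that loop is what Faust is meant to handle for you.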

Related

Python Simple Multiple Client Socket Server

I am wondering how I can make a simple socket server in Python 2.7 that can accept and handle multiple clients at a time. I do not want to use Twisted, threading, or any other libraries; just Python and sockets. I have looked around SO (Stack Overflow, is that a thing?) and found people asking the same question but not getting a clear answer.
If you are wondering why I need this, it's because I want to create a simple data forwarder that forwards client data to another server. I think a very simple example showing Server.py, Client1.py, and Client2.py is just what I need. Again, just a very simple example: no threading, no Twisted, no libraries.
I hope you can help me. I'm fairly new to Python, I feel like this project will help me get on my feet, and I learn best from examples.
Consider using asyncio (available for Python 3.3 and later).
Asyncio is the new Python standard for single-threaded concurrent programming:
This module provides infrastructure for writing single-threaded concurrent code using coroutines, multiplexing I/O access over sockets and other resources, running network clients and servers, and other related primitives.
The documentation provides a few examples:
TCP echo client
TCP echo server
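A minimal sketch of the asyncio approach, written for Python 3.7+ (it uses `asyncio.run` and `StreamWriter.wait_closed`, so it will not run on 2.7 as-is). One coroutine handles each client, and a single event loop interleaves them on one thread. The echo is a placeholder: for the data forwarder you describe, `handle_client` would write to a connection opened to the upstream server instead. The host, port, and function names are illustrative assumptions.

```python
import asyncio


async def handle_client(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    # One coroutine per connected client; all of them run on the same thread.
    while True:
        data = await reader.readline()
        if not data:  # empty bytes means the client closed the connection
            break
        writer.write(data)  # echo back; a forwarder would write upstream here
        await writer.drain()
    writer.close()
    await writer.wait_closed()


async def client(port: int, message: str) -> str:
    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    writer.write(message.encode() + b"\n")
    await writer.drain()
    reply = await reader.readline()
    writer.close()
    await writer.wait_closed()
    return reply.decode().rstrip("\n")


async def main() -> list:
    # Port 0 asks the OS for any free port.
    server = await asyncio.start_server(handle_client, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    async with server:
        # Two clients talk to the server at the same time, with no threads involved.
        return await asyncio.gather(
            client(port, "hello from client 1"),
            client(port, "hello from client 2"),
        )


if __name__ == "__main__":
    print(asyncio.run(main()))  # ['hello from client 1', 'hello from client 2']
```

In a real Server.py you would replace the body of `main` with `await server.serve_forever()` inside `async with server:` and run the clients as separate scripts.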
If you're not ready to migrate to Python 3, you can use trollius, the port of asyncio to Python 2. There are a few differences between the two modules, as listed in the documentation:
replace asyncio with trollius (or use import trollius as asyncio)
replace yield from ... with yield From(...)
replace yield from [] with yield From(None)
in coroutines, replace return res with raise Return(res)
Other solutions for single-threaded concurrent programming on python 2.7:
gevent: a coroutine-based Python networking library that uses greenlet.
asyncore: built-in asynchronous socket library (echo server example).
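For the literal "just Python and sockets" requirement, the classic pattern behind asyncore is a `select()` loop: one thread watches the listening socket plus every client socket and services whichever is ready. A minimal sketch, assuming an echo in place of the forwarding logic; `serve` and its `max_messages` parameter are made-up names, and `max_messages` only exists so the loop can stop in a demo. It uses only `socket` and `select` and avoids Python 3-only syntax, but treat 2.7 compatibility as untested.

```python
import select
import socket


def serve(listener, max_messages=None):
    """Single-threaded echo server that multiplexes all clients with select()."""
    clients = []
    handled = 0
    while max_messages is None or handled < max_messages:
        # Block until the listener (a new client) or any client socket is readable.
        readable, _, _ = select.select([listener] + clients, [], [])
        for sock in readable:
            if sock is listener:
                conn, _addr = listener.accept()  # will not block: select said it is ready
                clients.append(conn)
            else:
                data = sock.recv(4096)
                if data:
                    sock.sendall(data)  # a forwarder would send to the other server here
                    handled += 1
                else:  # empty read means the client disconnected
                    clients.remove(sock)
                    sock.close()
    for sock in clients:
        sock.close()


if __name__ == "__main__":
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind(("127.0.0.1", 5000))  # placeholder address and port
    listener.listen(5)
    serve(listener)
```

Client1.py and Client2.py can then be as simple as `socket.create_connection(("127.0.0.1", 5000))` followed by `sendall` and `recv` calls.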

AMQP broker written in Python?

I'm looking for an AMQP broker written in Python. Right now I'm using RabbitMQ with the pika bindings. RabbitMQ does the job, but it would be nice to find something simpler and more lightweight written in Python.
The only one I have found so far is SnakeMQ, but it does not (yet) support multiple queues, which is a requirement in my case.
Does anyone know if there are any alternatives written in python?
Queue.Queue() (queue.Queue in Python 3) is batteries-included; it couldn't be simpler.
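Worth stressing what this suggestion covers: `queue.Queue` only passes messages between threads of one process, so it replaces a broker only if all producers and consumers live in the same program. Within that limit, "multiple queues" is just a dictionary of them. A sketch; `LocalBroker` and its methods are hypothetical names, not an existing library.

```python
import queue
import threading


class LocalBroker:
    """In-process stand-in for a broker: named queues, created on first use.

    Works only between threads of a single process; there is no networking,
    persistence, or AMQP here.
    """

    def __init__(self):
        self._queues = {}
        self._lock = threading.Lock()  # guards creation of new named queues

    def _queue(self, name):
        with self._lock:
            if name not in self._queues:
                self._queues[name] = queue.Queue()
            return self._queues[name]

    def publish(self, name, message):
        self._queue(name).put(message)

    def consume(self, name, timeout=None):
        # Blocks until a message arrives; raises queue.Empty if the timeout expires.
        return self._queue(name).get(timeout=timeout)


def worker(broker):
    # Read jobs from one queue and publish results to another until told to stop.
    while True:
        job = broker.consume("jobs")
        if job is None:  # sentinel meaning "no more work"
            break
        broker.publish("results", job * 2)


if __name__ == "__main__":
    broker = LocalBroker()
    t = threading.Thread(target=worker, args=(broker,))
    t.start()
    for n in (1, 2, 3):
        broker.publish("jobs", n)
    broker.publish("jobs", None)
    t.join()
    print([broker.consume("results", timeout=1) for _ in range(3)])  # [2, 4, 6]
```

If producers and consumers need to run as separate processes or on separate machines, this no longer applies and you are back to a real broker such as RabbitMQ.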

Kafka consumer in R

I am looking to hack together a Kafka consumer in Python or R (preferably R).
Using the Kafka console consumer, I can grep for a string and retrieve the relevant data, but I am at a loss when it comes to parsing it suitably in R.
There are Kafka clients available in other languages (for example PHP and C++), but one in R would be helpful from a data analytics point of view.
It would be great if the expert R developers on this forum could suggest resources that would help me make headway in this direction.
Apache Kafka: incubator.apache.org/kafka/
Kafka consumer client(s): https://github.com/kafka-dev/kafka/tree/master/clients
[2015 update] There is a library that allows you to connect to Kafka: rkafka
http://cran.r-project.org/web/packages/rkafka/rkafka.pdf
As there is a C++ API for Kafka, you could use Rcpp to bring it to R.
Edit in response to a comment asking for an R-only solution: I do not know Kafka well enough to answer, but generally speaking, middleware runs fast and connects multiple clients, streams, etc. So you would have to simplify something somewhere to get R (single-threaded as it is) to play along with it.
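Since the question allows Python too, one low-effort bridge is to do the consuming and "grep" step in Python and hand R a CSV it can load with `read.csv`. A sketch under assumptions: `export_matching` is a hypothetical name, message values are assumed to be UTF-8 bytes, and `records` is any iterable of objects with `topic`, `partition`, `offset`, and `value` attributes, which kafka-python's `ConsumerRecord` has.

```python
import csv
import io
from collections import namedtuple
from itertools import islice


def export_matching(records, needle, out, limit=None):
    """Write records whose text value contains `needle` to `out` as CSV.

    Returns the number of matching rows. `limit` caps how many records are read,
    since iterating a live consumer otherwise never ends.
    """
    writer = csv.writer(out)
    writer.writerow(["topic", "partition", "offset", "value"])
    written = 0
    for rec in islice(records, limit):
        text = rec.value.decode("utf-8", errors="replace")  # assumes UTF-8 payloads
        if needle in text:  # the "grep" step
            writer.writerow([rec.topic, rec.partition, rec.offset, text])
            written += 1
    return written


# Stand-in for kafka-python's ConsumerRecord so the sketch runs without a broker.
Rec = namedtuple("Rec", "topic partition offset value")
sample = [
    Rec("events", 0, 4, b"all good"),
    Rec("events", 0, 5, b"user error: bad input"),
]

buf = io.StringIO()
print(export_matching(sample, "error", buf))  # 1
```

With a real consumer you would write to `open("matches.csv", "w", newline="")` instead of a `StringIO` (file name is a placeholder) and then run `read.csv("matches.csv")` in R, keeping R out of the streaming part entirely.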

I'm looking for a network service that'll let me send messages to selected clients

I have a program which will be running on multiple devices on a network. These programs will need to send data between each other - to specified devices (not all devices).
server = server.Server('192.168.1.10')
server.identify('device1')
server.send('device2', 'this will be pickled and sent to device2')
That's some basic example code for what I need to do. Of course, it will also need to receive.
I was looking at building my own simple message passing server using Twisted when someone pointed me in the direction of MPI. I've never looked into the MPI protocol before and that website gives rather vague examples.
Is MPI a good approach? Are there better alternatives?
MPI is really good at doing the communications for running a tightly coupled program across several or many machines in a cluster. If you're running very loosely coupled programs (only interacting occasionally), or the machines are more distributed than within a cluster, like scattered around a LAN, then MPI is probably not what you're looking for.
There are several Open Source message brokers that already handle this kind of stuff for you, and come with a full API ready to use.
You should take a look at:
ActiveMQ, which has a Python STOMP client.
RabbitMQ has a Python client too - see Building RabbitMQ apps using Python.
You could build it yourself, but that would be like reinventing the wheel (and on a side-note: I actually only realised I was half-way building a message broker before I started looking at existing solutions - building one takes a lot of work).
Consider using something like ZeroMQ. It supports the most useful messaging idioms - push/pull, publish/subscribe and so on, and although it's not 100% clear from your question which one you need, I'm pretty sure you will find the answer there.
They have a great user guide here, and the Python bindings are well-developed and supported. Some code samples are here.
You can use MPI functions to set up communication between different programs. In this case, the server program should publish "MPI ports" with different IDs. Clients look up these ports and try to connect to them; only the server can accept each connection. Once the communication is established, the programs can exchange data with each other.
Another possibility is to launch the different programs together as a single MPI job in MPMD (multiple program, multiple data) mode. In this case all programs are started at the same time, and there is no need to create port communicators. Once they are running, you can create dedicated communicators between the groups of programs you select.
Please tell me which method you need and I can provide C code implementing the functions.
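If you want to try the build-it-yourself route before adopting a broker, the standard library's `multiprocessing.connection` already pickles objects over sockets, which maps closely onto the `identify`/`send` API in the question. A sketch under loud assumptions: `serve`, `connect`, the registration handshake, and `AUTHKEY` are all hypothetical, it uses one thread per device for brevity, and because received data is unpickled it is only suitable between trusted machines that share the auth key.

```python
import threading
from multiprocessing.connection import Client, Listener

AUTHKEY = b"change-me"  # hypothetical shared secret; only unpickle data from trusted peers


def serve(listener):
    """Accept devices forever and relay (dest, payload) messages to the named device."""
    registry = {}            # device name -> Connection
    lock = threading.Lock()  # guards the registry and serialises sends (simple, not fast)

    def handle(conn):
        name = conn.recv()  # first message from a device is its name
        with lock:
            registry[name] = conn
            conn.send(("registered", name))  # ack, so the device knows it can now receive
        while True:
            try:
                dest, payload = conn.recv()
            except EOFError:  # device disconnected
                break
            with lock:
                target = registry.get(dest)
                if target is not None:  # messages to unknown devices are dropped
                    target.send((name, payload))
        with lock:
            registry.pop(name, None)
        conn.close()

    while True:
        conn = listener.accept()  # also performs the authkey handshake
        threading.Thread(target=handle, args=(conn,), daemon=True).start()


def connect(address, name):
    """Register with the relay as `name`; send with conn.send((dest, obj))."""
    conn = Client(address, authkey=AUTHKEY)
    conn.send(name)
    reply = conn.recv()
    if reply != ("registered", name):
        raise RuntimeError("unexpected reply from relay: %r" % (reply,))
    return conn


if __name__ == "__main__":
    listener = Listener(("127.0.0.1", 0), authkey=AUTHKEY)  # port 0: any free port
    threading.Thread(target=serve, args=(listener,), daemon=True).start()
    dev1 = connect(listener.address, "device1")
    dev2 = connect(listener.address, "device2")
    dev1.send(("device2", {"reading": 42}))  # pickled on the wire
    print(dev2.recv())  # ('device1', {'reading': 42})
```

Note what is missing compared to a real broker: no persistence, no delivery guarantees, and a slow receiver can stall every sender because sends happen under one lock. That is the "lot of work" the broker answers warn about.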

Python Comet Server

I am building a web application that has a real-time feed (similar to Facebook's newsfeed) that I want to update via a long-polling mechanism. I understand that with Python, my choices are pretty much to either use Stackless (building from their Comet wsgi example) or Cometd + Twisted. Unfortunately there is very little documentation regarding these options and I cannot find good information online about production scale users of comet on Python.
Has anyone successfully implemented comet on Python in a production system? How did you go about doing it and where can I find resources to implement my own?
Orbited seems like a nice solution, though I haven't tried it.
Update: things have changed in the last 2.5 years.
We now have WebSockets in all major browsers except IE (naturally), and a couple of very good abstractions over them that provide many ways of emulating real-time communication:
socket.io along with tornadio (socket.io 0.6) and tornadio2 (socket.io 0.7+)
sock.js along with SockJS-tornado
I recommend StreamHub Comet Server; it's used by a lot of people, and I personally use it with a couple of Django sites I run. You will need to write a tiny bit of Java to handle the streaming; I did this using Jython. The front-end code is some really simple JavaScript along these lines:
var hub = new StreamHub();
hub.connect("http://myserver.com/");
hub.subscribe("newsfeed", function(sTopic, oData) { alert("new news item: " + oData.Title); });
The documentation is pretty good; I had the same problems you're having getting started with the sparse docs of Cometd et al. For a start, I'd read Getting Started With Comet and StreamHub, download it, see how some of the examples work, and refer to the API docs if you need to:
Javascript API JSDoc
Streaming from Java Javadoc
Here is a full-featured example of combining Django, Orbited, and Twisted to create a real-time (Comet) app using Python: http://github.com/clemesha/hotdot
I've done tons of APIs using twisted for stuff like that, most of which are available on my github account.
Most are client-side, but slosh is a server I wrote to do a realtime cheap pubsub sort of thing. It scales somewhat horizontally for reads by allowing for simple stream replication. Writes are a little different when you stick to plain HTTP, but I've pushed a decent amount through it for a demo.
Otherwise, you have full-on BOSH which most XMPP servers support and will allow you to decouple the message distribution from the web frontend.
I haven't done it, but this guy has, and he wrote a good article about it with Django examples and pointers (which I haven't checked) to other frameworks.
The Orbited and Redis solutions are nice, but no longer relevant now that there is something like PubSubHubbub, which Google released. It makes it very easy to be the publisher or the subscriber of a given feed: http://code.google.com/p/pubsubhubbub/
Here's an example that does long-polling with gevent and Django.
It uses greenlet: the stack-switching functionality from Stackless, packaged as a CPython extension.
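The core of long polling is independent of the server stack: a request for "anything newer than item N" waits until an item arrives or a timeout expires, then returns, and the browser immediately polls again. A minimal sketch of that waiting logic using the standard library's asyncio (a swapped-in technique, not gevent, Stackless, or Orbited as in the answers above); `Feed`, `post`, and `poll` are hypothetical names, and wiring `poll` into an HTTP handler is left to whichever async framework you use.

```python
import asyncio


class Feed:
    """Append-only news feed that long-polling requests can wait on."""

    def __init__(self):
        self.items = []
        self._changed = asyncio.Condition()

    async def post(self, item):
        async with self._changed:
            self.items.append(item)
            self._changed.notify_all()  # wake every parked poll

    async def poll(self, since, timeout):
        """Return items after index `since`, waiting up to `timeout` seconds for some."""
        async with self._changed:
            try:
                await asyncio.wait_for(
                    self._changed.wait_for(lambda: len(self.items) > since), timeout
                )
            except asyncio.TimeoutError:
                pass  # timed out: the client gets an empty list and polls again
            return self.items[since:]


async def main():
    feed = Feed()  # created inside the running loop (matters on older Pythons)
    waiter = asyncio.create_task(feed.poll(since=0, timeout=5))
    await asyncio.sleep(0.1)  # the poll is now parked, waiting for news
    await feed.post("new news item")
    first = await waiter  # returns as soon as the item is posted
    empty = await feed.poll(since=1, timeout=0.1)  # nothing newer, so it times out
    return first, empty


if __name__ == "__main__":
    print(asyncio.run(main()))  # (['new news item'], [])
```

The client keeps track of how many items it has seen and sends that as `since` on the next request, so nothing is lost between polls.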
