Content-based routing with RabbitMQ and Python

Is it possible with RabbitMQ and Python to do content-based routing?
The AMQP standard and RabbitMQ claim to support content-based routing, but are there any libraries for Python which support specifying content-based bindings etc.?
The library I am currently using (py-amqplib http://barryp.org/software/py-amqplib/) seems to only support topic-based routing with simple pattern-matching (#, *).

The answer is "yes", but there's more to it... :)
Let's first agree on what content-based routing means. There are two possible meanings. Some people say that it is based on the header portion of a message. Others say it's based on the data portion of a message.
If we take the first definition, these are more or less the assumptions we make:
The data comes into existence somewhere, and it gets sent to the AMQP broker by some piece of software. We assume that this piece of software knows enough about the data to put key-value (KV) pairs in the header of the message that describe the content. Ideally, the sender is also the producer of the data, so it has as much information as we could ever want. Let's say the data is an image. We could then have the sender put KV pairs in the message header like this:
width=1024
height=768
mode=bw
photographer=John Doe
Now we can implement content-based routing by creating appropriate queues. Let's say we have a separate operation to perform on black-and-white images and a separate one on colour images. We can create two queues, one that receives messages with mode=bw and another with mode=colour. Then we have separate clients listening on those queues. The broker performs the routing, and there is nothing in our client that needs to be aware of the routing.
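For illustration, here is a minimal sketch of this pattern with the pika client and a headers exchange (the exchange and queue names are made up; py-amqplib exposes the same AMQP primitives):

import pika

connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
channel = connection.channel()

# A headers exchange routes on the message's header table rather than the routing key.
channel.exchange_declare(exchange='images', exchange_type='headers')
channel.queue_declare(queue='bw-images')
channel.queue_bind(queue='bw-images', exchange='images',
                   arguments={'x-match': 'all', 'mode': 'bw'})

# The sender describes the content in the header; the broker does the routing.
channel.basic_publish(exchange='images', routing_key='',
                      body=b'...image bytes...',
                      properties=pika.BasicProperties(
                          headers={'mode': 'bw', 'width': 1024, 'height': 768}))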
If we take the second definition, we start from different assumptions. We assume that the data comes into existence somewhere and gets sent to the AMQP broker by some piece of software, but that it is not sensible to demand that this software populate the header with KV pairs. Instead, we want to make a routing decision based on the data itself.
There are two options for this in AMQP: you can decide to implement a new exchange for your particular data format, or you can delegate the routing to a client.
In RabbitMQ, there are direct (1-to-1), fanout (1-to-N), headers (header-filtered 1-to-N) and topic (topic-filtered 1-to-N) exchanges, but you can implement your own according to the AMQP standard. This would require reading a lot of RabbitMQ documentation and implementing the exchange in Erlang.
The other option is to make an AMQP client in Python that listens to a special "content routing queue". Whenever a message arrives at the queue, your router-client picks it up, does whatever is needed to make a routing decision, and sends the message back to the broker to a suitable queue. So to implement the scenario above, your Python program would detect whether an image is in black-and-white or colour, and would (re)send it to a "black-and-white" or a "colour" queue, where some suitable client would take over.
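A rough sketch of such a router-client with pika; the is_black_and_white helper is hypothetical, standing in for whatever inspection your data actually needs, and the target queues are assumed to already exist:

import pika

def is_black_and_white(body):
    # Hypothetical content inspection; replace with real image analysis.
    raise NotImplementedError

def route(channel, method, properties, body):
    # Decide based on the payload itself, then republish to the right queue
    # (publishing to the default exchange with the queue name as routing key).
    target = 'bw-images' if is_black_and_white(body) else 'colour-images'
    channel.basic_publish(exchange='', routing_key=target,
                          body=body, properties=properties)
    channel.basic_ack(delivery_tag=method.delivery_tag)

connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
channel = connection.channel()
channel.queue_declare(queue='content-routing-queue')
channel.basic_consume(queue='content-routing-queue', on_message_callback=route)
channel.start_consuming()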
As for your second question: there is really nothing your client does that amounts to content-based binding. Either your client(s) work as described above, or you create a new exchange type in RabbitMQ itself and then, in your client setup code, declare the exchange with your new type.
Hope this answers your question!

In RabbitMQ, routing is the process by which an exchange decides which queues to place your message on. You publish all messages to an exchange, but you only receive messages from a queue. This means that the exchange is an active part of the process that makes some decisions about message forwarding or copying.
The topic exchange included with RabbitMQ looks at a string on the incoming messages (the routing_key) and matches that with the patterns (the binding_keys) supplied by all queues which declare their desire to receive messages from the exchange.
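For example, with pika a queue can bind to a topic exchange using such a pattern (all names here are illustrative):

import pika

connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
channel = connection.channel()
channel.exchange_declare(exchange='media', exchange_type='topic')
channel.queue_declare(queue='bw-processor')
# '*' matches exactly one word, '#' matches zero or more words.
channel.queue_bind(queue='bw-processor', exchange='media',
                   routing_key='images.*.bw')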
RabbitMQ source code is on the web so you can have a look at the topic exchange code here:
http://hg.rabbitmq.com/rabbitmq-server/file/9b22dde04c9f/src/rabbit_exchange_type_topic.erl
A lot of the complexity there is to handle a data structure called a trie which allows for very fast lookups. In fact the same data structure is used inside Internet routers.
The headers exchange, found here: http://hg.rabbitmq.com/rabbitmq-server/file/9b22dde04c9f/src/rabbit_exchange_type_headers.erl
is probably easier to understand. As you can see, not a lot of code is required to implement a different type of exchange. If you wanted to examine the content (or maybe just peek at the first few bytes of a message), you should be able to quickly identify XML versus JSON versus something else. And if your JSON objects and XML documents maintain a specific sequence of elements, then you should be able to distinguish between different JSON objects (or XML document types) without parsing the entire message body.
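A hedged sketch of that kind of peeking (the heuristics are assumptions, not a standard):

def sniff_format(body):
    # Look only at the first non-whitespace byte; cheap, no full parse.
    head = body.lstrip()[:1]
    if head == b'<':
        return 'xml'
    if head in (b'{', b'['):
        return 'json'
    return 'unknown'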

Related

Have channels coincide with rooms in Python-socketio, and general pubsub questions

I’m working on a project involving websockets using python-socketio.
Our primary concern is fluidity: each connected user has a cursor whose position on the board is sent as an event every 50 ms. Boards are identified as (socket) rooms, and we are expecting many of these.
I'm new to pub/sub; we are horizontally scaling our architecture, and it seems to be the right fit for broadcasting events.
I had a look at the AsyncRedisManager class, and from my understanding it seems that any message sent by any socket on any socketio server (with pub/sub) is transmitted/published from this server to Redis on a single channel of communication. Subscribers to this channel can then see this flow of messages.
I’m hence concerned about 3 things:
Since all messages simply go through one channel, isn't this a "design flaw"? Some servers might have no sockets connected to one specific room at the moment, yet they will still be receiving (and pickle.load-ing) messages they don't care about at that time.
The actual details of these messages (room, payload, etc.) are pickle.dump-ed and pickle.load-ed by the servers. In the case of 50 rooms with 50 cursors each sending 25 events/s, isn't this going to be a huge CPU-bound bottleneck?
I'm wrapping my head around the socket.io docs, comparing the Redis adapter side by side with the python-socketio pub/sub manager, and it seems channels are dynamically namespaced like "socketio#room_name", with messages broadcast to these namespaced channels, so that psubscribe would be a viable solution. Some other message queues refer to these as "topics".
If the former assumption is correct, we still cannot decide whether one server should or should not psubscribe to a channel#room_name unless at least one socket on that server is in the room.
I understand the paradigm of pub/sub is, from Redis page:
Rather, published messages are characterized into channels, without knowledge of what (if any) subscribers there may be. Subscribers express interest in one or more channels, and only receive messages that are of interest, without knowledge of what (if any) publishers there are.
But my question would be summarized as:
Is it possible to make python-socketio servers dynamically subscribe/unsubscribe to channels whenever there is a need for it, with channels identified as rooms, hence having as many channels as there are rooms in total? Would that be feasible while keeping this "plug-&-play" simple logic as a PubSubManager subclass? Am I missing something, or does this make sense?
Thank you for your time, any ideas, corrections, or “draft” code would be greatly appreciated.
is it possible to make Python-socketio servers dynamically subscribe/unsubscribe to channels whenever there is a need for it, with channels identified as rooms
I guess it is possible, with a custom client manager class. You would need to inherit from one of the existing client managers or the base client manager and implement a different pub/sub logic that fits your needs. But keep in mind that if you have 10,000 clients, there are going to be at least 10,000 rooms, since each client gets a personal room.
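As a standalone sketch of the dynamic-subscription idea, using redis-py directly (this is not wired into python-socketio, and the socketio#<room> channel naming is an assumption):

import pickle
import redis

r = redis.Redis()
pubsub = r.pubsub(ignore_subscribe_messages=True)
local_rooms = set()  # rooms with at least one socket on this server

def on_first_local_member(room):
    # Subscribe only when this server actually has someone in the room.
    if room not in local_rooms:
        local_rooms.add(room)
        pubsub.subscribe('socketio#' + room)

def on_last_local_member(room):
    # Unsubscribe once the last local socket leaves the room.
    if room in local_rooms:
        local_rooms.discard(room)
        pubsub.unsubscribe('socketio#' + room)

def broadcast(room, payload):
    r.publish('socketio#' + room, pickle.dumps(payload))

A custom manager subclass would hook this bookkeeping into its room-entry and room-exit paths.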

Multiple chat rooms - Is using ports the only way? What if there are hundreds of rooms?

Need some direction on this.
I'm writing a chat room browser-application, however there is a subtle difference.
These are collaboration chats where one person types and the other person can see, live, every keystroke entered by the other person as they type.
Also, the chat space is not a single line but a textarea space, like the one here (SO) used to enter a question.
All keystrokes, including tabs/spaces/enter, should be visible live to the other person. And only one person can type at a time (I guess locking should be trivial).
I haven't written a multiple-chatroom application. A simple client/server where both are communicating over a port is something I've written.
So here are the questions
1.) How is a multiple-chatroom application written? Is it also port based?
2.) Showing the other person's every keystroke as they type is, I guess, possible through Ajax. Is there any other mechanism available?
Note: I'm going to use a Python framework (web2py), but I don't think the framework matters here.
Any suggestions are welcome, thanks!
The Wikipedia entry for Comet (programming) has a pretty good overview of different approaches you can take on the client (assuming that your client's a web browser), and those approaches suggest the proper design for the server (assuming that the server's a web server).
One thing that's not mentioned on that page, but that you're almost certainly going to want to think about, is buffering input on the client. I don't think it's premature optimization to consider that a multi-user application in which every user's keystroke hits the server is going to scale poorly. I'd consider having user keystrokes go into a client-side buffer, and only sending them to the server when the user hasn't typed anything for 500 milliseconds or so.
You absolutely don't want to use ports for this. That's putting application-layer information in the transport layer, and it pushes application-level concerns (the application's going to create a new chat room) into transport-level concerns (a new port needs to be opened on the firewall).
Besides, a port's just a 16-bit field in the packet header. You can do the same thing in the design of your application's messages: put a room ID and a user ID at the start of each message, and have the server sort it all out.
The thing that strikes me as a pain about this is figuring out, when a client requests an update, what should be sent. The naive solution is to retain a buffer for each user in a room, and maintain an index into each (other) user's buffer as part of the user state; that way, when user A requests an update, the server can send down everything that users B, C, and D have typed since A's last request. This raises all kinds of issues about memory usage and persistence that don't have obvious simple solutions.
The right answers to the problems I've discussed here are going to depend on your requirements. Make sure those requirements are defined in great detail. You don't want to find yourself asking questions like "should I batch together keystrokes?" while you're building this thing.
You could try doing something like IRC, where the current "room" is sent from the client to the server "before" the text (/PRIVMSG #room-name Hello World), delimited by a space. For example, you could send ROOMNAME Sample text from the browser to the server.
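Parsing that format on the server side is a one-liner in Python (the format itself is purely illustrative):

def parse_message(raw):
    # 'ROOMNAME Sample text' -> ('ROOMNAME', 'Sample text')
    room, _, text = raw.partition(' ')
    return room, text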
Using AJAX would be the most reasonable option. I've never used web2py, but I'm guessing you could just use JSON to parse the data between the browser and the server, if you wanted to be fancy.

Simple server/client string exchange protocol

I am looking for an abstract and clean way to exchange strings between two Python programs. The protocol is really simple: the client/server sends a string to the server/client, which takes the corresponding action - via a handler, I suppose - and replies OR NOT to the other side with another string. Strings can be three things: an acknowledgement, signalling one side that the other one is still alive; a pickled class containing a command (if going from the "client" to the "server") or a response (if going from the "server" to the "client"); and finally a "lock" command, which signals one side of the conversation that the other is working and that no further questions should be asked until another lock packet is received.
I have been looking at Python's built-in SocketServer.TCPServer, but it's way too low level: it does not easily support reconnection, and the client has to use the socket interface, which I would prefer to be encapsulated.
I then explored the Twisted framework, particularly the LineOnlyReceiver protocol and the server examples, but I found the initial learning curve too steep, the online documentation assuming a little too much knowledge, and a general lack of examples and good documentation (except the 2005 O'Reilly book - is this still valid?).
I then tried the pyliblo library, which would be perfect for the task; alas, it is unidirectional - there is no way to "answer" a client, and I need the answer to be associated with the specific command.
So my question is: is there an existing framework/library/module that gives me a client object in the server (to read the commands from and send the replies to) and a server object in the client (to read the replies from and send the commands to), that I can use after a simple setup ("client, the server address is host:port"; "server, you are listening on port X"), with the underlying socket, reconnection engine and so on handled for me?
Thanks in advance for any answer (pardon my English and inexperience; this is my first question).
Python also provides an asynchat module that simplifies much of the server/client behavior common to chat-like communications.
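A minimal asynchat sketch, assuming newline-terminated strings (note that asyncore/asynchat have since been deprecated in favour of asyncio):

import asynchat

class StringChannel(asynchat.async_chat):
    # Collects newline-terminated strings and acknowledges each one.
    def __init__(self, sock):
        asynchat.async_chat.__init__(self, sock=sock)
        self.set_terminator(b'\n')
        self.received = []

    def collect_incoming_data(self, data):
        self.received.append(data)

    def found_terminator(self):
        message = b''.join(self.received).decode('utf-8')
        self.received = []
        # Handle the command carried by `message` here, then reply (or not).
        self.push(b'ack\n')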
What you want to do seems a lot like RPC, so the things that come to mind are XML-RPC, or JSON-RPC if you don't want to use XML.
Python has an XML-RPC library that you can use; it uses HTTP as the transport, so it also solves your problem of the alternatives being too low level. However, if you could provide more detail about what exactly you want to do, perhaps we can give a better solution.
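A minimal sketch with the standard library (Python 3 module names; the echo function is just a placeholder handler):

from xmlrpc.server import SimpleXMLRPCServer

def echo(message):
    # Placeholder: return a reply string to the caller.
    return 'server saw: ' + message

server = SimpleXMLRPCServer(('localhost', 8000))
server.register_function(echo)
server.serve_forever()

On the client side, xmlrpc.client.ServerProxy('http://localhost:8000').echo('hello') then gives you exactly the command-and-reply pattern described in the question.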

Python Sockets - Creating a message format

I have built a Python server to which various clients can connect, and I need to define a series of messages from clients to the server - for example, the client passes a name to the server when it first connects.
I was wondering what the best way to approach this is? How should I build a simple protocol for their communication?
Should the messages start with a specific set of bytes to mark them out as part of this protocol, then contain some sort of message id? Any suggestions or further reading appreciated.
First, you need to decide whether you want your protocol to be human-readable (much more overhead) or binary. If the former, you probably want to use regular expressions to decode the messages; for this, use the Python module re. If the latter, the module struct is your friend.
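For the binary case, a common sketch is a length-prefixed frame built with struct (the frame layout is an assumption, not a standard):

import struct

def pack_frame(payload):
    # 4-byte big-endian length header followed by the raw payload bytes.
    return struct.pack('!I', len(payload)) + payload

def unpack_frame(buf):
    # Inverse of pack_frame; assumes buf holds at least one complete frame.
    (length,) = struct.unpack('!I', buf[:4])
    return buf[4:4 + length]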
Second, if you are building a protocol that is somehow stateful (e.g. first we do a handshake, then we transfer data, then we check checksums and say goodbye), you probably want to create some sort of FSM (finite state machine) to track the state.
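A tiny table-driven FSM sketch (states and message types are illustrative):

HANDSHAKE, TRANSFER, DONE = 'handshake', 'transfer', 'done'

TRANSITIONS = {
    (HANDSHAKE, 'hello'): TRANSFER,
    (TRANSFER, 'data'): TRANSFER,
    (TRANSFER, 'bye'): DONE,
}

def step(state, msg_type):
    # Reject any message that is not legal in the current state.
    try:
        return TRANSITIONS[(state, msg_type)]
    except KeyError:
        raise ValueError('%r not allowed in state %r' % (msg_type, state))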
Third, if protocol design is not a familiar subject, read some simple protocol specifications, for example those published by the IETF.
If this is not a learning exercise, you might want to build on something existing, like Python Twisted.
Depending on the requirements, you might want to consider using JSON: use newline-terminated strings with JSON encoding. The transport protocol could be HTTP: with this, you get access to all the connection-related facilities (e.g. status codes) and a JSON-encoded payload.
The advantages of using JSON over HTTP:
human readable (debugging etc.)
libraries available for all languages/platforms
cross-platform
browser debuggable (to some extent)
Of course, there are many other ways to skin this cat but the time to working prototype using this approach is very low. This is worth considering if your requirements (which aren't very detailed here) can be met.
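A sketch of newline-delimited JSON framing over a plain socket (no HTTP), just to show how little code this approach takes:

import json

def send_msg(sock, obj):
    # One JSON document per line; '\n' is the frame delimiter.
    sock.sendall((json.dumps(obj) + '\n').encode('utf-8'))

def recv_msg(rfile):
    # rfile is a file object such as the one returned by sock.makefile('rb').
    line = rfile.readline()
    return json.loads(line) if line else None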
Read some protocols, and try to find one that looks like what you need. Does it need to be message-oriented or stream-oriented? Does it need request order to be preserved, does it need requests to be paired with responses? Do you need message identifiers? Retries, back-off? Is it an RPC protocol, a message queue protocol?
See http://www.faqs.org/docs/artu/ch05s02.html and http://www.faqs.org/docs/artu/ch05s03.html for a good overview and discussion on data file formats and protocols.

Network programming abstraction, decomposition

I have a problem as follows:
Server process 1
Constantly sends updates that occur to a datastore
Server process 2
Clients contact the server, which queries the datastore, and returns a result
The thing is, the results that process 1 and process 2 are sending back to the client are totally different and unrelated.
How does one decompose this?
Do you just have one process constantly sending data, and define the protocol to have a bit which corresponds to whether the return type is 1 or 2?
Do you have two processes? How do they share the datastore then (it is just a structure, not a database)?
Thanks!
It sounds like you want to stream your series of ints "somewhere" and also collect them in a datastore. In my system I am streaming sensor readings into a database and also allowing them to go directly to web clients, giving them live power readings. I've written a blog entry on why a database is not suitable for live data - though it is perfect for saving the data for later analysis.
I'd have the first server process be a Twisted server that uses txamp to stream the ints to RabbitMQ. Any clients that want live data can subscribe to the stream in RabbitMQ, also using txamp. Web browser clients can use Orbited; here is a worked example.
In your design, server 1 saves to the database. You could instead have a server 3 collect data from RabbitMQ and stream it to the database. I plan to have a server that collects chunks of data and renders graphs to store on a central file share.
Don't create your own messaging system, RabbitMQ is well tested, scalable, and can persist your "messages" (raw data) if something goes wrong.
If you can restrict yourself to Twisted, I recommend using Perspective Broker. It's essentially an RPC system and doesn't care much about the notion of "client" and "server" - either the initiator of a TCP connection or the responder can start RPC calls in PB.
So server 1 would accept registration calls with a callback object, and call the callback whenever it has new data available. Server 2 provides various RPC operations as clients require them. If they operate on the very same data, I would put both servers into a single process.
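A rough Perspective Broker sketch of that registration idea (all names are made up; the client would pass a pb.Referenceable as the callback and expose a remote_update method on it):

from twisted.internet import reactor
from twisted.spread import pb

class DataRoot(pb.Root):
    # Clients register a remote callback; we call it whenever data arrives.
    def __init__(self):
        self.callbacks = []

    def remote_register(self, callback):
        self.callbacks.append(callback)

    def publish(self, value):
        for cb in self.callbacks:
            cb.callRemote('update', value)

reactor.listenTCP(8800, pb.PBServerFactory(DataRoot()))
reactor.run()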
Why not use a database instead of "just a structure"? Both relational and non-relational DBs offer many practical advantages (separate processes using them; taking care of replication, snapshots, backups; rich functionality if you need it for the "queries"; and so on, and so forth).
Worst case, the "just a structure" can be handled by a third process that's entirely dedicated to it (basically mimicking what any DB engine would offer -- though the engine would probably do it better and faster;-), allowing you to at least keep a good decomposition (with the two server processes both interacting with the "datastore process").
