Comet for User based Notification over a Message Queue

Comet for User based Notification over a Message Queue - python

We trying to build application that should use Comet (AJAX Push) to send notifications to individual users. Most notifications will have a fairly low timeout.
As we are running RabbitMQ, it would be easiest to send messages through AMQP. I am wondering what the best way to address individual users is, so that both the Comet server and the queue server have an easy job.
I have looked at a number of solutions including using
Carrot with Orbited, Tornado, and more.
If the comet server registers one consumer (with the queue) for every user, then these consumers either have to be kept with a timeout, or discarded after every use. Neither solution seems very promising. I imagine something like this would be possible in Tornado/Carrot:
class MainHandler(tornado.web.RequestHandler):
#tornado.web.asynchronous
def get(self):
user_id = 123
consumer = Consumer(connection=conn, queue="feed", exchange="feed", routing_key=user_id)
consumer.register_callback(self.message_received)
consumer.wait()
def message_received(self, message_data, message):
self.write(simplejson.dumps(message_data))
message.ack()
consumer.close()
self.finish()
Alternatively, the comet server could only have one consumer for the queue and have to implement its own lightweight message queue that can cache incoming notifications until a user connects and uses them. This seems like something that memcached might be good for, but I have no experience with it.
What would be the best approach here?

I had almost the same use case and eventually ended up with Socket.IO for client-side, TornadIO for handling connections and RabbitMQ for message passing (via pika). Works quite well, worth to try it out.

Related

Why do chat applications have to be asynchronous?

I need to implement a chat application for my web service (that is written in Django + Rest api framework). After doing some google search, I found that Django chat applications that are available are all deprecated and not supported anymore. And all the DIY (do it yourself) solutions I found are using Tornado or Twisted framework.
So, My question is: is it OK to make a Django-only based synchronous chat application? And do I need to use any asynchronous framework? I have very little experience in backend programming, so I want to keep everything as simple as possible.

Django, like many other web framework, is constructed around the concept of receiving an HTTP request from a web client, processing the request and sending a response. Breaking down that flow (simplified for sake of clarity):
The remote client opens TCP connection with your Django server.
The client sends a HTTP request to the server, having a path, some headers and possibly a body.
Server sends a HTTP response.
Connection is closed
Server goes back to a state where it waits for a new connection.
A chat server, if it needs to be somewhat real-time, needs to be different: it needs to maintain many simultaneous open connections with connected clients, so that when new messages are published on a channel, the appropriate clients are notified accordingly.
A modern way of implementing that is using WebSockets. This communication flow between the client and server starts with a HTTP request, like the one described above, but the client sends a special Upgrade HTTP request to the server, asking for the session to switch over from a simple request/response paradigm to a persistent, "full-duplex" communication model, where both the client and server can send messages at any time in both direction.
The fact that the connections with multiple simultaneous clients needs to be persistent means you can't have a simple execution model where a single request would be taken care of by your server at a time, which is usually what happens in what you call synchronous servers. Tornado and Twisted have different models for doing networking, using multithreading, so that multiple connections can be left open and taken care of simultanously by a server, and making a chat service possible.
Synchronous approach nevertheless
Having said that, there are ways to implement a very simple, non-scalable chat service with apparent latency:
Clients perform POST requests to your server to send messages to channels.
Clients perform periodical GET requests to the server to ask for any new messages to the channels they're subscribed to. The rate at which they send these requests is basically the refresh rate of the chat app.
With this approach, your server will work significantly harder than if it had a asynchronous execution model for maintaining persistent connections, but it will work.

If you're going to make a chat app, you'll want to use websockets. They'll make getting updates to all clients participating in a conversation significantly easier and it'll give you real time conversations within your app. Having said that, I've never seen websockets used within a synchronous framework.
If it's OK to make Django-only based synchronous chat application? Too many unanswered questions for a reasonable answer. How many people will use this chat app? How many people per conversation? How long will this app be around? If you're looking to make something simple for you and a couple friends, make what you know. If you're getting paid to make this app, use websockets and use an asynchronous framework.

You certainly can develop a synchronous chat app, you don't necessarily need to us an asynchronous framework. but it all comes down to what do you want your app to do? how many people will use the app? will there be multiple users and multiple chats going on at the same time?

How do chat servers distribute messages to multiple clients?

This is really a programming design question more than a specific language or library question. I'm tinkering with the idea of a standalone chat server for websockets that will accept several remote browser-based javascript clients. I'm going for something super simple at first, then might build it up. The server just keeps accepting client connections and listens for messages. When a message is received, it will be sent back to all the clients.
What I need to better understand is which approach is best for sending the messages out to all clients, specifically, sending immediately to all clients, or queuing the messages to each client's queue to be sent when a client connection handler's turn comes up. Below are the two examples in a python-like pseudo-code:
Broadcast Method
def client_handler(client):
while true:
if(client.pending_msg):
rmsg = client.recv()
for c in clients:
c.send(rmsg)
client.sleep(1)
Queue Method
def client_handler(client):
while true:
if client.pending_msg:
rmsg = client.recv()
for c in clients:
c.queue_msg(rmsg)
if client.has_queued:
client.send_queue
client.sleep(1)
What is the best approach? Or, perhaps they are good for different use-cases, in which case, what are the pros, cons and circumstances for which they should be used. Thanks!

First of all, it seems odd to me that a single client handler would know about all the other existing clients. This should be the first thing you should abstract away and create a central message processing handler instead which the individual clients talk to.
That handler can then either send the message directly to the clients (like in your broadcast example), or add them to queues of the clients (like your queue example). Which would be the preferred version depends a bit on your network protocol.
Since you said that you will be using websockets, you have a persistent network connection to the clients anyway, so you can just send them out immediately. There is no real gain to queue (and buffer) the messages. Ideally, a client would just have a send() method anyway, and the client would then internally decide whether that means appending it to a queue or sending it immediately over the network.
Furthermore, since websockets are kind of asynchronous in their nature, you don’t need busy wait loops anyway. You can just listen for messages from the client directly, process those, and broadcast them using your central handler. And since you then don’t have a wait loop anymore, there also would be no place where you work off your queue anymore, making the immediate broadcast the more natural decision.

How to store real-time chat messages in database?

I am using mysqldb for my database currently, and I need to integrate a messaging feature that is in real-time. The chat demo that Tornado provides does not implement a database, (whereas the blog does.)
This messaging service also will also double as an email in the future (like how Facebook's message service works. The chat platform is also email.) Regardless, I would like to make sure that my current, first chat version will be able to be expanded to function as email, and overall, I need to store messages in a database.
Is something like this as simple as: for every chat message sent, query the database and display the message on the users' screen. Or, is this method prone to suffer from high server load and poor optimization? How exactly should I structure the "infrastructure" to make this work?
(I apologize for some of the inherent subjectivity in this question; however, I prefer to "measure twice, code once.")
Input, examples, and resources appreciated.
Regards.

Tornado is a single threaded non blocking server.
What this means is that if you make any blocking calls on the main thread you will eventually kill performance. You might not notice this at first because each database call might only block for 20ms. But once you are making more than 200 database calls per seconds your application will effectively be locked up.
However that's quite a few DB calls. In your case that would be 200 people hitting send on their chat message in the same second.
What you probably want to do is use a queue with a non blocking API. So Tornado receives a chat message. You put it on the queue to be saved to the database by another process, then you send the chat message back out to the other chat members.
When someone connects to a chat session you also need to send off a request to the queue for all the previous messages, when the queue responds you send those off to the newly connected user.
That's how I would approach the problem anyway.
Also see this question and answer: Any suggestion for using non-blocking MySQL api on Tornado in Python3?
Just remember, Tornado is single threaded. It's amazing. And can handle thousands of simultaneous connections. But if code in one of those connections blocks for 1 second then NOTHING else will be done for any other connection during that second.

Design question on Python network programming

I'm currently writing a project in Python which has a client and a server part. I have troubles with the network communication, so I need to explain some things...
The client mainly does operations the server tells him to and sends the results of the operations back to the server. I need a way to communicate bidirectional on a TCP socket.
Current Situation
I currently use a LineReceiver of the Twisted framework on the server side, and a plain Python socket (and ssl) on client side (because I was unable to correctly implement a Twisted PushProducer). There is a Queue on the client side which gets filled with data which should be sent to the server; a subprocess continuously pulls data from the queue and sends it to the server (see code below).
This scenario works well, if only the client pushes its results to the manager. There is no possibility the server can send data to the client. More accurate, there is no way for the client to receive data the server has sent.
The Problem
I need a way to send commands from the server to the client.
I thought about listening for incoming data in the client loop I use to send data from the queue:
def run(self):
while True:
data = self.queue.get()
logger.debug("Sending: %s", repr(data))
data = cPickle.dumps(data)
self.socket.write(data + "\r\n")
# Here would be a good place to listen on the socket
But there are several problems with this solution:
the SSLSocket.read() method is a blocking one
if there is no data in the queue, the client will never receive any data
Yes, I could use Queue.get_nowait() instead of Queue.get(), but all in all it's not a good solution, I think.
The Question
Is there a good way to achieve this requirements with Twisted? I really do not have that much skills on Twisted to find my way round in there. I don't even know if using the LineReceiver is a good idea for this kind of problem, because it cannot send any data, if it does not receive data from the client. There is only a lineReceived event.
Is Twisted (or more general any event driven framework) able to solve this problem? I don't even have real event on the communication side. If the server decides to send data, it should be able to send it; there should not be a need to wait for any event on the communication side, as possible.

"I don't even know if using the LineReceiver is a good idea for this kind of problem, because it cannot send any data, if it does not receive data from the client. There is only a lineReceived event."
You can send data using protocol.transport.write from anywhere, not just in lineReceived.

"I need a way to send commands from the server to the client."
Don't do this. It inverts the usual meaning of "client" and "server". Clients take the active role and send stuff or request stuff from the server.
Is Twisted (or more general any event driven framework) able to solve this problem?
It shouldn't. You're inverting the role of client and server.
If the server decides to send data, it should be able to send it;
False, actually.
The server is constrained to wait for clients to request data. That's generally the accepted meaning of "client" and "server".
"One to send commands to the client and one to transmit the results to the server. Does this solution sound more like a standard client-server communication for you?"
No.
If a client sent messages to a server and received responses from the server, it would meet more usual definitions.
Sometimes, this sort of thing is described as having "Agents" which are -- each -- a kind of server and a "Controller" which is a single client of all these servers.
The controller dispatches work to the agents. The agents are servers -- they listen on a port, accept work from the controller, and do work. Each Agent must do two concurrent things (usually via the select API):
Monitor a well-known socket on which it will receive work from the one-and-only client.
Do the work (in the background).
This is what Client-Server usually means.
If each Agent is a Server, you'll find lots of libraries will support this. This is the way everyone does it.

Message queue proxy in Python + Twisted

I want to implement a lightweight Message Queue proxy. It's job is to receive messages from a web application (PHP) and send them to the Message Queue server asynchronously. The reason for this proxy is that the MQ isn't always avaliable and is sometimes lagging, or even down, but I want to make sure the messages are delivered, and the web application returns immediately.
So, PHP would send the message to the MQ proxy running on the same host. That proxy would save the messages to SQLite for persistence, in case of crashes. At the same time it would send the messages from SQLite to the MQ in batches when the connection is available, and delete them from SQLite.
Now, the way I understand, there are these components in this service:
message listener (listens to the messages from PHP and writes them to a Incoming Queue)
DB flusher (reads messages from the Incoming Queue and saves them to a database; due to SQLite single-threadedness)
MQ connection handler (keeps the connection to the MQ server online by reconnecting)
message sender (collects messages from SQlite db and sends them to the MQ server, then removes them from db)
I was thinking of using Twisted for #1 (TCPServer), but I'm having problem with integrating it with other points, which aren't event-driven. Intuition tells me that each of these points should be running in a separate thread, because all are IO-bound and independent of each other, but I could easily put them in a single thread. Even though, I couldn't find any good and clear (to me) examples on how to implement this worker thread aside of Twisted's main loop.
The example I've started with is the chatserver.py, which uses service.Application and internet.TCPServer objects. If I start my own thread prior to creating TCPServer service, it runs a few times, but the it stops and never runs again. I'm not sure, why this is happening, but it's probably because I don't use threads with Twisted correctly.
Any suggestions on how to implement a separate worker thread and keep Twisted? Do you have any alternative architectures in mind?

You're basically considering writing an ad-hoc extension to your messaging server, the job of which it is to provide whatever reliability guarantees you've asked of it.
Instead, perhaps you should take the hardware where you were planning to run this new proxy and run another MQ node on it. The new node should take care of persisting and relaying messages that you deliver to it while the other nodes are overloaded or offline.

Maybe it's not the best bang for your buck to use a separate thread in Twisted to get around a blocking call, but sometimes the least evil solution is the best. Here's a link that shows you how to integrate threading into Twisted:
http://twistedmatrix.com/documents/10.1.0/core/howto/threading.html
Sometimes in a pinch easy-to-implement is faster than hours/days of research which may all turn out to be for nought.

A neat solution to this problem would be to use the Key Value store Redis. Its a high speed persistent data store, with plenty of clients - it has a php and a python client (if you want to use a timed/batch process to process messages - it saves you creating a database, and also deals with your persistence stories. It runs fine on Cywin/Windows + posix environments.
PHP Redis client is here.
Python client is here.
Both have a very clean and simple API. Redis also offers a publish/subscribe mechanism, should you need it, although it sounds like it would be of limited value if you're publishing to an inconsistent queue.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.