How to store real-time chat messages in database?

How to store real-time chat messages in database? - python

I am using mysqldb for my database currently, and I need to integrate a messaging feature that is in real-time. The chat demo that Tornado provides does not implement a database, (whereas the blog does.)
This messaging service also will also double as an email in the future (like how Facebook's message service works. The chat platform is also email.) Regardless, I would like to make sure that my current, first chat version will be able to be expanded to function as email, and overall, I need to store messages in a database.
Is something like this as simple as: for every chat message sent, query the database and display the message on the users' screen. Or, is this method prone to suffer from high server load and poor optimization? How exactly should I structure the "infrastructure" to make this work?
(I apologize for some of the inherent subjectivity in this question; however, I prefer to "measure twice, code once.")
Input, examples, and resources appreciated.
Regards.

Tornado is a single threaded non blocking server.
What this means is that if you make any blocking calls on the main thread you will eventually kill performance. You might not notice this at first because each database call might only block for 20ms. But once you are making more than 200 database calls per seconds your application will effectively be locked up.
However that's quite a few DB calls. In your case that would be 200 people hitting send on their chat message in the same second.
What you probably want to do is use a queue with a non blocking API. So Tornado receives a chat message. You put it on the queue to be saved to the database by another process, then you send the chat message back out to the other chat members.
When someone connects to a chat session you also need to send off a request to the queue for all the previous messages, when the queue responds you send those off to the newly connected user.
That's how I would approach the problem anyway.
Also see this question and answer: Any suggestion for using non-blocking MySQL api on Tornado in Python3?
Just remember, Tornado is single threaded. It's amazing. And can handle thousands of simultaneous connections. But if code in one of those connections blocks for 1 second then NOTHING else will be done for any other connection during that second.

Related

Django Channels: How to pass incoming messages to external script which is running outside of django?

I have started a private project with Django and Channels to build a web-based UI to control the music player daemon (mpd) on raspberry pi. I know that there are other projects like Volumio or moode audio etc. out of the box that is doing the same, but my intension is to learn something new!
Up to now I have managed to setup a nginx server on the pi that communicates with my devices like mobile phone or pc. In the background nginx communicates with an uWSGI server for http requests to Django and a daphne server as asgi for ws connection to Django Channels. As well there is a redis server installed as backend because the Channels Layer needs this. So, on client request a simple html page as UI is served and a websocket connection is established so far.
In parallel I have a separate script as a mpd handler which is wrapped in a while loop to keep it alive, and which does all the stuff with mpd using the python module python-mpd2.
The mpd handler shall get its commands via websocket from the clients/consumers like play, stop etc. and reacts on that. At the same time, it shall send the timeline of the song when a song is playing, let’s say every one second as well via websocket. I could manage to send frequently data to all connected clients/consumers with async_to_sync(channel_layer.group_send) from outside but I couldn’t find a solution how to pass data/commands coming from the clients via websocket to my separate running mpd handler script.
I read in the docs for Django Channels that it is not recommended to use while loops in the consumers because this will block all the communication – that’s right I have tried this already. Then I tried to receive messages with the command async_to_sync(channel_layer.receive)('channel_name') in the mpd handler with a direct connection to a consumer. But this command blocks my mpd handler because it works async although I use async_to_sync.
So, my question:
Is it possible to pass messages to outside of Django Channels to other scripts with channel own methods? Do you have any suggestion how to solve this maybe with other methods or workarounds? I am looking for a reliable solution.
I gave thoughts to that issue and have some ideas, but I don’t know if this will lead to any solution:
Polling:
The clients send frequently messages and requests via websocket to control the mpd and update the UI. In this case no handler would be needed. (I don’t know if this method will generate to much traffic on the websocket and makes it slow. As well, the connection to mpd has to be established frequently and closed again. Don’t know if this works robust.)
Database:
Generate a database where consumers and the mpd handler have access to. The consumers write the incoming messages in a database and the mpd handler reads them out and does the job. (Here I don’t know if there will be problems when the consumers and mpd handler try to access the db at the same time.)
Using Queues with multiprocessing module:
Consumers passes the messages via a queue to the mpd handler. (Don’t know if this is possible.)
Catching up the messages in redis:
Mpd handler listens frequently on redis to catch up the messages. I read that when the Layers are used in common way the groups and channel names are listed on redis only. Messages are passed via redis when the consumers are started as workers. (That would mean that all my consumers must start as background worker, but how?)
I hope you may have a solution to my question. You may realise from my ideas and the question marks involved to solve this problem that I am not an IT expert. As I wrote at the beginning, I have another engineering background and a newbie in this but very interested to learn something new! So please be patient with me when I don’t understand everything immediately.
I hope to read your answers soon and thank you in advance.
Best regards.

Whilst nobody gave an answer to my question, I tried a little bit out some possible options.
I changed the binding of mpd from fix IP to a socket connection and created a mpd_Handler class with some functions/methods like connect to mpd, disconnect, play, pause etc.
This class is imported in Django consumers.py and views.py. Whenever a web client connects to Django or has a new command (like play, skip etc.), the mpd_Handler will perform the command and respond the actual state of mpd like current song metadata.
A second mpd handler which is running outside of Django as a separate script monitors frequently the mpd state to detect any changes. In case of a change at mpd (e.g., the song of web radio stream has changed or the duration time of the song) this handler informs all clients that are connected to Django consumer group with the command async_to_sync(channel_layer.group_send) so that the clients can update their UI.
At the moment it works, and I hope this is a good solution and helps others who have the same problem. Other suggestions are still welcome!
Best regards.

Why do chat applications have to be asynchronous?

I need to implement a chat application for my web service (that is written in Django + Rest api framework). After doing some google search, I found that Django chat applications that are available are all deprecated and not supported anymore. And all the DIY (do it yourself) solutions I found are using Tornado or Twisted framework.
So, My question is: is it OK to make a Django-only based synchronous chat application? And do I need to use any asynchronous framework? I have very little experience in backend programming, so I want to keep everything as simple as possible.

Django, like many other web framework, is constructed around the concept of receiving an HTTP request from a web client, processing the request and sending a response. Breaking down that flow (simplified for sake of clarity):
The remote client opens TCP connection with your Django server.
The client sends a HTTP request to the server, having a path, some headers and possibly a body.
Server sends a HTTP response.
Connection is closed
Server goes back to a state where it waits for a new connection.
A chat server, if it needs to be somewhat real-time, needs to be different: it needs to maintain many simultaneous open connections with connected clients, so that when new messages are published on a channel, the appropriate clients are notified accordingly.
A modern way of implementing that is using WebSockets. This communication flow between the client and server starts with a HTTP request, like the one described above, but the client sends a special Upgrade HTTP request to the server, asking for the session to switch over from a simple request/response paradigm to a persistent, "full-duplex" communication model, where both the client and server can send messages at any time in both direction.
The fact that the connections with multiple simultaneous clients needs to be persistent means you can't have a simple execution model where a single request would be taken care of by your server at a time, which is usually what happens in what you call synchronous servers. Tornado and Twisted have different models for doing networking, using multithreading, so that multiple connections can be left open and taken care of simultanously by a server, and making a chat service possible.
Synchronous approach nevertheless
Having said that, there are ways to implement a very simple, non-scalable chat service with apparent latency:
Clients perform POST requests to your server to send messages to channels.
Clients perform periodical GET requests to the server to ask for any new messages to the channels they're subscribed to. The rate at which they send these requests is basically the refresh rate of the chat app.
With this approach, your server will work significantly harder than if it had a asynchronous execution model for maintaining persistent connections, but it will work.

If you're going to make a chat app, you'll want to use websockets. They'll make getting updates to all clients participating in a conversation significantly easier and it'll give you real time conversations within your app. Having said that, I've never seen websockets used within a synchronous framework.
If it's OK to make Django-only based synchronous chat application? Too many unanswered questions for a reasonable answer. How many people will use this chat app? How many people per conversation? How long will this app be around? If you're looking to make something simple for you and a couple friends, make what you know. If you're getting paid to make this app, use websockets and use an asynchronous framework.

You certainly can develop a synchronous chat app, you don't necessarily need to us an asynchronous framework. but it all comes down to what do you want your app to do? how many people will use the app? will there be multiple users and multiple chats going on at the same time?

RabbitMQ - Game Rooms and Security Considerations

I'm programming an Android application and want to define rooms. The rooms would hold all the users of certain game. This is like poker with 4 players, where each room can hold 4 users. I also want to use rabbitmq for scalability and customobility. The problem is that the Android application uses the same username:password to connect all users to a RabbitMQ server (specific virtual host).
I guess I'm worried that one user might be able to read/write messages from different queues that it should. There are multiple solutions that are not satisfactory:
Use a different user in each Android application: This really can't be done, because the Android Market doesn't allow different applications for each user that downloads it. Even if it did, it's a stupid idea anyway.
Set appropriate access controls: http://www.rabbitmq.com/access-control.html . I guess this wouldn't prevent the problem of a malicious attacker reading/writing messages from/to queues it doesn't have access to.
Set appropriate routing keys: I guess if each user creates another queue from which it can read messages and published messages to specifically defined queue, this can work. But I guess the problem is the same, since users will be connecting to the RabbitMQ with the same username:password: therefore this user can read all queues and write to them (based on the access rules).
My question is: how to allow an attacker from reading/writing to queues that represent only the rooms he's currently joined in, and preventing access to other queues?

Perhaps I don't understand the application too well, but in my experience RabbitMQ is usually used on the backend, for example, while creating a distributed system with databases and application servers and other loosely coupled entities. Message queuing is an important tool for asynchronous application design, and the fact that each messaging queue can in theory be spawned into a separate process by RabbitMQ makes it remarkably scalable.
What you are alluding to in your question seems more like a access control mechanism for users. I would see this in the front end of a system. For example, having filtering mechanisms on the incoming messages before passing them on to the messaging queues. You might even want to consider DoS prevention via rate control per user.
Cheers!

I am working on a Poker application myself =)
I am relying on something like Akka/Actors (check out Erlang) based traffic over streaming web sockets and hoping it works out (still kind of worried about secure web sockets).
That said, I am also considering RabbitMQ for receiving player actions. I do not think you want to ever expose the username or password to the rabbit queue. As a matter of fact, you probably don't even want the queue server accessible from the outside world.
Instead, set up some server that your users can establish a connection to. This will be your "front end" that the android clients will talk to. Each user will connect to the server via a secure TCP connection and then log into your system. This is where the users will have their own usernames and passwords. If authentication is successful, keep the socket alive (this is where my knowledge of TCP is weak) and then associate the user information with this socket.
When a player makes an action, such as folding or raising, send their action over the secure TCP connection to your "front end" (this connection should still be established). The "front end" then checks which user is connected to this socket, then publishes a message to the queue that would ideally contain the user id, action taken, and the table id. In other words, the only IP allowed to hit the queue is your front end server, and the front end server just uses the single username/password for the rabbit queue.
It's up to you to handle the exchange of the queue message and routing the message to the right table (or making sure the table only handles messages that it's responsible for - which is why I am loving Akka right about now :) Once the message arrives to the table, verify that the user id in the message is the user id whose turn it actually is, and then verify that the action sent is an acceptable one based on the table's state. For example, if I receive a CHECK request and the user can only CALL/FOLD/RAISE, then I will just reply saying invalid action or just throw out the whole message.
Do not let the public get to the queue, and always make sure you do not have security holes, especially if you start dealing with real currencies.
Hope this helps...
EDIT: I just want to be clear. Any time clients make actions, they simply need to send the action and table id or whatever information you need. Do not let them send their user id or any user specific information. Your "front end" server should auto associate the user id based on the socket the request is coming in on. If they submit any user information with their request, it may be a good idea to log it, and then throw out the data. I would log it just because I don't like people trying to cheat, and that's probably what they're doing if they send you unexpected data.

Best way for client to fire off separate process without blocking client/server communication

The end result I am trying to achieve is allow a server to assign specific tasks to a client when it makes it's connection. A simplified version would be like this
Client connects to Server
Server tells Client to run some network task
Client receives task and fires up another process to complete task
Client tells Server it has started
Server tells Client it has another task to do (and so on...)
A couple of notes
There would be a cap on how many tasks a client can do
The client would need to be able to monitor the task/process (running? died?)
It would be nice if the client could receive data back from the process to send to the server if needed
At first, I was going to try threading, but I have heard python doesn't do threading correctly (is that right/wrong?)
Then it was thought to fire of a system call from python and record the PID. Then send certain signals to it for status, stop, (SIGUSR1, SIGUSR2, SIGINT). But not sure if that will work, because I don't know if I can capture data from another process. If you can, I don't have a clue how that would be accomplished. (stdout or a socket file?)
What would you guys suggest as far as the best way to handle this?

Use spawnProcess to spawn a subprocess. If you're using Twisted already, then this should integrate pretty seamlessly into your existing protocol logic.

Use Celery, a Python distributed task queue. It probably does everything you want or can be made to do everything you want, and it will also handle a ton of edge cases you might not have considered yet (what happens to existing jobs if the server crashes, etc.)
You can communicate with Celery from your other software using a messaging queue like RabbitMQ; see the Celery tutorials for details on this.
It will probably be most convenient to use a database such as MySQL or PostgreSQL to store information about tasks and their results, but you may be able to engineer a solution that doesn't use a database if you prefer.

Message queue proxy in Python + Twisted

I want to implement a lightweight Message Queue proxy. It's job is to receive messages from a web application (PHP) and send them to the Message Queue server asynchronously. The reason for this proxy is that the MQ isn't always avaliable and is sometimes lagging, or even down, but I want to make sure the messages are delivered, and the web application returns immediately.
So, PHP would send the message to the MQ proxy running on the same host. That proxy would save the messages to SQLite for persistence, in case of crashes. At the same time it would send the messages from SQLite to the MQ in batches when the connection is available, and delete them from SQLite.
Now, the way I understand, there are these components in this service:
message listener (listens to the messages from PHP and writes them to a Incoming Queue)
DB flusher (reads messages from the Incoming Queue and saves them to a database; due to SQLite single-threadedness)
MQ connection handler (keeps the connection to the MQ server online by reconnecting)
message sender (collects messages from SQlite db and sends them to the MQ server, then removes them from db)
I was thinking of using Twisted for #1 (TCPServer), but I'm having problem with integrating it with other points, which aren't event-driven. Intuition tells me that each of these points should be running in a separate thread, because all are IO-bound and independent of each other, but I could easily put them in a single thread. Even though, I couldn't find any good and clear (to me) examples on how to implement this worker thread aside of Twisted's main loop.
The example I've started with is the chatserver.py, which uses service.Application and internet.TCPServer objects. If I start my own thread prior to creating TCPServer service, it runs a few times, but the it stops and never runs again. I'm not sure, why this is happening, but it's probably because I don't use threads with Twisted correctly.
Any suggestions on how to implement a separate worker thread and keep Twisted? Do you have any alternative architectures in mind?

You're basically considering writing an ad-hoc extension to your messaging server, the job of which it is to provide whatever reliability guarantees you've asked of it.
Instead, perhaps you should take the hardware where you were planning to run this new proxy and run another MQ node on it. The new node should take care of persisting and relaying messages that you deliver to it while the other nodes are overloaded or offline.

Maybe it's not the best bang for your buck to use a separate thread in Twisted to get around a blocking call, but sometimes the least evil solution is the best. Here's a link that shows you how to integrate threading into Twisted:
http://twistedmatrix.com/documents/10.1.0/core/howto/threading.html
Sometimes in a pinch easy-to-implement is faster than hours/days of research which may all turn out to be for nought.

A neat solution to this problem would be to use the Key Value store Redis. Its a high speed persistent data store, with plenty of clients - it has a php and a python client (if you want to use a timed/batch process to process messages - it saves you creating a database, and also deals with your persistence stories. It runs fine on Cywin/Windows + posix environments.
PHP Redis client is here.
Python client is here.
Both have a very clean and simple API. Redis also offers a publish/subscribe mechanism, should you need it, although it sounds like it would be of limited value if you're publishing to an inconsistent queue.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.