I’m working on a project involving websockets using python-socketio.
Our primary concern is fluidity: each connected user has a cursor whose position on the board is sent as an event every 50 ms. Boards are identified as (Socket.IO) rooms, and we expect many of them.
I’m new to pub/sub. We are scaling our architecture horizontally, and pub/sub seems to be the right fit for broadcasting events.
I had a look at the AsyncRedisManager class, and from my understanding, any message sent by any socket on any Socket.IO server (with pub/sub) is published by that server to Redis on a single channel. Every subscriber to that channel then sees this whole flow of messages.
I’m therefore concerned about three things:
Since all messages go through one channel, isn’t this a “design flaw”? Some servers might have no sockets connected to a given room at the moment, yet they will still receive (and unpickle with pickle.loads) messages they don’t care about.
The details of each message (room, payload, etc.) are serialized with pickle.dumps and deserialized with pickle.loads by the servers. With 50 rooms of 50 cursors, each sending 20 events/s (one every 50 ms), isn’t this going to be a huge CPU-bound bottleneck?
I’m trying to wrap my head around the Socket.IO docs, comparing the socket.io Redis adapter side by side with the python-socketio pub/sub manager. It seems the adapter namespaces channels dynamically, like “socketio#room_name”, and broadcasts messages to these namespaced channels, so that psubscribe would be a viable solution. Other message queues call these “topics”.
If that assumption is correct, a server still cannot decide whether to psubscribe to channel#room_name without knowing whether at least one of its own sockets is currently in that room.
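To put rough numbers on the second concern: with the 50 ms interval from the question (20 events/s per cursor), 50 rooms of 50 cursors produce 50 × 50 × 20 = 50,000 messages per second, and with a single shared channel every server unpickles all of them. The sketch below computes that rate and times `pickle.loads` on one small message. The message dict is made up for illustration and is not necessarily the exact shape the pub/sub manager publishes.

```python
import pickle
import timeit

ROOMS = 50
CURSORS_PER_ROOM = 50
EVENTS_PER_SECOND = 20  # one cursor update every 50 ms

# Made-up message shape for illustration; the manager's real dict may differ.
SAMPLE_MESSAGE = {
    "method": "emit",
    "event": "cursor",
    "data": {"user": "u123", "x": 412, "y": 233},
    "namespace": "/",
    "room": "board-17",
}

def total_messages_per_second(rooms, cursors_per_room, events_per_second):
    """With one shared channel, every server receives all of these messages."""
    return rooms * cursors_per_room * events_per_second

def unpickle_seconds_per_message(message, number=100_000):
    """Average wall-clock time of one pickle.loads call on this message."""
    payload = pickle.dumps(message)
    return timeit.timeit(lambda: pickle.loads(payload), number=number) / number

if __name__ == "__main__":
    rate = total_messages_per_second(ROOMS, CURSORS_PER_ROOM, EVENTS_PER_SECOND)
    cost = unpickle_seconds_per_message(SAMPLE_MESSAGE)
    print(f"{rate} messages/s per server")  # 50000 messages/s per server
    print(f"~{rate * cost:.3f} s of CPU per second spent in pickle.loads alone")
```

Multiplying the two numbers gives an estimate of the CPU time each server spends per second on deserialization alone, before emitting anything to clients; the result depends heavily on the machine and the payload.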
I understand the pub/sub paradigm as described on the Redis page:
Rather, published messages are characterized into channels, without knowledge of what (if any) subscribers there may be. Subscribers express interest in one or more channels, and only receive messages that are of interest, without knowledge of what (if any) publishers there are.
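A toy, in-memory model of this paradigm makes the trade-off in the question concrete. `ToyBroker` below is hypothetical (it is neither Redis nor python-socketio) and uses `fnmatch.fnmatchcase` as an approximation of Redis's glob-style PSUBSCRIBE patterns. With one channel per room, a server that hosts only one board receives only that board's traffic, whereas a server that psubscribes to a wildcard such as `socketio#*` still receives every room, just as with a single shared channel.

```python
from collections import defaultdict
from fnmatch import fnmatchcase

class ToyBroker:
    """Hypothetical in-memory stand-in for Redis pub/sub, for illustration only."""

    def __init__(self):
        self.channels = defaultdict(set)  # channel name -> subscriber names
        self.patterns = defaultdict(set)  # glob pattern -> subscriber names
        self.received = defaultdict(int)  # subscriber name -> deliveries so far

    def subscribe(self, who, channel):
        self.channels[channel].add(who)

    def psubscribe(self, who, pattern):
        self.patterns[pattern].add(who)

    def publish(self, channel, message):
        # The publisher never learns who, if anyone, is listening;
        # the message content plays no part in routing here.
        targets = list(self.channels.get(channel, ()))
        for pattern, subscribers in self.patterns.items():
            if fnmatchcase(channel, pattern):  # approximates Redis glob matching
                targets.extend(subscribers)
        for who in targets:
            self.received[who] += 1

def cursor_traffic(per_room_channels):
    """Publish 100 cursor events to each of two boards; count deliveries per server."""
    broker = ToyBroker()
    if per_room_channels:
        broker.subscribe("server-a", "socketio#board1")  # A only hosts board1
        broker.subscribe("server-b", "socketio#board2")  # B only hosts board2
        broker.psubscribe("server-c", "socketio#*")      # C matches every room
    else:
        for server in ("server-a", "server-b", "server-c"):
            broker.subscribe(server, "socketio")         # one shared channel
    for room in ("board1", "board2"):
        channel = f"socketio#{room}" if per_room_channels else "socketio"
        for _ in range(100):
            broker.publish(channel, {"room": room, "x": 0, "y": 0})
    return dict(broker.received)
```

With per-room channels, server-a and server-b get 100 deliveries each while the wildcard subscriber server-c gets 200; with the single shared channel, every server gets all 200.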
My question can be summarized as follows:
Is it possible to make python-socketio servers dynamically subscribe to and unsubscribe from channels whenever needed, with one channel per room, hence as many channels as rooms in total? Would that be feasible while keeping the simple “plug-and-play” logic of a PubSubManager subclass? Am I missing something, or does this make sense?
Thank you for your time; any ideas, corrections, or “draft” code would be greatly appreciated.
is it possible to make Python-socketio servers dynamically subscribe/unsubscribe to channels whenever there is a need for it, with channels identified as rooms
I guess it is possible with a custom client manager class. You would need to inherit from one of the existing client managers, or from the base client manager, and implement different pub/sub logic that fits your needs. But keep in mind that if you have 10,000 clients, there are going to be at least 10,000 rooms, since each client gets a personal room.
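A minimal sketch of the bookkeeping such a custom manager would need, assuming one Redis channel per room named like `socketio#room_name`: keep a count of local sockets per room, subscribe when the first one enters and unsubscribe when the last one leaves. `RoomSubscriptionTracker` is hypothetical. In a real manager subclass you would call it from your room-join, room-leave, and disconnect handling (the manager's method names and signatures vary between python-socketio versions, so this does not subclass one), and the two callbacks could wrap redis-py's `PubSub.subscribe` and `PubSub.unsubscribe`.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Callable

class RoomSubscriptionTracker:
    """Hypothetical helper: subscribe to a room's channel when the first local
    socket enters the room, unsubscribe when the last local socket leaves it."""

    def __init__(self, subscribe: Callable[[str], None],
                 unsubscribe: Callable[[str], None],
                 prefix: str = "socketio#") -> None:
        self._subscribe = subscribe      # e.g. wraps redis-py PubSub.subscribe
        self._unsubscribe = unsubscribe  # e.g. wraps redis-py PubSub.unsubscribe
        self._prefix = prefix
        self._members: dict[str, set[str]] = defaultdict(set)  # room -> local sids

    def channel_for(self, room: str) -> str:
        return self._prefix + room

    def enter_room(self, sid: str, room: str) -> None:
        members = self._members[room]
        if not members:
            self._subscribe(self.channel_for(room))  # first local socket in this room
        members.add(sid)

    def leave_room(self, sid: str, room: str) -> None:
        members = self._members.get(room)
        if not members or sid not in members:
            return
        members.discard(sid)
        if not members:
            del self._members[room]
            self._unsubscribe(self.channel_for(room))  # last local socket left

    def disconnect(self, sid: str) -> None:
        # Collect the rooms first, since leave_room may delete dict entries.
        for room in [r for r, m in self._members.items() if sid in m]:
            self.leave_room(sid, room)

    def subscribed_rooms(self) -> set[str]:
        return set(self._members)
```

Two caveats: Redis pub/sub does not store messages, so anything published before a new subscription is active is simply missed, and, as the answer points out, every client's personal room would count as a room here too.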
Related
I am working on a small programming game/environment in Python to help my younger brother learn to code. It needs to operate over a network, and I am new to network programming. I am going to explain the concept of the game so that someone can point me in the best direction.
The idea is a simple 25x25 grid of 'diodes': squares with fixed positions and editable color values, essentially simulating a very small screen. In addition to the grid display, there is a command window, where Python code can be entered and sent to an instance of InteractiveConsole, and a chat window. A client needs to be able to send Python commands to the host, which will run the code; the client then receives the output as a string representing changes to the grid. My plan is to maintain a host-side queue of incoming and outgoing events, which are handled and relayed to the clients on individual threads. Any command or chat event is sent to the host and relayed to all clients, including the client who created it, so that it is visible to everyone in their command/chat windows. All changes to the grid originate on the host, as a result of processing commands from clients, and are also sent out to all clients.
What I primarily don't understand is how to synchronize between all clients, i.e. how to know when a given item in the queue has been successfully sent out to all clients before clearing it from the queue, since any individual thread doing so prematurely will prevent the item from being sent to other clients. This is an extremely open-ended question because I understand that I will definitely need to consume some learning materials before I'm ready to implement this. I'm not asking for a specific solution but rather for some guidance on what general type of solution could work in my situation. I'm doing this in my spare time, so I don't want to spend a month going through networking tutorials that aren't pointing me in a direction that will be applicable to this project.
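One common way to avoid the "when is it safe to clear this item" problem (offered here as a sketch, not something from the question or its answer) is to fan each event out at publish time: every client gets its own `queue.Queue`, and each sender thread only ever takes items from its own queue, so no shared item needs to be cleared. `Broadcaster` and `sender_loop` are hypothetical names, and `send` stands in for writing to the client's socket.

```python
import queue
import threading

class Broadcaster:
    """Copies each event into one queue per client (hypothetical sketch)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._client_queues = {}  # client id -> queue.Queue of outgoing events

    def add_client(self, client_id):
        q = queue.Queue()
        with self._lock:
            self._client_queues[client_id] = q
        return q

    def remove_client(self, client_id):
        with self._lock:
            self._client_queues.pop(client_id, None)

    def publish(self, event):
        with self._lock:
            targets = list(self._client_queues.values())
        for q in targets:
            q.put(event)  # every client gets its own entry for this event

def sender_loop(client_queue, send):
    """Runs on one thread per client; only this thread removes from client_queue."""
    while True:
        event = client_queue.get()
        if event is None:  # sentinel: stop this client's thread
            break
        send(event)

def demo():
    hub = Broadcaster()
    outputs = {"alice": [], "bob": []}
    threads = []
    for name, sent in outputs.items():
        t = threading.Thread(target=sender_loop, args=(hub.add_client(name), sent.append))
        t.start()
        threads.append(t)
    hub.publish("chat: hello")
    hub.publish("grid: (3, 4) -> red")
    hub.publish(None)  # tell every sender thread to finish
    for t in threads:
        t.join()
    return outputs
```

A slow client then only delays its own queue; the cost is one queue entry per client per event, which is small for short command, chat, and grid messages.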
My approach would be to use a UDP server that can broadcast to multiple clients. All the clients would connect to this server during a game session, and the server would broadcast the game state to them whenever it is updated. Since your game is relatively simple, this approach would give you real-time updates.
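A minimal sketch of that idea using Python's standard `socket` module, assuming the server already knows each client's address and that each diode's color is an (r, g, b) tuple. It sends the full 25x25 grid state (3 bytes per diode, 1,875 bytes in total) as one datagram to each client. Sending the whole state rather than diffs matters because UDP can drop or reorder datagrams; a lost update is then repaired by the next one. The sketch unicasts to each address instead of using IP broadcast, which only reaches the local subnet. Chat and command text, which must not be lost, would still need a reliable channel such as TCP. The helper names are hypothetical.

```python
import socket

GRID_SIZE = 25  # the 25x25 grid from the question

def encode_grid(grid):
    """Pack rows of (r, g, b) tuples into bytes: 25 * 25 * 3 = 1875 bytes."""
    return bytes(channel for row in grid for pixel in row for channel in pixel)

def decode_grid(payload):
    it = iter(payload)
    pixels = list(zip(it, it, it))  # regroup bytes into (r, g, b) triples
    return [pixels[i:i + GRID_SIZE] for i in range(0, len(pixels), GRID_SIZE)]

def send_state(sock, client_addrs, grid):
    """Send the full grid state to every known client address."""
    payload = encode_grid(grid)
    for addr in client_addrs:
        sock.sendto(payload, addr)

def demo():
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    clients = []
    for _ in range(2):
        c = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        c.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        c.settimeout(2.0)
        clients.append(c)
    grid = [[(0, 0, 0)] * GRID_SIZE for _ in range(GRID_SIZE)]
    grid[3][4] = (255, 0, 0)  # light one diode red
    try:
        send_state(server, [c.getsockname() for c in clients], grid)
        return [decode_grid(c.recvfrom(65535)[0]) == grid for c in clients]
    finally:
        server.close()
        for c in clients:
            c.close()
```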
I am currently using MySQLdb for my database, and I need to integrate a real-time messaging feature. The chat demo that Tornado provides does not use a database (whereas the blog demo does).
This messaging service will also double as email in the future (like Facebook's message service, where the chat platform is also the email platform). Either way, I want to make sure that this first chat version can be expanded to work as email, and overall, I need to store messages in a database.
Is it as simple as querying the database for every chat message sent and displaying the message on the users' screens? Or is this approach prone to high server load and poor performance? How exactly should I structure the "infrastructure" to make this work?
(I apologize for some of the inherent subjectivity in this question; however, I prefer to "measure twice, code once.")
Input, examples, and resources appreciated.
Regards.
Tornado is a single-threaded, non-blocking server.
This means that if you make any blocking calls on the main thread, you will eventually kill performance. You might not notice this at first, because each database call might only block for 20 ms. But a thread that blocks for 20 ms per call can complete at most 1000 / 20 = 50 calls per second, so once you are making around 50 database calls per second, your application will effectively be locked up.
That's still quite a few DB calls, though: in your case, it would take 50 people hitting send on their chat messages in the same second.
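Since Tornado 5, Tornado runs on the asyncio event loop, so plain asyncio is enough to illustrate the point. In this sketch, `fake_db_call` (a hypothetical stand-in for a synchronous MySQLdb query) sleeps for 20 ms. Calling it directly inside a handler serializes every request on the loop thread, while `asyncio.to_thread` (Python 3.9+) runs it in a worker thread so the loop stays responsive.

```python
import asyncio
import time

CALL_MS = 20  # the answer's example: each blocking database call takes 20 ms

def max_blocking_calls_per_second(call_ms):
    """A single event-loop thread can finish at most this many blocking calls per second."""
    return 1000 / call_ms

def fake_db_call():
    time.sleep(CALL_MS / 1000)  # stand-in for a synchronous MySQLdb query

async def handler_blocking():
    fake_db_call()  # blocks the loop: every other connection waits

async def handler_offloaded():
    await asyncio.to_thread(fake_db_call)  # runs in a worker thread (Python 3.9+)

async def time_requests(handler, n):
    start = time.perf_counter()
    await asyncio.gather(*(handler() for _ in range(n)))
    return time.perf_counter() - start

if __name__ == "__main__":
    print(max_blocking_calls_per_second(CALL_MS))  # 50.0
    print(f"blocking:  {asyncio.run(time_requests(handler_blocking, 10)):.3f} s")
    print(f"offloaded: {asyncio.run(time_requests(handler_offloaded, 10)):.3f} s")
```

With 10 concurrent requests, the blocking handler takes at least 10 x 20 ms = 200 ms, while the offloaded version finishes in a small fraction of that. Offloading to threads is only one option; the queue-based design the answer describes next is another.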
What you probably want to do is use a queue with a non-blocking API. Tornado receives a chat message, you put it on the queue to be saved to the database by another process, and then you send the chat message back out to the other chat members.
When someone connects to a chat session, you also need to send a request to the queue for all the previous messages; when the queue responds, you send those to the newly connected user.
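A minimal in-process sketch of that flow, with two substitutions to keep it self-contained: `sqlite3` stands in for MySQL, and a single worker thread fed by a `queue.Queue` stands in for the separate process the answer mentions. The event loop only enqueues requests and awaits futures, so it never blocks on SQL; because the worker handles requests in order, a history request sent after a save already sees that message. `ChatStore`, `on_message`, `on_join`, and `demo` are hypothetical names, and each `deliver` callable stands in for writing to a user's connection.

```python
import asyncio
import queue
import sqlite3
import threading

class ChatStore:
    """Owns the blocking database connection on one worker thread; the event
    loop only enqueues requests and awaits the resulting futures."""

    def __init__(self, path=":memory:"):
        self._requests = queue.Queue()
        threading.Thread(target=self._run, args=(path,), daemon=True).start()

    def _run(self, path):
        conn = sqlite3.connect(path)  # created and used only on this thread
        conn.execute("CREATE TABLE IF NOT EXISTS messages ("
                     "id INTEGER PRIMARY KEY, room TEXT, sender TEXT, body TEXT)")
        while True:
            op, args, loop, fut = self._requests.get()
            try:
                if op == "save":
                    with conn:  # commits the insert
                        conn.execute("INSERT INTO messages (room, sender, body) "
                                     "VALUES (?, ?, ?)", args)
                    result = None
                elif op == "history":
                    result = conn.execute("SELECT sender, body FROM messages "
                                          "WHERE room = ? ORDER BY id", args).fetchall()
                else:  # "stop"
                    conn.close()
                    result = None
            except Exception as exc:
                loop.call_soon_threadsafe(fut.set_exception, exc)
            else:
                loop.call_soon_threadsafe(fut.set_result, result)
            if op == "stop":
                return

    def _submit(self, op, args):
        loop = asyncio.get_running_loop()
        fut = loop.create_future()
        self._requests.put((op, args, loop, fut))
        return fut

    def save(self, room, sender, body):
        return self._submit("save", (room, sender, body))

    def history(self, room):
        return self._submit("history", (room,))

    def close(self):
        return self._submit("stop", ())

async def on_message(store, deliveries, room, sender, body):
    pending = store.save(room, sender, body)  # queued; the loop does not block
    for deliver in deliveries:                # fan out to the room right away
        deliver(sender, body)
    await pending                             # surfaces database errors, if any

async def on_join(store, room, deliver):
    for sender, body in await store.history(room):  # replay earlier messages
        deliver(sender, body)

async def demo():
    store = ChatStore()
    live, replayed = [], []
    await on_message(store, [lambda s, b: live.append((s, b))], "lobby", "alice", "hi")
    await on_join(store, "lobby", lambda s, b: replayed.append((s, b)))
    await store.close()
    return live, replayed

if __name__ == "__main__":
    print(asyncio.run(demo()))  # ([('alice', 'hi')], [('alice', 'hi')])
```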
That's how I would approach the problem anyway.
Also see this question and answer: Any suggestion for using non-blocking MySQL api on Tornado in Python3?
Just remember, Tornado is single-threaded. It's amazing and can handle thousands of simultaneous connections. But if code in one of those connections blocks for 1 second, then NOTHING else will be done for any other connection during that second.
I'm programming an Android application and want to define rooms. A room would hold all the users of a certain game. This is like poker with 4 players, where each room can hold 4 users. I also want to use RabbitMQ for scalability and customizability. The problem is that the Android application uses the same username:password to connect all users to a RabbitMQ server (a specific virtual host).
I'm worried that one user might be able to read/write messages from/to queues that it shouldn't. There are multiple solutions, none of them satisfactory:
Use a different user in each Android application: This really can't be done, because the Android Market doesn't allow different applications for each user that downloads it. Even if it did, it's a stupid idea anyway.
Set appropriate access controls: http://www.rabbitmq.com/access-control.html . I guess this wouldn't prevent the problem of a malicious attacker reading/writing messages from/to queues it doesn't have access to.
Set appropriate routing keys: if each user creates its own queue to read messages from, and publishes messages to a specifically defined queue, this could work. But I think the problem is the same: since all users connect to RabbitMQ with the same username:password, any user can read from and write to all the queues (subject to the access rules).
My question is: how can I allow a user to read from and write to only the queues that represent the rooms he has currently joined, while preventing access to all other queues?
Perhaps I don't understand the application too well, but in my experience RabbitMQ is usually used on the backend, for example, while creating a distributed system with databases and application servers and other loosely coupled entities. Message queuing is an important tool for asynchronous application design, and the fact that each messaging queue can in theory be spawned into a separate process by RabbitMQ makes it remarkably scalable.
What you are alluding to in your question seems more like an access-control mechanism for users, which I would put in the front end of the system: for example, filtering incoming messages before passing them on to the message queues. You might even want to consider DoS prevention via per-user rate limiting.
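Per-user rate control is often done with a token bucket. A minimal sketch, assuming the front end can identify the user behind each incoming message: each user gets a bucket that refills at `rate` tokens per second up to `capacity`, and a message is only forwarded to the queue if a token is available. `TokenBucket`, `accept_message`, and the default limits are hypothetical; the injectable `clock` makes the logic easy to test.

```python
import time

class TokenBucket:
    """Allows bursts of up to `capacity` messages and `rate` messages/s sustained."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity
        self.updated = clock()

    def allow(self):
        now = self.clock()
        # Refill in proportion to the time elapsed, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

_buckets = {}  # user id -> TokenBucket (hypothetical per-user state in the front end)

def accept_message(user_id, rate=5, capacity=10):
    """Return False (drop the message) when this user is over the limit."""
    bucket = _buckets.get(user_id)
    if bucket is None:
        bucket = _buckets[user_id] = TokenBucket(rate, capacity)
    return bucket.allow()
```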
Cheers!
I am working on a Poker application myself =)
I am relying on actor-based traffic (something like Akka actors; also check out Erlang) over streaming WebSockets and hoping it works out (I'm still a bit worried about secure WebSockets).
That said, I am also considering RabbitMQ for receiving player actions. I do not think you want to ever expose the username or password to the rabbit queue. As a matter of fact, you probably don't even want the queue server accessible from the outside world.
Instead, set up some server that your users can establish a connection to. This will be your "front end" that the android clients will talk to. Each user will connect to the server via a secure TCP connection and then log into your system. This is where the users will have their own usernames and passwords. If authentication is successful, keep the socket alive (this is where my knowledge of TCP is weak) and then associate the user information with this socket.
When a player makes an action, such as folding or raising, send their action over the secure TCP connection to your "front end" (this connection should still be established). The "front end" then checks which user is connected to this socket, then publishes a message to the queue that would ideally contain the user id, action taken, and the table id. In other words, the only IP allowed to hit the queue is your front end server, and the front end server just uses the single username/password for the rabbit queue.
It's up to you to handle the exchange of the queue message and to route it to the right table, or to make sure each table only handles the messages it's responsible for (which is why I am loving Akka right about now!). Once the message arrives at the table, verify that the user id in the message is the id of the user whose turn it actually is, and then verify that the action sent is acceptable given the table's state. For example, if I receive a CHECK request and the user can only CALL/FOLD/RAISE, then I will just reply saying invalid action, or just throw out the whole message.
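The table-side checks described above might look like this. It is a deliberately simplified sketch: `Table` is hypothetical, and the seat order and the rule for which actions are legal are assumptions (no bet sizes, blinds, or betting rounds). A returned string is the "invalid action" reply, while `None` means the action was accepted.

```python
from dataclasses import dataclass, field

@dataclass
class Table:
    """Hypothetical table state, reduced to what the checks in the answer need."""
    table_id: str
    seats: list            # user ids in turn order
    turn: int = 0          # index into seats of the player who must act
    current_bet: int = 0   # amount the acting player would have to call
    log: list = field(default_factory=list)

    def legal_actions(self):
        # Simplified rule: with no outstanding bet the player may check;
        # otherwise only CALL / FOLD / RAISE are accepted.
        return {"CHECK", "BET", "FOLD"} if self.current_bet == 0 else {"CALL", "FOLD", "RAISE"}

    def handle(self, message):
        """Validate a message published by the front end."""
        if message.get("table_id") != self.table_id:
            return "wrong table"
        if message.get("user_id") != self.seats[self.turn]:
            return "not your turn"
        if message.get("action") not in self.legal_actions():
            return "invalid action"
        self.log.append((message["user_id"], message["action"]))
        self.turn = (self.turn + 1) % len(self.seats)
        return None
```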
Do not let the public get to the queue, and always make sure you do not have security holes, especially if you start dealing with real currencies.
Hope this helps...
EDIT: I just want to be clear. Any time clients take actions, they only need to send the action and the table id, or whatever information you need. Do not let them send their user id or any user-specific information. Your "front end" server should automatically associate the user id based on the socket the request comes in on. If they submit any user information with their request, it may be a good idea to log it and then throw out the data. I would log it because I don't like people trying to cheat, and that's probably what they're doing if they send you unexpected data.
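A sketch of the front-end logic from this answer, where the user id always comes from the authenticated session and never from the client. `FrontEnd`, `CLIENT_FIELDS`, and the `table.<id>` routing-key format are hypothetical, and `publish` is injected so the sketch runs without a broker. With pika, for example, it could be `lambda key, body: channel.basic_publish(exchange="tables", routing_key=key, body=body)`, where the exchange name is also made up. Following the EDIT, unexpected fields are logged and then discarded.

```python
import json
import logging

log = logging.getLogger("frontend")

CLIENT_FIELDS = ("action", "table_id")  # all a client is allowed to send (hypothetical)

class FrontEnd:
    """The only component that talks to RabbitMQ; it owns the logged-in sessions."""

    def __init__(self, publish):
        self.publish = publish  # hypothetical callable(routing_key, body)
        self.sessions = {}      # connection id -> user id, set after a successful login

    def login_succeeded(self, conn_id, user_id):
        self.sessions[conn_id] = user_id

    def on_client_message(self, conn_id, raw):
        user_id = self.sessions.get(conn_id)
        if user_id is None:
            return False  # not logged in on this connection: drop the request
        try:
            request = json.loads(raw)
        except ValueError:
            return False
        if not isinstance(request, dict):
            return False
        extra = sorted(set(request) - set(CLIENT_FIELDS))
        if extra:
            # e.g. a client-supplied "user_id": log it, then throw that data away
            log.warning("connection %s (user %s) sent unexpected fields %s",
                        conn_id, user_id, extra)
        message = {"user_id": user_id,  # from the session, never from the client
                   "action": request.get("action"),
                   "table_id": request.get("table_id")}
        self.publish(f"table.{message['table_id']}", json.dumps(message))
        return True
```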
Need some direction on this.
I'm writing a chat room browser application; however, there is a subtle difference.
These are collaboration chats, where one person types and the other person can see every keystroke live, as it is typed.
Also, the chat space is not a single line but a textarea, like the one here (SO) for entering a question.
All keystrokes, including tabs, spaces, and Enter, should be visible live to the other person. And only one person can type at a time (I guess locking should be trivial).
I haven't written a multi-room chat application before. A simple client/server pair communicating over a port is something I've written.
So here are the questions:
1.) How is a multi-room chat application written? Is it also port-based?
2.) Showing the other person every keystroke as it is typed is, I guess, possible through AJAX. Is there any other mechanism available?
Note: I'm going to use a Python framework (web2py), but I don't think the framework matters here.
Any suggestions are welcome, thanks !
The Wikipedia entry for Comet (programming) has a pretty good overview of different approaches you can take on the client (assuming that your client's a web browser), and those approaches suggest the proper design for the server (assuming that the server's a web server).
One thing that's not mentioned on that page, but that you're almost certainly going to want to think about, is buffering input on the client. I don't think it's premature optimization to consider that a multi-user application in which every user's keystroke hits the server is going to scale poorly. I'd consider having user keystrokes go into a client-side buffer, and only sending them to the server when the user hasn't typed anything for 500 milliseconds or so.
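The buffering idea in a few lines. In a browser this would be JavaScript driven by a timer; the sketch shows the same logic in Python with explicit timestamps so it can be tested. `KeystrokeBuffer` and `send` are hypothetical, and the 0.5 s quiet period is the figure from the answer.

```python
class KeystrokeBuffer:
    """Collects keystrokes and flushes them once the user has paused typing."""

    def __init__(self, send, quiet_period=0.5):
        self._send = send                # hypothetical: posts text to the server
        self._quiet_period = quiet_period
        self._pending = []
        self._last_key_at = None

    def on_key(self, key, now):
        self._pending.append(key)        # tabs, spaces and newlines are kept as-is
        self._last_key_at = now

    def poll(self, now):
        """Call from a periodic timer; sends the buffer after `quiet_period` of silence."""
        if self._pending and now - self._last_key_at >= self._quiet_period:
            self._send("".join(self._pending))
            self._pending.clear()
```

Note the trade-off with the question's requirement: the other person sees text in bursts after each pause rather than every keystroke instantly.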
You absolutely don't want to use ports for this. That's putting application-layer information in the transport layer, and it pushes application-level concerns (the application's going to create a new chat room) into transport-level concerns (a new port needs to be opened on the firewall).
Besides, a port's just a 16-bit field in the packet header. You can do the same thing in the design of your application's messages: put a room ID and a user ID at the start of each message, and have the server sort it all out.
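A sketch of that idea, assuming 32-bit room and user IDs in a fixed binary header followed by UTF-8 text (the header layout and the `Server` class are hypothetical). Everything travels over one connection, and the server routes by the room ID in the header, much as the transport layer routes by port.

```python
import struct
from collections import defaultdict

HEADER = struct.Struct("!II")  # network byte order: room id, user id (uint32 each)

def encode(room_id, user_id, text):
    return HEADER.pack(room_id, user_id) + text.encode("utf-8")

def decode(message):
    room_id, user_id = HEADER.unpack_from(message)
    return room_id, user_id, message[HEADER.size:].decode("utf-8")

class Server:
    """All rooms share one port; the header decides where a message goes."""

    def __init__(self):
        self.rooms = defaultdict(set)    # room id -> user ids in that room
        self.outbox = defaultdict(list)  # user id -> messages to deliver (stand-in for sockets)

    def join(self, room_id, user_id):
        self.rooms[room_id].add(user_id)

    def handle(self, message):
        room_id, sender, text = decode(message)
        for user in self.rooms.get(room_id, ()):
            if user != sender:
                self.outbox[user].append((room_id, sender, text))
```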
The thing that strikes me as a pain about this is figuring out what should be sent when a client requests an update. The naive solution is to retain a buffer for each user in a room, and to maintain an index into each other user's buffer as part of each user's state; that way, when user A requests an update, the server can send down everything that users B, C, and D have typed since A's last request. This raises all kinds of issues about memory usage and persistence that don't have obvious simple solutions.
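The naive solution from the paragraph above can be sketched in a few lines; `Room` and its methods are hypothetical. Each user's buffer is append-only, and the server remembers, for every (reader, writer) pair, how much of the writer's buffer it has already sent to that reader.

```python
from collections import defaultdict

class Room:
    """One append-only buffer per user, plus per-reader read positions."""

    def __init__(self):
        self.buffers = defaultdict(str)   # user -> everything that user has typed
        self.read_pos = defaultdict(int)  # (reader, writer) -> chars already sent

    def typed(self, user, text):
        self.buffers[user] += text

    def updates_for(self, reader):
        """Everything other users typed since `reader`'s previous request."""
        updates = {}
        for writer, buf in self.buffers.items():
            if writer == reader:
                continue
            pos = self.read_pos[(reader, writer)]
            if pos < len(buf):
                updates[writer] = buf[pos:]
                self.read_pos[(reader, writer)] = len(buf)
        return updates
```

The memory problem is visible right away: buffers only ever grow, and trimming them safely requires knowing the smallest read position across all readers, which is exactly the kind of state that would also have to survive restarts if persistence matters.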
The right answers to the problems I've discussed here are going to depend on your requirements. Make sure those requirements are defined in great detail. You don't want to find yourself asking questions like "should I batch together keystrokes?" while you're building this thing.
You could try doing something like IRC, where the target room is sent from the client to the server before the text, delimited by a space (PRIVMSG #room-name :Hello World). For example, you could send ROOMNAME Sample text from the browser to the server.
Using AJAX would be the most reasonable option. I've never used web2py, but I'm guessing you could use JSON to encode the data exchanged between the browser and the server, if you wanted to be fancy.
Is it possible with RabbitMQ and Python to do content-based routing?
The AMQP standard and RabbitMQ claim to support content-based routing, but are there any Python libraries that support specifying content-based bindings, etc.?
The library I am currently using (py-amqplib http://barryp.org/software/py-amqplib/) seems to only support topic-based routing with simple pattern-matching (#, *).
The answer is "yes", but there's more to it... :)
Let's first agree on what content-based routing means. There are two possible meanings. Some people say that it is based on the header portion of a message. Others say it's based on the data portion of a message.
If we take the first definition, these are more or less the assumptions we make:
The data comes into existence somewhere, and it gets sent to the AMQP broker by some piece of software. We assume that this piece of software knows enough about the data to put key-value (KV) pairs in the header of the message that describe the content. Ideally, the sender is also the producer of the data, so it has as much information as we could ever want. Let's say the data is an image. We could then have the sender put KV pairs in the message header like this:
width=1024
height=768
mode=bw
photographer=John Doe
Now we can implement content-based routing by creating appropriate queues. Let's say we have a separate operation to perform on black-and-white images and a separate one on colour images. We can create two queues, one that receives messages with mode=bw and another with mode=colour. Then we have separate clients listening on those queues. The broker performs the routing, and there is nothing in our client that needs to be aware of the routing.
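The routing rule the broker applies here can be modelled in a few lines to show what such bindings mean. This is an approximation of RabbitMQ's headers exchange for `x-match` set to `all` (every bound pair must match) or `any` (at least one must), with keys starting with `x-` excluded from matching; the queue names are hypothetical. With pika, the real binding would be something like `channel.queue_bind(queue="bw_images", exchange="images", arguments={"x-match": "all", "mode": "bw"})` on an exchange declared with `exchange_type="headers"`, with the publisher setting the KV pairs through `pika.BasicProperties(headers=...)`.

```python
def headers_match(binding_args, headers):
    """Approximates RabbitMQ headers-exchange matching for x-match "all" / "any"."""
    mode = binding_args.get("x-match", "all")
    wanted = {k: v for k, v in binding_args.items() if not k.startswith("x-")}
    hits = [k in headers and headers[k] == v for k, v in wanted.items()]
    return all(hits) if mode == "all" else any(hits)

def route(bindings, headers):
    """Return every queue whose binding matches the message headers."""
    return [queue for queue, args in bindings if headers_match(args, headers)]

BINDINGS = [
    ("bw_images", {"x-match": "all", "mode": "bw"}),
    ("colour_images", {"x-match": "all", "mode": "colour"}),
]

# The KV pairs from the example above, as the sender would put them in the header.
image_headers = {"width": 1024, "height": 768, "mode": "bw", "photographer": "John Doe"}

if __name__ == "__main__":
    print(route(BINDINGS, image_headers))  # ['bw_images']
```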
If we take the second definition, we start from different assumptions. We assume that the data comes into existence somewhere and is sent to the AMQP broker by some piece of software, but that it's not sensible to demand that this software populate the header with KV pairs. Instead, we want to make a routing decision based on the data itself.
There are two options for this in AMQP: you can decide to implement a new exchange for your particular data format, or you can delegate the routing to a client.
In RabbitMQ, there are direct (1-to-1), fanout (1-to-N), headers (header-filtered 1-to-N) and topic (topic-filtered 1-to-N) exchanges, but you can implement your own according to the AMQP standard. This would require reading a lot of RabbitMQ documentation and implementing the exchange in Erlang.
The other option is to make an AMQP client in Python that listens to a special "content routing queue". Whenever a message arrives at the queue, your router-client picks it up, does whatever is needed to make a routing decision, and sends the message back to the broker to a suitable queue. So to implement the scenario above, your Python program would detect whether an image is in black-and-white or colour, and would (re)send it to a "black-and-white" or a "colour" queue, where some suitable client would take over.
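A sketch of the router-client's decision logic, kept broker-free so it runs on its own. It assumes the message body has already been decoded into a list of (r, g, b) pixels (decoding a real image format is out of scope), treats an image as black-and-white when every pixel is grey (r == g == b), and republishes through an injected `publish(queue_name, body)`. With AMQP's default exchange, publishing with the queue name as the routing key delivers directly to that queue. Function and queue names are hypothetical.

```python
BW_QUEUE, COLOUR_QUEUE = "black_and_white", "colour"  # hypothetical queue names

def is_black_and_white(pixels):
    """Treat an image as black-and-white if every pixel has r == g == b."""
    return all(r == g == b for r, g, b in pixels)

def route_image(pixels):
    return BW_QUEUE if is_black_and_white(pixels) else COLOUR_QUEUE

def on_routing_message(pixels, publish):
    """Consumer callback for the "content routing queue": inspect the body,
    then republish it to the queue that suits its content."""
    publish(route_image(pixels), pixels)
```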
So on your second question, there's really nothing that you do in your client that does any content-based binding. Either your client(s) work as described above, or you create a new exchange type in RabbitMQ itself. Then, in your client setup code, you define the exchange type to be your new type.
Hope this answers your question!
In RabbitMQ, routing is the process by which an exchange decides which queues to place your message on. You publish all messages to an exchange, but you only receive messages from a queue. This means that the exchange is an active part of the process that makes some decisions about message forwarding or copying.
The topic exchange included with RabbitMQ looks at a string on the incoming messages (the routing_key) and matches that with the patterns (the binding_keys) supplied by all queues which declare their desire to receive messages from the exchange.
RabbitMQ source code is on the web so you can have a look at the topic exchange code here:
http://hg.rabbitmq.com/rabbitmq-server/file/9b22dde04c9f/src/rabbit_exchange_type_topic.erl
A lot of the complexity there is to handle a data structure called a trie which allows for very fast lookups. In fact the same data structure is used inside Internet routers.
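A small Python sketch of the trie idea (not a translation of RabbitMQ's Erlang code): binding keys are stored word by word, and routing walks the trie, where `*` matches exactly one word and `#` matches zero or more words, as in RabbitMQ topic bindings. `TopicTrie` is a hypothetical name.

```python
class TopicTrie:
    """Binding keys stored word by word; '*' matches exactly one word,
    '#' matches zero or more words."""

    def __init__(self):
        self.children = {}   # word (or '*' / '#') -> TopicTrie
        self.queues = set()  # queues whose binding key ends at this node

    def bind(self, binding_key, queue):
        node = self
        for word in binding_key.split("."):
            node = node.children.setdefault(word, TopicTrie())
        node.queues.add(queue)

    def route(self, routing_key):
        matched = set()
        self._walk(routing_key.split("."), 0, matched)
        return matched

    def _walk(self, words, i, matched):
        if i == len(words):
            matched |= self.queues
        else:
            exact = self.children.get(words[i])
            if exact is not None:
                exact._walk(words, i + 1, matched)
            star = self.children.get("*")
            if star is not None:
                star._walk(words, i + 1, matched)
        hash_node = self.children.get("#")
        if hash_node is not None:
            # '#' may swallow any number of the remaining words, including none.
            for j in range(i, len(words) + 1):
                hash_node._walk(words, j, matched)
```

For example, after binding `*.orange.*` to Q1 and both `*.*.rabbit` and `lazy.#` to Q2, `route("quick.orange.rabbit")` returns `{"Q1", "Q2"}`, `route("lazy.brown.fox")` returns `{"Q2"}`, and `route("quick.brown.fox")` returns an empty set.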
The headers exchange found here http://hg.rabbitmq.com/rabbitmq-server/file/9b22dde04c9f/src/rabbit_exchange_type_headers.erl
is probably easier to understand. As you can see, not a lot of code is required to make a different type of exchange. If you wanted to examine the content (or maybe just peek at the first few bytes of each message), you should be able to quickly identify XML versus JSON versus something else. And if your JSON objects and XML documents maintain a specific sequence of elements, then you should be able to distinguish between different JSON objects (or XML doc types) without parsing the entire message body.
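The "peek at the first few bytes" idea, sketched in Python for readability (a real custom exchange would be written in Erlang, as described above). `sniff_format` skips a UTF-8 byte-order mark and leading whitespace and classifies the body by its first character; `json_first_key` relies on the assumption from the text that objects keep a fixed element order, so the first key identifies the kind of object. Both function names are hypothetical, and neither validates the document.

```python
import codecs
import re

_FIRST_KEY = re.compile(rb'\s*\{\s*"([^"\\]+)"')  # opening brace, then the first key

def sniff_format(body, peek=64):
    """Guess XML vs JSON from the first few bytes, without parsing the whole body."""
    head = body[:peek]
    if head.startswith(codecs.BOM_UTF8):
        head = head[len(codecs.BOM_UTF8):]
    head = head.lstrip()  # strips leading ASCII whitespace
    if head.startswith(b"<"):
        return "xml"
    if head[:1] in (b"{", b"["):
        return "json"
    return "other"

def json_first_key(body, peek=64):
    """If objects always start with a fixed key, that key identifies their kind."""
    m = _FIRST_KEY.match(body[:peek])
    return m.group(1).decode("utf-8") if m else None
```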