How to synchronize list changes over a network in Python

I am developing a little group management system where there are two different types of servers. The "client server" which can join and leave groups on the "management server".
There are multiple management servers in a multicast group, so the client servers send the join and leave requests to this multicast group. Because IPv6 multicast is not reliable, there is the possibility that some management servers do not receive the requests, so their list of memberships is not up to date.
So I need a function that I can use to synchronize lists whenever they change. There are three types of changes:
a client server leaves a group
a client server joins a group
a client server updates its complete list of memberships (so the management server replaces its list)
I thought of keeping a log list on each management server that records the recent changes (say, of the last 60 seconds). If a server notices a change, it informs the other management servers about the change and sends the time along with this information. If the receiver has a more recent change, it ignores the sender's information; if not, it updates its list.
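The log-and-compare idea above can be sketched as a last-writer-wins merge. This is a minimal sketch, assuming changes arrive as (group, client, action, timestamp) tuples; all names (`MembershipStore`, `apply_change`, ...) are hypothetical:

```python
# Sketch of the timestamped change-log idea: a change is applied only if it
# is newer than the most recent change we already hold for that (group, client).

class MembershipStore:
    def __init__(self):
        self.members = {}       # group -> set of client-server ids
        self.last_change = {}   # (group, client) -> timestamp of last applied change

    def apply_change(self, group, client, action, timestamp):
        """Apply a join/leave unless a more recent change is already recorded."""
        key = (group, client)
        if timestamp <= self.last_change.get(key, float("-inf")):
            return False  # ignore: we hold a more recent change
        self.last_change[key] = timestamp
        members = self.members.setdefault(group, set())
        if action == "join":
            members.add(client)
        elif action == "leave":
            members.discard(client)
        return True
```

Note that comparing wall-clock timestamps across servers assumes reasonably synchronized clocks; with real clock skew you would want logical (Lamport) clocks instead.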
But is this the best way to do this? Are there special patterns for such things? Or maybe even a python framework?

Sounds like you want to use gevent. It's useful for exactly the scenario you're talking about where you want events synchronized between multiple nodes. It'll abstract away most of the networking layer for you too, so you can focus on getting work done instead.

Related

python zmq many client to many server discovery message patterns

Struggling with this problem for a while, so finally asking for some help from the experts.
Language: python
The problem/setup:
I have many clients: client[1], client[2], ... etc.
I have many servers: server[1], server[2], ... etc.
Each server can plug in to 5 external ws connections. At any time I may need to open [x] ws connections; maybe 2, maybe 32. The total ws connections I need, and thus the servers needed, is dynamic...
Each client may be consuming 1 ws connection from server[1], 1 ws connection from server[2], etc.
How I imagine the flow working
New client[1] is loaded, needing 2 ws feeds.
New client[1] broadcasts [xpub/xsub?] a message to all servers saying, 'hey, I need these 2 ws connections, who has them?'
Server[1] with the ws connections replies to client[1] (and only that client): 'I've got what you're looking for, talk to me.'
client[1] then engages in req/reply communication with server[1] so that client[1] can use server[1]'s ws connection to make queries against it, e.g., 'hey, server[1] with access to ws[1], can you request [x]?', and server[1] replies to client[1]: 'here's the reply from the ws request you made.'
tldr
clients will be having multiple req/rep exchanges with many servers
servers will be dealing with many clients
clients need to broadcast/discover the appropriate servers to be messaging with
The Zyre protocol is specifically designed for brokerless "gossip" discovery. Pyre (https://github.com/zeromq/pyre) is the Python implementation. It provides mechanisms for nodes to join a "discovery group" and share information. Among other things, it allows group members to WHISPER to individual members or SHOUT (multicast) to all members.
Zyre uses UDP broadcast beacons to initiate contact, so it is generally limited to a single subnet (UDP broadcast is generally not forwarded beyond a subnet). However, you could bridge a group across different subnets via your server(s) in each subnet (see below).
You could use zyre to distribute topology information (in this case, your server list) to your clients.
I have only played around with pyre a little, so I may not have all the details exactly right, but I would try to set it up like this:
Define a Zyre group.
Each server...
Joins the group.
Sets its server address (ip or fqdn, and maybe port) in its beacon header.
Each client...
Joins the group.
Reads server address from the HELLO messages it receives from servers.
Makes REQ connections to server(s).
Adds/removes server connections based on HELLO/LEAVE/AVOID messages received over time.
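The client-side bookkeeping in the steps above can be sketched framework-free. The event tuples below are simplified stand-ins for the Zyre HELLO/LEAVE/AVOID messages; with Pyre you would read real events from the node instead:

```python
# Maintain the client's table of known servers from a stream of group events.
# Event shapes here are hypothetical stand-ins for Zyre messages.

def update_server_table(servers, event):
    """servers: dict peer_id -> address string. Mutated in place and returned."""
    kind = event[0]
    if kind == "HELLO":
        _, peer_id, address = event
        servers[peer_id] = address      # server announced its address header
    elif kind in ("LEAVE", "AVOID"):
        peer_id = event[1]
        servers.pop(peer_id, None)      # drop connections to departed servers
    return servers
```

The client would then open or close its REQ connections whenever this table gains or loses an entry.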
If servers are not in the same subnet (e.g., maybe they are in different AWS availability zones), you could preconfigure the servers to know what all the server IPs are, periodically verify that they are up (via REQ/REP or PUB/SUB between the servers), and SHOUT active-servers information to the local group. Clients could use this information to inform/adjust their list of active servers/connections.
I've thought about doing exactly the above, but it unfortunately hasn't risen above other priorities in the backlog, so I haven't gotten past the above level of detail.
I'll focus on the discovery problem. How do clients know which servers are available and which ws connections each one has?
One approach is to add a third type of node, call it broker. There is a single broker, and all clients and servers know how to reach it. (Eg, all clients and servers are configured with the broker's IP or hostname.)
When a server starts it registers itself with the broker: "I have ws feeds x,y,z and accept requests on 1.2.3.5:1234". The broker tracks this state, maybe in a hash table.
When a client needs ws feed y, it first contacts the broker: "Which server has ws feed y?" If the broker knows who has feed y, it gives the client the server's IP and port. The client can then contact the server directly. (If multiple servers can access feed y, the broker could return a list of servers instead of a single one.)
If servers run for a "long" time, clients can cache the "server X has feed y" information and only talk to the broker when they need to access a new feed.
With this design, clients use the broker to find servers of interest. Servers don't have to know anything about clients at all. And the "real" traffic (clients accessing feeds via servers) is still done directly between clients and servers - no broker involved.
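The broker's registry described above can be sketched as a hash table keyed by feed id. All names here are hypothetical; in a real system the register/lookup calls would arrive over REQ/REP sockets:

```python
# Sketch of the broker's state: which server(s) can serve which ws feed.

class Broker:
    def __init__(self):
        self.feeds = {}  # feed id -> list of (server id, "ip:port") endpoints

    def register(self, server_id, endpoint, feed_ids):
        """Server announces: 'I have feeds x,y,z and accept requests on endpoint.'"""
        for feed in feed_ids:
            self.feeds.setdefault(feed, []).append((server_id, endpoint))

    def lookup(self, feed_id):
        """Client asks: 'which server(s) have this feed?' May return several."""
        return self.feeds.get(feed_id, [])
```

A client caches the lookup result and talks to the returned endpoint directly from then on, so the broker stays off the hot path.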
HTH. And for the record I am definitely not an expert.

Python automatically find server with ZMQ

I am using ZMQ to facilitate communications between one server and multiple clients. Is there a method to have the clients automatically find the ZMQ server if they are on the same internal network? My goal would be to have the client be able to automatically detect the IP and Port it should connect to.
It's not possible to do this in any sort of scalable way without some sort of broker or manager that will manage your communications system.
The way that would work is that you have your broker on a known IP:port, and as your server and clients spin up, they connect to the broker, and the broker then tells your endpoints how to communicate to each other.
There are some circumstances where such a communication pattern could make sense, generally when the server and clients are controlled by different entities, and maybe even different from the entity controlling the broker. In your case, it sounds like a dramatic amount of over-engineering. The only other way to do what you're looking for that I'm aware of is to just start brute forcing the network to find open IP:port combinations that respond the way you are looking for. Ick.
I suggest you just define the IP:port you want to use, probably through some method of static configuration that you can change manually as necessary, or that can act as a sort of flat-file broker that both ends of the communication can access.
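The static-configuration approach can be as small as an INI file that both ends read at startup. This is a sketch; the file contents, section name, and helper name are all arbitrary:

```python
import configparser

# Example contents of a shared endpoints file (hypothetical values).
ENDPOINTS_INI = """
[server]
host = 192.168.1.10
port = 5555
"""

def load_endpoint(text):
    """Parse the shared config and return the (host, port) to connect to."""
    cfg = configparser.ConfigParser()
    cfg.read_string(text)
    return cfg["server"]["host"], cfg.getint("server", "port")
```

Changing the server's address then only requires editing one file that clients re-read, which is the "flat-file broker" idea in its simplest form.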

Python - Multiple client servers for scaling

For my current setup, I have a single client server using Tornado, a standalone database server and another standalone server for my website.
I'm looking at having a second client server process running on the same system (to take advantage of its multiple cores), and I would like some advice on locating which server my "clients" have connected to. Each client can have multiple connections (instances).
I've already looked at using memcached to hold a list of user identifiers and link them to which server(s) they are connected to, but that doesn't seem like it would scale very well (e.g., with six-digit numbers of connected users).
I see the same issue with database lookups.
I have already optimized my server as much as possible without going into micro-optimization, which I personally frown upon.
Current server methodology:
On connect:
Accept connection, rate limit for max connections per IP.
Append client instance to a list named "clientList".
On data from client:
Rate limit for max messages per second.
Append data to a client work queue.
If the client has a thread dedicated to its work queue:
return; its work will be processed by the existing thread
otherwise, create a new thread for this user's work queue and start it.
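The on-data steps above can be sketched as follows. This is a minimal sketch; `on_data`, the `results` sink, and the sentinel-free worker loop are all illustrative, not the poster's actual code:

```python
import queue
import threading

# One work queue and one worker thread per client, created lazily on first message.

work_queues = {}            # client id -> queue.Queue
_lock = threading.Lock()    # guards creation of new queues/threads

def _worker(q, results):
    while True:
        item = q.get()
        if item is not None:
            results.append(item)  # stand-in for the real message processing
        q.task_done()
        if item is None:          # None acts as a shutdown sentinel
            break

def on_data(client_id, data, results):
    with _lock:
        q = work_queues.get(client_id)
        if q is None:
            # No thread dedicated to this client's queue yet: create one.
            q = work_queues[client_id] = queue.Queue()
            threading.Thread(target=_worker, args=(q, results), daemon=True).start()
    q.put(data)  # the existing worker will pick this up in FIFO order
```

Because each client has exactly one worker, its messages are processed in order without any further locking.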
TLDR:
How do I efficiently store which server(s) a client has connected to, in order to forward messages to that user?
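One shape the shared lookup could take is sketched below. In production this mapping would live in a shared store like Redis or memcached (the operations correspond to set commands there); the class and method names are hypothetical:

```python
# Sketch of a connection registry: which server processes hold connections
# (instances) for a given user, so messages can be forwarded to all of them.

class ConnectionRegistry:
    def __init__(self):
        self.servers_for_user = {}  # user id -> set of server ids

    def connect(self, user_id, server_id):
        self.servers_for_user.setdefault(user_id, set()).add(server_id)

    def disconnect(self, user_id, server_id):
        servers = self.servers_for_user.get(user_id, set())
        servers.discard(server_id)
        if not servers:
            self.servers_for_user.pop(user_id, None)  # keep the table compact

    def route(self, user_id):
        """Which servers should a message to this user be forwarded to?"""
        return self.servers_for_user.get(user_id, set())
```

Each server process updates the registry on connect/disconnect, and any server can then call `route` to find where a user's other instances live.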

RabbitMQ - Game Rooms and Security Considerations

I'm programming an Android application and want to define rooms. A room would hold all the users of a certain game. This is like poker with 4 players, where each room can hold 4 users. I also want to use RabbitMQ for scalability and customizability. The problem is that the Android application uses the same username:password to connect all users to a RabbitMQ server (specific virtual host).
I guess I'm worried that one user might be able to read/write messages from different queues that it should. There are multiple solutions that are not satisfactory:
Use a different user in each Android application: this really can't be done, because the Android Market doesn't allow a different application for each user that downloads it. Even if it did, it's a stupid idea anyway.
Set appropriate access controls: http://www.rabbitmq.com/access-control.html . I guess this wouldn't prevent the problem of a malicious attacker reading/writing messages from/to queues it doesn't have access to.
Set appropriate routing keys: I guess if each user creates another queue from which it can read messages and published messages to specifically defined queue, this can work. But I guess the problem is the same, since users will be connecting to the RabbitMQ with the same username:password: therefore this user can read all queues and write to them (based on the access rules).
My question is: how do I restrict a user to reading/writing only the queues that represent the rooms he's currently joined, and prevent access to all other queues?
Perhaps I don't understand the application too well, but in my experience RabbitMQ is usually used on the backend, for example, while creating a distributed system with databases and application servers and other loosely coupled entities. Message queuing is an important tool for asynchronous application design, and the fact that each messaging queue can in theory be spawned into a separate process by RabbitMQ makes it remarkably scalable.
What you are alluding to in your question seems more like an access control mechanism for users. I would put this in the front end of a system, for example as filtering mechanisms on the incoming messages before passing them on to the message queues. You might even want to consider DoS prevention via rate control per user.
Cheers!
I am working on a Poker application myself =)
I am relying on something like Akka/Actors (check out Erlang) based traffic over streaming web sockets and hoping it works out (still kind of worried about secure web sockets).
That said, I am also considering RabbitMQ for receiving player actions. I do not think you want to ever expose the username or password to the rabbit queue. As a matter of fact, you probably don't even want the queue server accessible from the outside world.
Instead, set up some server that your users can establish a connection to. This will be your "front end" that the android clients will talk to. Each user will connect to the server via a secure TCP connection and then log into your system. This is where the users will have their own usernames and passwords. If authentication is successful, keep the socket alive (this is where my knowledge of TCP is weak) and then associate the user information with this socket.
When a player makes an action, such as folding or raising, send their action over the secure TCP connection to your "front end" (this connection should still be established). The "front end" then checks which user is connected to this socket, then publishes a message to the queue that would ideally contain the user id, action taken, and the table id. In other words, the only IP allowed to hit the queue is your front end server, and the front end server just uses the single username/password for the rabbit queue.
It's up to you to handle the exchange of the queue message and routing the message to the right table (or making sure the table only handles messages that it's responsible for - which is why I am loving Akka right about now :) Once the message arrives to the table, verify that the user id in the message is the user id whose turn it actually is, and then verify that the action sent is an acceptable one based on the table's state. For example, if I receive a CHECK request and the user can only CALL/FOLD/RAISE, then I will just reply saying invalid action or just throw out the whole message.
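The table-side checks described above can be sketched like this. The CHECK/CALL/FOLD/RAISE names follow the example; the table representation and function name are hypothetical:

```python
# Sketch: the table verifies a queued action before applying it.
# Messages arrive as (user_id, action) pairs already tagged by the front end.

def handle_action(table, user_id, action):
    """Return True if the action is applied, False if it is thrown out."""
    if user_id != table["turn"]:
        return False  # not this user's turn: ignore the message
    if action not in table["allowed"]:
        return False  # e.g. a CHECK when only CALL/FOLD/RAISE are acceptable
    table["history"].append((user_id, action))
    return True
```

Invalid messages are simply dropped (or logged, per the edit below), so a client that forges actions gains nothing.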
Do not let the public get to the queue, and always make sure you do not have security holes, especially if you start dealing with real currencies.
Hope this helps...
EDIT: I just want to be clear. Any time clients make actions, they simply need to send the action and table id or whatever information you need. Do not let them send their user id or any user specific information. Your "front end" server should auto associate the user id based on the socket the request is coming in on. If they submit any user information with their request, it may be a good idea to log it, and then throw out the data. I would log it just because I don't like people trying to cheat, and that's probably what they're doing if they send you unexpected data.

Network programming abstraction, decomposition

I have a problem as follows:
Server process 1
Constantly sends updates that occur to a datastore
Server process 2
Clients contact the server, which queries the datastore, and returns a result
The thing is, the results that process 1 and process 2 are sending back to the client are totally different and unrelated.
How does one decompose this?
Do you just have one process constantly sending data, and define the protocol to have a bit which corresponds to whether the return type is 1 or 2?
Do you have two processes? How do they share the datastore then (it is just a structure not a database)?
Thanks!
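The type-tagged single-protocol option raised in the question can be sketched with `struct`. The header layout here (1-byte message type, 4-byte big-endian payload length) is an arbitrary choice for illustration:

```python
import struct

# Frame format: >BI header = 1-byte type + 4-byte payload length, then payload.
HEADER = ">BI"
HEADER_SIZE = struct.calcsize(HEADER)  # 5 bytes

def pack_message(msg_type, payload):
    """Prefix a payload with its type (e.g. 1 = update, 2 = query result)."""
    return struct.pack(HEADER, msg_type, len(payload)) + payload

def unpack_message(data):
    """Split one frame back into (type, payload)."""
    msg_type, length = struct.unpack(HEADER, data[:HEADER_SIZE])
    return msg_type, data[HEADER_SIZE:HEADER_SIZE + length]
```

With this framing a single process can serve both kinds of traffic on one socket, and the client dispatches on the type byte.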
It sounds like you want to stream your series of ints "somewhere" and also collect them in a datastore. In my system I am streaming sensor readings into a database and also allowing them to go directly to web clients, giving them live power readings. I've written a blog entry on why a database is not suitable for live data - though it is perfect for saving the data for later analysis.
I'd have the first server process be a Twisted server that uses txAMP to stream the ints to RabbitMQ. Any clients that want live data can subscribe to the stream in RabbitMQ, also using txAMP. Web browser clients can use Orbited; here is a worked example.
In your design server 1 saves to the database. You could instead have a server 3 collect data from RabbitMQ and stream it to the database. I plan to have a server that collects chunks of data and renders graphs to store on a central fileshare.
Don't create your own messaging system; RabbitMQ is well tested, scalable, and can persist your "messages" (raw data) if something goes wrong.
If you can restrict yourself to Twisted, I recommend using Perspective Broker. It's essentially an RPC system, and doesn't care much about the notions of "client" and "server": either the initiator of a TCP connection or the responder can start RPC calls in PB.
So server 1 would accept registration calls with a callback object, and call the callback whenever it has new data available. Server 2 provides various RPC operations as clients require them. If they operate on the very same data, I would put both servers into a single process.
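The register-a-callback pattern described above can be sketched without any framework (with Perspective Broker the callback would be a remote reference the client passed in; the class and method names below are illustrative):

```python
# Sketch: server 1 accepts registration calls with a callback object and
# invokes every registered callback whenever new data is available.

class UpdatePublisher:
    def __init__(self):
        self.callbacks = []

    def register(self, callback):
        """What a client's 'registration call' amounts to."""
        self.callbacks.append(callback)

    def publish(self, data):
        """Called whenever the datastore has new data; fans it out."""
        for cb in self.callbacks:
            cb(data)
```

In PB the fan-out calls would go back over the clients' existing TCP connections, which is what lets the "server" initiate RPC toward its "clients".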
Why not use a database instead of "just a structure"? Both relational and non-relational DBs offer many practical advantages: separate processes can use them, they take care of replication (and/or snapshots, backups, ...), they offer rich functionality if you need it for the "queries", and so on, and so forth.
Worst case, the "just a structure" can be handled by a third process that's entirely dedicated to it (basically mimicking what any DB engine would offer, though the engine would probably do it better and faster ;-), allowing you to at least keep a good decomposition (with the two server processes both interacting with the "datastore process").
