python zmq many client to many server discovery message patterns

I've been struggling with this problem for a while, so I'm finally asking for some help from the experts.
Language: python
The problem/setup:
I have many clients: client[1], client[2], ... etc.
I have many servers: server[1], server[2], ... etc.
Each server can plug in to 5 external ws connections. At any time I may need [x] ws connections open; maybe 2, maybe 32. The total number of ws connections I need, and thus the number of servers needed, is dynamic...
Each client may be consuming 1 ws connection from server[1], 1 ws connection from server[2], etc.
How I imagine the flow working
New client[1] is loaded, needing 2 ws feeds
New client[1] broadcasts an [xpub/xsub ?] message to all servers saying, 'Hey, I need these 2 ws connections; who has them?'
Server[1], which has the ws connections, replies to client[1] (and only that client): 'I've got what you're looking for, talk to me.'
client[1] then engages in REQ/REP communication with server[1] so that client[1] can use server[1]'s ws connection to make queries against it, e.g., 'Hey, server[1] with access to ws[1], can you request [x]?' ... and server[1] replies to client[1]: 'Here's the reply from the ws request you made.'
tldr
clients will be having multiple REQ/REP exchanges with many servers
servers will be dealing with many clients
clients need to broadcast/discover the appropriate servers to be messaging with (rough sketch below)
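Roughly, in pyzmq terms, I picture something like this (a sketch only: the proxy address, ports, and message shapes are all invented, and it assumes an XPUB/XSUB proxy is running at a well-known address):

    import time
    import zmq

    ctx = zmq.Context.instance()

    # --- client side: broadcast "who has these feeds?" and wait for an offer ---
    def discover(feeds):
        pub = ctx.socket(zmq.PUB)
        pub.connect("tcp://discovery-proxy:5555")   # XSUB side of the proxy

        replies = ctx.socket(zmq.PULL)
        reply_port = replies.bind_to_random_port("tcp://*")

        time.sleep(0.5)  # crude: let subscriptions settle (the "slow joiner" problem)
        pub.send_json({"feeds": feeds,
                       "reply_to": "tcp://client-host:%d" % reply_port})

        offer = replies.recv_json()   # e.g. {"feed": "ws1", "endpoint": "tcp://server1:6000"}
        req = ctx.socket(zmq.REQ)
        req.connect(offer["endpoint"])
        return req                    # now REQ/REP directly with that server

    # --- server side: listen for broadcasts and answer the ones we can serve ---
    def serve(my_feeds, rep_endpoint):
        sub = ctx.socket(zmq.SUB)
        sub.connect("tcp://discovery-proxy:5556")   # XPUB side of the proxy
        sub.setsockopt_string(zmq.SUBSCRIBE, "")

        while True:
            ask = sub.recv_json()
            wanted = set(ask["feeds"]) & set(my_feeds)
            if wanted:
                push = ctx.socket(zmq.PUSH)
                push.connect(ask["reply_to"])
                push.send_json({"feed": wanted.pop(), "endpoint": rep_endpoint})
                push.close()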

The Zyre protocol is specifically designed for brokerless "gossip" discovery. Pyre (https://github.com/zeromq/pyre) is the Python implementation. It provides mechanisms for nodes to join a "discovery group" and share information. Among other things, it allows group members to WHISPER to individual members or SHOUT (multicast) to all members.
Zyre uses UDP broadcast beacons to initiate contact, so it is generally limited to a single subnet (UDP broadcast is generally not forwarded beyond a subnet). However, you could bridge a group across different subnets via your server(s) in each subnet (see below).
You could use zyre to distribute topology information (in this case, your server list) to your clients.
I have only played around with pyre a little, so I may not have all the details exactly right, but I would try to set it up like this (rough sketch after the list):
Define a Zyre group.
Each server...
Joins the group.
Sets its server address (IP or FQDN, and maybe port) in its beacon header.
Each client...
Joins the group.
Reads the server address from the headers carried by the ENTER events it receives from servers (pyre surfaces the wire-level HELLO as an ENTER event).
Makes REQ connections to server(s).
Adds/removes server connections based on ENTER/LEAVE/EXIT events received over time.
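Here is a rough sketch of that setup with pyre. I'm hedging on the details: the group name and header key are my own choices, and the recv() frame layout follows the pyre chat example, so it may differ between versions:

    import json
    import uuid
    from pyre import Pyre

    GROUP = "feed-discovery"                   # arbitrary group name

    # Server side: advertise the REQ/REP endpoint in a beacon header
    # (headers must be set before start()).
    server = Pyre("server-1")
    server.set_header("endpoint", "tcp://10.0.0.5:6000")   # example address
    server.join(GROUP)
    server.start()

    # Client side: learn endpoints from ENTER events, forget them on EXIT.
    client = Pyre("client-1")
    client.join(GROUP)
    client.start()

    servers = {}                               # peer uuid -> server endpoint
    while True:
        frames = client.recv()                 # [event, uuid, name, ...]
        event = frames[0].decode()
        peer = uuid.UUID(bytes=frames[1])
        if event == "ENTER":
            headers = json.loads(frames[3].decode())
            if "endpoint" in headers:
                servers[peer] = headers["endpoint"]   # connect a REQ socket here
        elif event == "EXIT":
            servers.pop(peer, None)            # drop connections to a dead server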
If servers are not in the same subnet (e.g., maybe they are in different AWS availability zones), you could preconfigure the servers to know what all the server IPs are, periodically verify that they are up (via REQ/REP or PUB/SUB between the servers), and SHOUT active-servers information to the local group. Clients could use this information to inform/adjust their list of active servers/connections.
I've thought about doing exactly the above, but it unfortunately hasn't risen above other priorities in the backlog, so I haven't gotten past the above level of detail.

I'll focus on the discovery problem. How do clients know which servers are available and which ws connections each one has?
One approach is to add a third type of node; call it a broker. There is a single broker, and all clients and servers know how to reach it. (E.g., all clients and servers are configured with the broker's IP or hostname.)
When a server starts, it registers itself with the broker: "I have ws feeds x, y, z and accept requests on 1.2.3.5:1234". The broker tracks this state, maybe in a hash table.
When a client needs ws feed y, it first contacts the broker: "Which server has ws feed y?" If the broker knows who has feed y, it gives the client the server's IP and port. The client can then contact the server directly. (If multiple servers can access feed y, the broker could return a list of servers instead of a single one.)
If servers run for a "long" time, clients can cache the "server X has feed y" information and only talk to the broker when they need to access a new feed.
With this design, clients use the broker to find servers of interest. Servers don't have to know anything about clients at all. And the "real" traffic (clients accessing feeds via servers) is still done directly between clients and servers - no broker involved.
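A minimal sketch of such a broker in pyzmq, assuming a simple JSON protocol (the "register"/"lookup" message shapes are invented for illustration):

    import zmq

    ctx = zmq.Context.instance()
    sock = ctx.socket(zmq.REP)
    sock.bind("tcp://*:5000")          # the well-known broker address

    feeds = {}                         # feed name -> list of server endpoints

    while True:
        msg = sock.recv_json()
        if msg["op"] == "register":    # server startup: "I have feeds x,y,z at ..."
            for feed in msg["feeds"]:
                feeds.setdefault(feed, []).append(msg["endpoint"])
            sock.send_json({"ok": True})
        elif msg["op"] == "lookup":    # client: "which servers have feed y?"
            sock.send_json({"servers": feeds.get(msg["feed"], [])})
        else:
            sock.send_json({"error": "unknown op"})

A server would register with {"op": "register", "feeds": ["y"], "endpoint": "tcp://1.2.3.5:1234"}; a client sends {"op": "lookup", "feed": "y"} and then connects directly to any endpoint it gets back.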
HTH. And for the record, I am definitely not an expert.

Related

Dynamically connect to an endpoint from a ZeroMQ client

My client, built with pyzmq, will connect to a service that will provide it with the correct address it needs to connect to. It might do this several times, each time having to connect to a different worker.
What I have created until now, based on the zguide, is a simple broker that will accept connections from clients on a frontend port; it then connects with one of the workers and asks a question (right now it's just a random choice between yes and no). If the worker replies with 'yes', then my idea was to let the client know that that specific worker is ready and have it connect directly to the worker.
In the examples that I have seen, clients mostly connect to a single server or broker once. What would be the best way to connect to an address given to me at runtime, potentially multiple times?
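For example, would something like this be reasonable: a short-lived REQ socket per endpoint handed to me at runtime? (Addresses and message shapes below are made up.) A fresh socket per exchange would also sidestep the REQ send/recv lockstep getting stuck if a worker dies mid-exchange.

    import zmq

    ctx = zmq.Context.instance()

    def ask(endpoint, request):
        """One short-lived REQ exchange against whatever endpoint we were handed."""
        sock = ctx.socket(zmq.REQ)
        sock.connect(endpoint)            # the address only arrives at runtime
        sock.send_json(request)
        reply = sock.recv_json()
        sock.close(linger=0)              # throw the socket away afterwards
        return reply

    # e.g. ask the broker first, then whichever worker it points us at
    worker = ask("tcp://broker:5000", {"op": "where"})
    answer = ask(worker["endpoint"], {"op": "work"})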

Python - Multiple client servers for scaling

For my current setup, I have a single client server using Tornado, a standalone database server and another standalone server for my website.
I'm looking at having a second client server process running on the same system (to take advantage of its multiple cores) and I would like some advice on locating which server my "clients" have connected to. Each client can have multiple connections (instances).
I've already looked at using memcached to hold a list of user identifiers and link them to which server(s) they are connected to, but that doesn't seem like it would scale very well (e.g., six digits' worth of connected users).
I see the same issue with database lookups.
I have already optimized my server as much as possible without going into micro-optimization, which I personally frown upon.
Current server methodology (sketched in code below):
On connect:
Accept connection, rate limit for max connections per IP.
Append client instance to a list named "clientList".
On data from client:
Rate limit for max messages per second.
Append data to a client work queue.
If the client has a thread dedicated to its work queue:
return; its work will be chewed by the current thread
otherwise, create a new thread for this user's work queue and start it.
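In code, the work-queue part looks roughly like this (names are illustrative, not my actual Tornado code):

    import queue
    import threading

    def handle(data):
        print("processing", data)     # stand-in for the real per-message work

    class Client:
        def __init__(self, conn):
            self.conn = conn
            self.work = queue.Queue()
            self.lock = threading.Lock()
            self.running = False      # is a thread currently chewing the queue?

        def enqueue(self, data):
            with self.lock:
                self.work.put(data)
                if not self.running:
                    self.running = True
                    threading.Thread(target=self._drain, daemon=True).start()

        def _drain(self):
            while True:
                try:
                    data = self.work.get(timeout=5)   # idle out after 5s of no work
                except queue.Empty:
                    with self.lock:
                        if self.work.empty():
                            self.running = False      # a later enqueue restarts us
                            return
                    continue
                handle(data)

    client_list = []    # the "clientList" from the description above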
TLDR:
How do I efficiently store which servers a client has connected to, in order to forward messages to that user?

Python server to receive specific data from two clients on same remote device

I was able to set up a simple socket server and client connection between two devices, with the ability to send and receive values. My issue is with setting up the remote server to accept two clients from the same device and differentiate the data received from them.
Specifically, each client will be running a similar code to accept encoder/decoder values from their respective motor. My main program, attached to the server, needs to use the data from each client separately, in order to carry out the appropriate calculations. How do I differentiate the incoming signals coming from both clients?
When the communication between clients and the server isn't heavy, one way to do this is to have the clients do a handshake with the server and have the server enumerate the clients and send back IDs for communication.
Then the client sends its ID along with any communication it has with the server so the server can identify it. At least that is what I did.
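A bare-bones sketch of that scheme with plain sockets (the port and message format are invented):

    import socket
    import threading

    def serve():
        srv = socket.socket()
        srv.bind(("", 9000))                    # example port
        srv.listen()
        next_id = 0
        while True:
            conn, _ = srv.accept()
            next_id += 1
            conn.sendall(f"{next_id}\n".encode())   # handshake: assign an ID
            threading.Thread(target=handle, args=(conn,), daemon=True).start()

    def handle(conn):
        for line in conn.makefile():
            client_id, value = line.strip().split(":", 1)   # e.g. "1:enc=42"
            print(f"motor {client_id} sent {value}")        # route per client here

    # Client side: read the assigned ID, then prefix every message with it.
    def client(payload):
        c = socket.socket()
        c.connect(("server-host", 9000))        # hypothetical server address
        my_id = c.makefile().readline().strip()
        c.sendall(f"{my_id}:{payload}\n".encode())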

How to synchronize list changes over network in Python

I am developing a little group management system with two different types of servers: the "client server", which can join and leave groups on the "management server".
There are multiple management servers in a multicast group, so the client servers send the join and leave requests to this multicast group. Because IPv6 multicast is not reliable, there is the possibility that some management servers do not receive the requests, so their list of memberships is not up to date.
So I need a function that I can use to synchronize lists whenever they change. There are three types of changes:
client server leaves group
client server joins group
client server updates its complete list of memberships (so the management server replaces its list)
I thought of keeping a log list on each management server that records the recent changes (maybe from the last 60 seconds). If a server notices a change, it informs the other management servers about the change and sends the time along with this information. If the receiver has a more recent change, it ignores the sender's information; if not, it updates its list.
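In rough code (this assumes the servers' clocks are reasonably in sync; with skewed wall clocks you would want something like Lamport timestamps instead):

    import time

    memberships = {}    # client_server_id -> (timestamp, "joined" or "left")

    def local_change(client_id, state):
        """Record a change we saw ourselves and return it for broadcasting."""
        ts = time.time()
        memberships[client_id] = (ts, state)
        return {"id": client_id, "state": state, "ts": ts}

    def apply_remote(update):
        """Apply a change another server told us about, last-write-wins."""
        current = memberships.get(update["id"])
        if current is None or update["ts"] > current[0]:
            memberships[update["id"]] = (update["ts"], update["state"])
            return True     # applied
        return False        # we already had something newer; ignore it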
But is this the best way to do this? Are there special patterns for such things? Or maybe even a python framework?
Sounds like you want to use gevent. It's useful for exactly the scenario you're talking about where you want events synchronized between multiple nodes. It'll abstract away most of the networking layer for you too, so you can focus on getting work done instead.

Can pyzmq pub/sub sockets be used bidirectionally?

I'm using a pyzmq pub/sub socket for a server to advertise notifications to client subscribers. It works nicely but I have a question:
Is there any way to use the same socket to send information back to the server? Or do I need a separate socket for that?
Use case: I just want to allow the server to see who's actively subscribing to notifications, so I was hoping I could allow clients to send back periodic "heartbeat" messages. I have a use case where if no clients are listening, I want the server to spawn one. (This is a multiprocess system that uses localhost only.)
You need a separate socket. From the ZMQ guide (http://zguide.zeromq.org/page:all#Pros-and-Cons-of-Pub-Sub):
Killing back-chatter is essential to real scalability. With pub-sub, it's how the pattern can map cleanly to the PGM multicast protocol, which is handled by the network switch. In other words, subscribers don't connect to the publisher at all, they connect to a multicast group on the switch, to which the publisher sends its messages.
In order for this to work, the PUB socket will not send back data to the subscribers (at least not in a way visible to the user). The heartbeating problem is discussed in depth in the guide: http://zguide.zeromq.org/page:all#The-Asynchronous-Client-Server-Pattern
Also, check out the 7/MDP and 18/MDP protocols (http://rfc.zeromq.org/spec:7 -- this is also discussed in the guide) if you want to keep track of clients.
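For your heartbeat use case, a separate PULL socket alongside the PUB works; a rough sketch (ports and message format invented):

    import time
    import zmq

    ctx = zmq.Context.instance()

    # Server side: PUB for notifications out, separate PULL for heartbeats in.
    pub = ctx.socket(zmq.PUB)
    pub.bind("tcp://127.0.0.1:7000")
    hb = ctx.socket(zmq.PULL)
    hb.bind("tcp://127.0.0.1:7001")

    subscribers = {}                   # client id -> last heartbeat time
    poller = zmq.Poller()
    poller.register(hb, zmq.POLLIN)

    while True:
        pub.send_string("notification")
        for sock, _ in poller.poll(timeout=1000):
            subscribers[sock.recv_string()] = time.time()
        # Anyone silent for 5s is presumed gone; spawn a client here if empty.
        cutoff = time.time() - 5
        subscribers = {c: t for c, t in subscribers.items() if t > cutoff}

    # Client side (another process): SUB connect to :7000 for notifications,
    # PUSH connect to :7001 and periodically push.send_string(my_client_id).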
