Python - Multiple client servers for scaling

For my current setup, I have a single client server using Tornado, a standalone database server and another standalone server for my website.
I'm looking at running a second client server process on the same system (to take advantage of its multiple cores), and I would like some advice on locating which server my "clients" have connected to. Each client can have multiple connections (instances).
I've already looked at using memcached to hold a list of user identifiers mapping each user to the server(s) they are connected to, but that doesn't seem like it would scale very well (e.g. with six-digit numbers of connected users).
I see the same issue with database lookups.
I have already optimized my server as much as possible without going into micro-optimization, which I personally frown upon.
Current server methodology:
On connect:
Accept the connection, rate limiting for max connections per IP.
Append the client instance to a list named "clientList".
On data from a client:
Rate limit for max messages per second.
Append the data to that client's work queue.
If the client already has a thread dedicated to its work queue:
return; its work will be chewed through by the existing thread.
Otherwise, create a new thread for this user's work queue and start it.
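
A condensed sketch of that flow (clientList and the per-client work queues come straight from the description above; the rate-limit hooks and everything else are assumed):

import queue
import threading

clientList = []        # all connected client instances
workQueues = {}        # client -> its dedicated work queue

def on_connect(client):
    # Accept the connection; per-IP rate limiting is assumed to happen
    # before this point.
    clientList.append(client)

def on_data(client, data):
    # Per-client message rate limiting would go here.
    q = workQueues.get(client)
    if q is not None:
        q.put(data)    # the existing worker thread will chew through it
        return
    q = workQueues[client] = queue.Queue()
    q.put(data)
    threading.Thread(target=worker, args=(client, q), daemon=True).start()

def worker(client, q):
    while True:
        item = q.get()
        ...            # process this client's work in order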
TL;DR:
How do I efficiently store which server(s) a client has connected to, so that I can forward messages to that user?

Related

Why are redis pub and sub considered different clients when only one connection is opened?

How come, even when only one Redis connection instance is created, every call to publish or subscribe on that instance is counted as another client? When I connect to Redis from Python with
import redis
redis_server = redis.Redis()
it is not recognized as a new client. Only when I call one of these
redis_server.publish("channel", message)
redis_server.subscribe("channel")
do I see that there are 2 clients connected. Are pub/sub clients treated separately in Redis? Why isn't the connected client registered as soon as the new connection is opened?
By default redis-py gives you a connection pool with a maximum number of connections. On the first command you issue, a real connection is made and you'll see it appear in CLIENT LIST on the server.
Whenever a Redis client library issues a subscribe command, that entire connection is occupied by it, so redis-py is probably creating a separate connection dedicated to pub/sub.
This should explain why you first see no clients connected and then 2. It's not necessarily one connection per command issued, as the connections in the pool are reused.
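
A quick way to see this for yourself, assuming a local Redis server and a recent redis-py (client_list() wraps the CLIENT LIST command):

import redis

r = redis.Redis()              # no TCP connection is made yet
print(len(r.client_list()))    # this first command opens connection #1
                               # (and CLIENT LIST includes it)

p = r.pubsub()
p.subscribe("channel")         # SUBSCRIBE occupies its own connection
print(len(r.client_list()))    # now 2 clients appear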

How to synchronize list changes over network in Python

I am developing a little group management system where there are two different types of servers: "client servers", which can join and leave groups on the "management servers".
There are multiple management servers in a multicast group, so the client servers send their join and leave requests to this multicast group. Because IPv6 multicast is not reliable, some management servers may not receive the requests, so their list of memberships would not be up to date.
So I need a function that I can use to synchronize lists whenever they change. There are three types of changes:
client server leaves group
client server joins group
client server updates its complete list of memberships (so the management server replaces its list)
I thought of keeping a log list on each management server that records recent changes (say, those of the last 60 seconds). If a server notices a change, it informs the other management servers about it and sends the time along with this information. If the receiver has a more recent change it ignores the sender's information; if not, it updates its list.
But is this the best way to do it? Are there established patterns for such things? Or maybe even a Python framework?
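
For concreteness, a minimal sketch of the timestamped last-writer-wins merge described above (all names are hypothetical):

import time

class MembershipStore:
    def __init__(self):
        self.members = {}      # group -> set of client servers
        self.last_change = {}  # (group, client) -> timestamp of last change

    def apply(self, group, client, joined, timestamp):
        # Ignore the update if we already saw a more recent change
        # for this (group, client) pair.
        key = (group, client)
        if timestamp <= self.last_change.get(key, 0.0):
            return False
        self.last_change[key] = timestamp
        members = self.members.setdefault(group, set())
        if joined:
            members.add(client)
        else:
            members.discard(client)
        return True

store = MembershipStore()
store.apply("group-a", "client-1", joined=True, timestamp=time.time())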
Sounds like you want to use gevent. It's useful for exactly the scenario you're talking about where you want events synchronized between multiple nodes. It'll abstract away most of the networking layer for you too, so you can focus on getting work done instead.

Multi-threaded UDP server with Python

I want to create a simple video streaming (actually, image streaming) server that can manage different protocols (TCP Push/Pull, UDP Push/Pull/Multicast).
I managed to get TCP Push/Pull working with the SocketServer.TCPServer class and ThreadingMixIn, processing each connected client in a different thread.
But now that I'm working on the UDP protocol, I've realized that ThreadingMixIn creates a thread per handle() call, i.e. one per client query (as there's no such thing as a "connection" in UDP).
The problem is that I need to process a sequence of queries from the same client, for all clients. How could I manage that?
The only way I can see to handle it is to keep a list of (client address, processing thread) pairs and dispatch each query to the matching thread (or create a new one if the client hasn't sent anything yet). Is there an easier way to do that?
Thanks!
P.S.: I can't use any external or too "high-level" library for this, as it's a school assignment meant to teach how sockets work.
Take a look at Twisted. This will remove the need to do any thread dispatch from your application. You still have to match up packets to a particular session in order to handle them, but this isn't difficult (use a port per client and dispatch based on the port, or require packets in a session to always come from the same address and use the peer address, or use one of the existing protocols that solves this problem such as SIP).
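
For illustration, a minimal Twisted sketch of that per-peer-address dispatch (the session handling here is a placeholder):

from twisted.internet import reactor
from twisted.internet.protocol import DatagramProtocol

class SessionDispatcher(DatagramProtocol):
    def __init__(self):
        self.sessions = {}   # (host, port) -> queries seen so far

    def datagramReceived(self, datagram, addr):
        # All packets from the same peer address belong to one session.
        session = self.sessions.setdefault(addr, [])
        session.append(datagram)
        self.transport.write(b"got %d queries" % len(session), addr)

reactor.listenUDP(9999, SessionDispatcher())
reactor.run()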

How-To - Update Live Running Python Application

I have a Python application, to be more precise a network application, that can't go down. This means I can't kill the PID, since it actively talks with other servers and clients and so on: many € per minute of downtime, you know, the usual 24/7 system.
Anyway, in my hobby projects I also work a lot with WSGI frameworks, and I've noticed that I have the same problem even during off-peak hours.
Anyway, imagine a normal server using TCP/UDP (put your favourite WSGI/SIP/classified-information server here).
Now you perform a git pull on the remote server and the new Python files land on it (these files will of course ONLY affect the data processing and not the actual sockets, so there is no need to re-create the sockets or touch the network part in any way).
I don't usually use file monitors, since I prefer to use a SIGNAL to wake up the internal app updater.
Now imagine the following code:
from mysuper.app import handler

while True:
    data = socket.recv()
    if data:
        socket.send(handler(data))
Let's imagine that handler is an app with DB connections, cache connections, etc.
What is the best way to update the handler?
Is it safe to call reload(handler)?
Will this break DB connections?
Will DB connections survive the restart?
Will current transactions be lost?
Will this create anti-matter?
What are the best-practice patterns you guys usually use, if there are any?
It's safe to call reload(handler).
Depends where you initialize your connections. If you make the connections inside handler(), then yes, they'll be garbage collected when the handler() object falls out of scope. But you wouldn't be connecting inside your main loop, would you? I'd highly recommend something like:
dbconnection = connect(...)
while True:
    ...
    socket.send(handler(data, dbconnection))
if for no other reason than that you won't be making an expensive connection inside a tight loop.
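
For the reload itself, a hedged sketch of the signal-driven update the question mentions (in Python 3 reload lives in importlib; mysuper.app is the module from the question, everything else is assumed):

import importlib
import signal
import socket

import mysuper.app   # the question's module; handler lives inside it

def on_sighup(signum, frame):
    # Swap the new code in place. Open sockets and this loop are
    # untouched; only module-level state in mysuper.app is rebuilt,
    # so keep DB connections outside the reloaded module.
    importlib.reload(mysuper.app)

signal.signal(signal.SIGHUP, on_sighup)

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(("0.0.0.0", 9000))
while True:
    data, addr = sock.recvfrom(4096)
    if data:
        # Look the handler up through the module on every call so the
        # reloaded version is picked up immediately.
        sock.sendto(mysuper.app.handler(data), addr)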
That said, I'd recommend going with an entirely different architecture. Make a listener process that does basically nothing more than listen for UDP datagrams, send them to a message queue like RabbitMQ, then wait for the reply message and send the results back to the client. Then write your actual servers so that they get their requests from the message queue, process them, and send a reply message back.
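
A bare-bones sketch of that frontend listener, assuming RabbitMQ and the pika client (the queue name and port are invented; the reply path back to the client is omitted):

import socket

import pika

# One channel to the message queue...
connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()
channel.queue_declare(queue="requests", durable=True)

# ...and a dumb UDP listener that just forwards datagrams to it.
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(("0.0.0.0", 9999))
while True:
    data, addr = sock.recvfrom(4096)
    channel.basic_publish(exchange="", routing_key="requests", body=data)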
If you want to upgrade the UDP server, launch the new instance listening on another port. Update your firewall rules to redirect incoming traffic to the new port. Reload the rules. Kill the old process. Voila: seamless cutover.
The real win comes from decoupling your backend. Since multiple processes can listen for the same messages from your frontend "proxy" service, you can run several in parallel, on different machines if you want to. To upgrade the backend, start a new instance and then kill the old one, so that at least one instance is running at all times.
To scale your proxy, have multiple instances running on different ports or different hosts, and configure your firewall to randomly redirect incoming datagrams to one of the proxies.
To scale your backend, run more instances.

Message queue proxy in Python + Twisted

I want to implement a lightweight Message Queue proxy. Its job is to receive messages from a web application (PHP) and send them to the Message Queue server asynchronously. The reason for this proxy is that the MQ isn't always available, and is sometimes lagging or even down, but I want to make sure the messages are delivered and that the web application returns immediately.
So, PHP would send the message to the MQ proxy running on the same host. That proxy would save the messages to SQLite for persistence, in case of crashes. At the same time it would send the messages from SQLite to the MQ in batches when the connection is available, and delete them from SQLite.
Now, the way I understand, there are these components in this service:
message listener (listens to the messages from PHP and writes them to a Incoming Queue)
DB flusher (reads messages from the Incoming Queue and saves them to a database, due to SQLite's single-threadedness)
MQ connection handler (keeps the connection to the MQ server alive by reconnecting)
message sender (collects messages from the SQLite db and sends them to the MQ server, then removes them from the db)
I was thinking of using Twisted for #1 (TCPServer), but I'm having problems integrating it with the other components, which aren't event-driven. Intuition tells me that each of these components should run in a separate thread, because they are all IO-bound and independent of each other, though I could just as easily put them in a single thread. Even so, I couldn't find any good, clear (to me) examples of how to implement such a worker thread alongside Twisted's main loop.
The example I started with is chatserver.py, which uses the service.Application and internet.TCPServer objects. If I start my own thread before creating the TCPServer service, it runs a few times, but then it stops and never runs again. I'm not sure why this is happening, but it's probably because I'm not using threads with Twisted correctly.
Any suggestions on how to implement a separate worker thread and keep Twisted? Do you have any alternative architectures in mind?
You're basically considering writing an ad-hoc extension to your messaging server, whose job it is to provide whatever reliability guarantees you've asked of it.
Instead, perhaps you should take the hardware where you were planning to run this new proxy and run another MQ node on it. The new node should take care of persisting and relaying messages that you deliver to it while the other nodes are overloaded or offline.
Maybe it's not the best bang for your buck to use a separate thread in Twisted to get around a blocking call, but sometimes the least evil solution is the best. Here's a link that shows you how to integrate threading into Twisted:
http://twistedmatrix.com/documents/10.1.0/core/howto/threading.html
Sometimes, in a pinch, easy-to-implement is faster than hours or days of research that may all turn out to be for nought.
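
Following that howto, a minimal sketch of pushing the blocking SQLite flush into the reactor's thread pool with deferToThread (the table and all names are invented):

import sqlite3

from twisted.internet import reactor, threads

def flush_to_sqlite(messages):
    # Blocking calls are fine here: deferToThread runs this in the
    # reactor's thread pool, not in the event loop itself.
    conn = sqlite3.connect("proxy.db")
    conn.execute("CREATE TABLE IF NOT EXISTS outbox (body TEXT)")
    conn.executemany("INSERT INTO outbox (body) VALUES (?)",
                     [(m,) for m in messages])
    conn.commit()
    conn.close()

def got_batch(messages):
    d = threads.deferToThread(flush_to_sqlite, messages)
    d.addCallback(lambda _: print("flushed", len(messages), "messages"))

reactor.callWhenRunning(got_batch, ["hello", "world"])
reactor.callLater(1, reactor.stop)
reactor.run()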
A neat solution to this problem would be to use the key-value store Redis. It's a high-speed persistent data store with plenty of clients, including a PHP and a Python one. If you want to use a timed/batch process to handle the messages, it saves you creating a database and also covers your persistence story. It runs fine on Cygwin/Windows as well as POSIX environments.
Both have a very clean and simple API. Redis also offers a publish/subscribe mechanism, should you need it, although it sounds like it would be of limited value if you're publishing to an inconsistent queue.
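
For instance, a hedged sketch of the timed/batch drain with redis-py (the key name is invented; a production version would use RPOPLPUSH so a crash mid-forward can't lose messages):

import redis

r = redis.Redis()

# The PHP side would do the equivalent of: LPUSH mq:outbox <payload>
r.lpush("mq:outbox", "message payload")

def drain_batch(max_items=100):
    # Pop up to max_items messages; the list survives restarts because
    # Redis persists it to disk.
    batch = []
    while len(batch) < max_items:
        item = r.rpop("mq:outbox")
        if item is None:
            break
        batch.append(item)
    return batch

for message in drain_batch():
    pass  # forward each message to the MQ server here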
