ZeroMQ bidirectional async communication with subprocesses

I have a server process that receives requests from web clients.
The server has to call an external worker process (another .py) which streams data to the server, and the server streams it back to the client.
The server has to monitor these worker processes and send messages to them (basically kill them, or send messages to control which kind of data gets streamed). These messages are asynchronous (e.g. they depend on the web client).
I thought of using ZeroMQ sockets over the ipc:// transport class, but the socket.recv() call is blocking.
Should I use two sockets (one for streaming data to the server and another to receive control messages from the server)?

Using a separate socket for signalling and messaging is always better
While a Poller instance will help a bit, the cardinal step is to use one socket for signalling and another for data streaming. Always. The point is that in such a setup both the Poller.poll() call and the event loop can remain socket-specific and spend no more than a predefined amount of time in each step of real-time controlled code execution.
So do not hesitate to set up a somewhat richer signalling/messaging infrastructure; in return you get simpler control, separation of concerns and clarity of intent.
ZeroMQ is an excellent tool for this. It even offers per-socket IO-thread affinity, so fine-grained performance tuning is at your fingertips.

I think I figured out a solution, but I don't know if there is a better (more efficient, safer, ...) way of doing this.
The client makes a request to the server, which spawns N worker processes to handle the request.
This is the relevant excerpt from worker.py:
for i in range(start_counter, 10):
    # Check if there is any message from server
    while True:
        try:
            msg = worker.recv(zmq.DONTWAIT)
            print("Received {} from server".format(msg))
        except zmq.Again:
            break
    # Send data to server (bytes has no .format(), so format a str and encode it)
    worker.send("Message {} from {}".format(i, worker_id).encode())
    # Take some sleep
    time.sleep(random.uniform(0.3, 1.1))
In this way, the worker a) does not need a separate socket and b) does not need a separate thread to process messages from the server.
In the real implementation, each worker must stream 128-byte messages at 100 Hz to the server, and the server must receive lots of these messages (many clients making requests that need 3-10 workers each).
Will this approach suffer a performance hit if implemented this way?

Related

Detecting when a tcp client is not active for more than 5 seconds

I'm trying to set up TCP communication where the server sends a message every x seconds through a socket, and should stop sending those messages once the client has not sent any message for 5 seconds.
To be more detailed, the client also sends constant messages on the same socket, all of which are ignored by the server, and it can stop sending them at any unknown time. For simplicity, these messages are used as alive messages to inform the server that the communication is still relevant.
The problem is that if I want to send repeated messages from the server, I cannot allow it to "get busy" receiving messages instead, so I cannot detect when a new message arrives from the other side and act accordingly.
The problem is independent of the programming language, but to be more specific, I'm using Python, and I cannot access the code of the client.
Is there any way to receive and send messages on a single socket simultaneously?
Thanks!

Option 1
Use two threads, one will write to the socket and the second will read from it.
This works since sockets are full-duplex (allow bi-directional simultaneous access).
Option 2
Use a single thread that manages all keep-alives using select.epoll. This way one thread can handle multiple clients. Remember, though, that if this isn't the only thread that uses the sockets, you might need to handle thread safety on your own.

As discussed in another answer, threads are one common approach. The other approach is to use an event loop and nonblocking I/O. Recent versions of Python (I think starting at 3.4) include a package called asyncio that supports this.
You can call the create_connection method on an event_loop to create an asyncio connection. See this example for a simple server that reads and writes over TCP.
In many cases an event loop can permit higher performance than threads, but it has the disadvantage of requiring most or all of your code to be aware of the event model.
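As a rough illustration of the event-loop approach, here is a minimal sketch using asyncio streams. It sends a message at a fixed interval while a second task records when the peer was last heard from, and stops once the peer has been silent for too long. The names serve_client and demo, the b"tick\n" payload and the shortened timings (0.05 s interval and 0.2 s timeout instead of x and 5 seconds) are assumptions made for the example, not part of the question.

```python
import asyncio
import time


async def serve_client(reader, writer, interval=0.05, idle_timeout=0.2):
    """Send a message every `interval` seconds until the peer is silent for `idle_timeout`."""
    last_seen = time.monotonic()

    async def watch_incoming():
        nonlocal last_seen
        # read() returns b"" once the peer closes the connection
        while await reader.read(1024):
            # The payload is ignored; it only proves the peer is alive
            last_seen = time.monotonic()

    watcher = asyncio.create_task(watch_incoming())
    sent = 0
    try:
        while time.monotonic() - last_seen < idle_timeout and not watcher.done():
            writer.write(b"tick\n")
            await writer.drain()
            sent += 1
            await asyncio.sleep(interval)
    finally:
        watcher.cancel()
        writer.close()
    return sent


async def demo():
    """Hypothetical client: sends three alive messages, then goes silent."""
    results = []

    async def handler(reader, writer):
        results.append(await serve_client(reader, writer))

    server = await asyncio.start_server(handler, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    for _ in range(3):
        writer.write(b"alive\n")
        await writer.drain()
        await asyncio.sleep(0.05)
    data = await reader.read()  # returns when the server closes the connection
    writer.close()
    server.close()
    return results, data
```

Running asyncio.run(demo()) returns the number of messages the server sent and the bytes the client received. Sending and receiving never block each other because each await hands control back to the event loop.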

How do chat servers distribute messages to multiple clients?

This is really a programming design question more than a specific language or library question. I'm tinkering with the idea of a standalone chat server for websockets that will accept several remote browser-based javascript clients. I'm going for something super simple at first, then might build it up. The server just keeps accepting client connections and listens for messages. When a message is received, it will be sent back to all the clients.
What I need to better understand is which approach is best for sending the messages out to all clients, specifically, sending immediately to all clients, or queuing the messages to each client's queue to be sent when a client connection handler's turn comes up. Below are the two examples in a python-like pseudo-code:
Broadcast Method
def client_handler(client):
    while True:
        if client.pending_msg:
            rmsg = client.recv()
            for c in clients:
                c.send(rmsg)
        client.sleep(1)
Queue Method
def client_handler(client):
    while True:
        if client.pending_msg:
            rmsg = client.recv()
            for c in clients:
                c.queue_msg(rmsg)
        if client.has_queued:
            client.send_queue()
        client.sleep(1)
What is the best approach? Or, perhaps they are good for different use-cases, in which case, what are the pros, cons and circumstances for which they should be used. Thanks!

First of all, it seems odd to me that a single client handler would know about all the other existing clients. This should be the first thing you should abstract away and create a central message processing handler instead which the individual clients talk to.
That handler can then either send the message directly to the clients (like in your broadcast example), or add them to queues of the clients (like your queue example). Which would be the preferred version depends a bit on your network protocol.
Since you said that you will be using websockets, you have a persistent network connection to the clients anyway, so you can just send them out immediately. There is no real gain to queue (and buffer) the messages. Ideally, a client would just have a send() method anyway, and the client would then internally decide whether that means appending it to a queue or sending it immediately over the network.
Furthermore, since websockets are kind of asynchronous in their nature, you don’t need busy wait loops anyway. You can just listen for messages from the client directly, process those, and broadcast them using your central handler. And since you then don’t have a wait loop anymore, there also would be no place where you work off your queue anymore, making the immediate broadcast the more natural decision.
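The central handler described above could look roughly like the following sketch. The Hub and Client classes are hypothetical names for illustration, and Client.send() appends to a list as a stand-in for a real websocket write, so the example stays independent of any particular websocket library.

```python
class Hub:
    """Central message handler: the only object that knows every client."""

    def __init__(self):
        self.clients = set()

    def join(self, client):
        self.clients.add(client)

    def leave(self, client):
        self.clients.discard(client)

    def broadcast(self, message):
        # Iterate over a copy so a client may leave while a broadcast is running
        for client in list(self.clients):
            client.send(message)


class Client:
    """One connection; it talks to the hub, never to other clients directly."""

    def __init__(self, name, hub):
        self.name = name
        self.hub = hub
        self.outbox = []  # stand-in for the real network connection
        hub.join(self)

    def send(self, message):
        # A real client would decide here whether to queue or write immediately
        self.outbox.append(message)

    def on_message(self, message):
        # Called by the network layer when this client's peer sends something
        self.hub.broadcast(message)


hub = Hub()
alice = Client("alice", hub)
bob = Client("bob", hub)
alice.on_message("hello")
print(alice.outbox, bob.outbox)
```

Because each client only exposes send(), switching between immediate delivery and per-client queuing later would not require touching the hub.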

simultaneously sending/receiving info from a server, in python?

I'm trying to figure out how to make a server that can accept multiple clients at one time. While doing so, I need the client to be able to send and receive data from the server at the same time.
Would I have to make a threaded server, with one thread listening for data
and another thread sending out information to the client?
Then, on the client side, do I need to use threads to send and receive info?

Use async IO. There are dozens of async IO socket libraries for Python. Here is a brief benchmark.
I also tested gevent, eventlet, asyncore, twisted, pyev, pycurl, tornado.
Twisted
is stable, but the slowest, and also not easy to get started with.
gevent, eventlet (libevent)
easy to get started with and fast (the code looks like blocking code), but they have some issues with forking.
pycurl (libcurl)
fast and easy (if you are OK with doing some flags magic, but there are examples), but HTTP only.
pyev (libev)
you must understand what you are doing; it is almost like polling yourself.
tornado (polling in Python)
fast enough, and I think stable, and also easy to get started with.
asyncore
really fast, but don't use it; it is ugly.
Don't use threads in Python unless you really know what you are doing.
Python and threads are not really big friends (at least before version 3.2; 3.2 is supposed to get a new GIL).

On server-side you clearly need a Socket Server. This server creates a new thread for every incoming client connection.
Once a connection is established and the communication is synchronous, both the client and the server thread created for that connection need an additional thread if they have to do other work in parallel with listening on the socket. If asynchronous communication is what you need, Python provides an excellent Asynchronous Socket Handler.

Use an asynchronous socket. An example server can be found here, and the client code here. No direct hassle with threads. Depending on your needs, you probably don't need the asynchronous client.

You don't need threads for either client or server; you can instead select() to multiplex all the I/O inside a single thread.
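A minimal sketch of that single-threaded approach, using select.select() from the standard library. The helper name serve_once and the upper-casing "reply" are made up for the example, and socket.socketpair() stands in for connections that a real server would obtain from accept().

```python
import select
import socket


def serve_once(connections, timeout=1.0):
    """Handle every connection that is readable right now, then return."""
    readable, _, _ = select.select(connections, [], [], timeout)
    for sock in readable:
        data = sock.recv(4096)
        if data:
            # Reply on the same socket; sending and receiving share one thread
            sock.sendall(data.upper())
        else:
            # An empty read means the peer closed the connection
            connections.remove(sock)
            sock.close()
    return len(readable)


# Two simulated clients, each connected to one server-side socket
a_srv, a_cli = socket.socketpair()
b_srv, b_cli = socket.socketpair()
conns = [a_srv, b_srv]

a_cli.sendall(b"hello")
b_cli.sendall(b"world")
serve_once(conns)
reply_a = a_cli.recv(4096)
reply_b = b_cli.recv(4096)
```

A real server would call serve_once() in a loop and add its listening socket to the list so that new connections are accepted in the same thread.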

Design question on Python network programming

I'm currently writing a project in Python which has a client and a server part. I have troubles with the network communication, so I need to explain some things...
The client mainly performs operations the server tells it to and sends the results back to the server. I need a way to communicate bidirectionally over a TCP socket.
Current Situation
I currently use a LineReceiver of the Twisted framework on the server side, and a plain Python socket (and ssl) on client side (because I was unable to correctly implement a Twisted PushProducer). There is a Queue on the client side which gets filled with data which should be sent to the server; a subprocess continuously pulls data from the queue and sends it to the server (see code below).
This scenario works well as long as only the client pushes its results to the manager. The server has no way to send data to the client. More accurately, there is no way for the client to receive data the server has sent.
The Problem
I need a way to send commands from the server to the client.
I thought about listening for incoming data in the client loop I use to send data from the queue:
def run(self):
    while True:
        data = self.queue.get()
        logger.debug("Sending: %s", repr(data))
        data = cPickle.dumps(data)
        self.socket.write(data + "\r\n")
        # Here would be a good place to listen on the socket
But there are several problems with this solution:
the SSLSocket.read() method is a blocking one
if there is no data in the queue, the client will never receive any data
Yes, I could use Queue.get_nowait() instead of Queue.get(), but all in all it's not a good solution, I think.
The Question
Is there a good way to meet these requirements with Twisted? I don't have enough experience with Twisted to find my way around in it. I don't even know if using the LineReceiver is a good idea for this kind of problem, because it cannot send any data if it does not receive data from the client. There is only a lineReceived event.
Is Twisted (or, more generally, any event-driven framework) able to solve this problem? I don't even have a real event on the communication side. If the server decides to send data, it should be able to send it; as far as possible, there should be no need to wait for any event on the communication side.

"I don't even know if using the LineReceiver is a good idea for this kind of problem, because it cannot send any data, if it does not receive data from the client. There is only a lineReceived event."
You can send data using protocol.transport.write from anywhere, not just in lineReceived.

"I need a way to send commands from the server to the client."
Don't do this. It inverts the usual meaning of "client" and "server". Clients take the active role and send stuff or request stuff from the server.
"Is Twisted (or more general any event driven framework) able to solve this problem?"
It shouldn't. You're inverting the role of client and server.
"If the server decides to send data, it should be able to send it;"
False, actually.
The server is constrained to wait for clients to request data. That's generally the accepted meaning of "client" and "server".
"One to send commands to the client and one to transmit the results to the server. Does this solution sound more like a standard client-server communication for you?"
No.
If a client sent messages to a server and received responses from the server, it would meet more usual definitions.
Sometimes, this sort of thing is described as having "Agents" which are -- each -- a kind of server and a "Controller" which is a single client of all these servers.
The controller dispatches work to the agents. The agents are servers -- they listen on a port, accept work from the controller, and do work. Each Agent must do two concurrent things (usually via the select API):
Monitor a well-known socket on which it will receive work from the one-and-only client.
Do the work (in the background).
This is what Client-Server usually means.
If each Agent is a Server, you'll find lots of libraries will support this. This is the way everyone does it.

Message queue proxy in Python + Twisted

I want to implement a lightweight Message Queue proxy. Its job is to receive messages from a web application (PHP) and send them to the Message Queue server asynchronously. The reason for this proxy is that the MQ isn't always available and is sometimes lagging, or even down, but I want to make sure the messages are delivered and the web application returns immediately.
So, PHP would send the message to the MQ proxy running on the same host. That proxy would save the messages to SQLite for persistence, in case of crashes. At the same time it would send the messages from SQLite to the MQ in batches when the connection is available, and delete them from SQLite.
Now, the way I understand, there are these components in this service:
1. message listener (listens to the messages from PHP and writes them to an Incoming Queue)
2. DB flusher (reads messages from the Incoming Queue and saves them to a database; due to SQLite single-threadedness)
3. MQ connection handler (keeps the connection to the MQ server online by reconnecting)
4. message sender (collects messages from the SQLite db and sends them to the MQ server, then removes them from the db)
I was thinking of using Twisted for #1 (TCPServer), but I'm having problem with integrating it with other points, which aren't event-driven. Intuition tells me that each of these points should be running in a separate thread, because all are IO-bound and independent of each other, but I could easily put them in a single thread. Even though, I couldn't find any good and clear (to me) examples on how to implement this worker thread aside of Twisted's main loop.
The example I've started with is chatserver.py, which uses service.Application and internet.TCPServer objects. If I start my own thread prior to creating the TCPServer service, it runs a few times, but then it stops and never runs again. I'm not sure why this is happening, but it's probably because I don't use threads with Twisted correctly.
Any suggestions on how to implement a separate worker thread and keep Twisted? Do you have any alternative architectures in mind?
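Independent of the threading question, the persistence part of the design (save to SQLite, send in batches, delete only after a successful send) can be sketched with the standard sqlite3 module. The Outbox class, its table layout and the send_batch callable are hypothetical names for illustration; send_batch stands in for whatever call delivers a list of messages to the real MQ server and raises on failure.

```python
import sqlite3


class Outbox:
    """Durable buffer: messages stay in SQLite until the MQ has accepted them."""

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS outbox ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, body TEXT NOT NULL)"
        )
        self.db.commit()

    def enqueue(self, body):
        # The connection context manager commits on success, rolls back on error
        with self.db:
            self.db.execute("INSERT INTO outbox (body) VALUES (?)", (body,))

    def flush(self, send_batch, batch_size=100):
        """Send the oldest messages; delete them only if send_batch succeeds."""
        rows = self.db.execute(
            "SELECT id, body FROM outbox ORDER BY id LIMIT ?", (batch_size,)
        ).fetchall()
        if not rows:
            return 0
        # If this raises (MQ down or lagging), the rows stay for the next attempt
        send_batch([body for _, body in rows])
        with self.db:
            self.db.executemany(
                "DELETE FROM outbox WHERE id = ?", [(row_id,) for row_id, _ in rows]
            )
        return len(rows)

    def pending(self):
        return self.db.execute("SELECT COUNT(*) FROM outbox").fetchone()[0]
```

With this ordering a crash between sending and deleting can deliver a batch twice, so the sketch gives at-least-once delivery; whether duplicates are acceptable depends on the MQ consumers.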

You're basically considering writing an ad-hoc extension to your messaging server, the job of which it is to provide whatever reliability guarantees you've asked of it.
Instead, perhaps you should take the hardware where you were planning to run this new proxy and run another MQ node on it. The new node should take care of persisting and relaying messages that you deliver to it while the other nodes are overloaded or offline.

Maybe it's not the best bang for your buck to use a separate thread in Twisted to get around a blocking call, but sometimes the least evil solution is the best. Here's a link that shows you how to integrate threading into Twisted:
http://twistedmatrix.com/documents/10.1.0/core/howto/threading.html
Sometimes in a pinch easy-to-implement is faster than hours/days of research which may all turn out to be for nought.

A neat solution to this problem would be to use the key-value store Redis. It's a high-speed persistent data store with plenty of clients, including a PHP and a Python client (if you want to use a timed/batch process to process messages). It saves you from creating a database and also takes care of your persistence needs. It runs fine on Cygwin/Windows and POSIX environments.
PHP Redis client is here.
Python client is here.
Both have a very clean and simple API. Redis also offers a publish/subscribe mechanism, should you need it, although it sounds like it would be of limited value if you're publishing to an inconsistent queue.
