How to efficiently separate socket pollers using asyncio and zmq?

How to efficiently separate socket pollers using asyncio and zmq? - python

Let's say I want to implement an echo server and client using ZeroMQ (pyzmq) and asyncio (for its event loop, coroutines, etc.).
Now I want to add more reliability by adding a heartbeat. As I don't want to interact too much with my wonderful echo protocol, this heartbeat is done by both client and server on a dedicated pair of sockets.
From what I understand, the way to go™ is to create a new zmq socket in the server class, register it to the existing Poller and let the server class handle everything, from timeout calculation to sending beats. That works, of course.
But this is more complicated than it should be (that's a personal view). From the server point of view, 'heartbeats' are implementation details. What heartbeats are there for is to answer a simple question: "is the client still there?". More technically, I would like to setup and Heartbeat object that takes a timeout and an address. That Heartbeat object would do all the socket setup, beat-related socket polling, send actual beats and receive them.
From the server point of view, I would just use client.is_alive() when required. But that would require two socket pollers to work in parallel. I can achieve that with an executor, but that does not seem right. How would you do that?

Related

How do chat servers distribute messages to multiple clients?

This is really a programming design question more than a specific language or library question. I'm tinkering with the idea of a standalone chat server for websockets that will accept several remote browser-based javascript clients. I'm going for something super simple at first, then might build it up. The server just keeps accepting client connections and listens for messages. When a message is received, it will be sent back to all the clients.
What I need to better understand is which approach is best for sending the messages out to all clients, specifically, sending immediately to all clients, or queuing the messages to each client's queue to be sent when a client connection handler's turn comes up. Below are the two examples in a python-like pseudo-code:
Broadcast Method
def client_handler(client):
while true:
if(client.pending_msg):
rmsg = client.recv()
for c in clients:
c.send(rmsg)
client.sleep(1)
Queue Method
def client_handler(client):
while true:
if client.pending_msg:
rmsg = client.recv()
for c in clients:
c.queue_msg(rmsg)
if client.has_queued:
client.send_queue
client.sleep(1)
What is the best approach? Or, perhaps they are good for different use-cases, in which case, what are the pros, cons and circumstances for which they should be used. Thanks!

First of all, it seems odd to me that a single client handler would know about all the other existing clients. This should be the first thing you should abstract away and create a central message processing handler instead which the individual clients talk to.
That handler can then either send the message directly to the clients (like in your broadcast example), or add them to queues of the clients (like your queue example). Which would be the preferred version depends a bit on your network protocol.
Since you said that you will be using websockets, you have a persistent network connection to the clients anyway, so you can just send them out immediately. There is no real gain to queue (and buffer) the messages. Ideally, a client would just have a send() method anyway, and the client would then internally decide whether that means appending it to a queue or sending it immediately over the network.
Furthermore, since websockets are kind of asynchronous in their nature, you don’t need busy wait loops anyway. You can just listen for messages from the client directly, process those, and broadcast them using your central handler. And since you then don’t have a wait loop anymore, there also would be no place where you work off your queue anymore, making the immediate broadcast the more natural decision.

Can pyzmq pub/sub sockets be used bidirectionally?

I'm using a pyzmq pub/sub socket for a server to advertise notifications to client subscribers. It works nicely but I have a question:
Is there any way to use the same socket to send information back to the server? Or do I need a separate socket for that?
Use case: I just want to allow the server to see who's actively subscribing to notifications, so I was hoping I could allow clients to send back periodic "heartbeat" messages. I have a use case where if no clients are listening, I want the server to spawn one. (This is a multiprocess system that uses localhost only.)

You need a separate socket. From the ZMQ guide (http://zguide.zeromq.org/page:all#Pros-and-Cons-of-Pub-Sub):
Killing back-chatter is essential to real scalability. With pub-sub, it's how the pattern can map cleanly to the PGM multicast protocol, which is handled by the network switch. In other words, subscribers don't connect to the publisher at all, they connect to a multicast group on the switch, to which the publisher sends its messages.
In order for this to work, the PUB socket will not send back data to the subscribers (at least not in a way visible to the user. The heartbeating problem was discussed in-depth in the guide: http://zguide.zeromq.org/page:all#The-Asynchronous-Client-Server-Pattern
Also, check out the 7/MDP and 18/MDP protocols (http://rfc.zeromq.org/spec:7 -- this is also discussed in the guide) if you want to keep track of clients.

Which form of connection to use with pika

I've been trying to figure out which form of connection i should use when using pika, I've got two alternatives as far as I understand.
Either the BlockingConnection or the SelectConnection, however I'm not really sure about the differences between these two (i.e. what is the BlockingConnection blocking? and more)
The documentation for pika says that SelectConnection is the preferred way to connect to rabbit since it provides "multiple event notification methods including select, epoll, kqueue and poll."
So I'm wondering what are the implications of these two different kinds of connections?
PS: I know I shouldn't put a tag in the title but in this case I think it does help to clarify the question.

The SelectConnection is useful if your application architecture can benefit from an asynchronous design, e.g. doing something else while the RabbitMQ IO completes (e.g. switch to some other IO etc) . This type of connection uses callbacks to indicate when functions return. For example you can declare callbacks for
on_connected, on_channel_open, on_exchange_declared, on_queue_declared etc.
...to perform operations when these events are triggered.
The benefit is especially good if your RabbitMQ server (or connection to that server) is slow or overloaded.
BlockingConnection on the hand is just that - it blocks until the called function returns. so it will block the execution thread until connected or channel_open or exchange_declared or queue_declared return for example. That said, its often simpler to program this sort of serialized logic than the async SelectConnection logic. For simple apps with responsive RabbitMQ servers these also work OK IMO.
I suppose you've read the Pika documentation already http://pika.readthedocs.io/en/stable/intro.html, if not, then this is absolutely vital information before you use Pika!
Cheers!

The Pika documentation is quite clear about the differences between the connection types. The main difference is that the pika.adapters.blocking_connection.BlockingConnection() adapter is used for non-asynchronous programming and that the pika.adapters.select_connection.SelectConnection() adapter is used for asynchronous programming.
If you don't know what the difference is between non-asynchronous/synchronous and asynchronous programming I suggest that you read this question or for the more deeper technical explanation this article.
Now let's dive into the different Pika adapters and see what they do, for the example purpose I imagine that we use Pika for setting up a client connection with RabbitMQ as AMQP message broker.
BlockingConnection()
In the following example, a connection is made to RabbitMQ listening to port 5672 on localhost using the username guest and password guest and virtual host '/'. Once connected, a channel is opened and a message is published to the test_exchange exchange using the test_routing_key routing key. The BasicProperties value passed in sets the message to delivery mode 1 (non-persisted) with a content-type of text/plain. Once the message is published, the connection is closed:
import pika
parameters = pika.URLParameters('amqp://guest:guest#localhost:5672/%2F')
connection = pika.BlockingConnection(parameters)
channel = connection.channel()
channel.basic_publish('test_exchange',
'test_routing_key',
'message body value',
pika.BasicProperties(content_type='text/plain',
delivery_mode=1))
connection.close()
SelectConnection()
In contrast, using this connection adapter is more complicated and less pythonic, but when used with other asynchronous services it can have tremendous performance improvements. In the following code example, all of the same parameters and values are used as were used in the previous example:
import pika
# Step #3
def on_open(connection):
connection.channel(on_open_callback=on_channel_open)
# Step #4
def on_channel_open(channel):
channel.basic_publish('test_exchange',
'test_routing_key',
'message body value',
pika.BasicProperties(content_type='text/plain',
delivery_mode=1))
connection.close()
# Step #1: Connect to RabbitMQ
parameters = pika.URLParameters('amqp://guest:guest#localhost:5672/%2F')
connection = pika.SelectConnection(parameters=parameters,
on_open_callback=on_open)
try:
# Step #2 - Block on the IOLoop
connection.ioloop.start()
# Catch a Keyboard Interrupt to make sure that the connection is closed cleanly
except KeyboardInterrupt:
# Gracefully close the connection
connection.close()
# Start the IOLoop again so Pika can communicate, it will stop on its own when the connection is closed
connection.ioloop.start()
Conclusion
For those doing simple, non-asynchronous/synchronous programming, the BlockingConnection() adapter proves to be the easiest way to get up and running with Pika to publish messages. But if you are looking for a way to implement asynchronous message handling, the SelectConnection() handler is your better choice.
Happy coding!

simultaneously sending/receiving info from a server, in python?

I'm trying to figure out how to make a server that can accept multiple clients at one time. While doing so, I need the client able to send and receive data from the server at the same time.
Would i have to make a threaded server? And have a thread for listening for data.
And then another thread for sending out information to the client?
Then for the client side, do i need use threads to send/get info?

Use async IO. There are dozen of async IO socket libs for python. Here is a brief benchmark.
I also tested gevent, eventlet, asyncore, twisted, pyev, pycurl, tornado.
Twsited
is stable but most slow and also not easy to start with.
gevent, eventlet (libevent)
easy to start and fast (code looks like blocking) but have some issues with forking.
pycurl (libcurl)
fast and easy (if you ok to do flags magic.. but there are example) but only http.
pyev (libev)
you must understand what you are doing almost like polling yourself.
tornado (polling in python)
fast enough and i think stable and also easy to start.
asyncore
really fast.. but don't use it.. it is ugly-ugly.
Don't use threads in python unless you are really know what you are doing.
Python and threads not really big friends (unless version <3.2 in 3.2 there must be a new gil).

On server-side you clearly need a Socket Server. This server creates a new thread for every incoming client connection.
Once a connection is established, both the client and the thread that was instantiated for the communication require an additional thread if they have to do other business in parallel than listening to the socket if the communication is synchronous. In case an asynchronous communication is what you need, then Python provides an excellent Asynchronous Socket Handler.

Use a asynchronous socket. Example server could be found here and the client code here. No direct hassle with threads. Depending on your needs, you probably don't need the asynchronous client.

You don't need threads for either client or server; you can instead select() to multiplex all the I/O inside a single thread.

Design question on Python network programming

I'm currently writing a project in Python which has a client and a server part. I have troubles with the network communication, so I need to explain some things...
The client mainly does operations the server tells him to and sends the results of the operations back to the server. I need a way to communicate bidirectional on a TCP socket.
Current Situation
I currently use a LineReceiver of the Twisted framework on the server side, and a plain Python socket (and ssl) on client side (because I was unable to correctly implement a Twisted PushProducer). There is a Queue on the client side which gets filled with data which should be sent to the server; a subprocess continuously pulls data from the queue and sends it to the server (see code below).
This scenario works well, if only the client pushes its results to the manager. There is no possibility the server can send data to the client. More accurate, there is no way for the client to receive data the server has sent.
The Problem
I need a way to send commands from the server to the client.
I thought about listening for incoming data in the client loop I use to send data from the queue:
def run(self):
while True:
data = self.queue.get()
logger.debug("Sending: %s", repr(data))
data = cPickle.dumps(data)
self.socket.write(data + "\r\n")
# Here would be a good place to listen on the socket
But there are several problems with this solution:
the SSLSocket.read() method is a blocking one
if there is no data in the queue, the client will never receive any data
Yes, I could use Queue.get_nowait() instead of Queue.get(), but all in all it's not a good solution, I think.
The Question
Is there a good way to achieve this requirements with Twisted? I really do not have that much skills on Twisted to find my way round in there. I don't even know if using the LineReceiver is a good idea for this kind of problem, because it cannot send any data, if it does not receive data from the client. There is only a lineReceived event.
Is Twisted (or more general any event driven framework) able to solve this problem? I don't even have real event on the communication side. If the server decides to send data, it should be able to send it; there should not be a need to wait for any event on the communication side, as possible.

"I don't even know if using the LineReceiver is a good idea for this kind of problem, because it cannot send any data, if it does not receive data from the client. There is only a lineReceived event."
You can send data using protocol.transport.write from anywhere, not just in lineReceived.

"I need a way to send commands from the server to the client."
Don't do this. It inverts the usual meaning of "client" and "server". Clients take the active role and send stuff or request stuff from the server.
Is Twisted (or more general any event driven framework) able to solve this problem?
It shouldn't. You're inverting the role of client and server.
If the server decides to send data, it should be able to send it;
False, actually.
The server is constrained to wait for clients to request data. That's generally the accepted meaning of "client" and "server".
"One to send commands to the client and one to transmit the results to the server. Does this solution sound more like a standard client-server communication for you?"
No.
If a client sent messages to a server and received responses from the server, it would meet more usual definitions.
Sometimes, this sort of thing is described as having "Agents" which are -- each -- a kind of server and a "Controller" which is a single client of all these servers.
The controller dispatches work to the agents. The agents are servers -- they listen on a port, accept work from the controller, and do work. Each Agent must do two concurrent things (usually via the select API):
Monitor a well-known socket on which it will receive work from the one-and-only client.
Do the work (in the background).
This is what Client-Server usually means.
If each Agent is a Server, you'll find lots of libraries will support this. This is the way everyone does it.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.