zeromq persistence patterns

zeromq persistence patterns - python

Who has to manages the persistent in the ZeroMQ?
When we use the ZeroMQ clients in Python language, what are the plug-ins/modules available to manage the persistent?
I would like to know the patterns to use the ZeroMQ.

As far as I know, Zeromq does not have any persistence. It is out of scope for it and needs to be handled by the end user. Just like serializing the message.
In C#, I have used db4o to add persistence. Typically I persist the object in its raw state, then serialize it and send it to ZMQ socket. Btw, this was for PUB/SUB pair.

On the application ends you can persist accordingly, for example I've built a persistance layer in node.js which communicated to back-end php calls and via websockets.
The persistance aspect held messages for a certain period of time (http://en.wikipedia.org/wiki/Time_to_live) this was to give clients a chance to connect. I used in-memory data structures but I toyed with the idea of using redis to gain on-disk persistance.

We needed to persist the received messages from a subscriber before processing them. The messages are received in a separate thread and stored on disk, while the persisted message queue is manipulated in the main thread.
The module is available at: https://pypi.org/project/persizmq. From the documentation:
import pathlib
import zmq
import persizmq
context = zmq.Context()
subscriber = context.socket(zmq.SUB)
subscriber.setsockopt_string(zmq.SUBSCRIBE, "")
subscriber.connect("ipc:///some-queue.zeromq")
persistent_dir = pathlib.Path("/some/dir")
storage = persizmq.PersistentStorage(persistent_dir=persistent_dir)
def on_exception(exception: Exception)->None:
print("an exception in the listening thread: {}".format(exception))
with persizmq.ThreadedSubscriber(
callback=storage.add_message, subscriber=subscriber,
on_exception=on_exception):
msg = storage.front() # non-blocking
if msg is not None:
print("Received a persistent message: {}".format(msg))
storage.pop_front()

Related

How to use inproc transport with pyzmq?

I have set up two small scripts imitating a publish and subscribe procedure with pyzmq. However, I am unable to send messages over to my subscriber client using the inproc transport. I am able to use tcp://127.0.0.1:8080 fine, just not inproc.
pub_server.py
import zmq
import random
import sys
import time
context = zmq.Context()
socket = context.socket(zmq.PUB)
socket.bind("inproc://stream")
while True:
socket.send_string("Hello")
time.sleep(1)
sub_client.py
import sys
import zmq
# Socket to talk to server
context = zmq.Context()
socket = context.socket(zmq.SUB)
socket.setsockopt_string(zmq.SUBSCRIBE, '')
socket.connect("inproc://stream")
for x in range (5):
string = socket.recv()
print(string)
How can I successfully alter my code so that I'm able to use the inproc transport method between my two scripts?
EDIT:
I have updated my code to further reflect #larsks comment. I am still not receiving my published string - what is it that I am doing wrong?
import threading
import zmq
def pub():
context = zmq.Context()
sender = context.socket(zmq.PUB)
sender.connect("inproc://hello")
lock = threading.RLock()
with lock:
sender.send(b"")
def sub():
context = zmq.Context()
receiver = context.socket(zmq.SUB)
receiver.bind("inproc://hello")
pub()
# Wait for signal
string = receiver.recv()
print(string)
print("Test successful!")
receiver.close()
if __name__ == "__main__":
sub()

As the name implies, inproc sockets can only be used within the same process. If you were to rewrite your client and server such that there were two threads in the same process you could use inproc, but otherwise this socket type simply isn't suitable for what you're doing.
The documentation is very clear on this point:
The in-process transport passes messages via memory directly between threads sharing a single ØMQ context.
Update
Taking a look at the updated code, the problem that stands out first is that while the documentation quoted above says "...between threads sharing a single ØMQ context", you are creating two contexts in your code. Typically, you will only call zmq.Context() once in your program.
Next, you are never subscribing your subscriber to any messages, so even in the event that everything else was working correctly you would not actually receive any messages.
Lastly, your code is going to experience the slow joiner problem:
There is one more important thing to know about PUB-SUB sockets: you do not know precisely when a subscriber starts to get messages. Even if you start a subscriber, wait a while, and then start the publisher, the subscriber will always miss the first messages that the publisher sends. This is because as the subscriber connects to the publisher (something that takes a small but non-zero time), the publisher may already be sending messages out.
The pub/sub model isn't meant for single messages, nor is it meant to be a reliable transport.
So, to sum up:
You need to create a shared ZMQ context before you creating your sockets.
You probably want your publisher to publish in a loop instead of publishing a single message. Since you're trying to use inproc sockets you're going to need to put your two functions into separate threads.
You need to set a subscription filter in order to receive messages.
There is an example using PAIR sockets in the ZMQ documentation that might provide a useful starting point. PAIR sockets are designed for coordinating threads over inproc sockets, and unlike pub/sub sockets they are bidirectional and are not impacted by the "slow joiner" issue.

As mention earlier by #larsks, the context object should be the same. Declare the context object globally and use it in both pub and sub functions instead of creating new ones for each.

ZeroMQ bidirectional async communication with subprocesses

I have a server process which receives requests from a web clients.
The server has to call an external worker process ( another .py ) which streams data to the server and the server streams back to the client.
The server has to monitor these worker processes and send messages to them ( basically kill them or send messages to control which kind of data gets streamed ). These messages are asynchronous ( e.g. depend on the web client )
I thought in using ZeroMQ sockets over an ipc://-transport-class , but the call for socket.recv() method is blocking.
Should I use two sockets ( one for streaming data to the server and another to receive control messages from server )?

Using a separate socket for signalling and messaging is always better
While a Poller-instance will help a bit, the cardinal step is to use separate socket for signalling and another one for data-streaming. Always. The point is, that in such setup, both the Poller.poll() and the event-loop can remain socket-specific and spent not more than a predefined amount of time, during a real-time controlled code-execution.
So, do not hesitate to setup a bit richer signalling/messaging infrastructure as an environment where you will only enjoy the increased simplicity of control, separation of concerns and clarity of intents.
ZeroMQ is an excellent tool for doing this - including per-socket IO-thread affinity, so indeed a fine-grain performance tuning is available at your fingertips.

I think if figured out a solution, but I don't know if there is a better (more efficient, safer, ...) way of doing this.
The client makes a request to the server, which spawns N processes worker to attend the request.
This is the relevant excerpt from worker.py:
for i in range(start_counter,10):
# Check if there is any message from server
while True:
try:
msg = worker.recv(zmq.DONTWAIT)
print("Received {} from server".format(msg))
except zmq.Again:
break
# Send data to server
worker.send(b"Message {} from {}".format(i, worker_id))
# Take some sleep
time.sleep(random.uniform(0.3, 1.1))
In this way, the worker a) does not need a separate socket and b) does not need a separate thread to process messages from server.
In the real implementation, worker must stream 128 byte messages at 100Hz to the server, and the server must receive lots of this messages (many clients asking requests that need 3-10 worker each).
Will this approach suffer a performance hit if implemented this way?

How do chat servers distribute messages to multiple clients?

This is really a programming design question more than a specific language or library question. I'm tinkering with the idea of a standalone chat server for websockets that will accept several remote browser-based javascript clients. I'm going for something super simple at first, then might build it up. The server just keeps accepting client connections and listens for messages. When a message is received, it will be sent back to all the clients.
What I need to better understand is which approach is best for sending the messages out to all clients, specifically, sending immediately to all clients, or queuing the messages to each client's queue to be sent when a client connection handler's turn comes up. Below are the two examples in a python-like pseudo-code:
Broadcast Method
def client_handler(client):
while true:
if(client.pending_msg):
rmsg = client.recv()
for c in clients:
c.send(rmsg)
client.sleep(1)
Queue Method
def client_handler(client):
while true:
if client.pending_msg:
rmsg = client.recv()
for c in clients:
c.queue_msg(rmsg)
if client.has_queued:
client.send_queue
client.sleep(1)
What is the best approach? Or, perhaps they are good for different use-cases, in which case, what are the pros, cons and circumstances for which they should be used. Thanks!

First of all, it seems odd to me that a single client handler would know about all the other existing clients. This should be the first thing you should abstract away and create a central message processing handler instead which the individual clients talk to.
That handler can then either send the message directly to the clients (like in your broadcast example), or add them to queues of the clients (like your queue example). Which would be the preferred version depends a bit on your network protocol.
Since you said that you will be using websockets, you have a persistent network connection to the clients anyway, so you can just send them out immediately. There is no real gain to queue (and buffer) the messages. Ideally, a client would just have a send() method anyway, and the client would then internally decide whether that means appending it to a queue or sending it immediately over the network.
Furthermore, since websockets are kind of asynchronous in their nature, you don’t need busy wait loops anyway. You can just listen for messages from the client directly, process those, and broadcast them using your central handler. And since you then don’t have a wait loop anymore, there also would be no place where you work off your queue anymore, making the immediate broadcast the more natural decision.

How to receive and answer to socket asynchronously with Python?

I'm working on a really basic "image streaming" server as a school subject, and I've done most of the work but I'm still stuck on the separation between data and control related sockets:
My structure is : TCPServer (my server, used as control socket) contains a dataSocket (only used to send images and initialized within my TCPServer object, when I receive a certain query)
When I'm sending data (images) through my dataSocket, I still need to see if the client sent a PAUSE or STOP request, but if I use python's self.request.recv(1024) the server awaits a response instead of continuing to send data (which is quite logical).
What should I do to prevent this behavior ? Should I launch my recv(1024) on a separate thread and run it at each loop (and check if I get any relevant data in between two iterations) ?

Twisted should do the trick! It handles asynchronous sockets in Python

Design question on Python network programming

I'm currently writing a project in Python which has a client and a server part. I have troubles with the network communication, so I need to explain some things...
The client mainly does operations the server tells him to and sends the results of the operations back to the server. I need a way to communicate bidirectional on a TCP socket.
Current Situation
I currently use a LineReceiver of the Twisted framework on the server side, and a plain Python socket (and ssl) on client side (because I was unable to correctly implement a Twisted PushProducer). There is a Queue on the client side which gets filled with data which should be sent to the server; a subprocess continuously pulls data from the queue and sends it to the server (see code below).
This scenario works well, if only the client pushes its results to the manager. There is no possibility the server can send data to the client. More accurate, there is no way for the client to receive data the server has sent.
The Problem
I need a way to send commands from the server to the client.
I thought about listening for incoming data in the client loop I use to send data from the queue:
def run(self):
while True:
data = self.queue.get()
logger.debug("Sending: %s", repr(data))
data = cPickle.dumps(data)
self.socket.write(data + "\r\n")
# Here would be a good place to listen on the socket
But there are several problems with this solution:
the SSLSocket.read() method is a blocking one
if there is no data in the queue, the client will never receive any data
Yes, I could use Queue.get_nowait() instead of Queue.get(), but all in all it's not a good solution, I think.
The Question
Is there a good way to achieve this requirements with Twisted? I really do not have that much skills on Twisted to find my way round in there. I don't even know if using the LineReceiver is a good idea for this kind of problem, because it cannot send any data, if it does not receive data from the client. There is only a lineReceived event.
Is Twisted (or more general any event driven framework) able to solve this problem? I don't even have real event on the communication side. If the server decides to send data, it should be able to send it; there should not be a need to wait for any event on the communication side, as possible.

"I don't even know if using the LineReceiver is a good idea for this kind of problem, because it cannot send any data, if it does not receive data from the client. There is only a lineReceived event."
You can send data using protocol.transport.write from anywhere, not just in lineReceived.

"I need a way to send commands from the server to the client."
Don't do this. It inverts the usual meaning of "client" and "server". Clients take the active role and send stuff or request stuff from the server.
Is Twisted (or more general any event driven framework) able to solve this problem?
It shouldn't. You're inverting the role of client and server.
If the server decides to send data, it should be able to send it;
False, actually.
The server is constrained to wait for clients to request data. That's generally the accepted meaning of "client" and "server".
"One to send commands to the client and one to transmit the results to the server. Does this solution sound more like a standard client-server communication for you?"
No.
If a client sent messages to a server and received responses from the server, it would meet more usual definitions.
Sometimes, this sort of thing is described as having "Agents" which are -- each -- a kind of server and a "Controller" which is a single client of all these servers.
The controller dispatches work to the agents. The agents are servers -- they listen on a port, accept work from the controller, and do work. Each Agent must do two concurrent things (usually via the select API):
Monitor a well-known socket on which it will receive work from the one-and-only client.
Do the work (in the background).
This is what Client-Server usually means.
If each Agent is a Server, you'll find lots of libraries will support this. This is the way everyone does it.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.