I am trying to adopt the ZeroMQ asynchronous client-server pattern described here with python multiprocessing. A brief description in the ZeroMQ guide
It's a DEALER/ROUTER for the client to server frontend communication and DEALER/DEALER for the server backend to the server workers communication. The server frontend and backend are connected using a zmq.proxy()-instance.
Instead of using threads, I want to use multiprocessing on the server. But requests from the client do not reach the server workers. However, they do reach the server frontend. And also the backend. But the backend is not able to connect to the server workers.
How do we generally debug these issues in pyzmq?How to turn on verbose logging for the sockets?
The python code snippets I am using -
server.py
import zmq
import time
from multiprocessing import Process
def run(context, worker_id):
socket = context.socket(zmq.DEALER)
socket.connect("ipc://backend.ipc")
print(f"Worker {worker_id} started")
try:
while True:
ident, msg = socket.recv_multipart()
print("Worker received %s from %s" % (msg, "ident"))
time.sleep(5)
socket.send_multipart([ident, msg])
print("Worker sent %s from %s" % (msg, ident))
except:
socket.close()
if __name__ == "__main__":
context = zmq.Context()
frontend = context.socket(zmq.ROUTER)
frontend.bind("tcp://*:5570")
backend = context.socket(zmq.DEALER)
backend.bind("ipc://backend.ipc")
N_WORKERS = 7
jobs = []
try:
for worker_id in range(N_WORKERS):
job = Process(target=run, args=(context, worker_id,))
jobs.append(job)
job.start()
zmq.proxy(frontend, backend)
for job in jobs:
job.join()
except:
frontend.close()
backend.close()
context.term()
client.py
import re
import zmq
from uuid import uuid4
if __name__ == "__main__":
context = zmq.Context()
socket = context.socket(zmq.DEALER)
identity = uuid4()
socket.identity = identity.encode("ascii")
socket.connect("tcp://localhost:5570")
poll = zmq.Poller()
poll.register(socket, zmq.POLLIN)
request = {
"body": "Some request body.",
}
socket.send_string(json.dumps(request))
while True:
for i in range(5):
sockets = dict(poll.poll(10))
if socket in sockets:
msg = socket.recv()
print(msg)
Q : "How to turn on verbose logging for the sockets?"
Start using the published native API socket_monitor() for all relevant details, reported as events arriving from socket-(instance)-under-monitoring.
Q : "How do we generally debug these issues in pyzmq?"
There is no general strategy on doing this. Having gone into a domain of a distributed-computing, you will almost always create your own, project-specific, tools for "collecting" & "viewing/interpreting" a time-ordered flow of (principally) distributed-events.
Last but not least : avoid trying to share a Context()-instance, the less "among" 8 processes
The Art of Zen of Zero strongly advocates to avoid any shape and form of sharing. Here, the one and the very same Context()-instance is referenced ("shared") via a multiprocessing.Process's process-instantiation call-signature interface, which does not make the inter-process-"sharing" work.
One may let each spawned process-instance create it's own Context()-instance and use it from inside its private space during its own life-cycle.
Btw, your code ignores any return-codes, documented in the native API, that help you handle ( in worse cases debug post-mortem ) what goes alongside the distributed-computing. The try: ... except: ... finally: scaffolding also helps a lot here.
Anyway, the sooner you will learn to stop using the blocking-forms of the { .send() | .recv() | .poll() }-methods, the better your code starts to re-use the actual powers of the ZeroMQ.
Related
I'm working on a relatively simple Python / ZeroMQ based work distribution system, using REQ/ROUTER sockets. The system is distributed and worker nodes are geographically distributed on different continents.
The ROUTER, responsible for distributing work, .bind()-s a ROUTER socket. Workers .connect() to it over TCP using a REQ socket.
In the process of setting up a new worker node, I've noticed that while smaller messages (up to 1kB) do the trip with no issues, replies of ~2kB and up, sent by the ROUTER-end are never received by the worker into their REQ-socket - when I call recv(), the socket just hangs.
The worker code runs inside Docker containers, and I was able to work around the issue when running the same image with --net=host - it seems to not happen if Docker is using the host network.
I'm wondering if this is something in the network stack configuration on the host machine or in Docker, or maybe something that can be prevented in my code?
Here is a simplified version of my code that reproduces this issue:
Worker
import sys
import zmq
import logging
import time
READY = 'R'
def worker(connect_to):
ctx = zmq.Context()
socket = ctx.socket(zmq.REQ)
socket.connect(connect_to)
log = logging.getLogger(__name__)
while True:
socket.send_string(READY)
log.debug("Send READY message, waiting for reply")
message = socket.recv()
log.debug("Got reply of %d bytes", len(message))
time.sleep(5)
if __name__ == '__main__':
logging.basicConfig(level=logging.DEBUG)
worker(sys.argv[1])
Router
import sys
import zmq
import logging
REPLY_SIZE = 1024 * 8
def router(bind_to):
ctx = zmq.Context()
socket = ctx.socket(zmq.ROUTER)
socket.bind(bind_to)
poller = zmq.Poller()
poller.register(socket, zmq.POLLIN)
log = logging.getLogger(__name__)
while True:
socks = dict(poller.poll(5000))
if socks.get(socket) == zmq.POLLIN:
message = socket.recv_multipart()
log.debug("Received message of %d parts", len(message))
identity, _ = message[:2]
res = handle_message(message[2:])
log.debug("Sending %d bytes back in response on socket", len(res))
socket.send_multipart([identity, '', res])
def handle_message(parts):
log = logging.getLogger(__name__)
log.debug("Got message: %s", parts)
return 'A' * REPLY_SIZE
if __name__ == '__main__':
logging.basicConfig(level=logging.DEBUG)
router(sys.argv[1])
FWIW I was able to reproduce this on Ubuntu 16.04 (both router and worker) with Docker 17.09.0-ce, libzmq 4.1.5 and PyZMQ 15.4.0.
No, sir, the socket does not hang at all:
Why?
The issue is, that you have instructed the Socket()-instance to enter into an infinitely blocking state, once having called .recv() method, without specifying a zmq.NOBLOCK flag ( the ZMQ_DONTWAIT flag in the ZeroMQ original API ).
This is the cause, that upon other circumstances reported yesterday, moves the code into infinite blocking, as there seem to be other issues that prevent Docker-container to properly deliver any first message to the hands of the Worker's Docker-embedded-ZeroMQ-Context() I/O-engine and to the hands of the REQ-access-point. As the REQ-archetype uses a strict two-step Finite-State-Automaton - strictly striding ( .send()->.recv()->.send()-> ... ad infimum )
This cause->effect reversing is wrong and misleading -
the issue of "socket just hangs"
is un-decideable
from an issue Docker does not deliver a single message ( to allow .recv() to return )
Next steps:
may use .poll() in REQ-side to sniff without blocking for any already arrived message in the Worker.
Once there are none such, focus on Docker first + next may benefit from ZeroMQ Context()-I/O-engine performance and link-level tweaking configuration options.
I am trying to get a python program to communicate with another python program via zeromq by using the request-reply pattern. The client program should send a request to the server program which replies.
I have two servers such that when one server fails the other takes over. Communication works perfect when the first server works, however, when the first server fails and when I make a request to the second server, I see the error:
zmp.error.ZMQError: Operation cannot be accomplished in current state
Code of the server 1:
# Run the server
while True:
# Define the socket using the "Context"
sock = context.socket(zmq.REP)
sock.bind("tcp://127.0.0.1:5677")
data = sock.recv().decode("utf-8")
res = "Recvd"
sock.send(res.encode('utf-8'))
Code of the server 2:
# Run the server
while True:
# Define the socket using the "Context"
sock = context.socket(zmq.REP)
sock.bind("tcp://127.0.0.1:5877")
data = sock.recv().decode("utf-8")
res = "Recvd"
sock.send(res.encode('utf-8'))
Code of client:
# ZeroMQ Context For distributed Message amogst processes
context = zmq.Context()
sock_1 = context.socket(zmq.REQ)
sock_2 = context.socket(zmq.REQ)
sock_1.connect("tcp://127.0.0.1:5677")
sock_2.connect("tcp://127.0.0.1:5877")
try:
sock_1.send(data.encode('utf-8'), zmq.NOBLOCK)
socks_1.setsockopt(zmq.RCVTIMEO, 1000)
socks_1.setsockopt(zmq.LINGER, 0)
data = socks_1.recv().decode('utf-8') #receive data from the main node
except:
try:
#when server one fails
sock_2.send(data.encode('utf-8'), zmq.NOBLOCK)
socks_2.setsockopt(zmq.RCVTIMEO, 1000)
socks_2.setsockopt(zmq.LINGER, 0)
data = socks_2.recv().decode('utf-8')
except Exception as e:
print(str(e))
What is the problem with this approach?
How can I resolve this?
Q: How can I resolve this?A: Avoid the known risk of REQ/REP deadlocking!
While the ZeroMQ is a powerful framework, understanding its internal composition is necessary for robust and reliable distributed systems design and prototyping.
After a closer look, using a common REQ/REP Formal Communication Pattern may leave ( and does leave ) counter-parties in a mutual dead-lock: where one is expecting the other to do a step, which will be never accomplished, and there is no way to escape from the deadlocked state.
For more illustrated details and FSA-schematic diagram, see this post
Next, a fail-over system has to survive any collisions of its own components. Thus, one has to design well the distributed system state-signalling and avoid as many dependencies on element-FSA-design/stepping/blocking as possible, otherwise, the fail-safe behaviour remains just an illusion.
Always handle resources with care, do not consider components of the ZeroMQ smart-signalling/messaging as any kind of "expendable disposables", doing so might be tolerated in scholar examples, not in production system environments. You still have to pay the costs ( time, resources allocations / de-allocations / garbage-collection(s) ). As noted in comments, never let resources creation/allocation without a due control. while True: .socket(); .bind(); .send(); is brutally wrong in principle and deteriorating the rest of the design.
On server side, "receive" and "send" pair is critical. I was facing a simiar issue, while socket.send was missed.
def zmq_listen():
global counter
message = socket_.recv().decode("utf-8")
logger.info(f"[{counter}] Message: {message}")
request = json.loads(message)
request["msg_id"] = f"m{counter}"
ack = {"msg_id": request["msg_id"]}
socket_.send(json.dumps(ack).encode("utf-8"))
return request
Implement the lazy pirate pattern. Create a new socket from your context when an error is caught, before trying to send the message again.
The pretty good brute force solution is to close and reopen the REQ
socket after an error
Here is a python example.
#
# Author: Daniel Lundin <dln(at)eintr(dot)org>
#
from __future__ import print_function
import zmq
REQUEST_TIMEOUT = 2500
REQUEST_RETRIES = 3
SERVER_ENDPOINT = "tcp://localhost:5555"
context = zmq.Context(1)
print("I: Connecting to server…")
client = context.socket(zmq.REQ)
client.connect(SERVER_ENDPOINT)
poll = zmq.Poller()
poll.register(client, zmq.POLLIN)
sequence = 0
retries_left = REQUEST_RETRIES
while retries_left:
sequence += 1
request = str(sequence).encode()
print("I: Sending (%s)" % request)
client.send(request)
expect_reply = True
while expect_reply:
socks = dict(poll.poll(REQUEST_TIMEOUT))
if socks.get(client) == zmq.POLLIN:
reply = client.recv()
if not reply:
break
if int(reply) == sequence:
print("I: Server replied OK (%s)" % reply)
retries_left = REQUEST_RETRIES
expect_reply = False
else:
print("E: Malformed reply from server: %s" % reply)
else:
print("W: No response from server, retrying…")
# Socket is confused. Close and remove it.
client.setsockopt(zmq.LINGER, 0)
client.close()
poll.unregister(client)
retries_left -= 1
if retries_left == 0:
print("E: Server seems to be offline, abandoning")
break
print("I: Reconnecting and resending (%s)" % request)
# Create new connection
client = context.socket(zmq.REQ)
client.connect(SERVER_ENDPOINT)
poll.register(client, zmq.POLLIN)
client.send(request)
context.term()
I have 3 programs written in Python, which need to be connected. 2 programs X and Y gather some information, which are sent by them to program Z. Program Z analyzes the data and send to program X and Y some decisions. Number of programs similar to X and Y will be expanded in the future. Initially I used named pipe to allow communication from X, Y to Z. But as you can see, I need bidirectional relation. My boss told me to use ZeroMQ. I have just found pattern for my use case, which is called Asynchronous Client/Server. Please see code from ZMQ book (http://zguide.zeromq.org/py:all) below.
The problem is my boss does not want to use any threads, forks etc. I moved client and server tasks to separate programs, but I am not sure what to do with ServerWorker class. Can this be somehow used without threads? Also, I am wondering, how to establish optimal workers amount.
import zmq
import sys
import threading
import time
from random import randint, random
__author__ = "Felipe Cruz <felipecruz#loogica.net>"
__license__ = "MIT/X11"
def tprint(msg):
"""like print, but won't get newlines confused with multiple threads"""
sys.stdout.write(msg + '\n')
sys.stdout.flush()
class ClientTask(threading.Thread):
"""ClientTask"""
def __init__(self, id):
self.id = id
threading.Thread.__init__ (self)
def run(self):
context = zmq.Context()
socket = context.socket(zmq.DEALER)
identity = u'worker-%d' % self.id
socket.identity = identity.encode('ascii')
socket.connect('tcp://localhost:5570')
print('Client %s started' % (identity))
poll = zmq.Poller()
poll.register(socket, zmq.POLLIN)
reqs = 0
while True:
reqs = reqs + 1
print('Req #%d sent..' % (reqs))
socket.send_string(u'request #%d' % (reqs))
for i in range(5):
sockets = dict(poll.poll(1000))
if socket in sockets:
msg = socket.recv()
tprint('Client %s received: %s' % (identity, msg))
socket.close()
context.term()
class ServerTask(threading.Thread):
"""ServerTask"""
def __init__(self):
threading.Thread.__init__ (self)
def run(self):
context = zmq.Context()
frontend = context.socket(zmq.ROUTER)
frontend.bind('tcp://*:5570')
backend = context.socket(zmq.DEALER)
backend.bind('inproc://backend')
workers = []
for i in range(5):
worker = ServerWorker(context)
worker.start()
workers.append(worker)
poll = zmq.Poller()
poll.register(frontend, zmq.POLLIN)
poll.register(backend, zmq.POLLIN)
while True:
sockets = dict(poll.poll())
if frontend in sockets:
ident, msg = frontend.recv_multipart()
tprint('Server received %s id %s' % (msg, ident))
backend.send_multipart([ident, msg])
if backend in sockets:
ident, msg = backend.recv_multipart()
tprint('Sending to frontend %s id %s' % (msg, ident))
frontend.send_multipart([ident, msg])
frontend.close()
backend.close()
context.term()
class ServerWorker(threading.Thread):
"""ServerWorker"""
def __init__(self, context):
threading.Thread.__init__ (self)
self.context = context
def run(self):
worker = self.context.socket(zmq.DEALER)
worker.connect('inproc://backend')
tprint('Worker started')
while True:
ident, msg = worker.recv_multipart()
tprint('Worker received %s from %s' % (msg, ident))
replies = randint(0,4)
for i in range(replies):
time.sleep(1. / (randint(1,10)))
worker.send_multipart([ident, msg])
worker.close()
def main():
"""main function"""
server = ServerTask()
server.start()
for i in range(3):
client = ClientTask(i)
client.start()
server.join()
if __name__ == "__main__":
main()
So, you grabbed the code from here: Asynchronous Client/Server Pattern
Pay close attention to the images that show you the model this code is targeted to. In particular, look at "Figure 38 - Detail of Asynchronous Server". The ServerWorker class is spinning up 5 "Worker" nodes. In the code, those nodes are threads, but you could make them completely separate programs. In that case, your server program (probably) wouldn't be responsible for spinning them up, they'd spin up separately and just communicate to your server that they are ready to receive work.
You'll see this often in ZMQ examples, a multi-node topology mimicked in threads in a single executable. It's just to make reading the whole thing easy, it's not always intended to be used that way.
For your particular case, it could make sense to have the workers be threads or to break them out into separate programs... but if it's a business requirement from your boss, then just break them out into separate programs.
Of course, to answer your second question, there's no way to know how many workers would be optimal without understanding the work load they'll be performing and how quickly they'll need to respond... your goal is to have the worker complete the work faster than new work is received. There's a fair chance, in many cases, that that can be accomplished with a single worker. If so, you can have your server itself be the worker, and just skip the entire "worker tier" of the architecture. You should start there, for the sake of simplicity, and just do some load testing to see if it will actually cope with your workload effectively. If not, get a sense of how long it takes to complete a task, and how quickly tasks are coming in. Let's say a worker can complete a task in 15 seconds. That's 4 tasks a minute. If tasks are coming in 5 tasks a minute, you need 2 workers, and you'll have a little headroom to grow. If things are wildly variable, then you'll have to make a decision about resources vs. reliability.
Before you get too much farther down the trail, make sure you read Chapter 4, Reliable Request/Reply Patterns, it will provide some insight for handling exceptions, and might give you a better pattern to follow.
I'm developing a Flask/gevent WSGIserver webserver that needs to communicate (in the background) with a hardware device over two sockets using XML.
One socket is initiated by the client (my application) and I can send XML commands to the device. The device answers on a different port and sends back information that my application has to confirm. So my application has to listen to this second port.
Up until now I have issued a command, opened the second port as a server, waited for a response from the device and closed the second port.
The problem is that it's possible that the device sends multiple responses that I have to confirm. So my solution was to keep the port open and keep responding to incoming requests. However, in the end the device is done sending requests, and my application is still listening (I don't know when the device is done), thereby blocking everything else.
This seemed like a perfect use case for a thread, so that my application launches a listening server in a separate thread. Because I'm already using gevent as a WSGI server for Flask, I can use the greenlets.
The problem is, I have looked for a good example of such a thing, but all I can find is examples of multi-threading handlers for a single socket server. I don't need to handle a lot of connections on the socket server, but I need it launched in a separate thread so it can listen for and handle incoming messages while my main program can keep sending messages.
The second problem I'm running into is that in the server, I need to use some methods from my "main" class. Being relatively new to Python I'm unsure how to structure it in a way to make that possible.
class Device(object):
def __init__(self, ...):
self.clientsocket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
self.serversocket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
def _connect_to_device(self):
print "OPEN CONNECTION TO DEVICE"
try:
self.clientsocket.connect((self.ip, 5100))
except socket.error as e:
pass
def _disconnect_from_device(self):
print "CLOSE CONNECTION TO DEVICE"
self.clientsocket.close()
def deviceaction1(self, ...):
# the data that is sent is an XML document that depends on the parameters of this method.
self._connect_to_device()
self._send_data(XMLdoc)
self._wait_for_response()
return True
def _send_data(self, data):
print "SEND:"
print(data)
self.clientsocket.send(data)
def _wait_for_response(self):
print "WAITING FOR REQUESTS FROM DEVICE (CHANNEL 1)"
self.serversocket.bind(('10.0.0.16', 5102))
self.serversocket.listen(5) # listen for answer, maximum 5 connections
connection, address = self.serversocket.accept()
# the data is of a specific length I can calculate
if len(data) > 0:
self._process_response(data)
self.serversocket.close()
def _process_response(self, data):
print "RECEIVED:"
print(data)
# here is some code that processes the incoming data and
# responds to the device
# this may or may not result in more incoming data
if __name__ == '__main__':
machine = Device(ip="10.0.0.240")
Device.deviceaction1(...)
This is (globally, I left out sensitive information) what I'm doing now. As you can see everything is sequential.
If anyone can provide an example of a listening server in a separate thread (preferably using greenlets) and a way to communicate from the listening server back to the spawning thread, it would be of great help.
Thanks.
EDIT:
After trying several methods, I decided to use Pythons default select() method to solve this problem. This worked, so my question regarding the use of threads is no longer relevant. Thanks for the people who provided input for your time and effort.
Hope it can provide some help, In example class if we will call tenMessageSender function then it will fire up an async thread without blocking main loop and then _zmqBasedListener will start listening on separate port untill that thread is alive. and whatever message our tenMessageSender function will send, those will be received by client and respond back to zmqBasedListener.
Server Side
import threading
import zmq
import sys
class Example:
def __init__(self):
self.context = zmq.Context()
self.publisher = self.context.socket(zmq.PUB)
self.publisher.bind('tcp://127.0.0.1:9997')
self.subscriber = self.context.socket(zmq.SUB)
self.thread = threading.Thread(target=self._zmqBasedListener)
def _zmqBasedListener(self):
self.subscriber.connect('tcp://127.0.0.1:9998')
self.subscriber.setsockopt(zmq.SUBSCRIBE, "some_key")
while True:
message = self.subscriber.recv()
print message
sys.exit()
def tenMessageSender(self):
self._decideListener()
for message in range(10):
self.publisher.send("testid : %d: I am a task" %message)
def _decideListener(self):
if not self.thread.is_alive():
print "STARTING THREAD"
self.thread.start()
Client
import zmq
context = zmq.Context()
subscriber = context.socket(zmq.SUB)
subscriber.connect('tcp://127.0.0.1:9997')
publisher = context.socket(zmq.PUB)
publisher.bind('tcp://127.0.0.1:9998')
subscriber.setsockopt(zmq.SUBSCRIBE, "testid")
count = 0
print "Listener"
while True:
message = subscriber.recv()
print message
publisher.send('some_key : Message received %d' %count)
count+=1
Instead of thread you can use greenlet etc.
Can someone point me to an example of a REQ/REP non-blocking ZeroMQ (0MQ) with Python bindings? Perhaps my understanding of ZMQ is faulty but I couldn't find an example online.
I have a server in Node.JS that sends work from multiple clients to the server. The idea is that the server can spin up a bunch of jobs that operate in parallel instead of processing data for one client followed by the next
You can use for this goal both zmq.Poller (many examples you can find in zguide repo, eg rrbroker.py) or gevent-zeromq implementation (code sample).
The example provided in the accepted answer gives the gist of it, but you can get away with something a bit simpler as well by using zmq.device for the broker while otherwise sticking to the "Extended Request-Reply" pattern from the guide. As such, a hello worldy example for the server could look something like the following:
import time
import threading
import zmq
context = zmq.Context()
def worker():
socket = context.socket(zmq.REP)
socket.connect('inproc://workers')
while True:
msg = socket.recv_string()
print(f'Received request: [{msg}]')
time.sleep(1)
socket.send_string(msg)
url_client = 'tcp://*:5556'
clients = context.socket(zmq.ROUTER)
clients.bind(url_client)
workers = context.socket(zmq.DEALER)
workers.bind('inproc://workers')
for _ in range(4):
thread = threading.Thread(target=worker)
thread.start()
zmq.device(zmq.QUEUE, clients, workers)
Here we're letting four workers handle incoming requests in parallel. Now, you're using Node on the client side, but just to keep the example complete, one can use the Python client below to see that this works. Here, we're creating 10 requests which will then be handled in 3 batches:
import zmq
import threading
context = zmq.Context()
def make_request(a):
socket = context.socket(zmq.REQ)
socket.connect('tcp://localhost:5556')
print(f'Sending request {a} ...')
socket.send_string(str(a))
message = socket.recv_string()
print(f'Received reply from request {a} [{message}]')
for a in range(10):
thread = threading.Thread(target=make_request, args=(a,))
thread.start()