I am trying to get a python program to communicate with another python program via zeromq by using the request-reply pattern. The client program should send a request to the server program which replies.
I have two servers such that when one server fails the other takes over. Communication works perfect when the first server works, however, when the first server fails and when I make a request to the second server, I see the error:
zmp.error.ZMQError: Operation cannot be accomplished in current state
Code of the server 1:
# Run the server
while True:
# Define the socket using the "Context"
sock = context.socket(zmq.REP)
sock.bind("tcp://127.0.0.1:5677")
data = sock.recv().decode("utf-8")
res = "Recvd"
sock.send(res.encode('utf-8'))
Code of the server 2:
# Run the server
while True:
# Define the socket using the "Context"
sock = context.socket(zmq.REP)
sock.bind("tcp://127.0.0.1:5877")
data = sock.recv().decode("utf-8")
res = "Recvd"
sock.send(res.encode('utf-8'))
Code of client:
# ZeroMQ Context For distributed Message amogst processes
context = zmq.Context()
sock_1 = context.socket(zmq.REQ)
sock_2 = context.socket(zmq.REQ)
sock_1.connect("tcp://127.0.0.1:5677")
sock_2.connect("tcp://127.0.0.1:5877")
try:
sock_1.send(data.encode('utf-8'), zmq.NOBLOCK)
socks_1.setsockopt(zmq.RCVTIMEO, 1000)
socks_1.setsockopt(zmq.LINGER, 0)
data = socks_1.recv().decode('utf-8') #receive data from the main node
except:
try:
#when server one fails
sock_2.send(data.encode('utf-8'), zmq.NOBLOCK)
socks_2.setsockopt(zmq.RCVTIMEO, 1000)
socks_2.setsockopt(zmq.LINGER, 0)
data = socks_2.recv().decode('utf-8')
except Exception as e:
print(str(e))
What is the problem with this approach?
How can I resolve this?
Q: How can I resolve this?A: Avoid the known risk of REQ/REP deadlocking!
While the ZeroMQ is a powerful framework, understanding its internal composition is necessary for robust and reliable distributed systems design and prototyping.
After a closer look, using a common REQ/REP Formal Communication Pattern may leave ( and does leave ) counter-parties in a mutual dead-lock: where one is expecting the other to do a step, which will be never accomplished, and there is no way to escape from the deadlocked state.
For more illustrated details and FSA-schematic diagram, see this post
Next, a fail-over system has to survive any collisions of its own components. Thus, one has to design well the distributed system state-signalling and avoid as many dependencies on element-FSA-design/stepping/blocking as possible, otherwise, the fail-safe behaviour remains just an illusion.
Always handle resources with care, do not consider components of the ZeroMQ smart-signalling/messaging as any kind of "expendable disposables", doing so might be tolerated in scholar examples, not in production system environments. You still have to pay the costs ( time, resources allocations / de-allocations / garbage-collection(s) ). As noted in comments, never let resources creation/allocation without a due control. while True: .socket(); .bind(); .send(); is brutally wrong in principle and deteriorating the rest of the design.
On server side, "receive" and "send" pair is critical. I was facing a simiar issue, while socket.send was missed.
def zmq_listen():
global counter
message = socket_.recv().decode("utf-8")
logger.info(f"[{counter}] Message: {message}")
request = json.loads(message)
request["msg_id"] = f"m{counter}"
ack = {"msg_id": request["msg_id"]}
socket_.send(json.dumps(ack).encode("utf-8"))
return request
Implement the lazy pirate pattern. Create a new socket from your context when an error is caught, before trying to send the message again.
The pretty good brute force solution is to close and reopen the REQ
socket after an error
Here is a python example.
#
# Author: Daniel Lundin <dln(at)eintr(dot)org>
#
from __future__ import print_function
import zmq
REQUEST_TIMEOUT = 2500
REQUEST_RETRIES = 3
SERVER_ENDPOINT = "tcp://localhost:5555"
context = zmq.Context(1)
print("I: Connecting to server…")
client = context.socket(zmq.REQ)
client.connect(SERVER_ENDPOINT)
poll = zmq.Poller()
poll.register(client, zmq.POLLIN)
sequence = 0
retries_left = REQUEST_RETRIES
while retries_left:
sequence += 1
request = str(sequence).encode()
print("I: Sending (%s)" % request)
client.send(request)
expect_reply = True
while expect_reply:
socks = dict(poll.poll(REQUEST_TIMEOUT))
if socks.get(client) == zmq.POLLIN:
reply = client.recv()
if not reply:
break
if int(reply) == sequence:
print("I: Server replied OK (%s)" % reply)
retries_left = REQUEST_RETRIES
expect_reply = False
else:
print("E: Malformed reply from server: %s" % reply)
else:
print("W: No response from server, retrying…")
# Socket is confused. Close and remove it.
client.setsockopt(zmq.LINGER, 0)
client.close()
poll.unregister(client)
retries_left -= 1
if retries_left == 0:
print("E: Server seems to be offline, abandoning")
break
print("I: Reconnecting and resending (%s)" % request)
# Create new connection
client = context.socket(zmq.REQ)
client.connect(SERVER_ENDPOINT)
poll.register(client, zmq.POLLIN)
client.send(request)
context.term()
Related
So first I'll describe what I am doing.
A game is providing a webinterface but only on IPv4 and I would like people out in the internet to reach it too. From my ISP I however only get public IPv6. And since I couldn't find anything on the internet to translate requests and responses I wrote a little app that does. So IPv6 requests get forwarded to the webserver and the webservers responses get translated back to IPv6. That's working fine.
The only troubling bit is that... not all requests get detected or whatever is happening, like I first visit the webpage and sometimes it just hangs and says that it's waiting for a style.css file but when I look at the console output there's no reported connection. And generally there's just a whole lot of delay when you try doing something with the webinterface.
Then here is my code: (A word of warning I don't really know what everything with the networking exactly does, the stuff around the sending I especially don't quite understand if it's even needed, I just found it online)
def handle_request(return_address):
ipv4side = socket.create_connection(("127.0.0.1", 7245))
request = return_address.recv(2048)
print(request)
request = str(request, 'utf-8')
p = re.compile('\\[[^]]*]:7250')
m = p.search(request)
request = request.replace(request[m.start():m.end()], '127.0.0.1:7245')
request = request.encode('utf-8')
msg_len = len(request)
totalsent = 0
while totalsent < msg_len:
sent = ipv4side.send(request[totalsent:])
if sent == 0:
raise RuntimeError("socket connection broken")
totalsent += sent
while True:
response = ipv4side.recv(2048)
if len(response) == 0:
ipv4side.close()
return
msg_len = len(response)
totalsent = 0
while totalsent < msg_len:
sent = return_address.send(response[totalsent:])
if sent == 0:
raise RuntimeError("socket connection broken")
totalsent += sent
ipv6side = socket.socket(socket.AF_INET6, socket.SOCK_STREAM)
ipv6side.bind((IPV6, 7250))
ipv6side.listen(20)
ipv6side.settimeout(30)
while True:
try:
connected_socket = ipv6side.accept()[0]
print("NEW CONNECTION!" + str(connected_socket))
Thread(target=handle_request, args=(connected_socket,)).start()
except socket.timeout:
print("nothing new...")
I hope anyone can help me with this :D
I fixed the problem! (I think)
So, I followed what user253751 said about how HTTP is allowed to use the same connection for more than one request so I made adjustments and also so that that works it now has to detect the end-of-response/request (eor) for the responses and requests.
First thing I did was wrap the whole handle_request code in a while statement for the multiple requests on one connection thing.
For the end-of-response I am using regex
eor = re.compile('\\r\\n\\r\\n\\Z')
and then at the appropriate place:
eorm = eor.search(str(response, 'ISO-8859-1', ignore)) # I replaced the utf-8
# by the correct ISO-
# 8859-1 used for HTTP
# just to be safe
if eorm or len(response) == 0:
safe_send(return_address, response)
ipv4side.close()
break
And for the end-of-request it's basically the same just it only looks that the request is 0 length.
The code responsible for sending I put into the safe_send function that takes a connection and a msg.
Aaand I coded it so that in case the server aborts a connection for some reason and thus throws an error when trying to receive, it resends the request on a new connection.
try:
response = ipv4side.recv(2048)
except ConnectionAbortedError:
ipv4side = socket.create_connection(("127.0.0.1", 7245))
safe_send(ipv4side, request)
continue
I hope this is a good explanation :]
I am trying to adopt the ZeroMQ asynchronous client-server pattern described here with python multiprocessing. A brief description in the ZeroMQ guide
It's a DEALER/ROUTER for the client to server frontend communication and DEALER/DEALER for the server backend to the server workers communication. The server frontend and backend are connected using a zmq.proxy()-instance.
Instead of using threads, I want to use multiprocessing on the server. But requests from the client do not reach the server workers. However, they do reach the server frontend. And also the backend. But the backend is not able to connect to the server workers.
How do we generally debug these issues in pyzmq?How to turn on verbose logging for the sockets?
The python code snippets I am using -
server.py
import zmq
import time
from multiprocessing import Process
def run(context, worker_id):
socket = context.socket(zmq.DEALER)
socket.connect("ipc://backend.ipc")
print(f"Worker {worker_id} started")
try:
while True:
ident, msg = socket.recv_multipart()
print("Worker received %s from %s" % (msg, "ident"))
time.sleep(5)
socket.send_multipart([ident, msg])
print("Worker sent %s from %s" % (msg, ident))
except:
socket.close()
if __name__ == "__main__":
context = zmq.Context()
frontend = context.socket(zmq.ROUTER)
frontend.bind("tcp://*:5570")
backend = context.socket(zmq.DEALER)
backend.bind("ipc://backend.ipc")
N_WORKERS = 7
jobs = []
try:
for worker_id in range(N_WORKERS):
job = Process(target=run, args=(context, worker_id,))
jobs.append(job)
job.start()
zmq.proxy(frontend, backend)
for job in jobs:
job.join()
except:
frontend.close()
backend.close()
context.term()
client.py
import re
import zmq
from uuid import uuid4
if __name__ == "__main__":
context = zmq.Context()
socket = context.socket(zmq.DEALER)
identity = uuid4()
socket.identity = identity.encode("ascii")
socket.connect("tcp://localhost:5570")
poll = zmq.Poller()
poll.register(socket, zmq.POLLIN)
request = {
"body": "Some request body.",
}
socket.send_string(json.dumps(request))
while True:
for i in range(5):
sockets = dict(poll.poll(10))
if socket in sockets:
msg = socket.recv()
print(msg)
Q : "How to turn on verbose logging for the sockets?"
Start using the published native API socket_monitor() for all relevant details, reported as events arriving from socket-(instance)-under-monitoring.
Q : "How do we generally debug these issues in pyzmq?"
There is no general strategy on doing this. Having gone into a domain of a distributed-computing, you will almost always create your own, project-specific, tools for "collecting" & "viewing/interpreting" a time-ordered flow of (principally) distributed-events.
Last but not least : avoid trying to share a Context()-instance, the less "among" 8 processes
The Art of Zen of Zero strongly advocates to avoid any shape and form of sharing. Here, the one and the very same Context()-instance is referenced ("shared") via a multiprocessing.Process's process-instantiation call-signature interface, which does not make the inter-process-"sharing" work.
One may let each spawned process-instance create it's own Context()-instance and use it from inside its private space during its own life-cycle.
Btw, your code ignores any return-codes, documented in the native API, that help you handle ( in worse cases debug post-mortem ) what goes alongside the distributed-computing. The try: ... except: ... finally: scaffolding also helps a lot here.
Anyway, the sooner you will learn to stop using the blocking-forms of the { .send() | .recv() | .poll() }-methods, the better your code starts to re-use the actual powers of the ZeroMQ.
I have some simple python code which waits for messages for a topic.
However...when running this, the python process will hog CPU. I know with other languages, such as with sockets, there is a way to wait for messages without eating the entire processing power. Essentially the thread just remains halted waiting for a response. Is that possible with ZeroMQ?
import zmq
import sys
port = "5556"
if len(sys.argv) > 1:
port = sys.argv[1]
int(port)
# Socket to talk to server
context = zmq.Context()
socket = context.socket(zmq.SUB)
print "Collecting updates from weather server..."
socket.connect ("tcp://localhost:%s" % port)
# Subscribe to zipcode, default is NYC, 10001
topicfilter = "10001"
socket.setsockopt(zmq.SUBSCRIBE, topicfilter)
# Process 5 updates
total_value = 0
for update_nbr in range (5):
string = socket.recv()
topic, messagedata = string.split()
total_value += int(messagedata)
print ('{} {}'.format(topic, message)
Yes, this is possible.
The code above is not a reproducible MCVE/MWE-representation of the claimed problem.
First : the code explicitly blocks. Using a .recv()-method is blocking until any new, matching message arrives ( if ever ).
That does not overload the CPU. It just keeps sitting on the SUB-side waiting, until a first such message ( if any ) arrives.
If interested in a better scheme, do not use .recv()-method this way, but rather detect any such POSACK-acknowledgement with a .poll()-method & read a POSACK-ed case with .recv( zmq.NOBLOCK )-method.
I am trying to establish a connection between a Python server code that generate data and a javascript client that display it.
All answer that I see are by redefining some class and you need quiet some knowledge in network to do this.
I see this problem as a simple utilisation of websocket so I guess their is some simple utilisation I am not seeing without having to use classes...
my problem can be reduced at this code :
# definition of the websocket
application = web.Application([(r'/websocket',WebSocketHandler)])
application.listen(8001)
while True:
val = generate_data()
application.send(val) #it's this part that i miss
The real problem here is that my function generate_data() is a bit complicated and display some stuff, now passing it in a function of the class WShandler(..) is just not the right approach it's complicated and most of all not working. Further more the timing of when to send the data must be handle by my while loop not by some timeout that I will define.
Using the library websocket-server https://github.com/Pithikos/python-websocket-server it is possible
you just have to redifine the function run_forever in websocket server.py from :
logger.info("Listening on port %d for clients.." % self.port)
self.serve_forever()
to :
logger.info("Listening on port %d for clients.." % self.port)
# self.serve_forever()
_serve = self.serve_forever
server_thread = threading.Thread(target=_serve)
# server_thread.daemon = True
server_thread.start()
After in your code:
r = WebsocketServer(8001, host='localhost', loglevel=logging.INFO)
server.run_forever()
while True:
val = generate_data()
server.send_message_to_all(val)
And it works like a charm. Maybe their is a way to do the same with tornado..
source:
https://github.com/Pithikos/python-websocket-server/issues/30
Threaded, non-blocking websocket client
I just got started with ZMQ. I am designing an app whose workflow is:
one of many clients (who have random PULL addresses) PUSH a request to a server at 5555
the server is forever waiting for client PUSHes. When one comes, a worker process is spawned for that particular request. Yes, worker processes can exist concurrently.
When that process completes it's task, it PUSHes the result to the client.
I assume that the PUSH/PULL architecture is suited for this. Please correct me on this.
But how do I handle these scenarios?
the client_receiver.recv() will wait for an infinite time when server fails to respond.
the client may send request, but it will fail immediately after, hence a worker process will remain stuck at server_sender.send() forever.
So how do I setup something like a timeout in the PUSH/PULL model?
EDIT: Thanks user938949's suggestions, I got a working answer and I am sharing it for posterity.
If you are using zeromq >= 3.0, then you can set the RCVTIMEO socket option:
client_receiver.RCVTIMEO = 1000 # in milliseconds
But in general, you can use pollers:
poller = zmq.Poller()
poller.register(client_receiver, zmq.POLLIN) # POLLIN for recv, POLLOUT for send
And poller.poll() takes a timeout:
evts = poller.poll(1000) # wait *up to* one second for a message to arrive.
evts will be an empty list if there is nothing to receive.
You can poll with zmq.POLLOUT, to check if a send will succeed.
Or, to handle the case of a peer that might have failed, a:
worker.send(msg, zmq.NOBLOCK)
might suffice, which will always return immediately - raising a ZMQError(zmq.EAGAIN) if the send could not complete.
This was a quick hack I made after I referred user938949's answer and http://taotetek.wordpress.com/2011/02/02/python-multiprocessing-with-zeromq/ . If you do better, please post your answer, I will recommend your answer.
For those wanting lasting solutions on reliability, refer http://zguide.zeromq.org/page:all#toc64
Version 3.0 of zeromq (beta ATM) supports timeout in ZMQ_RCVTIMEO and ZMQ_SNDTIMEO. http://api.zeromq.org/3-0:zmq-setsockopt
Server
The zmq.NOBLOCK ensures that when a client does not exist, the send() does not block.
import time
import zmq
context = zmq.Context()
ventilator_send = context.socket(zmq.PUSH)
ventilator_send.bind("tcp://127.0.0.1:5557")
i=0
while True:
i=i+1
time.sleep(0.5)
print ">>sending message ",i
try:
ventilator_send.send(repr(i),zmq.NOBLOCK)
print " succeed"
except:
print " failed"
Client
The poller object can listen in on many recieving sockets (see the "Python Multiprocessing with ZeroMQ" linked above. I linked it only on work_receiver. In the infinite loop, the client polls with an interval of 1000ms. The socks object returns empty if no message has been recieved in that time.
import time
import zmq
context = zmq.Context()
work_receiver = context.socket(zmq.PULL)
work_receiver.connect("tcp://127.0.0.1:5557")
poller = zmq.Poller()
poller.register(work_receiver, zmq.POLLIN)
# Loop and accept messages from both channels, acting accordingly
while True:
socks = dict(poller.poll(1000))
if socks:
if socks.get(work_receiver) == zmq.POLLIN:
print "got message ",work_receiver.recv(zmq.NOBLOCK)
else:
print "error: message timeout"
The send wont block if you use ZMQ_NOBLOCK, but if you try closing the socket and context, this step would block the program from exiting..
The reason is that the socket waits for any peer so that the outgoing messages are ensured to get queued.. To close the socket immediately and flush the outgoing messages from the buffer, use ZMQ_LINGER and set it to 0..
If you're only waiting for one socket, rather than create a Poller, you can do this:
if work_receiver.poll(1000, zmq.POLLIN):
print "got message ",work_receiver.recv(zmq.NOBLOCK)
else:
print "error: message timeout"
You can use this if your timeout changes depending on the situation, instead of setting work_receiver.RCVTIMEO.