My Python script constantly has to send messages to RabbitMQ once it receives one from another data source. The frequency at which the script sends them can vary, say, from 1 minute to 30 minutes.
Here's how I establish a connection to RabbitMQ:
rbt_conn = pika.BlockingConnection(pika.ConnectionParameters("some_host"))
channel = rbt_conn.channel()
I just got an exception
pika.exceptions.ConnectionClosed
How can I reconnect to it? What's the best way? Is there any "strategy"? Is there a way to send pings to keep the connection alive, or to set a timeout?
Any pointers will be appreciated.
RabbitMQ uses heartbeats to detect and close "dead" connections and to prevent network devices (firewalls etc.) from terminating "idle" connections. From version 3.5.5 on, the default timeout is set to 60 seconds (previously it was ~10 minutes). From the docs:
Heartbeat frames are sent about every timeout / 2 seconds. After two missed heartbeats, the peer is considered to be unreachable.
The problem with Pika's BlockingConnection is that it is unable to respond to heartbeats until some API call is made (for example, channel.basic_publish(), connection.sleep(), etc.).
The approaches I found so far:
Increase or deactivate the timeout
RabbitMQ negotiates the timeout with the client when establishing the connection. In theory, it should be possible to override the server's default value with a bigger one using the heartbeat_interval argument, but the current Pika version (0.10.0) uses the minimum of the values offered by the server and the client. This issue is fixed on current master.
On the other hand, it is possible to deactivate the heartbeat functionality completely by setting the heartbeat_interval argument to 0, which may well drive you into new issues (firewalls dropping idle connections, etc.).
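For illustration, deactivating heartbeats would look roughly like this (a sketch assuming pika 0.10.x, where the argument is named heartbeat_interval; later versions renamed it to heartbeat):

import pika

# heartbeat_interval=0 disables heartbeats entirely (sketch; beware of
# firewalls silently dropping idle connections if you do this)
params = pika.ConnectionParameters(host='some_host', heartbeat_interval=0)
conn = pika.BlockingConnection(params)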
Reconnecting
Expanding on @itsafire's answer, you can write your own publisher class that lets you reconnect when required. An example naive implementation:
import logging
import json

import pika


class Publisher:
    EXCHANGE = 'my_exchange'
    TYPE = 'topic'
    ROUTING_KEY = 'some_routing_key'

    def __init__(self, host, virtual_host, username, password):
        self._params = pika.connection.ConnectionParameters(
            host=host,
            virtual_host=virtual_host,
            credentials=pika.credentials.PlainCredentials(username, password))
        self._conn = None
        self._channel = None

    def connect(self):
        if not self._conn or self._conn.is_closed:
            self._conn = pika.BlockingConnection(self._params)
            self._channel = self._conn.channel()
            # note: pika >= 0.11 renamed the `type` kwarg to `exchange_type`
            self._channel.exchange_declare(exchange=self.EXCHANGE,
                                           type=self.TYPE)

    def _publish(self, msg):
        self._channel.basic_publish(exchange=self.EXCHANGE,
                                    routing_key=self.ROUTING_KEY,
                                    body=json.dumps(msg).encode())
        logging.debug('message sent: %s', msg)

    def publish(self, msg):
        """Publish msg, reconnecting if necessary."""
        try:
            self._publish(msg)
        except pika.exceptions.ConnectionClosed:
            logging.debug('reconnecting to queue')
            self.connect()
            self._publish(msg)

    def close(self):
        if self._conn and self._conn.is_open:
            logging.debug('closing queue connection')
            self._conn.close()
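Hypothetical usage of the class above (host, credentials, and the message are placeholders):

publisher = Publisher('some_host', '/', 'guest', 'guest')
publisher.connect()
publisher.publish({'key': 'value'})
publisher.close()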
Other possibilities
Other possibilities I haven't explored yet:
Using an asynchronous adapter for publishing
Keeping your RabbitMQ connection and your "publish" code on a background thread which periodically calls connection.sleep() to respond to server heartbeats (a rough sketch follows).
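A rough sketch of that last idea (the queue name and timings are made up; the point is that the one thread owning the connection regularly gives pika a chance to service heartbeat frames):

import queue
import threading

import pika

def publisher_worker(outbox):
    # This thread owns the connection: it publishes queued messages and,
    # while idle, calls connection.sleep() so heartbeats get answered.
    conn = pika.BlockingConnection(pika.ConnectionParameters('some_host'))
    channel = conn.channel()
    channel.queue_declare(queue='some_queue')
    while True:
        try:
            msg = outbox.get(timeout=5)
            channel.basic_publish(exchange='', routing_key='some_queue', body=msg)
        except queue.Empty:
            conn.sleep(1)  # lets pika process heartbeat frames

outbox = queue.Queue()
threading.Thread(target=publisher_worker, args=(outbox,), daemon=True).start()
outbox.put(b'hello')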
Dead simple: some pattern like this.
import time

import pika

while True:
    try:
        communication_handles = connect_pika()
        do_your_stuff(communication_handles)
    except pika.exceptions.ConnectionClosed:
        print('oops. lost connection. trying to reconnect.')
        # avoid rapid reconnection on longer RMQ server outage
        time.sleep(0.5)
You will probably have to refactor your code, but basically it comes down to catching the exception, mitigating the problem, and continuing with your work.
The communication_handles contain all the pika elements, such as channels and queues, that your stuff needs to communicate with RabbitMQ via pika.
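For instance, connect_pika might look something like this (a hypothetical helper; the host and queue name are placeholders):

import pika

def connect_pika():
    # build and return whatever handles do_your_stuff() needs
    conn = pika.BlockingConnection(pika.ConnectionParameters('some_host'))
    channel = conn.channel()
    channel.queue_declare(queue='work')
    return conn, channel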
Related
I have a fake HTTP server that I use as a fixture in my testing. At some point in the test, I want to stop the server regardless of any still open connections. Clients on these open connections should get a TCP FIN.
I am aware that production servers usually need to solve a different problem, that of quiescing, sometimes called graceful shutdown. This is the opposite of what I want.
With a standalone process, it is usually possible to simply get the process to quit and let the OS take care of the rest. (Forcibly killing processes is easy, while forcibly killing threads is not.) My fake server is, however, running in a thread of the test process itself, so I don't have this option (and I don't want to externalize it if there is another way).
I investigated this issue in Python, with the HTTPServer class, where I was not able to find any solution.
I also investigated this in Go, where I was able to find the concept of Contexts, which is close to what I need, but it works the other way around: a http server would propagate a Context that can be used to cancel e.g. a database lookup if a client disconnected.
Edit: it looks like Go actually does what I need and has separate graceful and non-graceful shutdown methods, the non-graceful one being net/http#Server.Close.
server = http.server.HTTPServer(...)
thread = threading.Thread(target=server.serve_forever)
thread.start()

# a client has connected ....

server.shutdown()
# at this point I want to have the server stopped,
# without waiting for the request handling to complete
I've implemented the Go solution in Python. When a new client connects, I remember the client socket, and when I want to quit, I shut down all remembered sockets.
It seems to work.
import socket
from http.server import HTTPServer
from typing import Any, List, Tuple


class MyHTTPServer(HTTPServer):
    """Adds methods to HTTPServer that allow it to exit without waiting
    for still-open connections."""

    def __init__(self, addr, handler_cls):
        super().__init__(addr, handler_cls)
        self._client_sockets: List[socket.socket] = []
        self.server_killed = False

    def get_request(self) -> Tuple[socket.socket, Any]:
        """Remember the client socket"""
        sock, addr = super().get_request()
        self._client_sockets.append(sock)
        return sock, addr

    def shutdown_request(self, request: socket.socket) -> None:
        """Forget the client socket"""
        self._client_sockets.remove(request)
        print(f"{self._client_sockets=}")
        super().shutdown_request(request)

    def force_disconnect_clients(self) -> None:
        """Shutdown the remembered sockets"""
        for client in self._client_sockets:
            client.shutdown(socket.SHUT_RDWR)
Usage
server = MyHTTPServer(server_addr, MyRequestHandler)

# in a new thread
while not server.server_killed:
    server.handle_request()

# ... use the server (keep in mind it can have at most one client at a time) ...

# in the main program
server.server_killed = True
server.force_disconnect_clients()
server.server_close()
Using the pika library's BlockingConnection to connect to RabbitMQ, I occasionally get an error when publishing messages:
Fatal Socket Error: error(32, 'Broken pipe')
This is from a very simple sub-process that takes some information out of an in-memory queue and sends a small JSON message into AMQP. The error only seems to come up when the system hasn't sent any messages for a few minutes.
Setup:
self.connection = pika.BlockingConnection(parameters)
self.channel = self.connection.channel()
self.channel.exchange_declare(
    exchange='xyz',
    exchange_type='fanout',
    passive=False,
    durable=True,
    auto_delete=False
)
Enqueue code catches any connection errors and retries:
def _enqueue(self, message_id, data):
    try:
        published = self.channel.basic_publish(
            self.amqp_exchange,
            self.amqp_routing_key,
            json.dumps(data),
            pika.BasicProperties(
                content_type="application/json",
                delivery_mode=2,
                message_id=message_id
            )
        )
        # Confirm delivery or retry
        if published:
            self.retry_count = 0
        else:
            raise EnqueueException("Message publish not confirmed.")
    except (EnqueueException, pika.exceptions.AMQPChannelError,
            pika.exceptions.AMQPConnectionError,
            pika.exceptions.ChannelClosed,
            pika.exceptions.ConnectionClosed,
            pika.exceptions.UnexpectedFrameError,
            pika.exceptions.UnroutableError, socket.timeout) as e:
        self.retry_count += 1
        if self.retry_count < 5:
            logging.warning("Reconnecting and resending")
            if self.connection.is_open:
                self.connection.close()
            self.connect()
            self._enqueue(message_id, data)
        else:
            raise e
This sometimes works on the second attempt. It often hangs for a while, or just throws away messages before eventually raising an exception (possibly related bug report). Since it only happens when the system has been quiet for a few minutes, I'm guessing it's due to a connection timeout. But AMQP has a heartbeat system and pika reportedly uses it (related bug report).
Why do I get this error or lose messages, and why won't the connection stay open when not in use?
From another bug report:
As BlockingConnection doesn't handle heartbeats in the background and the heartbeat_interval can't override the servers suggested heartbeat interval (that's a bug too), i suggest that heartbeats should be disabled by default (rely on TCP keep-alive instead).
If processing a task in a consume block takes longer time then the server suggested heartbeat interval, the connection will be closed by the server and the client won't be able to ack the message when it's done processing.
An update in v1.0.0 may help with the issue.
So I implemented a workaround: every 30 seconds I publish a heartbeat message through the queue. This keeps the connection open and has the added benefit of confirming to clients that my application is up and running.
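A rough sketch of that workaround (poll_in_memory_queue is a hypothetical stand-in for the app's own in-memory queue; connection, channel, and the exchange 'xyz' are the objects from the setup above, and the heartbeat publish happens on the same loop as the real traffic because BlockingConnection is not thread-safe):

import json
import time

last_sent = time.monotonic()
while True:
    data = poll_in_memory_queue()  # hypothetical: read the app's own queue
    if data is not None:
        channel.basic_publish('xyz', '', json.dumps(data))
        last_sent = time.monotonic()
    elif time.monotonic() - last_sent >= 30:
        # quiet for 30 seconds: publish a tiny application-level heartbeat
        channel.basic_publish('xyz', '', json.dumps({'type': 'heartbeat'}))
        last_sent = time.monotonic()
    else:
        connection.sleep(1)  # let pika service protocol frames while idle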
The Broken Pipe error means that one end is trying to write to the socket after the connection has been closed on the other end.
From what I can see, you have a shared self.connection that may be closed earlier or in a parallel thread?
You could also set the logging level to DEBUG and look at the client's log to determine the moment when the client closes the connection.
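For example:

import logging

# Turn on debug-level logging (including pika's own loggers) to see
# exactly when and why the connection gets torn down.
logging.basicConfig(level=logging.DEBUG)
logging.getLogger('pika').setLevel(logging.DEBUG)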
I have a RabbitMQ server and an AMQP consumer (Python) using kombu.
I have installed my app on a system behind a firewall that closes idle connections after 1 hour.
This is my amqp_consumer.py:
try:
    # connections
    with Connection(self.broker_url, ssl=_ssl, heartbeat=self.heartbeat) as conn:
        chan = conn.channel()
        # more stuff here
        with conn.Consumer(queue, callbacks=[messageHandler], channel=chan):
            # Process messages and handle events on all channels
            while True:
                conn.drain_events()
except Exception as e:
    # do stuff
    pass
What I want is to reconnect if the firewall closes the connection. Should I use the heartbeat argument, or should I pass a timeout argument (of 3600 sec) to the drain_events() call?
What are the differences between the two options? (They seem to do the same thing.)
Thanks.
drain_events on its own would not produce any heartbeats unless there are messages to consume and acknowledge. If the queue is idle, the connection will eventually be closed (by the RabbitMQ server or by your firewall).
What you should do is use both the heartbeat and the timeout, like so:
import socket

while True:
    try:
        conn.drain_events(timeout=1)
    except socket.timeout:
        conn.heartbeat_check()
This way the connection won't be closed even if the queue is idle.
Besides that, you might want to wrap the whole thing in a retry policy in case the connection does get closed or some other network error occurs; a sketch follows.
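Something along these lines (a sketch reusing the names from the snippet above; the backoff value is arbitrary, and in real code you would narrow the except clause, e.g. to conn.connection_errors):

import socket
import time

from kombu import Connection

while True:
    try:
        with Connection(self.broker_url, ssl=_ssl, heartbeat=60) as conn:
            conn.ensure_connection(max_retries=3)  # kombu's built-in retry helper
            chan = conn.channel()
            with conn.Consumer(queue, callbacks=[messageHandler], channel=chan):
                while True:
                    try:
                        conn.drain_events(timeout=1)
                    except socket.timeout:
                        conn.heartbeat_check()
    except Exception:  # narrow this in production
        time.sleep(1)  # back off briefly, then rebuild everything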
I'm developing a Flask/gevent WSGIServer web server that needs to communicate (in the background) with a hardware device over two sockets using XML.
One socket is initiated by the client (my application) and I can send XML commands to the device. The device answers on a different port and sends back information that my application has to confirm. So my application has to listen to this second port.
Up until now I have issued a command, opened the second port as a server, waited for a response from the device and closed the second port.
The problem is that it's possible that the device sends multiple responses that I have to confirm. So my solution was to keep the port open and keep responding to incoming requests. However, in the end the device is done sending requests, and my application is still listening (I don't know when the device is done), thereby blocking everything else.
This seemed like a perfect use case for a thread, so that my application launches a listening server in a separate thread. Because I'm already using gevent as a WSGI server for Flask, I can use the greenlets.
The problem is, I have looked for a good example of such a thing, but all I can find is examples of multi-threading handlers for a single socket server. I don't need to handle a lot of connections on the socket server, but I need it launched in a separate thread so it can listen for and handle incoming messages while my main program can keep sending messages.
The second problem I'm running into is that in the server I need to use some methods from my "main" class. Being relatively new to Python, I'm unsure how to structure it in a way that makes that possible.
import socket


class Device(object):
    def __init__(self, ...):
        self.clientsocket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        self.serversocket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

    def _connect_to_device(self):
        print "OPEN CONNECTION TO DEVICE"
        try:
            self.clientsocket.connect((self.ip, 5100))
        except socket.error as e:
            pass

    def _disconnect_from_device(self):
        print "CLOSE CONNECTION TO DEVICE"
        self.clientsocket.close()

    def deviceaction1(self, ...):
        # the data that is sent is an XML document that depends on the
        # parameters of this method.
        self._connect_to_device()
        self._send_data(XMLdoc)
        self._wait_for_response()
        return True

    def _send_data(self, data):
        print "SEND:"
        print(data)
        self.clientsocket.send(data)

    def _wait_for_response(self):
        print "WAITING FOR REQUESTS FROM DEVICE (CHANNEL 1)"
        self.serversocket.bind(('10.0.0.16', 5102))
        self.serversocket.listen(5)  # listen for answer, maximum 5 connections
        connection, address = self.serversocket.accept()
        # the data is of a specific length I can calculate
        data = connection.recv(expected_length)
        if len(data) > 0:
            self._process_response(data)
        self.serversocket.close()

    def _process_response(self, data):
        print "RECEIVED:"
        print(data)
        # here is some code that processes the incoming data and
        # responds to the device
        # this may or may not result in more incoming data


if __name__ == '__main__':
    machine = Device(ip="10.0.0.240")
    machine.deviceaction1(...)
This is (globally; I left out sensitive information) what I'm doing now. As you can see, everything is sequential.
If anyone can provide an example of a listening server in a separate thread (preferably using greenlets) and a way to communicate from the listening server back to the spawning thread, it would be of great help.
Thanks.
EDIT:
After trying several methods, I decided to use Python's built-in select() to solve this problem. That worked, so my question regarding the use of threads is no longer relevant. Thanks to the people who provided input for their time and effort.
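For reference, the select()-based approach could look roughly like this (a sketch under assumptions; the original edit does not show its actual select() code, and the address/port are taken from the question):

import select
import socket

serversocket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
serversocket.bind(('10.0.0.16', 5102))
serversocket.listen(5)
serversocket.setblocking(False)

open_conns = []
while True:
    # wait up to 0.5s for readable sockets, then return to the main loop
    readable, _, _ = select.select([serversocket] + open_conns, [], [], 0.5)
    for sock in readable:
        if sock is serversocket:
            conn, _ = serversocket.accept()
            open_conns.append(conn)
        else:
            data = sock.recv(4096)
            if data:
                pass  # process/respond to the device here
            else:
                open_conns.remove(sock)
                sock.close()
    # on timeout the loop is free to do other work (e.g. send commands)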
Hope it can provide some help. In the example class, if we call the tenMessageSender function, it fires up an async thread without blocking the main loop, and _zmqBasedListener then listens on a separate port for as long as that thread is alive. Whatever messages our tenMessageSender function sends are received by the client, which responds back to the zmqBasedListener.
Server Side
import sys
import threading

import zmq


class Example:
    def __init__(self):
        self.context = zmq.Context()
        self.publisher = self.context.socket(zmq.PUB)
        self.publisher.bind('tcp://127.0.0.1:9997')
        self.subscriber = self.context.socket(zmq.SUB)
        self.thread = threading.Thread(target=self._zmqBasedListener)

    def _zmqBasedListener(self):
        self.subscriber.connect('tcp://127.0.0.1:9998')
        self.subscriber.setsockopt(zmq.SUBSCRIBE, "some_key")
        while True:
            message = self.subscriber.recv()
            print message
            sys.exit()

    def tenMessageSender(self):
        self._decideListener()
        for message in range(10):
            self.publisher.send("testid : %d: I am a task" % message)

    def _decideListener(self):
        if not self.thread.is_alive():
            print "STARTING THREAD"
            self.thread.start()
Client
import zmq

context = zmq.Context()
subscriber = context.socket(zmq.SUB)
subscriber.connect('tcp://127.0.0.1:9997')
publisher = context.socket(zmq.PUB)
publisher.bind('tcp://127.0.0.1:9998')
subscriber.setsockopt(zmq.SUBSCRIBE, "testid")
count = 0
print "Listener"
while True:
    message = subscriber.recv()
    print message
    publisher.send('some_key : Message received %d' % count)
    count += 1
Instead of a thread you can use a greenlet, etc.
I am using the latest pika library (0.9.9+) for RabbitMQ. My usage of RabbitMQ and pika is as follows:
I have long-running tasks (about 5 minutes) as workers. These tasks take their requests from RabbitMQ. The requests come very infrequently, i.e. there is a long idle time between requests.
The problem I was facing previously is related to idle connections (connection closures due to idle connections). So I have enabled heartbeats in pika.
Now the selection of the heartbeat interval is a problem. Pika seems to be a single-threaded library, where heartbeat reception and acknowledgement happen in between the handling of requests.
So, if the heartbeat interval is set to less than the time the callback function spends on its long-running computation, the server does not receive any heartbeat acknowledgements and closes the connection.
So I assume the minimum heartbeat interval should be the maximum computation time of the callback function in a blocking connection.
What would be a good heartbeat value for Amazon EC2 to prevent it from closing idle connections?
Also, some suggest using RabbitMQ keepalive (or libkeepalive) to maintain TCP connections. I think managing keepalives at the TCP layer is much better, because the application need not manage them. Is this true? Is keepalive a good method compared to RMQ heartbeats?
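For reference, socket-level keepalive looks something like this (Linux-specific option names; the values are illustrative, not recommendations):

import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
# Linux-only knobs: start probing after 60s idle, probe every 10s, give up after 5
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 60)
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, 10)
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, 5)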
I have seen that some suggest using multiple threads and a queue for long-running tasks. But is this the only option for long-running tasks? It is quite disappointing that another queue must be used for this scenario.
Thank you in advance. I think I have detailed the problem. Let me know if I can provide more details.
If you're not tied to using pika, this thread helped me achieve what you're trying to do using kombu:
#!/usr/bin/env python
import logging
import time
import weakref

import eventlet
from eventlet import spawn_after
from kombu import Connection, Exchange, Queue
from kombu.common import eventloop
from kombu.utils.debug import setup_logging

eventlet.monkey_patch()

log_format = ('%(levelname) -10s %(asctime)s %(name) -30s %(funcName) '
              '-35s %(lineno) -5d: %(message)s')
logging.basicConfig(level=logging.INFO, format=log_format)
logger = logging.getLogger('job_worker')
logger.setLevel(logging.INFO)


def long_running_function(body):
    time.sleep(300)


def job_worker(body, message):
    long_running_function(body)
    message.ack()


def monitor_heartbeats(connection, rate=2):
    """Function to send heartbeat checks to RabbitMQ. This keeps the
    connection alive over long-running processes."""
    if not connection.heartbeat:
        logger.info("No heartbeat set for connection: %s" % connection.heartbeat)
        return
    interval = connection.heartbeat
    cref = weakref.ref(connection)
    logger.info("Starting heartbeat monitor.")

    def heartbeat_check():
        conn = cref()
        if conn is not None and conn.connected:
            conn.heartbeat_check(rate=rate)
            logger.info("Ran heartbeat check.")
            spawn_after(interval, heartbeat_check)

    return spawn_after(interval, heartbeat_check)


def main():
    setup_logging(loglevel='INFO')
    # process for heartbeat monitor
    p = None
    try:
        with Connection('amqp://guest:guest@localhost:5672//', heartbeat=300) as conn:
            conn.ensure_connection()
            monitor_heartbeats(conn)
            queue = Queue('job_queue',
                          Exchange('job_queue', type='direct'),
                          routing_key='job_queue')
            logger.info("Starting worker.")
            with conn.Consumer(queue, callbacks=[job_worker]) as consumer:
                consumer.qos(prefetch_count=1)
                for _ in eventloop(conn, timeout=1, ignore_timeouts=True):
                    pass
    except KeyboardInterrupt:
        logger.info("Worker was shut down.")


if __name__ == "__main__":
    main()
I stripped out my domain-specific code, but essentially this is the framework I use.