I've been trying to figure out which form of connection i should use when using pika, I've got two alternatives as far as I understand.
Either the BlockingConnection or the SelectConnection, however I'm not really sure about the differences between these two (i.e. what is the BlockingConnection blocking? and more)
The documentation for pika says that SelectConnection is the preferred way to connect to rabbit since it provides "multiple event notification methods including select, epoll, kqueue and poll."
So I'm wondering what are the implications of these two different kinds of connections?
PS: I know I shouldn't put a tag in the title but in this case I think it does help to clarify the question.
The SelectConnection is useful if your application architecture can benefit from an asynchronous design, e.g. doing something else while the RabbitMQ IO completes (e.g. switch to some other IO etc) . This type of connection uses callbacks to indicate when functions return. For example you can declare callbacks for
on_connected, on_channel_open, on_exchange_declared, on_queue_declared etc.
...to perform operations when these events are triggered.
The benefit is especially good if your RabbitMQ server (or connection to that server) is slow or overloaded.
BlockingConnection on the hand is just that - it blocks until the called function returns. so it will block the execution thread until connected or channel_open or exchange_declared or queue_declared return for example. That said, its often simpler to program this sort of serialized logic than the async SelectConnection logic. For simple apps with responsive RabbitMQ servers these also work OK IMO.
I suppose you've read the Pika documentation already http://pika.readthedocs.io/en/stable/intro.html, if not, then this is absolutely vital information before you use Pika!
Cheers!
The Pika documentation is quite clear about the differences between the connection types. The main difference is that the pika.adapters.blocking_connection.BlockingConnection() adapter is used for non-asynchronous programming and that the pika.adapters.select_connection.SelectConnection() adapter is used for asynchronous programming.
If you don't know what the difference is between non-asynchronous/synchronous and asynchronous programming I suggest that you read this question or for the more deeper technical explanation this article.
Now let's dive into the different Pika adapters and see what they do, for the example purpose I imagine that we use Pika for setting up a client connection with RabbitMQ as AMQP message broker.
BlockingConnection()
In the following example, a connection is made to RabbitMQ listening to port 5672 on localhost using the username guest and password guest and virtual host '/'. Once connected, a channel is opened and a message is published to the test_exchange exchange using the test_routing_key routing key. The BasicProperties value passed in sets the message to delivery mode 1 (non-persisted) with a content-type of text/plain. Once the message is published, the connection is closed:
import pika
parameters = pika.URLParameters('amqp://guest:guest#localhost:5672/%2F')
connection = pika.BlockingConnection(parameters)
channel = connection.channel()
channel.basic_publish('test_exchange',
'test_routing_key',
'message body value',
pika.BasicProperties(content_type='text/plain',
delivery_mode=1))
connection.close()
SelectConnection()
In contrast, using this connection adapter is more complicated and less pythonic, but when used with other asynchronous services it can have tremendous performance improvements. In the following code example, all of the same parameters and values are used as were used in the previous example:
import pika
# Step #3
def on_open(connection):
connection.channel(on_open_callback=on_channel_open)
# Step #4
def on_channel_open(channel):
channel.basic_publish('test_exchange',
'test_routing_key',
'message body value',
pika.BasicProperties(content_type='text/plain',
delivery_mode=1))
connection.close()
# Step #1: Connect to RabbitMQ
parameters = pika.URLParameters('amqp://guest:guest#localhost:5672/%2F')
connection = pika.SelectConnection(parameters=parameters,
on_open_callback=on_open)
try:
# Step #2 - Block on the IOLoop
connection.ioloop.start()
# Catch a Keyboard Interrupt to make sure that the connection is closed cleanly
except KeyboardInterrupt:
# Gracefully close the connection
connection.close()
# Start the IOLoop again so Pika can communicate, it will stop on its own when the connection is closed
connection.ioloop.start()
Conclusion
For those doing simple, non-asynchronous/synchronous programming, the BlockingConnection() adapter proves to be the easiest way to get up and running with Pika to publish messages. But if you are looking for a way to implement asynchronous message handling, the SelectConnection() handler is your better choice.
Happy coding!
Related
I am using rabbitmq to facilitate some tasks from my rabbit server to my respective consumers. I have noticed that when I run some rather lengthy tests, 20+ minutes, my consumer will lose contact with the producer after it completes it's task. In my rabbit logs, I have seen the error
closing AMQP connection <0.14009.27> (192.168.101.2:64855 ->
192.168.101.3:5672):
missed heartbeats from client, timeout: 60s
Also, I receive this error from pika
pika.exceptions.ConnectionClosed: (-1, "error(10054, 'An existing connection was forcibly closed by the remote host')")
I'm assuming this is due to this code right here and the conflict of heartbeats with the lengthy blocking connection time.
self.connection = pika.BlockingConnection(pika.ConnectionParameters('192.168.101.2', 5672, 'user', credentials))
self.channel = self.connection.channel()
self.channel.queue_declare(queue=self.tool,
arguments={'x-message-ttl': 1000,
"x-dead-letter-exchange": "dlx",
"x-dead-letter-routing-key": "dl",
'durable': True})
Is there a proper way to increase the heartbeat time or how would I turn it off(would it be wise to) completely? Like I said, tests that are 20+ min seem to lead to a closedconnection error but I've ran plenty of tests from the 1-15 minute mark where everything is fine and the consumer client continues to wait for a message to be delivered.
Please don't disable heartbeats. Instead, use Pika correctly. This means:
Use Pika version 0.12.0
Do your long-running task in a separate thread
When your task completes, use the add_callback_threadsafe method to schedule the basic_ack call.
Example code can be found here: link
I'm a RabbitMQ core team member and Pika maintainer so if you have further questions or issues, I recommend following up on either the pika-python or rabbitmq-users mailing list. Thanks!
NOTE: the RabbitMQ team monitors the rabbitmq-users mailing list and only sometimes answers questions on StackOverflow.
You can set the minimum heartbeat interval when creating the connection.
You can see an example in the pika documentation.
I'd recommend against disabling the heartbeat as it might lead to hanging connections piling up on the broker. We experienced such issue in production.
Always make sure the connections have a minimum reasonable heartbeat. If the heartbeat interval needs to be long (hours for example), make sure you close the connection when the application crashes or exits. In this way you won't leave the connection open on the broker side.
As #Luke mentioned, heartbeats are useful but if you still want to disable them, just set heartbeat parameter to zero when creating a connection. So,
For URL parameters: connection = pika.BlockingConnection(pika.URLParameters("amqp://user:pass#127.0.0.1?heartbeat=0"))
For Connection parameters: connection = pika.BlockingConnection(pika.ConnectionParameters(heartbeat=0))
Let's say I want to implement an echo server and client using ZeroMQ (pyzmq) and asyncio (for its event loop, coroutines, etc.).
Now I want to add more reliability by adding a heartbeat. As I don't want to interact too much with my wonderful echo protocol, this heartbeat is done by both client and server on a dedicated pair of sockets.
From what I understand, the way to go™ is to create a new zmq socket in the server class, register it to the existing Poller and let the server class handle everything, from timeout calculation to sending beats. That works, of course.
But this is more complicated than it should be (that's a personal view). From the server point of view, 'heartbeats' are implementation details. What heartbeats are there for is to answer a simple question: "is the client still there?". More technically, I would like to setup and Heartbeat object that takes a timeout and an address. That Heartbeat object would do all the socket setup, beat-related socket polling, send actual beats and receive them.
From the server point of view, I would just use client.is_alive() when required. But that would require two socket pollers to work in parallel. I can achieve that with an executor, but that does not seem right. How would you do that?
This is really a programming design question more than a specific language or library question. I'm tinkering with the idea of a standalone chat server for websockets that will accept several remote browser-based javascript clients. I'm going for something super simple at first, then might build it up. The server just keeps accepting client connections and listens for messages. When a message is received, it will be sent back to all the clients.
What I need to better understand is which approach is best for sending the messages out to all clients, specifically, sending immediately to all clients, or queuing the messages to each client's queue to be sent when a client connection handler's turn comes up. Below are the two examples in a python-like pseudo-code:
Broadcast Method
def client_handler(client):
while true:
if(client.pending_msg):
rmsg = client.recv()
for c in clients:
c.send(rmsg)
client.sleep(1)
Queue Method
def client_handler(client):
while true:
if client.pending_msg:
rmsg = client.recv()
for c in clients:
c.queue_msg(rmsg)
if client.has_queued:
client.send_queue
client.sleep(1)
What is the best approach? Or, perhaps they are good for different use-cases, in which case, what are the pros, cons and circumstances for which they should be used. Thanks!
First of all, it seems odd to me that a single client handler would know about all the other existing clients. This should be the first thing you should abstract away and create a central message processing handler instead which the individual clients talk to.
That handler can then either send the message directly to the clients (like in your broadcast example), or add them to queues of the clients (like your queue example). Which would be the preferred version depends a bit on your network protocol.
Since you said that you will be using websockets, you have a persistent network connection to the clients anyway, so you can just send them out immediately. There is no real gain to queue (and buffer) the messages. Ideally, a client would just have a send() method anyway, and the client would then internally decide whether that means appending it to a queue or sending it immediately over the network.
Furthermore, since websockets are kind of asynchronous in their nature, you don’t need busy wait loops anyway. You can just listen for messages from the client directly, process those, and broadcast them using your central handler. And since you then don’t have a wait loop anymore, there also would be no place where you work off your queue anymore, making the immediate broadcast the more natural decision.
I'm currently writing a project in Python which has a client and a server part. I have troubles with the network communication, so I need to explain some things...
The client mainly does operations the server tells him to and sends the results of the operations back to the server. I need a way to communicate bidirectional on a TCP socket.
Current Situation
I currently use a LineReceiver of the Twisted framework on the server side, and a plain Python socket (and ssl) on client side (because I was unable to correctly implement a Twisted PushProducer). There is a Queue on the client side which gets filled with data which should be sent to the server; a subprocess continuously pulls data from the queue and sends it to the server (see code below).
This scenario works well, if only the client pushes its results to the manager. There is no possibility the server can send data to the client. More accurate, there is no way for the client to receive data the server has sent.
The Problem
I need a way to send commands from the server to the client.
I thought about listening for incoming data in the client loop I use to send data from the queue:
def run(self):
while True:
data = self.queue.get()
logger.debug("Sending: %s", repr(data))
data = cPickle.dumps(data)
self.socket.write(data + "\r\n")
# Here would be a good place to listen on the socket
But there are several problems with this solution:
the SSLSocket.read() method is a blocking one
if there is no data in the queue, the client will never receive any data
Yes, I could use Queue.get_nowait() instead of Queue.get(), but all in all it's not a good solution, I think.
The Question
Is there a good way to achieve this requirements with Twisted? I really do not have that much skills on Twisted to find my way round in there. I don't even know if using the LineReceiver is a good idea for this kind of problem, because it cannot send any data, if it does not receive data from the client. There is only a lineReceived event.
Is Twisted (or more general any event driven framework) able to solve this problem? I don't even have real event on the communication side. If the server decides to send data, it should be able to send it; there should not be a need to wait for any event on the communication side, as possible.
"I don't even know if using the LineReceiver is a good idea for this kind of problem, because it cannot send any data, if it does not receive data from the client. There is only a lineReceived event."
You can send data using protocol.transport.write from anywhere, not just in lineReceived.
"I need a way to send commands from the server to the client."
Don't do this. It inverts the usual meaning of "client" and "server". Clients take the active role and send stuff or request stuff from the server.
Is Twisted (or more general any event driven framework) able to solve this problem?
It shouldn't. You're inverting the role of client and server.
If the server decides to send data, it should be able to send it;
False, actually.
The server is constrained to wait for clients to request data. That's generally the accepted meaning of "client" and "server".
"One to send commands to the client and one to transmit the results to the server. Does this solution sound more like a standard client-server communication for you?"
No.
If a client sent messages to a server and received responses from the server, it would meet more usual definitions.
Sometimes, this sort of thing is described as having "Agents" which are -- each -- a kind of server and a "Controller" which is a single client of all these servers.
The controller dispatches work to the agents. The agents are servers -- they listen on a port, accept work from the controller, and do work. Each Agent must do two concurrent things (usually via the select API):
Monitor a well-known socket on which it will receive work from the one-and-only client.
Do the work (in the background).
This is what Client-Server usually means.
If each Agent is a Server, you'll find lots of libraries will support this. This is the way everyone does it.
I want to implement a lightweight Message Queue proxy. It's job is to receive messages from a web application (PHP) and send them to the Message Queue server asynchronously. The reason for this proxy is that the MQ isn't always avaliable and is sometimes lagging, or even down, but I want to make sure the messages are delivered, and the web application returns immediately.
So, PHP would send the message to the MQ proxy running on the same host. That proxy would save the messages to SQLite for persistence, in case of crashes. At the same time it would send the messages from SQLite to the MQ in batches when the connection is available, and delete them from SQLite.
Now, the way I understand, there are these components in this service:
message listener (listens to the messages from PHP and writes them to a Incoming Queue)
DB flusher (reads messages from the Incoming Queue and saves them to a database; due to SQLite single-threadedness)
MQ connection handler (keeps the connection to the MQ server online by reconnecting)
message sender (collects messages from SQlite db and sends them to the MQ server, then removes them from db)
I was thinking of using Twisted for #1 (TCPServer), but I'm having problem with integrating it with other points, which aren't event-driven. Intuition tells me that each of these points should be running in a separate thread, because all are IO-bound and independent of each other, but I could easily put them in a single thread. Even though, I couldn't find any good and clear (to me) examples on how to implement this worker thread aside of Twisted's main loop.
The example I've started with is the chatserver.py, which uses service.Application and internet.TCPServer objects. If I start my own thread prior to creating TCPServer service, it runs a few times, but the it stops and never runs again. I'm not sure, why this is happening, but it's probably because I don't use threads with Twisted correctly.
Any suggestions on how to implement a separate worker thread and keep Twisted? Do you have any alternative architectures in mind?
You're basically considering writing an ad-hoc extension to your messaging server, the job of which it is to provide whatever reliability guarantees you've asked of it.
Instead, perhaps you should take the hardware where you were planning to run this new proxy and run another MQ node on it. The new node should take care of persisting and relaying messages that you deliver to it while the other nodes are overloaded or offline.
Maybe it's not the best bang for your buck to use a separate thread in Twisted to get around a blocking call, but sometimes the least evil solution is the best. Here's a link that shows you how to integrate threading into Twisted:
http://twistedmatrix.com/documents/10.1.0/core/howto/threading.html
Sometimes in a pinch easy-to-implement is faster than hours/days of research which may all turn out to be for nought.
A neat solution to this problem would be to use the Key Value store Redis. Its a high speed persistent data store, with plenty of clients - it has a php and a python client (if you want to use a timed/batch process to process messages - it saves you creating a database, and also deals with your persistence stories. It runs fine on Cywin/Windows + posix environments.
PHP Redis client is here.
Python client is here.
Both have a very clean and simple API. Redis also offers a publish/subscribe mechanism, should you need it, although it sounds like it would be of limited value if you're publishing to an inconsistent queue.