Latched topic in ZeroMQ - python

Is it possible to have a "latched" topic in ZeroMQ, such that the last message sent to the topic is repeated to newly joined subscribers?
At the moment I have to create a REQ-REP-socket pair in addition to the PUB-SUB pair, so that when the new SUB joins, it asks for that last message using the REQ-socket. But this additional work, which is all boilerplate, is highly undesirable.
ROS has the "latched" option and it is described as:
When a connection is latched, the last message published is saved and
automatically sent to any future subscribers that connect. This is
useful for slow-changing to static data like a map. Note that if there
are multiple publishers on the same topic, instantiated in the same
node, then only the last published message from that node will be
sent, as opposed to the last published message from each publisher on
that single topic.

Well, your idea is doable in ZeroMQ:
A few bits of history first: for distributed-computing performance and memory-capacity reasons (and because the extra traffic was cheap), the topic-filter was initially implemented on the SUB-side(s), whereas later versions moved this feature to the PUB-side.
So your application will never know in advance which ZeroMQ version each client will use, and the problem is principally undecidable at the library level.
Having said that,
your application user-code, on the PUB-side, can solve this by sending 2-in-1 formatted messages, and your SUB-side can be made aware of this soft-logic embedded in the message stream.
Simply implement the "latched" logic in your user-code, be it via a naive re-send of the last message per topic-line or by some other means.
Yes, the user-code is the only place that can handle this,
not the PUB/SUB Scalable Formal Communication Pattern Archetype, for two reasons. First, it is not a general, universally applicable behaviour, but rather a user-specific speciality. Second, the topic-filter (be it PUB-side or SUB-side operated) has no prior knowledge of lexical branching: subscriptions are interpreted as left-to-right prefix matches, and no one can say a priori what a next subscriber will actually subscribe to. A "latched"-last-message store therefore cannot be pre-populated until a new "next" subscriber actually joins and sets its actual topic-filter subscription (pre-storing all deterministic, combinatorics-driven, possible {sub-|super-}topic options to circumvent this principal undecidability would be a very bad idea, wouldn't it?).
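One such user-code means is the "Last Value Caching" proxy pattern from the zguide: a proxy sits between publishers (XSUB-facing side) and subscribers (XPUB-facing side), stores the last message per topic, and replays it when the XPUB socket hands it a new subscription frame (first byte b"\x01" means subscribe). A minimal sketch of the cache core, with illustrative names; the socket wiring is left as comments:

```python
# Last-value-cache core for the zguide LVC proxy pattern.
# Class and method names are illustrative, not a library API.

class LastValueCache:
    """Stores the last message per topic and replays matches
    to a newly arrived (prefix-)subscription."""

    def __init__(self):
        self._last = {}  # topic -> last message seen

    def store(self, topic, message):
        # Called for every message flowing through the proxy.
        self._last[topic] = message

    def replay_for(self, subscription_prefix):
        # ZeroMQ subscriptions are left-to-right prefix matches,
        # so replay every cached topic starting with the prefix.
        return [(t, m) for t, m in sorted(self._last.items())
                if t.startswith(subscription_prefix)]

# In the proxy loop (not run here), roughly:
#   on message from XSUB:  cache.store(topic, msg); xpub.send(...)
#   on b"\x01" + prefix from XPUB:  re-send cache.replay_for(prefix)
```

The cache logic alone can be exercised without any sockets:

```python
cache = LastValueCache()
cache.store("weather.paris", "rainy")
cache.store("weather.nice", "sunny")
cache.store("weather.paris", "cloudy")   # overwrites the first value
cache.replay_for("weather.paris")        # [("weather.paris", "cloudy")]
```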

Related

gRPC response streaming with repeated field in Python

I am currently designing an API that is supposed to handle relatively small messages but many data entries. There are operations to add, delete and list all items stored in a database.
Now to my question: I want to return all entries (up to 5 Million) in a short amount of time. I figured response streaming would be the way to go.
Does it make sense to stream messages with a repeated field, so as to return multiple entries in one message? So far I haven't seen any indication whether that is faster or not.
Example:
rpc ListDataSet (ListDataSetRequest) returns (stream ListDataSetResponse);

message ListDataSetResponse {
  string transaction_id = 1;
  repeated Entries entries = 2;
}
And in the server I would append a certain amount of entries to each message and yield the messages while looping over the list of entries, using a generator.
Any recommendations or tips would be appreciated
Yes, it makes sense to stream messages containing repeated fields.
From a performance perspective, you may want to consider benchmarking your alternatives to prove this to yourself.
gRPC lacks comprehensive published best practices, but one often reads that smaller messages are better; 4 MiB (gRPC's default message-size limit) is often given as a good, notional upper bound.
One other thing to bear in mind is that it's not just the performance of your servers but also that of your clients that you need to consider.
An arguably more common pattern is to page large results and give the client control to ask for the next (or any other) page. This may be worth evaluating too.
For exceptionally "huge" (unspecified) results, you'd likely be better placed returning a reference in your gRPC message to an out-of-band (e.g. object storage) object.
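The batching idea from the question can be sketched as follows; the servicer wiring is illustrative (ListDataSetResponse, fetch_entries and the chunk size of 1000 are assumptions, not from the original proto), while the chunking generator itself is plain Python:

```python
from itertools import islice

def batched(iterable, size):
    """Yield lists of at most `size` items from `iterable`,
    so each streamed response carries one batch of entries."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk

# In a grpc servicer method it would plug in roughly like this
# (names are illustrative, not run here):
#
# def ListDataSet(self, request, context):
#     for chunk in batched(fetch_entries(request), 1000):
#         yield ListDataSetResponse(transaction_id=request.transaction_id,
#                                   entries=chunk)
```

Tuning `size` against the notional 4 MiB bound (given your average entry size) is exactly the kind of thing worth benchmarking.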

Python ZeroMQ broadcasting messages

I am going to implement Practical Byzantine Fault Tolerance (PBFT).
Hence, I am going to have multiple processes; P0 is going to initialize a round by sending a first message.
Is it possible to broadcast a message to all other processes using ZeroMQ?
With PUB/SUB, I need to bind/connect sockets. But since I am going to take the number of processes as an argument, it seems impractical to connect to all the other ports (I do not know if this is even possible?!). I could not write any code since I am stuck at the beginning.
Basically, if I find the way to connect processes I will do this:
The proposer selects a random message m and sends it to all validators.
Upon reception each validator sends the message to other validators and the proposer.
If a validator (or proposer) receives at least 2k messages from the other
processes that are identical to its own it proceeds to the next
round of the consensus algorithm.
One more addition: processes are going to communicate with each other directly. But connecting to all other processes' sockets with REQ/REP is not clever, though.
Is it possible to broadcast a message to all other processes using ZeroMQ?
Oh sure, it is.
This is exactly what all messaging and signalling tools, like ZeroMQ, nanomsg and others, were developed for.
The beauty and the trick of PBFT is that there ought to be no single point of failure, oughtn't there?
So any approach other than the messages actually being sent and delivered, one after another, to every other process will not help the PBFT, will it?
Feel free to sketch the solution; the port-mapping will not be your main issue in this. ZeroMQ can .bind()/.connect() in a very flexible manner. One may even create an ad-hoc, non-persistent connectivity setup in a similarly circular manner, if ports are indeed a scarce resource, so take a bit more courage and go get it done :o)
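One straightforward port-mapping, sketched under assumed conventions (base port, localhost, one PUB per process): process i binds a PUB socket on base_port + i and SUB-connects to every other process's PUB, giving a full mesh where every send is a broadcast:

```python
# Endpoint plan for an N-process full mesh over PUB/SUB.
# base_port and the tcp://127.0.0.1 scheme are illustrative assumptions.

def mesh_endpoints(process_id, n_processes, base_port=5550,
                   host="127.0.0.1"):
    """Return (bind_endpoint, connect_endpoints) for one process:
    it binds its own PUB port and connects its SUB to all the others."""
    bind = f"tcp://{host}:{base_port + process_id}"
    connects = [f"tcp://{host}:{base_port + i}"
                for i in range(n_processes) if i != process_id]
    return bind, connects

# Wiring it up with pyzmq would look roughly like (not run here):
# import zmq
# ctx = zmq.Context()
# bind, connects = mesh_endpoints(my_id, n)
# pub = ctx.socket(zmq.PUB); pub.bind(bind)
# sub = ctx.socket(zmq.SUB); sub.setsockopt(zmq.SUBSCRIBE, b"")
# for ep in connects:
#     sub.connect(ep)
```

A SUB socket happily .connect()-s to many PUB endpoints, so the number of processes being a runtime argument is not a problem.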

(py)zmq/PUB : Is it possible to call connect() then send() immediately and do not lose the message?

With this code, I always lose the message :
def publish(frontend_url, message):
    context = zmq.Context()
    socket = context.socket(zmq.PUB)
    socket.connect(frontend_url)
    socket.send(message)
However, if I introduce a short sleep(), I can get the message :
def publish(frontend_url, message):
    context = zmq.Context()
    socket = context.socket(zmq.PUB)
    socket.connect(frontend_url)
    time.sleep(0.1)  # wait for the connection to be established
    socket.send(message)
Is there a way to ensure the message will be delivered without sleeping between the calls to connect() and send() ?
I'm afraid I can't predict the sleep duration (network latencies, etc.)
UPDATE:
Context : I'd like to publish data updates from a Flask REST application to a message broker (eg. on resource creation/update/deletion).
Currently, the message broker is drafted using the 0mq FORWARDER device
I understand 0mq is designed to abstract the TCP sockets and message passing complexities.
In a context where connections are long-lived, I could use it.
However, when running my Flask app in an app container like gunicorn or uwsgi, I have N worker processes and I can't expect the connection nor the process to be long-lived.
As I understand the issue, I should use a real message broker (like RabbitMQ) and use a synchronous client to publish the messages there.
You can't do this exactly, but there may be other solutions that would solve your problem.
Why are you using PUB/SUB sockets? The nature of pub/sub is more suited to long-running sockets, and typically you will bind() on the PUB socket and connect on the SUB socket. What you're doing here, spinning up a socket to send one message, presumably to a "server" of some sort, doesn't really fit the PUB/SUB paradigm very well.
If you instead choose some variation of REQ or DEALER to REP or ROUTER, then things might go smoother for you. A REQ socket will hold a message until its pair is ready to receive it. If you don't particularly care about the response from the "server", then you can just discard it.
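A minimal sketch of that REQ alternative, assuming pyzmq, with the REP peer standing in for your broker's frontend and run in the same process only for demonstration. The point is that a REQ socket queues the message sent immediately after connect(), so nothing is lost:

```python
import zmq

ctx = zmq.Context.instance()

# REP peer stands in for the broker/FORWARDER frontend.
rep = ctx.socket(zmq.REP)
rep.bind("tcp://127.0.0.1:*")      # wildcard: let the OS pick a port
endpoint = rep.getsockopt(zmq.LAST_ENDPOINT).decode()

req = ctx.socket(zmq.REQ)
req.connect(endpoint)
req.send(b"resource created")      # sent immediately after connect()

msg = rep.recv()                   # arrives: REQ queued the message
                                   # until the connection was up
rep.send(b"ok")                    # REQ/REP requires a reply
reply = req.recv()                 # discard it if you don't care

req.close()
rep.close()
```

Contrast this with the PUB version above, where the same connect-then-send sequence silently drops the message.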
Is there any particular reason you aren't just leaving the socket open, instead of building a whole new context and socket, and re-connecting each time you want to send a message? I can think of some limited scenarios where this might be the preferred behavior, but generally it's a better idea to just leave the socket up. If you wanted to stick with PUB/SUB, then just spin the socket up at the start of your app, sleep some safe period of time that covers any reasonable latency scenario, and then start sending your messages without worrying about re-connecting every time. If you'll leave this socket up for long periods of time without any new messages you'll probably want to use heart-beating to make sure the connection stays open.
From the ZMQ Guide:
There is one more important thing to know about PUB-SUB sockets: you do not know precisely when a subscriber starts to get messages. Even if you start a subscriber, wait a while, and then start the publisher, the subscriber will always miss the first messages that the publisher sends. This is because as the subscriber connects to the publisher (something that takes a small but non-zero time), the publisher may already be sending messages out.
Many posts here start with:
"I used PUB/SUB and it did not do the job I wanted it to do ... Anyone here, do help me make it work like I think it shall work out of the box."
This approach does not work in the real world, less so in distributed-systems design, and least of all in systems where near-real-time scheduling and/or tight resource management is simply unavoidable.
Inter-process / inter-platform messaging is not "just another" simple line of code (SLOC).
# A sample demo-code snippet           # Issues the demo-code has left to be resolved
#------------------------------------  #------------------------------------------------
def publish( frontend_url, message ):  # what is the benefit of a per-call OneStopPUBLISH function?
    context = zmq.Context()            # .Context() has to be .term()-ed (!)
    socket = context.socket(zmq.PUB)   # is this indeed "a disposable" for each call?
    socket.connect(frontend_url)       # what transport-class is used for .connect()/.bind()?
    time.sleep(0.1)                    # wait for the connection to be established
    socket.send(message)               # ^ has no control over low-level "connection" handshaking
Anybody may draft a few one-liners and put in a decent effort (their own or community-outsourced) to make them finally work, at least somehow.
However, this is a field of vast capabilities, and as such it requires a bit of reshaping one's mind before its potential can become unlocked and fully utilised.
Sketching a need for a good solution, but on wrong grounds or from misunderstood SLOCs (copy/pasted or not), typically does not yield anything reasonable for the near future, still less for the farther one.
Messaging simply introduces a new paradigm, a new macro-cosmos, of building automation at a wider scale. Surprisingly, your (deterministic) code becomes a member of a more complex set of Finite State Automata (FSA) that, not so surprisingly, as we intend to do some "MESSAGING", speak among each other.
For that, there needs to be some [local-resource-management], some "outer" [transport], and some "formal behaviour model etiquette" (so that no one shouts over another) per [communication-primitive].
This is typically in-built into ZeroMQ, nanomsg and other libraries.
However, there are two important things that remain hidden.
The micro-cosmos of how things work internally (many, if not all, attempts to tweak this, instead of doing one's best to make proper use of it, are typically a waste of time).
The macro-cosmos of how to orchestrate a non-trivial herd of otherwise trivial elements [communication-primitives] into a ROBUST, SCALABLE messaging ARCHITECTURE that co-operates across process/localhost/network boundaries and meets the overall design needs.
Failure to understand the distance between these two worlds typically causes poor use of the greatest strengths we have received pre-cooked in the messaging libraries.
Simply the best thing to do is to forget the one-liner tweaking approaches; they are not productive.
Understanding the global view first allows you to harness the powers that will work best towards your goals.
Why is it so complex?
( courtesy nanomsg.org )
Any non-trivial system is complex, both in the TimeDOMAIN and in the ResourcesDOMAIN; the more so if one strives to create a stable, smart, high-performance, low-latency, transport-class-agnostic universal communication framework.
The good news is that this has already been elaborated and built into the micro-cosmos architecture.
The bad news is that this does not solve your needs right out of the box (except for some really trivial cases).
Here we come to the macro-COSMOS design.
It is your responsibility to design a higher-level algorithm for how to make many isolated FSA-primitives converse and reach agreement within the evolving many-to-many conversation. Yes, the library gives you "just" primitive building blocks (very powerful ones, without doubt). But it is your responsibility to make the "outer space" work for your needs.
And this can be, and typically is, complex.
Well, if it were trivial, it would most probably have already been included "inside" the library, wouldn't it?
Where to go next?
Perhaps the best next step is, IMHO, to take a step towards a slightly more global view, which may (and will) sound complicated for the first few things one tries to code with ZeroMQ. Jump at least to page 265 of Pieter Hintjens' book, Code Connected, Volume 1, if you are not reading it step by step anyway.
One can start to realise how it is possible to "program" the macro-COSMOS of FSA-primitives, so as to form a higher-order FSA-of-FSAs that can and will solve all the ProblemDOMAIN-specific issues.
First take an unbiased view of Fig. 60, Republishing Updates, and Fig. 62, HA Clone Server pair, and only after that go back to the roots, elements and details.

How to know if message was published to a queue using rabbitmq routing features

I've been working on a project which uses rabbitmq to communicate. Recently we discovered that it would be much more scalable if we used rabbit's routing feature. So basically we bind the queue to several routing keys and use an exchange of type direct.
It's working like publish/subscribe. So it's possible to bind and unbind the queue to different events so consumers/subscribers only receive messages to which they're interested.
Of course, the producer/publisher now uses the binding key (event name) as routing_key to pass it to the pika implementation. However, when it publishes something for a binding that doesn't exist, the message is lost, i.e. when nobody has bound a queue for event foo, but some publisher calls pika.basic_publish(..., routing_key='foo').
So my question is:
Is it possible to know if the message was actually published in a queue?
What I've tried:
Checking the return value of pika.basic_publish. It always returns None.
Check if there's an exception when we try to publish for a binding that doesn't exist. There is none.
Having an additional queue to make out of band control (since all subscribers are run by the same process). This approach doesn't feel ideal to me.
Additional info
Since I'm using this routing feature, the queue names are generated by rabbit. I don't have any problem if the new approach has to name the queue itself.
If a new approach is suggested which requires binding to exchanges instead of queues, I would like to hear them, but I would prefer to avoid them as they're not actually AMQP and are an extension implemented by rabbitmq.
pika version is 0.9.5
rabbitmq version is 2.8
Thanks a lot
I believe the answer to your problem is the Mandatory flag in RabbitMQ:
This flag tells the server how to react if a message cannot be routed to a queue. Specifically, if mandatory is set and after running the bindings the message was placed on zero queues then the message is returned to the sender (with a basic.return). If mandatory had not been set under the same circumstances the server would silently drop the message.
This basically means, enqueue a message and if it can't be routed then return it to me. Take a look at basic_publish in the specification to turn it on.
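A sketch of the publishing side, hedged: pika 0.9.5 (the version in the question) already accepts a `mandatory` kwarg on basic_publish, though return-handling there differs from modern pika 1.x, where after channel.confirm_delivery() an unroutable mandatory publish raises pika.exceptions.UnroutableError. The wrapper and the fake channel below are illustrative, so the wiring can be checked without a broker:

```python
# Publish with mandatory=True: the broker returns (rather than
# silently drops) a message that matches zero queue bindings.

def publish_or_fail(channel, exchange, routing_key, body):
    """Illustrative wrapper; with a real pika 1.x BlockingChannel,
    call channel.confirm_delivery() first and catch UnroutableError."""
    channel.basic_publish(exchange=exchange,
                          routing_key=routing_key,
                          body=body,
                          mandatory=True)

class FakeChannel:
    """Stand-in that records the call for offline checking."""
    def __init__(self):
        self.calls = []
    def basic_publish(self, **kwargs):
        self.calls.append(kwargs)

ch = FakeChannel()
publish_or_fail(ch, "events", "foo", b"payload")
```

With a real connection, a raised UnroutableError (or a basic.return callback on older pika) is your "the message was not published to any queue" signal.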
It may be possible to use a dead-letter exchange to store messages that have not been consumed: http://www.rabbitmq.com/dlx.html
I am not sure this is exactly what you are looking for, but it could be used as part of a solution.

Multiple consumers & producers connected to a message queue, Is that possible in AMQP?

I'd like to create a farm of processes that are able to OCR text.
I've thought about using a single queue of messages which is read by multiple OCR processes.
I would like to ensure that:
each message in queue is eventually processed
the work is more or less equally distributed
an image will be parsed only by one OCR process
An OCR process won't get multiple messages at once (so that any other free OCR process can handle the message).
Is that possible to do using AMQP?
I'm planning to use python and rabbitmq
Yes, as @nailxx points out. The AMQP programming model is slightly different from JMS in that you only have queues, which can be shared between workers or used privately by a single worker. You can also easily set up RabbitMQ to do pub/sub use cases, or what JMS calls topics. Please go to our Getting Started page on the RabbitMQ web site to find a ton of helpful info about this.
Now, for your use case in particular, there are already plenty of tools available. One that people are using a lot, and that is well supported, is Celery. There is a blog post about it that I think will help you get started.
If you have any questions please email us or post to the rabbitmq-discuss mailing list.
Yes, that's possible. The server cluster for a real-time MMO game I'm working on operates this way. We use ActiveMQ, but I think all of this is possible with RabbitMQ as well.
All the items that you mentioned you get out of the box, except the last one.
each message in queue is eventually processed - this is one of main responsibilities of message brokers
the work is more or less equally distributed - this is another one :)
an image will be parsed only by one OCR process - the distinction between /topic and /queue exists for this. Topics are like broadcast signals, queues are tasks. You need a /queue in your scenario.
To make the last one work in the desired way, consumers send an AMQ-specific option when subscribing to the queue:
activemq.prefetchSize: 1
This setting guarantees that a consumer will not take any more messages after it has taken one, until it sends an ack to AMQ. The RabbitMQ equivalent is basic.qos with prefetch_count=1.
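With pika against RabbitMQ, that setting looks roughly like the sketch below; the queue name and handler are illustrative, and a fake channel stands in for a live broker so the wiring is checkable offline:

```python
# basic_qos(prefetch_count=1) makes the broker hand each consumer
# at most one unacknowledged message at a time, so a free OCR worker
# (rather than a busy one) gets the next image.

def setup_worker(channel, queue, handler):
    channel.basic_qos(prefetch_count=1)   # one unacked message per worker
    channel.basic_consume(queue=queue, on_message_callback=handler)

class FakeChannel:
    """Records the calls; a stand-in for a pika 1.x BlockingChannel."""
    def __init__(self):
        self.qos = None
        self.consumes = []
    def basic_qos(self, prefetch_count):
        self.qos = prefetch_count
    def basic_consume(self, queue, on_message_callback):
        self.consumes.append(queue)

ch = FakeChannel()
setup_worker(ch, "ocr_tasks", handler=lambda *args: None)
```

The handler must channel.basic_ack() each message when OCR finishes; otherwise the broker will never dispatch the next one to that worker.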

Categories

Resources