Python ZeroMQ broadcasting messages

I am going to implement Practical Byzantine Fault Tolerance (PBFT). Hence I will have multiple processes, and P0 is going to initialize a round by sending the first message.
Is it possible to broadcast a message to all other processes using ZeroMQ?
With PUB/SUB, I need to bind/connect sockets. But since I take the number of processes as an argument, it seems impractical to connect to all the other ports (I do not even know if this is possible). I could not write any code yet, since I am stuck at the very beginning.
Basically, once I find a way to connect the processes, I will do this:
The proposer selects a random message m and sends it to all validators.
Upon reception, each validator sends the message to the other validators and to the proposer.
If a validator (or the proposer) receives at least 2k messages from the other processes that are identical to its own, it proceeds to the next round of the consensus algorithm.
One more addition: the processes are going to communicate with each other directly. Connecting every process to every other process with REQ/REP sockets does not seem clever, though.

Is it possible to broadcast a message to all other processes using ZeroMQ?
Oh sure, it is.
That is exactly what all messaging and signalling tools, like ZeroMQ, nanomsg and others, were developed for.
The beauty and the trick of PBFT is that there ought to be no single point of failure, oughtn't there?
So any approach other than the message actually being sent and delivered to each of the other processes, one after another, will not help the PBFT, will it?
Feel free to sketch the solution; the port-mapping will not be your main issue in this. ZeroMQ can .bind()/.connect() in a very flexible manner. One may even create an ad-hoc, non-persistent connectivity setup in a similarly circular manner, if ports are indeed a scarce resource, so take a bit more courage and go get it done :o)
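To make that concrete, here is a minimal full-mesh sketch, assuming N processes numbered 0..N-1 on localhost and free ports 5550+i (both assumptions are purely illustrative): each process .bind()-s one PUB socket on its own port and .connect()-s one SUB socket to every peer, so a single send() reaches all the others.

import sys
import zmq

def make_mesh_sockets(i, n, base_port=5550):
    """Process i binds one PUB and SUBscribes to the other n-1 processes."""
    ctx = zmq.Context()
    pub = ctx.socket(zmq.PUB)
    pub.bind("tcp://*:%d" % (base_port + i))      # my broadcast endpoint
    sub = ctx.socket(zmq.SUB)
    sub.setsockopt(zmq.SUBSCRIBE, b"")            # receive everything
    for j in range(n):
        if j != i:                                # everyone else's endpoint
            sub.connect("tcp://localhost:%d" % (base_port + j))
    return ctx, pub, sub

if __name__ == "__main__":
    i, n = int(sys.argv[1]), int(sys.argv[2])     # this process id, total count
    ctx, pub, sub = make_mesh_sockets(i, n)
    # PBFT rounds would then broadcast with pub.send(...) and collect the
    # peers' messages with sub.recv() until 2k identical copies arrive.

Mind the PUB/SUB slow-joiner caveat discussed further down this page: a freshly connected SUB socket misses the first messages, so a real PBFT implementation would add a synchronization step before P0 starts the round.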

Related

Python multiprocessing: one worker, dynamic number of receivers of all worker data (1:n)

I am planning to set up a small proxy service for a remote sensor that only accepts one connection. I have a temporary solution, and I am now designing a more robust version, and have therefore dived deeper into the Python multiprocessing module.
I have written a couple of systems in Python using a main process which spawns subprocesses with the multiprocessing module and uses multiprocessing.Queue to communicate between them. This works quite well, and some of these programs/scripts are doing their job in a production environment.
The new case is slightly different, since it uses 2+n processes:
one data collector, which reads data from the sensor (at 100 Hz) and every once in a while receives short ASCII strings as commands,
one main server, which binds to a socket, listens for new connections and spawns...
n child servers, which handle clients who want to have the sensor data.
While communication from the child servers to the data collector seems pretty straightforward using a multiprocessing.Queue, which manages an n:1 connection well enough, I have problems with the other direction. I can't use a queue for that as well, because all child servers need to get all the data the sensor produces while they are active. At least I haven't found a way to configure a Queue to mimic that behaviour, since get() takes the topmost item out of the Queue by design.
I looked into shared memory already, which massively increases the management overhead, since as far as I understand it I would basically need to implement a streaming buffer myself.
The only safe way I see right now is using a Redis server and message queues, but I am a bit hesitant, since that would need more infrastructure than I'd like.
Is there a pure-Python, internal way?
Maybe you can use MQTT for that?
You did not clearly specify, but it sounds like the observer pattern -
or do you want the clients to poll each time they need data?
It depends on which delays / data rates / jitter etc. you can accept.
After you provided the information:
"The whole setup runs on one machine in one process space. What I would like to have is a way without going through a third-party process"
I would suggest checking out the observer pattern.
More information can be found, for example, at:
https://www.youtube.com/watch?v=_BpmfnqjgzQ&t=1882s
and
https://refactoring.guru/design-patterns/observer/python/example
and
https://www.protechtraining.com/blog/post/tutorial-the-observer-pattern-in-python-879
and
https://python-3-patterns-idioms-test.readthedocs.io/en/latest/Observer.html
Your server should fork for each new connection and register with the observer; it will therefore be informed about every change. A standard-library sketch of that fan-out follows.
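A minimal sketch of the observer-style fan-out using nothing but multiprocessing: the collector keeps one Queue per child server, so every child sees every sample. The names and the 3-child/5-sample sizes are illustrative only, not from the original post.

import multiprocessing as mp

def collector(queues):
    """Stands in for the 100 Hz sensor loop; broadcasts each sample."""
    for sample in range(5):
        for q in queues:          # put a copy on every child's queue
            q.put(sample)
    for q in queues:
        q.put(None)               # sentinel: tell every child to stop

def child_server(q, ident):
    while True:
        sample = q.get()
        if sample is None:
            break
        print("child %d got %r" % (ident, sample))

if __name__ == "__main__":
    queues = [mp.Queue() for _ in range(3)]
    children = [mp.Process(target=child_server, args=(q, i))
                for i, q in enumerate(queues)]
    for c in children:
        c.start()
    collector(queues)
    for c in children:
        c.join()

The main server would create a fresh Queue and register it with the collector whenever it spawns a child server, and unregister it when the client disconnects.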

Concurrent server in Golang

First of all, I have to admit that I am a beginner concerning concurrency in general, but I have been reading a lot about it recently, because I heard that Golang is strong in that area. I wanted to ask how (concurrent) servers are written in this language.
I mean, there are different ways to write a server that can handle multiple requests/connections concurrently. You can use threads or asynchronous programming (async/asyncio in Python, for example), and in Golang there are goroutines, which are more or less lightweight threads.
However, when using Python with async/asyncio you can have one single process and one thread and it is still able to handle concurrency. However, the code is complicated (at least for me, without any background in it).
My question:
What is the way to go to write a concurrent server in Golang? Just a new goroutine for every connection, or are there any asynchronous ways? What's the "best practice"?
I mean, is it not expensive to have LOTS of goroutines on a heavily used server? How does one write a good server in Golang?
For a beginner, the best way to start is to just use https://golang.org/pkg/net/http/ and write HTTP handlers. You don't need to initialize goroutines yourself - the http.Server will do it for you.
The code will be straightforward, with blocking calls. You don't need to think about concurrency at this stage, as Go will do it for you. For example, when you make a call like
record, err := someDb.GetRecordByID(123)
it is actually an asynchronous call that blocks the current flow but releases the thread to other goroutines. The flow continues once the data is returned and a thread (maybe a different one from before) becomes available.
If you need to make concurrent calls within one HTTP request, you can start goroutines. But leave that for a later stage and do the Go tour on concurrency first.
If you really need a high-load solution for HTTP requests, consider using https://github.com/valyala/fasthttp instead of the standard http package.
For HTTP, @icza's comments and Alexander's answer give a fair idea. Just to add: goroutines are not expensive, because they are lighter than normal threads. They have variable-sized stacks (probably starting as low as 2 KB) and hence can scale up very well with little operating overhead.
Also on HTTP, there are third-party libraries like Gorilla mux which can make life better, as can other frameworks like Buffalo, which you can explore. While I haven't used the latter, I have heard it makes life easier.
Now, if you are going to write your own custom server (something different from HTTP), then again Go is a great choice for it. The program can start as simply as https://golang.org/pkg/net/#example_Listener (to try running this program, you can use netcat like this from another terminal):
$ nc localhost 2000
Hellow
Hellow
And finally, channels in Go make sharing data and communicating across goroutines much easier and safer, taking care of the synchronization aspects. Hope this helps.
My question: What is the way to go to write a concurrent server in Golang? Just a new goroutine for every connection or are there any asynchronous ways? What's "best practice"?
Golang's http package will handle request concurrency for you, and I really like that the code looks synchronous and you don't need to add any async/await keywords. Here is how you start:
package main

import (
	"fmt"
	"log"
	"net/http"
)

func helloHandler(w http.ResponseWriter, r *http.Request) {
	fmt.Fprintf(w, "Hello")
}

func main() {
	http.HandleFunc("/hello", helloHandler)
	log.Fatal(http.ListenAndServe(":8080", nil))
}

(py)zmq/PUB: Is it possible to call connect() then send() immediately and not lose the message?

With this code, I always lose the message:

import zmq

def publish(frontend_url, message):
    context = zmq.Context()
    socket = context.socket(zmq.PUB)
    socket.connect(frontend_url)
    socket.send(message)
However, if I introduce a short sleep(), I do get the message:

import time
import zmq

def publish(frontend_url, message):
    context = zmq.Context()
    socket = context.socket(zmq.PUB)
    socket.connect(frontend_url)
    time.sleep(0.1)  # wait for the connection to be established
    socket.send(message)
Is there a way to ensure the message will be delivered, without sleeping between the calls to connect() and send()?
I'm afraid I can't predict the right sleep duration (network latencies, etc.).
UPDATE:
Context: I'd like to publish data updates from a Flask REST application to a message broker (e.g. on resource creation/update/deletion).
Currently, the message broker is drafted using the 0mq FORWARDER device.
I understand 0mq is designed to abstract away the TCP sockets and message-passing complexities.
In a context where connections are long-lived, I could use it.
However, when running my Flask app in an app container like gunicorn or uWSGI, I have N worker processes, and I can't expect either the connection or the process to be long-lived.
As I understand the issue, I should use a real message broker (like RabbitMQ) and a synchronous client to publish the messages there.
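For illustration, a minimal sketch of that synchronous-publish approach with pika (assuming pika 1.x; the queue name "updates" and the notify() helper are illustrative, not from the original post):

import pika

def notify(event_bytes):
    # One short-lived, synchronous publish per call: there is no
    # long-lived socket to lose when the worker process is recycled.
    connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
    channel = connection.channel()
    channel.queue_declare(queue="updates", durable=True)
    channel.basic_publish(exchange="", routing_key="updates", body=event_bytes)
    connection.close()

# e.g. inside a Flask view, after a resource is created:
# notify(b'{"event": "created", "id": 123}')

Opening a connection per request is deliberately naive, but it is safe for short-lived workers and easy to replace with a pooled connection later.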
You can't do this exactly, but there may be other solutions that would solve your problem.
Why are you using PUB/SUB sockets? The nature of pub/sub is more suited to long-running sockets, and typically you will bind() on the PUB socket and connect() on the SUB socket. What you're doing here, spinning up a socket to send one message, presumably to a "server" of some sort, doesn't really fit the PUB/SUB paradigm very well.
If you instead choose some variation of REQ or DEALER to REP or ROUTER, then things might go smoother for you. A REQ socket will hold a message until its pair is ready to receive it. If you don't particularly care about the response from the "server", then you can just discard it.
Is there any particular reason you aren't just leaving the socket open, instead of building a whole new context and socket and re-connecting each time you want to send a message? I can think of some limited scenarios where this might be the preferred behavior, but generally it's a better idea to just leave the socket up.
If you want to stick with PUB/SUB, then spin the socket up at the start of your app, sleep some safe period of time that covers any reasonable latency scenario, and then start sending your messages without worrying about re-connecting every time. If you leave this socket up for long periods of time without any new messages, you'll probably want to use heart-beating to make sure the connection stays open.
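A minimal sketch of the REQ variant described above (the endpoint and function name are illustrative): a REQ socket queues the outgoing message until the REP peer can actually receive it, so nothing is lost to a racing connect, and the reply is simply read and discarded.

import zmq

def publish_once(frontend_url, message):
    ctx = zmq.Context()
    req = ctx.socket(zmq.REQ)
    req.connect(frontend_url)
    req.send(message)     # held until the REP peer is ready to receive it
    req.recv()            # REQ/REP is lock-step: read and discard the reply
    req.close()
    ctx.term()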
From the ZMQ Guide:
There is one more important thing to know about PUB-SUB sockets: you do not know precisely when a subscriber starts to get messages. Even if you start a subscriber, wait a while, and then start the publisher, the subscriber will always miss the first messages that the publisher sends. This is because as the subscriber connects to the publisher (something that takes a small but non-zero time), the publisher may already be sending messages out.
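The Guide's usual workaround for this slow-joiner effect is an explicit handshake on a side channel. A minimal single-subscriber sketch is below; the ports 5561/5562 are arbitrary, and the Guide's own syncpub/syncsub examples note that even this may still need a short delay, because the SUB connect and the REQ connect race each other.

import zmq

def publisher():
    ctx = zmq.Context()
    pub = ctx.socket(zmq.PUB)
    pub.bind("tcp://*:5561")
    sync = ctx.socket(zmq.REP)
    sync.bind("tcp://*:5562")
    sync.recv()                   # block until the subscriber reports in
    sync.send(b"go")
    pub.send(b"first message")    # the SUB pipe now exists, so this arrives

def subscriber():
    ctx = zmq.Context()
    sub = ctx.socket(zmq.SUB)
    sub.connect("tcp://localhost:5561")
    sub.setsockopt(zmq.SUBSCRIBE, b"")
    sync = ctx.socket(zmq.REQ)
    sync.connect("tcp://localhost:5562")
    sync.send(b"ready")           # handshake: connected and subscribed
    sync.recv()
    print(sub.recv())             # receives the very first message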
Many posts here start with:
"I used .PUB/.SUB and it did not do the job I wanted it to do... Anyone here, do help me make it work the way I think it shall work out of the box."
This approach does not work in the real world, the less so in distributed-systems design, and the less in systems where near-real-time scheduling and/or tight resource management are simply unavoidable.
Inter-process / inter-platform messaging is not "just another" simple line of code (SLOC):
# A sample demo-code snippet              # Issues the demo-code has left to be resolved
#---------------------------------------  #------------------------------------------------
def publish( frontend_url, message ):     # what is a benefit of a per-call OneStopPUBLISH function?
    context = zmq.Context()               # .Context() has to be .Terminate()-d (!)
    socket  = context.socket( zmq.PUB )   # is this indeed "a disposable" for each call?
    socket.connect( frontend_url )        # what transport-class used for .connect()/.bind()?
    time.sleep( 0.1 )                     # wait for the connection to be established
    socket.send( message )                # ^ has no control over low-level "connection" handshaking
Anybody may draft a few one-liners and put in a decent effort (their own, or outsourced to the community) to make them finally work, at least somehow.
However, this is a field of vast capabilities, and as such it requires a bit of reshaping of one's mind to allow its potential to become unlocked and fully utilised.
Sketching a need for a good solution, but on wrong grounds or with misunderstood SLOCs (be they copy/pasted or not), typically does not yield anything reasonable for the near future, the less for the farther one.
Messaging simply introduces a new paradigm, a new macro-cosmos of building automation on a wider scale: surprisingly, your (deterministic) code becomes a member of a more complex set of Finite State Automata (FSA) that, not so surprisingly, as we intend to do some "MESSAGING", speak among each other.
For that, there needs to be some [local-resource-management], some "outer" [transport], and some "formal behaviour model etiquette" (so as not to shout one over another) per [communication-primitive].
This is typically built into ZeroMQ, nanomsg and other such libraries.
However, two important things remain hidden:
the micro-cosmos of how things work internally (many, if not all, attempts to tweak this, instead of making one's best to use it properly, are typically a waste of time), and
the macro-cosmos of how to orchestrate a non-trivial herd of otherwise trivial elements [communication-primitives] into a ROBUST, SCALEABLE messaging ARCHITECTURE that co-operates across process/localhost/network boundaries and meets the overall design needs.
Failure to appreciate the distance between these two worlds typically causes poor use of the greatest strengths we have received pre-cooked in the messaging libraries.
Simply the best thing to do is to forget the one-liner tweaking approaches. They are not productive.
Understanding the global view first allows you to harness the powers that will work best towards your goals.
Why is it so complex?
[ figure omitted: courtesy nanomsg.org ]
Any non-trivial system is complex, both in the time domain and in the resources domain. The more so if one strives to create a stable, smart, high-performance, low-latency, transport-class-agnostic universal communication framework.
The good news is that this has already been elaborated and built into the micro-cosmos architecture.
The bad news is that it does not solve your needs right out of the box (except for some really trivial cases).
Here we come to the macro-cosmos design.
It is your responsibility to design a higher-order algorithm for how to make many isolated FSA primitives converse and reach agreement through the evolving many-to-many conversation. Yes, the library gives you "just" primitive building blocks (very powerful ones, no doubt), but it is your responsibility to make the "outer space" work for your needs.
And this can be, and typically is, complex.
Well, if it were trivial, it would most probably already have been included "inside" the library, wouldn't it?
Where to go next?
Perhaps the best next step, IMHO, is to move towards a slightly more global view. It may sound complicated for the first few things one tries to code with ZeroMQ, but jump at least to page 265 of Pieter Hintjens' book, Code Connected, Volume 1, if you are not reading it step by step from the start.
There one can start to realise how it is possible to "program" the macro-cosmos of FSA primitives so as to form a higher-order FSA-of-FSAs that can and will solve all the problem-domain-specific issues.
First take a look at the Fig. 60 Republishing Updates and Fig. 62 HA Clone Server pair, and only after that go back to the roots, elements and details.

Writing a Python data analysis server for a Java interface

I want to write data-analysis plugins for a Java interface. This interface is potentially run on different computers. The interface sends commands, and the Python program may return large amounts of data. The interface is distributed by a Java Webstart system. Both access the main data from a MySQL server.
What are the different ways to implement the communication, and what are their advantages? Of course, I've done some research on the internet; while there are many suggestions, I still don't know what the differences are and how to decide on one. (I have no knowledge of them.)
I've found a suggestion to use sockets, which seems fine. Is it simple to write a server that dedicates a Python analysis process to each connection (temporary data might be kept after one communication request for that particular client)?
I was thinking of learning how to use sockets and passing YAML strings.
Maybe my main question is: What is the relation to and advantage of systems like RabbitMQ, ZeroMQ, CORBA, SOAP, XMLRPC?
There were also suggestions to use pipes or shared memory. But those wouldn't fit my requirements, would they?
Do any of the methods have advantages for debugging, or other peculiarities?
I hope someone can help me understand the technology and help me decide on a solution, as it is hard to judge from technical descriptions.
(I do not consider solutions like Jython, JEPP, ...)
Offering an opinion on the merits you described: it sounds like you are dealing with potentially large data/queries that may take a lot of time to fetch and serialize, in which case you definitely want to go with something that can handle concurrent connections without stacking up threads. In the Python domain, therefore, I can't recommend any networking library other than Twisted.
http://twistedmatrix.com/documents/current/core/examples/
Whether you decide to use vanilla HTTP or your own protocol, Twisted is pretty much the one-stop shop for concurrent networking. Sure, the name gets thrown around a lot, and the documentation is Atlantean, but if you take the time to learn it there is very little in the networking domain you cannot accomplish. You can extend the base protocols and factories to make one server that handles your data in a reactor-based event loop and responds to deferred requests when ready, along the lines of the sketch below.
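A minimal line-based sketch of such a server (the port number and the handle_command() stub are assumptions for illustration, not anything from the original answer):

from twisted.internet import protocol, reactor
from twisted.protocols.basic import LineReceiver

def handle_command(line):
    # placeholder for the real analysis; echo the command back, tagged
    return b"result: " + line

class AnalysisProtocol(LineReceiver):
    def lineReceived(self, line):
        self.sendLine(handle_command(line))   # one reply per command line

class AnalysisFactory(protocol.Factory):
    def buildProtocol(self, addr):
        return AnalysisProtocol()             # one protocol object per client

reactor.listenTCP(8007, AnalysisFactory())
reactor.run()

Because buildProtocol() creates one protocol object per connection, per-client temporary state can simply live on that object between requests.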
The serialization format really depends on the nature of the data. Will there be any binary in what is output as a response? Complex types? If so, that rules out JSON, though JSON is becoming the most common serialization format. YAML sometimes seems to enjoy a position of privilege in the Python community; I haven't used it extensively, as most of the serialization work I've done was for data to be rendered in a frontend with JavaScript.
Message queues are really the most important tool in the toolbox when you need to defer background tasks without hanging a response. They are commonly employed in web apps where the HTTP request should not hang until whatever complex processing needs to take place completes, so the UI can render early and count on an implicit "promise" that the processing will happen. They have two important traits: they rely on eventual consistency, in that the process can finish long after the response in the protocol is sent; and they have fail-safe and try-again directives should a task fail. They are where you turn in the "do this really hard task as soon as you can, and I trust you to get it done" problem domain.
If we are not talking about potentially HUGE response bodies, and the serialized output contains relatively simple data types, there is nothing wrong with rolling a simple HTTP deferred server in Twisted.

Multiple consumers & producers connected to a message queue - is that possible in AMQP?

I'd like to create a farm of processes that are able to OCR text.
I've thought about using a single queue of messages which is read by multiple OCR processes.
I would like to ensure that:
each message in queue is eventually processed
the work is more or less equally distributed
an image will be parsed only by one OCR process
An OCR process won't get multiple messages at once (so that any other free OCR process can handle the message).
Is that possible to do using AMQP?
I'm planning to use Python and RabbitMQ.
Yes, as @nailxx points out. The AMQP programming model is slightly different from JMS in that you only have queues, which can be shared between workers or used privately by a single worker. You can also easily set up RabbitMQ for PubSub use cases, or what JMS calls topics. Please go to the Getting Started page on the RabbitMQ web site to find a ton of helpful info about this.
Now, for your use case in particular, there are already plenty of tools available. One that people are using a lot, and that is well supported, is Celery. Here is a blog post about it that I think will help you get started:
If you have any questions please email us or post to the rabbitmq-discuss mailing list.
Yes, that's possible. The server cluster for a real-time MMO game I'm working on operates this way. We use ActiveMQ, but I think all of this is possible with RabbitMQ as well.
You get all the items you mentioned out of the box, except the last one:
each message in the queue is eventually processed - this is one of the main responsibilities of message brokers;
the work is more or less equally distributed - this is another one :)
an image will be parsed only by one OCR process - this is what the distinction between /topic and /queue exists for. Topics are like broadcast signals, queues are tasks. You need a /queue in your scenario.
To make the last item work in the desired way, consumers send an ActiveMQ-specific argument when subscribing to the queue:
activemq.prefetchSize: 1
This setting guarantees that the consumer will not take any more messages after it has taken one, until it sends an ack back to ActiveMQ. I believe something similar exists in RabbitMQ; a sketch of it follows.
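In RabbitMQ the equivalent knob is the channel prefetch count. A minimal pika sketch (assuming pika 1.x; the queue name "ocr_tasks" and the run_ocr() stub are illustrative):

import pika

def run_ocr(image_bytes):
    pass  # placeholder for the real OCR work

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()
channel.queue_declare(queue="ocr_tasks", durable=True)
channel.basic_qos(prefetch_count=1)   # at most one unacked message per worker

def on_message(ch, method, properties, body):
    run_ocr(body)
    ch.basic_ack(delivery_tag=method.delivery_tag)   # ack only when done

channel.basic_consume(queue="ocr_tasks", on_message_callback=on_message)
channel.start_consuming()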
