Cleanest method of inter-program communication on separate devices - python

I need to settle on a method of communication between multiple programs spread out over multiple machines. The data itself is fairly straightforward, comprising variable-length vectors with some metadata descriptors.
Each program "type" will send data to one other program type, and expects a reply from it.
The number of connections a given program has is variable over time, and programs can be added or removed at any moment.
Programs may be distributed across several processors, which may use different operating systems, or could be microprocessors.
I have not had to really contend with such an inter-communication puzzle before and am unsure of what the cleanest approach would be. I've heard that ROS might be useful, but that it may not be suitable for Windows environments and has a steep learning curve. I can imagine creating a database between the nodes which would act as a kind of blackboard, but this feels like it may be very inefficient. If sockets would address the efficiency concerns, how would the connections be managed and maintained?
My main development environment at the moment is Julia, but I can port the data over to C++ or Python without too much discomfort.
I am open to ideas and learning new things, but I can't figure out which path to take. Any nuggets of wisdom would be greatly appreciated.
As requested in a comment, here is some additional information:
Data: The data is a vector of floating-point numbers, which can vary in size from a single value to thousands of values (maybe more depending on other aspects of the project, but I'm capping it for prototyping purposes). The metadata is in JSON format and provides information such as the total length of the array and a few identifiers (all integers).
Reliability: I can handle a few missed messages here and there, but I need the data in the order it was sent: if a packet arrives after a more recent one, it will be ignored. (In case you were instead referring to data integrity, the data must not be corrupted in transit.)
Latency: As little as possible, since the program could start to introduce errors otherwise. That said, for prototyping purposes I can live with tens of milliseconds on average and occasional spikes into the hundreds of milliseconds. I would very much like to keep this down, though.
Standard IP Facilities: You may assume all devices have these, yes.
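For concreteness, here is a minimal framing sketch for data of this shape: a JSON metadata header followed by the raw float64 payload, length-prefixed so it can travel over any stream transport (TCP socket, pipe, or a broker's message body). The field names and sequence-number scheme are illustrative, not fixed:

```python
import json
import struct

def pack_message(values, sender_id, seq):
    """Frame one vector: 4-byte header length, JSON metadata, float64 payload."""
    payload = struct.pack(f"<{len(values)}d", *values)
    meta = json.dumps({"len": len(values), "sender": sender_id, "seq": seq}).encode()
    return struct.pack("<I", len(meta)) + meta + payload

def unpack_message(buf):
    """Inverse of pack_message; returns (metadata dict, list of floats)."""
    (meta_len,) = struct.unpack_from("<I", buf, 0)
    meta = json.loads(buf[4:4 + meta_len])
    values = struct.unpack_from(f"<{meta['len']}d", buf, 4 + meta_len)
    return meta, list(values)

# The receiver can drop stale packets by tracking the highest seq seen so far.
meta, values = unpack_message(pack_message([1.0, 2.5, 3.0], sender_id=7, seq=1))
assert values == [1.0, 2.5, 3.0]
```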

Perhaps take a look at something well established (and therefore able to hook up to whichever language(s) you end up using), like Redis or RabbitMQ, since you'll need to communicate with a number of different devices over a network. Either way, having some form of central hub that handles communication among the other devices would be advisable.
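For example, a minimal Redis pub/sub sketch in Python, assuming a Redis server is reachable and the redis-py client is installed; channel names and message fields are illustrative:

```python
import json
import redis  # pip install redis

r = redis.Redis(host="redis-host", port=6379)  # hypothetical hub address

# Producer: publish a vector plus its metadata on a channel per program type.
r.publish("vectors.typeA", json.dumps({"id": 42, "seq": 1, "values": [0.1, 0.2]}))

# Consumer: subscribe, process each message, and publish a reply.
p = r.pubsub()
p.subscribe("vectors.typeA")
for item in p.listen():
    if item["type"] != "message":
        continue  # skip subscribe confirmations
    data = json.loads(item["data"])
    r.publish("replies.typeA", json.dumps({"id": data["id"], "ok": True}))
```

RabbitMQ would give you similar publish/consume semantics plus per-consumer queues, at the cost of running a broker either way.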

Related

Data communication between local apps (two in C#, one in Python) where speed is the priority

To keep things simple, assume I am trying to build three apps: one in Python and two in C#.
C# App A will collect data from APIs, process it, and distribute it to the Python app and C# App B.
The operating system is Win10 for clarity.
The data will not be huge, probably in the range of 1-20 KB most of the time, but it is updated every 25 ms or so in the form of complex JSON arrays.
The problem I have is speed: 1 ms is a long time to transfer data between the apps as far as this project is concerned.
I have tried named pipes, and at first glance they seem slower than I expected: sending a single message takes about 9 ms, though this could be due to me not implementing it properly (using NamedPipeServerStream in one-way mode).
I then tried non-persistent memory-mapped files, which seem to do the job, returning the data in about 1 ms.
The only issue is that I need to somehow tell the other apps when there is data to be read, and possibly how long that data is; I have not done enough research on this latter problem yet.
Am I missing a trick here; should pipes be faster than this?
Is memory mapping the way to go? If so, is there a way to be notified of a change in the data without checking every 1 ms and comparing whether the data is the same?
Or am I missing a better way of doing this that I have not found?
(since I am new to coding, probably the latter)
I don't want to be spending ages trying new things, and going around in circles to only find out later there is a much better way after all.
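One hedged pattern for the notification problem above: keep the payload in the memory-mapped file and send a tiny out-of-band signal (here a one-byte UDP datagram) to wake the reader, so neither side needs a 1 ms polling loop. A minimal Python sketch; the file name, port, and 4-byte length header are illustrative choices:

```python
import json
import mmap
import socket
import struct

MAP_FILE = "shared.dat"             # hypothetical file backing the shared map
MAP_SIZE = 64 * 1024                # 64 KB comfortably covers 1-20 KB payloads
NOTIFY_ADDR = ("127.0.0.1", 50007)  # arbitrary local port for wake-up signals

def create_backing_file():
    # The file must exist at full size before it can be mapped.
    with open(MAP_FILE, "wb") as f:
        f.write(b"\x00" * MAP_SIZE)

def write_update(payload):
    data = json.dumps(payload).encode("utf-8")
    with open(MAP_FILE, "r+b") as f:
        mm = mmap.mmap(f.fileno(), MAP_SIZE)
        mm[0:4] = struct.pack("<I", len(data))  # length header answers "how long?"
        mm[4:4 + len(data)] = data
        mm.close()
    # A one-byte datagram wakes the reader; no polling loop needed.
    socket.socket(socket.AF_INET, socket.SOCK_DGRAM).sendto(b"\x01", NOTIFY_ADDR)

def read_updates():
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(NOTIFY_ADDR)
    with open(MAP_FILE, "r+b") as f:
        mm = mmap.mmap(f.fileno(), MAP_SIZE)
        while True:
            sock.recv(1)  # blocks until the writer signals
            (length,) = struct.unpack("<I", mm[0:4])
            print(json.loads(mm[4:4 + length].decode("utf-8")))
```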

twisted python request/response message and substantial binary data transfer

I am trying to implement a server using python-twisted with potential C# and ObjC clients. I started with LineReceiver and that works well for basic messaging, but I can't figure out the best approach for something more robust. Any ideas for a simple solution for the following requirements?
Request and response
ex. send message to get a status, receive status back
Receive binary data transfer (non-trivial, but not massive; less than a few megs)
ex. bytes of a small png file
AMP seems like a feasible solution for the first scenario, but may not be able to handle the size for the data transfer scenario.
I've also looked at full-blown SOAP but haven't found a decent enough example to get me going.
I like AMP a lot. twisted.protocols.amp is moderately featureful and relatively easily testable (although documentation on how to test applications written with it is a little lacking).
The command/response abstraction AMP provides is comfortable and familiar (after all, we live in a world where HTTP won). AMP avoids the trap of excessive complexity (seemingly for the sake of complexity) that SOAP fell squarely into. But it's not so simple you won't be able to do the job with it (like LineReceiver most likely is).
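To make that concrete, here is a minimal sketch of an AMP command and responder, assuming a reasonably recent Twisted; the command and field names are made up:

```python
from twisted.protocols.amp import AMP, Command, Integer, String

class GetStatus(Command):
    # Hypothetical command: ask the server for a component's status.
    # amp.String carries bytes; use amp.Unicode if you want text.
    arguments = [(b'component', String())]
    response = [(b'status', String()), (b'uptime', Integer())]

class StatusProtocol(AMP):
    @GetStatus.responder
    def get_status(self, component):
        # Return a dict whose keys match the declared response schema.
        return {'status': b'ok', 'uptime': 1234}
```

On the client side, callRemote(GetStatus, component=b'db') returns a Deferred that fires with the response dictionary.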
There are intermediate steps - for example, twisted.protocols.basic.Int32StringReceiver gives you a more sophisticated framing mechanism (32-bit length prefixes instead of magic-bytes-terminated lines) - but in my opinion AMP is a really good first choice for a protocol. You may find you want to switch to something else later (one size really does not fit all) but AMP is at the sweet spot between features and simplicity that seems like a good fit for a very broad range of applications.
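For comparison, that length-prefixed framing takes only a few lines; a minimal echo sketch (the port is arbitrary):

```python
from twisted.internet import protocol, reactor
from twisted.protocols.basic import Int32StringReceiver

class Echo(Int32StringReceiver):
    def stringReceived(self, data):
        # Each sendString() on one side arrives as exactly one
        # stringReceived() on the other; framing is handled for you.
        self.sendString(data)

factory = protocol.Factory()
factory.protocol = Echo
reactor.listenTCP(8750, factory)
reactor.run()
```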
It's true that there are some built-in length limits in AMP. This is a long standing sore spot that is just waiting for someone with a real-world need to address it. :) There is a fairly well thought-out design for lifting this limit (without breaking protocol compatibility!). If AMP seems otherwise appealing to you then I encourage you to engage the Twisted development community to find out how you can help make this a reality. ;)
There's also always the option of using AMP for messaging and to set up another channel (eg, HTTP) for transferring your larger chunks of data.

Serverless communication between network nodes in python

I am not sure if this question belongs here, as it may be a little too broad. If so, I apologize. Anyway, I am planning to start a project in Python and I am trying to figure out how best to implement it, or if it is even possible in any practical way. The system will consist of several "nodes" that are essentially Python scripts translating other protocols for talking to different kinds of hardware (I/O, relays to control things, inputs to measure things, RFID readers, etc.) into a common protocol for my system. I am no programming or network expert, but this part I can handle; I have a module from an old alarm system that uses RS-485 that I can successfully control and read. I want to get the nodes talking to each other over the network so I can distribute them to different locations (on the same subnet for now). The obvious way would be to use a server that they all connect to, so they can be polled and given orders to flip outputs or do something else. This should not be too hard using Twisted or something like it.
The problem with this is that if this server stops working for some reason, everything else does too. I guess what I would like is some kind of serverless communication that has no single point of failure besides the network itself. Message brokers all seem to require some kind of server, and I cannot really find anything else that seems suitable for this. All nodes must know the status of all other nodes, as I will need to be able to make functions based on the status of things connected to other nodes, such as: do not open this door if that door is already open. Maybe this could be done by multicast or broadcast, but that seems a bit insecure and just not right. One way I thought of would be to somehow appoint one of the nodes to accept connections from the others and act as a message router, and arrange for some kind of backup so that if this node crashes or goes away, another predetermined node takes over and the other nodes connect to it instead. This seems complicated, and I am not sure it is any better than just using a message broker.
As I said, I am not sure this is an appropriate question here, but I would appreciate a hint on how this could be done, or a pointer to something that does something similar that I can study. If I am being stupid, please let me know that too :)
There are messaging systems that don't require a central message broker. You might start by looking at ZeroMQ.
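As a hedged illustration of the brokerless style: each node binds its own PUB socket and connects a SUB socket to every known peer, so no single process is special. A minimal pyzmq sketch, with illustrative peer addresses and topics, and peer discovery left out:

```python
import zmq  # pip install pyzmq

ctx = zmq.Context()

# Every node binds its own PUB socket...
pub = ctx.socket(zmq.PUB)
pub.bind("tcp://*:5555")

# ...and connects a SUB socket to each known peer. No broker, no single
# point of failure beyond the network itself.
sub = ctx.socket(zmq.SUB)
for peer in ("tcp://192.168.1.11:5555", "tcp://192.168.1.12:5555"):
    sub.connect(peer)
sub.setsockopt_string(zmq.SUBSCRIBE, "door/")  # prefix-based topic filter

pub.send_string("door/front open")  # announce this node's status to everyone

while True:
    print(sub.recv_string())  # e.g. "door/back closed" from another node
```

Note ZeroMQ's "slow joiner" caveat: messages published before subscribers finish connecting are silently dropped, so real code should allow for that.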
I ended up making my own serverless messaging system in python. It is not ready and the code is a mess, but it works. It has autodiscovery of nodes, sharing of channels and topics, and the features that I needed. If anyone is interested in using it or wants to help, it is here:
https://bitbucket.org/ssspeq/connection-manager

Dataflow computing in python

I have n (typically n < 10, but it should scale) processes running on different machines and communicating through AMQP using RabbitMQ. Processes are typically long-running and may be implemented in any language (though most are Java/Python).
Each process requires a number of inputs (numbers/strings) and produces a number of outputs (also just numbers or strings). Executing a process happens asynchronously: you send a message to its input queue and wait for a callback to be triggered by the output queue.
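For reference, a minimal sketch of that send-then-wait-for-callback pattern using the pika client; queue names and the payload are illustrative:

```python
import pika  # pip install pika

conn = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
ch = conn.channel()
ch.queue_declare(queue="proc.in")
ch.queue_declare(queue="proc.out")

# Kick off the process by sending its inputs...
ch.basic_publish(exchange="", routing_key="proc.in", body=b"3,4")

# ...then wait for the output queue to trigger our callback.
def on_output(channel, method, properties, body):
    print("result:", body)
    channel.stop_consuming()

ch.basic_consume(queue="proc.out", on_message_callback=on_output, auto_ack=True)
ch.start_consuming()
```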
Ideally the user specifies some inputs and desired outputs and the system should:
detect which processes are needed and generate the dependency graph
topologically sort the graph and execute it; node transitions will need to be event-driven
A node should fire if its input is ready, allowing parallelism per branch. I can assume no cycles for now, but eventually there will be cycles (e.g., two processes may need to iterate until the output no longer changes).
This should be a known problem from (data)flow programming (discussed here before) and I want to avoid reinventing the wheel. I would prefer a Python solution, and a search leads to Trellis and Pypes. Trellis is no longer developed but seems to support cycles, while Pypes does not. I'm also not sure how actively developed Pypes is.
Further searches reveal a whole list of event based programming frameworks, none of which I am particularly knowledgeable about. There are of course workflow environments like Taverna and KNIME, but that seems overkill.
Does anybody have any experience tackling this type of problem or with the libraries mentioned?
Edit: Other libraries I found are:
Stream
zflow
pyf
javafbp (Java)
python.org has a Wiki page on "Flow Based Programming" -- http://wiki.python.org/moin/FlowBasedProgramming
The bottom line is that if you can reinvent the wheel in a small number of lines of code (a few hundred) which you completely understand and can document, then do it.
This is an area where the abstractions used are not that hard to implement given some basic foundation tools. RabbitMQ is such a tool. Node.js is another. There are lots of libraries around that implement useful ways to manage dataflows, workflows, finite state machines, etc., but they have a lot of overlap and they tend to be incomplete. Probably the original developer just built enough to get over his initial problem, and since this type of programming was not that popular, there was not the critical mass to keep development going.
There is a lot to be said for ranking all the possible solutions by popularity, picking the most popular one, and putting your effort into making it work (while sharing your work, of course).
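As a hedged example of how small that wheel can be, the scheduling core fits in a few lines with the stdlib graphlib module (Python 3.9+); run() here is a stand-in for publishing to a process's input queue and waiting for its output callback:

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter  # stdlib since Python 3.9

# Edges map each node to its dependencies: b and c need a, d needs b and c.
graph = {"b": {"a"}, "c": {"a"}, "d": {"b", "c"}}

def run(node):
    print("running", node)  # real version: publish to the node's input queue
    return node

ts = TopologicalSorter(graph)
ts.prepare()
pending = set()
with ThreadPoolExecutor() as pool:
    while ts.is_active():
        for node in ts.get_ready():       # every node whose inputs are ready
            pending.add(pool.submit(run, node))
        done, pending = wait(pending, return_when=FIRST_COMPLETED)
        for f in done:
            ts.done(f.result())           # finishing a node may unblock others
```

Independent branches (b and c above) run in parallel, and each node fires as soon as its inputs are ready, which matches the event-driven requirement.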

C/Python Socket Performance?

My question simply relates to the difference in performance between a socket in C and one in Python. Since my Python build is CPython, I assume it's similar, but I'm curious whether someone has "real" benchmarks, or at least an evidence-based opinion.
My logic is as follows:
If a C socket is much faster, then write a C extension.
If there is not/barely a difference, then keep writing in Python and figure out how to obtain packet-level control (Scapy? dpkt?).
I'm sure someone will want to know for either context or curiosity. I plan to build a sort of proxy for myself (not for internet browsing, anonymity, etc) and will bind the application I want to use with it to a specific port. Then, all packets on said port will be queued, address header modified, and then sent, etc, etc.
Thanks in advance.
In general, sockets in Python perform just fine. For example, the reference implementation of the BitTorrent tracker server is written in Python.
When doing networking operations, the speed of the network is usually the limiting factor. That is, any possible tiny difference in speed between C and Python's socket code is completely overshadowed by the fact that you're doing networking of some kind.
However, your description of what you want to do indicates that you want to inspect and modify individual IP packets. This is beyond the capabilities of Python's standard networking libraries, and is in any case a very OS-dependent operation. Rather than asking "which is faster?" you will need to first ask "is this possible?"
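To illustrate that OS-dependence, here is a minimal raw-socket sketch that captures one frame and parses its IPv4 header. It uses AF_PACKET, which is Linux-only and requires root, so treat it as a feasibility probe rather than a portable solution:

```python
import socket
import struct

# AF_PACKET is Linux-only and requires root; 0x0003 is ETH_P_ALL,
# i.e. "give me frames for every protocol".
sock = socket.socket(socket.AF_PACKET, socket.SOCK_RAW, socket.ntohs(0x0003))

frame, _ = sock.recvfrom(65535)
# Skip the 14-byte Ethernet header, then unpack the 20-byte fixed IPv4 header.
(ver_ihl, tos, total_len, ident, flags_frag,
 ttl, proto, checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", frame[14:34])
print("from", socket.inet_ntoa(src), "to", socket.inet_ntoa(dst), "proto", proto)
```

Modifying and re-injecting packets (as the proxy described above would need) requires yet more OS-specific machinery, which is the "is this possible?" question to settle first.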
I would think C would be faster, but Python would be a lot easier to manage and use.
The difference would be so small that you wouldn't notice it unless you were trying to send a massive amount of data (something stupid like 1 million GB/second, lol).
joe
