Synchronize sensor data over the Internet upon connection - Python

I have a subsystem that collects sensor data and posts it over the Internet via TCP or UDP requests to a server, authorizing with a token. All posted data is also saved to a local capped MongoDB collection.
I want this system to be tolerant of network failures and outages, and to be able to synchronize the data once the network comes back.
What is the correct way to implement this without re-inventing the wheel?
I see several options:
MongoDB replication.
PROs:
Replication tools exist
CONs:
How would I do that in real time? Maintaining two systems - posting one way when the system is online and another way when it is offline - seems like a bad idea.
I have no idea how to manage access tokens (I don't want to give clients direct access to the server database)
The server-side schema has to match the local one (but this could be a PRO, since manual import then becomes trivial)
Maintaining 'last ACKed' records and re-transmitting once in a while (a sketch of this follows the list below).
PROs:
Allows for different data schemas locally and on the server side
Works
CONs:
The logic is complex (detecting failures, monitoring network connectivity, etc.)
This is exactly the 'reinventing the wheel' I want to avoid
Manually backfeeding data is hardly possible (e.g. when the system has been completely disconnected for a long time and the data is restored from backups).
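
To make the second option concrete, here is a minimal store-and-forward sketch of the 'last ACKed' idea, assuming pymongo and requests; the ingest URL, token handling and collection names are made up for illustration:

import time
import pymongo
import requests

client = pymongo.MongoClient()
readings = client.sensordb.readings     # the local capped collection
state = client.sensordb.sync_state      # small helper collection, not capped

def unsent_batch(limit=100):
    # ObjectIds are roughly time-ordered, so '_id > last acked' is a
    # reasonable approximation of 'everything not yet confirmed'
    last = state.find_one({'_id': 'last_acked'})
    query = {'_id': {'$gt': last['value']}} if last else {}
    return list(readings.find(query).sort('_id', 1).limit(limit))

def sync_once(url, token):
    batch = unsent_batch()
    if not batch:
        return
    payload = [dict(r, _id=str(r['_id'])) for r in batch]  # assumes records are JSON-friendly
    try:
        resp = requests.post(url, json={'records': payload},
                             headers={'Authorization': 'Bearer %s' % token},
                             timeout=10)
        resp.raise_for_status()
    except requests.RequestException:
        return  # network is down or the server is unhappy; retry on the next tick
    # only advance the marker once the server has ACKed the whole batch
    state.update_one({'_id': 'last_acked'},
                     {'$set': {'value': batch[-1]['_id']}}, upsert=True)

while True:
    sync_once('https://example.com/ingest', 'MY-TOKEN')
    time.sleep(30)

Because the marker only moves after a successful POST, a crash or outage at worst re-sends the last batch, so the server should treat records idempotently.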
I want a simple and reliable solution (the project is in Python, but I'm also fine with JavaScript/CoffeeScript in a separate process). I would prefer a Python module tailored for this task (I failed to find one), or advice on how to organize the system the UNIX way.
I believe this is a solved problem with known best practices, which I have so far failed to find.
Thank you!

Related

Scheduled message passing from server to clients: what system to use?

I want to be able to schedule delivery of a lightweight message from a server to a client. This is new territory to me so I'd appreciate some advice on the possible approaches available.
The client is running on a Raspberry Pi using node.js (because I'm using node libraries to control a piece of attached hardware). Eventually there will be multiple clients like it.
The server could be anything, though I'm most familiar with Python, django and node.
I want to be able to access the server from a browser and cause it to schedule a future message to the client, effectively a push notification with a tiny bit of data.
I'm looking at pub-sub and messaging systems to do this; I started writing a system that uses node on both ends with sockets, but what I want is more fire-and-forget occasional messages, not constant real-time data exchange. I'm also not a huge fan of node-cron-style scheduling: I'd like to be able to retrieve and alter scheduled events, and layering that on top of a cron system felt somewhat heavy-handed.
My current solution uses python on the server (so I can write a django web interface) with celery and rabbitmq, using a named queue per client. The client subscribes to that specific queue using node-amqp, and off we go. This also allows me to create queues that multiple clients can be interested in, which is a neat bonus.
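For what it's worth, a minimal sketch of the per-client named-queue idea from the Python side, using the pika client (queue name and message body are made up; the node client would consume the same queue via node-amqp):

import pika

connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
channel = connection.channel()
channel.queue_declare(queue='client.device1', durable=True)  # one queue per client
channel.basic_publish(exchange='',
                      routing_key='client.device1',
                      body='{"action": "blink", "at": "2014-01-01T12:00:00Z"}')
connection.close()

For the scheduling part, Celery's apply_async(eta=...) can delay the task that performs this publish until the requested delivery time.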
This answer makes me think I'm doing the right thing -- but as I'm new to this stuff, it feels like I might be missing something. Are there alternatives I should consider in the world of server-client messaging?
Since you are already using Python, you could take a look at Python Remote Objects (Pyro).

How to avoid making a webserver a bottleneck with cassandra?

I'm new to Cassandra, so bear with me.
So, I am building a search engine using Cassandra as the database. I am interacting with it through Pycassa.
Now, I want to output Cassandra's response to a webpage once the user has submitted a query.
I am aware of tools such as Django, FastCGI, SCGI, etc. that let Python talk to the web. However, how does one run a Python script on a webserver without turning that server into a single point of failure (i.e., if the server dies, the system is no longer accessible by the user), thereby negating one purpose of Cassandra?
I've seen this problem before - sometimes people need much more CPU power and bandwidth to generate and serve server-generated HTML and images than they do to run the actual queries in Cassandra. For one customer, the front end of the website ran on many tens of times more servers than their Cassandra cluster.
You'll need to load balance between these front-end servers somehow - investigate running haproxy on a few dedicated machines. It's quick and easy to configure, and similarly easy to reconfigure when your setup changes (unlike DNS, which can take days to propagate changes). I think you can also configure nginx to do the same. If you keep per-session information in your front-end servers, you'll need each client to go to the same front-end server for each request - this is called "session persistence", and can be achieved by hashing the client's IP to pick the front-end server. Haproxy will do this for you.
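The IP-hashing trick is simple enough to show in a few lines; this is only an illustration of what haproxy's source-based balancing does for you, with made-up addresses:

import zlib

FRONTENDS = ['10.0.0.1', '10.0.0.2', '10.0.0.3']

def pick_frontend(client_ip):
    # a stable hash, so the same client always lands on the same server
    return FRONTENDS[zlib.crc32(client_ip.encode()) % len(FRONTENDS)]

Note that when the server list changes, most clients get remapped, which is one more reason to keep session state out of the front ends where you can.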
However this approach will again create a SPOF in your configuration (the haproxy server) - you should run more than one, and potentially have a hot standby. Finally you will need to somehow balance load between your haproxies - we typically use round robin DNS for this, as the nodes running haproxy seldom change.
The benefit of this system is that you can easily scale up (and down) the number of front end servers without changing your DNS. You can read (a little bit) more about the setup I'm referring to at: http://www.acunu.com/blogs/andy-ormsby/using-cassandra-acunu-power-britains-got-talent/
Theo Schlossnagle's Scalable Internet Architectures covers load balancing and a lot more. Highly recommended.

I'm looking for a network service that'll let me send messages to selected clients

I have a program which will be running on multiple devices on a network. These programs will need to send data between each other - to specified devices (not all devices).
server = server.Server('192.168.1.10')
server.identify('device1')
server.send('device2', 'this will be pickled and sent to device2')
That's some basic example code for what I need to do. Of course, it will also need to receive.
I was looking at building my own simple message-passing server using Twisted when someone pointed me in the direction of MPI. I've never looked into MPI before, and its website gives rather vague examples.
Is MPI a good approach? Are there better alternatives?
MPI is really good at handling the communications of a tightly-coupled program running across several or many machines in a cluster. If you're running very loosely coupled programs - ones that only interact occasionally - or the machines are more distributed than within a cluster, like scattered around a LAN, then MPI is probably not what you're looking for.
There are several Open Source message brokers that already handle this kind of stuff for you, and come with a full API ready to use.
You should take a look at:
ActiveMQ, which has a Python STOMP client.
RabbitMQ has a Python client too - see Building RabbitMQ apps using Python.
You could build it yourself, but that would be reinventing the wheel (on a side note: I only realised I was halfway to building a message broker of my own before I started looking at existing solutions - building one takes a lot of work).
Consider using something like ZeroMQ. It supports the most useful messaging idioms - push/pull, publish/subscribe and so on, and although it's not 100% clear from your question which one you need, I'm pretty sure you will find the answer there.
They have a great user guide here, and the Python bindings are well-developed and supported. Some code samples are here.
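As a rough pyzmq sketch of the device-addressing idea from the question, a ROUTER socket can send to named peers by identity (endpoint, identities and payload are made up; both ends are shown in one file only for brevity):

import zmq

ctx = zmq.Context()

router = ctx.socket(zmq.ROUTER)          # the 'server' side
router.bind('tcp://*:5555')

dealer = ctx.socket(zmq.DEALER)          # one per device
dealer.setsockopt(zmq.IDENTITY, b'device2')
dealer.connect('tcp://127.0.0.1:5555')

dealer.send(b'hello')                    # lets the router learn the identity
ident, msg = router.recv_multipart()

# the first frame selects the recipient; the payload could be a pickle
router.send_multipart([ident, b'this will be sent to device2'])
print(dealer.recv())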
You can use MPI functions to set up communication between different codes. In this case the server program should publish "MPI ports" with different IDs. Clients look up these ports and try to connect to them, and only the server can accept each connection. Once the connection is established, the codes can exchange data.
Another possibility is to run the different programs under MPI's multiple-program launch mode. In this case all the programs are executed at the same time, and there is no need to create port communicators. Once they are running, you can create dedicated communicators between whichever groups of programs you select.
Please tell me which kind of method you need and I can provide C code implementing the functions.
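A rough Python sketch of the port mechanism described above, using mpi4py rather than the C API offered in the answer (the port string has to reach the client out of band, e.g. via a file or a name service):

from mpi4py import MPI

# --- server process ---
port = MPI.Open_port()
print('port name: %s' % port)        # hand this string to the client
comm = MPI.COMM_WORLD.Accept(port)   # blocks until a client connects
data = comm.recv(source=0)
comm.Disconnect()
MPI.Close_port(port)

# --- client process (run separately, given the port string) ---
# comm = MPI.COMM_WORLD.Connect(port)
# comm.send({'reading': 42}, dest=0)
# comm.Disconnect()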

Proper way to publish and find services on a LAN using Python

My app opens a TCP socket and waits for data from other users on the network using the same application. At the same time, it can broadcast data to a specified host on the network.
Currently, I need to manually enter the IP of the destination host to be able to send data. I want to be able to find a list of all hosts running the application and have the user pick which host to broadcast data to.
Is Bonjour/ZeroConf the right route to go to accomplish this? (I'd like it to be cross-platform: OS X/Win/*nix.)
it can broadcast data to a specified host on the network
This is a contradiction in terms.
I'm presuming that you don't actually mean broadcast - you mean unicast, or just "send"?
Is Bonjour/ZeroConf the right route to go to accomplish this?
This really depends on your target environment and what your application is intended to do.
As Ignacio points out, you need to install the Apple software on Windows for Zeroconf/mDNS to work at the moment.
This might be suitable for small office / home use.
However larger networks may have Layer 2 Multicast disabled for a variety of reasons, at which point your app might be in trouble.
If you want it to work in the enterprise environment, then some configuration is required, but that doesn't have to be done at the edge (in the app client instances).
This could be done via a DHCP option, or via DNS service records; in these cases you'd possibly be writing a queryable server that tracks active clients, much like a BitTorrent tracker.
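As an illustration of the DNS route, a client could discover such a tracker-style server from an SRV record using the dnspython package (the record name is made up):

import dns.resolver

for rr in dns.resolver.query('_myapp._tcp.example.com', 'SRV'):
    print('%s:%d' % (rr.target, rr.port))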
Two things to consider while designing your networked app:
Would there ever be reason to run more than one "installation" of your application on a network?
Always consider the implications of versioning: if one client is more up to date than another, can they still talk to each other, or at least fail gracefully?
Zeroconf/DNS-SD is an excellent idea in this case. It's provided by Bonjour on OS X and Windows (but must be installed separately or as part of an Apple product on Windows), and by Avahi on FOSS *nix.
I think that ZeroConf is a very good start. You may find this document useful.
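For example, a minimal sketch using the python-zeroconf package (service type, names and addresses are made up; older versions of the package take a single address= argument instead of addresses=):

import socket
from zeroconf import ServiceBrowser, ServiceInfo, Zeroconf

TYPE = '_myapp._tcp.local.'
zc = Zeroconf()

# advertise this host's instance of the app
info = ServiceInfo(TYPE, 'device1.%s' % TYPE,
                   addresses=[socket.inet_aton('192.168.1.10')],
                   port=9000)
zc.register_service(info)

# discover other instances on the LAN
class Listener(object):
    def add_service(self, zc, type_, name):
        print('found %s: %s' % (name, zc.get_service_info(type_, name)))
    def remove_service(self, zc, type_, name):
        print('%s left' % name)
    def update_service(self, zc, type_, name):
        pass

browser = ServiceBrowser(zc, TYPE, Listener())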
I keep a list on a webpage, which is handy if you need communication over the Internet.
<dl_service updated="2010-12-03 11:55:40+01:00">
  <client name="internal" ip="10.0.23.234" external_ip="1.1.1.1"/>
  <client name="bigone" ip="2.2.2.2" external_ip="2.2.2.2">
    <message type="connect" from="Bigone" to="internal" />
  </client>
</dl_service>
My initial idea was to add firewall hole-punching and all that, but I just couldn't be bothered - too many of the hosts were using external IPs for it to be a problem.
But I really recommend Zeroconf, at least if you use Linux + Mac OS X; I don't know about Windows at all.

Is there a Python interface to the Apache scoreboard (for server statistics)?

In short: Is there an existing open-source Python interface for the Apache scoreboard IPC facility? I need to collect statistics from a running server WITHOUT using the "mod_status" HTTP interface, and I'd like to avoid Perl if possible.
Some background: As I understand it, the Apache web server uses a functionality called the "scoreboard" to handle inter-process coordination. This may be purely in-memory, or it may be file-backed shared memory. (PLEASE correct me if that's a mis-statement!)
Among other uses, "mod_status" lets you query a special path on a properly-configured server, receiving back a dynamically-generated page with a human-readable breakdown of Apache's overall functioning: uptime, request count, aggregate data transfer size, and process/thread status. (VERY useful information for monitoring performance, or for troubleshooting running servers that can't be shut down for debugging.)
But what if you need the Apache status, but you can't open an HTTP connection to the server? (My organization sees this case from time to time. For example, the Slowloris attack.) What are some different ways that we can get the scoreboard statistics, and is there a Python interface for any of those methods?
Note that a Perl module, Apache::Scoreboard, appears to be able to do this. But I'm not at all sure whether it can reach a local server's stats directly (shared memory, with or without a backing file), or whether it has to make a TCP connection to the localhost interface. So I'm not even sure whether this Perl module can do what we're asking. Beyond that, I'd like to avoid Perl in the solution for independent organizational reasons (no offense, Perl guys!).
Also, if people are using a completely different method to grab these statistics, I would be interested in learning about it.
Apache::Scoreboard can both fetch the scoreboard over HTTP and, if it is loaded into the same server, access the scoreboard memory directly. This is done via an XS extension (i.e. native C). See httpd/include/scoreboard.h for how to access the in-memory scoreboard from C.
If you're running in mod_python, you should be able to use the same trick as Apache::Scoreboard: write a C extension to access the scoreboard directly.
