I'm new to Cassandra, so bear with me.
So, I am building a search engine using Cassandra as the database, interacting with it through Pycassa.
Now, I want to output Cassandra's response to a webpage after the user submits a query.
I am aware of tools such as Django, FastCGI, SCGI, etc. that let Python talk to the web. However, how does one run a Python script on a webserver without turning that server into a single point of failure (i.e., if this server dies, then the system is not accessible to the user) - and therefore negating one purpose of Cassandra?
I've seen this problem before - sometimes people need much more CPU power and bandwidth to generate and serve server-generated HTML and images than to run the actual queries in Cassandra. For one customer, this meant many tens of times more servers serving the front end of the website than in their Cassandra cluster.
You'll need to load balance between these front end servers somehow - investigate running haproxy on a few dedicated machines. It's quick and easy to configure, and similarly easy to reconfigure when your setup changes (unlike DNS, which can take days to propagate changes). I believe nginx can be configured to do the same. If you keep per-session information in your front end servers, you'll need each client to go to the same front end server for each request - this is called "session persistence", and can be achieved by hashing the client's IP to pick the front end server. Haproxy will do this for you.
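For illustration, a minimal haproxy configuration using source-IP hashing might look roughly like this (the server names and addresses are placeholders for your own front ends):

    frontend www
        bind *:80
        default_backend front_ends

    backend front_ends
        # "balance source" hashes the client IP so each client keeps
        # hitting the same front end server (session persistence)
        balance source
        server fe1 10.0.0.11:8000 check
        server fe2 10.0.0.12:8000 check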
However this approach will again create a SPOF in your configuration (the haproxy server) - you should run more than one, and potentially have a hot standby. Finally you will need to somehow balance load between your haproxies - we typically use round robin DNS for this, as the nodes running haproxy seldom change.
The benefit of this system is that you can easily scale up (and down) the number of front end servers without changing your DNS. You can read (a little bit) more about the setup I'm referring to at: http://www.acunu.com/blogs/andy-ormsby/using-cassandra-acunu-power-britains-got-talent/
Theo Schlossnagle's Scalable Internet Architectures covers load balancing and a lot more. Highly recommended.
I am developing an application on Heroku, but am struggling with one issue.
In this application, I have 2 dynos (one for the server, and the other for the client).
Since I want to get some data from the server, my client needs to know the IP address of the server (dyno).
I am trying to use Fixie and QuotaGuard Static. They tell me an IP address, but I cannot connect to the server using these IP addresses.
Could you tell me how to fix it?
You want to have two dynos communicate directly over a socket connection. Unfortunately, you can't easily do that; that runs counter to the ethos of Heroku and 12-factor application design (http://12factor.net), which specifies that processes should be isolated from each other, and that communication be via "network attached services". That second point may seem like a nuance, but it affects how the dynos discover the other services (via injected environment variables).
There are many reasons for this constraint, not the least of which is the fact that "dynos", as a unit of compute, may be scaled, migrated to different physical servers, etc., many times over an application's lifecycle. Trying to connect to a socket on a dyno reliably would actually get pretty complicated (selecting the right one if multiple are running, renegotiating connections after scaling/migration events, etc.). Remember - even if you are never going to call heroku ps:scale client=2, Heroku doesn't know that and, as a platform, it is designed to assume that you will.
The solution is to use an intermediate service like Redis to facilitate the inter-process communication via a framework like Python RQ or similar.
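A rough sketch of that pattern with RQ (the tasks module and function are made up for illustration; REDIS_URL is the variable a Heroku Redis add-on would typically inject):

    # tasks.py (hypothetical module) - importable by both dynos
    def add(x, y):
        return x + y

    # on the enqueuing dyno
    import os
    from redis import Redis
    from rq import Queue
    import tasks

    q = Queue(connection=Redis.from_url(os.environ["REDIS_URL"]))
    job = q.enqueue(tasks.add, 2, 3)  # picked up by the worker dyno

The second dyno then runs a worker process (rq worker) that pulls jobs off the shared queue, so the two processes never need to know each other's addresses.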
Alternatively, treat the two dynos as separate applications - then you can connect from one to the other via HTTP using the publicly available DNS entry for that application. Note - in that case, it would still be possible to share a database if that's required.
Hope that helps.
This is either a really good or a really stupid question, but I find it's worth asking --
I'm making a Django app that runs on a device as an interface. Is there any reason I couldn't just use python manage.py runserver and go no further? Or is there a better way to do this?
Installing the full web bundle for local-network devices seems excessive, hence my question. (Perhaps there is not a great deal of overhead in the full web setup -- I don't know.) This is currently on a Raspberry Pi, but only for prototyping; the end product will not necessarily be a Pi.
It depends on how many users you're expecting to connect at once. The Django development server is suitable for only one connection at a time; it isn't good at handling multiple sessions and is not designed to stay up for long periods of time. This is the reason the docs clearly state:
do not use this server in a production setting!
That said, running with an application server like gunicorn may be all you need to support hosting multiple users. It uses multiple workers so that if one user's request crashes, it can continue serving all the other users.
https://docs.djangoproject.com/en/1.11/howto/deployment/wsgi/gunicorn/
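Getting started can be as simple as the following (assuming a project named myproject; the worker count is just an example):

    pip install gunicorn
    gunicorn myproject.wsgi --workers 3 --bind 0.0.0.0:8000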
Finally, if you're serving up a lot of assets like images or videos, you should really have a full web server like Nginx to intercept asset URLs so they're not served through Django itself. Django should not be serving assets directly in production.
https://www.digitalocean.com/community/tutorials/how-to-serve-django-applications-with-uwsgi-and-nginx-on-ubuntu-14-04
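As a sketch, a minimal nginx server block along these lines serves static assets directly and proxies everything else to the application server (the paths and ports here are assumptions for illustration):

    server {
        listen 80;

        # Serve collected static assets directly, bypassing Django
        location /static/ {
            alias /var/www/myproject/static/;
        }

        # Pass everything else to the application server (e.g. gunicorn)
        location / {
            proxy_pass http://127.0.0.1:8000;
            proxy_set_header Host $host;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        }
    }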
I have a subsystem that collects sensor data and posts it to a server on the Internet via TCP or UDP, with authorization by token. All the posted data is also saved to a local capped MongoDB database.
I want this system to be tolerant of network failures and outages, and to be able to synchronize data when the network comes back.
What is the correct way to implement this without re-inventing the wheel?
I see several options:
MongoDB replication.
PROs:
Replication tools exist
CONs:
How would I do that in real time? Having two paths - posting one way when the system is online and another way when it is offline - seems like a bad idea.
No idea how to manage access tokens (I don't want to give direct access to the server database).
The server-side schema would have to match the local one (though this could be a PRO, since manual import then becomes trivial).
Maintaining 'last ACKed' records and re-transmitting every so often (a sketch follows this list).
PROs:
Allows for different data schemas locally and on the server side
Works
CONs:
Logic is complex (detecting failures, monitoring network connectivity, etc.)
This is exactly 'reinventing the wheel'.
Manual data backfeed is hardly possible (e.g. when the system is completely disconnected for a long time and data is restored from back-ups).
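To make option 2 concrete, here is roughly what I have in mind (the collection names, endpoint, and token are placeholders, and I'm using HTTP via requests for brevity):

    import time

    import requests
    from pymongo import MongoClient

    API_URL = "https://example.com/ingest"  # placeholder endpoint
    TOKEN = "secret-token"                  # placeholder access token

    client = MongoClient()
    readings = client.sensors.readings  # local capped collection (name assumed)
    state = client.sensors.sync_state   # holds the last ACKed timestamp

    def last_acked():
        doc = state.find_one({"_id": "sync"})
        return doc["ts"] if doc else 0

    def sync_forever():
        while True:
            cursor = readings.find({"ts": {"$gt": last_acked()}}).sort("ts", 1)
            for doc in cursor:
                try:
                    r = requests.post(
                        API_URL,
                        json={"ts": doc["ts"], "value": doc["value"]},
                        headers={"Authorization": "Token " + TOKEN},
                        timeout=10,
                    )
                    r.raise_for_status()
                except requests.RequestException:
                    break  # network is down; retry from the last ACK later
                # Server ACKed this record: advance the high-water mark
                state.update_one({"_id": "sync"},
                                 {"$set": {"ts": doc["ts"]}}, upsert=True)
            time.sleep(5)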
I want a simple and reliable solution (the project is in Python, but I'm also fine with JavaScript/CoffeeScript in a separate process). I would prefer a Python module tailored for this task (I failed to find one), or advice on how to organize the system the UNIX way.
I believe this is a solved problem with known best practices, which I have so far failed to find.
Thank you!
I use PHP, JS, HTML, and CSS. I'm willing to learn Ruby or Python if that is the best option.
My next project will involve live data being fed to users from the server and vice versa. I have shell access on my shared server, but I'm not sure about access to ports. Is it possible to use websockets or any other efficient server-client connection on a shared hosting account, and if so, what do I need to do?
For the best performance and full control of your setup, you need "your own" server.
Today there are a huge number of virtual server providers, which means you get full control over your IP while the physical server is still shared between many clients - meaning cheaper prices and more flexibility.
I recommend the free tier program at Amazon EC2; you can always cancel after the free period, and they have many geographical locations to choose from.
Another provider in Europe that I have been satisfied with is Tilaa.
You can probably find many more alternatives that suit your needs on the Webhosting Talk forum.
Until a few weeks ago, websockets deployment required either a standalone server running on a different port, or server-side proxies like varnish/haproxy listening on port 80 and redirecting normal HTTP traffic. The latest nginx versions added built-in support for websockets, but unless your hosting provider runs one of them, you're out of luck. (Note that I don't have personal experience with this nginx feature.)
Personally I find that for most applications, websockets can be replaced with Server-sent events instead - a very lightweight protocol which is basically another http connection that stays open on the server side and sends a stream of plaintext with messages separated by double newlines.
It's supported in most decent browsers, but since that excludes Internet Explorer, polyfills are available.
This covers one side of the connection, the one that is usually implemented with long-polling. The other direction can be covered the usual way with XHR. The end result is very similar to websockets IMO, but with a bit higher latency for client->server messages.
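To illustrate the server side, a minimal sketch with Flask (my choice here; any framework that can stream a response would do) might look like:

    import json
    import time

    from flask import Flask, Response

    app = Flask(__name__)

    @app.route("/events")
    def events():
        def stream():
            # Each SSE message is "data: ...\n\n";
            # the blank line terminates the event.
            while True:
                yield "data: %s\n\n" % json.dumps({"ts": time.time()})
                time.sleep(1)
        return Response(stream(), mimetype="text/event-stream")

On the browser side, new EventSource('/events') with an onmessage handler consumes the stream.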
I'm in the planning phase of an Android app which synchronizes to a web app. The web side will be written in Python with probably Django or Pyramid while the Android app will be straightforward java. My goal is to have the Android app work while there is no data connection, excluding the social/web aspects of the application.
This will be a run-of-the-mill app so I want to stick to something that can be installed easily through one click in the market and not require a separate download like CloudDB for Android.
I haven't found any databases that support this functionality, so I will write it myself. One caveat with writing the sync logic: there will be some shared data that multiple users are able to write to. This is a solo project, so I thought I'd throw this up here to see if I'm totally off-base.
The app will process local saves to the local sqlite database and then send messages to a service which will attempt to synchronize these changes to the remote database.
The sync service will alternate between checking for messages for the local app, i.e. changes to shared data by other users, and writing the local changes to the remote server.
All data will have a timestamp for tracking changes.
When writing from the app to the server: if the server has newer information, the user will be warned about the conflict and prompted to either overwrite what the server has or abandon the local changes. If the server has not been updated since the app last read the data, the update is processed.
When data comes from the server to the app: if the server's copy is newer, overwrite the local data; otherwise discard it, since the local change will be handled the next time the app updates the server.
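In rough Python, the server-side rule I have in mind looks something like this (the in-memory dict stands in for the real database; all names are made up):

    import time

    REMOTE = {}  # stand-in for the remote database: key -> row

    def apply_client_update(key, client_row, last_read_ts):
        """Reject the write if the server row changed after the client
        last read it; otherwise accept the write and re-stamp it."""
        server_row = REMOTE.get(key)
        if server_row and server_row["updated_at"] > last_read_ts:
            # Conflict: the client should prompt the user to either
            # overwrite the server copy or abandon the local change.
            return {"status": "conflict", "server": server_row}
        client_row["updated_at"] = time.time()
        REMOTE[key] = client_row
        return {"status": "ok"}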
Here are some questions:
1) Does this sound like overkill? Is there an easier way to handle this?
2) Where should this processing take place? On the client or the server? I'm thinking the advantage of the client is less processing on the server but if it's on the server, this makes it easier to implement other clients.
3) How should I handle the updates from the server? Incremental polling or comet/websockets? One thing to keep in mind is that I would prefer to start with a minimal installation on Webfaction, as this is a startup.
Once these problems are tackled I do plan on contributing the solution to the geek community.
1) This looks like a pretty good way to manage your local and remote changes and to support offline work. I don't think this is overkill.
2) I think you should cache the user's changes locally, with a local timestamp, until synchronization is finished. Then the server should manage all processing: track the current version, and commit or roll back update attempts. Less processing on the client = better for you! (Easier to support and implement.)
3) I'd choose polling if you want to support offline mode, because offline you can't keep a socket open, and you would have to reopen it every time the Internet connection is restored.