portable GAE bigtable abstraction for python?

I am designing a new GAE python application and would like to design it in a way to allow self-hosting.
A lot of web frameworks are platform neutral, but when it comes to the database, I have a very hard time finding any NoSQL abstraction that works on GAE and on something (anything) else.
The only solutions I see:
AppScale http://code.google.com/p/appscale/ :
a virtual machine that emulates the datastore APIs. The biggest issue is the need for a virtual machine, so it's only suitable for ultra-big, enterprise-level development. This is probably the best solution though.
Django-nonrel http://www.allbuttonspressed.com/projects/django-nonrel :
some GAE-specific weirdness, but it seems manageable. The main drawback is being forced to use Django (I am inclined towards Pyramid).
So my question: are there any other potential solutions? A "light" abstraction allowing retargeting from Bigtable to, say, CouchDB or another NoSQL database would be ideal.
PS: I know I could use Google Cloud SQL (a hosted MySQL instance), but I'm looking to focus on NoSQL.

TyphoonAE includes a MongoDB stub for the datastore, and the official SDK includes a SQLite stub.
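If neither of those fits, a hand-rolled layer can stay very thin. Below is a minimal sketch of the kind of "light" abstraction the question asks about; every class name is illustrative (not an existing library), the GAE side uses the bundled google.appengine.ext.db API, and the CouchDB side assumes the couchdb-python package.

```python
# Sketch of a tiny document-store interface with two interchangeable backends.
class Store(object):
    def save(self, kind, key, data):
        raise NotImplementedError

    def load(self, kind, key):
        raise NotImplementedError


class DatastoreStore(Store):
    """GAE backend, using the bundled db.Expando class."""
    def __init__(self):
        from google.appengine.ext import db

        class Doc(db.Expando):
            pass

        self._Doc = Doc

    def save(self, kind, key, data):
        self._Doc(key_name='%s:%s' % (kind, key), **data).put()

    def load(self, kind, key):
        doc = self._Doc.get_by_key_name('%s:%s' % (kind, key))
        if doc is None:
            return None
        return dict((name, getattr(doc, name)) for name in doc.dynamic_properties())


class CouchDBStore(Store):
    """Self-hosted backend via couchdb-python (naive: ignores revision conflicts)."""
    def __init__(self, url, db_name):
        import couchdb
        self._db = couchdb.Server(url)[db_name]

    def save(self, kind, key, data):
        doc = dict(data, _id='%s:%s' % (kind, key))
        self._db.save(doc)

    def load(self, kind, key):
        doc = self._db.get('%s:%s' % (kind, key))
        return None if doc is None else dict(doc)
```

The application codes against Store only, so retargeting means swapping the constructor in one place; richer queries would of course need more than this.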

Related

Django cache vs App Engine cache - which one should I use?

I'm running Django (1.5) on App Engine and I need to use some kind of key-value cache. I know App Engine's memcache API and also Django's cache framework, and I wonder which one I should use.
On one hand, I would like my code to be as portable as possible for migrating it to another cloud platform. On the other hand, I would like to fully utilize the services offered by App Engine.
Is writing a custom cache backend for Django that uses the App Engine memcache the best solution?
Tzach, I think you're already answering your question.
Putting your app on GAE and not using the services Google provides doesn't look like a wise decision to me, even more so when those features are key for performance and at the same time free or very cheap.
On the other hand, the default in-process cache in Python is not guaranteed to give its best results under GAE, as GAE instances are not a normal server where you'd run your Django instance; e.g. instances can be shut down at any time.
Those special characteristics are exactly what the Django-for-GAE variants are tuned for.
For that reason, and taking into account that using the GAE memcache is also straightforward, I'd recommend using whichever option is easiest to add to your application.
And if, in the future, you move to another platform, there will be more things to change than the key-value cache.
My two cents: focus first on getting the job done, second on optimizing performance on GAE, and only afterwards start thinking about things to improve.
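If you do end up wrapping App Engine's memcache in a Django cache backend, the rough shape for Django 1.5's cache API is sketched below; treat the class name, module placement, and timeout handling as assumptions rather than a drop-in implementation.

```python
# Illustrative backend, e.g. in myapp/gae_cache.py
from django.core.cache.backends.base import BaseCache
from google.appengine.api import memcache


class GaeMemcacheCache(BaseCache):
    """Thin adapter from Django's cache API to the GAE memcache service."""

    def __init__(self, location, params):
        # Django instantiates cache backends with (location, params);
        # location is unused here because GAE memcache needs no address.
        super(GaeMemcacheCache, self).__init__(params)

    def _timeout(self, timeout):
        return self.default_timeout if timeout is None else timeout

    def add(self, key, value, timeout=None, version=None):
        return memcache.add(self.make_key(key, version=version), value,
                            time=self._timeout(timeout))

    def get(self, key, default=None, version=None):
        value = memcache.get(self.make_key(key, version=version))
        return default if value is None else value

    def set(self, key, value, timeout=None, version=None):
        memcache.set(self.make_key(key, version=version), value,
                     time=self._timeout(timeout))

    def delete(self, key, version=None):
        memcache.delete(self.make_key(key, version=version))
```

You would then point CACHES['default']['BACKEND'] at 'myapp.gae_cache.GaeMemcacheCache' in settings.py, and the rest of your code keeps using Django's cache API, which is the portable part.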

Migrating Django Application to Google App Engine?

I'm developing a web application and considering Django, Google App Engine, and several other options. I wondered what kind of "penalty" I will incur if I develop a complete Django application assuming it runs on a dedicated server, and then later want to migrate it to Google App Engine.
I have a basic understanding of Google's datastore, so please assume I will choose a column-based database for my "stand-alone" Django application rather than a relational database, so that the schema could remain mostly the same and would not be a major factor.
Also, please assume my application does not maintain a huge amount of data, so that migration of tens of gigabytes is not required. I'm mainly interested in the effects on the code and software architecture.
Thanks
Most (all?) of Django is available in GAE, so your main task is to avoid basing your designs around a reliance on anything from Django or the Python standard libraries which is not available on GAE.
You've identified the glaring difference, which is the database, so I'll assume you're on top of that. Another difference is the tie-in to Google Accounts and hence that if you want, you can do a fair amount of access control through the app.yaml file rather than in code. You don't have to use any of that, though, so if you don't envisage switching to Google Accounts when you switch to GAE, no problem.
I think the differences in the standard libraries can mostly be deduced from the fact that GAE has no I/O and no C-accelerated libraries unless explicitly stated, and my experience so far is that things I've expected to be there, have been there. I don't know Django and haven't used it on GAE (apart from templates), so I can't comment on that.
Personally I probably wouldn't target LAMP (where P = Django) with the intention of migrating to GAE later. I'd develop for both together, and try to ensure if possible that the differences are kept to the very top (configuration) and the very bottom (data model). The GAE version doesn't necessarily have to be perfect, as long as you know how to make it perfect should you need it.
It's not guaranteed that this is faster than writing and then porting, but my guess is it normally will be. The easiest way to spot any differences is to run the code, rather than relying on not missing anything in the GAE docs, so you'll likely save some mistakes that need to be unpicked. The Python SDK is a fairly good approximation to the real App Engine, so all or most of your tests can be run locally most of the time.
Of course if you eventually decide not to port then you've done unnecessary work, so you have to think about the probability of that happening, and whether you'd consider the GAE development to be a waste of your time if it's not needed.
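One concrete way to keep the difference at the "very top" is a small environment check in your configuration layer; a minimal sketch, with the module names being purely illustrative:

```python
# settings-level switch: pick the data-model implementation per environment.
import os

# App Engine sets SERVER_SOFTWARE to "Google App Engine/x.y" in production
# and "Development/x.y" on the local dev server.
ON_GAE = os.environ.get('SERVER_SOFTWARE', '').startswith(
    ('Google App Engine', 'Development'))

if ON_GAE:
    from myapp.models_gae import Article       # backed by google.appengine.ext.db (illustrative)
else:
    from myapp.models_django import Article    # backed by the Django ORM (illustrative)
```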
Basically, you will change the data model base class and some APIs if you use them (PIL, urllib2, etc).
If your goal is App Engine, I would use the App Engine Helper http://code.google.com/appengine/articles/appengine_helper_for_django.html. You can run your app on your own server with a file-based DB and then push it to App Engine with no changes.
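To make the "data model base class" change concrete, here is roughly what the same model looks like in the two worlds (the field choices are illustrative, and in practice the two classes live in separate codebases):

```python
# Django ORM version (self-hosted)
from django.db import models

class Article(models.Model):
    title = models.CharField(max_length=200)
    body = models.TextField()
    created = models.DateTimeField(auto_now_add=True)


# App Engine datastore version of the same model
from google.appengine.ext import db

class Article(db.Model):
    title = db.StringProperty()
    body = db.TextProperty()
    created = db.DateTimeProperty(auto_now_add=True)
```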
It sounds like you're aware of the major limitation in building/migrating your app -- that App Engine doesn't support Django's ORM.
Keep in mind that this doesn't just affect the code you write yourself -- it also limits your ability to use a lot of existing Django code. That includes other applications (such as the built-in admin and auth apps) and ORM-based features such as generic views.
There are a few things that you can't do on App Engine that you can do on your own server, like uploading files straight to the filesystem. On App Engine you pretty much have to upload the file and store it in the datastore, which can cause a few problems.
Other than that it should be fine on the presentation side. There are a number of other little things that are better on your own dedicated server, but I think eventually a lot of those things will be in App Engine.
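For what it's worth, "store it in the datastore" usually means a db.BlobProperty field on a model; a minimal sketch (model and field names are illustrative):

```python
from google.appengine.ext import db

class UploadedFile(db.Model):
    filename = db.StringProperty()
    data = db.BlobProperty()      # raw bytes kept in the entity, subject to the entity size limit
    uploaded = db.DateTimeProperty(auto_now_add=True)
```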

Is Google App Engine right for me?

I am thinking about using Google App Engine. It is going to be a huge website. In that case, what is your advice on using Google App Engine? I heard GAE has restrictions, like not being able to store images or files over the 1MB limit (they are going to change this, from what I read in the GAE roadmap) and queries being limited to 1000 results. I am also going to use web2py with GAE. So I would like to know your comments.
Thanks
Having developed a smallish site with GAE, I have some thoughts
If you mean "huge" like "the next YouTube", then GAE might be a great fit, because of the previously mentioned scaling.
If you mean "huge" like "massively complex, with a whole slew of screens, models, and features", then GAE might not be a good fit. Things like unit testing are hard on GAE, and there's not a built-in structure for your app that you'd get with something like (famously) (Ruby on) Rails, or (Python powered) Turbogears.
For instance, there is no staging environment: just your development copy of the system and production. This may or may not be a bad thing, depending on your situation.
Additionally, it depends on the other Python modules you intend to pull in: some Python modules just don't run on GAE (because you can't talk to hardware, or because there are just too many files in the package).
Hope this helps
Using web2py on Google App Engine is a great strategy. It lets you get up and running fast, and if you do outgrow the restrictions of GAE then you can move your web2py application elsewhere.
However, keeping this portability means you should stay away from the advanced parts of GAE (Task Queues, Transactions, ListProperty, etc).
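Concretely, that portability comes from web2py's DAL: the scaffolding application's model file switches connection strings depending on where it runs, roughly like this (request, DAL and Field are injected into model files by web2py; the connection URIs follow the web2py book):

```python
# In a web2py model file (e.g. models/db.py)
if request.env.web2py_runtime_gae:
    # On App Engine the DAL talks to the datastore.
    db = DAL('google:datastore')
else:
    # Self-hosted: the same models run against SQLite (or MySQL/PostgreSQL).
    db = DAL('sqlite://storage.sqlite')

db.define_table('post',
    Field('title'),
    Field('body', 'text'))
```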
App Engine uses Bigtable as its datastore backend. Don't try to write a traditional relational-database-driven application. Bigtable is much better suited for use as a highly scalable key-value store. Avoid joins if at all possible.
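For example, instead of modelling "post has tags" with a join table, you would typically denormalize into a list property; a minimal sketch with illustrative model names:

```python
from google.appengine.ext import db

class Post(db.Model):
    title = db.StringProperty()
    tags = db.StringListProperty()   # list property instead of a join table

# List membership is queryable without any join:
# db.GqlQuery("SELECT * FROM Post WHERE tags = :1", "gae")
```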
I wouldn't worry about any of this. After having played with Google App Engine for a while now, I've found that it scales quite well for large data sets. If your data elements are large (i.e. photos), then you'll need to integrate with another service to handle them, but that's probably going to be true no matter what with data of that size. Also, I've found BigTable relatively easy to work with having come from a background entirely in relational databases. Finally, Django is a somewhat hidden, but awesome, "feature" of Google App Engine. If you've never used it, it's a really nice, elegant web framework that makes a lot of common tasks trivial (forms come to mind here).
Google has just released version 1.3.0 of the SDK with support for a new Blobstore API for storing files up to 50MB. See the post "App Engine SDK 1.3.0 Released Including Support for Larger User Uploads".
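For reference, the 1.3.0-era upload flow looks roughly like the sketch below; the handler paths and form field name are illustrative, and the webapp framework is assumed.

```python
from google.appengine.ext import blobstore, webapp
from google.appengine.ext.webapp import blobstore_handlers


class UploadForm(webapp.RequestHandler):
    def get(self):
        # The Blobstore generates a one-time upload URL that posts back to /upload.
        upload_url = blobstore.create_upload_url('/upload')
        self.response.out.write(
            '<form action="%s" method="POST" enctype="multipart/form-data">'
            '<input type="file" name="file"><input type="submit"></form>' % upload_url)


class UploadHandler(blobstore_handlers.BlobstoreUploadHandler):
    def post(self):
        blob_info = self.get_uploads('file')[0]   # BlobInfo for the stored file
        self.redirect('/serve/%s' % blob_info.key())


application = webapp.WSGIApplication([
    ('/', UploadForm),
    ('/upload', UploadHandler),
])
```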
What about Google Wave? It's being built on appengine, and once live, real-time translatable chat reaches the corporate sector... I could see it hitting top 1000th... But then again, that's an internal project that gets to do special stuff other appengine apps can't.... Like hanging threads; I think... And whatever else Wave has under the hood...
If you are planning on a 'huge' website, then don't use App Engine. Simple as that. The App Engine is not built to deliver the next top 1000th website.
Allow me to also ask what you mean by 'huge': how many simultaneous users? Queries per second? DB load?

Use Google AppEngine datastore outside of AppEngine project

For my little framework Pyxer I would like to be able to use the Google App Engine datastore outside of App Engine projects as well, because I'm now used to this ORM pattern and it's nice for little quick hacks. I cannot use Google App Engine for all of my projects because of its limitations on file size and number of files.
A great alternative would also be a project that provides an ORM with the same naming as the App Engine datastore. I also like the GQL approach very much, since it is a nice combination of ORM and SQL patterns.
Any ideas where or how I might find such a solution? Thanks.
Nick Johnson, from the App Engine team himself, has a blog post listing some of the alternatives, including his BDBDatastore.
However, that assumes you want to use exactly the same ORM that you use now in app engine. There are tons of ORM options in general out there, though I am not familiar with the state of the art in Python. This question does seem to address the issue though.
You might also want to look at AppScale, which is "a platform that allows users to deploy and host their own Google App Engine applications".
It's probably overkill for your purposes, but definitely something to look into.
There is also the Remote API, which the bulkloader tool uses to upload data to and download data from the Datastore.
Maybe it could be used to let applications that are not hosted on App Engine still use the Datastore there.
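That does work; a hedged sketch of the remote_api approach (the handler path and prompts follow the SDK docs of the time, while the app id and model are illustrative):

```python
# Sketch: use the datastore of a deployed app from a plain Python script.
# Requires the App Engine SDK on sys.path and the remote_api handler enabled
# in the deployed app's app.yaml.
import getpass
from google.appengine.ext.remote_api import remote_api_stub
from google.appengine.ext import db


def auth_func():
    return raw_input('Email: '), getpass.getpass('Password: ')

remote_api_stub.ConfigureRemoteApi(None, '/_ah/remote_api', auth_func,
                                   'your-app-id.appspot.com')


# From here on, db works as it would inside App Engine.
class Greeting(db.Model):            # illustrative model
    content = db.StringProperty()

print(Greeting.all().count())
```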

backend for python

Which is the best back end for Python applications, what is the advantage of using SQLite, and how can it be connected to Python applications?
What do you mean by back end? Python apps connect to SQLite just like any other database: you just have to import the correct module and check how to use it.
The advantages of using SQLite are:
You don't need to setup a database server, it's just a file
No configurations needed
Cross platform
Mainly, desktop applications are the ones that take real advantage of this. For web apps, SQLite is not recommended, since the file containing the data is easily readable (it lacks any kind of encryption), and if the web server lacks special configuration, the file can be downloaded by anyone.
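"Import the correct module" in practice means the sqlite3 module from the standard library; a minimal sketch of connecting from a Python script (the file and table names are illustrative):

```python
import sqlite3

# SQLite is just a file: connecting creates example.db if it doesn't exist.
conn = sqlite3.connect("example.db")
cur = conn.cursor()
cur.execute("CREATE TABLE IF NOT EXISTS notes (id INTEGER PRIMARY KEY, body TEXT)")
cur.execute("INSERT INTO notes (body) VALUES (?)", ("hello from Python",))
conn.commit()

for row in cur.execute("SELECT id, body FROM notes"):
    print(row)

conn.close()
```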
Django, Twisted, and CherryPy are popular Python "Back-Ends" as far as web applications go, with Twisted likely being the most flexible as far as networking is concerned.
SQLite can, as has been previously posted, be directly interfaced with using SQL commands, as it has native bindings for Python, or it can be accessed through an object-relational mapper such as SQLObject (another Python library).
As far as performance is concerned, SQLite is fairly scalable and should be able to handle most use cases that don't require a separate database server (nothing enterprise level). An additional benefit of SQLite is that the database is self-contained in a single file, allowing for easy backup, while remaining a common enough format that multiple applications can access the data. A word of advice on using SQLite with Python, however: you may run into issues with threading (in the past most of the bindings for SQLite were not thread-safe, although this may have changed over time).
The language you are using at the application layer has little to do with your database choice underneath. You need to examine the advantages of other DB packages to get an idea of what you want.
Here are some popular database packages that are cheap or free:
MS SQL Server Express, PostgreSQL, MySQL
If you mean "what is the best database?" then there's simply no way to answer this question. If you just want a small database that won't be used by more than a handful of people at a time, SQLite is what you're looking for. If you're running a database for a giant corporation serving thousands, you're probably looking for Oracle. In between those, you have MySQL, PostgreSQL, SQL Server, db2, and probably more.
If you're familiar with one of those, that may be the best to go with from a practical standpoint. If you're doing a typical webapp, my advice would be to go with MySQL or PostgreSQL as they're free and well supported by just about any ORM you could think of (my personal preference is towards PostgreSQL, but I'm not experienced enough with either of these to make a good argument one way or another). If you do go with one of those two, my recommendation is to use Storm as the ORM.
(And yes, there are free versions of SQL Server and Oracle. You won't have as many choices as far as ORMs go though)
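To make the Storm suggestion concrete, the basic pattern looks roughly like this, per Storm's tutorial (the table, model, and connection URI are illustrative):

```python
from storm.locals import create_database, Store, Int, Unicode


class Note(object):
    __storm_table__ = 'note'         # maps the class to the "note" table
    id = Int(primary=True)
    body = Unicode()


database = create_database('postgres://user:password@localhost/mydb')
store = Store(database)

note = Note()
note.body = u'hello'
store.add(note)
store.commit()

print(store.find(Note, Note.body == u'hello').count())
```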
