Django on GoogleAppEngine: performance howto


I asked this question a few weeks ago. Today I have actually written and released a standard Django application, i.e. a fully functional, relational-DB-backed app (and consequently a fully functional Django admin), enabled by Google CloudSQL. The only time I had to deviate from doing things the standard Django way was to send email (I had to do it the GAE way). My setup is GAE 1.6.4, Python 2.7 and Django 1.3, using the following in app.yaml:
libraries:
- name: django
  version: "1.3"
However, I need clear, actionable steps to improve the response time of this Django app's initial request when the app is cold. I have a simple webapp2 web site on GAE, which does not hit the DB, and when cold its response time is 1.56s. The Django one, when cold, hits the DB with 2 queries (two count(*) queries over tables containing fewer than 300 rows each), and its response time is 10.73s! Not encouraging for a home page ;)
Things that come to mind are removing the middleware classes I don't need and other Django-specific optimisations. However, tips that improve things from a GAE standpoint would also be really useful.
N.B. I don't want this to become a discussion about the merits of going for Django on GAE. I can mention that my personal Django expertise, and the resulting development productivity, weighed considerably in my adopting Django as opposed to other frameworks. Moreover, with CloudSQL it's easy to move away from GAE (hopefully not!), as the Django code will work everywhere else with little (or no) modification. Related discussions about this topic can be found here and here.

I don't have a full answer, but I'm contributing since I'd like to find a solution as well. I'm currently running a cron job (I actually need the cron job, so it's not only for keeping my app alive).
I've seen it discussed in one of the GAE/Python/Django related mailing lists that just the time required for loading up all the Django files is significant when compared to webapp, so removing django components that you don't use from deployment should improve your startup time as well. I've been able to shave off about 3 seconds by removing certain parts of the contrib folder. I exclude them in my app.yaml.
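To decide which parts of Django are worth excluding, it can help to measure what each import costs on a cold interpreter. A minimal sketch, assuming Python 3.7+ and that the modules you list are importable locally; `cold_import_seconds` is a hypothetical helper, and timings on your own machine will only approximate what a GAE instance pays:

```python
import subprocess
import sys

# Run inside a fresh child interpreter: time one import and print the seconds.
_SNIPPET = (
    "import importlib, sys, time\n"
    "start = time.perf_counter()\n"
    "importlib.import_module(sys.argv[1])\n"
    "print(time.perf_counter() - start)\n"
)


def cold_import_seconds(module_names):
    """Return {module_name: seconds} for importing each module in a new interpreter.

    A new interpreter per module means nothing is cached from an earlier import,
    which roughly mimics what a cold instance pays at startup.
    """
    timings = {}
    for name in module_names:
        result = subprocess.run(
            [sys.executable, "-c", _SNIPPET, name],
            capture_output=True,
            text=True,
            check=True,
        )
        timings[name] = float(result.stdout.strip())
    return timings


if __name__ == "__main__":
    # Hypothetical module list; substitute the modules your project imports.
    timings = cold_import_seconds(["json", "decimal"])
    for name, secs in sorted(timings.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{name:30s} {secs * 1000:8.1f} ms")
```

Many `django.contrib` modules can only be imported once Django's settings are configured, so the child process may need that setup too. On Python 3.7+, `python -X importtime` prints a similar per-module breakdown without any extra code.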
My startup time is still around 6 seconds (full app, Django-nonrel, HRD). It used to be more like 4 when my app was simpler.
My suspicion is that Django verifies all its models on startup, and that this processing time is significant. If you have time to experiment with an app with absolutely 0 models, I'd be curious whether it made any impact.
I'm also curious as to whether your two initial queries make any significant impact.

When no instance is running, for example after a version upgrade or when there has been no request for 15 minutes, a request triggers loading of a new instance, which takes about 10s. So what you are seeing is normal.
So if your app is idle for periods longer than 15 minutes, you will see this behavior. One workaround is to have a cron job ping your instance every 10 minutes (though I believe Google does not like that).
Update:
You can avoid this by enabling billing and then, in the GAE admin under "Application settings", setting Minimum Idle Instances to 1. Note: the minimum setting is not available on free apps, only the maximum.

Related

Django cache vs App Engine cache - which one should I use?

I'm running Django (1.5) on App Engine and I need to use some kind of key-value cache. I know App Engine's memcache API and also the Django's cache framework. I wonder which one should I use.
On one hand I would like my code to be as portable as possible for migrating it to another cloud platform. But on the other hand I would like to fully utilize the services offered by App Engine.
Is writing a custom Django cache backend that uses App Engine memcache the best solution?

Tzach, I think you're already answering your question.
Putting your app on GAE and not using the services Google provides doesn't look like a wise decision to me, especially when those features are key for performance and at the same time free or very cheap.
On the other hand, the default cache embedded in Django is not guaranteed to give its best results under GAE, as GAE instances are not a normal server where you'd run your Django instance; e.g. instances can be shut down at any time.
The Django distributions adapted for GAE are tuned for these special characteristics.
For that reason, and taking into account that using the GAE memcache is also straightforward, I'd recommend using whichever option is easiest to add to your application.
And if you move to another platform in the future, there will be more things to change than the key-value cache.
My two cents: focus first on getting the job done, second on optimizing performance on GAE, and only afterwards start thinking about other things to improve.
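One way to get GAE's performance and still keep the code portable is to route every cache call through one small class. A minimal sketch, assuming a client object with memcache-style `get(key)` and `set(key, value, time=seconds)` methods (App Engine's memcache module and common memcached clients follow that shape, but check the exact signatures for your platform); `KeyValueCache` and `InMemoryClient` are hypothetical names, and this is not Django's `BaseCache` backend interface:

```python
from time import monotonic


class KeyValueCache:
    """Hypothetical facade: the only place that knows which cache backend is used."""

    def __init__(self, client, key_prefix="app:"):
        # `client` is assumed to offer memcache-style get(key) and set(key, value, time=seconds).
        self.client = client
        self.key_prefix = key_prefix

    def get(self, key, default=None):
        value = self.client.get(self.key_prefix + key)
        return default if value is None else value

    def set(self, key, value, timeout=300):
        # memcache-style clients take the expiry as `time`, in seconds.
        self.client.set(self.key_prefix + key, value, time=timeout)


class InMemoryClient:
    """Stand-in client for local tests; per-process and lost on restart."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and monotonic() >= expires_at:
            del self._data[key]  # lazily drop expired entries
            return None
        return value

    def set(self, key, value, time=0):
        # Loosely mirrors memcache: time=0 means "no expiry".
        expires_at = monotonic() + time if time else None
        self._data[key] = (value, expires_at)
```

On GAE you would presumably pass App Engine's memcache module as the client; elsewhere, a memcached client or anything else exposing the same two methods. Only the construction of `KeyValueCache` changes when you move.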

How to log deployments with New Relic on Heroku

I'm using the New Relic addon on a Django/Python Heroku app and I would like to log deployments, but I can't figure out how to do it.
Heroku offers an HTTP POST deploy hook, but it seems too restrictive to meet the requirements of the New Relic REST API: New Relic requires an x-api-key header, and the parameter names don't match (see here for details).
I haven't been able to find any information about this anywhere. Am I missing something? Is there another way to do this?
Thanks.
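If the automatic integration is unavailable, one workaround is a small relay you host yourself: point Heroku's HTTP deploy hook at it and let it re-post to New Relic with the header and parameter names New Relic expects. A minimal sketch of the translation step only; the Heroku field names (`app`, `user`, `head`, `git_log`), the New Relic endpoint and the `deployment[...]` parameter names are assumptions based on the legacy APIs and should be checked against current documentation, and `heroku_hook_to_newrelic_request` is a hypothetical name:

```python
from urllib.parse import parse_qs, urlencode
from urllib.request import Request

# Assumed endpoint: New Relic's legacy deployments API.
NEW_RELIC_DEPLOYMENTS_URL = "https://api.newrelic.com/deployments.xml"


def heroku_hook_to_newrelic_request(hook_body, api_key, app_name):
    """Translate a form-encoded Heroku deploy-hook body into a New Relic POST.

    The returned Request is not sent here; urllib.request.urlopen(req) would send it.
    """
    # parse_qs gives lists of values; a deploy hook sends each field once.
    fields = {k: v[0] for k, v in parse_qs(hook_body.decode("utf-8")).items()}
    params = {
        "deployment[app_name]": app_name,
        "deployment[revision]": fields.get("head", ""),
        "deployment[user]": fields.get("user", ""),
        "deployment[changelog]": fields.get("git_log", ""),
        "deployment[description]": "Heroku deploy of " + fields.get("app", "?"),
    }
    return Request(
        NEW_RELIC_DEPLOYMENTS_URL,
        data=urlencode(params).encode("utf-8"),
        headers={"x-api-key": api_key},  # the header Heroku's hook cannot add
        method="POST",
    )
```

The relay's web handler would read the POST body, build the request with this function and send it with `urllib.request.urlopen(req)`.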

This should happen automatically, but NewRelic deployment tracking integration with Heroku has been broken since approx Nov 1st.
I have a support ticket open on this issue and it should be fixed sometime in the next week or so.
EDIT (11/23/2013):
Heroku acknowledged this is a bug caused by an overhaul of the NewRelic add-on. Here's what they said was the root cause on my support ticket:
I've got an update on this, but no resolution yet. To give you some context (given you've asked how this has happened 3 times) New Relic were the very first add-on in the Marketplace and as a result there has been a lot of gnarly code very specific to their implementation. On their side they've also had to . And as you've gathered unfortunately much of it was not well tested. We've been working with New Relic all year to finally fix that, and we've moved them across to the standard API that all other add-ons and most PaaS providers now adhere to. Any new customers since May have been on that new integration so we've been testing it out for 6 months. The final part of that process was to remove customers on the legacy integration and that occurred as part of the migration onto the new pricing we announced at the start of this month.
It's only after this migration we realized there was no support for deploy notifications. New customers since May had never been exposed to the feature so didn't notice it was missing, and it appears none of the legacy customers we tested with in October noticed it was missing either. To rectify the situation we've had to try and build this feature out in the Add-ons API. That's been documented and deployed, and we're now working with New Relic to help their engineers implement it as soon as they possibly can.
I don't think you can view my support ticket, but you're welcome to reference it with Heroku if you file your own ticket:
https://help.heroku.com/tickets/102722
EDIT (01/06/2014):
NewRelic/Heroku appear to have fixed their integration so that deploys are now being tracked successfully. This appears to have gone into effect sometime on/before 1/2/2014.

Is there a production-safe way to measure time spent in production with Python?

I want to be able to instrument Python applications so that I know:
Page generation time.
Percentage of time spent in external requests (mysql, api calls).
Number of MySQL queries, and what those queries were.
I want this data from production (not offline profiling) - because the time spent in various places will be different under load.
In PHP I can do this with XHProf or instrumentation-for-php. In Ruby on Rails/.NET/Java, I can do this with New Relic.
Is there such a package recommended for Python or Django?

Yes, it's perfectly possible. E.g. use a magic switch in the URL, like "?profile-me", which triggers profiling in a Django middleware.
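A minimal sketch of the "magic switch" idea as plain WSGI middleware, so it can wrap any WSGI app, Django included. It uses the standard library's cProfile instead of hotshot (hotshot only exists in Python 2); `ProfileOnDemand` and its report format are hypothetical, not an existing package. In production you would also want to restrict who can trigger it, since a profiled request runs slower:

```python
import cProfile
import io
import pstats


class ProfileOnDemand:
    """WSGI middleware that profiles a request only when the query string asks for it."""

    def __init__(self, app, switch="profile-me", limit=30):
        self.app = app
        self.switch = switch  # hypothetical flag, e.g. /some/page?profile-me
        self.limit = limit    # how many functions to list in the report

    def __call__(self, environ, start_response):
        flags = environ.get("QUERY_STRING", "").split("&")
        if self.switch not in flags:
            return self.app(environ, start_response)  # normal, unprofiled path

        def discard_start_response(status, headers, exc_info=None):
            # The wrapped app's own response is thrown away; the report replaces it.
            return lambda data: None

        profiler = cProfile.Profile()
        profiler.enable()
        try:
            result = self.app(environ, discard_start_response)
            try:
                for _ in result:  # drain the body so lazily generated output is profiled too
                    pass
            finally:
                if hasattr(result, "close"):
                    result.close()
        finally:
            profiler.disable()

        report = io.StringIO()
        stats = pstats.Stats(profiler, stream=report)
        stats.sort_stats("cumulative").print_stats(self.limit)
        body = report.getvalue().encode("utf-8")
        start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8"),
                                  ("Content-Length", str(len(body)))])
        return [body]
```

Wrapping is one line, e.g. `application = ProfileOnDemand(application)`; requesting a page with `?profile-me` then returns the profile report instead of the page.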
There are a number of snippets on the Internet, like this one: http://djangosnippets.org/snippets/70/ or modules like this one: http://code.google.com/p/django-profiling/ - but I haven't used any of them so I cannot recommend anything.
Anyway, the approach they take is similar to what I do, i.e. use Python's hotshot profiler module in a middleware that wraps your view. For the MySQL part, you can just use connection.queries from Django (note that Django only records connection.queries when DEBUG is True).
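Django's `connection.queries` is a list of dicts with `sql` and `time` keys (the time in seconds, as a string). A small helper can turn it into a count, a total and the slowest statements; `summarize_queries` is a hypothetical name, and it takes the list as a plain argument so it can be tested without Django:

```python
def summarize_queries(queries, top=5):
    """Summarize entries shaped like Django's connection.queries.

    Returns (count, total_seconds, slowest), where slowest is a list of
    (seconds, sql) pairs, largest first.
    """
    timed = [(float(q["time"]), q["sql"]) for q in queries]
    total = sum(t for t, _ in timed)
    slowest = sorted(timed, key=lambda pair: pair[0], reverse=True)[:top]
    return len(timed), total, slowest
```

In a Django view or middleware you would call it as `summarize_queries(connection.queries)` after `from django.db import connection`.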
The nice thing about Hotshot is that its output can be browsed using Kcachegrind like here: http://www.rkblog.rk.edu.pl/w/p/django-profiling-hotshot-and-kcachegrind/

New Relic now has a package for Python, including Django support through mod_wsgi.
https://support.newrelic.com/help/kb/python

django-prometheus is a good choice for handling production workloads, especially in a container environment like Kubernetes. Out of the box, it has middleware for tracking request latencies and counts (by view method), as well as database and cache access times. It wouldn't be a good solution for tracking which queries are actually executing, but that's where a logging solution like ELK would come into play. If it helps, I've written a post which walks through how to add custom metrics to a Django application.
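To illustrate what "latencies and counts by view" amounts to, here is a stdlib-only sketch that records both in process memory. It is not django-prometheus's API (that package's value is exporting such numbers to Prometheus); `ViewStats` and its `track` decorator are hypothetical names:

```python
import functools
import threading
import time
from collections import defaultdict


class ViewStats:
    """Per-view request counts and cumulative latency, kept in process memory."""

    def __init__(self):
        self._lock = threading.Lock()  # views may run on several threads
        self.counts = defaultdict(int)
        self.total_seconds = defaultdict(float)

    def track(self, view):
        """Decorator recording how often `view` runs and how long it takes."""
        name = f"{view.__module__}.{view.__qualname__}"

        @functools.wraps(view)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return view(*args, **kwargs)
            finally:
                # Recorded even when the view raises, so errors still count.
                elapsed = time.perf_counter() - start
                with self._lock:
                    self.counts[name] += 1
                    self.total_seconds[name] += elapsed
        return wrapper
```

Average latency per view is then `total_seconds[name] / counts[name]`; a real exporter would also keep histograms rather than only a running total.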

Migrating Django Application to Google App Engine?

I'm developing a web application and considering Django, Google App Engine, and several other options. I wondered what kind of "penalty" I will incur if I develop a complete Django application assuming it runs on a dedicated server, and then later want to migrate it to Google App Engine.
I have a basic understanding of Google's data store, so please assume I will choose a column-based database for my "stand-alone" Django application rather than a relational database, so that the schema could remain mostly the same and will not be a major factor.
Also, please assume my application does not maintain a huge amount of data, so that migration of tens of gigabytes is not required. I'm mainly interested in the effects on the code and software architecture.
Thanks

Most (all?) of Django is available in GAE, so your main task is to avoid basing your designs around a reliance on anything from Django or the Python standard libraries which is not available on GAE.
You've identified the glaring difference, which is the database, so I'll assume you're on top of that. Another difference is the tie-in to Google Accounts: if you want, you can do a fair amount of access control through the app.yaml file rather than in code. You don't have to use any of that, though, so if you don't envisage switching to Google Accounts when you move to GAE, there's no problem.
I think the differences in the standard libraries can mostly be deduced from the fact that GAE has no I/O and no C-accelerated libraries unless explicitly stated, and my experience so far is that things I've expected to be there, have been there. I don't know Django and haven't used it on GAE (apart from templates), so I can't comment on that.
Personally I probably wouldn't target LAMP (where P = Django) with the intention of migrating to GAE later. I'd develop for both together, and try to ensure if possible that the differences are kept to the very top (configuration) and the very bottom (data model). The GAE version doesn't necessarily have to be perfect, as long as you know how to make it perfect should you need it.
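A minimal sketch of keeping the differences at "the very bottom": application code talks only to a small storage interface, and each platform gets its own backend. `Article`, `ArticleStore`, `MemoryArticleStore` and `rename_article` are hypothetical names; a Django-ORM backend and a GAE datastore backend would be two further subclasses, not shown:

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class Article:
    key: str
    title: str


class ArticleStore(ABC):
    """The bottom layer: the only code that knows which database is used."""

    @abstractmethod
    def get(self, key):
        """Return the Article for `key`, or None."""

    @abstractmethod
    def put(self, article):
        """Create or replace an Article."""


class MemoryArticleStore(ArticleStore):
    """Stand-in backend for tests; real backends would sit alongside it."""

    def __init__(self):
        self._rows = {}

    def get(self, key):
        return self._rows.get(key)

    def put(self, article):
        self._rows[article.key] = article


def rename_article(store, key, new_title):
    """Application logic written only against the ArticleStore interface."""
    article = store.get(key)
    if article is None:
        raise KeyError(key)
    store.put(Article(article.key, new_title))
    return store.get(key)
```

Running the same tests against every backend is then the cheap way to spot platform differences early, in line with the "develop for both together" advice.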
It's not guaranteed that this is faster than writing and then porting, but my guess is it normally will be. The easiest way to spot any differences is to run the code, rather than relying on not missing anything in the GAE docs, so you'll likely save some mistakes that need to be unpicked. The Python SDK is a fairly good approximation to the real App Engine, so all or most of your tests can be run locally most of the time.
Of course if you eventually decide not to port then you've done unnecessary work, so you have to think about the probability of that happening, and whether you'd consider the GAE development to be a waste of your time if it's not needed.

Basically, you will change the data model base class and some APIs if you use them (PIL, urllib2, etc).
If your target is App Engine, I would use the App Engine Helper for Django: http://code.google.com/appengine/articles/appengine_helper_for_django.html. It can run your app on your own server with a file-based DB and then push it to App Engine with no changes.

It sounds like you're aware of the major limitation in building/migrating your app: App Engine doesn't support Django's ORM.
Keep in mind that this doesn't just affect the code you write yourself -- it also limits your ability to use a lot of existing Django code. That includes other applications (such as the built-in admin and auth apps) and ORM-based features such as generic views.

There are a few things you can do on your own server that you can't do on App Engine, like storing uploaded files on the file system. On App Engine you have to store uploads in the datastore, which can cause a few problems.
Other than that, the presentation layer should be fine. There are a number of other little things that are better on your own dedicated server, but I think a lot of those will eventually come to App Engine.

Feedback on using Google App Engine? [closed]

Closed 11 years ago.
Looking to do a very small, quick 'n dirty side project. I like the fact that the Google App Engine is running on Python with Django built right in - gives me an excuse to try that platform... but my question is this:
Has anyone made use of the app engine for anything other than a toy problem? I see some good example apps out there, so I would assume this is good enough for the real deal, but wanted to get some feedback.
Any other success/failure notes would be great.

I tried App Engine for my small quake watch application:
http://quakewatch.appspot.com/
My purpose was to see the capabilities of app engine, so here are the main points:
It doesn't come with Django by default; it has its own web framework, which is Pythonic, has a URL dispatcher like Django's, and uses Django templates.
So if you have Django experience you will find it easy to use.
But you can use any pure-Python framework, and Django can easily be added; see
http://code.google.com/appengine/articles/django.html
The google-app-engine-django project (http://code.google.com/p/google-app-engine-django/) is excellent; working with it is almost like working on a normal Django project.
You cannot execute any long-running process on the server: all you do is reply to requests, and replies must be quick, otherwise App Engine will kill them.
So if your app needs lots of backend processing, App Engine is not the best choice; otherwise you will have to do that processing on a server of your own.
My quakewatch app has a subscription feature, which means I had to email the latest quakes as they happened, but I cannot run a background process in App Engine to monitor new quakes.
The solution here is to use a third-party service like pingablity.com, which can request one of your pages that executes the subscription emailer.
But here too you have to make sure you don't spend too much time in that request, or break the task into several pieces.
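Breaking the task into pieces usually means each request does a bounded slice of work and hands back a cursor for the next call, whether that call comes from the pinging service or from a task queue. A minimal sketch; `process_in_slices` and its `budget_seconds` and injectable `clock` parameters are hypothetical, and the budget should sit well under the platform's request deadline:

```python
import time


def process_in_slices(items, handle, start=0, budget_seconds=5.0, clock=time.monotonic):
    """Handle items[start:] until the time budget runs out.

    Returns the index to resume from on the next request, or None when
    everything has been processed.
    """
    deadline = clock() + budget_seconds
    index = start
    while index < len(items):
        if clock() >= deadline:
            return index  # out of time: hand back a cursor for the next run
        handle(items[index])
        index += 1
    return None
```

The page that the pinging service hits would store the returned cursor (e.g. in the datastore) and start from it on the next hit; `None` means the batch is finished.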
It provides Django-like modeling capabilities, but the backend is totally different; for a new project that should not matter.
But overall I think it is excellent for creating apps which do not need a lot of background processing.
Edit:
Now task queues can be used for running batch processing or scheduled tasks
Edit:
After working on and creating a real application on GAE for a year, my opinion now is that unless you are making an application which needs to scale to millions and millions of users, don't use GAE. Maintaining and doing trivial tasks in GAE is a headache due to its distributed nature: avoiding deadline exceeded errors, counting entities or doing complex queries requires complex code, so small but complex applications should stick to LAMP.
Edit:
Models should be designed up front with all the transactions you will want in the future in mind, because only entities in the same entity group can be used in a transaction. That makes updating two different groups a nightmare: e.g. transferring money from user1 to user2 in a transaction is impossible unless they are in the same entity group, but putting them in the same entity group may not be best for frequent updates...
read this http://blog.notdot.net/2009/9/Distributed-Transactions-on-App-Engine
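The general workaround for the single-group limit is to split a transfer into separate single-group transactions and make the second one safe to repeat. A minimal in-memory sketch of that roll-forward idea, assuming one lock per account stands in for a single-group transaction; `Account`, `start_transfer` and `roll_forward` are hypothetical names, this is not the datastore API, and it is not necessarily the linked article's exact design:

```python
import threading
import uuid


class Account:
    """One 'entity group': its fields change only while holding its own lock."""

    def __init__(self, name, balance):
        self.name = name
        self.balance = balance
        self.lock = threading.Lock()  # stands in for a single-group transaction
        self.pending_out = {}         # transfer_id -> (destination, amount)
        self.applied_in = set()       # transfer ids already credited here


def start_transfer(src, dst, amount):
    """Transaction 1 (source group only): debit and record the pending transfer."""
    with src.lock:
        if src.balance < amount:
            raise ValueError("insufficient funds")
        transfer_id = str(uuid.uuid4())
        src.balance -= amount
        src.pending_out[transfer_id] = (dst, amount)
    return transfer_id


def roll_forward(src, transfer_id):
    """Transactions 2 and 3: credit the destination once, then clear the record.

    Safe to retry: the destination remembers which transfer ids it has applied.
    """
    with src.lock:
        entry = src.pending_out.get(transfer_id)
    if entry is None:
        return  # already completed
    dst, amount = entry
    with dst.lock:
        if transfer_id not in dst.applied_in:
            dst.balance += amount
            dst.applied_in.add(transfer_id)
    with src.lock:
        src.pending_out.pop(transfer_id, None)
```

Between the two steps the money is briefly "in flight" in the source's pending record; a periodic job can call `roll_forward` on any leftover pending transfers, and repeating it never credits twice.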

I am using GAE to host several high-traffic applications, on the order of 50-100 req/sec. It is great; I can't recommend it enough.
My previous experience with web development was with Ruby (Rails/Merb). Learning Python was easy. I didn't mess with Django or Pylons or any other framework, just started from the GAE examples and built what I needed out of the basic webapp libraries that are provided.
If you're used to the flexibility of SQL the datastore can take some getting used to. Nothing too traumatic! The biggest adjustment is moving away from JOINs. You have to shed the idea that normalizing is crucial.
Ben

One of the compelling reasons I have come across for using Google App Engine is its integration with Google Apps for your domain. Essentially it allows you to create custom, managed web applications that are restricted to the (controlled) logins of your domain.
Most of my experience with this code was building a simple time/task tracking application. The template engine was simple and yet made a multi-page application very approachable. The login/user awareness API is similarly useful. I was able to make a public page/private page paradigm without too much issue. (A user would log in to see the private pages. An anonymous user was only shown the public page.)
I was just getting into the datastore portion of the project when I got pulled away for "real work".
I was able to accomplish a lot (it still is not done yet) in very little time. Since I had never used Python before, this was particularly pleasant (both because it was a new language for me, and also because the development was still fast despite the new language). I ran into very little that led me to believe that I wouldn't be able to accomplish my task. Instead I have a fairly positive impression of the functionality and features.
That is my experience with it. Perhaps it doesn't represent more than an unfinished toy project, but it does represent an informed trial of the platform, and I hope that helps.

The "App Engine running Django" idea is a bit misleading. App Engine replaces the entire Django model layer so be prepared to spend some time getting acclimated with App Engine's datastore which requires a different way of modeling and thinking about data.

I used GAE to build http://www.muspy.com
It's a bit more than a toy project but not overly complex either. I'm still waiting on Google to address a few issues, but overall developing the website was an enjoyable experience.
If you don't want to deal with hosting issues, server administration, etc, I can definitely recommend it. Especially if you already know Python and Django.

I think App Engine is pretty cool for small projects at this point. There's a lot to be said for never having to worry about hosting. The API also pushes you in the direction of building scalable apps, which is good practice.
app-engine-patch is a good layer between Django and App Engine, enabling the use of the auth app and more.
Google have promised an SLA and pricing model by the end of 2008.
Requests must complete in 10 seconds, and sub-requests to web services must complete in 5 seconds. This forces you to design a fast, lightweight application, off-loading serious processing to other platforms (e.g. a hosted service or an EC2 instance).
More languages are coming soon! Google won't say which though :-). My money's on Java next.

This question has been fully answered. Which is good.
But one thing perhaps is worth mentioning.
Google App Engine has a plugin for the Eclipse IDE which is a joy to work with.
If you already do your development with Eclipse you are going to be so happy about that.
To deploy on the google app engine's web site all I need to do is click one little button - with the airplane logo - super.

Take a look at the sql game; it is very stable and actually pushed traffic limits at one point, so that it was getting throttled by Google. I have seen nothing but good news about App Engine, other than hosting your app on servers someone else controls completely.

I used GAE to build a simple application which accepts some parameters, formats them and sends email. It was extremely simple and fast. I also made some performance benchmarks on the GAE datastore and memcache services (http://dbaspects.blogspot.com/2010/01/memcache-vs-datastore-on-google-app.html ). It is not that fast. My opinion is that GAE is a serious platform which enforces a certain methodology. I think it will evolve into a truly scalable platform where bad practices are simply not allowed.

I used GAE for my flash gaming site, Bearded Games. GAE is a great platform. I used Django templates which are so much easier than the old days of PHP. It comes with a great admin panel, and gives you really good logs. The datastore is different than a database like MySQL, but it's much easier to work with. Building the site was easy and straightforward and they have lots of helpful advice on the site.

I used GAE and Django to build a Facebook application. I used http://code.google.com/p/app-engine-patch as my starting point as it has Django 1.1 support. I didn't try to use any of the manage.py commands because I assumed they wouldn't work, but I didn't even look into it. The application had three models and also used pyfacebook, but that was the extent of the complexity. I'm in the process of building a much more complicated application which I'm starting to blog about on http://brianyamabe.com.
