On a website I'm making, there's a section that hits the database pretty hard. Harder than I want. The data that's being retrieved is all very static. It will rarely change. So I want to cache it.
I came across http://wiki.pylonshq.com/display/pylonsdocs/Caching+in+Templates+and+Controllers and had a good read have been making use of template caching using:
return render('tmpl.html', cache_expire='never')
That works great until I modify the HTML. The only way I've found to delete the cache is to remove the cache_expire parameter from render() and delete the cache folder. But, meh, it works.
What I want to be able to, however, is cache Lists, Tuples and Dictionaries. From reading the above wiki page, it seems this isn't possible?
I want to be able to do something like:
data = [i for i in range(0, 2000000)]
mycache = cache.get_cache('cachename')
value = mycache.get(key='dataset1', list=data, type='memory', expiretime='3600')
print value
Allowing me to do some CPU intensive work (list generation, in this example) and then cache it.
Can this be done with Pylons?
As alternative of traditional cache you can use app globals variables. Once on server startup load data to variable and then use data in you actions or direct in templates.
http://pylonsbook.com/en/1.1/exploring-pylons.html#app-globals-object
Also you can code some action to update this global variable through the admin interface or by other events.
Why not use memcached?
Look at this question on SO on how to use it with pylons: Pylons and Memcached
Related
I am implementing a really lightweight Web Project, which has just one page, showing data in a diagram. I use Django as a Webserver and d3.js as plotting routine for this diagram. As you can imagine, there are just a few simple time series which have to be responded by Django server, so I was wondering if I simply could hold this variable in ram. My first test was positive, I had something like this in my views.py:
X = np.array([123,23,1,32,123,1])
#csrf_exempt
def getGraph(request):
global X
return HttpResponse(json.dumps(X))
Notice, X is updated by another function every now and then, but all user access is read-only. Do I have to deal with
security issues by defining a global variable?
inconsistencies in general?
I found a thread discussing global variables in Django, but in that case, the difficulty is of handling multiple write-access.
To answer potential questions on why I don't want store data in database: All data I got in my X is already stored in a huge remote database and this web app just needs to display data.
Storing it in a variable does indeed have threading implications (and also scalibility - what if you have two Django servers running the same app?). The advice from the Django community is don't!.
This sounds like a good fit for the Django cache system though. Just cache your getGraph view with #cache_page and the job is done. No need to use memcache, the built-in in-memory memory-cache cache-backend* will work fine. Put a very high number as the time-out on the cache (years).
This way you are storing the HTTP response (JSON) not the value of X. But from your code sample, that is not a problem. If you need to re-calculate X you need to re-calculate the JSON, and if you need to re-calculate the JSON you will need to re-calculate X.
https://docs.djangoproject.com/en/dev/topics/cache/?from=olddocs/
1 or just 'built-in memory backend', I couldn't resist
I'm creating a Django-powered site for my newspaper-ish site. The least obvious and common-sense task that I have come across in getting the site together is how best to generate a "top articles" list for the sidebar of the page.
The first thing that came to mind was some sort of database column that is updated (based on what?) with every view. That seems (to my instincts) ridiculously database intensive and impractical and thus I think I'd like to find another solution.
Thanks all.
I would give celery a try (with django-celery). While it's not so easy to configure and use as cache, it enables you to queue tasks like incrementing counters and do them in background. It could be even combined with cache technique - in views increment counters in cache and define PeriodicTask that will run every now and then, resetting counters and writing them to the database.
I just remembered - I once found this blog entry which provides nice way of incrementing 'viewed_count' (or similar) column in database with AJAX JS call. If you don't have heavy traffic maybe it's good idea?
Also mentioned in this post is django-tracking, but I don't know much about it, I never used it myself (yet).
Premature optimization, first try the db way and then see if it really is too database sensitive. Any decent database has so good caches it probably won't matter very much. And even if it is a problem, take a look at the other db/cache suggestions here.
It is most likely by the way is that you will have many more intensive db queries with each view than a simple view update.
If you do something like sort by top views, it would be fast if you index the view column in the DB. Another option is to only collect the top x articles every hour or so, and toss that value into Django's cache framework.
The nice thing about caching the list is that the algorithm you use to determine top articles can be as complex as you like without hitting the DB hard with every page view. Django's cache framework can use memory, db, or file system. I prefer DB, but many others prefer memory. I believe it uses pickle, so you can also store Python objects directly. It's easy to use, recommended.
An index wouldn't help as them main problem I believe is not so much getting the sorted list as having a DB write with every page view of an article. Another index actually makes that problem worse, albeit only a little.
So I'd go with the cache. I think django's cache shim is a problem here because it requires timeouts on all keys. I'm not sure if that's imposed by memcached, if not then go with redis. Actually just go with redis anyway, the python library is great, I've used it from django projects before, and it has atomic increments and powerful sorting - everything you need.
I have a need for some kind of information that is in essence static. There is not much of this information, but alot of objects will use that information.
Since there is not a lot of that information (few dictionaries and some lists), I thought that I have 2 options - create models for holding that information in the database or write them as dictionaries/lists to some settings file. My question is - which is faster, to read that information from the database or from a settings file? In either case I need to be able to access that information in lot of places, which would mean alot of database read calls. So which would be faster?
If they're truly never, ever going to change, then feel free to put them in your settings.py file as you would declare a normal Python dictionary.
However, if you want your information to be modifiable through the normal Django methods, then use the database for persistent storage, and then make the most of Django's cache framework.
Save your data to the database as normal, and then the first time it is accessed, cache them:
from django.core.cache import cache
def some_view_that_accesses_date(request):
my_data = cache.get('some_key')
if my_data is None:
my_data = MyObject.objects.all()
cache.set('some_key', my_data)
... snip ... normal view code
Make sure never to save None in a cache, as:
We advise against storing the literal
value None in the cache, because you
won't be able to distinguish between
your stored None value and a cache
miss signified by a return value of
None.
Make sure you kill the cache on object deletion or change:
from django.core.cache import cache
from django.db.models.signals import post_save
from myapp.models import MyModel
def kill_object_cache(sender, **kwargs):
cache.delete('some_key')
post_save.connect(kill_object_cache, sender=MyModel)
post_delete.connect(kill_object_cache, sender=MyModel)
I've got something similar to this in one of my apps, and it works great. Obviously you won't see any performance improvements if you then go and use the database backend, but this is a more Django-like (Djangonic?) approach than using memcached directly.
Obviously it's probably worth defining the cache key some_key somewhere, rather than littering it all over your code, the examples above are just intended to be easy to follow, rather than necessarily full-blown implementations of caching.
If the data is static, there is no need to keep going back to the database. Just read it the first time it is required and cache the result.
If there is some reason you can't cache the result in your app, you can always use memcached to avoid hitting the database.
The advantage of using memcached is that if the data does change, you can simply update the value in memcached.
Pseudocode for using memcached
if 'foo' in memcached
data = memcached.get('foo')
else
data = database.get('foo')
memcached.put('foo', data)
If you need fast access from multiple processes, then a database is the best option for you.
However, if you just want to keep data in memory and access it from multiple places in the same process, then Python dictionaries will be faster than accessing a DB.
I've written an application for Google AppEngine, and I'd like to make use of the memcache API to cut down on per-request CPU time. I've profiled the application and found that a large chunk of the CPU time is in template rendering and API calls to the datastore, and after chatting with a co-worker I jumped (perhaps a bit early?) to the conclusion that caching a chunk of a page's rendered HTML would cut down on the CPU time per request significantly. The caching pattern is pretty clean, but the question of where to put this logic of caching and evicting is a bit of a mystery to me.
For example, imagine an application's main page has an Announcements section. This section would need to be re-rendered after:
first read for anyone in the account,
a new announcement being added, and
an old announcement being deleted
Some options of where to put the evict_announcements_section_from_cache() method call:
in the Announcement Model's .delete(), and .put() methods
in the RequestHandler's .post() method
anywhere else?
Then in the RequestHandler's get page, I could potentially call get_announcements_section() which would follow the standard memcache pattern (check cache, add to cache on miss, return value) and pass that HTML down to the template for that chunk of the page.
Is it the typical design pattern to put the cache-evicting logic in the Model, or the Controller/RequestHandler, or somewhere else? Ideally I'd like to avoid having evicting logic with tentacles all over the code.
I've got just such a decorator up in an open source Github project:
http://github.com/jamslevy/gae_memoize/tree/master
It's a bit more in-depth, allowing for things like forcing execution of the function (when you want to refresh the cache) or forcing caching locally...these were just things that I needed in my app, so I baked them into my memoize decorator.
A couple of alternatives to regular eviction:
The obvious one: Don't evict, and set a timer instead. Even a really short one - a few seconds - can cut down on effort a huge amount for a popular app, without users even noticing data may be a few seconds stale.
Instead of evicting, generate the cache key based on criteria that change when the data does. For example, if retrieving the key of the most recent announcement is cheap, you could use that as part of the key of the cached data. When a new announcement is posted, you go looking for a key that doesn't exist, and create a new one as a result.
Suppose I have a simple view which needs to parse data from an external website.
Right now it looks something like this:
def index(request):
source = urllib2.urlopen(EXTERNAL_WEBSITE_URL)
bs = BeautifulSoup.BeautifulSoup(source.read())
finalList = [] # do whatever with bs to populate the list
return render_to_response('someTemplate.html', {'finalList': finalList})
First of all, is this an acceptable use?
Obviously, this is not good performance-wise. The external website page is pretty big, and I am only extracting a small part of it. I thought of two solutions:
Do all of this asynchronously. Load the rest of the page, populate with data once I get it. But I don't even know where to start. I'm just starting with Django and never done anything async up until now.
I don't care if this data is updated every 2-3 minutes, so caching is a good solution as well (also saves me the extra round-trips). How would I go about caching this data?
First, don't optimize prematurely. Get this to work.
Then, add enough logging to see what the performance problems (if any) really are.
You may find that end-user's PC is the slowest part; getting data from another site may, actually, be remarkably fast when you do not fetch .JS libraries and .CSS and artwork and the render then entire thing in a browser.
Once you're absolutely sure that the fetch of the remote content really IS a problem. Really. Then you have to do the following.
Write a "crontab" script that does the remote fetch form time to time.
Design a place to cache the remote results. Database or file system, pick one.
Update your Django app to get the data from the cache (database or filesystem) instead of the remote URL.
Only after you have absolute proof that the urllib2 read of the remote site is the bottleneck.
Caching with django is pretty easy,
from django.core.cache import cache
key = 'some-key'
data = cache.get(key)
if data is None:
# soupify the page and what not
cache.set(data, key, 60*60*8)
return render_to_response ...
return render_to_response
To answer your questions, you can do this asynchronously, but then you would have to use something like django cron to update the cache ever so often. On the other hand you can write this as a standalone python script, replace the cache imported from django with memcache and it would work the same way. It would reduce some of the performance issues your site could have, and as long as you know the cache key, you can retrieve the data from the cache.
Like Jarret said I would read django's caching docs and memcache's docs for more information.
Django has robust, built-in support for caching views: http://docs.djangoproject.com/en/dev/topics/cache/#topics-cache.
It offers solutions for caching entire views (such as in your case), or just certain parts of data in the view. There are even controls for how often to update the cache, and so forth.