Models in database speed vs static dictionaries speed - python

I need some information that is, in essence, static. There is not much of it, but a lot of objects will use it.
Since there is not a lot of that information (a few dictionaries and some lists), I see two options: create models to hold it in the database, or write it as dictionaries/lists in some settings file. My question is: which is faster, reading that information from the database or from a settings file? In either case I need to access that information in a lot of places, which with the database option would mean a lot of database read calls. So which would be faster?

If they're truly never, ever going to change, then feel free to put them in your settings.py file as you would declare a normal Python dictionary.
However, if you want your information to be modifiable through the normal Django methods, then use the database for persistent storage, and then make the most of Django's cache framework.
Save your data to the database as normal, and then the first time it is accessed, cache them:
from django.core.cache import cache
def some_view_that_accesses_date(request):
    my_data = cache.get('some_key')
    if my_data is None:
        my_data = MyObject.objects.all()
        cache.set('some_key', my_data)
    # ... snip ... normal view code
Make sure never to save None in a cache, as:
"We advise against storing the literal value None in the cache, because you won't be able to distinguish between your stored None value and a cache miss signified by a return value of None."
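If None is a legitimate value for your data, one way around this is to pass a unique sentinel object as the default when reading. Django's cache.get accepts a default argument; the sketch below uses a small dict-backed stand-in (DictCache and lookup are hypothetical names, not Django's classes) so that it runs without Django installed.

```python
class DictCache:
    """Hypothetical in-memory stand-in that mimics cache.get / cache.set."""

    def __init__(self):
        self._store = {}

    def get(self, key, default=None):
        return self._store.get(key, default)

    def set(self, key, value):
        self._store[key] = value


cache = DictCache()
MISSING = object()  # unique sentinel: it can never be a stored value


def lookup(key):
    """Return (found, value) so a stored None is not mistaken for a miss."""
    value = cache.get(key, MISSING)
    if value is MISSING:
        return False, None
    return True, value


cache.set('empty_result', None)
```

With this, lookup('empty_result') reports a hit whose value is None, while a key that was never set reports a miss.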
Make sure you kill the cache on object deletion or change:
from django.core.cache import cache
from django.db.models.signals import post_delete, post_save
from myapp.models import MyModel
def kill_object_cache(sender, **kwargs):
    # Drop the cached data so the next read reloads it from the database
    cache.delete('some_key')
post_save.connect(kill_object_cache, sender=MyModel)
post_delete.connect(kill_object_cache, sender=MyModel)
I've got something similar to this in one of my apps, and it works great. Obviously you won't see any performance improvements if you then go and use the database backend, but this is a more Django-like (Djangonic?) approach than using memcached directly.
It's probably worth defining the cache key some_key in one place rather than littering it all over your code. The examples above are just intended to be easy to follow, rather than necessarily full-blown implementations of caching.

If the data is static, there is no need to keep going back to the database. Just read it the first time it is required and cache the result.
If there is some reason you can't cache the result in your app, you can always use memcached to avoid hitting the database.
The advantage of using memcached is that if the data does change, you can simply update the value in memcached.
Pseudocode for using memcached
if 'foo' in memcached:
    data = memcached.get('foo')
else:
    data = database.get('foo')
    memcached.put('foo', data)
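The pseudocode above could look roughly like this in runnable Python. This is only a sketch: FakeCache stands in for a real memcached client, and read_from_database is a hypothetical placeholder for the actual query.

```python
class FakeCache:
    """Hypothetical stand-in for a memcached client (get/set only)."""

    def __init__(self):
        self._store = {}

    def get(self, key):
        return self._store.get(key)

    def set(self, key, value):
        self._store[key] = value


db_reads = []  # records every simulated database read


def read_from_database(key):
    """Hypothetical slow lookup; returns a placeholder value."""
    db_reads.append(key)
    return {'name': key, 'value': 42}


def get_cached(cache, key):
    """Try the cache first and fall back to the database on a miss."""
    data = cache.get(key)
    if data is None:
        data = read_from_database(key)
        cache.set(key, data)
    return data


cache = FakeCache()
first = get_cached(cache, 'foo')   # miss: reads the database
second = get_cached(cache, 'foo')  # hit: served from the cache
```

Only the first call touches the database; the second is answered from the cache.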

If you need fast access from multiple processes, then a database is the best option for you.
However, if you just want to keep data in memory and access it from multiple places in the same process, then Python dictionaries will be faster than accessing a DB.
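For the in-process case, a minimal sketch might be a module-level dictionary that every caller imports. The names COUNTRY_NAMES and country_name and the data itself are made up for illustration; MappingProxyType is only there to guard against accidental writes.

```python
from types import MappingProxyType

# Hypothetical static data; in a Django project this could live in settings.py
_COUNTRY_NAMES = {'de': 'Germany', 'fr': 'France'}

# Read-only view: lookups work, accidental writes raise TypeError
COUNTRY_NAMES = MappingProxyType(_COUNTRY_NAMES)


def country_name(code):
    """Plain in-memory lookup; no database or network round trip."""
    return COUNTRY_NAMES.get(code, 'unknown')
```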

Related

update existing cache data with newer items in django

I want to use caching in Django and I am stuck on how to go about it. I have data in some specific models which are write intensive. Records will get added continuously to the model. Each user has some specific data in the model, similar to an orders table.
Since my model is write intensive, I am not sure how effective caching frameworks in Django are going to be. I tried Django's per-view caching and I am trying to develop a view that first picks up data from the cache. Then I will have another call which will bring in data that was added to the model after the caching was done. What I want to do is add the updated data to the original cached data and store it again.
In other words, I don't want to expire my cache, I just want to keep adding to my existing cache data. Maybe once every 3 hours I can clear it.
Is what I am doing right? Are there better ways than this? Can I really add items to existing cache data?
I will be very glad for your help.
You ask about "caching" which is a really broad topic, and the answer is always a mix of opinion, style and the specific app requirements. Here are a few points to consider.
If the data is per user, you can cache it per user:
from django.core.cache import cache
cache.set(request.user.id,"foo")
cache.get(request.user.id)
The common practice is to keep a database flag that tells you whether the user's data has changed since it was cached. So before you fetch the data from the cache, check only this flag in the DB. If the flag says nothing changed, get the data from the cache. If it did change, pull from the DB, replace the cache entry, and reset the flag.
The flag check should be fast and simple: one table, indexed by user.id, with a boolean flag field. This squeezes a lot of index rows into a single DB page and enables fast fetching of a single one-field row. Yet you still have a persistent, up-to-date main storage, which prevents the use of stale cache data. You can check this flag in a middleware.
You can run expiry in many ways: clear cache when user logs out, run a cron script that clears items, or let the cache backend expire items. If you use a flag check before you use the cache, there is no issue in keeping items in cache except space, and caching backends handle that. If you use the django simple file cache (which is easy, simple and zero config), you will have to clear the cache. A simple cron script will do.
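A rough sketch of the flag check described above, with plain dicts standing in for the main table, the flag table and the cache backend (all names here are hypothetical):

```python
# Hypothetical stand-ins: dicts play the role of the DB tables and the cache.
orders_table = {1: ['order-a']}  # main storage, keyed by user id
changed_flag = {1: True}         # small flag table, keyed by user id
cache = {}                       # cache backend


def add_order(user_id, order):
    """Write path: store the row and mark the user's cached data as stale."""
    orders_table.setdefault(user_id, []).append(order)
    changed_flag[user_id] = True


def get_orders(user_id):
    """Read path: check only the flag, and reload from the DB when it is set."""
    if changed_flag.get(user_id, True) or user_id not in cache:
        cache[user_id] = list(orders_table.get(user_id, []))
        changed_flag[user_id] = False
    return cache[user_id]
```

Reads stay cheap while nothing changes, and the first read after a write replaces the cached copy.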

How do I cache a list/dictionary in Pylons?

On a website I'm making, there's a section that hits the database pretty hard. Harder than I want. The data that's being retrieved is all very static. It will rarely change. So I want to cache it.
I came across http://wiki.pylonshq.com/display/pylonsdocs/Caching+in+Templates+and+Controllers, had a good read, and have been making use of template caching using:
return render('tmpl.html', cache_expire='never')
That works great until I modify the HTML. The only way I've found to delete the cache is to remove the cache_expire parameter from render() and delete the cache folder. But, meh, it works.
What I want to be able to do, however, is cache lists, tuples and dictionaries. From reading the above wiki page, it seems this isn't possible?
I want to be able to do something like:
data = [i for i in range(0, 2000000)]
mycache = cache.get_cache('cachename')
value = mycache.get(key='dataset1', list=data, type='memory', expiretime='3600')
print value
Allowing me to do some CPU intensive work (list generation, in this example) and then cache it.
Can this be done with Pylons?
As an alternative to a traditional cache, you can use the app globals object. Load the data into a variable once on server startup and then use it in your actions or directly in templates.
http://pylonsbook.com/en/1.1/exploring-pylons.html#app-globals-object
Also you can code some action to update this global variable through the admin interface or by other events.
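As a framework-free sketch of that idea (the Globals class, reload_dataset and some_action are hypothetical names, not the Pylons API itself), the data is built once when the object is created and simply read afterwards:

```python
class Globals:
    """Hypothetical app-globals object, created once at server startup."""

    def __init__(self):
        self.load_count = 0
        self.dataset = []
        self.reload_dataset()

    def reload_dataset(self):
        """Rebuild the expensive data; could be called from an admin action."""
        self.load_count += 1
        self.dataset = [i * i for i in range(1000)]


app_globals = Globals()  # built once when the application starts


def some_action(index):
    """Stand-in for a controller action that only reads the shared data."""
    return app_globals.dataset[index]
```

Note that each server process would hold its own copy, so an update has to reach every process.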
Why not use memcached?
Look at this question on SO on how to use it with pylons: Pylons and Memcached

Dynamically Created Top Articles List in Django?

I'm creating a Django-powered site for my newspaper-ish site. The least obvious and common-sense task that I have come across in getting the site together is how best to generate a "top articles" list for the sidebar of the page.
The first thing that came to mind was some sort of database column that is updated (based on what?) with every view. That seems (to my instincts) ridiculously database intensive and impractical and thus I think I'd like to find another solution.
Thanks all.
I would give celery a try (with django-celery). While it's not as easy to configure and use as a cache, it enables you to queue tasks like incrementing counters and run them in the background. It could even be combined with the cache technique: increment counters in the cache from your views, and define a PeriodicTask that runs every now and then, writing the counters to the database and resetting them.
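The counter-plus-periodic-flush idea can be sketched without celery. Here two Counter objects stand in for the cache and the database column, and flush_views is what a periodic task might do; all names are hypothetical.

```python
from collections import Counter

pending_views = Counter()  # stand-in for counters kept in the cache
view_totals = Counter()    # stand-in for the view-count column in the DB


def record_view(article_id):
    """Called on every page view: cheap, no database write."""
    pending_views[article_id] += 1


def flush_views():
    """What a periodic task might do: write the batch, then reset counters."""
    view_totals.update(pending_views)
    pending_views.clear()


def top_articles(n):
    """Return the ids of the n most viewed articles."""
    return [article_id for article_id, _ in view_totals.most_common(n)]


for article_id in ['a', 'b', 'a', 'c', 'a', 'b']:
    record_view(article_id)
flush_views()
```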
I just remembered: I once found this blog entry, which provides a nice way of incrementing a 'viewed_count' (or similar) column in the database with an AJAX JS call. If you don't have heavy traffic, maybe it's a good idea?
Also mentioned in this post is django-tracking, but I don't know much about it, I never used it myself (yet).
This is premature optimization. First try the DB way and then see if it really is too database intensive. Any decent database has such good caches that it probably won't matter very much. And even if it is a problem, take a look at the other DB/cache suggestions here.
By the way, you will most likely have many more intensive DB queries with each page view than a simple view-count update.
If you do something like sort by top views, it would be fast if you index the view column in the DB. Another option is to only collect the top x articles every hour or so, and toss that value into Django's cache framework.
The nice thing about caching the list is that the algorithm you use to determine top articles can be as complex as you like without hitting the DB hard with every page view. Django's cache framework can use memory, db, or file system. I prefer DB, but many others prefer memory. I believe it uses pickle, so you can also store Python objects directly. It's easy to use, recommended.
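To illustrate the "compute every hour, serve from cache" idea, here is a small sketch with a dict-based stand-in for the cache backend. The cached helper, the article names and the view counts are all made up for the example.

```python
import time

_CACHE = {}  # key -> (expires_at, value); stand-in for a cache backend


def cached(key, timeout, compute, now=time.time):
    """Return the cached value for key, recomputing it once it has expired."""
    entry = _CACHE.get(key)
    if entry is not None and entry[0] > now():
        return entry[1]
    value = compute()
    _CACHE[key] = (now() + timeout, value)
    return value


VIEW_COUNTS = {'budget': 120, 'sports': 340, 'weather': 75}  # made-up numbers
compute_calls = []


def compute_top_articles():
    """Placeholder for an arbitrarily complex ranking query."""
    compute_calls.append(1)
    return sorted(VIEW_COUNTS, key=VIEW_COUNTS.get, reverse=True)[:2]


top = cached('top_articles', 3600, compute_top_articles)
top_again = cached('top_articles', 3600, compute_top_articles)
```

The ranking runs once; the second call within the hour is served from the cache.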
An index wouldn't help, as the main problem, I believe, is not so much getting the sorted list as having a DB write with every page view of an article. Another index actually makes that problem worse, albeit only a little.
So I'd go with the cache. I think django's cache shim is a problem here because it requires timeouts on all keys. I'm not sure if that's imposed by memcached, if not then go with redis. Actually just go with redis anyway, the python library is great, I've used it from django projects before, and it has atomic increments and powerful sorting - everything you need.

memcache entities without ReferenceProperty

I have a list of entities which I want to store in the memcache. The problem is that I have large Models referenced by their ReferenceProperty which are automatically also stored in the memcache. As a result I'm exceeding the size limit for objects stored in memcache.
Is there any possibility to prevent the ReferenceProperties from loading the referenced Models while putting them in memcache?
I tried something like
def __getstate__(self):
    odict = self.__dict__.copy()
    odict['model'] = None
    return odict
in the class I want to store in memcache, but that doesn't seem to do the trick.
Any suggestions would be highly appreciated.
Edit: I verified by adding a logging-statement that the __getstate__-Method is executed.
For large entities, you might want to handle the loading of the related entities manually by storing the keys of the large entities as something other than a ReferenceProperty. That way you can choose when to load the large entity and when not to. Just use an integer (long) property to store IDs or a string property to store key names.
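The effect on the pickled size can be shown with plain Python classes. This sketch does not use the App Engine API: Author, Article, AUTHORS and load_author are hypothetical stand-ins for the large model, the entity being cached, the datastore and an on-demand fetch.

```python
import pickle


class Author:
    """Stand-in for a large referenced model."""

    def __init__(self, key, biography):
        self.key = key
        self.biography = biography


class Article:
    """Stand-in for the entity to cache; keeps only the author's key."""

    def __init__(self, title, author_key):
        self.title = title
        self.author_key = author_key  # plain key, not a reference


# Stand-in datastore holding the large entity
AUTHORS = {'author-1': Author('author-1', 'x' * 100000)}


def load_author(article):
    """Fetch the large entity only when it is actually needed."""
    return AUTHORS[article.author_key]


article = Article('Caching', 'author-1')

# What would go into the cache: only the small fields and the key
small_payload = pickle.dumps(vars(article))

# For comparison: the same data with the large entity embedded
big_payload = pickle.dumps(
    {'title': article.title, 'author': vars(load_author(article))})
```

The key-only payload stays tiny, while the embedded version carries the whole biography along.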
odict = self.copy()
del odict.model
would probably be better than using __dict__ (unless __getstate__ needs to return a dict; I'm not familiar with it). Not sure if this solves your problem, though... You could implement __del__ in Model to test whether it's freed. To me it looks like you still hold a reference somewhere.
Also check out the pickle module: you would have to store everything under a single key, but it automatically protects you from multiple references to the same object (it stores it only once). Sorry, no link, mobile client ;)
Good luck!

Caching data from other websites in Django

Suppose I have a simple view which needs to parse data from an external website.
Right now it looks something like this:
def index(request):
    source = urllib2.urlopen(EXTERNAL_WEBSITE_URL)
    bs = BeautifulSoup.BeautifulSoup(source.read())
    finalList = [] # do whatever with bs to populate the list
    return render_to_response('someTemplate.html', {'finalList': finalList})
First of all, is this an acceptable use?
Obviously, this is not good performance-wise. The external website page is pretty big, and I am only extracting a small part of it. I thought of two solutions:
Do all of this asynchronously. Load the rest of the page, populate with data once I get it. But I don't even know where to start. I'm just starting with Django and never done anything async up until now.
I don't care if this data is updated every 2-3 minutes, so caching is a good solution as well (also saves me the extra round-trips). How would I go about caching this data?
First, don't optimize prematurely. Get this to work.
Then, add enough logging to see what the performance problems (if any) really are.
You may find that the end user's PC is the slowest part; getting data from another site may actually be remarkably fast when you do not fetch .JS libraries and .CSS and artwork and then render the entire thing in a browser.
Once you're absolutely sure that the fetch of the remote content really IS a problem. Really. Then you have to do the following.
Write a "crontab" script that does the remote fetch from time to time.
Design a place to cache the remote results. Database or file system, pick one.
Update your Django app to get the data from the cache (database or filesystem) instead of the remote URL.
Only after you have absolute proof that the urllib2 read of the remote site is the bottleneck.
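If it does come to that, the three steps could be sketched as follows. This is an assumption-laden outline in Python 3: CACHE_PATH, refresh_cache, read_cache and fake_fetch are hypothetical names, the cache is a JSON file, and fake_fetch stands in for the real download and parsing step.

```python
import json
import os
import tempfile

# Hypothetical cache location; any path both scripts can reach would do
CACHE_PATH = os.path.join(tempfile.gettempdir(), 'external_site_cache.json')


def refresh_cache(fetch, path=CACHE_PATH):
    """Run from cron: fetch and parse the remote data, then store it on disk."""
    items = fetch()
    tmp_path = path + '.tmp'
    with open(tmp_path, 'w') as handle:
        json.dump(items, handle)
    os.replace(tmp_path, path)  # swap in one step so readers never see a partial file


def read_cache(path=CACHE_PATH):
    """Called by the view: read the stored list, or an empty list if absent."""
    try:
        with open(path) as handle:
            return json.load(handle)
    except FileNotFoundError:
        return []


def fake_fetch():
    """Placeholder for the real download-and-parse step."""
    return ['headline one', 'headline two']


refresh_cache(fake_fetch)
```

The view then calls read_cache() instead of opening the remote URL on every request.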
Caching with Django is pretty easy:
from django.core.cache import cache
key = 'some-key'
data = cache.get(key)
if data is None:
    # soupify the page and what not
    # cache.set takes the key first, then the value, then the timeout in seconds
    cache.set(key, data, 60*60*8)
    return render_to_response ...
return render_to_response
To answer your questions: you can do this asynchronously, but then you would have to use something like django cron to update the cache every so often. Alternatively, you can write this as a standalone Python script, replace the cache imported from Django with memcache, and it would work the same way. It would reduce some of the performance issues your site could have, and as long as you know the cache key, you can retrieve the data from the cache.
Like Jarret said I would read django's caching docs and memcache's docs for more information.
Django has robust, built-in support for caching views: http://docs.djangoproject.com/en/dev/topics/cache/#topics-cache.
It offers solutions for caching entire views (such as in your case), or just certain parts of data in the view. There are even controls for how often to update the cache, and so forth.
