How do I use beaker caching in Pyramid?

I have the following in my ini file:
cache.regions = default_term, second, short_term, long_term
cache.type = memory
cache.second.expire = 1
cache.short_term.expire = 60
cache.default_term.expire = 300
cache.long_term.expire = 3600
And this in my __init__.py:
from pyramid_beaker import set_cache_regions_from_settings
set_cache_regions_from_settings(settings)
However, I'm not sure how to perform the actual caching in my views/handlers. Is there a decorator available? I figured there would be something in the response API, but only cache_control is available, which instructs the client to cache the data rather than caching it server-side.
Any ideas?

My mistake was to call the decorator function @cache_region on a view-callable. I got no error reports, but there was no actual caching. So, in my views.py I was trying:
@cache_region('long_term')
def photos_view(request):
    # just an example of a costly call to Google Picasa
    gd_client = gdata.photos.service.PhotosService()
    photos = gd_client.GetFeed('...')
    return {
        'photos': photos.entry
    }
No errors and no caching. Also, your view-callable will start to require another parameter! But this works:
# make a separate function and cache it
@cache_region('long_term')
def get_photos():
    gd_client = gdata.photos.service.PhotosService()
    photos = gd_client.GetFeed('...')
    return photos.entry
And then in the view-callable just:
def photos_view(request):
    return {
        'photos': get_photos()
    }
The same way it works for @cache.cache etc.
Summary: do not try to cache view-callables.
PS. I still have a slight suspicion that view-callables can be cached :)
UPD.: As hlv later explains, when you cache a view-callable, the cache is actually never hit, because @cache_region uses the callable's request param as the cache id, and request is unique for every request.

btw.. the reason it didn't work for you when calling view_callable(request) is that the function parameters get pickled into a cache key for later lookup in the cache. Since "self" and "request" change for every request, the return values ARE indeed cached, but can never be looked up again; instead your cache gets bloated with lots of useless keys.
I cache parts of my view-functions by defining a new function inside the view-callable, like:
def view_callable(self, context, request):
    @cache_region('long_term', 'some-unique-key-for-this-call_%s' % (request.params['some-specific-id']))
    def func_to_cache():
        # do something expensive with request.db for example
        return something
    return func_to_cache()
It SEEMS to work nicely so far..
cheers

You should use a cache region:
from beaker.cache import cache_region

@cache_region('default_term')
def your_func():
    ...

A hint for those using @cache_region on functions but not having their results cached - make sure the parameters of the function are scalar.
Example A (doesn't cache):
@cache_region('hour')
def get_addresses(person):
    return Session.query(Address).filter(Address.person_id == person.id).all()

get_addresses(Session.query(Person).first())
Example B (does cache):
@cache_region('hour')
def get_addresses(person):
    return Session.query(Address).filter(Address.person_id == person).all()

get_addresses(Session.query(Person).first().id)
The reason is that the function parameters are used as the cache key - something like get_addresses_123. If an object is passed, this key can't be made.

Same problem here. You can perform caching using default parameters with
from beaker.cache import CacheManager
and then decorators like
@cache.cache('get_my_profile', expire=60)
like in http://beaker.groovie.org/caching.html, but I can't find the solution for how to make it work with the Pyramid .ini configuration.
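For completeness, here is a hedged sketch of how the pieces above can fit together: regions defined in the .ini, wired up via set_cache_regions_from_settings(), and @cache_region applied to a plain helper rather than to the view-callable. The view registration is omitted, and fetch_album_from_picasa / album_id are made-up names for illustration only.
from beaker.cache import cache_region
from pyramid.config import Configurator
from pyramid_beaker import set_cache_regions_from_settings

def main(global_config, **settings):
    set_cache_regions_from_settings(settings)  # picks up the cache.* keys from the .ini
    config = Configurator(settings=settings)
    return config.make_wsgi_app()

@cache_region('long_term')
def get_photos(album_id):
    # expensive work; the scalar album_id becomes part of the cache key
    return fetch_album_from_picasa(album_id)  # hypothetical helper standing in for the Picasa call

def photos_view(request):
    return {'photos': get_photos(request.matchdict['album_id'])}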

Related

caching.memoize & response_filter for internal server errors

I am using flask_caching to cache responses of my Flask API. I am using the decorator on my routes like this:
import random

class Status(Resource):
    @cache.memoize(timeout=60)  # cache for a minute
    def post(self):
        return random.randint(0, 5)
which will return the same random number for a minute. However, what if the random function (read: "any functionality inside the route") breaks, and the route returns a 500 internal server error? As far as I know, flask_caching would cache this and return the bad response for all further calls within a minute, which is not what I want.
I read into this and found the response_filter parameter, which can be added to the decorator easily, seemingly specifically to prevent this from happening ("Useful to prevent caching of code 500 responses.", from the docs: https://flask-caching.readthedocs.io/en/latest/api.html?highlight=response_filter#flask_caching.Cache.memoize):
@cache.memoize(timeout=60, response_filter=callable_check_for_500(???))
However, I am unable to find an example of this use case. It says "If the callable returns False, the content will not be cached." - how do I implement this callable to check if the status code is 500? Any links or ideas appreciated.
I figured out "a solution", but I'm not entirely happy with it.
Basically, the check_500() function gets the argument resp by default; however, it's not the full response object, and unfortunately it lacks the status_code attribute like I expected.
The status code itself is in the data, and I'm just looking at the last entry of the response, which is all the data returned. In my case it's just the returned JSON at [0] and the status_code at [-1].
The implementation is currently as follows:
@cache.memoize(timeout=60, response_filter=check_500)  # cache for a minute
with the callable check_500 function defined as:
def check_500(resp):
    if resp[-1] == 500:
        return False
    else:
        return True
This works pretty much like above_c_level suggested in the comment, so thank you very much, however I would advise looking at the last index of the response instead of checking whether 500 is in the response data at all. It still seems a bit wonky; if anyone has a more elaborate idea, feel free to post another answer.
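To tie the fragments above together, here is a hedged, self-contained sketch of the pattern, assuming flask_restful for the Resource class and a view that returns a (data, status_code) tuple, so the status code really is the last element of what memoize hands to the filter; check_500 follows the answer above, the rest of the names are illustrative.
import random
from flask import Flask
from flask_caching import Cache
from flask_restful import Api, Resource

app = Flask(__name__)
cache = Cache(app, config={'CACHE_TYPE': 'simple'})
api = Api(app)

def check_500(resp):
    # resp is whatever the decorated method returned; with a (data, status) tuple
    # the status code sits at the last index, as described above
    return resp[-1] != 500

class Status(Resource):
    @cache.memoize(timeout=60, response_filter=check_500)  # cache for a minute, but never cache errors
    def post(self):
        return {'value': random.randint(0, 5)}, 200

api.add_resource(Status, '/status')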

django lambda and django-activity-stream

I am not familiar with the lambda function itself, and don't really know how to debug this issue.
Django-1.1.2
I am using django-activity-stream in order to render activity streams for my users.
In the documentation it says that you need to pass two lambda functions to integrate with existing networks, such as django-friends (the one I am using).
Here are the functions that need to be pasted into your settings.py file.
ACTIVITY_GET_PEOPLE_I_FOLLOW = lambda user: get_people_i_follow(user)
ACTIVITY_GET_MY_FOLLOWERS = lambda user: get_my_followers(user)
I have done so, but now every time I try to render the page that makes use of this, I get the following traceback:
Caught NameError while rendering: global name 'get_people_i_follow' is not defined
Although this has been set in my settings...
Your help is much appreciated!
Somewhere above where these lambdas are defined, you need to import the names get_people_i_follow and get_my_followers. I'm not familiar with django-activity-stream, but it's probably something like from activity_stream import get_people_i_follow, get_my_followers.
Lambda is a keyword for creating anonymous functions on the fly, so the meaning of your code is basically the same as if you had written the following.
def ACTIVITY_GET_PEOPLE_I_FOLLOW(user):
    return get_people_i_follow(user)

def ACTIVITY_GET_MY_FOLLOWERS(user):
    return get_my_followers(user)
You need to make sure that the functions get_people_i_follow and get_my_followers are imported into your settings files.
e.g.:
from activity_stream.models import get_people_i_follow, get_my_followers
Lambda is just shorthand for defining a function, so:
ACTIVITY_GET_PEOPLE_I_FOLLOW = lambda user: get_people_i_follow(user)
is equivalent to:
def activity_get_people_i_follow(user):
    return get_people_i_follow(user)

ACTIVITY_GET_PEOPLE_I_FOLLOW = activity_get_people_i_follow
Which, upon reflection, means you don't gain a lot in this case. However, if you need to avoid importing those functions too early in your settings file (e.g. due to a circular import), then you could do:
def activity_get_people_i_follow(user):
    from activity_stream.models import get_people_i_follow
    return get_people_i_follow(user)

ACTIVITY_GET_PEOPLE_I_FOLLOW = activity_get_people_i_follow
and just import the activity stream function as you need it.
UPDATE: it looks like defining these settings is a red herring:
https://github.com/philippWassibauer/django-activity-stream/blob/master/activity_stream/models.py#L133
As you can see these settings are only needed if you are not using the default activity streams. So simply remove them from your settings file.
The seg-fault is probably due to an infinite recursion occurring, as get_people_i_follow calls whatever function is defined by ACTIVITY_GET_PEOPLE_I_FOLLOW, which in this case calls get_people_i_follow again...
If you are integrating with a pre-existing network I don't believe you're actually supposed to write verbatim:
ACTIVITY_GET_PEOPLE_I_FOLLOW = lambda user: get_people_i_follow(user)
ACTIVITY_GET_MY_FOLLOWERS = lambda user: get_my_followers(user)
I believe the author was just showing an example that ACTIVITY_GET_PEOPLE_I_FOLLOW and ACTIVITY_GET_MY_FOLLOWERS need to be set to a lambda or function that accepts one user argument and returns a list of users. You should probably be looking for something like friends_for_user in django-friends, or writing your own functions to implement this functionality.
get_people_i_follow is indeed defined in activity_stream.models but it is just importing what's defined in settings.py. So if settings.py has ACTIVITY_GET_PEOPLE_I_FOLLOW = lambda user: get_people_i_follow(user) you're going to get a wild and crazy circular import / infinite recursion.
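To make that concrete, here is a hedged sketch of what settings.py might look like when wiring these settings to django-friends. The helper name friends_for_user and its import path are taken from the suggestion above and are unverified assumptions; check them against the django-friends version you actually have installed.
# settings.py (sketch only; import path and helper name are assumptions)
def _people_i_follow(user):
    # deferred import avoids the circular-import / recursion problem described above
    from friends.models import friends_for_user
    return friends_for_user(user)

def _my_followers(user):
    from friends.models import friends_for_user
    return friends_for_user(user)  # friendships are symmetric here; adjust if yours are not

ACTIVITY_GET_PEOPLE_I_FOLLOW = _people_i_follow
ACTIVITY_GET_MY_FOLLOWERS = _my_followers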

OpenERP cache features

I want to cache some results in my OpenERP module, so I dug around a bit and found the cache decorator. Unfortunately, the most documentation I've been able to find is in the class declaration:
Use it as a decorator of the function you plan to cache. Timeout: 0 = no timeout, otherwise in seconds.
Can anybody recommend a good example of how to use this? Are there known problems to avoid?
After digging around some more, the simplest example I've found is the ir_model_data._get_id() method:
@tools.cache()
def _get_id(self, cr, uid, module, xml_id):
    ids = self.search(cr, uid, [('module','=',module),('name','=', xml_id)])
    if not ids:
        raise ValueError('No references to %s.%s' % (module, xml_id))
    # the sql constraints ensure us we have only one result
    return ids[0]
It seems like you just choose a model method you want to cache and then add the cache as a decorator. If some events should clear the cache, like this update() method, you use the cached method as a cache object:
if not result3:
    self._get_id.clear_cache(cr.dbname, uid, module, xml_id)
It looks like by default, the first two parameters of the method are ignored when caching (cursor and user id in most cases).
This is all just based on skimming the code. I'd love to hear some feedback from anyone who's actually used it.
The cache is currently more usable since it is LRU and not an infinite cache anymore.
http://bazaar.launchpad.net/~openerp/openobject-server/5.0/revision/2151
"It looks like by default, the first two parameters of the method are ignored when caching (cursor and user id in most cases)."
This can be modified by passing the skiparg parameter. The arguments being skipped are the implicitly passed self and the cursor; the user id is used in caching when skiparg is 2.
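Putting that together, here is a hedged sketch of how a cached model method with skiparg and cache invalidation might look. It mirrors the _get_id and clear_cache() usage quoted above; the model, field, and method names are made up, and the exact tools.cache / clear_cache signatures should be checked against your OpenERP server version (5.x/6.x style shown).
from osv import osv, fields
import tools

class my_model(osv.osv):
    _name = 'my.model'
    _columns = {'code': fields.char('Code', size=16)}

    # skiparg=2 (the default described above) skips self and the cursor,
    # so uid and code become part of the cache key
    @tools.cache(skiparg=2)
    def _get_id_by_code(self, cr, uid, code):
        ids = self.search(cr, uid, [('code', '=', code)])
        return ids[0] if ids else False

    def write(self, cr, uid, ids, vals, context=None):
        result = super(my_model, self).write(cr, uid, ids, vals, context=context)
        # invalidate stale entries, mirroring the clear_cache() call shown above
        for record in self.browse(cr, uid, ids, context=context):
            self._get_id_by_code.clear_cache(cr.dbname, uid, record.code)
        return result

my_model()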

Hashing a python function to regenerate output when the function is modified

I have a python function that has a deterministic result. It takes a long time to run and generates a large output:
def time_consuming_function():
# lots_of_computing_time to come up with the_result
return the_result
I modify time_consuming_function from time to time, but I would like to avoid having it run again while it's unchanged. [time_consuming_function only depends on functions that are immutable for the purposes considered here; i.e. it might have functions from Python libraries but not from other pieces of my code that I'd change.] The solution that suggests itself to me is to cache the output and also cache some "hash" of the function. If the hash changes, the function will have been modified, and we have to re-generate the output.
Is this possible or ridiculous?
Updated: based on the answers, it looks like what I want to do is to "memoize" time_consuming_function, except instead of (or in addition to) arguments passed into an invariant function, I want to account for a function that itself will change.
If I understand your problem, I think I'd tackle it like this. It's a touch evil, but I think it's more reliable and on-point than the other solutions I see here.
import inspect
import functools
import json

def memoize_zeroadic_function_to_disk(memo_filename):
    def decorator(f):
        try:
            with open(memo_filename, 'r') as fp:
                cache = json.load(fp)
        except IOError:
            # file doesn't exist yet
            cache = {}

        source = inspect.getsource(f)

        @functools.wraps(f)
        def wrapper():
            if source not in cache:
                cache[source] = f()
                with open(memo_filename, 'w') as fp:
                    json.dump(cache, fp)
            return cache[source]
        return wrapper
    return decorator
@memoize_zeroadic_function_to_disk(...SOME PATH HERE...)
def time_consuming_function():
    # lots_of_computing_time to come up with the_result
    return the_result
Rather than putting the function in a string, I would put the function in its own file. Call it time_consuming.py, for example. It would look something like this:
import os

def time_consuming_method():
    # your existing method here

# Is the cached data older than this file?
if (not os.path.exists(data_file_name)
        or os.stat(data_file_name).st_mtime < os.stat(__file__).st_mtime):
    data = time_consuming_method()
    save_data(data_file_name, data)
else:
    data = load_data(data_file_name)

# redefine method
def time_consuming_method():
    return data
While testing the infrastructure for this to work, I'd comment out the slow parts. Make a simple function that just returns 0, get all of the save/load stuff working to your satisfaction, then put the slow bits back in.
The first part is memoization and serialization of your lookup table. That should be straightforward enough based on some Python serialization library. The second part is that you want to delete your serialized lookup table when the source code changes. Perhaps this is being overthought into some fancy solution. Presumably when you change the code you check it in somewhere? Why not add a hook to your checkin routine that deletes your serialized table? Or, if this is not research data and is in production, make it part of your release process that if the revision number of your file (put this function in its own file) has changed, your release script deletes the serialized lookup table.
So, here is a really neat trick using decorators:
def memoize(f):
    cache = {}
    def result(*args):
        if args not in cache:
            cache[args] = f(*args)
        return cache[args]
    return result
With the above, you can then use:
@memoize
def myfunc(x, y, z):
    # Some really long running computation
When you invoke myfunc, you will actually be invoking the memoized version of it. Pretty neat, huh? Whenever you want to redefine your function, simply use "@memoize" again, or explicitly write:
myfunc = memoize(new_definition_for_myfunc)
Edit
I didn't realize that you wanted to cache between multiple runs. In that case, you can do the following:
import os
import os.path
import cPickle

class MemoizedFunction(object):
    def __init__(self, f):
        self.function = f
        self.filename = str(hash(f)) + ".cache"
        self.cache = {}
        if os.path.exists(self.filename):
            with open(self.filename, 'rb') as file:
                self.cache = cPickle.load(file)

    def __call__(self, *args):
        if args not in self.cache:
            self.cache[args] = self.function(*args)
        return self.cache[args]

    def __del__(self):
        with open(self.filename, 'wb') as file:
            cPickle.dump(self.cache, file, cPickle.HIGHEST_PROTOCOL)

def memoize(f):
    return MemoizedFunction(f)
What you describe is effectively memoization. Most common functions can be memoized by defining a decorator.
An (overly simplified) example:
def memoized(f):
    cache = {}
    def memo(*args):
        if args in cache:
            return cache[args]
        else:
            ret = f(*args)
            cache[args] = ret
            return ret
    return memo
@memoized
def time_consuming_method():
    # lots_of_computing_time to come up with the_result
    return the_result
Edit:
From Mike Graham's comment and the OP's update, it is now clear that values need to be cached over different runs of the program. This can be done by using some sort of persistent storage for the cache (e.g. something as simple as Pickle or a plain text file, or maybe a full-blown database, or anything in between). The choice of which method to use depends on what the OP needs. Several other answers already give some solutions to this, so I'm not going to repeat that here.
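Pulling the threads above together, here is a hedged sketch that memoizes a zero-argument function to disk keyed on a hash of its source, so the result is regenerated only when the function's code changes. It uses only hashlib, inspect, os, and pickle from the standard library; the cache file name is made up, and the_result is the same placeholder as in the question.
import hashlib
import inspect
import os
import pickle

def memoize_to_disk_by_source(path):
    def decorator(f):
        # hash the current source; a change in the function invalidates the cache
        source_hash = hashlib.sha256(inspect.getsource(f).encode()).hexdigest()
        def wrapper():
            if os.path.exists(path):
                with open(path, 'rb') as fp:
                    stored_hash, result = pickle.load(fp)
                if stored_hash == source_hash:
                    return result
            result = f()
            with open(path, 'wb') as fp:
                pickle.dump((source_hash, result), fp)
            return result
        return wrapper
    return decorator

@memoize_to_disk_by_source('time_consuming.cache')
def time_consuming_function():
    # lots_of_computing_time to come up with the_result
    return the_result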

Django: Streaming dynamically generated XML output through an HttpResponse

Recently I wanted to return a dynamically generated XML tree from a Django view. The module I use for XML manipulation is the usual cElementTree.
I think I tackled what I wanted by doing the following:
# imports assumed from the description above (cElementTree via the standard library)
from datetime import datetime
from django.http import HttpResponse
from xml.etree.cElementTree import Element, SubElement
import xml.etree.cElementTree as cET

def view1(request):
    resp = HttpResponse(g())
    return resp

def g():
    root = Element("ist")
    list_stamp = SubElement(root, "list_timestamp")
    list_creation = str(datetime.now())
    for i in range(1, 1000000):
        root.text = str(i)
        yield cET.tostring(root)
Is something like this a good idea ? Do I miss something ?
About middlewares "breaking" streaming:
CommonMiddleware will try to consume the whole iterator if you set USE_ETAGS = True in settings. But in modern Django (1.1) there's a better way to do conditional GET than CommonMiddleware + ConditionalGetMiddleware -- the condition decorator. Use that and your streaming will stream okay :-)
Another thing that will try to consume the iterator is GzipMiddleware. If you want to use it, you can avoid gzipping your streaming responses by turning it into a decorator and applying it to individual views instead of globally.
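As a hedged illustration of the condition decorator mentioned above (django.views.decorators.http.condition, applied per view so the streaming iterator is only produced when a full response is actually needed); the ETag helper here is hypothetical and should derive its value from whatever your XML is built from.
from django.views.decorators.http import condition

def latest_etag(request):
    # hypothetical helper: compute an ETag for the current state of the data
    return 'xml-feed-v1'

@condition(etag_func=latest_etag)
def view1(request):
    return HttpResponse(g())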
Does it work? If it doesn't work, what error does it throw?
If you're building a full-blown API for a django site, take a look at django-piston. It takes care of a lot of the busywork related to that.
http://bitbucket.org/jespern/django-piston/wiki/Home
Yes, it's perfectly legitimate to return an iterator in an HttpResponse. As you've discovered, that allows you to stream content to the client.
Yes. That's THE WAY you do it on Django.
