OpenERP cache features - python

I want to cache some results in my OpenERP module, so I dug around a bit and found the cache decorator. Unfortunately, the only real documentation I've been able to find is in the class declaration:
Use it as a decorator of the function you plan to cache. Timeout: 0 = no timeout, otherwise in seconds.
Can anybody recommend a good example of how to use this? Are there known problems to avoid?

After digging around some more, the simplest example I've found is the ir_model_data._get_id() method:
@tools.cache()
def _get_id(self, cr, uid, module, xml_id):
    ids = self.search(cr, uid, [('module', '=', module), ('name', '=', xml_id)])
    if not ids:
        raise ValueError('No references to %s.%s' % (module, xml_id))
    # the SQL constraints ensure we have only one result
    return ids[0]
It seems like you just pick the model method you want to cache and add the cache as a decorator. If some event should invalidate the cache, as in the update() method, you call clear_cache() on the decorated method:
if not result3:
    self._get_id.clear_cache(cr.dbname, uid, module, xml_id)
It looks like by default, the first two parameters of the method are ignored when caching (cursor and user id in most cases).
This is all just based on skimming the code. I'd love to hear some feedback from anyone who's actually used it.

The cache is more usable now, since it is LRU and no longer an unbounded cache:
http://bazaar.launchpad.net/~openerp/openobject-server/5.0/revision/2151
It looks like by default, the first two parameters of the method are ignored when caching (cursor and user id in most cases).
This can be modified by passing the skiparg parameter. The arguments being skipped are the implicitly passed self and the cursor; the user id is included in the cache key when skiparg is 2.
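For what it's worth, here is a minimal sketch of how that might look, assuming tools.cache() accepts timeout and skiparg keyword arguments as described above; the model, column, and method names are all made up:

from osv import osv, fields
import tools

class my_model(osv.osv):
    _name = 'my.model'
    _columns = {'code': fields.char('Code', size=16)}

    # Hypothetical method: cache results for one hour. skiparg=2 skips only
    # the implicit self and the cursor, so the uid is part of the cache key.
    @tools.cache(timeout=3600, skiparg=2)
    def _expensive_lookup(self, cr, uid, code):
        ids = self.search(cr, uid, [('code', '=', code)])
        return ids and ids[0] or False

my_model()

As in the _get_id() example above, whatever method updates the underlying records would then call self._expensive_lookup.clear_cache(...) to invalidate stale entries.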


Pythonic way to parse command line output into a container object

Please read this whole question before answering, as it's not what you think... I'm looking at creating python object wrappers that represent hardware devices on a system (trimmed example below).
class TPM(object):
    @property
    def attr1(self):
        """
        Protects value from being accidentally modified after
        constructor is called.
        """
        return self._attr1

    def __init__(self, attr1, ...):
        self._attr1 = attr1
        ...

    @classmethod
    def scan(cls):
        """Calls Popen, parses to dict, and passes **dict to constructor"""
Most of the constructor inputs come from running command line tools via subprocess.Popen and then parsing the output to fill in object attributes. I've come up with a few ways to handle these, but I'm unsatisfied with what I've put together so far and am trying to find a better solution. Here are the common catches I've found. (Quick note: tool versions are tightly controlled, so parsed outputs don't change unexpectedly.)
Many tools produce variant outputs, sometimes including fields and sometimes not. This means that if you assemble a dict to be wrapped in a container object, the constructor is more or less forced to take **kwargs rather than having defined fields. I don't like this because it makes static analysis via pylint, etc. less useful. I'd prefer a defined interface so that Sphinx documentation is clearer and errors can be detected more reliably.
In lieu of **kwargs, I've also tried setting default args to None for many of the fields, with pretty ugly results. One thing I strongly dislike about this option is that optional fields don't always come at the end of the command line tool output, which makes it a little mind-bending to look at the constructor and match it up to the tool output.
I'd greatly prefer to avoid constructing a dictionary in the first place, but using setattr to create attributes makes pylint unable to detect _attr1, etc., and generates warnings. Any ideas here are welcome...
Basically, I am looking for the proper Pythonic way to do this. My requirements, for a re-summary are the following:
Command line tool output parsed into a container object.
Container object protects attributes via properties post-construction.
Varying number of inputs to constructor, with working static analysis and error detection for missing required fields during runtime.
Is there a good way of doing this (hopefully without a ton of boilerplate code) in Python? If so, what is it?
EDIT:
Per some of the clarification requests, we can take a look at the tpm_version command. Here's the output for my laptop, but for this TPM it doesn't include every possible attribute. Sometimes, the command will return extra attributes that I also want to capture. This makes parsing to known attribute names on a container object fairly difficult.
TPM 1.2 Version Info:
Chip Version: 1.2.4.40
Spec Level: 2
Errata Revision: 3
TPM Vendor ID: IFX
Vendor Specific data: 04280077 0074706d 3631ffff ff
TPM Version: 01010000
Manufacturer Info: 49465800
Example code (ignore lack of sanity checks, please. trimmed for brevity):
def __init__(self, chip_version, spec_level, errata_revision,
             tpm_vendor_id, vendor_specific_data, tpm_version,
             manufacturer_info):
    self._chip_version = chip_version
    ...

@classmethod
def scan(cls):
    # needs: from subprocess import Popen, PIPE
    tpm_proc = Popen("/usr/sbin/tpm_version", stdout=PIPE)
    stdout, stderr = tpm_proc.communicate()
    tpm_dict = dict()
    for line in stdout.splitlines():
        if "Version Info:" in line:
            pass
        else:
            split_line = line.split(":")
            attribute_name = (
                split_line[0].strip().replace(' ', '_').lower())
            tpm_dict[attribute_name] = split_line[1].strip()
    return cls(**tpm_dict)
The problem here is that this tool (or a different one whose source I may not be able to review to learn every possible field) could emit extra fields that my parser handles fine but my object never captures. That's what I'm really trying to solve in an elegant way.
I've been working on a more solid answer to this over the last few months, as I basically work on hardware support libraries, and have finally come up with a satisfactory (though pretty verbose) approach:
Parse the tool outputs, whatever they look like, into object structures that match up to how the tool views the device. These can have very generic dict structures, but should be broken out as much as possible.
Create another container class on top of that which uses attributes to access items in the tool-container objects. This enforces an API and can return sane errors across multiple versions of the tool, and across differing tool outputs!
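A rough sketch of that two-layer approach, with hypothetical class and field names, might look like this:

class TPMVersionOutput(object):
    """Thin container mirroring whatever tpm_version happened to print."""
    def __init__(self, fields):
        # e.g. {'chip_version': '1.2.4.40', 'spec_level': '2', ...}
        self.fields = dict(fields)


class TPM(object):
    """Stable attribute API layered over the tool-specific container."""
    def __init__(self, version_output):
        self._version = version_output

    def _field(self, name):
        try:
            return self._version.fields[name]
        except KeyError:
            # A sane, uniform error regardless of which tool version ran
            raise AttributeError(
                "tpm_version did not report %r on this system" % name)

    @property
    def chip_version(self):
        return self._field('chip_version')

    @property
    def spec_level(self):
        return self._field('spec_level')

Pylint and Sphinx see a fixed set of properties on TPM, while the inner container stays as loose as the tool output itself.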

Idiomatic/fast Django ORM check for existence on mysql/postgres

If I want to check for the existence of an object and, if possible, retrieve it, which of the following methods is faster? More idiomatic? And why? If it's neither of the two examples I list, how else would one go about doing this?
if Object.objects.filter(**kwargs).exists():
    my_object = Object.objects.get(**kwargs)

my_object = Object.objects.filter(**kwargs)
if my_object:
    my_object = my_object[0]
If relevant, I care about mysql and postgres for this.
Why not do this in a try/except block to avoid the multiple queries / query then an if?
try:
    obj = Object.objects.get(**kwargs)
except Object.DoesNotExist:
    pass
Just add your else logic under the except.
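For example, a minimal sketch (the two handler functions are hypothetical placeholders):

try:
    obj = Object.objects.get(**kwargs)
except Object.DoesNotExist:
    handle_missing_object()   # hypothetical: the "else" branch for no match
else:
    use_object(obj)           # hypothetical: runs only when the lookup succeeded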
Django's documentation provides a pretty good overview of exists().
Using your first example, it will run the query twice; according to the documentation:
if some_queryset has not yet been evaluated, but you know that it will be at some point, then using some_queryset.exists() will do more overall work (one query for the existence check plus an extra one to later retrieve the results) than simply using bool(some_queryset), which retrieves the results and then checks if any were returned.
So if you're going to be using the object after checking for existence, the docs suggest just using it and forcing evaluation once, via:
if my_object:
    pass

How do I use beaker caching in Pyramid?

I have the following in my ini file:
cache.regions = default_term, second, short_term, long_term
cache.type = memory
cache.second.expire = 1
cache.short_term.expire = 60
cache.default_term.expire = 300
cache.long_term.expire = 3600
And this in my __init__.py:
from pyramid_beaker import set_cache_regions_from_settings
set_cache_regions_from_settings(settings)
However, I'm not sure how to perform the actual caching in my views/handlers. Is there a decorator available? I figured there would be something in the response API, but only cache_control is available - which instructs the user's browser to cache the data, not cache it server-side.
Any ideas?
My mistake was applying the decorator @cache_region to a view callable. I got no errors, but there was no actual caching. So, in my views.py I was trying something like:
@cache_region('long_term')
def photos_view(request):
    # just an example of a costly call to Google Picasa
    gd_client = gdata.photos.service.PhotosService()
    photos = gd_client.GetFeed('...')
    return {
        'photos': photos.entry
    }
No errors and no caching. Also your view-callable will start to require another parameter! But this works:
# make a separate function and cache it
@cache_region('long_term')
def get_photos():
    gd_client = gdata.photos.service.PhotosService()
    photos = gd_client.GetFeed('...')
    return photos.entry
And then in view-callable just:
def photos_view(request):
    return {
        'photos': get_photos()
    }
It works the same way for @cache.cache, etc.
Summary: do not try to cache view-callables.
PS. I still have a slight suspicion that view callables can be cached :)
UPD.: As hlv later explains, when you cache a view callable, the cache is actually never hit, because @cache_region uses the callable's request param as the cache key, and request is unique for every request.
By the way, the reason it didn't work for you when calling view_callable(request) is that the function parameters get pickled into a cache key for later lookup in the cache. Since "self" and "request" change for every request, the return values ARE indeed cached, but can never be looked up again. Instead, your cache gets bloated with lots of useless keys.
I cache parts of my view functions by defining a new function inside the view callable, like:
def view_callable(self, context, request):
    @cache_region('long_term', 'some-unique-key-for-this-call_%s' % (request.params['some-specific-id']))
    def func_to_cache():
        # do something expensive with request.db, for example
        return something
    return func_to_cache()
it SEEMS to work nicely so far..
cheers
You should use a cache region:
from beaker.cache import cache_region

@cache_region('default_term')
def your_func():
    ...
A hint for those using @cache_region on functions but not getting their results cached: make sure the parameters of the function are scalar.
Example A (doesn't cache):
@cache_region('hour')
def get_addresses(person):
    return Session.query(Address).filter(Address.person_id == person.id).all()

get_addresses(Session.query(Person).first())
Example B (does cache):
@cache_region('hour')
def get_addresses(person):
    return Session.query(Address).filter(Address.person_id == person).all()

get_addresses(Session.query(Person).first().id)
The reason is that the function parameters are used to build the cache key - something like get_addresses_123. If an object is passed, this key can't be made.
Same problem here; you can perform caching using default parameters with
from beaker.cache import CacheManager
and then decorators like
@cache.cache('get_my_profile', expire=60)
like in http://beaker.groovie.org/caching.html, but I can't find a way to make it work with the Pyramid .ini configuration.
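For what it's worth, here is a minimal sketch of wiring a CacheManager up by hand from a dict of Beaker options (the values just mirror the .ini snippet above, and expensive_profile_lookup is a hypothetical costly call; whether this is the blessed Pyramid way is another question):

from beaker.cache import CacheManager
from beaker.util import parse_cache_config_options

cache_opts = {
    'cache.type': 'memory',
    'cache.regions': 'default_term, second, short_term, long_term',
    'cache.default_term.expire': 300,
    'cache.long_term.expire': 3600,
}
cache = CacheManager(**parse_cache_config_options(cache_opts))

@cache.cache('get_my_profile', expire=60)
def get_my_profile(user_id):
    return expensive_profile_lookup(user_id)  # hypothetical costly call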

Duplicate an AppEngine Query object to create variations of a filter without affecting the base query

In my AppEngine project I need to use a certain filter as a base and then apply various extra filters on top of it, retrieving the different result sets separately, e.g.:
base_query = MyModel.all().filter('mainfilter', 123)
Then I need to use the results of various sub queries separately:
subquery1 = base_query.filter('subfilter1', 'xyz')
# Do something with subquery1 results here

subquery2 = base_query.filter('subfilter2', 'abc')
# Do something with subquery2 results here
Unfortunately, filter() affects the state of the base_query Query instance, rather than just returning a modified version. Is there any way to duplicate the Query object and use it as a base? Is there perhaps a standard Python way of duplicating an object that could be used?
The extra filters are actually applied by the results of different forms dynamically within a wizard, and they use the 'running total' of the query in their branch to assess whether to ask further questions.
Obviously I could pass around a rudimentary stack of filter criteria, but I'd rather use the Query itself if possible, as it adds simplicity and elegance to the solution.
There's no officially approved (i.e., not likely to break) way to do this. Simply creating the query afresh from the parameters when you need it is your best option.
As Nick has said, you're better off creating the query again, but you can still avoid repeating yourself. A good way to do that would be like this:
# inside a request handler
def create_base_query():
    return MyModel.all().filter('mainfilter', 123)

subquery1 = create_base_query().filter('subfilter1', 'xyz')
# Do something with subquery1 results here

subquery2 = create_base_query().filter('subfilter2', 'abc')
# Do something with subquery2 results here

Dictionary or If statements, Jython

I am writing a script at the moment that will grab certain information from HTML using dom4j.
Since Python/Jython does not have a native switch statement I decided to use a whole bunch of if statements that call the appropriate method, like below:
if type == 'extractTitle':
    extractTitle(dom)
if type == 'extractMetaTags':
    extractMetaTags(dom)
I will be adding more depending on what information I want to extract from the HTML and thought about taking the dictionary approach which I found elsewhere on this site, example below:
{
    'extractTitle': extractTitle,
    'extractMetaTags': extractMetaTags
}[type](dom)
I know that each time I run the script the dictionary will be built, while if I were to use the if statements the script would have to check through all of them until it hits the correct one. What I am really wondering is: which one performs better, or is generally better practice to use?
Update: @Brian - thanks for the great reply. I have a question: what if any of the extract methods require more than one object? E.g.
def handle_extractTag(self, dom, anotherObject):
    # Do something
How would you make the appropriate changes to the handle method to implement this? Hope you know what I mean :)
Cheers
To avoid specifying the tag and handler in the dict, you could just use a handler class with methods named to match the type, e.g.:
class MyHandler(object):

    def handle_extractTitle(self, dom):
        # do something

    def handle_extractMetaTags(self, dom):
        # do something

    def handle(self, type, dom):
        func = getattr(self, 'handle_%s' % type, None)
        if func is None:
            raise Exception("No handler for type %r" % type)
        return func(dom)
Usage:
handler = MyHandler()
handler.handle('extractTitle', dom)
Update:
When you have multiple arguments, just change the handle function to take those arguments and pass them through to the function. If you want to make it more generic (so you don't have to change both the handler functions and the handle method when you change the argument signature), you can use the *args and **kwargs syntax to pass through all received arguments. The handle method then becomes:
def handle(self, type, *args, **kwargs):
    func = getattr(self, 'handle_%s' % type, None)
    if func is None:
        raise Exception("No handler for type %r" % type)
    return func(*args, **kwargs)
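So, for the extractTag case from the question, the extra object is simply passed along (using the question's own hypothetical names):

handler = MyHandler()
handler.handle('extractTag', dom, anotherObject)  # forwarded via *args to handle_extractTag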
With your code, all your functions get called. This:
handlers = {
    'extractTitle': extractTitle,
    'extractMetaTags': extractMetaTags
}
handlers[type](dom)
would work like your original if code.
It depends on how many if statements we're talking about; if it's a very small number, then it will be more efficient than using a dictionary.
However, as always, I strongly advise you to do whatever makes your code look cleaner until experience and profiling tell you that a specific block of code needs to be optimized.
Your use of the dictionary is not quite correct. In your implementation, all methods will be called and all the useless ones discarded. What is usually done is more something like:
switch_dict = {'extractTitle': extractTitle,
               'extractMetaTags': extractMetaTags}

switch_dict[type](dom)
And that way is faster and more extensible if you have a large (or variable) number of items.
The efficiency question is barely relevant. The dictionary lookup is done with a simple hashing technique, while the if statements have to be evaluated one at a time. Dictionaries tend to be quicker.
I suggest that you actually have polymorphic objects that do extractions from the DOM.
It's not clear how type gets set, but it sure looks like it might be a family of related objects, not a simple string.
class ExtractTitle( object ):
    def process( self, dom ):
        return something

class ExtractMetaTags( object ):
    def process( self, dom ):
        return something
Instead of setting type="extractTitle", you'd do this.
type= ExtractTitle() # or ExtractMetaTags() or ExtractWhatever()
type.process( dom )
Then, you wouldn't be building this particular dictionary or if-statement.
