Struggling with memcache on Google App Engine Python

I've been struggling to get memcache working on my app for a bit now. I thought I had finally got it working where it never reads from the database (unless memcache data is lost, of course), only to have my site shut down because of an over-quota number of datastore reads! I'm currently using a free appspot and would like to keep it that way for as long as possible. Anyway, here's my code; maybe somebody can help me find the hole in it.
I am currently trying to implement memcache by overriding the db.Model.all(), delete(), and put() methods to query memcache first. I have memcache set up so that each object in the datastore has its own memcache value with its id as the key. Then, for each Model class, I keep a list of the ids under a key it knows how to query. I hope I explained this clearly enough.
""" models.py """
#classmethod
def all(cls, order="sent"):
result = get_all("messages", Message)
if not result or memcache.get("updatemessages"):
result = list(super(Message, cls).all())
set_all("messages", result)
memcache.set("updatemessages", False)
logging.info("DB Query for messages")
result.sort(key=lambda x: getattr(x, order), reverse=True)
return result
#classmethod
def delete(cls, message):
del_from("messages", message)
super(Message, cls).delete(message)
def put(self):
super(Message, self).put()
add_to_all("messages", self)
""" helpers.py """
def get_all(type, Class):
all = []
ids = memcache.get(type+"allid")
query_amount = 0
if ids:
for id in ids:
ob = memcache.get(str(id))
if ob is None:
ob = Class.get_by_id(int(id))
if ob is None:
continue
memcache.set(str(id), ob)
query_amount += 1
all.append(ob)
if query_amount: logging.info(str(query_amount) + " ob queries")
return all
return None
def add_to_all(type, object):
memcache.set(str(object.key().id()), object)
all = memcache.get(type+"allid")
if not all:
all = [str(ob.key().id()) for ob in object.__class__.all()]
logging.info("DB query for %s" % type)
assert all is not None, "query returned None. Send this error code to ____: 2 3-193A"
if not str(object.key().id()) in all:
all.append(str(object.key().id()))
memcache.set(type+"allid", all)
#log_on_fail
def set_all(type, objects):
assert type in ["users", "messages", "items"], "set_all was not passed a valid type. Send this error code to ____: 33-205"
assert not objects is None, "set_all was passed None as the list of objects. Send this error code to _________: 33-206"
all = []
for ob in objects:
error = not memcache.set(str(ob.key().id()), ob)
if error:
logging.warning("keys not setting properly. Object must not be pickleable")
all.append(str(ob.key().id()))
memcache.set(type+"allid", all)
#log_on_fail
def del_from(type, object):
all = memcache.get(type+"allid")
if not all:
all = object.__class__.all()
logging.info("DB query %s" % type)
assert all, "Could not find any objects. Send this error code to _____: 13- 219"
assert str(object.key().id()) in all, "item not found in cache. Send this error code to ________: 33-220"
del all[ all.index(str(object.key().id())) ]
memcache.set(type+"allid", all)
memcache.delete(str(object.key().id()))
I apologize for all of the clutter and lack of elegance. Hopefully somebody will be able to help. I've thought about switching to ndb, but for now I'd rather stick with my custom cache. You'll notice the logging.info("some-number of ob queries") call: I get this log quite often, maybe once or twice every half hour. Does memcache really lose data that often, or is something wrong with my code?

Simple solution: switch to NDB.
NDB models will store values in memcache and in an instance cache (which is 100% free) and these models will also invalidate the cache for you when you update/delete your objects. Retrieval will first try to get from the instance cache, then if that fails from memcache, and finally from the datastore, and it will set the value in any of the caches missed on the way up.
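The layered lookup the answer describes can be sketched in plain Python. This is an illustrative stand-in, not NDB's actual internals: plain dicts play the roles of the instance cache, memcache, and the datastore, and all the names here are made up for the example.

```python
# Minimal sketch of a layered read-through cache, in the spirit of NDB's
# instance cache -> memcache -> datastore lookup. Dicts are stand-ins.

instance_cache = {}              # per-request, in-process cache (free)
memcache_store = {}              # shared memcache (can evict at any time)
datastore = {"k1": "entity-1"}   # the source of truth

def get(key):
    # 1. Try the in-process instance cache first.
    if key in instance_cache:
        return instance_cache[key]
    # 2. Fall back to memcache, backfilling the instance cache on a hit.
    if key in memcache_store:
        value = memcache_store[key]
        instance_cache[key] = value
        return value
    # 3. Finally hit the datastore and populate both caches on the way up.
    value = datastore.get(key)
    if value is not None:
        memcache_store[key] = value
        instance_cache[key] = value
    return value
```

The key property is that every cache miss on the way down is repaired on the way up, so repeated reads of the same key stop touching the datastore, which is what the hand-rolled cache in the question is trying to achieve manually.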

App Engine memcache evicts objects using an optimized eviction algorithm, so seeing this log message at the frequency you describe suggests two possible explanations: either this data is not accessed very often, or the amount of data you have in memcache is large enough that some of it gets evicted from time to time.
I would also propose moving to ndb, which handles the use of memcache and the instance cache quite efficiently.
Hope this helps!

Related

Can a @classmethod modify the record it creates in GAE?

Updates to an entity within the @classmethod that created it are not reliably persisted in the datastore.
My create method is below. The parameter is the object to be persisted.
@classmethod
def create(cls, obj):
    """Factory method for a new system entity using a System instance. Returns a System object (representation) including the meta_key."""
    if isinstance(obj, System):
        pass
    else:
        raise Exception('Object is not of type System.')
    # check for duplicates
    q = dbSystem.all(keys_only=True)
    q.filter('meta_guid = ', obj.meta_guid)
    if q.get():  # match exists already
        raise Exception('dbSystem with this meta_guid already exists. Cannot create.')
    # store stub so we can get the key
    act = cls(
        meta_status = obj.meta_status,
        meta_type = obj.meta_type,
        meta_guid = obj.meta_guid,
        json = None,
        lastupdated = datetime.datetime.now())
    act.put()
    # get the key for the datastore entity and add it to the representation
    newkey = str(act.key())
    # update our representation
    obj.meta_key = newkey
    # store the representation
    act.json = jsonpickle.encode(obj)
    act.put()
    return obj  # return the representation
My unit tests confirm that the returned object has a meta_key and that the json for the associated entity is not None:
self.assertIsNotNone(systemmodel.dbSystem().get(s.meta_key).json) #json is not empty
However, when running my app on the development server, I find that the json field is intermittently NULL when this entity is retrieved later.
I have spent some time researching the datastore model, trying to find something that could explain the inconsistent results, with no luck. Two key sources are the model class and a really good overview of the App Engine datastore I found on Google Code.
Can anyone confirm whether updates to an entity within the @classmethod that created it should be considered reliable? Is there a better way to persist a representation of an object?
The problem is likely this line:
q = dbSystem.all(keys_only=True)
You haven't said what dbSystem is but if it does an app engine query, then you are not guaranteed to get the most recent version of an object, and you could get an older version.
Instead, you should get the object by its key, which will guarantee that you get the most recent version. Something like this:
q = dbSystem.get(obj.key())
Check out the app engine docs for getting an object by its key.
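The consistency difference the answer relies on, where a query may serve a stale index snapshot while a get-by-key always sees the latest write, can be modeled with two plain dicts. This is purely an illustrative stand-in, not App Engine's API: the "index" is deliberately left lagging behind the record store to mimic eventual consistency.

```python
# Illustrative model of the consistency gap: queries read a lagging index
# snapshot, while get-by-key reads the record store directly.
record_store = {}   # strongly consistent: get-by-key reads this
query_index = {}    # eventually consistent: queries read this snapshot

def put(key, entity):
    record_store[key] = entity
    # The index is deliberately NOT updated here, modeling replication lag.

def get_by_key(key):
    return record_store.get(key)   # always sees the latest write

def query(key):
    return query_index.get(key)    # may see a stale (here: missing) value

put("act1", {"json": "payload"})
fresh = get_by_key("act1")   # the latest entity, json included
stale = query("act1")        # the query can still miss the new entity
```

This is why the `q.get()` in the question can intermittently return an old version of the entity, while fetching by `obj.key()` cannot.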

Delete Data store entries synchronously in Google App Engine

I use Python on GAE and am trying to delete one entry in the datastore by using db.delete(model_obj). I assumed this operation is performed synchronously, since the documentation distinguishes between delete() and delete_async(), but when I read the source code in db, the delete method simply calls delete_async, which does not match what the documentation says :(
So is there any way to do the delete in a synchronous flow?
Here is the source code in db:
def delete_async(models, **kwargs):
    """Asynchronous version of delete one or more Model instances.

    Identical to db.delete() except returns an asynchronous object. Call
    get_result() on the return value to block on the call.
    """
    if isinstance(models, (basestring, Model, Key)):
        models = [models]
    else:
        try:
            models = iter(models)
        except TypeError:
            models = [models]
    keys = [_coerce_to_key(v) for v in models]
    return datastore.DeleteAsync(keys, **kwargs)

def delete(models, **kwargs):
    """Delete one or more Model instances.
    """
    delete_async(models, **kwargs).get_result()
EDIT: From a comment, this is the original misbehaving code:
def tearDown(self):
    print self.account
    db.delete(self.device)
    db.delete(self.account)
    print Account.get_by_email(self.email, case_sensitive=False)
The result of the two print statements is <Account object at 0x10d1827d0> <Account object at 0x10d1825d0>. The two memory addresses are different, but they point to the same entity. If I add some latency after the delete, such as a for loop, the fetched object is None.
The code you show for delete calls delete_async, yes, but then it calls get_result on the returned asynchronous handle, which will block until the delete actually occurs. So, delete is synchronous.
The reason the sample code you show is returning an object is that you're probably running a query to fetch the account; I presume the email is not the db.Key of the account? Normal queries are not guaranteed to return updated results immediately. To avoid seeing stale data, you either need to use an ancestor query or look up the entity by key, both of which are strongly consistent.
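The `delete_async(...).get_result()` pattern the answer describes is the standard async-handle idiom, the same shape as Python's standard-library futures: the async call returns a handle immediately, and asking the handle for its result blocks until the work has actually finished. A small sketch with `concurrent.futures` (the `slow_delete` name is made up for illustration):

```python
# The delete()/delete_async() split follows the same pattern as standard
# futures: the async call returns a handle at once, and requesting the
# result (here: Future.result()) blocks until the work has completed.
from concurrent.futures import ThreadPoolExecutor
import time

def slow_delete():
    time.sleep(0.1)   # stand-in for the RPC doing the actual delete
    return "deleted"

with ThreadPoolExecutor() as pool:
    future = pool.submit(slow_delete)   # like delete_async(): returns immediately
    result = future.result()            # like .get_result(): blocks until done
```

So `db.delete` wrapping `delete_async(...).get_result()` is exactly how a synchronous API is conventionally built on top of an asynchronous one.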

How to retrieve properties only once from database in django

I have some relationships in my database that I describe like that:
@property
def translations(self):
    """
    :return: QuerySet
    """
    if not hasattr(self, '_translations'):
        self._translations = ClientTranslation.objects.filter(base=self)
    return self._translations
The idea behind the hasattr() check and self._translations is to hit the db only once; on subsequent accesses the stored property is returned.
However, after reading the docs, I'm not sure the code is doing that, as queries only hit the db when the values are actually needed, which happens after my code runs.
What would a correct approach look like?
Yes, the DB is hit the first time someone needs the value. But as you pointed out, you save the query, not the results. Wrap the query with list(...) to save the results.
By the way, you can use the cached_property decorator to make this more elegant. It is not a built-in, though; it can be found here. You end up with:
@cached_property
def translations(self):
    return list(ClientTranslation.objects.filter(base=self))
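Since Python 3.8, the same behavior ships in the standard library as `functools.cached_property`, so no third-party decorator is needed. A self-contained sketch (the `Client` class and its `calls` counter are invented here just to make the caching observable; the real code would return the evaluated queryset):

```python
from functools import cached_property

class Client:
    calls = 0   # counts how many times the property body actually runs

    @cached_property
    def translations(self):
        # Stand-in for list(ClientTranslation.objects.filter(base=self)):
        # the body runs once per instance; later accesses return the cache.
        Client.calls += 1
        return ["en", "de"]

c = Client()
first = c.translations    # computes and caches the list
second = c.translations   # served from the instance's __dict__, no recompute
```

The decorator stores the computed value in the instance's `__dict__` under the property's name, which is the same trick the `hasattr`/`_translations` code performs by hand.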

Is it possible to determine with NDB if model is persistent in the datastore or not?

I am in the process of migrating from db.Model to ndb.Model. The only issue I have to solve before finishing this migration is that there is no Model.is_saved method. I have used db.Model.is_saved in my application to determine whether sharded counters must be updated on put/delete, to check for conflicting keys when creating entities, etc.
The documentation says that ndb.Model has no equivalent for is_saved method. I can reimplement some use cases with get_or_insert instead of is_saved. But not all of them.
As a dirty hack I can set a flag like _in_memory_instance for every instance I create by calling the constructor. But it does not solve my issue: I still have to update this flag at least after every put() call.
The question is: is there a better way to determine whether a model is persistent in the datastore, without an extra datastore hit?
Edit 1: Forgot to mention: all the entities have keys, so checking Model._has_complete_key() does not work for me.
Edit 2: After this discussion https://groups.google.com/d/topic/google-appengine/Tm8NDWIvc70/discussion it seems the only way to solve my issue is to use _post_get_hook/_post_put_hook. I wonder why such a trivial thing was not included in the official API.
Edit 3: I ended up with the following base class for all my models. Now I can leave my codebase (almost) untouched:
class BaseModel(ndb.Model):
    @classmethod
    def _post_get_hook(cls, key, future):
        self = future.get_result()
        if self:
            self._is_saved = bool(key)

    def _post_put_hook(self, future):
        self._is_saved = future.state == future.FINISHING

    def is_saved(self):
        if self._has_complete_key():
            return getattr(self, "_is_saved", False)
        return False
To get the same kind of state in NDB you would need a combination of a post-get hook and a post-put hook to set a flag. Here's a working example:
class Employee(ndb.Model):
    <properties here>

    saved = False  # class variable provides default value

    @classmethod
    def _post_get_hook(cls, key, future):
        obj = future.get_result()
        if obj is not None:
            # test needed because post_get_hook is called even if get() fails!
            obj.saved = True

    def _post_put_hook(self, future):
        self.saved = True
There's no need to check the status of the future: when either hook is called, the future always has a result. This is because the hook is actually a callback on the future. However, there is a need to check whether its result is None!
PS: Inside a transaction, the hooks get called as soon as the put() call returns; success or failure of the transaction doesn't affect them. See https://developers.google.com/appengine/docs/python/ndb/contextclass#Context_call_on_commit for a way to run a hook after a successful commit.
Based on @Tim Hoffman's idea you can use a post hook like so:
class Article(ndb.Model):
    title = ndb.StringProperty()

    is_saved = False

    def _post_put_hook(self, f):
        if f.state == f.FINISHING:
            self.is_saved = True
        else:
            self.is_saved = False

article = Article()
print article.is_saved  ## False
article.put()
print article.is_saved  ## True
I can't guarantee that it's persisted in the datastore; I didn't find anything about that on Google :)
On a side note, checking whether an ndb.Model instance has a key probably won't work, since a new instance seems to get a Key before it's ever sent to the datastore. You can look at the source code to see what happens when you create an instance of the ndb.Model class.
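The mechanism these answers rely on, a class-level default that an instance attribute shadows after a successful put, works in plain Python and can be sketched without NDB at all. This is an illustrative stand-in (the `FakeModel` name and its no-op `put` are invented; real NDB wires the flag through `_post_put_hook`):

```python
# Pure-Python sketch of the saved-flag pattern: the class attribute
# provides the "unsaved" default, and the post-put step sets an
# instance attribute that shadows it.
class FakeModel:
    saved = False  # class attribute: every new instance reads False

    def put(self):
        # ... the datastore write would happen here ...
        self._post_put_hook()

    def _post_put_hook(self):
        self.saved = True  # instance attribute shadows the class default

m = FakeModel()
before = m.saved   # False: falls through to the class attribute
m.put()
after = m.saved    # True: instance attribute set by the hook
```

Because the flag is per-instance, other instances of the same model keep reading the class-level `False` until their own put completes, which is exactly the behavior the hook-based answers produce.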
If you do not specify a key while creating an instance of the Model, you can use the following implementation of is_saved() to learn whether the object has been written to the datastore at least once (this should be appropriate if you are migrating from google.appengine.ext.db to google.appengine.ext.ndb).
Using the example given by @fredrik:
class Article(ndb.Model):
    title = ndb.StringProperty()

    def is_saved(self):
        if self.key:
            return True
        return False
P.S. - I do not know if this would work with google.cloud.ndb

Pyramid resource: In plain English

I've been reading up on ways to implement authorization (and authentication) in my newly created Pyramid application. I keep bumping into a concept called a "Resource". I am using python-couchdb in my application and no RDBMS at all, hence no SQLAlchemy. Say I create a Product object like so:
class Product(mapping.Document):
    item = mapping.TextField()
    name = mapping.TextField()
    sizes = mapping.ListField()
Can someone please tell me if this is also called a resource? I've been reading the entire Pyramid documentation, but nowhere does it explain the term resource in plain simple English (maybe I'm just stupid). If this is the resource, does that mean I just stick my ACL stuff in here, like so:
class Product(mapping.Document):
    __acl__ = [(Allow, AUTHENTICATED, 'view')]

    item = mapping.TextField()
    name = mapping.TextField()
    sizes = mapping.ListField()

    def __getitem__(self, key):
        return <something>
If I were to also use traversal, does this mean I add the __getitem__ function to my python-couchdb Product class/resource?
Sorry, it's just really confusing with all the new terms (I came from Pylons 0.9.7).
Thanks in advance.
I think the piece you are missing is the traversal part. Is Product the resource? Well, it depends on what your traversal produces; it could produce products. Perhaps it might be best to walk this through from the view back to how it gets configured when the application is created. Here's a typical view.
Here's a typical view.
@view_config(context=Product, permission="view")
def view_product(context, request):
    pass  # would do stuff
So this view gets called when context is an instance of Product, AND if the __acl__ attribute of that instance has the "view" permission. So how would an instance of Product become the context?
This is where the magic of traversal comes in. The very logic of traversal is simply a dictionary of dictionaries. So one way this could work for you is if you had a url like
/product/1
Somehow, some resource needs to be traversed by the segments of the url to determine a context so that a view can be determined. What if we had something like...
class ProductContainer(object):
    """
    container = ProductContainer()
    container[1]
    >>> <Product(1)>
    """
    def __init__(self, request, name="product", parent=None):
        self.__name__ = name
        self.__parent__ = parent
        self._request = request

    def __getitem__(self, key):
        p = db.get_product(id=key)
        if not p:
            raise KeyError(key)
        else:
            p.__acl__ = [(Allow, Everyone, "view")]
            p.__name__ = key
            p.__parent__ = self
            return p
Now this is covered in the documentation, and I'm attempting to boil it down to the basics you need to know. The ProductContainer is an object that behaves like a dictionary. The __name__ and __parent__ attributes are required by Pyramid in order for the url generation methods to work right. So now we have a resource that can be traversed. How do we tell Pyramid to traverse ProductContainer? We do that through the Configurator object.
config = Configurator()
config.add_route(name="product",
                 path="/product/*traverse",
                 factory=ProductContainer)
config.scan()
application = config.make_wsgi_app()
The factory parameter expects a callable, and it hands it the current request. It just so happens that ProductContainer.__init__ will do that just fine.
This might seem like a lot for such a simple example, but hopefully you can imagine the possibilities. This pattern allows for very granular permission models. If you don't want/need a very granular permission model such as row-level acl's, you probably don't need traversal; instead you can use routes with a single root factory.
class RootFactory(object):
    def __init__(self, request):
        self._request = request
        self.__acl__ = [(Allow, Everyone, "view")]  # todo: add more acls

@view_config(permission="view", route_name="orders")
def view_product(context, request):
    order_id, product_id = request.matchdict["order_id"], request.matchdict["product_id"]
    pass  # do what you need to with the input, the security check already happened

config = Configurator(root_factory=RootFactory)
config.add_route(name="orders",
                 path="/order/{order_id}/products/{product_id}")
config.scan()
application = config.make_wsgi_app()
Note: I did the code example from memory, so obviously you need all the necessary imports etc.; in other words, this isn't going to work as a straight copy/paste.
Have you worked through http://michael.merickel.org/projects/pyramid_auth_demo/ ? If not, I suspect it may help. The last section http://michael.merickel.org/projects/pyramid_auth_demo/object_security.html implements the pattern you're after (note the example "model" classes inherit from nothing more complex than object).
