I am struggling to find a good tutorial or best practices document for the use of memcache in app engine.
I'm pretty happy with it on the level presented in the docs. Get an object by ID, checking memcache first, but I'm unclear on things like:
If you cache a query, is there an accepted method for ensuring that the cache is cleared/updated when an object stored in that query is updated.
What are the effects of using ReferenceProperties ? If a cache a Foo object with a Bar reference. Is my foo.bar in memcache too and in need of clearing down if it gets updated from some other part of my application.
I don't expect answers to this here (unless you are feeling particularly generous!), but pointers to things I could read would be very gratefully received.
If you cache a query, is there an
accepted method for ensuring that the
cache is cleared/updated when an
object stored in that query is
updated.
Typically you wrap your reads with a conditional to retrieve the value from the main DB if it's not in the cache. Just wrap your updates as well to fill the cache whenever you write the data. That's if you need the results to be as up to date as possible - if you're not so bothered about it being out of date just set an expiry time that is low enough for the application to have to re-request the data from the main DB often enough.
One way to easily do this is by searching for any datastore put() call and set memcache after it. Next make sure to get data from memcache before attempting a datastore query.
Set memcache after writing to datastore:
data.put()
memcache.set(user_id, data)
Try getting data from memcache before doing a datastore query:
data = memcache.get(user_id)
if data is None:
data = Data.get_by_key_name(user_id)
memcache.set(user_id, data)
Using Memcache reduces app engine costs significantly. More details on how I optimized on app engine.
About reference properties, let's say you have MainModel and RefModel with a reference property 'ref' that points to a MainModel instance. Whenever you cal ref_model.ref it does a datastore get operation and retrieves the object from datastore. It does not interact with memcache in any way.
Related
I'm finally upgrading from db to ndb (it is a much bigger headache than I anticipated...).
I used a lot of ReferenceProperty and I've converted these to KeyProperty. Now, every place where I used a ReferenceProperty I need to add an explicit get because it was previously done for me automatically.
My question relates to whether I should restructure my code to make it more efficient. I have many methods that use a KeyProperty and I need to do an explicit get(). I'm wondering if I should change these methods so that I am passing the entity to them instead of using the KeyProperty and a get().
Is the automatic memcaching for ndb good enough that I don't need to restructure? Or should I restructure my code to avoid repeated gets of the same entity?
We are not looking at huge inefficiencies here. But for a single HTTP GET/POST I might be getting the same entity 3-5 times.
In your case the In-Context Cache will take over and save you the db calls:
The in-context cache is fast; this cache lives in memory. When an NDB function writes to the Datastore, it also writes to the in-context cache. When an NDB function reads an entity, it checks the in-context cache first. If the entity is found there, no Datastore interaction takes place.
Each request will get a new context.
I have a problem with SQL Alchemy - my app works as a constantly working python application.
I have function like this:
def myFunction(self, param1):
s = select([statsModel.c.STA_ID, statsModel.c.STA_DATE)])\
.select_from(statsModel)
statsResult = self.connection.execute(s).fetchall()
return {'result': statsResult, 'calculation': param1}
I think this is clear example - one result set is fetched from database, second is just passed as argument.
The problem is that when I change data in my database, this function still returns data like nothing was changed. When I change data in input parameter, returned parameter "calculation" has proper value.
When I restart the app server, situation comes back to normal - new data are fetched from MySQL.
I know that there were several questions about SQLAlchemy caching like:
How to disable caching correctly in Sqlalchemy orm session?
How to disable SQLAlchemy caching?
but how other can I call this situation? It seems SQLAlchemy keeps the data fetched before and does not perform new queries until application restart. How can I avoid such behavior?
Calling session.expire_all() will evict all database-loaded data from the session. Any access of object attributes subsequent emits a new SELECT statement and gets new data back. Please see http://docs.sqlalchemy.org/en/latest/orm/session_state_management.html#refreshing-expiring for background.
If you still see so-called "caching" after calling expire_all(), then you need to close out transactions as described in my answer linked above.
A few possibilities.
You are reusing your session improperly or at improper time. Best practice is to throw away your session after you commit, and get a new one at the last possible moment before you use it. The behavior that appears to be caching may in fact be due to a session lifetime being very long in your application.
Objects that survive longer than the session are not being merged into a subsequent session. The "metadata" may not be able to update their state if you do not merge them back in. This is more a concern for the ORM API of SQLAlchemy, which you do not appear so far to be using.
Your changes are not committed. You say they are so we'll assume this is not it, but if none of the other avenues explain it you may want to look again.
One general debugging tip: if you want to know exactly what SQLAlchemy is doing in the database, pass echo=True to the create_engine function. The engine will print all queries it runs.
Also check out this suggestion I made to someone else, who was using ORM and had transactionality problems, which resolved their issue without ever pinpointing it. Maybe it will help you.
You need to change transaction isolation level to READ_COMMITTED
http://docs.sqlalchemy.org/en/rel_0_9/dialects/mysql.html#mysql-isolation-level
I'm new to app engine (and SO)...
I'm writing a twitter bot that replies to mentions. Wanting to remember the last tweet it replied to using since_id but im not sure of the best way to store that ID so the next time the page loads it can check it, reply to any since, and then overwrite and store the new ID.
Do I use memcache for this?
Cheers!
Unless you're fine with losing the data, don't use memcache for data storage; it should only be used as a cache. The reason for this is that although you can set how long it should keep this data, it's documented that it can delete it at any time.
Create a datastore entity to hold the object.
example:
class LastTweet(db.Model):
since_id = db.IntegerProperty()
Okay, I have watched the video and read the articles in the App Engine documentation (including Using the High Replication Datastore). However I am still completely confused on the practical usage of it. I understand the benefits (from the video) and they sound great. But what I am lacking is a few practical examples. There are plenty of master/slave examples on the web, but very little illustrating (with proper documentation) the high replication datastore. The guestbook code example used in the Using the High Replication Datastore article illustrates the ancestor key by adding a new functionality that the previous guestbook code example does not have (seems you can change guestbook). This just adds to the confusion.
I often use djangoforms on GAE and I was wondering if someone can help me translate all these queries into high replication datastore compatible queries (let's forget for a moment the discussion that not all queries necessarily need to be high replication datastore compatible queries and focus on the example itself).
UPDATE: with high replication datastore compatible queries I refer to queries that always return the latest data and not potential stale data. Using entity groups seems to be the way to go here but as mentioned before, I don't have many practical code examples of how to do this, so that is what I am looking for!
So the queries in this article are:
The main recurring query in this article is:
query = db.GqlQuery("SELECT * FROM Item ORDER BY name")
which we will translate to:
query = Item.all().order('name') // datastore request
validating the form happens like:
data = ItemForm(data=self.request.POST)
if data.is_valid():
# Save the data, and redirect to the view page
entity = data.save(commit=False)
entity.added_by = users.get_current_user()
entity.put() // datastore request
and getting the latest entry from the datastore for populating a form happens like:
id = int(self.request.get('id'))
item = Item.get(db.Key.from_path('Item', id)) // datastore request
data = ItemForm(data=self.request.POST, instance=item)
So what do I/we need to do to make all these datastore requests compatible with the high replication datastore?
One last thing that is also not clear to me. Using ancestor keys, does this have any impact on the model in datastore. For example, in the guestbook code example they use:
def guestbook_key(guestbook_name=None):
return db.Key.from_path('Guestbook', guestbook_name or 'default_guestbook')
However 'Guestbook' does not exist in the model, so how can you use 'db.Key.from_path' on this and why would this work? Does this change how data is stored in the datastore which I need to keep into account when retrieving the data (e.g. does it add another field I should exclude from showing when using djangoforms)?
Like I said before, this is confusing me a lot and your help is greatly appreciated!
I'm not sure why you think you need to change your queries at all. The documentation that you link to clearly states:
The back end changes, but the datastore API does not change at all. You'll use the same programming interfaces no matter which datastore you're using.
The point of that page is just to say that queries may be out of sync if you don't use entity groups. Your final code snippet is just an example of that - the string 'Guestbook' is exactly an ancestor key. I don't understand why you think it needs to exist in the model. Once again, this is unchanged from the non-HR datastore - it has always been the case that keys are built up from paths, which can consist of arbitrary strings. You probably need to reread the documentation on entity groups and keys.
The changes to use the HRD are not in how queries are made, but in what guarantees are made about what data you get back. The example you give:
query = db.GqlQuery("SELECT * FROM Item ORDER BY name")
will work in the HRD as well. The catch (basically) is that this kind of query (using either this syntax, or the Item.all() form) can return objects slightly out-of-date. This is probably not a big deal with the guestbook.
Note that if you're getting an object by key directly, it will never be out-of-date. It's only for queries that you can see this issue. You can avoid this problem with queries by placing all the entities that need to be consistent in a single entity group. Note that this limits the rate at which you can write to the entity group.
In answer to your follow-up question, "Guestbook" is the name of the entity.
How can I cache a Reference Property in Google App Engine?
For example, let's say I have the following models:
class Many(db.Model):
few = db.ReferenceProperty(Few)
class Few(db.Model):
year = db.IntegerProperty()
Then I create many Many's that point to only one Few:
one_few = Few.get_or_insert(year=2009)
Many.get_or_insert(few=one_few)
Many.get_or_insert(few=one_few)
Many.get_or_insert(few=one_few)
Many.get_or_insert(few=one_few)
Many.get_or_insert(few=one_few)
Many.get_or_insert(few=one_few)
Now, if I want to iterate over all the Many's, reading their few value, I would do this:
for many in Many.all().fetch(1000):
print "%s" % many.few.year
The question is:
Will each access to many.few trigger a database lookup?
If yes, is it possible to cache somewhere, as only one lookup should be enough to bring the same entity every time?
As noted in one comment: I know about memcache, but I'm not sure how I can "inject it" when I'm calling the other entity through a reference.
In any case memcache wouldn't be useful, as I need caching within an execution, not between them. Using memcache wouldn't help optimizing this call.
The first time you dereference any reference property, the entity is fetched - even if you'd previously fetched the same entity associated with a different reference property. This involves a datastore get operation, which isn't as expensive as a query, but is still worth avoiding if you can.
There's a good module that adds seamless caching of entities available here. It works at a lower level of the datastore, and will cache all datastore gets, not just dereferencing ReferenceProperties.
If you want to resolve a bunch of reference properties at once, there's another way: You can retrieve all the keys and fetch the entities in a single round trip, like so:
keys = [MyModel.ref.get_value_for_datastore(x) for x in referers]
referees = db.get(keys)
Finally, I've written a library that monkeypatches the db module to locally cache entities on a per-request basis (no memcache involved). It's available, here. One warning, though: It's got unit tests, but it's not widely used, so it could be broken.
The question is:
Will each access to many.few trigger a database lookup? Yes. Not sure if its 1 or 2 calls
If yes, is it possible to cache somewhere, as only one lookup should be enough to bring the same entity every time? You should be able to use the memcache repository to do this. This is in the google.appengine.api.memcache package.
Details for memcache are in http://code.google.com/appengine/docs/python/memcache/usingmemcache.html