ReferenceProperty was very helpful in handling references between two models. For example:
from google.appengine.ext import db

class UserProf(db.Model):
    name = db.StringProperty(required=True)

class Team(db.Model):
    manager_name = db.ReferenceProperty(UserProf, collection_name='teams')
    name = db.StringProperty(required=True)
To get the manager from a team instance, we use team_ins.manager_name.
To get the 'teams' managed by a particular user instance, we use user_instance.teams and iterate over it.
Doesn't it look easy and understandable?
To do the same thing using NDB, we have to change
db.ReferenceProperty(UserProf, collection_name='teams') --> ndb.KeyProperty(kind=UserProf)
team_ins.manager_name.get() would give you the manager.
To get all teams which are managed by a particular user, we have to do:
for team in Team.query(Team.manager_name == user_ins.key):
    print "team name:", team.name
As you can see, handling these kinds of scenarios looks easier and more readable in db than in ndb.
What is the reason for removing ReferenceProperty in ndb?
Even db's user_instance.teams query would be doing the same thing that ndb's for loop does, but in ndb we have to spell the loop out explicitly.
What is happening behind the scenes when we do user_instance.teams?
Thanks in advance..
Tim explained it well. We found that a common anti-pattern was using reference properties and loading them one at a time, because the notation "entity.property1.property2" doesn't make it clear that the first dot causes a database "get" operation. So we made it more obvious by forcing you to write "entity.property1.get().property2", and we made it easier to do batch prefetching (without the complex solution from Nick's blog) by simply saying "entity.property1.get_async()" for a bunch of entities -- this queues a single batch get operation without blocking for the result. When you next reference any of these properties using "entity.property1.get().property2", this won't start another get operation but just waits for that batch get to complete (and the second time you do this, the batch get is already complete). Also, this way in-process and memcache integration comes for free.
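To make the prefetch pattern concrete, here is a minimal sketch assuming ndb versions of the question's models; the fetch size and variable names are illustrative, not from the original post:

from google.appengine.ext import ndb

class UserProf(ndb.Model):
    name = ndb.StringProperty(required=True)

class Team(ndb.Model):
    manager_name = ndb.KeyProperty(kind=UserProf)
    name = ndb.StringProperty(required=True)

teams = Team.query().fetch(100)

# Queue one non-blocking batch get for all the manager keys.
futures = [team.manager_name.get_async() for team in teams]

# Later dereferences reuse that batch instead of issuing one get per team.
for team in teams:
    manager = team.manager_name.get()  # waits on the batch if it is still pending
    print "team name:", team.name, "manager:", manager.name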
I don't know the answer as to why Guido didn't implement reference property.
However, I found I spent a lot of time using prefetch_refprops (http://blog.notdot.net/2010/01/ReferenceProperty-prefetching-in-App-Engine), which prefetches all of the reference properties by grabbing all the keys with get_value_for_datastore and then doing a get_multi on the keys.
This was vastly more efficient.
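For context, the helper from that blog post looks roughly like this (a sketch from memory, not the exact published code; it assumes every reference is set):

from google.appengine.ext import db

def prefetch_refprops(entities, *props):
    # Pair each entity with each reference property, and read the raw keys
    # without dereferencing them.
    fields = [(entity, prop) for entity in entities for prop in props]
    ref_keys = [prop.get_value_for_datastore(x) for x, prop in fields]
    # One batch get for all referenced entities.
    ref_entities = dict((x.key(), x) for x in db.get(list(set(ref_keys))))
    # Patch the resolved entities back onto the referring instances.
    for (entity, prop), ref_key in zip(fields, ref_keys):
        prop.__set__(entity, ref_entities[ref_key])
    return entities

Using the question's models, the call would look like: prefetch_refprops(Team.all().fetch(100), Team.manager_name).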
Also, if the referenced object doesn't exist, you would get an error when trying to fetch it.
If you pickled an object which had references, you ended up pickling a lot more than you probably planned to.
So I found that, except for the one case where you have a single entity and you want to grab the referenced object with a .name-style accessor, you had to jump through all sorts of hoops to prevent the referenced entity from being fetched.
I am trying to create many objects. Currently this is done by calling session.add() and then session.flush() as each object is created. This is bad because over 500 objects are being created, and each of them performs a query, which hurts performance. I am trying to optimise this.
To give some background, here is how my object roughly looks like:
class Metric:
    data: str
    fk_document_id: int (relationship)
    fk_user_id: int (relationship)
Key thing to highlight: there are 2 relationships in this object, implemented using the relationship property that SQLAlchemy offers.
My first attempt was to change the code to use bulk_save_objects(); however, this method does not save the relationships, and it also requires a session.commit(), which I'd ideally avoid because there is a chance this transaction could get rolled back after the Metric objects are created.
Then I found add_all(), but this performs an add() for each object anyway, so I do not think it will improve the current performance much.
I can't see any other approach, so my question is whether there is a way in this case to use bulk_save_objects() to persist relationships and possibly not require a commit. Alternatively, if there is another method to do this that I am not seeing, please let me know.
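For reference, a minimal, self-contained sketch of the two patterns discussed above; the model, column, and relationship names here are assumptions based on the description, not the actual code:

from sqlalchemy import Column, ForeignKey, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()

class User(Base):
    __tablename__ = "user"
    id = Column(Integer, primary_key=True)

class Document(Base):
    __tablename__ = "document"
    id = Column(Integer, primary_key=True)

class Metric(Base):
    __tablename__ = "metric"
    id = Column(Integer, primary_key=True)
    data = Column(String)
    fk_document_id = Column(Integer, ForeignKey("document.id"))
    fk_user_id = Column(Integer, ForeignKey("user.id"))
    document = relationship(Document)
    user = relationship(User)

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    doc, user = Document(), User()
    session.add_all([doc, user])

    # Pattern described in the question: one flush (one round trip) per object.
    for i in range(3):
        session.add(Metric(data="row %d" % i, document=doc, user=user))
        session.flush()

    # add_all() with a single flush at the end: objects are still registered one
    # by one, but only one flush is issued and no commit is needed, so the
    # transaction can still be rolled back later.
    session.add_all([Metric(data="bulk %d" % i, document=doc, user=user)
                     for i in range(3)])
    session.flush()
    session.rollback()  # nothing was committed yet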
As a caveat: I am an utter novice here. I wouldn't be surprised to learn a) this is already answered, but I can't find it because I lack the vocabulary to describe my problem or b) my question is basically silly to begin with, because what I want to do is silly.
Is there some way to store a reference to a class instance that defined and stored in active memory and not stored in NDB? I'm trying to write an app that would help manage a number of characters/guilds in an MMO. I have a class, CharacterClass, that includes properties such as armor, name, etc. that I define in main.py as a base python object, and then define the properties for each of the classes in the game. Each Character, which would be stored in Datastore, would have a property charClass, which would be a reference to one of those instances of CharacterClass. In theory I would be able to do things like
if character.charClass.armor == "Cloth":
while storing the potentially hundreds of unique characters and their specific data in Datastore, but without creating a copy of "Cloth" for every cloth-armor character, or querying Datastore for what kind of armor a mage wears thousands of times a day.
I don't know what kind of NDB property to use in Character to store the reference to the applicable CharacterClass. Or if that's the right way to do it, even. Thanks for taking the time to puzzle through my confused question.
A string is all you need. You just need to fetch the class based on the string value. You could create a custom property that automatically instantiates the class on reference.
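As a minimal sketch of that string-keyed approach (the registry, class definitions, and property names below are illustrative assumptions, not from the question):

from google.appengine.ext import ndb

# In-memory definitions, created once at module load (e.g. in main.py).
class CharacterClass(object):
    def __init__(self, name, armor):
        self.name = name
        self.armor = armor

CHARACTER_CLASSES = {
    "mage": CharacterClass("Mage", "Cloth"),
    "warrior": CharacterClass("Warrior", "Plate"),
}

class Character(ndb.Model):
    name = ndb.StringProperty(required=True)
    char_class_id = ndb.StringProperty(choices=CHARACTER_CLASSES.keys())

    @property
    def charClass(self):
        # Resolve the stored string to the in-memory object; no datastore read.
        return CHARACTER_CLASSES[self.char_class_id]

# Usage:
#   character = Character(name="Frost", char_class_id="mage")
#   if character.charClass.armor == "Cloth":
#       ...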
However, I have a feeling that hard-coding the values in code might be a bit unwieldy. Maybe your character class instances should be datastore entities as well. It means you can adjust these parameters without deploying new code.
If you want these objects in memory then you can pre-cache them on warmup.
This morning my GAE application generated several error log entries: "too much contention on these datastore entities. please try again.". In my mind, this type of error only happens when multiple requests try to modify the same entity or entities in the same entity group.
When I got this error, my code was inserting new entities. I'm confused. Does this mean there is a limit on how fast we can create new entities?
My model definition and calling sequence are shown below:
# model definition
class ExternalAPIStats(ndb.Model):
    uid = ndb.StringProperty()
    api = ndb.StringProperty()
    start_at = ndb.DateTimeProperty(auto_now_add=True)
    end_at = ndb.DateTimeProperty()
# calling sequence
stats = ExternalAPIStats(uid=current_uid, api="eapi:hr:get_by_id", start_at=start_at, end_at=end_at)
stats.put()  # **too much contention** happens here
That's pretty mysterious to me. I was wondering how I should deal with this problem. Please let me know if you have any suggestions.
Without seeing how the calls are made (you show the calling code, but how often is it called -- in a loop, or from many pages calling the same put at the same time?), I believe the issue is better explained here. In particular:
You will also see this problem if you create new entities at a high rate with a monotonically increasing indexed property like a timestamp, because these properties are the keys for rows in the index tables in Bigtable.
with the 'start_at' being the culprit. This article explains in more detail.
Possibly (though untested) try doing your puts in batches. Do you run queries on the 'start_at' field? If not, removing its indexes will also fix the issue.
How are the puts called (i.e. what I was asking above -- in a loop, or from multiple pages)? With that it might be easier to narrow down the issue.
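As a sketch of both suggestions -- batching the puts and dropping the index on the timestamp -- assuming the model from the question (pending_rows is a made-up stand-in for however the data arrives):

from google.appengine.ext import ndb

class ExternalAPIStats(ndb.Model):
    uid = ndb.StringProperty()
    api = ndb.StringProperty()
    # Not indexed: avoids the monotonically increasing index rows that cause
    # the hotspot, at the cost of not being able to query on these fields.
    start_at = ndb.DateTimeProperty(auto_now_add=True, indexed=False)
    end_at = ndb.DateTimeProperty(indexed=False)

# Batch the writes instead of calling put() once per entity.
stats = [ExternalAPIStats(uid=uid, api=api, end_at=end_at)
         for uid, api, end_at in pending_rows]
ndb.put_multi(stats)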
Here is everything you need to know about Datastore Contention and how to avoid it:
https://developers.google.com/appengine/articles/scaling/contention?hl=en
UPDATE:
You are reaching the writes-per-second limit on the same entity group. By default it is 1 write per second.
https://cloud.google.com/datastore/docs/concepts/limits
Source: https://stackoverflow.com/a/47800087/1034622
I'm creating a game mod for Counter-Strike in Python, and it's basically all done. The only thing left is to code a REAL database, and I don't have any experience with sqlite, so I need quite a lot of help.
I have a Player class with the attribute self.steamid, which is unique for every Counter-Strike player (received from the game engine), and self.entity, which holds an Entity for the player. The Entity class has lots and lots more attributes, such as level and name, and loads of methods (Entity is a self-made Python class).
What would be the best way to implement a database? First of all, how can I save instances of Player, with another instance of Entity as its attribute, into a database in a robust way?
Also, I will need to get that user's data every time he connects to the game server (I have a player_connect event), so how would I retrieve the data back?
All the tutorials I found only taught about saving strings or integers, but nothing about whole instances. Will I have to save every attribute of every instance (an Entity instance has a few more instances as its attributes, and all of them have huge numbers of attributes...), or is there a faster, easier way?
Also, it's going to be a locally saved database, so I can't really use any language other than SQL.
You need an ORM. Either you roll your own (which I never suggest), or you use one that exists already. Probably the two most popular in Python are sqlalchemy, and the ORM bundled with Django.
SQL databases typically can hold only fundamental datatypes. You can use SQLAlchemy if you want to map your models so that their attributes are automatically mapped to SQL types -- but it would require a lot of study and trial and error using SQLite on your part.
I think you are not entirely correct when you say "it has to be SQL" -- if you are running Python code, you can save whatever format you like.
However, Python allows you to serialize your instance data to a string -- which is persistable in a database.
So, you can create a varchar(65535) field in the SQL table, along with an ID field (which could be the player ID number you mentioned, for example), and persist to it the value returned by:
import pickle
value = pickle.dumps(my_instance)
When retrieving the value you do the reverse:
my_instance = pickle.loads(value)
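A minimal sketch of that approach using the standard library sqlite3 module (the table layout and function names are made up; a BLOB column is used here instead of a varchar because pickle output is binary):

import pickle
import sqlite3

conn = sqlite3.connect("players.db")
conn.execute("CREATE TABLE IF NOT EXISTS players (steamid TEXT PRIMARY KEY, data BLOB)")

def save_player(player):
    # Serialize the whole Player instance, including its Entity attribute.
    blob = pickle.dumps(player)
    conn.execute("INSERT OR REPLACE INTO players (steamid, data) VALUES (?, ?)",
                 (player.steamid, sqlite3.Binary(blob)))
    conn.commit()

def load_player(steamid):
    row = conn.execute("SELECT data FROM players WHERE steamid = ?",
                       (steamid,)).fetchone()
    return pickle.loads(bytes(row[0])) if row else None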
How can I cache a Reference Property in Google App Engine?
For example, let's say I have the following models:
from google.appengine.ext import db

class Few(db.Model):
    year = db.IntegerProperty()

class Many(db.Model):
    few = db.ReferenceProperty(Few)
Then I create many Many's that point to only one Few:
one_few = Few.get_or_insert(year=2009)
Many.get_or_insert(few=one_few)
Many.get_or_insert(few=one_few)
Many.get_or_insert(few=one_few)
Many.get_or_insert(few=one_few)
Many.get_or_insert(few=one_few)
Many.get_or_insert(few=one_few)
Now, if I want to iterate over all the Many's, reading their few value, I would do this:
for many in Many.all().fetch(1000):
    print "%s" % many.few.year
The question is:
Will each access to many.few trigger a database lookup?
If yes, is it possible to cache somewhere, as only one lookup should be enough to bring the same entity every time?
As noted in one comment: I know about memcache, but I'm not sure how I can "inject it" when I'm calling the other entity through a reference.
In any case, memcache wouldn't be useful, as I need caching within a single execution, not between executions. Using memcache wouldn't help optimize this call.
The first time you dereference any reference property, the entity is fetched - even if you'd previously fetched the same entity associated with a different reference property. This involves a datastore get operation, which isn't as expensive as a query, but is still worth avoiding if you can.
There's a good module that adds seamless caching of entities available here. It works at a lower level of the datastore, and will cache all datastore gets, not just dereferencing ReferenceProperties.
If you want to resolve a bunch of reference properties at once, there's another way: You can retrieve all the keys and fetch the entities in a single round trip, like so:
keys = [MyModel.ref.get_value_for_datastore(x) for x in referers]
referees = db.get(keys)
Finally, I've written a library that monkeypatches the db module to locally cache entities on a per-request basis (no memcache involved). It's available here. One warning, though: it's got unit tests, but it's not widely used, so it could be broken.
The question is:
Will each access to many.few trigger a database lookup? Yes. Not sure if it's 1 or 2 calls.
If yes, is it possible to cache somewhere, as only one lookup should be enough to bring the same entity every time? You should be able to use memcache to do this. It is in the google.appengine.api.memcache package.
Details for memcache are in http://code.google.com/appengine/docs/python/memcache/usingmemcache.html
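As an illustration of that approach, here is a minimal sketch that caches the referenced Few entity in memcache, assuming the Many/Few models from the question (the cache-key scheme and helper name are made up):

from google.appengine.api import memcache
from google.appengine.ext import db

def get_few_cached(few_key):
    cache_key = "few:%s" % few_key
    few = memcache.get(cache_key)
    if few is None:
        few = db.get(few_key)  # one datastore get on a cache miss
        memcache.set(cache_key, few, time=60)
    return few

for many in Many.all().fetch(1000):
    # Read the raw key without dereferencing the ReferenceProperty,
    # then resolve it through memcache.
    few_key = Many.few.get_value_for_datastore(many)
    print "%s" % get_few_cached(few_key).year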