Accessing related object key without fetching object in App Engine

Accessing related object key without fetching object in App Engine - python

In general, it's better to do a single query vs. many queries for a given object. Let's say I have a bunch of 'son' objects each with a 'father'. I get all the 'son' objects:
sons = Son.all()
Then, I'd like to get all the fathers for that group of sons. I do:
father_keys = {}
for son in sons:
father_keys.setdefault(son.father.key(), None)
Then I can do:
fathers = Father.get(father_keys.keys())
Now, this assumes that son.father.key() doesn't actually go fetch the object. Am I wrong on this? I have a bunch of code that assumes the object.related_object.key() doesn't actually fetch related_object from the datastore.
Am I doing this right?

You can find the answer by studying the sources of appengine.ext.db in your download of the App Engine SDK sources -- and the answer is, no, there's no special-casing as you require: the __get__ method (line 2887 in the sources for the 1.3.0 SDK) of the ReferenceProperty descriptor gets invoked before knowing if .key() or anything else will later be invoked on the result, so it just doesn't get a chance to do the optimization you'd like.
However, see line 2929: method get_value_for_datastore does do exactly what you want!
Specifically, instead of son.father.key(), use Son.father.get_value_for_datastore(son) and you should be much happier as a result;-).

I'd rather loop through the sons and get parent's keys using son.parent_key().
parent_key()
Returns the Key of the parent entity of this instance, or None if
this instance does not have a parent.
Since all the path is saved in the instance's key, theoretically, there is no need to hit the database again to get the parent's key.
After that, it's possible to get all parents' instances at once using db.get().
get(keys)
Gets the entity or entities for the given key or keys, of any Model.
Arguments:
keys
A Key object or a list of Key objects.
If one Key is provided, the return value is an instance of the
appropriate Model class, or None if no
entity exists with the given Key. If a
list of Keys is provided, the return
value is a corresponding list of model
instances, with None values when no
entity exists for a corresponding Key.

Related

How to access self.context property using a variable in flask

I want to access a property exist in the self.context using a variable. I have a variable name "prop" and it contains a value and it is already set in the self.context. I am using Flask Restplus framework.
prop = 'binding'
If I try to access this property like below then it gives me an error:
Object is not subscriptable
I want to know if there is any way to get the value? Like doing this:
print(self.context[prop])
I only get one solution don't know if its correct or not, I tried this :
self.context.__getattribute__(prop)

There are two ways to do this, the simplest is using getattr:
getattr(self.context, prop)
This function internally calls __getattribute__ so it's the same as your code, just a little neater.
However, you still have the problem that you probably only want to access some of the context, while not risking editing values set by different parts of your application. A much better way would be to store a dict in context:
self.context.attributes = {}
self.context.attributes["binding"] = False
print(self.context.attributes[prop])
This way, only certain context variables can be accessed and those that are meant to be dynamic don't mess with any used by your application code directly.

append to request.sessions[list] in Django

Something is bugging me.
I'm following along with this beginner tutorial for django (cs50) and at some point we receive a string back from a form submission and want to add it to a list:
https://www.youtube.com/watch?v=w8q0C-C1js4&list=PLhQjrBD2T380xvFSUmToMMzERZ3qB5Ueu&t=5777s
def add(request):
if 'tasklist' not in request.session:
request.session['tasklist'] = []
if request.method == 'POST':
form_data = NewTaskForm(request.POST)
if form_data.is_valid():
task = form_data.cleaned_data['task']
request.session['tasklist'] += [task]
return HttpResponseRedirect(reverse('tasks:index'))
I've checked the type of request.session['tasklist']and python shows it's a list.
The task variable is a string.
So why doesn't request.session['tasklist'].append(task) work properly? I can see it being added to the list via some print statements but then it is 'forgotten again' - it doesn't seem to be permanently added to the tasklist.
Why do we use this request.session['tasklist'] += [task] instead?
The only thing I could find is https://ogirardot.wordpress.com/2010/09/17/append-objects-in-request-session-in-django/ but that refers to a site that no longer exists.
The code works fine, but I'm trying to understand why you need to use a different operation and can't / shouldn't use the append method.
Thanks.

The reason why it does not work is because django does not see that you have changed anything in the session by using the append() method on a list that is in the session.
What you are doing here is essentially pulling out the reference to the list and making changes to it without the session backend knowing anything about it. An other way to explain:
The append() method is on the list itself not on the session object
When you call append() on the list you are only talking to the list and the list's parent (the session) has no idea what you guys are doing
When you however do an assignment on the session itself session['whatever'] = 'something' then it knows that something is up and changes are made
So the key here is that you need to operate on the session object directly if you want your changes to be updated automatically
Django only thinks it needs to save a changed session item if the item got reassigned to the session. See here: django session base code the __setitem__ method containing a self.modified = True statement.
The session['list'] += [new_element] adds a new list item (mutates the list stored in the session, so the list reference stays the same) and then gets it reassigned to the session again -> thus triggering first a __getitem__ call -> then your += / __iadd__ runs on the value read -> then a __setitem__ call is made (with the list ref. passed to it). You can see it in the django codebase that it marks the session after each __setitem__ call as modified.
The session['list'] = session['list'] + [new_item] mode of doing the same does create a new list every time it's run so its a bit less efficient, but you should not store hundreds of items in the session anyway. So you're probably fine. This also works exactly as above.
However if you use sub-keys in the session like session['list']['x'] = 'whatever' the session will not see itself as modified so you need to mark it as by request.session.modified = True

Short answer: It's about how Python chooses to implement the dict data structure.
Long answer:
Let's start by saying that request.session is a dictionary.
Quoting Django's documentation, "By default, Django only saves to the session database when the session has been modified – that is if any of its dictionary values have been assigned or deleted". Link
So, the problem is that the session database is not being modified by
request.session['tasklist'].append(task)
Seeing the related parts Django's Session base code (as posted by #Csaba K. in an answer), the variable self.modified is to be set True when setitem dunder method is called.
Now, at this step the problem seems like the setitem dunder method is not being called with request.session['tasklist'].append(task) but with request.session['tasklist'] += [task] it gets called. It is not due to if the reference of request.session['tasklist'] is changing or not as pointed out by another answer, because the reference to the underlying list remains the same.
To confirm, let's create a custom dictionary which extends the Python dict, and print something when setitem dunder method is called.
class MyDict(dict):
def __init__(self, globalVar):
super().__init__()
self.globalVar = globalVar
def __setitem__(self, key, value):
super().__setitem__(key, value)
print("Called Set item when: ", end="")
myDict = MyDict(0)
print("Creating Dict")
print("-----")
myDict["y"] = []
print("Adding a new key-value pair")
print("-----")
myDict["y"] += ["x"]
print(" using +=")
print("-----")
myDict["y"].append("x")
print("append")
print("-----")
myDict["y"].extend(["x"])
print("extend")
print("-----")
myDict["y"] = myDict["y"] + ["x"]
print(" using +",)
print("-----")
It prints:
Creating Dict
-----
Called Set item when: Adding a new key-value pair
-----
Called Set item when: using +=
-----
append
-----
extend
-----
Called Set item when: using +
-----
As we can see, setitem dunder method is called and in turn self.modified is set true only when adding a new key-value pair, or using += or using +, but not when initializing, appending or extending an iterable (in this case a list). Now, the operator + and += do very different things in Python, as explained in the other answer. += behaves more like the append method but in this case, I guess it's more about how Python chooses to implement the dict data structure rather than how +, += and append behave on lists.

I found this while doing some more searching:
https://code.djangoproject.com/wiki/NewbieMistakes
Scroll to 'Appending to a list in session doesn't work'
Again, it is a very dated entry but still seems to hold true.
Not completely satisfied because this does not answer the question as to 'why' this doesn't work, but at the very least confirms 'something's up' and you should probably still use the recommendations there.
(if anyone out there can actually explain this in a more verbose manner then I'd be happy to hear it)

py2neo cypher create several relations to central node in for loop

just starting out with neo4j, py2neo and Cypher.
I have encountered the following problem and google and my knowledge of what to ask have not yet given me an answer or a helpful hint in the right direction. Anyway:
Problem:
I don't know how to, in python/py2neo, create relations between a unique starting node and a number of following nodes that I create dynamically in a for loop.
Background:
I have a json object which defines a person object, who will have an id, and several properties, such as favourite colour, favourite food etc.
So at the start of my py2neo script I define my person. After this I loop through my json for every property this person has.
This works fine, and with no relations I end up with a neo4j chart with several nodes with the right parameters.
If I'm understanding the docs right I have to make a match to find my newly created person, for each new property I want to link. This seems absurd to me as I just created this person and still have the reference to the person object in memory. But for me it is unclear on how to actually write the code for creating the relation. Also, as a relative newbie in both python and Cypher, best practices are still an unknown to me.
What I understand is I can use py2neo
graph = Graph(http://...)
tx = graph.begin()
p = Node("Person", id)
tx.create(p)
and then I can reference p later on. But for my properties, of which there can be many, I create a string in python like so (pseudocode here, I have a nice oneliner for this that fits my actual case with lambda, join, map, format and so on)
for param in params:
par = "MERGE (par:" + param + ... )
tx.append(par)
tx.process()
tx.commit()
How do I create a relation "likes" back to the person for each and every par in the for loop?
Or do I need to rethink my whole solution?
Help?! :-)
//Jonas

Considering you've created a node Alice and you want to create the other as dynamic, I'll suggest while dynamically parsing through the nodes, store it everytime (in the loop) in a variable, make a node out of it and then implement in Relationship Syntax. The syntax is
Relationship(Node_1, Relation, Node_2)
Now key thing to know here is type(Node_1) and type(Node_2) both will be Node.
I've stored all the nodes (only their names) from json in a list named nodes.
Since you mentioned you only have reference to Alice
a = ("Person", name:"Alice")
for node in nodes: (excluding Alice)
= Node(, name:"")
= Relationship(a, ,
Make sure to iterate variable name, else it'll keep overwriting.

django values function strange behaviors?

I have +5 hours training to explain how : Item.objects.values('type', 'state') returns a dictionary that contains only two keys.
However Item.objects.values('type', 'state').annotate(nb=Count('id')) works !!
How does the interpreter knows that id attribute exists if it's not returned by values function ?

It knows that the id attribute exists because Item.objects.values('type', 'state') isn't just a dictionary. It's a object that represents itself as a dictionary based on the parameters you give it.
Imagine the object is a piece of paper, let's call it paper A:
id : 1
type : cheese
state : melted
What you're actually seeing when you call it is a representation of that object created by only showing you the relevant parts, like a piece of paper with holes put on top of A, paper B:
████████████████████
████████████████████
+--------------------+
|type: cheese |
+--------------------+
+--------------------+
|state: melted |
+--------------------+
But paper A is still underneath paper B, intact. That's why Item.objects.values('type', 'state').annotate(nb=Count('id')) works: when annotate goes to look at the object, it's asking for what it actually is, not what it looks like to an outside observer. In other words, annotate looks at paper A, not paper B.
By having the object Item.objects.values('type', 'state') represent itself differently to the user and the system, it allows the system to retain as much information as possible in case it needs to check it. This is common in ORM models so that discrepancies don't arise between the database and representations of the database.

Your model has id in background and also Django ORM is aware of your model definition
Django ORM is lazy loading and it wont execute nothing before result is called. So in moment when you are calling annotate it is not yet a dictionary it is still object. In moment that you ask for it result it triggers query to database and returns your result
Django ORM translates this into query to the database similar to this
SELECT type, state, count(id) as nb FROM items

The first query does not return a dictionary with two keys. On the contrary, it returns a ValuesQuerySet; each element of that queryset is a dictionary.
The ValuesQuerySet, like any other queryset, retains a connection with the model, and it is therefore able to add any other elements to the query as necessary. The query as a whole is not executed until the queryset is iterated.

Completing object with its relations and avoiding unnecessary queries in sqlalchemy

I have some database structure; as most of it is irrelevant for us, i'll describe just some relevant pieces. Let's lake Item object as example:
items_table = Table("invtypes", gdata_meta,
Column("typeID", Integer, primary_key = True),
Column("typeName", String, index=True),
Column("marketGroupID", Integer, ForeignKey("invmarketgroups.marketGroupID")),
Column("groupID", Integer, ForeignKey("invgroups.groupID"), index=True))
mapper(Item, items_table,
properties = {"group" : relation(Group, backref = "items"),
"_Item__attributes" : relation(Attribute, collection_class = attribute_mapped_collection('name')),
"effects" : relation(Effect, collection_class = attribute_mapped_collection('name')),
"metaGroup" : relation(MetaType,
primaryjoin = metatypes_table.c.typeID == items_table.c.typeID,
uselist = False),
"ID" : synonym("typeID"),
"name" : synonym("typeName")})
I want to achieve some performance improvements in the sqlalchemy/database layer, and have couple of ideas:
1) Requesting the same item twice:
item = session.query(Item).get(11184)
item = None (reference to item is lost, object is garbage collected)
item = session.query(Item).get(11184)
Each request generates and issues SQL query. To avoid it, i use 2 custom maps for an item object:
itemMapId = {}
itemMapName = {}
#cachedQuery(1, "lookfor")
def getItem(lookfor, eager=None):
if isinstance(lookfor, (int, float)):
id = int(lookfor)
if eager is None and id in itemMapId:
item = itemMapId[id]
else:
item = session.query(Item).options(*processEager(eager)).get(id)
itemMapId[item.ID] = item
itemMapName[item.name] = item
elif isinstance(lookfor, basestring):
if eager is None and lookfor in itemMapName:
item = itemMapName[lookfor]
else:
# Items have unique names, so we can fetch just first result w/o ensuring its uniqueness
item = session.query(Item).options(*processEager(eager)).filter(Item.name == lookfor).first()
itemMapId[item.ID] = item
itemMapName[item.name] = item
return item
I believe sqlalchemy does similar object tracking, at least by primary key (item.ID). If it does, i can wipe both maps (although wiping name map will require minor modifications to application which uses these queries) to not duplicate functionality and use stock methods. Actual question is: if there's such functionality in sqlalchemy, how to access it?
2) Eager loading of relationships often helps to save alot of requests to database. Say, i'll definitely need following set of item=Item() properties:
item.group (Group object, according to groupID of our item)
item.group.items (fetch all items from items list of our group)
item.group.items.metaGroup (metaGroup object/relation for every item in the list)
If i have some item ID and no item is loaded yet, i can request it from the database, eagerly loading everything i need: sqlalchemy will join group, its items and corresponding metaGroups within single query. If i'd access them with default lazy loading, sqlalchemy would need to issue 1 query to grab an item + 1 to get group + 1*#items for all items in the list + 1*#items to get metaGroup of each item, which is wasteful.
2.1) But what if i already have Item object fetched, and some of the properties which i want to load are already loaded? As far as i understand, when i re-fetch some object from the database - its already loaded relations do not become unloaded, am i correct?
2.2) If i have Item object fetched, and want to access its group, i can just getGroup using item.groupID, applying any eager statements i'll need ("items" and "items.metaGroup"). It should properly load group and its requested relations w/o touching item stuff. Will sqlalchemy properly map this fetched group to item.group, so that when i access item.group it won't fetch anything from the underlying database?
2.3) If i have following things fetched from the database: original item, item.group and some portion of the items from the item.group.items list some of which may have metaGroup loaded, what would be best strategy for completing data structure to the same as eager list above: re-fetch group with ("items", "items.metaGroup") eager load, or check each item from items list individually, and if item or its metaGroup is not loaded - load them? It seems to depend on the situation, because if everything has already been loaded some time ago - issuing such heavy query is pointless. Does sqlalchemy provide a way to track if some object relation is loaded, with the ability to look deeper than just one level?
As an illustration to 2.3 - i can fetch group with ID 83, eagerly fetching "items" and "items.metaGroup". Is there a way to determine from an item (which has groupID of an 83), does it have "group", "group.items" and "group.items.metaGroup" loaded or not, using sqlalchemy tools (in this case all of them should be loaded)?

To force loading lazy attributes just access them. This the simplest way and it works fine for relations, but is not as efficient for Columns (you will get separate SQL query for each column in the same table). You can get a list of all unloaded properties (both relations and columns) from sqlalchemy.orm.attributes.instance_state(obj).unloaded.
You don't use deferred columns in your example, but I'll describe them here for completeness. The typical scenario for handling deferred columns is the following:
Decorate selected columns with deferred(). Combine them into one or several groups by using group parameter to deferred().
Use undefer() and undefer_group() options in query when desired.
Accessing deferred column put in group will load all columns in this group.
Unfortunately this doesn't work reverse: you can combine columns into groups without deferring loading of them by default with column_property(Column(…), group=…), but defer() option won't affect them (it works for Columns only, not column properties, at least in 0.6.7).
To force loading deferred column properties session.refresh(obj, attribute_names=…) suggested by Nathan Villaescusa is probably the best solution. The only disadvantage I see is that it expires attributes first so you have to insure there is not loaded attributes among passed as attribute_names argument (e.g. by using intersection with state.unloaded).
Update
1) SQLAlchemy does track loaded objects. That's how ORM works: there must be the only object in the session for each identity. Its internal cache is weak by default (use weak_identity_map=False to change this), so the object is expunged from the cache as soon as there in no reference to it in your code. SQLAlchemy won't do SQL request for query.get(pk) when object is already in the session. But this works for get() method only, so query.filter_by(id=pk).first() will do SQL request and refresh object in the session with loaded data.
2) Eager loading of relations will lead to fewer requests, but it's not always faster. You have to check this for your database and data.
2.1) Refetching data from database won't unload objects bound via relations.
2.2) item.group is loaded using query.get() method, so there won't lead to SQL request if object is already in the session.
2.3) Yes, it depends on situation. For most cases it's the best is to hope SQLAlchemy will use the right strategy :). For already loaded relation you can check if related objects' relations are loaded via state.unloaded and so recursively to any depth. But when relation is not loaded yet you can't get know whether related objects and their relations are already loaded: even when relation is not yet loaded the related object[s] might be already in the session (just imagine you request first item, load its group and then request other item that has the same group). For your particular example I see no problem to just check state.unloaded recursively.

1)
From the Session documentation:
[The Session] is somewhat used as a cache, in that
it implements the identity map
pattern, and stores objects keyed to
their primary key. However, it doesn’t
do any kind of query caching. ... It’s only
when you say query.get({some primary
key}) that the Session doesn’t have to
issue a query.
2.1) You are correct, relationships are not modified when you refresh an object.
2.2) Yes, the group will be in the identity map.
2.3) I believe your best bet will be to attempt to reload the entire group.items in a single query. From my experience it is usually much quicker to issue one large request than several smaller ones. The only time it would make sense to only reload a specific group.item is there was exactly one of them that needed to be loaded. Though in that case you are doing one large query instead of one small one so you don't actually reduce the number of queries.
I have not tried it, but I believe you should be able to use the sqlalchemy.orm.util.identity_key method to determine whether an object is in sqlalchemy's identiy map. I would be interested to find out what calling identiy_key(Group, 83) returns.
Initial Question)
If I understand correctly you have an object that you fetched from the database where some of its relationships were eagerloaded and you would like to fetch the rest of the relationships with a single query? I believe you may be able to use the Session.refresh() method passing in the the names of the relationships that you want to load.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.