I have +5 hours training to explain how : Item.objects.values('type', 'state') returns a dictionary that contains only two keys.
However Item.objects.values('type', 'state').annotate(nb=Count('id')) works !!
How does the interpreter knows that id attribute exists if it's not returned by values function ?
It knows that the id attribute exists because Item.objects.values('type', 'state') isn't just a dictionary. It's a object that represents itself as a dictionary based on the parameters you give it.
Imagine the object is a piece of paper, let's call it paper A:
id : 1
type : cheese
state : melted
What you're actually seeing when you call it is a representation of that object created by only showing you the relevant parts, like a piece of paper with holes put on top of A, paper B:
████████████████████
████████████████████
+--------------------+
|type: cheese |
+--------------------+
+--------------------+
|state: melted |
+--------------------+
But paper A is still underneath paper B, intact. That's why Item.objects.values('type', 'state').annotate(nb=Count('id')) works: when annotate goes to look at the object, it's asking for what it actually is, not what it looks like to an outside observer. In other words, annotate looks at paper A, not paper B.
By having the object Item.objects.values('type', 'state') represent itself differently to the user and the system, it allows the system to retain as much information as possible in case it needs to check it. This is common in ORM models so that discrepancies don't arise between the database and representations of the database.
Your model has id in background and also Django ORM is aware of your model definition
Django ORM is lazy loading and it wont execute nothing before result is called. So in moment when you are calling annotate it is not yet a dictionary it is still object. In moment that you ask for it result it triggers query to database and returns your result
Django ORM translates this into query to the database similar to this
SELECT type, state, count(id) as nb FROM items
The first query does not return a dictionary with two keys. On the contrary, it returns a ValuesQuerySet; each element of that queryset is a dictionary.
The ValuesQuerySet, like any other queryset, retains a connection with the model, and it is therefore able to add any other elements to the query as necessary. The query as a whole is not executed until the queryset is iterated.
Related
I'm trying to concatenate many querysets together. I tried out the marked answer from this question a while back, but that didn't work in my case. I needed to return a queryset not a list. So I used the |, from the second answer. This worked fine at the time, but now that I'm trying to use it again for something else I get the following error:
Expression tree is too large (maximum depth 1000)
I originally thought that | would concat the querysets, but after reading the docs it appears that it concats the actual query. And that this specific problem occurs if the query becomes too long/complex.
This is what I'm trying to do:
def properties(self, request, pk=None):
project = self.get_object()
if project is None:
return Response({'detail': 'Missing project id'}, status=404)
functions = Function.objects.filter(project=project)
properties = Property.objects.none()
for function in functions:
properties = properties | function.property_set.all()
return Response([PropertySerializer(x).data for x in properties])
Since the functions query returns roughly 1200 results, and each function has about 5 properties, I can understand the query becoming too long/complex.
How can I prevent the query from becoming too complex? Or how can I execute multiple queries and concat them afterwards, while keeping the end result a queryset?
I think you want to obtain all the Property objects that have as Function a certain project.
We can query this with:
properties = Property.objects.filter(function__project=project)
This thus is a queryset that contains all property objects for which the function (I assume this is a ForeignKey) has as project (probably again a ForeignKey is the given project). This will result in a single query as well, but you will avoid constructing gigantic unions.
Alternatively, you can do it in two steps, but this would actually make it slower:
# probably less efficient
function_ids = (Function.objects.filter(project=project)
.values_list('pk', flat=True))
properties = Properties.object(function_id__in=function_ids)
I have a collection of properties. some of these properties are given review score. review is saved in AggregateReview table. I have to sort these properties on basis of their score. Property with highest review score will come first.When all properties with review score will be sorted, the i have to append those property which aren't reviewed.
This is the code i have written, It's working fine....but i want to know is there anything in this code, that can be optimized, I am new to app engine and ndb, so any help will be appreciated. ( I think there are so many calls to db )...
sortedProperties = []
sortedProperties.extend(sorted([eachProperty for eachProperty in properties if AggregateReview.query(AggregateReview.property == eachProperty.key).get()],key=lambda property: AggregateReview.query(AggregateReview.property == property.key).get().rating,reverse=True))
sortedProperties.extend([eachProperty for eachProperty in properties if AggregateReview.query(AggregateReview.property == eachProperty.key).get() is None])
return sortedProperties
after bit of workaround i came to this:
return sorted(properties,key=lambda property: property.review_aggregate.get().average,reverse=True)
but it throws an error :
'NoneType' object has no attribute 'average'
because it can not find the review_aggregate for every property. I want it to accept None....
I am not sure what's going on in your code there, to be honest, as the lines are soo long. But I know that you may want to look into Memcached. It can be used to minimize hits on a database, and is widely used indeed. It is actually built into Google app engine. Read the docs on it here
I'm writing a quick and dirty maintenace script to delete some rows and would like to avoid having to bring my ORM classes/mappings over from the main project. I have a query that looks similar to:
address_table = Table('address',metadata,autoload=True)
addresses = session.query(addresses_table).filter(addresses_table.c.retired == 1)
According to everything I've read, if I was using the ORM (not 'just' tables) and passed in something like:
addresses = session.query(Addresses).filter(addresses_table.c.retired == 1)
I could add a .delete() to the query, but when I try to do this using only tables I get a complaint:
File "/usr/local/lib/python2.6/dist-packages/sqlalchemy/orm/query.py", line 2146, in delete
target_cls = self._mapper_zero().class_
AttributeError: 'NoneType' object has no attribute 'class_'
Which makes sense as its a table, not a class. I'm quite green when it comes to SQLAlchemy, how should I be going about this?
Looking through some code where I did something similar, I believe this will do what you want.
d = addresses_table.delete().where(addresses_table.c.retired == 1)
d.execute()
Calling delete() on a table object gives you a sql.expression (if memory serves), that you then execute. I've assumed above that the table is bound to a connection, which means you can just call execute() on it. If not, you can pass the d to execute(d) on a connection.
See docs here.
When you call delete() from a query object, SQLAlchemy performs a bulk deletion. And you need to choose a strategy for the removal of matched objects from the session. See the documentation here.
If you do not choose a strategy for the removal of matched objects from the session, then SQLAlchemy will try to evaluate the query’s criteria in Python straight on the objects in the session. If evaluation of the criteria isn’t implemented, an error is raised.
This is what is happening with your deletion.
If you only want to delete the records and do not care about the records in the session after the deletion, you can choose the strategy that ignores the session synchronization:
address_table = Table('address', metadata, autoload=True)
addresses = session.query(address_table).filter(address_table.c.retired == 1)
addresses.delete(synchronize_session=False)
I have some database structure; as most of it is irrelevant for us, i'll describe just some relevant pieces. Let's lake Item object as example:
items_table = Table("invtypes", gdata_meta,
Column("typeID", Integer, primary_key = True),
Column("typeName", String, index=True),
Column("marketGroupID", Integer, ForeignKey("invmarketgroups.marketGroupID")),
Column("groupID", Integer, ForeignKey("invgroups.groupID"), index=True))
mapper(Item, items_table,
properties = {"group" : relation(Group, backref = "items"),
"_Item__attributes" : relation(Attribute, collection_class = attribute_mapped_collection('name')),
"effects" : relation(Effect, collection_class = attribute_mapped_collection('name')),
"metaGroup" : relation(MetaType,
primaryjoin = metatypes_table.c.typeID == items_table.c.typeID,
uselist = False),
"ID" : synonym("typeID"),
"name" : synonym("typeName")})
I want to achieve some performance improvements in the sqlalchemy/database layer, and have couple of ideas:
1) Requesting the same item twice:
item = session.query(Item).get(11184)
item = None (reference to item is lost, object is garbage collected)
item = session.query(Item).get(11184)
Each request generates and issues SQL query. To avoid it, i use 2 custom maps for an item object:
itemMapId = {}
itemMapName = {}
#cachedQuery(1, "lookfor")
def getItem(lookfor, eager=None):
if isinstance(lookfor, (int, float)):
id = int(lookfor)
if eager is None and id in itemMapId:
item = itemMapId[id]
else:
item = session.query(Item).options(*processEager(eager)).get(id)
itemMapId[item.ID] = item
itemMapName[item.name] = item
elif isinstance(lookfor, basestring):
if eager is None and lookfor in itemMapName:
item = itemMapName[lookfor]
else:
# Items have unique names, so we can fetch just first result w/o ensuring its uniqueness
item = session.query(Item).options(*processEager(eager)).filter(Item.name == lookfor).first()
itemMapId[item.ID] = item
itemMapName[item.name] = item
return item
I believe sqlalchemy does similar object tracking, at least by primary key (item.ID). If it does, i can wipe both maps (although wiping name map will require minor modifications to application which uses these queries) to not duplicate functionality and use stock methods. Actual question is: if there's such functionality in sqlalchemy, how to access it?
2) Eager loading of relationships often helps to save alot of requests to database. Say, i'll definitely need following set of item=Item() properties:
item.group (Group object, according to groupID of our item)
item.group.items (fetch all items from items list of our group)
item.group.items.metaGroup (metaGroup object/relation for every item in the list)
If i have some item ID and no item is loaded yet, i can request it from the database, eagerly loading everything i need: sqlalchemy will join group, its items and corresponding metaGroups within single query. If i'd access them with default lazy loading, sqlalchemy would need to issue 1 query to grab an item + 1 to get group + 1*#items for all items in the list + 1*#items to get metaGroup of each item, which is wasteful.
2.1) But what if i already have Item object fetched, and some of the properties which i want to load are already loaded? As far as i understand, when i re-fetch some object from the database - its already loaded relations do not become unloaded, am i correct?
2.2) If i have Item object fetched, and want to access its group, i can just getGroup using item.groupID, applying any eager statements i'll need ("items" and "items.metaGroup"). It should properly load group and its requested relations w/o touching item stuff. Will sqlalchemy properly map this fetched group to item.group, so that when i access item.group it won't fetch anything from the underlying database?
2.3) If i have following things fetched from the database: original item, item.group and some portion of the items from the item.group.items list some of which may have metaGroup loaded, what would be best strategy for completing data structure to the same as eager list above: re-fetch group with ("items", "items.metaGroup") eager load, or check each item from items list individually, and if item or its metaGroup is not loaded - load them? It seems to depend on the situation, because if everything has already been loaded some time ago - issuing such heavy query is pointless. Does sqlalchemy provide a way to track if some object relation is loaded, with the ability to look deeper than just one level?
As an illustration to 2.3 - i can fetch group with ID 83, eagerly fetching "items" and "items.metaGroup". Is there a way to determine from an item (which has groupID of an 83), does it have "group", "group.items" and "group.items.metaGroup" loaded or not, using sqlalchemy tools (in this case all of them should be loaded)?
To force loading lazy attributes just access them. This the simplest way and it works fine for relations, but is not as efficient for Columns (you will get separate SQL query for each column in the same table). You can get a list of all unloaded properties (both relations and columns) from sqlalchemy.orm.attributes.instance_state(obj).unloaded.
You don't use deferred columns in your example, but I'll describe them here for completeness. The typical scenario for handling deferred columns is the following:
Decorate selected columns with deferred(). Combine them into one or several groups by using group parameter to deferred().
Use undefer() and undefer_group() options in query when desired.
Accessing deferred column put in group will load all columns in this group.
Unfortunately this doesn't work reverse: you can combine columns into groups without deferring loading of them by default with column_property(Column(…), group=…), but defer() option won't affect them (it works for Columns only, not column properties, at least in 0.6.7).
To force loading deferred column properties session.refresh(obj, attribute_names=…) suggested by Nathan Villaescusa is probably the best solution. The only disadvantage I see is that it expires attributes first so you have to insure there is not loaded attributes among passed as attribute_names argument (e.g. by using intersection with state.unloaded).
Update
1) SQLAlchemy does track loaded objects. That's how ORM works: there must be the only object in the session for each identity. Its internal cache is weak by default (use weak_identity_map=False to change this), so the object is expunged from the cache as soon as there in no reference to it in your code. SQLAlchemy won't do SQL request for query.get(pk) when object is already in the session. But this works for get() method only, so query.filter_by(id=pk).first() will do SQL request and refresh object in the session with loaded data.
2) Eager loading of relations will lead to fewer requests, but it's not always faster. You have to check this for your database and data.
2.1) Refetching data from database won't unload objects bound via relations.
2.2) item.group is loaded using query.get() method, so there won't lead to SQL request if object is already in the session.
2.3) Yes, it depends on situation. For most cases it's the best is to hope SQLAlchemy will use the right strategy :). For already loaded relation you can check if related objects' relations are loaded via state.unloaded and so recursively to any depth. But when relation is not loaded yet you can't get know whether related objects and their relations are already loaded: even when relation is not yet loaded the related object[s] might be already in the session (just imagine you request first item, load its group and then request other item that has the same group). For your particular example I see no problem to just check state.unloaded recursively.
1)
From the Session documentation:
[The Session] is somewhat used as a cache, in that
it implements the identity map
pattern, and stores objects keyed to
their primary key. However, it doesn’t
do any kind of query caching. ... It’s only
when you say query.get({some primary
key}) that the Session doesn’t have to
issue a query.
2.1) You are correct, relationships are not modified when you refresh an object.
2.2) Yes, the group will be in the identity map.
2.3) I believe your best bet will be to attempt to reload the entire group.items in a single query. From my experience it is usually much quicker to issue one large request than several smaller ones. The only time it would make sense to only reload a specific group.item is there was exactly one of them that needed to be loaded. Though in that case you are doing one large query instead of one small one so you don't actually reduce the number of queries.
I have not tried it, but I believe you should be able to use the sqlalchemy.orm.util.identity_key method to determine whether an object is in sqlalchemy's identiy map. I would be interested to find out what calling identiy_key(Group, 83) returns.
Initial Question)
If I understand correctly you have an object that you fetched from the database where some of its relationships were eagerloaded and you would like to fetch the rest of the relationships with a single query? I believe you may be able to use the Session.refresh() method passing in the the names of the relationships that you want to load.
In general, it's better to do a single query vs. many queries for a given object. Let's say I have a bunch of 'son' objects each with a 'father'. I get all the 'son' objects:
sons = Son.all()
Then, I'd like to get all the fathers for that group of sons. I do:
father_keys = {}
for son in sons:
father_keys.setdefault(son.father.key(), None)
Then I can do:
fathers = Father.get(father_keys.keys())
Now, this assumes that son.father.key() doesn't actually go fetch the object. Am I wrong on this? I have a bunch of code that assumes the object.related_object.key() doesn't actually fetch related_object from the datastore.
Am I doing this right?
You can find the answer by studying the sources of appengine.ext.db in your download of the App Engine SDK sources -- and the answer is, no, there's no special-casing as you require: the __get__ method (line 2887 in the sources for the 1.3.0 SDK) of the ReferenceProperty descriptor gets invoked before knowing if .key() or anything else will later be invoked on the result, so it just doesn't get a chance to do the optimization you'd like.
However, see line 2929: method get_value_for_datastore does do exactly what you want!
Specifically, instead of son.father.key(), use Son.father.get_value_for_datastore(son) and you should be much happier as a result;-).
I'd rather loop through the sons and get parent's keys using son.parent_key().
parent_key()
Returns the Key of the parent entity of this instance, or None if
this instance does not have a parent.
Since all the path is saved in the instance's key, theoretically, there is no need to hit the database again to get the parent's key.
After that, it's possible to get all parents' instances at once using db.get().
get(keys)
Gets the entity or entities for the given key or keys, of any Model.
Arguments:
keys
A Key object or a list of Key objects.
If one Key is provided, the return value is an instance of the
appropriate Model class, or None if no
entity exists with the given Key. If a
list of Keys is provided, the return
value is a corresponding list of model
instances, with None values when no
entity exists for a corresponding Key.