I have a set of ID's that I'd like to retrieve all of the objects for. My current solution works, however it hammers the database with a a bunch of get queries inside a loop.
objects = [SomeModel.objects.get(id=id_) for id_ in id_set]
Is there a more efficient way of going about this?
There's an __in (documentation here) field lookup that you can use to get all objects for which a certain field matches one of a list of values
objects = SomeModel.objects.filter(id__in=id_set)
Works just the same for lots of different field types (e.g. CharFields), not just id fields.
Related
we have the following data base schema to store different types of data.
DataDefinition: basic information about the new data.
*FieldDefinition: Every DataDefinition has some fields. Every field has a type, title, etc, that information is stored here. Every DataDefinition has more than one FieldDefinition associated. I have put '' because we have a lot of different models, one for every kind of field supported.
DataValue, *FieldValues: we store the definition and the values in different models.
With this setup, to retrieve a data from our database we need to do a lot of queries:
Retrieve the DataDefinition.
Retrieve the DataValue.
Retrieve the *FieldDefinition associated to that DataDefinition.
Retrieve all the *FieldValues associated to those *FieldDefinition.
So, if n is the average number of fields of a DataDefinition, we need to make 2*n+2 queries to the database to retrieve a single value.
We cannot change this setup, but queries are quite slow. So to speed it up I have thought the following: store a joined version of the tables. I do not know if this is possible but I cannot think of any other way. Any suggestion?
Update: we are already using prefetch_related and select_related and it's still slow.
Use Case Right now: get an entire data object from the one object value:
someValue = SomeTypeValue.objects.filter(value=value).select_related('DataValue', 'DataDefinition')[0]
# for each *FieldDefinition/*FieldValue model
definition = SomeFieldDefinition.objects.filter(*field_definition__id=someValue.data_value.data_definition.id)
value = SomeFieldValue.objects.filter(*field_definition__id=definition.id)
And with that info you can now build the entire data object.
Django: 1.11.20
Python: 2.7
I like how Django ORM lazy loads related objects in the queryset, but I guess it's quite unpredictable as it is.
The queryset API doesn't keep the related objects when they are used to make a queryset, thereby fetching them again when accessed later.
Suppose I have a ModelA instance (say instance_a) which is a foreign key (say for_a) of some N instances of ModelB. Now I want to perform query on ModelB which has the given ModelA instance as the foreign key.
Django ORM provides two ways:
Using .filter() on ModelB:
b_qs = ModelB.objects.filter(for_a=instance_a)
for instance_b in b_qs:
instance_b.for_a # <-- fetches the same row for ModelA again
Results in 1 + N queries here.
Using reverse relations on ModelA instance:
b_qs = instance_a.for_a_set.all()
for instance_b in b_qs:
instance_b.for_a # <-- this uses the instance_a from memory
Results in 1 query only here.
While the second way can be used to achieve the result, it's not part of the standard API and not useable for every scenario. For example, if I have instances of 2 foreign keys of ModelB (say, ModelA and ModelC) and I want to get related objects to both of them.
Something like the following works:
ModelB.objects.filter(for_a=instance_a, for_c=instance_c)
I guess it's possible to use .intersection() for this scenario, but I would like a way to achieve this via the standard API. After all, covering such cases would require more code with non-standard queryset functions which may not make sense to the next developer.
So, the first question, is it possible to optimize such scenarios with the the standard API itself?
The second question, if it's not possible right now, can it be added with some tweaks with the QuerySet?
PS: It's my first time asking a question here, so forgive me if I made any mistake.
You could improve the query by using select_related():
b_qs = ModelB.objects.select_related('for_a').filter(for_a=instance_a)
or
b_qs = instance_a.for_a_set.select_related('for_a')
Does that help?
You use .select_related(..) [Django-doc] for ForeignKeys, or .prefetch_related(..) [Django-doc] for something-to-many relations.
With .select_related(..) you will make a LEFT OUTER JOIN at the database side, and fetch records for the two objects, and thus do the deserialization to the proper objects.
ModelB.objects.select_related('for_a').filter(for_a=instance_a)
For relations that are one-to-many (so a reversed ForeignKey), or ManyToManyFields, this is not a good idea, since it could result in a large amount of duplicate objects that are retrieved. This would result in a large answer from the database, and a lot of work at the Python end to deserialize these objects. .prefetch_related will make individual queries, and then do the linking itself.
I used GAE and NDB for a project. I just noticed that if I create several objects, and then I retrieve the list of these objects the order is not preserved (i use the fetch() on the object).
This is a screenshot of the admin page, which shows the same problem:
as you may (if it's too small here is the link) see i've several sessions. Now, i created the sessions that have as name day in order, from 0 to 7.
But as you see the order is not preserved.
I checked and actually the keys are not incremental. Neither the id (id should be incremental, shouldn't it? but anyway in some classes, not this one, I used a hand-made key, so there will be no id).
Is there a way to preserve insertion order?
(or it's just a strange behaviour? or it's my bad?)
PS: if you want to have a look at the code: this is the session model which extends this class i made
Neither keys nor ids are strictly incremental (and incremental by one) in ndb. You can set your own ids and assure they autoincrement properly.
Or you can add to your model(s) a DateTimeProperty:
created = ndb.DateTimeProperty(auto_now_add=True)
And in your view you can use a filter to sort the entities by the date of insertion, for ex:
posts = Post.query().order(-Post.created).fetch()
which will order and fetch your (let's say) Post entities in the descending order of insertion dates.
It's not expected that the order would be preserved unless you perform a query that would retrieve then in a particular order.
What makes you think they should be ordered?
Say I have RootEntity, AEntity(child of RootEntity), BEntity(child of AEntity).
class RootEntity(ndb.Model):
rtp = ndb.StringProperty()
class AEntity(ndb.Model):
ap = ndb.IntegerProperty()
class BEntity(ndb.Model):
bp = ndb.StringProperty()
So in different handlers I need to get instances of BEntity with specific ancestor(instance of AEntity).
There is a my query: BEntity.query(ancestor = ndb.Key("RootEntity", 1, "AEntity", AEntity.query(ancestor = ndb.Key("RootEntity", 1)).filter(AEntity.ap == int(some_value)).get().key.integer_id()))
How I can to optimize this query? Make it better, may be less sophisticated?
Upd:
This query is a part of function with #ndb.transactional decorator.
You should not use Entity Groups to represent entity relationships.
Entity groups have a special purpose: to define the scope of transactions. They give you ability to update multiple entities transactionally, as long as they are a part of the same entity group (this limitation has been somewhat relaxed with the new XG transactions). They also allow you to use queries within transactions (not available via XG transactions).
The downside of entity groups is that they have an update limitation of 1 write/second.
In your case my suggestion would be to use separate entities and make references between them. The reference should be a Key of the referenced entity as this is type-safe.
Regarding query simplicity: GAE unfortunately does not support JOINs or reference (multi-entity) queries, so you would still need to combine multiple queries together (as you do now).
There is a give and take with ancestor queries. They are a more verbose and messy to deal with but you get a better structure to your data and consistency in your queries.
To simplify this, if your handler knows the BEntity you want to get, just pass around the key.urlsafe() encoded key, it already has all of your ancestor information encoded.
If this is not possible, try possibly restructuring your data. Since these objects are all of the same ancestor, they belong to the same entity group, thus at most you can insert/update ~1 time per second for objects in that entity group. If you require higher throughput or do not require consistent ancestral queries, then try using ndb.KeyProperty to link entities with a reference to a parent rather than as an ancestor. Then you'd only need to get a single parent to query on rather than the parent and the parent's parent.
You should also try and use IDs whenever possible, so you can avoid having to filter for entities in your datastore by properties and just reference them by ID:
BEntity.query(ancestor = ndb.Key("RootEntity", 1, "AEntity", int(some_value)))
Here, int(some_value) is the integer ID of the AEntity you used when you created that object. Just be sure that you can ensure the IDs you manually create/use will be unique across all instances of that Model that share the same parent.
EDIT:
To clarify, my last example should have been made more clear in that I was suggesting to restructure the data such that int(some_value) be used as the integer ID of the AEntity rather than storing is as a separate property of the Entity - if possible of course. From the example given, a query is performed for the AEntity objects that have a given integer field value of int(some_value) and executed with a get() - implying that you will always expect a single value return for that integer ID making it a good candidate to use as the integer ID for the key of that object eliminating the need for a query.
I have an appengine project written in Python.
I use a model with a tags = ndb.StringProperty(repeated=True).
What I want is, given a list of tags, search for all the objects that have every tag in the list.
My problem is that the list may contain any number of tags.
What should I do?
When you make a query on a list property, it actually creates a set of subqueries at the datastore level. The maximum number of subqueries that can be spawned by a single query is 30. Thus, if your list has more that 30 elements, you will get an exception.
In order to tackle this issue, either you will have to change your database model or create multiple queries based on the number of list elements you have and then combine the results. Both these approaches need to be handled by your code.
Update: In case you need all the tags in the list to match the list property in your model, then you can create your basic query and then append AND operators in a loop (as marcadian describes). For example:
qry = YourModel.query()
qry = qry.filter(YourModel.tags == tag[i]) for enumerate(tags)
But, as I mentioned earlier you should be careful of the length of the list property in your model and your indexes configuration in order to avoid problems like index explosion. For more information about this, you may check:
Datastore Indexes
Index Selection and Advanced Search