Ndb default order does not preserve insertion order - python

I'm using GAE and NDB for a project. I just noticed that if I create several objects and then retrieve the list of those objects, the order is not preserved (I use fetch() on the query).
This is a screenshot of the admin page, which shows the same problem:
As you may see (if it's too small, here is the link), I have several sessions. I created the sessions named day in order, from 0 to 7, but as you can see, the order is not preserved.
I checked, and the keys are not incremental, and neither are the IDs (the ID should be incremental, shouldn't it? In some classes, though not this one, I used a hand-made key, so there is no auto-generated ID).
Is there a way to preserve insertion order?
(Or is this just strange behaviour, or my mistake?)
PS: if you want to have a look at the code, this is the session model, which extends this class I made.

Neither keys nor IDs are strictly incremental (let alone incremental by one) in NDB. You can set your own IDs and ensure yourself that they increment properly.
Or you can add to your model(s) a DateTimeProperty:
created = ndb.DateTimeProperty(auto_now_add=True)
In your view you can then order the entities by insertion date, for example:
posts = Post.query().order(-Post.created).fetch()
which will order and fetch your (let's say) Post entities in descending order of insertion date.
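The effect can be sketched without the datastore at all (the IDs below are made up): a monotonically increasing creation stamp recovers insertion order even when the IDs come back scattered.

```python
import itertools

# Hypothetical entities with non-sequential IDs, created in this order.
counter = itertools.count()
entities = [{"id": i, "created": next(counter)} for i in (907, 112, 555)]

# Equivalent of .order(-Post.created): newest first.
newest_first = sorted(entities, key=lambda e: e["created"], reverse=True)
print([e["id"] for e in newest_first])  # → [555, 112, 907]
```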

It's not expected that the order would be preserved unless you perform a query that retrieves them in a particular order.
What makes you think they should be ordered?

Related

Google app engine: better way to make query

Say I have RootEntity, AEntity (a child of RootEntity), and BEntity (a child of AEntity).
class RootEntity(ndb.Model):
    rtp = ndb.StringProperty()

class AEntity(ndb.Model):
    ap = ndb.IntegerProperty()

class BEntity(ndb.Model):
    bp = ndb.StringProperty()
So in different handlers I need to get instances of BEntity with a specific ancestor (an instance of AEntity).
This is my query:
BEntity.query(ancestor=ndb.Key(
    "RootEntity", 1,
    "AEntity", AEntity.query(ancestor=ndb.Key("RootEntity", 1))
                      .filter(AEntity.ap == int(some_value))
                      .get().key.integer_id()))
How can I optimize this query and make it simpler, maybe less convoluted?
Upd:
This query is part of a function with the @ndb.transactional decorator.
You should not use Entity Groups to represent entity relationships.
Entity groups have a special purpose: to define the scope of transactions. They give you the ability to update multiple entities transactionally, as long as they are part of the same entity group (this limitation has been somewhat relaxed with the new XG transactions). They also allow you to use queries within transactions (not available with XG transactions).
The downside of entity groups is that they have an update limitation of 1 write/second.
In your case my suggestion would be to use separate entities and make references between them. The reference should be the Key of the referenced entity, as this is type-safe.
Regarding query simplicity: GAE unfortunately does not support JOINs or reference (multi-entity) queries, so you would still need to combine multiple queries (as you do now).
There is give and take with ancestor queries. They are more verbose and messier to deal with, but you get better structure for your data and consistency in your queries.
To simplify this, if your handler knows the BEntity you want to get, just pass around the key.urlsafe() encoded key; it already has all of your ancestor information encoded.
If this is not possible, try possibly restructuring your data. Since these objects are all of the same ancestor, they belong to the same entity group, thus at most you can insert/update ~1 time per second for objects in that entity group. If you require higher throughput or do not require consistent ancestral queries, then try using ndb.KeyProperty to link entities with a reference to a parent rather than as an ancestor. Then you'd only need to get a single parent to query on rather than the parent and the parent's parent.
You should also try and use IDs whenever possible, so you can avoid having to filter for entities in your datastore by properties and just reference them by ID:
BEntity.query(ancestor=ndb.Key("RootEntity", 1, "AEntity", int(some_value)))
Here, int(some_value) is the integer ID of the AEntity you used when you created that object. Just be sure that you can ensure the IDs you manually create/use will be unique across all instances of that Model that share the same parent.
EDIT:
To clarify, my last example should have made it more explicit that I was suggesting restructuring the data so that int(some_value) is used as the integer ID of the AEntity, rather than stored as a separate property of the entity, if possible of course. In the example given, a query is performed for the AEntity objects whose integer field equals int(some_value) and executed with get(), implying that you always expect a single result for that value. That makes it a good candidate for the integer ID of the object's key, eliminating the need for a query.
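The gain can be sketched with an in-memory stand-in for the datastore (the entities below are hypothetical): a property filter has to scan and run a query, while a key built from the value itself is a direct lookup.

```python
# Stand-in "datastore": entities keyed by their integer ID.
entities_by_id = {
    7: {"ap": 7, "bp": "x"},
    9: {"ap": 9, "bp": "y"},
}

# Before: filter on a property, then take the single match (a query + get()).
def get_by_property(some_value):
    return next(e for e in entities_by_id.values() if e["ap"] == some_value)

# After: the value *is* the ID, so the lookup is direct -- no query needed.
def get_by_id(some_value):
    return entities_by_id[some_value]

print(get_by_property(9) == get_by_id(9))  # → True
```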

Appengine - ndb query with unknown list size

I have an appengine project written in Python.
I use a model with a tags = ndb.StringProperty(repeated=True).
What I want is, given a list of tags, search for all the objects that have every tag in the list.
My problem is that the list may contain any number of tags.
What should I do?
When you make a query on a list property, it actually creates a set of subqueries at the datastore level. The maximum number of subqueries that can be spawned by a single query is 30. Thus, if your list has more than 30 elements, you will get an exception.
In order to tackle this issue, you will either have to change your database model or create multiple queries based on the number of list elements you have and then combine the results. Both of these approaches need to be handled by your code.
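The "multiple queries combined in code" approach can be sketched in plain Python (the chunk size of 30 matches the subquery limit above; the key sets stand in for real per-chunk query results): split the tag list into chunks that fit under the limit, run one query per chunk, and intersect the resulting key sets, since a matching entity must carry every tag.

```python
def chunked(tags, size=30):
    """Split the tag list into sublists that stay under the subquery limit."""
    return [tags[i:i + size] for i in range(0, len(tags), size)]

def combine(key_sets):
    """Entities matching every tag = intersection of the per-chunk results."""
    result = set(key_sets[0])
    for keys in key_sets[1:]:
        result &= set(keys)
    return result

chunks = chunked(["tag%d" % i for i in range(70)])
print([len(c) for c in chunks])  # → [30, 30, 10]

# Pretend each chunk's query returned these entity keys:
print(combine([{1, 2, 3}, {2, 3, 4}, {2, 5, 3}]))  # → {2, 3}
```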
Update: In case you need all the tags in the list to match the list property in your model, you can create your basic query and then chain equality filters in a loop (as marcadian describes). For example:
qry = YourModel.query()
for tag in tags:
    qry = qry.filter(YourModel.tags == tag)
But, as I mentioned earlier, you should be careful about the length of the list property in your model and your index configuration in order to avoid problems like exploding indexes. For more information, you may check:
Datastore Indexes
Index Selection and Advanced Search

Getting a list of results, 1 for each foreign key

I have a model, Reading, which has a foreign key, Type. I'm trying to get a reading for each type that I have, using the following code:
reading_list = []
for type in Type.objects.all():
    readings = Reading.objects.filter(type=type.pk)
    if readings.exists():
        reading_list.append(readings[0])
The problem with this, of course, is that it hits the database once for each type. I've played around with some queries to try to optimize this to a single database call, but none of them seem efficient. .values(), for instance, will give me a list of readings grouped by type, but it will give me EVERY reading for each type, and I have to filter them with Python in memory. This is out of the question, as we're dealing with potentially millions of readings.
If you use PostgreSQL as your DB backend, you can do this in one line with something like:
Reading.objects.order_by('type__pk', 'any_other_order_field').distinct('type__pk')
Note that the field on which distinct happens must always be the first argument in the order_by method. Feel free to change type__pk to the actual field you want to order types on (e.g. type__name if the Type model has a name property). You can read more about distinct here: https://docs.djangoproject.com/en/dev/ref/models/querysets/#distinct.
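What distinct('type__pk') does after the ordering can be sketched in plain Python (the readings below are made up): sort by type first, then keep only the first row seen for each type.

```python
# Hypothetical readings, each tagged with its type's primary key.
readings = [
    {"type": 2, "value": 7},
    {"type": 1, "value": 10},
    {"type": 1, "value": 12},
]

first_per_type, seen = [], set()
for r in sorted(readings, key=lambda r: r["type"]):  # order_by('type__pk')
    if r["type"] not in seen:  # distinct('type__pk') keeps the first row
        seen.add(r["type"])
        first_per_type.append(r)

print(first_per_type)  # → one reading per type: values 10 and 7
```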
If you do not use PostgreSQL you could use the prefetch_related method for this purpose:
# reading_set could be replaced with whatever your reverse relation name actually is
for type in Type.objects.prefetch_related('reading_set').all():
    readings = type.reading_set.all()
    if len(readings):
        reading_list.append(readings[0])
The above will perform only 2 queries in total. Note that I use len() so that no extra query is performed when counting the objects. You can read more about prefetch_related here: https://docs.djangoproject.com/en/dev/ref/models/querysets/#prefetch-related.
The downside of this approach is that you first retrieve all related objects from the DB and then use only the first.
The above code is not tested, but I hope it will at least point you towards the right direction.

Django get objects for many IDs

I have a set of IDs that I'd like to retrieve all of the objects for. My current solution works; however, it hammers the database with a bunch of get queries inside a loop.
objects = [SomeModel.objects.get(id=id_) for id_ in id_set]
Is there a more efficient way of going about this?
There's an __in (documentation here) field lookup that you can use to get all objects for which a certain field matches one of a list of values:
objects = SomeModel.objects.filter(id__in=id_set)
It works just the same for lots of different field types (e.g. CharFields), not just ID fields.
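One caveat worth knowing: filter(id__in=id_set) does not return rows in the order of id_set. If that order matters, a dict restores it; sketched here with plain stand-in objects rather than real model instances.

```python
class Obj:
    """Hypothetical stand-in for a model instance."""
    def __init__(self, id):
        self.id = id

id_set = [3, 1, 2]
fetched = [Obj(1), Obj(2), Obj(3)]   # whatever order the DB returned

by_id = {o.id: o for o in fetched}
ordered = [by_id[i] for i in id_set]  # restore the order of id_set
print([o.id for o in ordered])  # → [3, 1, 2]
```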

To store data in the tables of a Django app without default ordering

my code:
for name, count1 in list:
    s = Keywords(file_name=name, frequency_count=count1)
    s.save()
This is a section of code in the views.py file of my Django app; it is how I'm storing the data in the table. The data is stored in increasing order of the filenames, which I do not want. I tried using the order_by() function without any arguments, but it had no effect; the data is still stored in increasing order. Please suggest a solution.
I'm new to Django and SQLite3, so please help.
Order is inextricable. The queryset is always going to be ordered by something, even if it's just the default of the primary key. This is really a database thing more than a Django thing: databases inherently order data by the primary key unless told to order by something else. If you want truly random ordering, you can use order_by('?'), but that significantly increases the work the database has to do.
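The point that ordering is a read-time concern, not a storage-time one, can be sketched in plain Python (the rows below are made up): the stored data never moves; you choose an order each time you read.

```python
import random

# Hypothetical stored rows; storage order is whatever the database chose.
rows = [
    {"file_name": "b.txt", "frequency_count": 1},
    {"file_name": "a.txt", "frequency_count": 3},
]

# Like order_by('-frequency_count'): an ordering applied at read time.
by_count = sorted(rows, key=lambda r: r["frequency_count"], reverse=True)
print(by_count[0]["file_name"])  # → a.txt

# Like order_by('?'): a read-time shuffle; rows itself is untouched.
shuffled = random.sample(rows, k=len(rows))
print(rows[0]["file_name"])  # → b.txt (storage order unchanged)
```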
