I have an appengine project written in Python.
I use a model with a tags = ndb.StringProperty(repeated=True).
What I want is, given a list of tags, search for all the objects that have every tag in the list.
My problem is that the list may contain any number of tags.
What should I do?
When you make a query on a list property, it actually creates a set of subqueries at the datastore level. The maximum number of subqueries that can be spawned by a single query is 30. Thus, if your list has more than 30 elements, you will get an exception.
In order to tackle this issue, either you will have to change your database model or create multiple queries based on the number of list elements you have and then combine the results. Both these approaches need to be handled by your code.
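The multiple-queries approach can be sketched in plain Python. The helpers below are illustrative and independent of ndb: `chunk` splits the tag list into pieces small enough for one datastore query, and `intersect_results` combines the per-chunk results, since an entity matches all tags only if it appears in every chunk's result set.

```python
def chunk(tags, size=30):
    """Split a tag list into pieces small enough for one datastore query
    (each '== tag' filter on a repeated property becomes one subquery)."""
    return [tags[i:i + size] for i in range(0, len(tags), size)]

def intersect_results(result_sets):
    """Intersect the key sets returned by the per-chunk queries: an entity
    matches all tags only if it shows up in every chunk's results."""
    combined = None
    for keys in result_sets:
        combined = set(keys) if combined is None else combined & set(keys)
    return combined if combined is not None else set()
```

With ndb you would build one query per chunk (appending a filter per tag), fetch with keys_only=True, and pass the resulting key sets to `intersect_results`.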
Update: In case you need all the tags in the list to match the list property in your model, you can create your basic query and then append AND filters in a loop (as marcadian describes). For example:
qry = YourModel.query()
for tag in tags:
    qry = qry.filter(YourModel.tags == tag)
But, as I mentioned earlier, you should be careful about the length of the list property in your model and your index configuration in order to avoid problems like index explosion. For more information, you may check:
Datastore Indexes
Index Selection and Advanced Search
I am trying to iterate over a Search Queryset with haystack, but it throws me this error:
Result window is too large, from + size must be less than or equal to: [10000] but was [11010]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level parameter.
Is there a way to iterate over all indexed elements? (let's say I have several million records).
max_result_window is an index setting that you can change if you want, but most of the time you don't have to: if you want to iterate over all your documents, the search API is not the way to go.
Use the scan and scroll API instead:
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-scroll.html
And a personal note: I use Elasticsearch with Django, and I found Haystack difficult to use compared to elasticsearch-dsl. Have a look at elasticsearch-dsl-py: https://github.com/elastic/elasticsearch-dsl-py
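As a sketch of the scroll-based approach with the official Python client (the index name and `process` handler below are placeholders, and this assumes a reachable cluster), `helpers.scan` does the scroll bookkeeping for you:

```python
from elasticsearch import Elasticsearch, helpers

es = Elasticsearch()  # assumes a local cluster; adjust hosts as needed

# helpers.scan wraps the scroll API: it keeps a cursor open on the index
# and yields every matching document, however many millions there are.
for hit in helpers.scan(es, index='my_index',
                        query={'query': {'match_all': {}}},
                        size=1000):
    process(hit['_source'])  # process() is your own handler
```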
I used GAE and NDB for a project. I just noticed that if I create several objects and then retrieve the list of these objects, the order is not preserved (I use fetch() on the query).
This is a screenshot of the admin page, which shows the same problem:
As you may see (if it's too small, here is the link), I have several sessions. I created the sessions named by day, in order from 0 to 7.
But as you see, the order is not preserved.
I checked, and the keys are actually not incremental; neither are the ids (the id should be incremental, shouldn't it?). Anyway, in some classes (not this one) I used a hand-made key, so there is no auto-generated id.
Is there a way to preserve insertion order?
(Or is it just strange behaviour? Or my mistake?)
PS: if you want to have a look at the code, this is the session model, which extends this class I made.
Neither keys nor ids are strictly incremental (let alone incremental by one) in NDB. You can set your own ids and ensure they increment properly.
Or you can add to your model(s) a DateTimeProperty:
created = ndb.DateTimeProperty(auto_now_add=True)
And in your view you can order the entities by insertion date, for example:
posts = Post.query().order(-Post.created).fetch()
which will order and fetch your (let's say) Post entities in descending order of insertion date.
It's not expected that the order would be preserved unless you perform a query that retrieves them in a particular order.
What makes you think they should be ordered?
I have a set of IDs that I'd like to retrieve all of the objects for. My current solution works; however, it hammers the database with a bunch of get queries inside a loop.
objects = [SomeModel.objects.get(id=id_) for id_ in id_set]
Is there a more efficient way of going about this?
There's an __in field lookup (documentation here) that you can use to get all objects for which a certain field matches one of a list of values:
objects = SomeModel.objects.filter(id__in=id_set)
Works just the same for lots of different field types (e.g. CharFields), not just id fields.
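One caveat: filter(id__in=id_set) returns rows in whatever order the database chooses, not in the order of id_set. If the original order matters, you can restore it in Python. The helper below is plain Python (the default `key` function assumes an `id` attribute; adapt it to your model):

```python
from collections import namedtuple

def order_like(objects, id_order, key=lambda obj: obj.id):
    """Reorder fetched objects to match the original id list,
    silently dropping ids that were not found."""
    by_id = {key(obj): obj for obj in objects}
    return [by_id[i] for i in id_order if i in by_id]

# Tiny stand-in for model instances, just for illustration:
Obj = namedtuple('Obj', 'id')
rows = [Obj(3), Obj(1), Obj(2)]                      # database order
print([o.id for o in order_like(rows, [2, 1, 3])])   # [2, 1, 3]
```

With the queryset from above: objects = order_like(SomeModel.objects.filter(id__in=id_set), id_set).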
Can I insert a document at the front of the collection? Or is there a method like col.find().reverse() that can reverse the sequence of documents returned by col.find()?
Unless you're worried about natural order, and in general you shouldn't be, ordering is done when you query. You should think about the documents as being stored without any specific order, but retrieved in an order you (optionally) specify (using .sort(...)).
Indexing can be used, not to force an order on the documents, but to speed up ordering (and filtering) when returning query results.
This is true for databases in general, not only mongodb / nosql.
So to address your question: the term "front" is not well-defined.
If you use sort() on your query, to retrieve the documents in a specific order, you can reverse it using sort(field_to_sort_by, -1).
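In pymongo that looks like the sketch below (the database, collection, and field names are placeholders, and this assumes a running MongoDB instance):

```python
from pymongo import MongoClient, ASCENDING, DESCENDING

col = MongoClient().mydb.mycol  # placeholder database/collection names

# The same query, sorted one way and then the other; DESCENDING is
# pymongo's name for the -1 sort direction.
oldest_first = col.find().sort('created_at', ASCENDING)
newest_first = col.find().sort('created_at', DESCENDING)
```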
My JSON documents (called "i") have subdocuments (called "elements").
I am looping through these subdocuments and updating them one at a time. However, to do so (once the value I need is computed), Mongo has to scan through all the documents in the database, then through all the subdocuments, and then find the subdocument it needs to update.
I am having major time issues, as I have ~3000 documents and this is taking about 4 minutes.
I would like to know if there is a quicker way to do this, without mongo having to scan all the documents but by doing it within the loop.
Here is the code:
for i in db.stuff.find():
    for element in i['counts']:
        computed_value = element[a] + element[b]
        db.stuff.update({'id': i['id'], 'counts.timestamp': element['timestamp']},
                        {'$set': {'counts.$.total': computed_value}})
I am identifying the overall document by "id" and then the subdocument by its timestamp (which is unique to each subdocument). I need to find a quicker way than this. Thank you for your help.
What indexes do you have on your collection? This could probably be sped up by creating an index on your embedded documents. You can do this using dot notation -- there's a good explanation and example here.
In your case, matching the field your update queries on, you'd do something like
db.stuff.ensureIndex({ "counts.timestamp": 1 });
This will make your searches through embedded documents run much faster.
Your update is based on id (and I assume it is different from Mongo's default _id).
Put an index on your id field.
Do you want to set the new field for all documents in the collection, or only for documents matching a given criteria? If only for matching documents, use a query operator (with an index if possible).
Don't fetch the full document; fetch only the fields that are actually used.
What is your average document size? Use explain and mongostat to understand what the actual bottleneck is.
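Putting those suggestions together, a sketch of the tightened loop might look like the following (database name is a placeholder, `'a'`/`'b'` stand in for the real field names from the question, and this assumes a running MongoDB):

```python
from pymongo import MongoClient

db = MongoClient().mydb  # placeholder database name

# Create a compound index once, matching the shape of the update's query:
db.stuff.create_index([('id', 1), ('counts.timestamp', 1)])

# Project only the fields the loop actually needs instead of full documents:
for doc in db.stuff.find({}, {'id': 1, 'counts': 1}):
    for element in doc['counts']:
        total = element['a'] + element['b']
        db.stuff.update_one(
            {'id': doc['id'], 'counts.timestamp': element['timestamp']},
            {'$set': {'counts.$.total': total}})
```

With the index in place, each update locates its document by index lookup instead of a collection scan.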