Can I insert a document at the front of the collection? Or is there a method like col.find().reverse() that can reverse the sequence of the document set returned by col.find()?
Unless you're worried about natural order, and in general you shouldn't be, ordering is done when you query. You should think about the documents as being stored without any specific order, but retrieved in an order you (optionally) specify (using .sort(...)).
Indexing can be used, not to force an order on the documents, but to speed up sorting (and filtering) when returning query results.
This is true for databases in general, not only MongoDB/NoSQL.
So to address your question: the term "front" is not well-defined.
If you use sort() on your query to retrieve the documents in a specific order, you can reverse it using sort(field_to_sort_by, -1).
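For example, in PyMongo (a minimal sketch; the created_at field is an assumption for illustration):

import pymongo

# Same query, opposite orders.
oldest_first = col.find().sort("created_at", pymongo.ASCENDING)
newest_first = col.find().sort("created_at", pymongo.DESCENDING)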
I have a rather convoluted Mongo collection and I'm trying to implement detailed matching criteria. I have already created a text index across all fields as follows:
db.create_index([("$**", "text")], name='allTextFields')
I am using this for some straightforward search terms in PyMongo (e.g., "immigration") as follows:
db.find({'$text': {'$search': "immigration"}})
However, there are certain terms I need to match that are generic enough to require regex-type specifications. For instance, I want to match all occurrences of "ice" without matching "police" and a variety of other exclusion terms.
Ideally, I could create a regex that would search all fields and subfields (see example below), but I can't figure out how to implement this in PyMongo (or Mongo for that matter).
db.find({all_fields_and_subfields: {'$regex': '^ice\s*', '$options': 'i'}})
Does anyone know how to do so?
One way of doing this is to add another field to the documents which contains a concatenation of all the fields you want to search, and $regex on that.
Note that unless your regexes are anchored to the beginning of input, they won't be using indexes (so you'll be doing collection scans).
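A minimal PyMongo sketch of that approach ('title' and 'body' are placeholder field names, and search_blob is an assumed name for the concatenated field):

import re

# One-off pass: store a concatenation of the searchable fields.
for doc in db.find({}, {'title': 1, 'body': 1}):
    blob = ' '.join(str(doc.get(f, '')) for f in ('title', 'body'))
    db.update_one({'_id': doc['_id']}, {'$set': {'search_blob': blob}})

# Word-boundary regex: matches "ice" but not "police".
# Not anchored to ^, so this scans the collection (see the note above).
matches = db.find({'search_blob': re.compile(r'\bice\b', re.IGNORECASE)})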
I am surprised that a full-text query for "ice" finds "police"; surely that's a bug somewhere.
You may also consider Atlas search instead of full-text search, which is more powerful but proprietary to Atlas.
Consider the following query:
candidates = Candidate.objects.filter(ElectionID=ElectionIDx)
Objects in this query are ordered by their id field.
How do I randomise the order of the objects in the query? Can it be done using .order_by()?
Yes, you can use the special argument ? with order_by to get randomized queryset:
Candidate.objects.filter(ElectionID=ElectionIDx).order_by('?')
You can read more about order_by here: https://docs.djangoproject.com/en/dev/ref/models/querysets/#order-by
Note that, depending on the DB backend, the randomization might be slow and expensive. I would suggest benchmarking it first: start with ?, and only look for alternatives if it turns out to be too slow.
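If ? does prove too slow, one common alternative is to sample primary keys in Python (a minimal sketch; the sample size of 20 is arbitrary):

import random

# Pull only the ids, sample in Python, then fetch the sampled rows.
ids = list(Candidate.objects.filter(ElectionID=ElectionIDx).values_list('id', flat=True))
sampled = random.sample(ids, min(len(ids), 20))
candidates = Candidate.objects.filter(id__in=sampled)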
I have an appengine project written in Python.
I use a model with a tags = ndb.StringProperty(repeated=True).
What I want is, given a list of tags, search for all the objects that have every tag in the list.
My problem is that the list may contain any number of tags.
What should I do?
When you make a query on a list property, it actually creates a set of subqueries at the datastore level. The maximum number of subqueries that can be spawned by a single query is 30. Thus, if your list has more than 30 elements, you will get an exception.
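For illustration, an IN query on the repeated property fans out to one subquery per value, which is where that limit bites (a minimal sketch):

# Each value in tags becomes its own datastore subquery,
# so more than 30 tags here raises an exception.
qry = YourModel.query(YourModel.tags.IN(tags))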
In order to tackle this issue, either you will have to change your database model or create multiple queries based on the number of list elements you have and then combine the results. Both these approaches need to be handled by your code.
Update: If you need all the tags in the list to match the list property in your model, you can create your basic query and then append filters in a loop (as marcadian describes); each filter() call is ANDed with the previous ones. For example:
qry = YourModel.query()
for tag in tags:
    qry = qry.filter(YourModel.tags == tag)
results = qry.fetch()
But, as I mentioned earlier, you should be careful about the length of the list property in your model and about your index configuration, in order to avoid problems like index explosion. For more information about this, you may check:
Datastore Indexes
Index Selection and Advanced Search
I have a model, Reading, which has a foreign key, Type. I'm trying to get a reading for each type that I have, using the following code:
for type in Type.objects.all():
    readings = Reading.objects.filter(type=type.pk)
    if readings.exists():
        reading_list.append(readings[0])
The problem with this, of course, is that it hits the database once for each type. I've played around with some queries to try to optimize this down to a single database call, but none of them seem efficient. .values(), for instance, will give me readings grouped by type, but it will give me EVERY reading for each type, and I would have to filter them in Python, in memory. That is out of the question, as we're dealing with potentially millions of readings.
If you use PostgreSQL as your DB backend, you can do this in one line with something like:
Reading.objects.order_by('type__pk', 'any_other_order_field').distinct('type__pk')
Note that the field on which distinct happens must always be the first argument in the order_by method. Feel free to change type__pk to the actual field you want to order types on (e.g., type__name if the Type model has a name property). You can read more about distinct here: https://docs.djangoproject.com/en/dev/ref/models/querysets/#distinct.
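With that in place, the original loop collapses to a single query. A sketch, assuming a timestamp field you want to order readings by within each type:

reading_list = list(
    Reading.objects.order_by('type__pk', '-timestamp').distinct('type__pk')
)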
If you do not use PostgreSQL you could use the prefetch_related method for this purpose:
# reading_set could be replaced with whatever your reverse relation name actually is
for type in Type.objects.prefetch_related('reading_set').all():
    readings = type.reading_set.all()
    if len(readings):
        reading_list.append(readings[0])
The above will perform only 2 queries in total. Note that I use len() so that no extra query is performed when counting the objects (the prefetched results are already cached). You can read more about prefetch_related here: https://docs.djangoproject.com/en/dev/ref/models/querysets/#prefetch-related.
The downside of this approach is that you first retrieve all related objects from the DB and then use only the first.
The above code is not tested, but I hope it will at least point you towards the right direction.
My JSON documents (called "i") have subdocuments (called "elements").
I am looping through these subdocuments and updating them one at a time. However, to do so (once the value I need is computed), Mongo has to scan through all the documents in the database, then through all the subdocuments, and then find the subdocument it needs to update.
I am having major time issues, as I have ~3000 documents and this is taking about 4 minutes.
I would like to know if there is a quicker way to do this, without Mongo having to scan all the documents, while still doing it within the loop.
Here is the code:
for i in db.stuff.find():
    for element in i['counts']:
        computed_value = element[a] + element[b]
        db.stuff.update({'id': i['id'], 'counts.timestamp': element['timestamp']},
                        {'$set': {'counts.$.total': computed_value}})
I am identifying the overall document by "id" and then the subdocument by its timestamp (which is unique to each subdocument). I need to find a quicker way than this. Thank you for your help.
What indexes do you have on your collection? This could probably be sped up by creating an index on your embedded documents. You can do this using dot notation -- there's a good explanation and example in the MongoDB documentation on indexing embedded fields.
In your case, you'd do something like
db.stuff.createIndex( { "counts.timestamp" : 1 } );
This will make your searches through embedded documents run much faster.
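Since the question's code is PyMongo, the equivalent there would be roughly as follows (a sketch; the compound index reflects that the update matches on id first, then on the embedded timestamp):

import pymongo

# Field names taken from the question's update call.
db.stuff.create_index([("id", pymongo.ASCENDING),
                       ("counts.timestamp", pymongo.ASCENDING)])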
Your update is based on id (and I assume it is different from Mongo's default _id). Put an index on your id field.
Do you want to set the new field for all documents within the collection, or only for documents matching given criteria? If only for matching documents, use a query operator (with an index if possible).
Don't fetch the full document; fetch only the fields that are being used.
What is your average document size? Use explain and mongostat to understand what the actual bottleneck is.
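For the projection suggestion, a minimal PyMongo sketch (field names taken from the question's loop):

# Fetch only the fields the loop actually uses, instead of full documents.
for i in db.stuff.find({}, {'id': 1, 'counts': 1}):
    for element in i['counts']:
        ...  # compute and update as before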