Mongo Query that always returns zero documents - python

It's possible to write a query that always returns all of the elements in a collection, to use pymongo as an example:
MongoClient()["database"]["collection"].find({})
However, due to the structure of my code, I would quite like to be able to construct a query that does the opposite, a query that will necessarily return zero elements in all situations:
MongoClient()["database"]["collection"].find(null_query)
How can I define null_query, such that this is correct?

You can ask for any field to be in an empty list. It seems reasonable to use the _id field for this:
db.collection.find({_id: {$in: []}})
If you want a shorter query you don't need to use the _id field
at all:
db.collection.find({_:{$in:[]}})
Alternative if MongoDB version >= 3.4:
Arguably one can also ask if the _id field does not exists, which has been suggested by #Marco13:
db.collection.find({_id: {$exists: false}})
However, this assumes that all documents have the _id field, which is not necessarily true for MongoDB versions before 3.4 where a collection could be created with db.createCollection("mycol", {autoIndexID : false}) so all documents were not automatically given an _id field.

Related

Django querysets optimization - preventing selection of annotated fields

Let's say I have following models:
class Invoice(models.Model):
...
class Note(models.Model):
invoice = models.ForeignKey(Invoice, related_name='notes', on_delete=models.CASCADE)
text = models.TextField()
and I want to select Invoices that have some notes. I would write it using annotate/Exists like this:
Invoice.objects.annotate(
has_notes=Exists(Note.objects.filter(invoice_id=OuterRef('pk')))
).filter(has_notes=True)
This works well enough, filters only Invoices with notes. However, this method results in the field being present in the query result, which I don't need and means worse performance (SQL has to execute the subquery 2 times).
I realize I could write this using extra(where=) like this:
Invoice.objects.extra(where=['EXISTS(SELECT 1 FROM note WHERE invoice_id=invoice.id)'])
which would result in the ideal SQL, but in general it is discouraged to use extra / raw SQL.
Is there a better way to do this?
You can remove annotations from the SELECT clause using .values() query set method. The trouble with .values() is that you have to enumerate all names you want to keep instead of names you want to skip, and .values() returns dictionaries instead of model instances.
Django internaly keeps the track of removed annotations in
QuerySet.query.annotation_select_mask. So you can use it to tell Django, which annotations to skip even wihout .values():
class YourQuerySet(QuerySet):
def mask_annotations(self, *names):
if self.query.annotation_select_mask is None:
self.query.set_annotation_mask(set(self.query.annotations.keys()) - set(names))
else:
self.query.set_annotation_mask(self.query.annotation_select_mask - set(names))
return self
Then you can write:
invoices = (Invoice.objects
.annotate(has_notes=Exists(Note.objects.filter(invoice_id=OuterRef('pk'))))
.filter(has_notes=True)
.mask_annotations('has_notes')
)
to skip has_notes from the SELECT clause and still geting filtered invoice instances. The resulting SQL query will be something like:
SELECT invoice.id, invoice.foo FROM invoice
WHERE EXISTS(SELECT note.id, note.bar FROM notes WHERE note.invoice_id = invoice.id) = True
Just note that annotation_select_mask is internal Django API that can change in future versions without a warning.
Ok, I've just noticed in Django 3.0 docs, that they've updated how Exists works and can be used directly in filter:
Invoice.objects.filter(Exists(Note.objects.filter(invoice_id=OuterRef('pk'))))
This will ensure that the subquery will not be added to the SELECT columns, which may result in a better performance.
Changed in Django 3.0:
In previous versions of Django, it was necessary to first annotate and then filter against the annotation. This resulted in the annotated value always being present in the query result, and often resulted in a query that took more time to execute.
Still, if someone knows a better way for Django 1.11, I would appreciate it. We really need to upgrade :(
We can filter for Invoices that have, when we perform a LEFT OUTER JOIN, no NULL as Note, and make the query distinct (to avoid returning the same Invoice twice).
Invoice.objects.filter(notes__isnull=False).distinct()
This is best optimize code if you want to get data from another table which primary key reference stored in another table
Invoice.objects.filter(note__invoice_id=OuterRef('pk'),)
We should be able to clear the annotated field using the below method.
Invoice.objects.annotate(
has_notes=Exists(Note.objects.filter(invoice_id=OuterRef('pk')))
).filter(has_notes=True).query.annotations.clear()

How to do ...ON DUPLICATE KEY UPDATE... in django

Is there a way to do the following in django's ORM?
INSERT INTO mytable
VALUES (1,2,3)
ON DUPLICATE KEY
UPDATE field=4
I'm familiar with get_or_create, which takes default values, but that doesn't update the record if there are differences in the defaults. Usually I use the following approach, but it takes two queries instead of one:
item = Item(id=1)
item.update(**fields)
item.save()
Is there another way to do this?
I'm familiar with get_or_create, which takes default values, but that doesn't update the record if there are differences in the defaults.
update_or_create should provide the behavior you're looking for.
Item.objects.update_or_create(
id=1,
defaults=fields,
)
It returns the same (object, created) tuple as get_or_create.
Note that this will still perform two queries, but only in the event the record does not already exist (as is the case with get_or_create). If that is for some reason unacceptable, you will likely be stuck writing raw SQL to handle this, which would be unfortunate in terms of readability and maintainability.
I think get_or_create() is still the answer, but only specify the pk field(s).
item, _ = Item.objects.get_or_create(id=1)
item.update(**fields)
item.save()
Django 4.1 has added the support for INSERT...ON DUPLICATE KEY UPDATE query. It will update the fields in case the unique validation fails.
Example of above in a single query:
# Let's say we have an Item model with unique on key
items = [
Item(key='foobar', value=10),
Item(key='foobaz', value=20),
]
# this function will create 2 rows in a single SQL query
Item.objects.bulk_create(items)
# this time it will update the value for foobar
# and create new row for barbaz
# all in a single SQL query
items = [
Item(key='foobar', value=30),
Item(key='barbaz', value=50),
]
Item.objects.bulk_create(
items,
update_conflicts=True,
update_fields=['rate']
)

How to check a document includes a list of fields

I have a mongodb collection that has documents that include both required and non-required data. I know how to create a query using the $exists operator to check if a field exists, however I do not want to define required field within the query, as the list is both long and subject to change (and is define elsewhere).
The following is great for checking a known field:
db.collectionofstuff.find({fieldIneed:{$exists:False}})
However I want something that function like this:
Using this Config file:
datadescriptorjson = {"thing1": {"count": 2,"range": 3},"thing2":{"pace": 12.5, "consistency": "angry"}}
create a query find/aggregation that looks something like this:
db.collectionofstuff.find({<list of fields from datadescriptorjson>:{$exists:Falze}})
I am not aware of anyway to do it directly with either the aggregation framework or using a simple find.
There is no such function, you will have to test each field manually. You can of course loop over your config data and recreate a query out of this. However, this should be something you do in your application.

Modify the order in which properties are displayed in MongoDB

I am using PyMongo to insert data (title, description, phone_number ...) into MongoDB. However, when I use mongo client to view the data, it displays the properties in a strange order. Specifically, phone_number property is displayed first, followed by title and then comes description. Is there some way I can force a particular order?
The above question and answer are quite old. Anyhow, if somebody visits this I feel like I should add:
This answer is completely wrong. Actually in Mongo Documents ARE ordered key-value pairs. However when using pymongo it will use python dicts for documents which indeed are not ordered (as of cpython 3.6 python dicts retain order, however this is considered an implementation detail). But this is a limitation of the pymongo driver.
Be aware, that this limitation actually impacts the usability. If you query the db for a subdocument it will only match if the order of the key-values pairs is correct.
Just try the following code yourself:
from pymongo import MongoClient
db = MongoClient().testdb
col = db.testcol
subdoc = {
'field1': 1,
'field2': 2,
'filed3': 3
}
document = {
'subdoc': subdoc
}
col.insert_one(document)
print(col.find({'subdoc': subdoc}).count())
Each time this code gets executed the 'same' document is added to the collection. Thus, each time we run this code snippet the printed value 'should' increase by one. It does not because find only maches subdocuemnts with the correct ordering but python dicts just insert the subdoc in arbitrary order.
see the following answer how to use ordered dict to overcome this: https://stackoverflow.com/a/30787769/4273834
Original answer (2013):
MongoDB documents are BSON objects, unordered dictionaries of key-value pairs. So, you can't rely on or set a specific fields order. The only thing you can operate is which fields to display and which not to, see docs on find's projection argument.
Also see related questions on SO:
MongoDB field order and document position change after update
Can MongoDB and its drivers preserve the ordering of document elements
Ordering fields from find query with projection
Hope that helps.

'Stringing together' a pymongo query based on a set of conditions

I have a set of conditions that I need to use to retrieve some data from a mongodb database (using pymongo). Some of these conditions are optional, and others may have more than one possible value.
I'm wondering if there is a way of 'dynamically' constructing a pymongo query based on these conditions (instead of creating individual queries for each possible combination of conditions).
For example, assume that I have one query which has to be constrained to the following conditions:
tag contains any of this, is, a, tag
user is johnsmith
date_published is before today
...whereas another query may only be constrained to the following:
user is johnsmith
date_published is after today
Summary: Instead of having to create every possible combination of conditions, is there a way of stringing conditions together to form a query in pymongo?
A PyMongo query is just a Python dictionary, so you can use all the usual techniques to build one on the fly:
def find_things(tags=None, user=None, published_since=None):
# all queries begin with something common, which may
# be an empty dict, but here's an example
query = {
'is_published': True
}
if tags:
# assume that it is an array of strings
query['tags'] = {'$in': tags}
if user:
# assume that it is a string
query['user'] = user
if published_since:
# assume that it is a datetime.datetime
query['date_published'] = {'$gte': published_since}
# etc...
return db.collection.find(query)
The actual logic you implement is obviously dependent on what you want to vary your find calls by, these are just a few examples. You will also want to validate the input if it is coming from an untrusted source (e.g. a web application form, URL parameters, etc).

Categories

Resources