How to check a document includes a list of fields - python

I have a MongoDB collection whose documents include both required and non-required data. I know how to create a query using the $exists operator to check if a field exists; however, I do not want to define the required fields within the query, as the list is both long and subject to change (and is defined elsewhere).
The following is great for checking a known field:
db.collectionofstuff.find({fieldIneed: {$exists: false}})
However, I want something that functions like this:
Using this config file:
datadescriptorjson = {"thing1": {"count": 2,"range": 3},"thing2":{"pace": 12.5, "consistency": "angry"}}
create a query find/aggregation that looks something like this:
db.collectionofstuff.find({<list of fields from datadescriptorjson>: {$exists: false}})
I am not aware of any way to do it directly with either the aggregation framework or a simple find.

There is no such operator; you will have to test each field explicitly. You can, of course, loop over your config data and build the query from it, but that is something you do in your application code.
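A minimal sketch of that loop, under some assumptions: the nested keys of datadescriptorjson are treated as nested document fields (so they become dotted paths), the values in the descriptor are ignored, and "missing any one of the fields" is the condition wanted. The function names field_paths and missing_fields_query are made up for illustration.

```python
def field_paths(descriptor, prefix=""):
    """Yield a dotted path for every leaf key in a nested descriptor dict."""
    for key, value in descriptor.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict) and value:
            # Assumption: a nested dict describes a nested document.
            yield from field_paths(value, prefix=f"{path}.")
        else:
            yield path


def missing_fields_query(descriptor):
    """Build a filter matching documents that lack at least one described field."""
    clauses = [{path: {"$exists": False}} for path in field_paths(descriptor)]
    if not clauses:
        # $or needs a non-empty array; with nothing required, match nothing.
        return {"_id": {"$in": []}}
    return {"$or": clauses}


datadescriptorjson = {
    "thing1": {"count": 2, "range": 3},
    "thing2": {"pace": 12.5, "consistency": "angry"},
}
query = missing_fields_query(datadescriptorjson)
```

The resulting dictionary is an ordinary filter, so with PyMongo it could presumably be passed straight to db.collectionofstuff.find(query).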

Related

Refactorable database queries

Say I have category="foo" and a NoSQL query={"category": category}. Whenever I rename the variable category, I need to manually change the key inside the query if I want it to follow the new name.
In Python 3.8+ I'm able to get the variable name as a string via the variable itself.
Now I could use query={f"{category=}".split("=")[0]: category}, so that refactoring changes the query too. This applies to any database queries or statements (SQL etc.).
Would this be bad practice? Not just concerning Python but any language where this is possible.
Would this be bad practice?
Yes, the names of local variables do not need to correlate with the fields in data stores.
You should be able to retrieve a record and filter on its fields with any Python variable, no matter its name or whether it's nested in a larger data structure.
In pseudocode:
connection = datastore.connect(...)
# passing a string directly
connection.fetch({"category": "fruit"})
# passing a string variable
category_to_fetch = "vegetable"
connection.fetch({"category": category_to_fetch})
# something more exotic like a previous list of records
r = [("fish",)]
connection.fetch({"category": r[0][0]})
# or even a premade filter dictionary (named so it does not shadow the built-in filter)
category_filter = {"category": "meat"}
connection.fetch(category_filter)

Django querysets optimization - preventing selection of annotated fields

Let's say I have following models:
class Invoice(models.Model):
    ...
class Note(models.Model):
    invoice = models.ForeignKey(Invoice, related_name='notes', on_delete=models.CASCADE)
    text = models.TextField()
and I want to select Invoices that have some notes. I would write it using annotate/Exists like this:
Invoice.objects.annotate(
    has_notes=Exists(Note.objects.filter(invoice_id=OuterRef('pk')))
).filter(has_notes=True)
This works well enough and returns only Invoices with notes. However, the annotated field ends up in the query result, which I don't need and which hurts performance (the database has to execute the subquery twice).
I realize I could write this using extra(where=) like this:
Invoice.objects.extra(where=['EXISTS(SELECT 1 FROM note WHERE invoice_id=invoice.id)'])
which would result in the ideal SQL, but in general it is discouraged to use extra / raw SQL.
Is there a better way to do this?
You can remove annotations from the SELECT clause using .values() query set method. The trouble with .values() is that you have to enumerate all names you want to keep instead of names you want to skip, and .values() returns dictionaries instead of model instances.
Django internally keeps track of removed annotations in QuerySet.query.annotation_select_mask, so you can use it to tell Django which annotations to skip even without .values():
class YourQuerySet(QuerySet):
    def mask_annotations(self, *names):
        if self.query.annotation_select_mask is None:
            self.query.set_annotation_mask(set(self.query.annotations.keys()) - set(names))
        else:
            self.query.set_annotation_mask(self.query.annotation_select_mask - set(names))
        return self
Then you can write:
invoices = (Invoice.objects
    .annotate(has_notes=Exists(Note.objects.filter(invoice_id=OuterRef('pk'))))
    .filter(has_notes=True)
    .mask_annotations('has_notes')
)
to drop has_notes from the SELECT clause while still getting filtered Invoice instances. The resulting SQL query will be something like:
SELECT invoice.id, invoice.foo FROM invoice
WHERE EXISTS(SELECT note.id, note.bar FROM note WHERE note.invoice_id = invoice.id) = True
Just note that annotation_select_mask is internal Django API that can change in future versions without a warning.
OK, I've just noticed in the Django 3.0 docs that they've updated how Exists works, so it can now be used directly in filter:
Invoice.objects.filter(Exists(Note.objects.filter(invoice_id=OuterRef('pk'))))
This will ensure that the subquery will not be added to the SELECT columns, which may result in a better performance.
Changed in Django 3.0:
In previous versions of Django, it was necessary to first annotate and then filter against the annotation. This resulted in the annotated value always being present in the query result, and often resulted in a query that took more time to execute.
Still, if someone knows a better way for Django 1.11, I would appreciate it. We really need to upgrade :(
We can filter for Invoices that, when joined to Note with a LEFT OUTER JOIN, have a non-NULL Note, and make the query distinct (to avoid returning the same Invoice more than once):
Invoice.objects.filter(notes__isnull=False).distinct()
This is the most optimized code if you want to get data from a table whose primary key is referenced from another table:
Invoice.objects.filter(note__invoice_id=OuterRef('pk'),)
We should be able to clear the annotated field using the below method.
Invoice.objects.annotate(
    has_notes=Exists(Note.objects.filter(invoice_id=OuterRef('pk')))
).filter(has_notes=True).query.annotations.clear()

How to search for values in TinyDB

I would like to search my database for a value. I know you can do this for a key with db.search(), but there does not seem to be a similar function for searching for a value.
I've tried using the contains() function but I have the same issue. It checks if the key is contained in the database. I would like to know if a certain value is contained in the database.
I would like to do something like this, which would search for values in TinyDB:
db.search('value')
If I could execute the above command and get the value (if it exists) or nothing if it doesn't, that would be ideal. Alternatively, if the call returned True or False accordingly, that would be fine as well.
I don't know if this is what you are looking for, but with the following command you can check for a specific field value:
from tinydb import Query
User = Query()
db.search(User.field_name == 'value')
I'm new here (doing some reading to see if TinyDB would even be applicable for my use case), so I may be wrong, and I'm also aware that this question is a little old. But I wonder if you couldn't address this by iterating over each field of each document and comparing it with your value. Then you could get the key or field in which a matching value was found.
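A rough sketch of that iteration idea. It relies on TinyDB documents behaving like dictionaries (db.all() returns them as a list); the helper name find_value is made up, and plain dicts stand in for the database contents here so the snippet runs on its own.

```python
def find_value(documents, value):
    """Return (document, key) pairs for every field whose value equals `value`."""
    matches = []
    for document in documents:
        for key, field_value in document.items():
            if field_value == value:
                matches.append((document, key))
    return matches


# Stand-in for db.all(); with TinyDB you would pass db.all() instead.
documents = [
    {"name": "John", "city": "Oslo"},
    {"name": "Oslo", "age": 3},
]
matches = find_value(documents, "Oslo")
found = bool(matches)  # True/False answer to "is the value in the database?"
```

An empty list means the value is not present anywhere, so bool(find_value(db.all(), 'value')) gives the True/False variant asked for. This scans every document, so it is only reasonable for small databases.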

Mongo Query that always returns zero documents

It's possible to write a query that always returns all of the elements in a collection. To use pymongo as an example:
MongoClient()["database"]["collection"].find({})
However, due to the structure of my code, I would quite like to be able to construct a query that does the opposite, a query that will necessarily return zero elements in all situations:
MongoClient()["database"]["collection"].find(null_query)
How can I define null_query, such that this is correct?
You can ask for any field to be in an empty list. It seems reasonable to use the _id field for this:
db.collection.find({_id: {$in: []}})
If you want a shorter query, you don't need to use the _id field at all:
db.collection.find({_:{$in:[]}})
Alternative if MongoDB version >= 3.4:
Arguably, one can also ask whether the _id field does not exist, as suggested by @Marco13:
db.collection.find({_id: {$exists: false}})
However, this assumes that all documents have an _id field, which is not necessarily true for MongoDB versions before 3.4, where a collection could be created with db.createCollection("mycol", {autoIndexId: false}), so documents were not necessarily given an _id field automatically.
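Since the question uses pymongo, here is the empty $in form written as a Python dictionary (the keys need quoting there). NULL_QUERY and ids_query are illustrative names, not part of pymongo.

```python
# Matches no document: no _id can be a member of an empty list.
NULL_QUERY = {"_id": {"$in": []}}


def ids_query(ids):
    """Filter on a collection of _id values; an empty collection gives the null query."""
    return {"_id": {"$in": list(ids)}}


nothing = ids_query([])
some = ids_query([1, 2, 3])
```

With this, MongoClient()["database"]["collection"].find(NULL_QUERY) should return an empty cursor, and code that builds its id list dynamically falls back to the null query without a special case.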

'Stringing together' a pymongo query based on a set of conditions

I have a set of conditions that I need to use to retrieve some data from a mongodb database (using pymongo). Some of these conditions are optional, and others may have more than one possible value.
I'm wondering if there is a way of 'dynamically' constructing a pymongo query based on these conditions (instead of creating individual queries for each possible combination of conditions).
For example, assume that I have one query which has to be constrained to the following conditions:
tag contains any of this, is, a, tag
user is johnsmith
date_published is before today
...whereas another query may only be constrained to the following:
user is johnsmith
date_published is after today
Summary: Instead of having to create every possible combination of conditions, is there a way of stringing conditions together to form a query in pymongo?
A PyMongo query is just a Python dictionary, so you can use all the usual techniques to build one on the fly:
def find_things(tags=None, user=None, published_since=None):
    # all queries begin with something common, which may
    # be an empty dict, but here's an example
    query = {
        'is_published': True
    }
    if tags:
        # assume that it is an array of strings
        query['tags'] = {'$in': tags}
    if user:
        # assume that it is a string
        query['user'] = user
    if published_since:
        # assume that it is a datetime.datetime
        query['date_published'] = {'$gte': published_since}
    # etc...
    return db.collection.find(query)
The actual logic you implement obviously depends on how you want your find calls to vary; these are just a few examples. You will also want to validate the input if it comes from an untrusted source (e.g. a web application form, URL parameters, etc.).
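Applied to the two condition sets from the question, the same idea might look like the sketch below. The field names tag, user and date_published are taken from the question; the function name build_query and its parameters are assumptions, and "before/after today" is read as strict $lt/$gt comparisons.

```python
from datetime import datetime


def build_query(tags=None, user=None, published_before=None, published_after=None):
    """Build a PyMongo filter dict from whichever conditions were supplied."""
    query = {}
    if tags:
        # tag contains any of the given values
        query["tag"] = {"$in": list(tags)}
    if user:
        query["user"] = user
    # Both date bounds live under one key, so collect them first.
    date_range = {}
    if published_before is not None:
        date_range["$lt"] = published_before
    if published_after is not None:
        date_range["$gt"] = published_after
    if date_range:
        query["date_published"] = date_range
    return query


today = datetime(2024, 1, 1)  # fixed date just for the example
first = build_query(tags=["this", "is", "a", "tag"], user="johnsmith", published_before=today)
second = build_query(user="johnsmith", published_after=today)
```

Each returned dictionary can be handed to find() as-is; a call with no arguments returns {} and therefore matches everything.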
