Mongoengine... How can I compare two fields? - python

For example..
class Example(Document):
up = IntField()
down = IntField()
and.. I want to retrieve documents whose up field is greater or equal to down.
But.. this is issue.
My wrong query code would be..
Example.objects(up__gte=down)
How can I use a field that resides in mongodb not python code as a queryset value?

Simple answer: not possible. Something like WHERE A = B in SQL is not doable in an efficient way in MongoDB (apart from using the $where clause which should be avoided).

this may be what you wanted::
db.myCollection.find( { $where: "this.credits == this.debits" } );
have a look at: http://docs.mongodb.org/manual/reference/operator/query/where/
but I donot know how to use it in mongoengine.

Related

Django querysets optimization - preventing selection of annotated fields

Let's say I have following models:
class Invoice(models.Model):
...
class Note(models.Model):
invoice = models.ForeignKey(Invoice, related_name='notes', on_delete=models.CASCADE)
text = models.TextField()
and I want to select Invoices that have some notes. I would write it using annotate/Exists like this:
Invoice.objects.annotate(
has_notes=Exists(Note.objects.filter(invoice_id=OuterRef('pk')))
).filter(has_notes=True)
This works well enough, filters only Invoices with notes. However, this method results in the field being present in the query result, which I don't need and means worse performance (SQL has to execute the subquery 2 times).
I realize I could write this using extra(where=) like this:
Invoice.objects.extra(where=['EXISTS(SELECT 1 FROM note WHERE invoice_id=invoice.id)'])
which would result in the ideal SQL, but in general it is discouraged to use extra / raw SQL.
Is there a better way to do this?
You can remove annotations from the SELECT clause using .values() query set method. The trouble with .values() is that you have to enumerate all names you want to keep instead of names you want to skip, and .values() returns dictionaries instead of model instances.
Django internaly keeps the track of removed annotations in
QuerySet.query.annotation_select_mask. So you can use it to tell Django, which annotations to skip even wihout .values():
class YourQuerySet(QuerySet):
def mask_annotations(self, *names):
if self.query.annotation_select_mask is None:
self.query.set_annotation_mask(set(self.query.annotations.keys()) - set(names))
else:
self.query.set_annotation_mask(self.query.annotation_select_mask - set(names))
return self
Then you can write:
invoices = (Invoice.objects
.annotate(has_notes=Exists(Note.objects.filter(invoice_id=OuterRef('pk'))))
.filter(has_notes=True)
.mask_annotations('has_notes')
)
to skip has_notes from the SELECT clause and still geting filtered invoice instances. The resulting SQL query will be something like:
SELECT invoice.id, invoice.foo FROM invoice
WHERE EXISTS(SELECT note.id, note.bar FROM notes WHERE note.invoice_id = invoice.id) = True
Just note that annotation_select_mask is internal Django API that can change in future versions without a warning.
Ok, I've just noticed in Django 3.0 docs, that they've updated how Exists works and can be used directly in filter:
Invoice.objects.filter(Exists(Note.objects.filter(invoice_id=OuterRef('pk'))))
This will ensure that the subquery will not be added to the SELECT columns, which may result in a better performance.
Changed in Django 3.0:
In previous versions of Django, it was necessary to first annotate and then filter against the annotation. This resulted in the annotated value always being present in the query result, and often resulted in a query that took more time to execute.
Still, if someone knows a better way for Django 1.11, I would appreciate it. We really need to upgrade :(
We can filter for Invoices that have, when we perform a LEFT OUTER JOIN, no NULL as Note, and make the query distinct (to avoid returning the same Invoice twice).
Invoice.objects.filter(notes__isnull=False).distinct()
This is best optimize code if you want to get data from another table which primary key reference stored in another table
Invoice.objects.filter(note__invoice_id=OuterRef('pk'),)
We should be able to clear the annotated field using the below method.
Invoice.objects.annotate(
has_notes=Exists(Note.objects.filter(invoice_id=OuterRef('pk')))
).filter(has_notes=True).query.annotations.clear()

query.group_by in Django 1.9

I am moving code from Django 1.6 to 1.9.
In 1.6 I had this code
models.py
class MyReport(models.Model):
group_id = models.PositiveIntegerField(blank=False, null=False)
views.py
query = MyReport.objects.filter(owner=request.user).query
query.group_by = ['group_id']
entries = QuerySet(query=query, model=MyReport)
The query would return one object for each 'group_id'; due to the way I use it, any table row with the group_id would do as a representative.
With 1.9 this code is broken. The query after the second line above is:
SELECT "reports_myreport"."group_id", ... etc FROM "reports_myreport" WHERE "reports_myreport"."owner_id" = 1 GROUP BY "reports_myreport"."group_id", "reports_report"."otherfield", ...
Basically it lists all the table fields in the group by clause, making the query return the whole table.
Ever though in the debugger I see
query.group_by = ['group_by']
It doesn't look like query.group_by is a method in 1.9 nor does the change-logs of 1.7-1.9 suggest that something changed.
Is there a better way - not depending on internal Django stuff - I can use for my query?
Any way to fix my current query?
You can use order_by() to get the results ordered, in that same query you can order by a second criteria.
If your want to get the groups you will need to iterate over the collection to retrieve those values.
If you consume all of the results returned by the query, you can consider:
a) itertools.groupby which makes an in-memory group by instead, but you should not use it for large data sets.
b) Another option is to use Manager.raw() but you will need to write SQL inside Django, like this:
for report in MyReport.objects.raw('SELECT * FROM reporting_report GROUP by group_id'):
print(report)
This will work for large data sets, but you could lose compatibility with some database engines.
Bonus: I recommend you to understand what exactly the old code did before doing a rewrite.

Variable interpolation in python/django, django query filters [duplicate]

Given a class:
from django.db import models
class Person(models.Model):
name = models.CharField(max_length=20)
Is it possible, and if so how, to have a QuerySet that filters based on dynamic arguments? For example:
# Instead of:
Person.objects.filter(name__startswith='B')
# ... and:
Person.objects.filter(name__endswith='B')
# ... is there some way, given:
filter_by = '{0}__{1}'.format('name', 'startswith')
filter_value = 'B'
# ... that you can run the equivalent of this?
Person.objects.filter(filter_by=filter_value)
# ... which will throw an exception, since `filter_by` is not
# an attribute of `Person`.
Python's argument expansion may be used to solve this problem:
kwargs = {
'{0}__{1}'.format('name', 'startswith'): 'A',
'{0}__{1}'.format('name', 'endswith'): 'Z'
}
Person.objects.filter(**kwargs)
This is a very common and useful Python idiom.
A simplified example:
In a Django survey app, I wanted an HTML select list showing registered users. But because we have 5000 registered users, I needed a way to filter that list based on query criteria (such as just people who completed a certain workshop). In order for the survey element to be re-usable, I needed for the person creating the survey question to be able to attach those criteria to that question (don't want to hard-code the query into the app).
The solution I came up with isn't 100% user friendly (requires help from a tech person to create the query) but it does solve the problem. When creating the question, the editor can enter a dictionary into a custom field, e.g.:
{'is_staff':True,'last_name__startswith':'A',}
That string is stored in the database. In the view code, it comes back in as self.question.custom_query . The value of that is a string that looks like a dictionary. We turn it back into a real dictionary with eval() and then stuff it into the queryset with **kwargs:
kwargs = eval(self.question.custom_query)
user_list = User.objects.filter(**kwargs).order_by("last_name")
Additionally to extend on previous answer that made some requests for further code elements I am adding some working code that I am using
in my code with Q. Let's say that I in my request it is possible to have or not filter on fields like:
publisher_id
date_from
date_until
Those fields can appear in query but they may also be missed.
This is how I am building filters based on those fields on an aggregated query that cannot be further filtered after the initial queryset execution:
# prepare filters to apply to queryset
filters = {}
if publisher_id:
filters['publisher_id'] = publisher_id
if date_from:
filters['metric_date__gte'] = date_from
if date_until:
filters['metric_date__lte'] = date_until
filter_q = Q(**filters)
queryset = Something.objects.filter(filter_q)...
Hope this helps since I've spent quite some time to dig this up.
Edit:
As an additional benefit, you can use lists too. For previous example, if instead of publisher_id you have a list called publisher_ids, than you could use this piece of code:
if publisher_ids:
filters['publisher_id__in'] = publisher_ids
Django.db.models.Q is exactly what you want in a Django way.
This looks much more understandable to me:
kwargs = {
'name__startswith': 'A',
'name__endswith': 'Z',
***(Add more filters here)***
}
Person.objects.filter(**kwargs)
A really complex search forms usually indicates that a simpler model is trying to dig it's way out.
How, exactly, do you expect to get the values for the column name and operation?
Where do you get the values of 'name' an 'startswith'?
filter_by = '%s__%s' % ('name', 'startswith')
A "search" form? You're going to -- what? -- pick the name from a list of names? Pick the operation from a list of operations? While open-ended, most people find this confusing and hard-to-use.
How many columns have such filters? 6? 12? 18?
A few? A complex pick-list doesn't make sense. A few fields and a few if-statements make sense.
A large number? Your model doesn't sound right. It sounds like the "field" is actually a key to a row in another table, not a column.
Specific filter buttons. Wait... That's the way the Django admin works. Specific filters are turned into buttons. And the same analysis as above applies. A few filters make sense. A large number of filters usually means a kind of first normal form violation.
A lot of similar fields often means there should have been more rows and fewer fields.

sort by count in json

I'm using tastypie to create json from my django models however I'm running into a problem that I think should have a simple fix.
I have an object Blogs wich has Comment object children. I want to be able to do something like this with my json:
/api/v1/blogs/?order_by=comment_count
But I can't figure out how to sort on a field that's not part of the original comment/ blog model. I create comment_count myself in a dehydrate method that just takes the array of comments and returns comments.count()
Any help would be much appreciated - I can't seem to find any explanation.
If I understood correctly this should help:
Blog.objects.annotate(comment_count=Count('comments')).order_by('comment_count')
You might be able to do it with extra like something like:
Blog.objects.extra(
select={
'entry_count': 'SELECT COUNT(*) FROM blog_entry WHERE blog_entry.blog_id = blog_blog.id'
},
order_by = ['-entry_count'],
)
I haven't tested this, but it should work. The caveat is it will only work with a relational database.

How to query a couchdb view using a composite key?

I have a couchdb view "record_by_date_product" with the following definition:
function(doc) {
emit([doc.logtime, doc.product_id], doc);
}
I am trying to run a query which is something like:
(logtime > fromdate & logtime < todate) & product_id in (1,2,6)
Is this possible with this view?
I am also using couchdb python library to access couchdb. Here is a code snippet:
server = couchdb.Server()
db = server['mydb']
results = db.view('_design/record_by_date_product/_view/record_by_date_product')
This page http://packages.python.org/CouchDB/client.html#viewresults specifies that we can use a startkey and endkey. But I am not able to get it working.
Thanks
I think I just found the exact answer:
Design a view 'sampleview' which is like:
{
"records_by_date_product": {
"map": "function(doc) {\n emit([doc.prod_id, doc.logtime], doc);\n}"
}
}
Let us say that the query parameters are:
prod_id in [1,3]
from_date = '2010-01-01 00:00:00'
to_date = '2010-01-02 00:00:00'
Then you will have to run 2 separate queries on the same view:
http://localhost:5984/db/_design/sampleview/_view/records_by_date_product?startkey='\["1,2010-01-01%2000:00:00"\]'&endkey='\[1,"2010-01-02%2000:00:00"\]'
http://localhost:5984/db/_design/sampleview/_view/records_by_date_product?startkey='\[2,"2010-01-01%2000:00:00"\]'&endkey='\[2,"2010-01-02%2000:00:00"\]'
Notice that the same query is run each time except that the prod_id is changed in the second query. The results have to be collated later. Hope this helps!
That exact query is not possible. As the documentation suggests, you can get everything in a view in a particular key range. Views are sorted data structures, so all CouchDB does to fulfill this request is locate the start key and begin returning items until you hit the end key.
The strategy you should use for this query depends on characteristics of the data itself. Most importantly, will you waste a lot of time weeding out items if you use only the first part of the key (logtime) and iterate through those in Python, weeding out items where product_id won't match? If so, you should consider writing another view that is primarily sorted by product_id. If not, go ahead and use the weed-out approach.
How about this solution:
I create a view for each product with logtime as the index.
Access each view if required and filter theresults using the range - [fromdate todate]
Do 3 for each product in the input parameters and collate the results
This has a drawback that for every product we will have to create a view and this looks like a manual process.
Just a thought! Let me know your views.

Categories

Resources