Making complex query with django models - python

I created a view in my database with 6 joins and 10 columns; at the moment it returns around 86,000 rows.
I query all the rows with objects.all() and then filter according to user interaction (form data sent by POST, which selects the appropriate .filter(*args) call).
After that I tried to get the length of the queryset with count(), since that method doesn't fetch the rows themselves. But since views don't have indexes on their columns, count() takes too long.
I looked into materializing the view, but MySQL doesn't support materialized views.
Then I searched for a way to replace the initial .all() by expressing the 6 joins and the filter arguments directly in Django rather than creating a view, so that the indexes of the underlying tables would still be available. But I couldn't find a solution to that problem.
Or maybe I could join every row of the view with another table, so I can use the index of the other table for faster querying?
SELECT * FROM View LEFT JOIN Table ON (View.id = Table.id)
I appreciate every answer

Try this below:
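from django.db import models
# I think below is your table structure
class Table(models.Model):
    pass
class View(models.Model):
    # on_delete is a required argument since Django 2.0
    table = models.ForeignKey(to=Table, on_delete=models.CASCADE)
# select_related('table') fetches the related Table row in the same query via a join
qs = View.objects.select_related('table').filter(table__isnull=True)
for row in qs:
    print(row)  # print each row, not the whole queryset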
Thanks !

Related

FastAPI in-memory filtering

I'm following the tutorial here: https://github.com/Jastor11/phresh-tutorial/tree/tutorial-part-11-marketplace-functionality-in-fastapi/backend/app and I had a question: I want to filter a model by different parameters so how would I do that?
The current situation is that I have a list of doctors and so I get all of them. Then depending on the filter query parameters, I filter doctors. I can't just do it all in one go because these query parameters are optional.
So I was thinking of something like this (pseudocode):
all_doctors = await self.db.fetch_all(query=GET_ALL_DOCTORS)
if language_id:
    all_doctors = all_doctors.filter(d => d.language_id == language_id)
if area:
    all_doctors = all_doctors.xyzabc
I'm trying out FastAPI according to that tutorial and couldn't figure out how to do this.
I have defined a model file for different models and am using SQLAlchemy.
One way I thought of is to get the ids of all the doctors and then, at each filtering step, pass the doctor ids from the previous step into a separate SQL query. But that filters in the database and costs one more query per filter parameter. I want to know how to use the ORM to filter in memory.
EDIT: So basically, in the tutorial I was following, no SQLAlchemy models were defined. The tutorial was using SQL statements. Anyways, to answer my own question: I would first need to define SQLAlchemy models before I can use them.
The SQLAlchemy query object (and its operations) returns itself, so you can keep building out the query conditionally inside if-statements:
query = db_session.query(Doctor)
if language_id:
    query = query.filter(Doctor.language_id == language_id)
if area_id:
    query = query.filter(Doctor.area_id == area_id)
return query.all()
The query doesn't run until you call all() at the end. If neither argument is given, you'll get all the doctors.
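The same conditional-building idea also works in the raw SQL style that the tutorial uses. Below is a minimal sketch using Python's built-in sqlite3 module as a stand-in for the tutorial's database layer. The doctors table, its columns, and the find_doctors helper are assumptions made for illustration only.

```python
import sqlite3


def find_doctors(conn, language_id=None, area_id=None):
    """Build one SELECT whose WHERE clause holds only the supplied filters."""
    conditions = []
    params = []
    if language_id is not None:
        conditions.append("language_id = ?")
        params.append(language_id)
    if area_id is not None:
        conditions.append("area_id = ?")
        params.append(area_id)

    sql = "SELECT id, name FROM doctors"
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    sql += " ORDER BY id"
    # Values are passed as parameters, never formatted into the SQL string.
    return conn.execute(sql, params).fetchall()


# Hypothetical sample data for the sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE doctors "
    "(id INTEGER PRIMARY KEY, name TEXT, language_id INTEGER, area_id INTEGER)"
)
conn.executemany(
    "INSERT INTO doctors VALUES (?, ?, ?, ?)",
    [(1, "Ada", 1, 10), (2, "Ben", 1, 20), (3, "Cy", 2, 10)],
)

all_doctors = find_doctors(conn)
by_language = find_doctors(conn, language_id=1)
by_both = find_doctors(conn, language_id=1, area_id=20)
```

However many optional filters are supplied, only one query reaches the database, which avoids both in-memory filtering and the one-query-per-filter approach.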

Django querysets optimization - preventing selection of annotated fields

Let's say I have following models:
class Invoice(models.Model):
    ...
class Note(models.Model):
    invoice = models.ForeignKey(Invoice, related_name='notes', on_delete=models.CASCADE)
    text = models.TextField()
and I want to select Invoices that have some notes. I would write it using annotate/Exists like this:
Invoice.objects.annotate(
    has_notes=Exists(Note.objects.filter(invoice_id=OuterRef('pk')))
).filter(has_notes=True)
This works well enough, filters only Invoices with notes. However, this method results in the field being present in the query result, which I don't need and means worse performance (SQL has to execute the subquery 2 times).
I realize I could write this using extra(where=) like this:
Invoice.objects.extra(where=['EXISTS(SELECT 1 FROM note WHERE invoice_id=invoice.id)'])
which would result in the ideal SQL, but in general it is discouraged to use extra / raw SQL.
Is there a better way to do this?
You can remove annotations from the SELECT clause using .values() query set method. The trouble with .values() is that you have to enumerate all names you want to keep instead of names you want to skip, and .values() returns dictionaries instead of model instances.
Django internally keeps track of removed annotations in QuerySet.query.annotation_select_mask, so you can use it to tell Django which annotations to skip even without .values():
class YourQuerySet(QuerySet):
    def mask_annotations(self, *names):
        if self.query.annotation_select_mask is None:
            self.query.set_annotation_mask(set(self.query.annotations.keys()) - set(names))
        else:
            self.query.set_annotation_mask(self.query.annotation_select_mask - set(names))
        return self
Then you can write:
invoices = (Invoice.objects
    .annotate(has_notes=Exists(Note.objects.filter(invoice_id=OuterRef('pk'))))
    .filter(has_notes=True)
    .mask_annotations('has_notes')
)
to omit has_notes from the SELECT clause while still getting filtered Invoice instances. The resulting SQL query will be something like:
SELECT invoice.id, invoice.foo FROM invoice
    WHERE EXISTS(SELECT note.id, note.bar FROM note WHERE note.invoice_id = invoice.id) = True
Just note that annotation_select_mask is an internal Django API that can change in future versions without warning.
OK, I've just noticed in the Django 3.0 docs that they've updated how Exists works, and it can now be used directly in filter:
Invoice.objects.filter(Exists(Note.objects.filter(invoice_id=OuterRef('pk'))))
This ensures that the subquery is not added to the SELECT columns, which may result in better performance.
Changed in Django 3.0:
In previous versions of Django, it was necessary to first annotate and then filter against the annotation. This resulted in the annotated value always being present in the query result, and often resulted in a query that took more time to execute.
Still, if someone knows a better way for Django 1.11, I would appreciate it. We really need to upgrade :(
We can filter for Invoices whose join with Note yields a non-NULL Note, and make the query distinct (to avoid returning the same Invoice once per matching Note).
Invoice.objects.filter(notes__isnull=False).distinct()
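The difference between an EXISTS filter and a join-based filter shows up in plain SQL. Below is a minimal sketch using Python's built-in sqlite3 module. The invoice and note tables and the sample rows are assumptions for illustration, and the SQL is hand-written rather than what Django would generate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE invoice (id INTEGER PRIMARY KEY);
    CREATE TABLE note (
        id INTEGER PRIMARY KEY,
        invoice_id INTEGER REFERENCES invoice(id),
        text TEXT
    );
    INSERT INTO invoice (id) VALUES (1), (2), (3);
    -- invoice 1 has two notes, invoice 2 has none, invoice 3 has one
    INSERT INTO note (invoice_id, text) VALUES (1, 'a'), (1, 'b'), (3, 'c');
""")


def ids(sql):
    """Run a query and return the first column of every row as a list."""
    return [row[0] for row in conn.execute(sql)]


# EXISTS: each invoice is tested once, so no duplicates can appear.
exists_ids = ids(
    "SELECT invoice.id FROM invoice "
    "WHERE EXISTS (SELECT 1 FROM note WHERE note.invoice_id = invoice.id) "
    "ORDER BY invoice.id"
)

# Plain join: an invoice appears once per matching note.
join_ids = ids(
    "SELECT invoice.id FROM invoice "
    "JOIN note ON note.invoice_id = invoice.id ORDER BY invoice.id"
)

# Join plus DISTINCT: duplicates are removed after the join has been built.
distinct_ids = ids(
    "SELECT DISTINCT invoice.id FROM invoice "
    "JOIN note ON note.invoice_id = invoice.id ORDER BY invoice.id"
)
```

Both forms return the same invoices. The join variant has to deduplicate afterwards, which is why .distinct() is needed in the ORM version.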
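This is well-optimized code if you want data from one table whose primary key is referenced from another table:
Invoice.objects.filter(note__invoice_id=OuterRef('pk'),)
Note that OuterRef can only be resolved inside a subquery, so this call raises an error as written, and with related_name='notes' the reverse lookup is notes, not note.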
We should be able to clear the annotated field using the method below.
qs = Invoice.objects.annotate(
    has_notes=Exists(Note.objects.filter(invoice_id=OuterRef('pk')))
).filter(has_notes=True)
qs.query.annotations.clear()  # clear() returns None, so keep the queryset in a variable

Django, How to make multiple annotate in a single queryset

I am currently trying to annotate a User queryset in Django with two different like counts.
Here's the code I'm using to return the desired queryset:
def get_top_user(self):
    return User.objects. \
        annotate(guide_like=Count('guidelike')).\
        annotate(news_like=Count('newslike')).\
        values_list('first_name', 'last_name', 'guide_like','news_like').\
        order_by('-guide_like')
However, the queryset returns ["Bob", "Miller", 612072, 612072]. As you can see, Django multiplies the two annotated values together, which is why I'm getting 612072.
Is there a way to call annotate multiple times in a single queryset without getting these multiplied values?
EDIT: Also tried to add distinct() at the end of the query or distinct=True in each count but the call simply gets into an infinite loop.
This is how Django's annotate produces SQL: it performs all the necessary joins and then groups by all the User fields, aggregating with the annotation function (Count in your case). So it joins users with all their guide likes and then with all their news likes, and then simply counts the number of rows produced per user.
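The row multiplication described above can be reproduced with plain SQL. Below is a minimal sketch using Python's built-in sqlite3 module. The table and column names (users, guide_likes, news_likes) are assumptions made for illustration, not the asker's real schema, and the SQL is hand-written rather than copied from Django's output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, first_name TEXT);
    CREATE TABLE guide_likes (id INTEGER PRIMARY KEY, user_id INTEGER);
    CREATE TABLE news_likes (id INTEGER PRIMARY KEY, user_id INTEGER);
    INSERT INTO users VALUES (1, 'Bob');
    -- Bob has 3 guide likes and 2 news likes
    INSERT INTO guide_likes (user_id) VALUES (1), (1), (1);
    INSERT INTO news_likes (user_id) VALUES (1), (1);
""")

# Two joins build 3 x 2 = 6 rows for Bob, so both counts come out as 6.
multiplied = conn.execute("""
    SELECT COUNT(g.id), COUNT(n.id)
    FROM users u
    LEFT JOIN guide_likes g ON g.user_id = u.id
    LEFT JOIN news_likes n ON n.user_id = u.id
    GROUP BY u.id
""").fetchone()

# COUNT(DISTINCT ...) gives the right numbers but still builds all 6 rows.
distinct = conn.execute("""
    SELECT COUNT(DISTINCT g.id), COUNT(DISTINCT n.id)
    FROM users u
    LEFT JOIN guide_likes g ON g.user_id = u.id
    LEFT JOIN news_likes n ON n.user_id = u.id
    GROUP BY u.id
""").fetchone()

# Correlated subqueries count each table separately, with no row product.
subqueries = conn.execute("""
    SELECT
        (SELECT COUNT(*) FROM guide_likes WHERE user_id = u.id),
        (SELECT COUNT(*) FROM news_likes WHERE user_id = u.id)
    FROM users u
""").fetchone()
```

With thousands of likes per user, the intermediate join grows to the product of the two counts. That may be why the distinct variant appeared to hang for the asker, although this is a guess.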
If you can, you should use raw querysets or the QuerySet extra method, e.g.:
User.objects.all().extra(select={
    'guide_like': 'select count(*) from tbl_guide_likes where user_id=tbl_users.id',
    'news_like': 'select count(*) from tbl_news_likes where user_id=tbl_users.id'
}).\
    values_list('first_name', 'last_name', 'guide_like','news_like')
For more flexibility you can build the table names from Model._meta.db_table instead of hard-coding them, and pass values through the select_params parameter of extra. That said, this is a very inconvenient and hackish method.
Sooner or later your logic will become more complicated, and then you should move it out of Python code into SQL (stored functions/procedures) and raw queries.

Django pagination. What if I have 1 million rows?

The official Django Documentation gives us something like this:
from django.core.paginator import Paginator
my_list = MyModel.objects.all()
p = Paginator(my_list, 10)
But what if I have to paginate 1 million rows? It's not efficient to load all 1 million rows with MyModel.objects.all() every time I want to view a single paginated page.
Is there a more efficient way to do this, without needing to call objects.all() just to paginate?
MyModel.objects.all() doesn't actually load all of the objects. It could potentially load all of them, but until you actually perform an action that requires it to be evaluated, it won't do anything.
The Paginator will almost certainly add some limits on that query set. For example, using array-slicing notation, it can create a new object, like this
my_list = MyModel.objects.all()
smaller_list = my_list[100:200]
That will create a different query set, which will only request 100 items from the database. Similarly, calling .count() on the original query set just instructs the database to return the number of rows in the table.
You would have to do something that requires all of the objects to be retrieved, like calling
list(my_list)
to get 1000000 rows to be transferred from the database to Python.
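The lazy behaviour described above can be imitated with a toy class. This is a minimal sketch built on Python's sqlite3 module: LazyRows and the items table are hypothetical names invented for this example, and the class is not how Django implements query sets. It only shows that slicing and counting can each map to one small query.

```python
import sqlite3


class LazyRows:
    """Hypothetical toy stand-in for a lazy query set (not Django's code)."""

    def __init__(self, conn, table):
        self.conn = conn
        self.table = table
        self.queries = []  # every SQL statement actually sent to the database

    def _run(self, sql, params=()):
        self.queries.append(sql)
        return self.conn.execute(sql, params).fetchall()

    def count(self):
        # The database does the counting; only the number is transferred.
        return self._run(f"SELECT COUNT(*) FROM {self.table}")[0][0]

    def __getitem__(self, key):
        # This sketch only supports plain slices such as rows[100:200].
        if not isinstance(key, slice) or key.stop is None or key.step is not None:
            raise TypeError("only slices with a stop and no step are supported")
        start = key.start or 0
        sql = f"SELECT id FROM {self.table} ORDER BY id LIMIT ? OFFSET ?"
        return self._run(sql, (key.stop - start, start))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO items (id) VALUES (?)", [(i,) for i in range(1, 1001)])

rows = LazyRows(conn, "items")  # creating the object sends no SQL
page = rows[100:200]            # one query with LIMIT 100 OFFSET 100
total = rows.count()            # one query with SELECT COUNT(*)
```

Only the 100 rows of the requested page cross from the database into Python, whatever the size of the table.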

Variable interpolation in python/django, django query filters [duplicate]

Given a class:
from django.db import models
class Person(models.Model):
    name = models.CharField(max_length=20)
Is it possible, and if so how, to have a QuerySet that filters based on dynamic arguments? For example:
# Instead of:
Person.objects.filter(name__startswith='B')
# ... and:
Person.objects.filter(name__endswith='B')
# ... is there some way, given:
filter_by = '{0}__{1}'.format('name', 'startswith')
filter_value = 'B'
# ... that you can run the equivalent of this?
Person.objects.filter(filter_by=filter_value)
# ... which will throw an exception, since `filter_by` is not
# an attribute of `Person`.
Python's argument expansion may be used to solve this problem:
kwargs = {
    '{0}__{1}'.format('name', 'startswith'): 'A',
    '{0}__{1}'.format('name', 'endswith'): 'Z'
}
Person.objects.filter(**kwargs)
This is a very common and useful Python idiom.
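Why the expansion works can be seen without Django at all. In this minimal sketch, show_filters is a hypothetical stand-in for QuerySet.filter that simply returns the keyword arguments it received.

```python
def show_filters(**lookups):
    """Hypothetical stand-in for QuerySet.filter: return the kwargs received."""
    return lookups


field, operation = "name", "startswith"
kwargs = {f"{field}__{operation}": "B"}

# ** unpacks the dict, so the keyword name comes from the string key.
expanded = show_filters(**kwargs)

# Without unpacking, the keyword is literally "filter_by".
literal = show_filters(filter_by="B")
```

In the second call Python never looks at any variable called filter_by. That is why the original attempt in the question fails.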
A simplified example:
In a Django survey app, I wanted an HTML select list showing registered users. But because we have 5000 registered users, I needed a way to filter that list based on query criteria (such as just people who completed a certain workshop). In order for the survey element to be re-usable, I needed for the person creating the survey question to be able to attach those criteria to that question (don't want to hard-code the query into the app).
The solution I came up with isn't 100% user friendly (requires help from a tech person to create the query) but it does solve the problem. When creating the question, the editor can enter a dictionary into a custom field, e.g.:
{'is_staff':True,'last_name__startswith':'A',}
That string is stored in the database. In the view code, it comes back in as self.question.custom_query . The value of that is a string that looks like a dictionary. We turn it back into a real dictionary with eval() and then stuff it into the queryset with **kwargs:
kwargs = eval(self.question.custom_query)
user_list = User.objects.filter(**kwargs).order_by("last_name")
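One caveat: eval() executes any Python expression stored in that field, so anyone who can edit the question can run code on the server. A more cautious variant, sketched here under the assumption that the stored value is always a plain dict literal, uses ast.literal_eval from the standard library. The helper name parse_custom_query is made up for this example.

```python
import ast


def parse_custom_query(text):
    """Parse a stored dict literal without executing arbitrary code."""
    value = ast.literal_eval(text)  # accepts literals only, no calls or names
    if not isinstance(value, dict):
        raise ValueError("custom query must be a dict literal")
    return value


kwargs = parse_custom_query("{'is_staff':True,'last_name__startswith':'A',}")

# An expression that eval() would happily run is rejected here.
try:
    parse_custom_query("__import__('os').getcwd()")
    rejected = False
except (ValueError, SyntaxError):
    rejected = True
```

The resulting dictionary can be passed to filter(**kwargs) exactly as before.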
To extend the previous answer, which drew some requests for more code, here is working code that I use with Q. Let's say a request may or may not filter on fields like:
publisher_id
date_from
date_until
Those fields can appear in the query, but they may also be missing.
This is how I build filters from those fields for an aggregated query that cannot be filtered further after the initial queryset execution:
# prepare filters to apply to queryset
filters = {}
if publisher_id:
    filters['publisher_id'] = publisher_id
if date_from:
    filters['metric_date__gte'] = date_from
if date_until:
    filters['metric_date__lte'] = date_until
filter_q = Q(**filters)
queryset = Something.objects.filter(filter_q)...
Hope this helps since I've spent quite some time to dig this up.
Edit:
As an additional benefit, you can use lists too. In the previous example, if instead of publisher_id you have a list called publisher_ids, then you could use this piece of code:
if publisher_ids:
    filters['publisher_id__in'] = publisher_ids
django.db.models.Q is exactly what you want, and it is the Django way to do it.
This looks much more understandable to me:
kwargs = {
    'name__startswith': 'A',
    'name__endswith': 'Z',
    # (add more filters here)
}
Person.objects.filter(**kwargs)
A really complex search form usually indicates that a simpler model is trying to dig its way out.
How, exactly, do you expect to get the values for the column name and operation?
Where do you get the values of 'name' and 'startswith'?
filter_by = '%s__%s' % ('name', 'startswith')
A "search" form? You're going to -- what? -- pick the name from a list of names? Pick the operation from a list of operations? While open-ended, most people find this confusing and hard-to-use.
How many columns have such filters? 6? 12? 18?
A few? A complex pick-list doesn't make sense. A few fields and a few if-statements make sense.
A large number? Your model doesn't sound right. It sounds like the "field" is actually a key to a row in another table, not a column.
Specific filter buttons. Wait... That's the way the Django admin works. Specific filters are turned into buttons. And the same analysis as above applies. A few filters make sense. A large number of filters usually means a kind of first normal form violation.
A lot of similar fields often means there should have been more rows and fewer fields.
