In the following code, Django performs another query for every amount = u.filter(email__icontains=email).count(). How can I avoid these queries?
u = User.objects.all()
shares = Share.objects.all()
for o in shares:
    email = o.email
    type = "CASH"
    amount = u.filter(email__icontains=email).count()
This whole piece of code is very inefficient and some more context could help.
What do you need u = User.objects.all() for?
Calling QuerySet.filter() just specifies some criteria for the record set you want to obtain; the query itself runs when you call .count() on it. How else are you supposed to get the records matching your conditions if not by running a DB query? If you expect Django not to run a DB query at all, then you probably don't know what you are doing.
Filtering with filter(email__icontains=email) is very inefficient - the database can't use any index and your query will be very slow. Can't you just replace it with filter(email=email)?
Running a bunch of queries in a loop is suboptimal (see the sketch below for a single-query alternative).
So again - some context about what you are trying to do would be helpful, as someone could then find a better solution for your problem.
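For example, if an exact email match is acceptable, the per-share counts can be computed in one query by grouping on the database side. This is only a rough sketch based on the models shown above, not tested against your code:

from django.db.models import Count

# One query: count users per email, grouped by the database.
# Assumes filter(email=...) is an acceptable replacement for icontains.
counts = dict(
    User.objects
        .values('email')
        .annotate(n=Count('pk'))
        .values_list('email', 'n')
)

for o in Share.objects.all():
    email = o.email
    type = "CASH"
    amount = counts.get(email, 0)  # no per-share query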
I am new to sqlalchemy and I have a question regarding my code:
query = db.query(Purchase.name,
                 func.sum(Purchase.price).label('total'),
                 func.count(Purchase.name).label('count'))
if date_start and date_end:
    query = query.filter(Purchase.date >= date_start,
                         Purchase.date <= date_end)
query = query.group_by(Purchase.name)\
             .order_by(sqlalchemy.desc('total'))[:limit]
result = [ItemDict(name=item.name, total=item.total,
                   count=item.count) for item in query]
Do I understand correctly that:
In this program there will be only one query to the database?
When we work with Query objects, we do NOT make additional queries to the database (i.e. the list comprehension does not make additional queries)?
Ad. 1: Yes, there should be only one query (there may also be a small query that does the "ping" command depending on your pool configuration)
Ad. 2: Additional queries depend on the joining strategy. If you filter only one model without joining, you should always have a single query. However, if you join other models and use a lazy joining strategy, you can have many implicit additional queries (my short post about it)
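To illustrate the lazy vs. eager difference, here is a small sketch (it assumes a hypothetical Purchase.items relationship, which is not part of the question):

from sqlalchemy.orm import joinedload

# Lazy (default): touching p.items later fires one extra SELECT per purchase.
purchases = db.query(Purchase).all()

# Eager: joinedload pulls the related rows in the same SELECT via a JOIN,
# so iterating p.items afterwards does not hit the database again.
purchases = db.query(Purchase).options(joinedload(Purchase.items)).all()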
You can use this smart context manager to count number of queries: How to count sqlalchemy queries in unit tests.
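If you prefer not to follow the link, a minimal sketch of such a counter (not the linked implementation) can be built on SQLAlchemy's after_cursor_execute event; the engine variable is assumed to be your Engine instance:

from contextlib import contextmanager
from sqlalchemy import event

@contextmanager
def count_queries(engine):
    # Counts every statement executed on `engine` inside the with-block.
    counter = {'n': 0}

    def _after_cursor_execute(conn, cursor, statement, parameters, context, executemany):
        counter['n'] += 1

    event.listen(engine, 'after_cursor_execute', _after_cursor_execute)
    try:
        yield counter
    finally:
        event.remove(engine, 'after_cursor_execute', _after_cursor_execute)

# usage:
# with count_queries(engine) as c:
#     ...  # run the code under test
# assert c['n'] == 1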
I'm following the tutorial here: https://github.com/Jastor11/phresh-tutorial/tree/tutorial-part-11-marketplace-functionality-in-fastapi/backend/app and I had a question: I want to filter a model by different parameters so how would I do that?
The current situation is that I have a list of doctors and so I get all of them. Then depending on the filter query parameters, I filter doctors. I can't just do it all in one go because these query parameters are optional.
so I was thinking something like (pseudocode):
all_doctors = await self.db.fetch_all(query=GET_ALL_DOCTORS)
if language_id:
    all_doctors = all_doctors.filter(d => doctor.language_id = language_id)
if area:
    all_doctors = all_doctors.xyzabc
I'm trying out FastAPI according to that tutorial and couldn't figure out how to do this.
I have defined a model file for different models and am using SQLAlchemy.
One way I thought of is to get the ids of all the doctors, then at each filtering step pass in the doctor ids from the previous step and funnel them through different SQL queries, but this filters in the database and would result in one more query per filter parameter. I want to know how to use the ORM to filter in memory.
EDIT: So basically, in the tutorial I was following, no SQLAlchemy models were defined. The tutorial was using SQL statements. Anyways, to answer my own question: I would first need to define SQLAlchemy models before I can use them.
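For reference, a minimal sketch of what such a model could look like (the table and column names here are assumptions for illustration, not taken from the tutorial):

from sqlalchemy import Column, Integer, String
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class Doctor(Base):
    __tablename__ = 'doctors'        # hypothetical table name

    id = Column(Integer, primary_key=True)
    name = Column(String, nullable=False)
    language_id = Column(Integer)    # hypothetical filter columns
    area_id = Column(Integer)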
The SQLAlchemy query object (and its operations) returns itself, so you can keep building out the query conditionally inside if-statements:
query = db_session.query(Doctor)
if language_id:
    query = query.filter(Doctor.language_id == language_id)
if area_id:
    query = query.filter(Doctor.area_id == area_id)
return query.all()
The query doesn't run until you call .all() at the end. If neither argument is given, you'll get all the doctors.
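A rough sketch of how this could be hooked up to optional FastAPI query parameters (the route path, the get_db/SessionLocal names, and the parameter names are assumptions, not part of the original answer):

from typing import Optional
from fastapi import APIRouter, Depends
from sqlalchemy.orm import Session

router = APIRouter()

def get_db():                      # hypothetical session dependency
    db = SessionLocal()            # SessionLocal is assumed to be your sessionmaker
    try:
        yield db
    finally:
        db.close()

@router.get('/doctors')            # hypothetical route
def list_doctors(language_id: Optional[int] = None,
                 area_id: Optional[int] = None,
                 db_session: Session = Depends(get_db)):
    query = db_session.query(Doctor)
    if language_id is not None:
        query = query.filter(Doctor.language_id == language_id)
    if area_id is not None:
        query = query.filter(Doctor.area_id == area_id)
    return query.all()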
Let's say I have following models:
class Invoice(models.Model):
    ...

class Note(models.Model):
    invoice = models.ForeignKey(Invoice, related_name='notes', on_delete=models.CASCADE)
    text = models.TextField()
and I want to select Invoices that have some notes. I would write it using annotate/Exists like this:
Invoice.objects.annotate(
    has_notes=Exists(Note.objects.filter(invoice_id=OuterRef('pk')))
).filter(has_notes=True)
This works well enough and filters only Invoices with notes. However, this method results in the has_notes field being present in the query result, which I don't need, and it means worse performance (SQL has to execute the subquery twice).
I realize I could write this using extra(where=) like this:
Invoice.objects.extra(where=['EXISTS(SELECT 1 FROM note WHERE invoice_id=invoice.id)'])
which would result in the ideal SQL, but in general it is discouraged to use extra / raw SQL.
Is there a better way to do this?
You can remove annotations from the SELECT clause using .values() query set method. The trouble with .values() is that you have to enumerate all names you want to keep instead of names you want to skip, and .values() returns dictionaries instead of model instances.
Django internally keeps track of removed annotations in QuerySet.query.annotation_select_mask, so you can use it to tell Django which annotations to skip even without .values():
class YourQuerySet(QuerySet):

    def mask_annotations(self, *names):
        if self.query.annotation_select_mask is None:
            self.query.set_annotation_mask(set(self.query.annotations.keys()) - set(names))
        else:
            self.query.set_annotation_mask(self.query.annotation_select_mask - set(names))
        return self
Then you can write:
invoices = (Invoice.objects
            .annotate(has_notes=Exists(Note.objects.filter(invoice_id=OuterRef('pk'))))
            .filter(has_notes=True)
            .mask_annotations('has_notes')
            )
to skip has_notes in the SELECT clause and still get filtered Invoice instances. The resulting SQL query will be something like:
SELECT invoice.id, invoice.foo FROM invoice
WHERE EXISTS(SELECT note.id, note.bar FROM notes WHERE note.invoice_id = invoice.id) = True
Just note that annotation_select_mask is an internal Django API that can change in future versions without warning.
OK, I've just noticed in the Django 3.0 docs that they've updated how Exists works, so it can now be used directly in filter():
Invoice.objects.filter(Exists(Note.objects.filter(invoice_id=OuterRef('pk'))))
This will ensure that the subquery is not added to the SELECT columns, which may result in better performance.
Changed in Django 3.0:
In previous versions of Django, it was necessary to first annotate and then filter against the annotation. This resulted in the annotated value always being present in the query result, and often resulted in a query that took more time to execute.
Still, if someone knows a better way for Django 1.11, I would appreciate it. We really need to upgrade :(
We can filter for Invoices that, when we perform a LEFT OUTER JOIN, have a non-NULL Note, and make the query distinct (to avoid returning the same Invoice twice).
Invoice.objects.filter(notes__isnull=False).distinct()
This is the most optimized code if you want to get data from another table whose primary key reference is stored in another table:
Invoice.objects.filter(note__invoice_id=OuterRef('pk'),)
We should be able to clear the annotated field using the method below (assigning the queryset first so the cleared query is the one that gets evaluated):
invoices = Invoice.objects.annotate(
    has_notes=Exists(Note.objects.filter(invoice_id=OuterRef('pk')))
).filter(has_notes=True)
invoices.query.annotations.clear()
I have an operation in one of my views
order_details = [order.get_order_details() for order in orders]
Now order.get_order_details() runs one database query each time it is called, so depending on the size of orders the number of database accesses will be huge.
Before having to use cache, is there anything that can speed this up?
Is it possible to merge all the select operations into one single database operation?
Will making it an atomic transaction using transaction.atomic() increase performance at all? Because technically the queries would then be sent at once instead of individually, right?
Edit: are there any design changes/patterns that would avoid this situation?
Edit:
def get_order_details(self):
    items = Item.objects.filter(order=self)
    item_list = [item.serialize for item in items]
    return {
        'order_details': self.serialize,
        'item_list': item_list
    }
Assuming orders is a QuerySet, e.g. the result of Order.objects.filter(...), add:
.prefetch_related(Prefetch('item_set'))
to the end of the query. Then use:
items = self.item_set.all()
in get_order_details.
See the docs here: https://docs.djangoproject.com/en/1.11/ref/models/querysets/#prefetch-related
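Putting both halves together, a sketch under the assumption that the reverse accessor is the default item_set:

from django.db.models import Prefetch

# View side: one query for the orders plus one query for all of their items.
orders = Order.objects.all().prefetch_related(Prefetch('item_set'))
order_details = [order.get_order_details() for order in orders]

# Model side: read the items from the prefetch cache instead of a fresh query.
def get_order_details(self):
    item_list = [item.serialize for item in self.item_set.all()]
    return {
        'order_details': self.serialize,
        'item_list': item_list
    }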
The following results in 4 db hits. Since lines 3 & 4 are just filtering what I grabbed in line 2, what do I need to change so it doesn't hit the db again?
page = get_object_or_404(Page, url__iexact = page_url)
installed_modules = page.module_set.all()
navigation_links = installed_modules.filter(module_type=ModuleTypeCode.MODAL)
module_map = dict([(m.module_static_object.key, m) for m in installed_modules])
Django querysets are lazy, so the following line doesn't hit the database:
installed_modules = page.module_set.all()
The query isn't executed until you iterate over the queryset in this line:
module_map = dict([(m.module_static_object.key, m) for m in installed_modules])
So the code you posted looks like only 3 database queries to me, not 4.
Since you are fetching all of the modules from the database already, you could filter the navigation links using a list comprehension instead of another query:
navigation_links = [m for m in installed_modules if m.module_type == ModuleTypeCode.MODAL]
You would have to do some benchmarking to see if this improved performance. It looks like it could be premature optimisation to me.
You might be doing one database query for each module where you fetch module_static_object.key. In this case, you could use select_related.
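A sketch of that, assuming module_static_object is a ForeignKey on the module model:

# Follows the foreign key in the same query, so m.module_static_object.key
# no longer triggers one extra query per module.
installed_modules = page.module_set.select_related('module_static_object')
module_map = dict([(m.module_static_object.key, m) for m in installed_modules])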
This is a case of premature optimization. 4 DB queries for a page load is not bad. The idea is to use as few queries as possible, but you're never going to get it down to 1 in every scenario. The code you have there doesn't seem off-the-wall in terms of needlessly creating queries, so it's highly probable that it's already as optimized as you'll be able to make it.