Reducing database access when same query on multiple similar objects

I have an operation in one of my views:
order_details = [order.get_order_details() for order in orders]
Each call to order.get_order_details() runs one database query, so the number of database accesses grows with the size of orders and can become huge.
Before resorting to a cache, is there anything that can speed this up?
Is it possible to merge all the select operations into a single database operation?
Would wrapping it in an atomic transaction using transaction.atomic() improve performance? Technically the queries would be sent at once instead of individually, right?
Edit: are there any design changes or patterns that would avoid this situation?
Edit:
def get_order_details(self):
    items = Item.objects.filter(order=self)
    item_list = [item.serialize for item in items]
    return {
        'order_details': self.serialize,
        'item_list': item_list
    }

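Assuming orders is a QuerySet, e.g. the result of Order.objects.filter(...), add:
.prefetch_related('item_set')
to the end of the query. Then use:
items = self.item_set.all()
in get_order_details. The .all() call is needed because item_set is a manager, not an iterable; with the prefetch in place it reads from the prefetched cache instead of querying again. Writing Prefetch('item_set') is equivalent here; Prefetch (imported from django.db.models) is only needed when you want to customise the prefetched queryset.
See the docs here: https://docs.djangoproject.com/en/1.11/ref/models/querysets/#prefetch-related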
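
To make the effect of prefetching concrete, here is a minimal sketch of the same idea outside Django, using only the standard-library sqlite3 module: one query for the orders, one query for all their items, and the grouping done in Python. The table layout, the sample rows and the get_all_order_details helper are assumptions made up for illustration; they are not the asker's schema.

```python
import sqlite3
from collections import defaultdict

# Hypothetical tables standing in for the Order and Item models.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT);
    CREATE TABLE items (id INTEGER PRIMARY KEY, order_id INTEGER, name TEXT);
    INSERT INTO orders VALUES (1, 'ann'), (2, 'bob'), (3, 'cy');
    INSERT INTO items VALUES (1, 1, 'pen'), (2, 1, 'ink'), (3, 2, 'cup');
""")


def get_all_order_details(conn):
    # Query 1: fetch every order.
    orders = conn.execute(
        "SELECT id, customer FROM orders ORDER BY id"
    ).fetchall()

    # Query 2: fetch the items of all those orders at once with IN (...),
    # instead of issuing one items query per order.
    items_by_order = defaultdict(list)
    if orders:
        placeholders = ",".join("?" * len(orders))
        rows = conn.execute(
            f"SELECT order_id, name FROM items "
            f"WHERE order_id IN ({placeholders}) ORDER BY id",
            [order_id for order_id, _ in orders],
        )
        # Group the item rows by their order in Python.
        for order_id, name in rows:
            items_by_order[order_id].append(name)

    return [
        {
            "order_details": {"id": order_id, "customer": customer},
            "item_list": items_by_order.get(order_id, []),
        }
        for order_id, customer in orders
    ]


order_details = get_all_order_details(conn)
```

However many orders there are, this issues two statements rather than one per order, which is roughly the trade prefetch_related makes as well.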

Related

Why accessing Django QuerySet became very slow?

I have a model query in Django:
Query = Details.objects.filter(name__iexact=nameSelected)
I filter it later:
Query2 = Query.filter(title__iexact=title0)
Then I access it using:
...Query2[0][0]...
A few days ago this was very fast, but now it is at least 20 times slower.
When I test it on another PC, it runs very fast.
Update: the filtering is not the cause of the delay; the Query[0][0] access is.
Also, it became slow suddenly, not gradually over time.
What could make it so slow on my first PC?

Maybe you could convert the QuerySet to a list when you create it, so that you have a real list rather than a lazy QuerySet. Indexing an unevaluated QuerySet runs a query on every access, whereas a list is fetched once:
Query2 = list(Query.filter(title__iexact=title0))

The best way is to avoid filtering the query inside a loop. What I did was create a dictionary (hash map):
dict0 = {}
Then I filled it with the list of items and the data corresponding to each item in the query:
dict0 = dict(zip(title0List, DataList))
Finally, I use dict0 instead of the query. It sped things up at least 10 times for me.
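
A small sketch of that dictionary approach, with made-up sample data. The names title0List, DataList and dict0 come from the answer above; the values, the lookup helper and the case-folding of keys (to imitate the case-insensitive iexact match) are assumptions for illustration.

```python
# Sample data standing in for values read from the database once.
title0List = ["Alpha", "Beta", "Gamma"]
DataList = [[1, 2], [3, 4], [5, 6]]

# Build the lookup table once; keys are case-folded so that lookups
# behave like a case-insensitive (iexact-style) match.
dict0 = dict(zip((title.casefold() for title in title0List), DataList))


def lookup(title0):
    """Return the data for title0, or None if there is no such title."""
    return dict0.get(title0.casefold())


beta_data = lookup("beta")
missing = lookup("delta")
```

Each lookup is then a constant-time dictionary access in memory instead of a round trip to the database.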

How to do ...ON DUPLICATE KEY UPDATE... in django

Is there a way to do the following in django's ORM?
INSERT INTO mytable
VALUES (1,2,3)
ON DUPLICATE KEY
UPDATE field=4
I'm familiar with get_or_create, which takes default values, but that doesn't update the record if there are differences in the defaults. Usually I use the following approach, but it takes two queries instead of one:
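item = Item(id=1)
# model instances have no update() method, so set the fields one by one
for name, value in fields.items():
    setattr(item, name, value)
item.save()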
Is there another way to do this?

"I'm familiar with get_or_create, which takes default values, but that doesn't update the record if there are differences in the defaults."
update_or_create should provide the behavior you're looking for.
Item.objects.update_or_create(
    id=1,
    defaults=fields,
)
It returns the same (object, created) tuple as get_or_create.
Note that this still performs two queries: a SELECT to look the record up, followed by an UPDATE if it exists or an INSERT if it does not. If that is for some reason unacceptable, you will likely be stuck writing raw SQL to handle this, which would be unfortunate in terms of readability and maintainability.

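I think get_or_create() is still the answer, but only specify the pk field(s).
item, _ = Item.objects.get_or_create(id=1)
# model instances have no update() method, so set the fields one by one
for name, value in fields.items():
    setattr(item, name, value)
item.save()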

Django 4.1 added support for INSERT ... ON DUPLICATE KEY UPDATE style queries through bulk_create. It updates the given fields when an insert hits a unique constraint.
Example of the above in a single query:
# Let's say we have an Item model with a unique constraint on key
items = [
    Item(key='foobar', value=10),
    Item(key='foobaz', value=20),
]
# this call creates 2 rows in a single SQL query
Item.objects.bulk_create(items)
# this time it will update the value for foobar
# and create a new row for barbaz,
# all in a single SQL query
items = [
    Item(key='foobar', value=30),
    Item(key='barbaz', value=50),
]
Item.objects.bulk_create(
    items,
    update_conflicts=True,
    update_fields=['value'],
    # required on PostgreSQL and SQLite; leave it out on MySQL/MariaDB
    unique_fields=['key'],
)
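
For readers who want to see what an upsert does at the SQL level, here is a minimal sketch with the standard-library sqlite3 module. It assumes SQLite 3.24 or newer, whose ON CONFLICT ... DO UPDATE clause plays the role of MySQL's ON DUPLICATE KEY UPDATE; the item table is a made-up stand-in for the Item model. Note that executemany runs the statement once per row, so this shows the upsert semantics rather than Django's single multi-row statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table mirroring the Item model: key is unique.
conn.execute("CREATE TABLE item (key TEXT PRIMARY KEY, value INTEGER)")

# Insert the row, or update its value if the key already exists.
# "excluded" refers to the row that failed to insert.
UPSERT = """
    INSERT INTO item (key, value) VALUES (?, ?)
    ON CONFLICT(key) DO UPDATE SET value = excluded.value
"""

# First batch: both keys are new, so two rows are created.
conn.executemany(UPSERT, [("foobar", 10), ("foobaz", 20)])

# Second batch: foobar already exists and is updated, barbaz is created.
conn.executemany(UPSERT, [("foobar", 30), ("barbaz", 50)])

rows = dict(conn.execute("SELECT key, value FROM item"))
```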

Django ORM values_list with '__in' filter performance

What is the preferred way to filter query set with '__in' in Django?
providers = Provider.objects.filter(age__gt=10)
consumers = Consumer.objects.filter(consumer__in=providers)
or
providers_ids = Provider.objects.filter(age__gt=10).values_list('id', flat=True)
consumers = Consumer.objects.filter(consumer__in=providers_ids)

These should be totally equivalent. Under the hood, Django optimizes both of these into a subselect query in SQL. See the QuerySet API reference on in:
This queryset will be evaluated as subselect statement:
SELECT ... WHERE consumer.id IN (SELECT id FROM ... WHERE _ IN _)
However, you can force a lookup based on explicit values for the primary keys by calling list on your values_list, like so:
providers_ids = list(Provider.objects.filter(age__gt=10).values_list('id', flat=True))
consumers = Consumer.objects.filter(consumer__in=providers_ids)
This could be more performant in some cases, for example, when you have few providers, but it will be totally dependent on what your data is like and what database you're using. See the "Performance Considerations" note in the link above.
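
The two strategies can be compared directly in plain SQL. The sketch below uses the standard-library sqlite3 module with made-up provider and consumer tables (the column names are assumptions, not the asker's models): the first form sends one statement containing a subselect, the second fetches the ids first and then passes them as explicit values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables standing in for the Provider and Consumer models.
conn.executescript("""
    CREATE TABLE provider (id INTEGER PRIMARY KEY, age INTEGER);
    CREATE TABLE consumer (id INTEGER PRIMARY KEY, provider_id INTEGER);
    INSERT INTO provider VALUES (1, 5), (2, 12), (3, 40);
    INSERT INTO consumer VALUES (10, 1), (11, 2), (12, 3), (13, 3);
""")

# Strategy 1: a single statement with a subselect.
with_subselect = conn.execute("""
    SELECT id FROM consumer
    WHERE provider_id IN (SELECT id FROM provider WHERE age > 10)
    ORDER BY id
""").fetchall()

# Strategy 2: fetch the ids first (like list(values_list(...))),
# then send them back as explicit values in a second statement.
provider_ids = [row[0] for row in
                conn.execute("SELECT id FROM provider WHERE age > 10")]
placeholders = ",".join("?" * len(provider_ids))
with_explicit_ids = conn.execute(
    f"SELECT id FROM consumer WHERE provider_id IN ({placeholders}) "
    f"ORDER BY id",
    provider_ids,
).fetchall()
```

Both return the same rows; which one is faster depends on the database and on how many ids the first query produces.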

I agree with Wilduck. However, a couple of notes:
You can combine filters such as these into one, like this:
consumers = Consumer.objects.filter(consumer__age__gt=10)
This would give you the same result set in a single query.
Second, to analyze the generated query, you can use the .query attribute of the QuerySet.
Example:
print(Provider.objects.filter(age__gt=10).query)
This prints the query the ORM would generate to fetch the result set.

How can I reuse objects in django without adding more queries?

In the following code, Django performs another query for every amount = u.filter(email__icontains=email).count() call. How can I avoid these queries?
u = User.objects.all()
shares = Share.objects.all()
for o in shares:
    email = o.email
    type = "CASH"
    amount = u.filter(email__icontains=email).count()

This whole piece of code is very inefficient, and some more context would help.
What do you need u = User.objects.all() for?
Calling filter() only specifies the criteria for the records you want; it is the count() call on the filtered QuerySet that runs a query, once per loop iteration. How else are you supposed to get the records matching your conditions if not by running a DB query? If you want Django not to run a DB query, then you probably don't know what you are doing.
Filtering with filter(email__icontains=email) is very inefficient: the database typically can't use an ordinary index for a substring match, so the query will be slow. Can't you just replace it with filter(email=email)?
Running a bunch of queries in a loop is suboptimal.
So again, some context about what you are trying to do would be helpful, as someone could then find a better solution for your problem.
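
If an exact match on the email is enough, as suggested above, the per-share counts can be computed in memory after reading the user emails once. This is only a sketch under that assumption: the sample addresses are made up, and matching is done case-insensitively by lower-casing both sides, which is not the same as the original substring (icontains) match.

```python
from collections import Counter

# Stand-ins for emails read once from the User and Share tables.
user_emails = ["a@x.com", "A@x.com", "b@x.com"]
share_emails = ["a@x.com", "c@x.com"]

# Count users per (lower-cased) email in a single pass.
users_per_email = Counter(email.lower() for email in user_emails)

# Look up each share's count without touching the database again.
# A Counter returns 0 for keys it has never seen.
amounts = {email: users_per_email[email.lower()] for email in share_emails}
```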

Django query based on FK — get all, not any

I need to find orders in which all order items have status = completed. It looks like this:
FINISHED_STATUSES = [17,18,19]
if active_tab == 'outstanding':
    orders = orders.exclude(items__status__in=FINISHED_STATUSES)
However, this query works on orders with any order item in a completed status. How would I write the query so that I retrieve only those orders in which ALL order items have a completed status?

I think that you need to do a raw query here.
Assuming your order and item models are named Orders and Items:
# raw query: group the items by order and keep only the orders that
# have no item whose status falls outside FINISHED_STATUSES
sql = """\
select `orders`.* from `%(orders_table)s` as `orders`
join `%(items_table)s` as `items`
on `items`.`%(item_order_fk)s` = `orders`.`%(order_pk)s`
group by `orders`.`%(order_pk)s`
having sum(case when `items`.`%(status_field)s` in (%(status_list)s)
then 0 else 1 end) = 0;
""" % {
    "orders_table": Orders._meta.db_table,
    "items_table": Items._meta.db_table,
    "order_pk": Orders._meta.pk.column,
    "item_order_fk": Items._meta.get_field("order").column,
    "status_field": Items._meta.get_field("status").column,
    "status_list": str(FINISHED_STATUSES)[1:-1],
}
orders = Orders.objects.raw(sql)

I was able to get this done in a somewhat hackish way. First, I added an additional Boolean column, is_finished. Then, to find orders with at least one non-finished item:
orders = orders.filter(items__status__is_finished=False)
This gives me all unfinished orders.
Doing the opposite of that gets the finished orders:
orders = orders.exclude(items__status__is_finished=False)

Adding the boolean field is a good idea. That way you have your business rules clearly defined in the model.
Now, let's say that you still wanted to do it without resorting to adding fields. This may very well be a requirement given a different set of circumstances. Unfortunately, older versions of the Django ORM can't really express subqueries or arbitrary joins (the Subquery and Exists expressions only arrived in Django 1.11). You could, however, build up Q objects and make an implicit join in the having clause using filter() and annotate().
from django.db.models.aggregates import Count
from django.db.models import Q
from functools import reduce
from operator import or_
total_items_by_orders = Orders.objects.annotate(
    item_count=Count('items'))
finished_items_by_orders = Orders.objects.filter(
    items__status__in=FINISHED_STATUSES).annotate(
    item_count=Count('items'))
orders = total_items_by_orders.exclude(
    reduce(or_, (Q(id=o.id, item_count=o.item_count)
                 for o in finished_items_by_orders)))
Note that using raw SQL, while less elegant, would usually be more efficient.
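
To see the difference between "all items finished" and "any item finished" on concrete data, here is a small sketch using the standard-library sqlite3 module. The tables and sample rows are made up for illustration; FINISHED_STATUSES is taken from the question. The query groups items by order and keeps the orders that have no item outside the finished statuses.

```python
import sqlite3

FINISHED_STATUSES = [17, 18, 19]

conn = sqlite3.connect(":memory:")
# Hypothetical tables: order 1 is fully finished, order 2 is mixed,
# order 3 has no finished item at all.
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY);
    CREATE TABLE items (id INTEGER PRIMARY KEY, order_id INTEGER,
                        status INTEGER);
    INSERT INTO orders VALUES (1), (2), (3);
    INSERT INTO items VALUES
        (1, 1, 17), (2, 1, 18), (3, 2, 17), (4, 2, 5), (5, 3, 4);
""")

placeholders = ",".join("?" * len(FINISHED_STATUSES))
# Each item contributes 0 if finished and 1 otherwise; the per-order sum
# is the number of unfinished items in that order.
unfinished_count = (
    f"SUM(CASE WHEN status IN ({placeholders}) THEN 0 ELSE 1 END)"
)

# ALL items finished: no unfinished item in the group.
finished_orders = [row[0] for row in conn.execute(
    f"SELECT order_id FROM items GROUP BY order_id "
    f"HAVING {unfinished_count} = 0 ORDER BY order_id",
    FINISHED_STATUSES,
)]

# Outstanding: at least one unfinished item in the group.
outstanding_orders = [row[0] for row in conn.execute(
    f"SELECT order_id FROM items GROUP BY order_id "
    f"HAVING {unfinished_count} > 0 ORDER BY order_id",
    FINISHED_STATUSES,
)]
```

Order 2 has one finished item but is still outstanding, which is exactly the case an "any" style filter gets wrong.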
