Django pagination: what if I have 1 million rows?

The official Django Documentation gives us something like this:
from django.core.paginator import Paginator
my_list = MyModel.objects.all()
p = Paginator(my_list, 10)
But what if I have to paginate 1 million rows? It seems inefficient to load all 1 million rows with MyModel.objects.all() every time I want to view a single paginated page.
Is there a more efficient way to do this without calling objects.all() just to get simple pagination?

MyModel.objects.all() doesn't actually load all of the objects. It could potentially load all of them, but until you actually perform an action that requires it to be evaluated, it won't do anything.
The Paginator will almost certainly add limits to that query set. For example, slicing it with array notation creates a new query set:
my_list = MyModel.objects.all()
smaller_list = my_list[100:200]
That creates a different query set, which requests only 100 items from the database. Similarly, calling .count() on the original query set just asks the database for the number of rows in the table.
You would have to do something that requires all of the objects to be retrieved, such as calling
list(my_list)
to transfer all 1,000,000 rows from the database to Python.
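The page-to-slice translation Paginator performs is just offset arithmetic, which is why the table size doesn't matter. A minimal sketch (page_bounds is a made-up helper name; the real Paginator also validates and clamps the page number):

```python
def page_bounds(page_number, per_page):
    # Translate a 1-based page number into the slice Paginator applies
    # to the queryset, which the ORM turns into LIMIT/OFFSET in SQL.
    bottom = (page_number - 1) * per_page
    top = bottom + per_page
    return bottom, top

# Page 3 at 10 per page touches only rows 20..29, however large the table is.
page_bounds(3, 10)  # -> (20, 30)
```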

Making complex query with django models

I created a view in my database with 6 joins and 10 columns; at the moment it holds around 86,000 rows.
I query all the rows with objects.all() and then filter according to user interaction (form data sent by POST, then choosing the appropriate .filter(*args) query).
After that I tried to get the length of the queryset using count(), since this method doesn't evaluate the query. But because views don't have indexes on their columns, count() takes too long.
I looked into materializing the view, but that isn't possible in MySQL.
Then I searched for a way to replace the initial .all() by expressing the 6 joins and filter arguments in Django rather than creating a view, so the indexes would still be available, but I couldn't find a solution.
Or maybe I could combine every row from the view with another table, so I can use the other table's index for faster querying?
SELECT * FROM View LEFT JOIN Table ON (View.id = Table.id)
I appreciate every answer
Try this:
from django.db import models

# I think this matches your table structure:
class Table(models.Model):
    pass

class View(models.Model):
    table = models.ForeignKey(to=Table, on_delete=models.CASCADE)

qs = View.objects.select_related('table').filter(table__isnull=True)
for row in qs:
    print(row)
Thanks!

What is the difference between with_entities and load_only in SQLAlchemy?

When querying my database, I only want to load specified columns. Creating a query with with_entities requires a reference to the model column attribute, while creating a query with load_only requires a string corresponding to the column name. I would prefer to use load_only because it is easier to create a dynamic query using strings. What is the difference between the two?
load_only documentation
with_entities documentation
There are a few differences. The most important one when discarding unwanted columns (as in the question) is that load_only still results in the creation of full objects (Model instances), while with_entities just gets you tuples with the values of the chosen columns.
>>> query = User.query
>>> query.options(load_only('email', 'id')).all()
[<User 1 using e-mail: n@d.com>, <User 2 using e-mail: n@d.org>]
>>> query.with_entities(User.email, User.id).all()
[('n@d.com', 1), ('n@d.org', 2)]
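To make the tuple-vs-object distinction concrete, here is a sketch using plain sqlite3 (not SQLAlchemy) of roughly the SELECT both options emit; the table and values are made up for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user (id INTEGER PRIMARY KEY, email TEXT, avatar BLOB)")
conn.execute("INSERT INTO user (email, avatar) VALUES ('n@d.com', x'00')")

# Both options limit the SELECT to the named columns; the difference is
# what happens to the rows afterwards:
rows = conn.execute("SELECT email, id FROM user").fetchall()
# with_entities hands these tuples back as-is; load_only instead uses them
# to build User instances, with the avatar column deferred.
```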
load_only
load_only() defers the loading of particular columns from your models.
It removes those columns from the query. You can still access the deferred columns later, but an additional query will be performed in the background when you do.
load_only is useful when you store things like user pictures in your database but do not want to waste time transferring the images when they are not needed. For example, when displaying a list of users this might suffice:
User.query.options(load_only('name', 'fullname'))
with_entities
with_entities() can either add or remove (simply: replace) models or columns; you can even use it to replace the selected entities with your own expression, like func.count():
query = User.query
count_query = query.with_entities(func.count(User.id))
count = count_query.scalar()
Note that the resulting query is not the same as that of query.count(), which would probably be slower, at least in MySQL, since it generates a subquery.
Another example of the extra capabilities of with_entities would be:
query = (
    Page.query
    .filter(<a lot of page filters>)
    .join(Author).filter(<some author filters>)
)
pages = query.all()
# OK, I got the pages. Wait, what? I want the authors too!
# How do I get them without building the query again?
pages_and_authors = query.with_entities(Page, Author).all()

query.group_by in Django 1.9

I am moving code from Django 1.6 to 1.9.
In 1.6 I had this code
models.py
class MyReport(models.Model):
    group_id = models.PositiveIntegerField(blank=False, null=False)
views.py
query = MyReport.objects.filter(owner=request.user).query
query.group_by = ['group_id']
entries = QuerySet(query=query, model=MyReport)
The query would return one object for each group_id; because of the way I use it, any table row with that group_id will do as a representative.
With 1.9 this code is broken. After the second line above, the query is:
SELECT "reports_myreport"."group_id", ... etc FROM "reports_myreport" WHERE "reports_myreport"."owner_id" = 1 GROUP BY "reports_myreport"."group_id", "reports_report"."otherfield", ...
Basically it lists all the table fields in the group by clause, making the query return the whole table.
Even though in the debugger I see
query.group_by = ['group_id']
query.group_by doesn't appear to have become a method in 1.9, nor do the 1.7-1.9 changelogs suggest that anything changed.
Is there a better way - not depending on internal Django stuff - I can use for my query?
Any way to fix my current query?
You can use order_by() to get the results ordered; in that same query you can also order by a second criterion.
If you want the groups themselves, you will need to iterate over the collection and collect those values.
If you consume all of the results returned by the query, you can consider:
a) itertools.groupby, which does the grouping in memory instead; you should not use it for large data sets.
b) Manager.raw(), but then you need to write SQL inside Django, like this:
for report in MyReport.objects.raw('SELECT * FROM reporting_report GROUP BY group_id'):
    print(report)
This works for large data sets, but you could lose compatibility with some database engines.
Bonus: I recommend making sure you understand exactly what the old code did before rewriting it.
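For option (a), a sketch of the in-memory group-by with itertools.groupby; Report here is a stand-in for already-fetched model instances:

```python
from itertools import groupby
from operator import attrgetter

class Report:  # stand-in for MyReport rows already fetched from the DB
    def __init__(self, group_id, name):
        self.group_id = group_id
        self.name = name

reports = [Report(2, 'c'), Report(1, 'a'), Report(1, 'b')]
# groupby only groups *consecutive* items, so sort by the key first
reports.sort(key=attrgetter('group_id'))
# keep one representative row per group_id, as the old group_by code did
representatives = [next(grp) for _, grp in groupby(reports, key=attrgetter('group_id'))]
```

Note the memory cost: this evaluates the whole queryset, which is why it is only suitable for result sets that fit comfortably in memory.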

Django ORM values_list with '__in' filter performance

What is the preferred way to filter query set with '__in' in Django?
providers = Provider.objects.filter(age__gt=10)
consumers = Consumer.objects.filter(consumer__in=providers)
or
providers_ids = Provider.objects.filter(age__gt=10).values_list('id', flat=True)
consumers = Consumer.objects.filter(consumer__in=providers_ids)
These should be totally equivalent. Under the hood, Django will optimize both of these to a subselect query in SQL. See the QuerySet API reference on in:
This queryset will be evaluated as a subselect statement:
SELECT ... WHERE consumer.id IN (SELECT id FROM provider WHERE age > 10)
However, you can force a lookup based on explicit primary-key values by calling list() on your values_list, like so:
providers_ids = list(Provider.objects.filter(age__gt=10).values_list('id', flat=True))
consumers = Consumer.objects.filter(consumer__in=providers_ids)
This can be more performant in some cases, for example when you have few providers, but it is totally dependent on what your data is like and what database you are using. See the "Performance considerations" note in the link above.
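The difference between the two shapes can be sketched with plain sqlite3 (table and column names are made up to mirror the question; Django's actual SQL will differ in detail):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE provider (id INTEGER PRIMARY KEY, age INTEGER);
    CREATE TABLE consumer (id INTEGER PRIMARY KEY, consumer_id INTEGER);
    INSERT INTO provider VALUES (1, 5), (2, 20);
    INSERT INTO consumer VALUES (10, 1), (20, 2);
""")

# Passing the lazy queryset to __in -> one query containing a subselect:
subselect = conn.execute(
    "SELECT id FROM consumer WHERE consumer_id IN"
    " (SELECT id FROM provider WHERE age > 10)"
).fetchall()

# Calling list(...) first -> two round trips, with the ids inlined
# into the second query:
ids = [r[0] for r in conn.execute("SELECT id FROM provider WHERE age > 10")]
placeholders = ",".join("?" * len(ids))
explicit = conn.execute(
    f"SELECT id FROM consumer WHERE consumer_id IN ({placeholders})", ids
).fetchall()
```

Both return the same rows; which is faster depends on the size of the id list and the database's subquery planner.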
I agree with Wilduck. However, a couple of notes.
You can combine filters such as these into one, like this:
consumers = Consumer.objects.filter(consumer__age__gt=10)
This would give you the same result set - in a single query.
Second, to analyze the generated query, you can use the .query attribute.
Example:
print(Provider.objects.filter(age__gt=10).query)
This prints the SQL the ORM would generate to fetch the result set.

How to pull a value from a queryset while going to the database only once (Django)

I have a function that sets up the data for another method. It does this to limit calls to the database.
The setup method looks like so:
def get_customers(request):
    customer_list = Customer.objects.filter(pk=request.user)
    populated_customer = get_customer(request, customer_list)
The method that does the processing looks like this:
def get_customer(request, customer_list):
    for customer in customer_list:
        if customer.id == 3:
            pass  # do something with this customer
Instead of looping to find the customer I need, how can I pull it out of the list without going back to the database? I am dealing with millions of records.
It seems what you are doing here is pulling all the user's records into Python (Django) memory, then filtering them by looping through the QuerySet.
A better approach is to chain these queries in the database, which the Django QuerySet language supports.
It would be wise to look at the documentation on chaining filter methods, but a sample query might look like this:
customer_list = Customer.objects.filter(pk=request.user).filter(id=3)
Note that for plain fields on the same model, this chained form generates the same SQL as combining the conditions in a single call; the two forms only differ when filtering across multi-valued relationships:
customer_list = Customer.objects.filter(pk=request.user, id=3)
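Separately, if the rows really are already in memory and you just want to avoid rescanning the list on every lookup, you can index them once by id. A sketch with stand-in objects (in the view, these would be the fetched customer_list):

```python
from types import SimpleNamespace

# Hypothetical already-fetched rows standing in for Customer instances
customers = [SimpleNamespace(id=1), SimpleNamespace(id=3), SimpleNamespace(id=7)]

# Build the index once; every subsequent lookup is O(1) and hits the DB
# zero additional times.
by_id = {c.id: c for c in customers}
target = by_id.get(3)
```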
