I have a function that sets up the data for another method. It does this to limit calls to the database.
The setup method looks like so:
def get_customers(request):
    customer_list = Customer.objects.filter(pk=request.user)
    populated_customer = get_customer(request, customer_list)
The method that does the processing looks like this:
def get_customer(request, customer_list):
    for customer in customer_list:
        if customer.id == 3:
            # do something with this customer
Instead of looping to find the customer I need, how can I pull it out of the list without going back to the database? I am dealing with millions of records.
It seems you are pulling all of the user's records into Python (Django) memory and then filtering them by looping through the QuerySet.
A better approach might be to chain the filters so that the database does the work, which the Django QuerySet API supports.
It would be wise to read the documentation on chaining filter methods, but a sample query might look like this:
customer_list = Customer.objects.filter(pk=request.user).filter(id=3)
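Note that chaining filter() calls is not always the same as passing several conditions to a single filter() call: the two can differ when the conditions span a multi-valued relationship. For plain fields on one model, as here, the following is equivalent:
customer_list = Customer.objects.filter(pk=request.user, id=3)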
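If the rows are already in memory and several lookups by id are needed, one option is to index the list once in a dictionary instead of looping for every lookup. This is only a minimal sketch: the Customer dataclass is a stand-in for the Django model, and index_by_id is a hypothetical helper, not part of Django.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    # Stand-in for the Django model (assumption); only the fields used here.
    id: int
    name: str


def index_by_id(customers):
    """Hypothetical helper: build a dict mapping id -> customer in one pass."""
    return {customer.id: customer for customer in customers}


customer_list = [Customer(1, "Ann"), Customer(3, "Bob"), Customer(7, "Cy")]
customers_by_id = index_by_id(customer_list)

# Dictionary lookup instead of a loop; .get() returns None if the id is absent.
customer = customers_by_id.get(3)
```

Building the index still walks the list once, so for millions of rows filtering in the database is likely to remain the cheaper route.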
I'm following the tutorial here: https://github.com/Jastor11/phresh-tutorial/tree/tutorial-part-11-marketplace-functionality-in-fastapi/backend/app and I had a question: I want to filter a model by different parameters so how would I do that?
The current situation is that I have a list of doctors and so I get all of them. Then depending on the filter query parameters, I filter doctors. I can't just do it all in one go because these query parameters are optional.
So I was thinking of something like this (pseudocode):
all_doctors = await self.db.fetch_all(query=GET_ALL_DOCTORS)
if language_id:
    all_doctors = all_doctors.filter(d => d.language_id == language_id)
if area:
    all_doctors = all_doctors.xyzabc
I'm trying out FastAPI according to that tutorial and couldn't figure out how to do this.
I have defined a model file for different models and am using SQLAlchemy.
One way I thought of is to get the ids of all the doctors and then, at each filtering step, pass the ids from the previous step into another SQL query. But that filters in the database and costs one extra query per filter parameter. I want to know how to use the ORM to filter in memory.
EDIT: So basically, in the tutorial I was following, no SQLAlchemy models were defined. The tutorial was using SQL statements. Anyways, to answer my own question: I would first need to define SQLAlchemy models before I can use them.
Each operation on a SQLAlchemy Query object returns a new Query, so you can keep building the query conditionally inside if statements:
query = db_session.query(Doctor)
if language_id:
    query = query.filter(Doctor.language_id == language_id)
if area_id:
    query = query.filter(Doctor.area_id == area_id)
return query.all()
The query doesn't run until you call all() at the end. If neither argument is given, you'll get all the doctors.
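Since the tutorial in question uses raw SQL rather than SQLAlchemy models, the same conditional-building idea can be sketched with a parameterized SQL string. This is an illustration only: it uses the standard-library sqlite3 module rather than the tutorial's database layer, and the doctors table, its columns, and the find_doctors function are all assumptions made for the example.

```python
import sqlite3


def find_doctors(conn, language_id=None, area_id=None):
    """Return doctor rows, applying only the filters that were supplied."""
    clauses = []
    params = []
    if language_id is not None:
        clauses.append("language_id = ?")
        params.append(language_id)
    if area_id is not None:
        clauses.append("area_id = ?")
        params.append(area_id)

    sql = "SELECT id, name FROM doctors"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    sql += " ORDER BY id"
    # A single query is sent, however many filters were added.
    return conn.execute(sql, params).fetchall()


# Hypothetical schema and data, just to exercise the function.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE doctors (id INTEGER PRIMARY KEY, name TEXT, "
    "language_id INTEGER, area_id INTEGER)"
)
conn.executemany(
    "INSERT INTO doctors VALUES (?, ?, ?, ?)",
    [(1, "Adams", 1, 10), (2, "Baker", 2, 10), (3, "Clark", 1, 20)],
)

all_doctors = find_doctors(conn)
filtered = find_doctors(conn, language_id=1, area_id=10)
```

Only the clauses and the parameter list are assembled conditionally; the values themselves are always passed as bound parameters, never interpolated into the SQL string.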
I am moving code from Django 1.6 to 1.9.
In 1.6 I had this code
models.py
class MyReport(models.Model):
    group_id = models.PositiveIntegerField(blank=False, null=False)
views.py
query = MyReport.objects.filter(owner=request.user).query
query.group_by = ['group_id']
entries = QuerySet(query=query, model=MyReport)
The query would return one object for each 'group_id'; due to the way I use it, any table row with the group_id would do as a representative.
With 1.9 this code is broken. The query after the second line above is:
SELECT "reports_myreport"."group_id", ... etc FROM "reports_myreport" WHERE "reports_myreport"."owner_id" = 1 GROUP BY "reports_myreport"."group_id", "reports_report"."otherfield", ...
Basically it lists all the table fields in the group by clause, making the query return the whole table.
This happens even though in the debugger I see
query.group_by = ['group_id']
query.group_by does not appear to be a method in 1.9, nor do the changelogs for 1.7 through 1.9 suggest that anything changed.
Is there a better way - not depending on internal Django stuff - I can use for my query?
Any way to fix my current query?
You can use order_by() to get the results ordered; in the same query you can also order by a second criterion.
If you want to get the groups, you will need to iterate over the collection to retrieve those values.
If you consume all of the results returned by the query, you can consider:
a) itertools.groupby which makes an in-memory group by instead, but you should not use it for large data sets.
b) Another option is to use Manager.raw() but you will need to write SQL inside Django, like this:
for report in MyReport.objects.raw('SELECT * FROM reporting_report GROUP BY group_id'):
    print(report)
This will work for large data sets, but you could lose compatibility with some database engines.
Bonus: I recommend understanding exactly what the old code did before rewriting it.
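Option a) can be sketched as follows. The rows here are plain dictionaries standing in for the query results (an assumption for illustration); the point is that itertools.groupby only merges adjacent items, so the data has to be sorted by the grouping key first.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical rows standing in for the MyReport query results.
rows = [
    {"id": 1, "group_id": 20},
    {"id": 2, "group_id": 10},
    {"id": 3, "group_id": 20},
    {"id": 4, "group_id": 10},
]

# groupby only groups consecutive items, so sort by the key first.
rows.sort(key=itemgetter("group_id"))

# Keep the first row of each group as its representative.
representatives = [
    next(group) for _, group in groupby(rows, key=itemgetter("group_id"))
]
```

With a queryset, the sort step would correspond to order_by('group_id'), so the rows arrive from the database already grouped.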
What is the preferred way to filter query set with '__in' in Django?
providers = Provider.objects.filter(age__gt=10)
consumers = Consumer.objects.filter(consumer__in=providers)
or
providers_ids = Provider.objects.filter(age__gt=10).values_list('id', flat=True)
consumers = Consumer.objects.filter(consumer__in=providers_ids)
These should be totally equivalent. Under the hood, Django will optimize both of these to a subselect query in SQL. See the QuerySet API reference entry for the in lookup:
This queryset will be evaluated as subselect statement:
SELECT ... WHERE consumer.id IN (SELECT id FROM ... WHERE _ IN _)
However, you can force a lookup with explicit primary key values by calling list() on your values_list, like so:
providers_ids = list(Provider.objects.filter(age__gt=10).values_list('id', flat=True))
consumers = Consumer.objects.filter(consumer__in=providers_ids)
This could be more performant in some cases, for example, when you have few providers, but it will be totally dependent on what your data is like and what database you're using. See the "Performance Considerations" note in the link above.
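The difference between the two strategies can be seen at the SQL level. The sketch below uses the standard-library sqlite3 module with a made-up provider/consumer schema (the table and column names are assumptions, not what Django would generate) to compare a subselect against a two-step lookup with explicit ids.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE provider (id INTEGER PRIMARY KEY, age INTEGER);
    CREATE TABLE consumer (id INTEGER PRIMARY KEY, provider_id INTEGER);
    INSERT INTO provider VALUES (1, 5), (2, 15), (3, 30);
    INSERT INTO consumer VALUES (1, 1), (2, 2), (3, 3), (4, 2);
    """
)

# One round trip: the database evaluates the inner SELECT itself.
subselect = conn.execute(
    "SELECT id FROM consumer WHERE provider_id IN "
    "(SELECT id FROM provider WHERE age > 10) ORDER BY id"
).fetchall()

# Two round trips: fetch the ids first, then pass them as explicit values.
provider_ids = [
    row[0] for row in conn.execute("SELECT id FROM provider WHERE age > 10")
]
placeholders = ", ".join("?" for _ in provider_ids)
explicit = conn.execute(
    f"SELECT id FROM consumer WHERE provider_id IN ({placeholders}) ORDER BY id",
    provider_ids,
).fetchall()
```

Both forms return the same rows; which one is faster depends on the size of the id list and on the database engine.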
I agree with Wilduck. However, a couple of notes.
First, you can combine filters such as these into one, like this:
consumers = Consumer.objects.filter(consumer__age__gt=10)
This would give you the same result set - in a single query.
Second, to analyze the generated query, you can print the queryset's .query attribute.
Example:
print(Provider.objects.filter(age__gt=10).query)
This prints the SQL the ORM would generate to fetch the result set.
The official Django Documentation gives us something like this:
from django.core.paginator import Paginator
my_list = MyModel.objects.all()
p = Paginator(my_list, 10)
But what if I have to paginate one million rows? It is not efficient to load one million rows with MyModel.objects.all() every time I want to view a single page.
Is there a more efficient way to paginate without calling objects.all()?
MyModel.objects.all() doesn't actually load all of the objects. It could potentially load all of them, but until you actually perform an action that requires it to be evaluated, it won't do anything.
The Paginator will almost certainly add limits to that query set. For example, using array-slicing notation, it can create a new object, like this:
my_list = MyModel.objects.all()
smaller_list = my_list[100:200]
That creates a different query set, which only requests 100 items from the database. Similarly, calling .count() on the original query set just instructs the database to return the number of rows in the table.
You would have to do something that requires all of the objects to be retrieved, like calling
list(my_list)
to get 1000000 rows to be transferred from the database to Python.
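At the SQL level, a slice such as my_list[100:200] becomes a LIMIT/OFFSET clause, and .count() becomes a COUNT(*) query. A rough sketch of what pagination amounts to, using the standard-library sqlite3 module; the my_model table and the fetch_page function are hypothetical and only illustrate the idea:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_model (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO my_model VALUES (?)", [(i,) for i in range(1, 1001)])


def fetch_page(conn, page_number, per_page):
    """Fetch one page; only per_page rows are returned by the database."""
    offset = (page_number - 1) * per_page
    return conn.execute(
        "SELECT id FROM my_model ORDER BY id LIMIT ? OFFSET ?",
        (per_page, offset),
    ).fetchall()


# The total is computed by the database, without transferring the rows.
total = conn.execute("SELECT COUNT(*) FROM my_model").fetchone()[0]

# Equivalent in spirit to slicing [100:200] on the full query set.
page_two = fetch_page(conn, page_number=2, per_page=100)
```

Each page request therefore costs one small query plus a count, regardless of how many rows the table holds.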
I have a model that looks something like this:
class Item(models.Model):
    name = models.CharField()
    type = models.CharField()
    tags = models.ManyToManyField(Tags)
I have a view that presents a list of Items based on type, so it contains a query like:
items = Item.objects.filter(type='type_a')
So that's easy and straightforward. Now I have an additional requirement for the view. To fulfill it, I need to build a dictionary that relates Tags to Items. The output I am looking for would be something like:
{
    'tag1': [item1, item2, item5],
    'tag2': [item1, item4],
    'tag3': [item3, item5]
}
What would be the most efficient way to do this? Is there any way to do this without going to the database with a new query for each tag?
You can look at prefetch_related; it might help you:
This has a similar purpose to select_related, in that both are designed to stop the deluge of database queries that is caused by accessing related objects, but the strategy is quite different... prefetch_related, on the other hand, does a separate lookup for each relationship, and does the ‘joining’ in Python. This allows it to prefetch many-to-many and many-to-one objects, which cannot be done using select_related...
So in the end you will either do multiple queries or use prefetch_related and it will do some Python joins on the objects.
You might do something like this:
# This should require two database queries, one for the items
# and one for all the associated tags.
items = Item.objects.filter(type='type_a').prefetch_related('tags')
# Now massage the data into your desired data structure.
from collections import defaultdict
tag_dict = defaultdict(list)
for item in items:
    # Thanks to prefetch_related this will not hit the database.
    for tag in item.tags.all():
        tag_dict[tag].append(item)