Efficient Multiple Row and Columns Lookup Django - python

I need an efficient query that can enable lookup on multiple table rows and different columns. I have the below model:
class Vahala(models.Model):
    tourist = models.ForeignKey(User)
    name = models.CharField(max_length=100)
    can_visit = models.BooleanField(default=False)
    can_move_out = models.BooleanField(default=False)
I need to confirm if Mr A who is a tourist can visit a location named 'bali' and can also move out of a location named 'cape_verde'.
I believe the naive approach would be:
check_a = Vahala.objects.filter(name='bali', can_visit=True, tourist__email='mra#mail.com')
check_b = Vahala.objects.filter(name='cape_verde', can_move_out=True,
                                tourist__email='mra#mail.com')
Both check_a.exists() and check_b.exists() must be True before Mr A can complete the process.
I need an efficient approach. I don't want to keep hitting the database multiple times. Is it possible to confirm the conditions with a single DB hit, or at most two if there are many conditions? What am I missing?

You can filter from the User side and chain filters to make successive joins on the related model:
# In filter lookups the default reverse name is the lowercased model name
# ("vahala"), not the "vahala_set" accessor used on instances.
queryset = User.objects.filter(
    email='mra#mail.com'
).filter(
    vahala__name='bali',
    vahala__can_visit=True
).filter(
    vahala__name='cape_verde',
    vahala__can_move_out=True
)
This will perform a join between User, Vahala and Vahala (twice, due to chained filter on the related model) and give you only users that match the given conditions.
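To make the double join concrete, here is a minimal sketch of roughly the SQL that such chained filters correspond to, run against an in-memory SQLite database. The table names, column names, sample rows, and the helper `can_complete` are assumptions made for illustration; they are not what Django would generate verbatim.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user (id INTEGER PRIMARY KEY, email TEXT);
    CREATE TABLE vahala (
        id INTEGER PRIMARY KEY,
        tourist_id INTEGER REFERENCES user(id),
        name TEXT,
        can_visit INTEGER,
        can_move_out INTEGER
    );
    INSERT INTO user VALUES (1, 'a@example.com'), (2, 'b@example.com');
    INSERT INTO vahala VALUES
        (1, 1, 'bali', 1, 0),
        (2, 1, 'cape_verde', 0, 1),
        (3, 2, 'bali', 1, 0);
""")

# One query, two joins against the same table: each alias must find its own row.
SQL = """
    SELECT u.id
    FROM user AS u
    JOIN vahala AS v1 ON v1.tourist_id = u.id
    JOIN vahala AS v2 ON v2.tourist_id = u.id
    WHERE u.email = ?
      AND v1.name = 'bali' AND v1.can_visit = 1
      AND v2.name = 'cape_verde' AND v2.can_move_out = 1
"""

def can_complete(email):
    """Return True when both conditions hold for the user (hypothetical helper)."""
    return conn.execute(SQL, (email,)).fetchone() is not None

print(can_complete('a@example.com'))
print(can_complete('b@example.com'))
```

On the Django side, calling `.exists()` on the chained queryset would give the same yes/no answer in a single database hit.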

Related

Django models' best solution for this reverse ForeignKey

I'm making a personal project to manage restaurants.
Two of my models are causing a problem: DiningRoom and Table.
DiningRoom represents any area that the restaurant could have (e.g. one area inside and another area on the terrace of the building).
In every DiningRoom we can set a layout of Tables.
So, the most object-oriented way I can find to map this is a many-to-one relationship (ForeignKey), since one DiningRoom can have many Tables and one Table can be in only one DiningRoom. Right?
So my models are:
from django.core.validators import MaxValueValidator
class DiningRoom(models.Model):
    account = models.ForeignKey(Account, on_delete=models.CASCADE, null=False)
    name = models.CharField(max_length=50, null=False, blank=False)
    # rows and cols are for the room's grid layout.
    # IntegerField has no `max` argument; a validator enforces the upper bound.
    rows = models.IntegerField(validators=[MaxValueValidator(15)], null=False)
    cols = models.IntegerField(validators=[MaxValueValidator(15)], null=False)
class Table(models.Model):
    row = models.IntegerField(validators=[MaxValueValidator(15)], null=False)  # The row in the room's grid where the table is.
    col = models.IntegerField(validators=[MaxValueValidator(15)], null=False)  # The column in the room's grid where the table is.
    dining_room = models.ForeignKey(DiningRoom, on_delete=models.CASCADE, null=False)  # Here is the problem.
The problem is that when I query the account's DiningRooms in the view, I also need to fetch the Tables related to each DiningRoom in the queryset result.
def dining_rooms(request):
    try:
        account = Account.objects.get(id=request.session['account_id'])
    except Account.DoesNotExist:
        return response(request, "error.html", {'error': 'Account.DoesNotExist'})
    dining_rooms = DiningRoom.objects.filter(account=account)
But I also need the Tables for each result in dining_rooms!
I found two possible solutions, but neither seems "correct" to me. One is to make a many-to-many relationship and validate in the view that every Table is in only one DiningRoom. The second, and worse, is to fetch the Tables once for each DiningRoom in the queryset (but imagine a restaurant with 5 or 6 different areas (DiningRooms): that would mean hitting the database six times on every request).
Doing it the other way around, fetching all the Tables with select_related DiningRooms, is not possible, since a DiningRoom may have no Tables in it (and in that case we would be missing DiningRooms).
What could be the best way to handle this? Thanks!
You can follow the relationship backwards through the reverse accessor. An acceptable solution would be to add a method called associated_tables() to the DiningRoom model that returns the related tables through that accessor. When no related_name is set, the accessor is the lowercased name of the child model followed by the suffix _set (in this case, table_set).
class DiningRoom(models.Model):
    # your fields
    def associated_tables(self):
        return self.table_set.all()
In addition, this video tutorial could clear up your doubts and give you a better idea about reverse relationships:
https://youtu.be/7tAZdYRA8Sw
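Calling table_set.all() once per room still costs one query per DiningRoom, which is the cost the question wants to avoid. Django's `prefetch_related('table_set')` on the DiningRoom queryset batches this into two queries: one for the rooms and one for all their tables. The sketch below shows that batching idea with plain SQLite; the table names, column names, and the helper `rooms_with_tables` are assumptions for illustration only.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dining_room (id INTEGER PRIMARY KEY, account_id INTEGER, name TEXT);
    CREATE TABLE room_table (
        id INTEGER PRIMARY KEY,
        dining_room_id INTEGER REFERENCES dining_room(id),
        grid_row INTEGER,
        grid_col INTEGER
    );
    INSERT INTO dining_room VALUES (1, 7, 'inside'), (2, 7, 'terrace'), (3, 7, 'empty');
    INSERT INTO room_table VALUES (1, 1, 0, 0), (2, 1, 0, 1), (3, 2, 2, 2);
""")

def rooms_with_tables(account_id):
    """Map each room name to its table positions using two queries in total."""
    # Query 1: every room of the account, including rooms without tables.
    rooms = conn.execute(
        "SELECT id, name FROM dining_room WHERE account_id = ? ORDER BY id",
        (account_id,),
    ).fetchall()
    room_ids = [room_id for room_id, _ in rooms]
    if not room_ids:
        return {}
    # Query 2: all tables of those rooms in a single IN (...) lookup.
    placeholders = ", ".join("?" for _ in room_ids)
    tables_by_room = defaultdict(list)
    for room_id, grid_row, grid_col in conn.execute(
        "SELECT dining_room_id, grid_row, grid_col FROM room_table "
        f"WHERE dining_room_id IN ({placeholders}) ORDER BY id",
        room_ids,
    ):
        tables_by_room[room_id].append((grid_row, grid_col))
    # Group in Python; rooms with no tables get an empty list.
    return {name: tables_by_room.get(room_id, []) for room_id, name in rooms}

print(rooms_with_tables(7))
```

The number of queries stays at two no matter how many rooms the account has, and empty rooms are not lost.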

Django: remove duplicates (group by) from queryset by related model field

I have a QuerySet with a couple of records, and I want to remove duplicates using a related model field. For example:
class User(models.Model):
    group = models.ForeignKey('Group')
    ...
class Address(models.Model):
    ...
    user = models.ForeignKey('User')
addresses = Address.objects.filter(user__group__id=1).order_by('-id')
This returns a QuerySet of Address records, and I want to group by the User ID.
I can't use .annotate because I need all fields from Address, and the relationship between Address and User.
I can't use .distinct() because all the addresses are already distinct; what I want is one address per distinct user.
I could:
addresses = Address.objects.filter(user__group__id=1).order_by('-id')
unique_users_ids = []
unique_addresses = []
for address in addresses:
    if address.user.id not in unique_users_ids:
        unique_addresses.append(address)
        unique_users_ids.append(address.user.id)
print(unique_addresses)  # TA-DA!
But it seems like too much for a simple thing like a group by (damn you Django).
Is there an easy way to achieve this?
By using .distinct() with a field name
Django also has a .distinct(...) method that accepts the names of the fields that should be unique. Unfortunately, most database backends do not support this (only PostgreSQL, to the best of my knowledge). PostgreSQL also requires the ordering to start with the distinct field, so on PostgreSQL we can write:
# Only PostgreSQL supports this; order_by must start with the distinct field
addresses = (Address.objects
             .filter(user__group__id=1)
             .order_by('user_id', '-id')
             .distinct('user_id'))
By using two queries
Another way to handle this is by first having a query that works over the users, and for each user obtains the largest address_id:
from django.db.models import Max
# In aggregate lookups the default reverse name is "address", not "address_set"
address_ids = (User.objects
               .annotate(address_id=Max('address__id'))
               .filter(address_id__isnull=False)
               .values_list('address_id'))
So now for every user, we have calculated the largest corresponding address_id, and we eliminate Users that have no address. We then obtain the list of ids.
In a second step, we then fetch the addresses:
addresses = Address.objects.filter(pk__in=address_ids)
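The two steps can be sketched in plain SQL to show what each one contributes. This is an illustration under assumed table names, column names, and sample rows, using an in-memory SQLite database; it also restricts step 1 to one group, as the question does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE app_user (id INTEGER PRIMARY KEY, group_id INTEGER);
    CREATE TABLE address (id INTEGER PRIMARY KEY, user_id INTEGER, street TEXT);
    INSERT INTO app_user VALUES (1, 1), (2, 1), (3, 2), (4, 1);
    INSERT INTO address VALUES
        (10, 1, 'old'), (11, 1, 'new'), (12, 2, 'only'), (13, 3, 'other group');
""")

# Step 1: the largest address id per user in group 1.
# The inner join drops users that have no address at all (user 4 here).
latest_ids = [row[0] for row in conn.execute(
    """
    SELECT MAX(a.id)
    FROM app_user AS u
    JOIN address AS a ON a.user_id = u.id
    WHERE u.group_id = ?
    GROUP BY u.id
    """,
    (1,),
)]

# Step 2: fetch the full address rows for those ids, newest first.
placeholders = ", ".join("?" for _ in latest_ids)
addresses = conn.execute(
    f"SELECT id, user_id, street FROM address WHERE id IN ({placeholders}) "
    "ORDER BY id DESC",
    latest_ids,
).fetchall()

print(addresses)
```

Each user contributes at most one row to the final result, and every column of the address is available.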

Optimise Django query with large subquery

I have a database containing Profile and Relationship models. I haven't explicitly linked them in the models (because they are third party IDs and they may not yet exist in both tables), but the source and target fields map to one or more Profile objects via the id field:
from django.db import models
class Profile(models.Model):
id = models.BigIntegerField(primary_key=True)
handle = models.CharField(max_length=100)
class Relationship(models.Model):
id = models.AutoField(primary_key=True)
source = models.BigIntegerField(db_index=True)
target = models.BigIntegerField(db_index=True)
My query needs to get a list of 100 values from the Relationship.source column which don't yet exist as a Profile.id. This list will then be used to collect the necessary data from the third party. The query below works, but as the table grows (10m+), the Subquery is getting very large and slow.
Any recommendations for how to optimise this? Backend is PostgreSQL but I'd like to use native Django ORM if possible.
EDIT: There's an extra level of complexity that will be contributing to the slow query. Not all IDs are guaranteed to return success, which would mean they continue to "not exist" and send the program into an infinite loop. So I've added a filter and order_by that take the highest id from the previous batch of 100 as input. This is going to be causing some of the problem, so apologies for missing it initially.
from django.db.models import Subquery
user = Profile.objects.get(handle="philsheard")
qs_existing_profiles = Profile.objects.all()
rels = Relationship.objects.filter(
    target=user.id,
).exclude(
    source__in=Subquery(qs_existing_profiles.values("id"))
).values_list(
    "source", flat=True
).order_by(
    "source"
).filter(
    source__gt=max_id_from_previous_batch  # An integer representing a previous `Relationship.source` id
)
Thanks in advance for any advice!
For future searchers, here's how I bypassed the __in query and was able to speed up the results.
from django.db.models import OuterRef, Subquery
from django.db.models import Count  # New
user = Profile.objects.get(handle="philsheard")
subq = Profile.objects.filter(id=OuterRef("source"))  # New queryset to use within Subquery
rels = Relationship.objects.order_by(
    "source"
).annotate(
    # Annotate each relationship record with a Count of the times that the "source" ID
    # appears in the `Profile` table. We can then filter on those that have a count of 0
    # (ie don't appear and therefore haven't yet been connected)
    prof_count=Count(Subquery(subq.values("id")))
).filter(
    target=user.id,
    prof_count=0
).filter(
    source__gt=max_id_from_previous_batch  # An integer representing a previous `Relationship.source` id
).values_list(
    "source", flat=True
)
I think this is faster because the query can complete once it reaches its required 100 items (rather than comparing against a list of 1m+ IDs each time).
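The same "source has no matching profile" test is commonly written in SQL as a correlated NOT EXISTS, which Django can express with its `Exists` and `OuterRef` expressions. The sketch below shows the SQL form against an in-memory SQLite database. It is a different technique from the Count annotation above, and the table names, sample rows, and the helper `missing_sources` are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE profile (id INTEGER PRIMARY KEY, handle TEXT);
    CREATE TABLE relationship (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        source INTEGER,
        target INTEGER
    );
    CREATE INDEX relationship_source ON relationship (source);
    INSERT INTO profile VALUES (1, 'philsheard'), (20, 'known');
    INSERT INTO relationship (source, target) VALUES
        (10, 1), (20, 1), (30, 1), (40, 1), (50, 2);
""")

def missing_sources(target_id, after_id, batch_size=100):
    """Return up to batch_size source ids for target_id that have no profile row."""
    rows = conn.execute(
        """
        SELECT r.source
        FROM relationship AS r
        WHERE r.target = ?
          AND r.source > ?          -- resume after the previous batch
          AND NOT EXISTS (SELECT 1 FROM profile AS p WHERE p.id = r.source)
        ORDER BY r.source
        LIMIT ?
        """,
        (target_id, after_id, batch_size),
    )
    return [source for (source,) in rows]

print(missing_sources(1, 0))
print(missing_sources(1, 10, batch_size=1))
```

Because the existence check is evaluated per candidate row, the database can stop as soon as the LIMIT is satisfied instead of materialising the full list of profile ids.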

Django Queryset compare two different models with multiple rows

I have these two models that I would like to return the sum of. I get a database error about the subquery returning more than one row. What would be the best way to compare both without using a for statement?
class AuthorizationT(models.Model):
    ar_id = models.BigIntegerField(blank=True, null=True)
    status_flag = models.BigIntegerField(blank=True, null=True)
class BillT(models.Model):
    paid_id = models.BigIntegerField(blank=True, null=True)
    recvd = models.FloatField(blank=True, null=True)
Query I tried
paidbill = BillT.objects.values_list('paid_id', flat=True)
AuthorizationT.objects.values().filter(ar_id=paidbill, status_flag=0).aggregate(Sum('recvd'))
In SQL I know it would be
select sum(recvd) from authorization_t a, bill_t b where a.ar_billid0= b.paid_id and a.status_flag=0
I'm looking for the equivalent in queryset
I don't think you will be able to achieve this without a for loop, because you need to join the tables: there is filtering on both tables and you want to sum a field from one of them. The usual way to follow joins would be prefetch_related() or select_related(), but they rely on foreign keys.
This leads me to suggest that the id fields, paid_id and ar_id, should be normalized, as it looks like there will be data duplication. Using a relationship would also make the queries simpler.
Since paidbill is a list, you have to use the __in suffix in your query:
AuthorizationT.objects.filter(ar_id__in=paidbill,status_flag=0).aggregate(Sum('recvd'))
If you model the relation between the models (ar_id, paid_id) via a ForeignKey or ManyToMany, you will be able to do this trivially in a single ORM statement.
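Passing a queryset to an __in lookup makes Django emit an IN (SELECT ...) subquery, which accepts any number of rows, unlike the equality comparison that raised the error. The sketch below shows that SQL shape on an in-memory SQLite database. It assumes, as in the models shown in the question, that recvd lives on the bill table, so the sum is taken from that side; table names and sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authorization_t (
        id INTEGER PRIMARY KEY, ar_id INTEGER, status_flag INTEGER
    );
    CREATE TABLE bill_t (id INTEGER PRIMARY KEY, paid_id INTEGER, recvd REAL);
    INSERT INTO authorization_t (ar_id, status_flag) VALUES
        (100, 0), (101, 1), (102, 0);
    INSERT INTO bill_t (paid_id, recvd) VALUES
        (100, 25.0), (101, 40.0), (102, 10.5), (999, 3.0);
""")

# Sum the bills whose paid_id matches any authorization with status_flag = 0.
# IN (...) copes with a multi-row subquery; "=" would not.
total = conn.execute(
    """
    SELECT SUM(b.recvd)
    FROM bill_t AS b
    WHERE b.paid_id IN (
        SELECT a.ar_id FROM authorization_t AS a WHERE a.status_flag = 0
    )
    """
).fetchone()[0]

print(total)
```

Note that, unlike a plain join, the IN form counts each bill once even if several authorizations point at it.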

How to sort a Django queryset by a conditional aggregation of related items?

How can I use the Django ORM to sort all objects in a queryset by a conditional aggregation of related items? Django 1.8 and higher explicitly have conditional aggregation support, I need an answer for Django < 1.8. This will obviously involve 3 or 4 queries. Here are example models an answerer can use to illustrate her answer:
class Group(models.Model):
    owner = models.ForeignKey(User)
    created_at = models.DateTimeField(auto_now_add=True)
class GroupTraffic(models.Model):
    visitor = models.ForeignKey(User)
    which_group = models.ForeignKey(Group)
    time_of_visit = models.DateTimeField(auto_now_add=True)
Users own chat groups, and other users can visit the said chat groups. The question to answer is: How can one produce a sorted list of all groups, such that it's sorted by the unique traffic each group has seen in the last 60 mins? Groups that have seen a lot of unique visitors in the last 60 mins get sorted to the top, groups with almost zero (or zero) such traffic appear at the bottom of the list.
This is a conditional aggregation question because, essentially, we need to annotate to each group object, a Count of all related, unique grouptraffic objects that were logged in the past 60 mins.
Can someone show me how to use the Django ORM to solve this for < 1.8? I already know how to do it for >= 1.8 via conditional aggregation (thanks to this), and I don't want to do it via SQL queries in raw or extra.
Just filter on the time_of_visit and it should work fine:
from datetime import datetime, timedelta
from django.db.models import Count
one_hour_ago = datetime.now() - timedelta(hours=1)
recent_groups = Group.objects.filter(grouptraffic__time_of_visit__gte=one_hour_ago)
visitors = recent_groups.annotate(views=Count('grouptraffic__visitor', distinct=True))
Then get all of the older groups, with an extra field for the empty views:
older_groups = Group.objects.filter(grouptraffic__time_of_visit__lt=one_hour_ago).extra(select={'views': 0})
Then concatenate them together with a pipe:
all_groups = visitors | older_groups
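If combining the two querysets proves awkward, the counting and sorting can also be done in Python once the recent traffic rows have been fetched, for example with `values_list('which_group_id', 'visitor_id', 'time_of_visit')`. The sketch below is a swapped-in, pure Python alternative rather than the ORM approach above; the function name and the sample tuples are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def sort_groups_by_recent_unique_visitors(group_ids, traffic, now,
                                          window=timedelta(hours=1)):
    """Sort group ids by the number of distinct visitors seen within the window.

    traffic is an iterable of (group_id, visitor_id, time_of_visit) tuples.
    """
    cutoff = now - window
    visitors = defaultdict(set)
    for group_id, visitor_id, visited_at in traffic:
        if visited_at >= cutoff:
            visitors[group_id].add(visitor_id)  # a set keeps visitors unique
    # Groups without recent traffic count as 0 and end up at the bottom.
    counts = {group_id: len(visitors.get(group_id, ())) for group_id in group_ids}
    return sorted(group_ids, key=lambda group_id: counts[group_id], reverse=True)

now = datetime(2020, 1, 1, 12, 0)
traffic = [
    (1, 'u1', now - timedelta(minutes=10)),
    (1, 'u1', now - timedelta(minutes=5)),   # repeat visit, counted once
    (1, 'u2', now - timedelta(minutes=20)),
    (2, 'u1', now - timedelta(hours=2)),     # outside the 60 minute window
    (3, 'u3', now - timedelta(minutes=1)),
    (3, 'u4', now - timedelta(minutes=2)),
    (3, 'u5', now - timedelta(minutes=3)),
]
ordered = sort_groups_by_recent_unique_visitors([1, 2, 3], traffic, now)
print(ordered)
```

This costs one query for the group ids and one for the recent traffic, at the price of doing the aggregation in application memory.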
