Comparing model A's field values against model B's fields in Django (Python)

I have 100k records in both model 'A' and model 'B'.
For example:
class A(models.Model):
    user_email = models.EmailField(null=True, blank=True)
    user_mobile = models.CharField(max_length=30, null=True, blank=True)
    book_id = models.CharField(max_length=255, null=True, blank=True)
    payment_gateway_response = JSONField(blank=True, null=True)
class B(models.Model):
    order = models.ForeignKey(A, null=True, blank=True)
    pay_id = models.CharField(max_length=250, null=True, blank=True)
    user_email = models.EmailField(null=True, blank=True)
    user_mobile = models.CharField(max_length=30, null=True, blank=True)
    created = models.DateTimeField(blank=True, null=True)
    total_payment = models.DecimalField(decimal_places=3, max_digits=20, blank=True, null=True)
I want to get B's objects using A's values, for example:
all_a = A.objects.all()
for a in all_a:
    b = B.objects.filter(user_email=a.user_email, user_mobile=a.user_mobile)
This works and I get the results, but with 100k records it takes too much time; the for loop iteration is slow. Is there a faster way to do this in Django?

You can collect a list of each value from A and filter B with those lists.
a = A.objects.all()
emails = list(a.values_list('user_email', flat=True))
mobiles = list(a.values_list('user_mobile', flat=True))
b = B.objects.filter(user_email__in=emails, user_mobile__in=mobiles)
However, the results may include email/mobile combinations that do not occur together in any single A record. If emails and mobiles are unique in A, and the email and mobile in each B come from one of the A records, you won't have any problems.
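The pairing caveat can be seen without a database. Below is a minimal sketch in plain Python, with made-up rows standing in for A and B (the tuples and variable names are hypothetical, not from the question). Filtering on two independent `__in` lists behaves like `loose`; checking the (email, mobile) pair as a unit behaves like `strict`.

```python
# Hypothetical (email, mobile) rows standing in for models A and B.
a_rows = [("x@example.com", "111"), ("y@example.com", "222")]
b_rows = [
    ("x@example.com", "111"),  # a true pair from A
    ("x@example.com", "222"),  # both values occur in A, but not together
    ("z@example.com", "333"),  # unrelated
]

emails = {email for email, _ in a_rows}
mobiles = {mobile for _, mobile in a_rows}

# Equivalent of user_email__in=emails, user_mobile__in=mobiles:
# each column is checked on its own, so mismatched pairs slip through.
loose = [row for row in b_rows if row[0] in emails and row[1] in mobiles]

# Checking the pair as a unit keeps only rows that match a single A record.
pairs = set(a_rows)
strict = [row for row in b_rows if row in pairs]
```

The same pair check could be applied in Python to the B rows returned by the `__in` query, at the cost of doing that last step outside the database.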

If you don't need the A queryset results to be cached, you may see a performance gain from using iterator() (for reference, see https://docs.djangoproject.com/en/1.11/ref/models/querysets/#iterator):
for a in A.objects.all().iterator():
    b = B.objects.filter(user_email=a.user_email, user_mobile=a.user_mobile)

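You can do:
import operator
from functools import reduce  # reduce is not a builtin in Python 3
from django.db.models import Q
# One dict per A row: {'user_email': ..., 'user_mobile': ...}
q = A.objects.all().values('user_email', 'user_mobile')
# OR together one Q(user_email=..., user_mobile=...) per A row
B.objects.filter(reduce(operator.or_, [Q(**i) for i in q]))
If you need to perform some operation on each B object that depends on its matching A object, this is not the way to do it.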
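With 100k rows, a single OR of 100k `Q` objects may produce a statement too large for the database or driver to accept. That limit depends on the backend and is an assumption here, not something stated in the answer. One way to keep each query bounded is to split the values into batches first. A minimal sketch of such a helper follows; `chunked`, the batch size and the sample rows are all hypothetical.

```python
from itertools import islice


def chunked(iterable, size):
    """Yield successive lists of at most `size` items from `iterable`."""
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


# Hypothetical stand-in for A.objects.values('user_email', 'user_mobile').
rows = [{"user_email": f"u{i}@example.com", "user_mobile": str(i)} for i in range(10)]

# Each batch could then feed one OR-of-Q query, so no single query grows unbounded.
batches = list(chunked(rows, 4))
```

The batch size is a trade-off: larger batches mean fewer round trips but bigger statements.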

Related

Solution to filter by Cumulative sum and error: Window is disallowed in the filter clause

This is a follow-up to the question:
Filter Queries which sum of their amount field are greater or lesser than a number
which is supposed to be solved. The answer suggests using a Window function with filter, but this results in an error:
django.db.utils.NotSupportedError: Window is disallowed in the filter clause.
A comment from @atabak hooshangi suggests removing the Window function, but the query doesn't work as intended after that. Any ideas on how to solve this problem?
Let's say we have these 2 models:
class Developer(models.Model):
    first_name = models.CharField(max_length=40, null=False, blank=False,
                                  unique=True)
    last_name = models.CharField(max_length=40, null=False, blank=False,
                                 unique=True)
    profession = models.CharField(max_length=100, null=False)
    cv = models.FileField(upload_to=upload_cv_location, null=True, blank=True)
    description = models.TextField()
    img = models.ImageField(upload_to=upload_location, null=True, blank=True)
    class Meta:
        verbose_name = 'Developer'
        verbose_name_plural = 'Developers'
        ordering = ('first_name',)
    def __str__(self):
        return f'{self.first_name} {self.last_name}'
class Skill(models.Model):
    developer = models.ForeignKey(to=Developer, on_delete=models.CASCADE)
    category = models.ForeignKey(to=SkillCategory, on_delete=models.SET_NULL,
                                 null=True)
    name = models.CharField(max_length=50, null=False)
    priority = models.PositiveIntegerField()
    class Meta:
        ordering = ('-priority',)
    def __str__(self):
        return f'{self.name} skill of {self.developer}'
As you can see, we have a Developer model that has a relationship with Skill. Each developer can have multiple skills.
Now suppose we want to get the developers whose summed skill priorities are greater than some number.
The ORM query should look like this:
from django.db.models import Sum
developers = Developer.objects.annotate(tot=Sum('skill__priority')).filter(tot__gt=250).all()
The output will be the developers whose priority sum is greater than 250.
You can filter on tot, the annotated value, with any lookup you want, such as .filter(tot__lte=...) or .filter(tot__lt=...).
I hope this is what you were looking for.
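Filtering on an annotated aggregate is expressed in SQL as GROUP BY plus HAVING, so no window function is involved. Note that this gives a per-developer total, not a running (cumulative) sum. The sketch below uses the standard-library `sqlite3` module with hypothetical table names and sample rows to show the shape of such a query. The SQL Django actually generates will differ in detail.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE developer (id INTEGER PRIMARY KEY, first_name TEXT);
    CREATE TABLE skill (
        id INTEGER PRIMARY KEY,
        developer_id INTEGER REFERENCES developer(id),
        priority INTEGER
    );
    INSERT INTO developer VALUES (1, 'Ada'), (2, 'Linus');
    INSERT INTO skill (developer_id, priority) VALUES (1, 200), (1, 100), (2, 50);
    """
)

# Aggregate per developer, then filter on the aggregate with HAVING.
rows = conn.execute(
    """
    SELECT d.first_name, SUM(s.priority) AS tot
    FROM developer AS d
    LEFT JOIN skill AS s ON s.developer_id = d.id
    GROUP BY d.id
    HAVING SUM(s.priority) > 250
    """
).fetchall()
conn.close()
```

Only the developer whose skill priorities add up to more than 250 is returned; the other is dropped by the HAVING clause.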

What's the most efficient way to retrieve a Django queryset with the highest number of posts for a related name?

I'm currently working on a website where advertisements will be posted to display vehicles for sale and rent. I would like to retrieve a queryset that highlights only one car brand (i.e. Audi) which has the highest number of posts for the respective model. Example:
Displaying the Audi brand because it has the highest number of related posts.
My question is, what's the most efficient way of doing this? I've done some work here but I'm pretty sure this is not the most efficient way. What I have is the following:
# Algorithm that is currently retrieving the name of the brand and the number of related posts it has.
def top_brand_ads():
    queryset = Advertisement.objects.filter(status__iexact="Published", owner__payment_made="True").order_by('-publish', 'name')
    result = {}
    for ad in queryset:
        # Try to update an existing key-value pair
        try:
            count = result[ad.brand.name.title()]
            result[ad.brand.name.title()] = count + 1
        except KeyError:
            # If the key doesn't exist then create it
            result[ad.brand.name.title()] = 1
    # Getting the brand with the highest number of posts from the result dictionary
    top_brand = max(result, key=lambda x: result[x])  # Returns e.g. 'Mercedes Benz'
    context = {
        top_brand: result[top_brand]  # Retrieving the value for the top_brand from the result dict.
    }
    print(context)  # {'Mercedes Benz': 7} -> Mercedes Benz has seven (7) related posts.
    return context
Is there a way I could return a queryset instead without doing what I did here or could this be way more efficient?
If the related models are needed, please see below:
models.py
# Brand
class Brand(models.Model):
    name = models.CharField(max_length=255, unique=True)
    image = models.ImageField(upload_to='brand_logos/', null=True, blank=True)
    slug = models.SlugField(max_length=250, unique=True)
    ...
    # Methods
# Owner
class Owner(models.Model):
    user = models.ForeignKey(User, on_delete=models.CASCADE)
    telephone = models.CharField(max_length=30, blank=True, null=True)
    alternate_telephone = models.CharField(max_length=30, blank=True, null=True)
    user_type = models.CharField(max_length=50, blank=True, null=True)
    payment_made = models.BooleanField(default=False)
    expiring = models.DateTimeField(default=timezone.now)
    ...
    # Methods
# Advertisement (Post)
class Advertisement(models.Model):
    STATUS_CHOICES = (
        ('Draft', 'Draft'),
        ('Published', 'Published'),
    )
    owner = models.ForeignKey(Owner, on_delete=models.CASCADE, blank=True, null=True)
    name = models.CharField(max_length=150, blank=True, null=True)
    brand = models.ForeignKey(Brand, on_delete=models.CASCADE, blank=True, null=True)
    publish = models.DateTimeField(default=timezone.now)
    status = models.CharField(max_length=10, choices=STATUS_CHOICES, default='Draft')
    ...
    # Other fields & methods
Any help would be greatly appreciated.
Since you need brands, let's query on the Brand model:
from django.db.models import Count
# Both conditions go in one filter() call so they apply to the same advertisement.
Brand.objects.filter(
    advertisement__status__iexact="Published",
    advertisement__owner__payment_made=True,
).annotate(published_ads=Count('advertisement__id')).order_by('-published_ads')
However, even in your proposed solution, you can improve a little:
Remove the order_by method from your queryset. It doesn't affect the final result but adds some overhead, especially if your Advertisement model is not indexed on those fields.
Every time you access ad.brand you hit the database. This is called the N+1 problem: in a loop over n ads, you make n extra database queries. You can use select_related to avoid this. In your case: Advertisement.objects.select_related('brand')...
Did you try the Count aggregate?
from django.db.models import Count
# Descending order, so the entry with the most related posts comes first
Car.objects.annotate(num_views=Count('car_posts_related_name')).order_by('-num_views')
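If the counting stays in Python, the try/except bookkeeping in the question could be replaced with `collections.Counter`. A minimal sketch, using a hypothetical list of brand names in place of the queryset:

```python
from collections import Counter

# Hypothetical brand names, one entry per published advertisement.
ad_brands = ["audi", "mercedes benz", "audi", "bmw", "mercedes benz", "mercedes benz"]

# Count title-cased brand names, then take the single most common one.
counts = Counter(name.title() for name in ad_brands)
top_brand, top_count = counts.most_common(1)[0]
context = {top_brand: top_count}
```

This still loads every advertisement into Python, so the database-side annotate approach is likely to scale better for large tables.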

How to join 3 or more models in a single ORM query?

I have 4 models linked by foreign keys:
class CustomUser(AbstractUser):
    username = None
    email = models.EmailField(('email address'), unique=True)
    phone_no = models.CharField(max_length=255, unique=True)
    USERNAME_FIELD = 'email'
    REQUIRED_FIELDS = []
    objects = CustomUserManager()
    def __str__(self):
        return self.email
class personal_profile(models.Model):
    custom_user = models.ForeignKey(CustomUser, on_delete=models.CASCADE)
    picture = models.ImageField(default='profile_image/pro.png', upload_to='profile_image', blank=True)
    role = models.CharField(max_length=255, blank=True, null=True)
    gender = models.CharField(max_length=255, blank=True, null=True)
    date_of_birth = models.DateField(blank=True, null=True)
    def __str__(self):
        return str(self.pk)
class academia_profile(models.Model):
    custom_user = models.ForeignKey(CustomUser, on_delete=models.CASCADE)
    education_or_certificate = models.CharField(max_length=255, blank=True, null=True)
    university = models.CharField(max_length=255, blank=True, null=True)
    def __str__(self):
        return str(self.pk)
class contact_profile(models.Model):
    custom_user = models.ForeignKey(CustomUser, on_delete=models.CASCADE)
    country = models.CharField(max_length=255, blank=True, null=True)
    state = models.CharField(max_length=255, blank=True, null=True)
    city = models.CharField(max_length=255, blank=True, null=True)
    def __str__(self):
        return str(self.pk)
To extract the data from these four models, I have to run four separate queries and pass four different variables to the HTML templates, which is cumbersome and, I am sure, hurts performance.
My current queries look like this:
user_base = CustomUser.objects.get(id=user_id)
user_personal = personal_profile.objects.get(custom_user=user_id)
academia = academia_profile.objects.get(custom_user=user_id)
contact = contact_profile.objects.get(custom_user=user_id)
Is it possible to get the values of all four queries in a single variable with a single join query in the ORM?
Also, I want to extract just the country from contact_profile and the picture from personal_profile in the join query.
select_related() should be able to work here, but how? That's what I am not getting.
You are looking for prefetch_related:
Returns a QuerySet that will automatically retrieve, in a single batch, related objects for each of the specified lookups.
user_base = (
    CustomUser
    .objects
    .prefetch_related(  # <-- THIS!
        "personal_profile_set",
        "academia_profile_set",
        "contact_profile_set")
    .get(id=user_id))
personal_profile = user_base.personal_profile_set.all()[0]
academia_profile = user_base.academia_profile_set.all()[0]
contact_profile = user_base.contact_profile_set.all()[0]
By the way, if you have only one personal_profile, academia_profile and contact_profile per CustomUser, consider changing ForeignKey to OneToOneField and using select_related.

distinct() on a large table takes too long in Django

I have a large dataset with over 1 million records. It has a ManyToManyField that causes duplicate rows to be returned when filtering.
models.py:
class Type(models.Model):
    name = models.CharField(max_length=100, db_index=True)
class Catalogue(models.Model):
    link = models.TextField(null=False)
    image = models.TextField(null=True)
    title = models.CharField(max_length=100, null=True)
    city = models.CharField(db_index=True, max_length=100, null=True)
    district = models.CharField(db_index=True, max_length=100, null=True)
    type = models.ManyToManyField(Type, db_index=True)
    datetime = models.CharField(db_index=True, max_length=100, null=True)
views.py:
last2week_q = Q(datetime__gte=last2week)
type_q = Q(type__in=intersections)
city_district_q = (Q(*[Q(city__contains=x) for x in city_district], _connector=Q.OR) |
                   Q(*[Q(district__contains=x) for x in city_district], _connector=Q.OR))
models.Catalogue.objects.filter(last2week_q & type_q & city_district_q).order_by('-datetime').distinct()
distinct() is too slow and I'm looking for a different way to remove the duplicates.
P.S.:
I also tried using the query below instead of type_q, but it's even slower than distinct() because typ_ids is a very large list.
typ_ids = models.Catalogue.objects.only('type').filter(type__in=intersections).values_list('id', flat=True)
type_q = Q(id__in=typ_ids)

Django multiple foreign key to a same table

I need to log the transactions of item movements in a warehouse. I have 3 tables as shown in the image below. However, Django responds with an error:
ERRORS:
chemstore.ItemTransaction: (models.E007) Field 'outbin' has column name 'bin_code_id' that is used by another field.
which complains about multiple uses of the same foreign key. Is this a problem with my table design, or is it not allowed in Django? How can I achieve this in Django? Thank you.
DB design
[Models]
class BinLocation(models.Model):
    bin_code = models.CharField(max_length=10, unique=True)
    desc = models.CharField(max_length=50)
    def __str__(self):
        return f"{self.bin_code}"
    class Meta:
        indexes = [models.Index(fields=['bin_code'])]
class ItemMaster(models.Model):
    item_code = models.CharField(max_length=20, unique=True)
    desc = models.CharField(max_length=50)
    long_desc = models.CharField(max_length=150, blank=True)
    helper_qty = models.DecimalField(max_digits=10, decimal_places=4)
    unit = models.CharField(max_length=10, blank=False)
    def __str__(self):
        return f"{self.item_code}"
    class Meta:
        verbose_name = "Item"
        verbose_name_plural = "Items"
        indexes = [models.Index(fields=['item_code'])]
class ItemTransaction(models.Model):
    trace_code = models.CharField(max_length=20, unique=False)
    item_code = models.ForeignKey(
        ItemMaster, related_name='trans', on_delete=models.CASCADE, null=False)
    datetime = models.DateTimeField(auto_now=False, auto_now_add=False)
    qty = models.DecimalField(max_digits=10, decimal_places=4)
    unit = models.CharField(max_length=10, blank=False)
    action = models.CharField(
        max_length=1, choices=ACTION, blank=False, null=False)
    in_bin = models.ForeignKey(
        BinLocation, related_name='in_logs', db_column='bin_code_id', on_delete=models.CASCADE, null=False)
    out_bin = models.ForeignKey(
        BinLocation, related_name='out_logs', db_column='bin_code_id', on_delete=models.CASCADE, null=False)
    remarks = models.TextField(blank=True)
    def __str__(self):
        return f"{self.trace_code} {self.datetime} {self.item_code} {dict(ACTION)[self.action]} {self.qty} {self.unit} {self.in_bin} {self.out_bin}"
You have the same db_column on two fields, so change one of them:
in_bin = models.ForeignKey(
    BinLocation, related_name='in_logs', db_column='bin_code_id', on_delete=models.CASCADE, null=False)
# Change db_column to whatever you want, but it must be unique within the model
out_bin = models.ForeignKey(
    BinLocation, related_name='out_logs', db_column='other_bin_code', on_delete=models.CASCADE, null=False)
If several foreign keys point to the same model, you should use a different related_name for each foreign key field. Here is an example:
address1 = models.ForeignKey(Address, verbose_name=_("Address1"),related_name="Address1", null=True, blank=True,on_delete=models.SET_NULL)
address2 = models.ForeignKey(Address, verbose_name=_("Address2"),related_name="Address2", null=True, blank=True,on_delete=models.SET_NULL)
Thank you to everyone who helped. According to Aleksei and Tabaane, this is a problem with my DB design (it breaks the RDBMS rules) rather than a Django issue. I searched online and found something similar: the ONE-TO-MANY DB design pattern.
In my case, I should store the in bin and the out bin as separate transactions instead of recording both in a single transaction. This is my solution. Thank you.
P.S. An alternative solution: I keep the in bin and out bin in a single transaction but don't use foreign keys for the bins; instead, the client application queries both in bin and out bin for the bin selection.
