Django searching multiple models and removing duplicates

I am trying to build a search function for my blog that searches through the two models Articles and Spots. The two models are connected via the pivot table ArticleSpots.
My blog is structured so that there are multiple spots in each article.
When a query is searched I want the query to be searched within both models but only display clickable articles.
I have an HTML page for each article but not for each spot, so every spot that matches the search has to be shown as the article that contains it. Hope that makes sense!
This is the code I came up with, but the problem is that I get a lot of duplicates in the variable results. There are duplicates within both articles_from_spots and articles_from_query, and there is also overlap between them.
Is this the right way to accomplish this? How can I remove the duplicates from the results?
Any help would be appreciated!
views.py
def search(request):
    query = request.GET.get("q")
    articles_from_query = Articles.objects.filter(
        Q(title__icontains=query) |
        Q(summary__icontains=query)
    )
    spots_from_query = Spots.objects.filter(
        Q(title__icontains=query) |
        Q(summary__icontains=query) |
        Q(content__icontains=query)
    )
    articles_from_spots = []
    for x in spots_from_query:
        article = Articles.objects.filter(articlespots__spot=x)
        articles_from_spots.extend(article)
    results = chain(articles_from_spots, articles_from_query)
    context = {
        'results': results,
    }
    return render(request, "Search.html", context)
models.py
class Articles(models.Model):
    title = models.CharField(max_length=155)
    summary = models.TextField(blank=True, null=True)

class ArticleSpots(models.Model):
    article = models.ForeignKey('Articles', models.DO_NOTHING)
    spot = models.ForeignKey('Spots', models.DO_NOTHING)

class Spots(models.Model):
    title = models.CharField(max_length=155)
    summary = models.TextField(blank=True, null=True)
    content = models.TextField(blank=True, null=True)

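You should be able to do this in a single query by following the relationship from article to spot. Because the join returns one row per matching spot, add .distinct() to collapse the duplicates: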
Articles.objects.filter(
Q(title__icontains=query) |
Q(summary__icontains=query) |
Q(articlespots__spot__title__icontains=query) |
Q(articlespots__spot__summary__icontains=query) |
Q(articlespots__spot__content__icontains=query)
).distinct()
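To see why .distinct() is needed, here is a minimal sketch of the same query shape in plain SQL, using Python's built-in sqlite3 module rather than Django. The tables articles, spots and articlespots and their rows are hypothetical, simplified stand-ins for the Django tables. An article linked to two matching spots comes back once per joined row until DISTINCT is applied.

```python
import sqlite3

def make_demo_db():
    # Hypothetical, simplified tables mirroring Articles, Spots and ArticleSpots.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT, summary TEXT);
        CREATE TABLE spots (id INTEGER PRIMARY KEY, title TEXT, summary TEXT, content TEXT);
        CREATE TABLE articlespots (article_id INTEGER, spot_id INTEGER);
        INSERT INTO articles VALUES (1, 'Beach guide', 'Sun and sand'), (2, 'City walk', 'Old town');
        INSERT INTO spots VALUES (1, 'Beach bar', '', 'Cocktails'), (2, 'Beach hut', '', 'Rentals'), (3, 'Museum', '', 'Art');
        INSERT INTO articlespots VALUES (1, 1), (1, 2), (2, 3);
    """)
    return conn

def search_article_ids(conn, query, distinct=True):
    # Same shape as the ORM query: article fields OR-ed with spot fields
    # reached through the pivot table, so every linked spot adds a row.
    pattern = f"%{query}%"
    select = "SELECT DISTINCT a.id" if distinct else "SELECT a.id"
    sql = select + """
        FROM articles a
        LEFT JOIN articlespots asp ON asp.article_id = a.id
        LEFT JOIN spots s ON s.id = asp.spot_id
        WHERE a.title LIKE ? OR a.summary LIKE ?
           OR s.title LIKE ? OR s.summary LIKE ? OR s.content LIKE ?
        ORDER BY a.id
    """
    return [row[0] for row in conn.execute(sql, [pattern] * 5)]

conn = make_demo_db()
print(search_article_ids(conn, "beach", distinct=False))  # [1, 1]
print(search_article_ids(conn, "beach"))                  # [1]
```

SQLite's LIKE is case-insensitive for ASCII by default, which roughly matches icontains here; on other databases Django generates its own case-insensitive comparison.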
If you were to add a ManyToManyField from Articles to Spots, it would simplify this a bit, and it makes sense from a design point of view:
class Articles(models.Model):
    ...
    spots = models.ManyToManyField('Spots', through='ArticleSpots')
Articles.objects.filter(
Q(title__icontains=query) |
Q(summary__icontains=query) |
Q(spots__title__icontains=query) |
Q(spots__summary__icontains=query) |
Q(spots__content__icontains=query)
).distinct()

The main issue is the inefficient for-loop, but I have to suggest something else first.
I would highly recommend a model design change:
class Articles(models.Model):
    title = models.CharField(max_length=155)
    summary = models.TextField(blank=True, null=True)
    # 'Spots' is passed as a string because the class is defined further down
    spots = models.ManyToManyField('Spots', blank=True, related_name='articles')

class Spots(models.Model):
    title = models.CharField(max_length=155)
    summary = models.TextField(blank=True, null=True)
    content = models.TextField(blank=True, null=True)
The functionality is exactly the same (plus you can call spot.articles.all() and article.spots.all()). You can still reach the intermediate table as Articles.spots.through if you need it. In case you later need more fields per connection, you can do this (together with your original ArticleSpots class, maybe with on_delete=models.CASCADE there instead):
spots = models.ManyToManyField('Spots', blank=True, related_name='articles', through='ArticleSpots')
The for-loop is inefficient (think dozens of seconds once you get to just thousands of objects, or once joins are involved), because it triggers an extra query for every spot in the result. Instead, you should get articles_from_spots with a single direct query, i.e.:
article_ids = spots_from_query.values_list('articles__id', flat=True)
articles_from_spots = Articles.objects.filter(id__in=article_ids)
That guarantees only 2 DB queries per run. Then combine the two querysets into a single list:
results = list(chain(articles_from_spots, articles_from_query))
There might still be issues with mixing the two querysets together, but that depends on your template. It's generally bad practice, but it's not an acute problem as long as you're aware of it.
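Chaining still keeps any article that appears in both articles_from_spots and articles_from_query. A minimal sketch of removing that overlap in Python by primary key while preserving order; the Article dataclass and the merge_unique helper are hypothetical stand-ins, and with real querysets you would pass the querysets themselves since they are iterable.

```python
from dataclasses import dataclass
from itertools import chain

@dataclass
class Article:
    # Hypothetical stand-in for a model instance; only the primary key matters here.
    id: int
    title: str

def merge_unique(*iterables, key=lambda obj: obj.id):
    """Chain several iterables, keeping only the first object seen per key."""
    seen = set()
    merged = []
    for obj in chain(*iterables):
        k = key(obj)
        if k not in seen:
            seen.add(k)
            merged.append(obj)
    return merged

articles_from_spots = [Article(1, "Beach guide"), Article(1, "Beach guide"), Article(2, "City walk")]
articles_from_query = [Article(2, "City walk"), Article(3, "Old port")]
results = merge_unique(articles_from_spots, articles_from_query)
print([a.id for a in results])  # [1, 2, 3]
```

Because both querysets here are for the same model, letting the database deduplicate with (articles_from_spots | articles_from_query).distinct() should also work and keeps the result a queryset.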

Related

How To Join Two Models with Different Column Names and Return All Instances?

I aim to create a dataframe of the Top 3 Selling menu_items in my Purchases table. My thoughts are to create a join on the Purchases model with the Menu_Item model where Purchases.menu_item = Menu_Item.title. I will convert the QuerySet to a DataFrame using django_pandas.io. I plan to use the sum of Menu_Item.price associated with each distinct Purchases.menu_item to determine the Top 3 menu_items of all the records in the Purchases table.
My problem is that I cannot join the two tables successfully. I’ve scoured the interwebz for a working solution to join two models with different field names, which returns all instances, and I tried various solutions, but the scarce articles on this topic yielded no joy.
models.py
...
class MenuItem(models.Model):
    title = models.CharField(max_length=100, unique=True,
                             verbose_name="Item Name")
    price = models.FloatField(default=0.00, verbose_name="Price")
    description = models.CharField(max_length=500,
                                   verbose_name="Item Description")

    def __str__(self):
        return f"title={self.title}; price={self.price}"

    def get_absolute_url(self):
        return "/menu"

    def available(self):
        return all(X.enough() for X in self.reciperequirement_set.all())

    class Meta:
        ordering = ["title"]

class Purchase(models.Model):
    menu_item = models.ForeignKey(MenuItem, on_delete=models.CASCADE,
                                  verbose_name="Menu Item")
    timestamp = models.DateTimeField(auto_now_add=True,
                                     verbose_name="DateTime")

    def __str__(self):
        return f"menu_item=[{self.menu_item.__str__()}]; time={self.timestamp}"

    def get_absolute_url(self):
        return "/purchases"

    class Meta:
        ordering = ["menu_item"]
I tried adapting too many unsuccessful code fragments to reproduce here, so I am looking at starting with a clean slate. I'm hoping you have an effective solution to share. Your help is greatly appreciated. Thanks.

You didn't mention what you have tried, so it is hard for me (and other developers) to give precise suggestions.
Anyway, have you tried something like this?
from django.db.models import F
purchases = Purchase.objects.values(
    'timestamp',
    item_title=F('menu_item__title'),
    item_price=F('menu_item__price'),
    item_desc=F('menu_item__description'))
This queryset fetches all of these values in a single SQL query, joining Purchase to MenuItem through the menu_item foreign key.
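For the stated goal (top 3 menu items by summed price), the join can also be grouped and ordered in the database. A minimal sketch in plain SQL with Python's built-in sqlite3; the menuitem and purchase tables, the menu_item_id column and the rows are hypothetical stand-ins for what Django generates. In the ORM, the same idea should be roughly Purchase.objects.values('menu_item__title').annotate(revenue=Sum('menu_item__price')).order_by('-revenue')[:3], which could then go to django_pandas; treat that line as an untested suggestion.

```python
import sqlite3

def make_demo_db():
    # Hypothetical tables mirroring MenuItem and Purchase (menu_item_id is the FK column).
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE menuitem (id INTEGER PRIMARY KEY, title TEXT UNIQUE, price REAL);
        CREATE TABLE purchase (id INTEGER PRIMARY KEY, menu_item_id INTEGER REFERENCES menuitem(id));
        INSERT INTO menuitem VALUES (1, 'Burger', 8.0), (2, 'Fries', 3.0), (3, 'Soda', 2.0), (4, 'Salad', 6.0);
        INSERT INTO purchase (menu_item_id) VALUES (1), (1), (2), (2), (2), (3), (4), (4);
    """)
    return conn

def top_selling_items(conn, limit=3):
    # Join every purchase to its menu item, then total the price per title.
    sql = """
        SELECT m.title, SUM(m.price) AS revenue, COUNT(*) AS sold
        FROM purchase p
        JOIN menuitem m ON m.id = p.menu_item_id
        GROUP BY m.title
        ORDER BY revenue DESC
        LIMIT ?
    """
    return conn.execute(sql, (limit,)).fetchall()

print(top_selling_items(make_demo_db()))
# [('Burger', 16.0, 2), ('Salad', 12.0, 2), ('Fries', 9.0, 3)]
```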

Count and Sum objects from different models - Django

I'm working on my Django project and I'm trying to order my posts by the sum of 2 related models.
So this query should take the attendance count from the Post model and the attendant count from the Attending model and get the total sum.
These are the models:
class Post(models.Model):
    title = models.CharField(max_length=100)
    attendance = models.ManyToManyField(User, related_name='user_event_attendance')

class Attending(models.Model):
    attendant = models.ForeignKey(User, related_name='events_attending', on_delete=models.CASCADE, null=True)
    post = models.ForeignKey('Post', on_delete=models.CASCADE, null=True)
So far I have the following code, but it is not working properly:
views.py
new_query = Post.objects.filter(
Q(status='NOT STARTED', post_options='PUBLIC') | Q(status='IN PROGRESS',post_options='PUBLIC')).distinct().annotate(total_attendance=Count('attendance')).annotate(total_attendant=Count("attending__attendant")).annotate(total=F("total_attendant") + F("total_attendance")).order_by('-total')

Your counts are inflated (each count gets multiplied by the number of rows from the other relation) because of this documented behaviour:
Combining multiple aggregations with annotate() will yield the wrong results because joins are used instead of subqueries.
... For most aggregates, there is no way to avoid this problem, however, the Count aggregate has a distinct parameter that may help.
You can resolve this by adding the distinct parameter to both your count annotations:
new_query = Post.objects.filter(
    Q(status='NOT STARTED', post_options='PUBLIC') | Q(status='IN PROGRESS', post_options='PUBLIC')
).distinct().annotate(
    total_attendance=Count('attendance', distinct=True),  # Add distinct
    total_attendant=Count("attending__attendant", distinct=True)  # Add distinct
).annotate(
    total=F("total_attendant") + F("total_attendance")
).order_by('-total')
Also, Count("attending__attendant", distinct=True) can be simplified to just Count("attending", distinct=True), since there can only ever be one attendant for each Attending instance.
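The inflation is easy to reproduce in plain SQL. A minimal sketch with Python's built-in sqlite3, where post, post_attendance and attending are hypothetical stand-ins for the Django tables and the rows are made up: post 1 has 3 attendance users and 2 Attending rows, so joining both relations yields 3 x 2 = 6 rows, and plain COUNT reports 6 for each (a total of 12 instead of 5).

```python
import sqlite3

def make_demo_db():
    # Hypothetical tables: post, the attendance M2M link table, and Attending.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE post (id INTEGER PRIMARY KEY);
        CREATE TABLE post_attendance (post_id INTEGER, user_id INTEGER);
        CREATE TABLE attending (id INTEGER PRIMARY KEY, attendant_id INTEGER, post_id INTEGER);
        INSERT INTO post VALUES (1), (2);
        INSERT INTO post_attendance VALUES (1, 10), (1, 11), (1, 12), (2, 10);
        INSERT INTO attending VALUES (1, 20, 1), (2, 21, 1);
    """)
    return conn

def attendance_counts(conn, distinct):
    # Both relations are joined in one query (as stacked annotate() calls do),
    # so each post's rows are the cross product of its two relations.
    fn = "COUNT(DISTINCT {})" if distinct else "COUNT({})"
    sql = f"""
        SELECT p.id,
               {fn.format('pa.user_id')} AS total_attendance,
               {fn.format('att.id')} AS total_attendant
        FROM post p
        LEFT JOIN post_attendance pa ON pa.post_id = p.id
        LEFT JOIN attending att ON att.post_id = p.id
        GROUP BY p.id
        ORDER BY p.id
    """
    return conn.execute(sql).fetchall()

conn = make_demo_db()
print(attendance_counts(conn, distinct=False))  # [(1, 6, 6), (2, 1, 0)]
print(attendance_counts(conn, distinct=True))   # [(1, 3, 2), (2, 1, 0)]
```

Counting att.id corresponds to the simplified Count("attending", distinct=True).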

Loop through and Merge Querysets

I am sure this one is straightforward, but I cannot seem to get my head around it.
I have "Users" who can post "Posts" on my site.
Each user can follow other users.
The idea is to display all the posts posted by the users that current user is following.
Example : Foo followed Bar and Baz. I need to retrieve all the posts from Bar and Baz.
Bar = Post.objects.filter(user=3)
Baz = Post.objects.filter(user=4)
totalpost= list(chain(Bar, Baz))
print(totalpost)
In this case, when both variables (Bar and Baz) are hardcoded, I can easily retrieve ONE list of posts neatly by chaining both QuerySets.
However, I cannot hardcode them. So I attempted to loop through each user's posts and add them to a list, since my user can follow any number of users:
QuerySet = Profile.objects.filter(follower=1)
for x in QuerySet:
    userXposts = Post.objects.filter(user=x.user.id)
    temp = userXposts
    totalpost = list(chain(userXposts, temp))
    temp = []
    print("Totalpost after union of userpost and temp: ", totalpost)
Here, Profile.objects.filter(follower=1) returns two profiles, one for Baz and one for Bar.
The problem so far is that totalpost ends up being a "list of lists" (I believe), which forces me to use totalpost[0] for Bar's posts and totalpost[1] for Baz's posts.
Since I am using Django's Paginator, I can only pass ONE variable in p = Paginator(totalpost, 200).
Could you help with the loop so that I can fetch the data for the first user, add it to a variable, then go to the second user and ADD the second QuerySet's data to the same list as the first user's data?
Thanks a lot!
EDIT:
Here are the Models :
class User(AbstractUser):
    pass

class Profile(models.Model):
    user = models.ForeignKey(User, on_delete=models.CASCADE)
    following = models.ManyToManyField(User, blank=True, related_name="following_name")
    follower = models.ManyToManyField(User, blank=True, related_name="follower_name")

    def __str__(self):
        return f'"{self.user.username}" is followed by {self.follower.all()} and follows {self.following.all()}'

class Post(models.Model):
    user = models.ForeignKey(User, on_delete=models.CASCADE)
    timestamp = models.DateTimeField(default=datetime.now)
    post = models.CharField(max_length=350, null=True, blank=True)
    like = models.ManyToManyField(User, blank=True, related_name="like_amount")

    def __str__(self):
        return f'#{self.id}: "{self.user.username}" posted "{self.post}" on "{self.timestamp}". Like : "{self.like.all()}" '

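Post.objects.filter(user__following_name__id=1)
Here following_name is the reverse name from User to Profile.following, so 1 is the id of the follower's Profile. To filter by the follower's User id instead, use Post.objects.filter(user__following_name__user__id=1). Either way the result is a single queryset that can be passed directly to Paginator.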
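A single queryset like the one above is the better fix. If you do merge per-user results in Python, as the question attempts, the loop has to accumulate into one flat list instead of overwriting totalpost on every pass. A minimal sketch, assuming a hypothetical Post dataclass and a dict of per-user post lists standing in for the querysets (user ids 3 and 4 are Bar and Baz from the question); itertools.chain.from_iterable does the flattening, and sorting by timestamp keeps the pages in a stable order.

```python
from dataclasses import dataclass
from datetime import datetime
from itertools import chain

@dataclass
class Post:
    # Hypothetical stand-in for the Post model.
    id: int
    user_id: int
    timestamp: datetime

def posts_from_followed(posts_by_user, followed_ids):
    # Look up each followed user's posts and flatten them into ONE list.
    per_user = (posts_by_user.get(user_id, []) for user_id in followed_ids)
    merged = list(chain.from_iterable(per_user))
    # Newest first, so pagination sees a predictable order.
    merged.sort(key=lambda post: post.timestamp, reverse=True)
    return merged

posts_by_user = {
    3: [Post(1, 3, datetime(2024, 1, 1)), Post(2, 3, datetime(2024, 1, 3))],  # Bar
    4: [Post(3, 4, datetime(2024, 1, 2))],                                    # Baz
    5: [Post(4, 5, datetime(2024, 1, 4))],                                    # not followed
}
print([p.id for p in posts_from_followed(posts_by_user, [3, 4])])  # [2, 3, 1]
```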

How to reduce quantity of an item in main table when it is being used in another table - django

I am creating my models in Django and I have a many-to-many relationship between supplies and van kits. The idea is that an "item" can belong to many "van kits" and a "van kit" can have many "items". I created an intermediary model that holds the relationship, but I am struggling to figure out a way to relate the quantity in the van kit table to the quantity in the main supplies table. For example, if I wanted to mark an item in the van kit as damaged and reduce the quantity of that supply in the van kit, I would also want to reduce the total count of that supply in the main "supplies" table until it has been replenished. I am thinking that maybe I'll have to create a function in my views file to carry out that logic, but I wanted to know if it could be implemented in my model design instead, to minimize the chance of errors. Here's my code:
class supplies(models.Model):
    class Meta:
        verbose_name_plural = "supplies"

    # limit the user to selecting a pre-set category
    choices = (
        ('CREW-GEAR','CREW-GEAR'),
        ('CONSUMABLE','CONSUMABLE'),
        ('BACK-COUNTRY','BACK-COUNTRY')
    )
    supplyName = models.CharField(max_length=30, blank=False) # if they go over the max length, we'll get a 500 error
    category = models.CharField(max_length=20, choices = choices, blank=False)
    quantity = models.PositiveSmallIntegerField(blank=False) # set up default
    price = models.DecimalField(max_digits=5, decimal_places=2, null=True, blank=True) # inputting price is optional

    def __str__(self):
        return self.supplyName

class van_kit(models.Model):
    supply_name = models.ManyToManyField(supplies, through='KitSupplies',through_fields=('vanKit','supplyName'), related_name="supplies")
    van_kit_name = models.CharField(max_length=100)
    vanName = models.ForeignKey(vans, on_delete=models.CASCADE)

    def __str__(self):
        return self.van_kit_name

class KitSupplies(models.Model):
    supplyName = models.ForeignKey(supplies, on_delete=models.CASCADE)
    vanKit = models.ForeignKey(van_kit, on_delete=models.CASCADE)
    quantity = models.PositiveSmallIntegerField(blank=False)

    def __str__(self):
        return str(self.supplyName)

    class Meta:
        verbose_name_plural = 'Kit Supplies'
I am fairly new to django, I have to learn it for a class project so if my logic is flawed or if a better way to do it is obvious, please respectfully let me know. I'm open to new ways of doing it. Also, I've read through the documentation on using "through" and "through_fields" to work with the junction table, but I'm worried I may not be using it correctly. Thanks in advance.

One option would be to drop/remove the field quantity from your supplies model and just use a query to get the total quantity.
This would be a bit more expensive, as the query would need to be run each time you want to know the number, but on the other hand it simplifies your design as you don't need any update logic for the field supplies.quantity.
The query could look as simple as this:
>>> from django.db.models import Sum
>>> supplies_instance.kitsupplies_set.aggregate(Sum('quantity'))
{'quantity__sum': 1234}
You could even make it a property on the model for easy access:
from django.db.models import Sum

class supplies(models.Model):
    ...
    @property
    def quantity(self):
        data = self.kitsupplies_set.aggregate(Sum('quantity'))
        return data['quantity__sum']
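One detail worth checking: for a supply that is not in any kit, SQL SUM over zero rows returns NULL, so data['quantity__sum'] would be None rather than 0. A minimal sketch of that behaviour with Python's built-in sqlite3, using a hypothetical kitsupplies table with supply_id, van_kit_id and quantity columns. In the Django property, returning data['quantity__sum'] or 0 (or, on Django 4.0+, aggregating Sum('quantity', default=0)) should have the same effect as the fallback below.

```python
import sqlite3

def make_demo_db():
    # Hypothetical kitsupplies table: one row per (supply, van kit) with its quantity.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE kitsupplies (supply_id INTEGER, van_kit_id INTEGER, quantity INTEGER);
        INSERT INTO kitsupplies VALUES (1, 1, 5), (1, 2, 7), (2, 1, 3);
    """)
    return conn

def raw_sum(conn, supply_id):
    # SUM over zero matching rows is NULL, which sqlite3 returns as None.
    return conn.execute(
        "SELECT SUM(quantity) FROM kitsupplies WHERE supply_id = ?", (supply_id,)
    ).fetchone()[0]

def total_quantity(conn, supply_id):
    # Same query, but treat "in no kit at all" as a total of 0.
    total = raw_sum(conn, supply_id)
    return total if total is not None else 0

conn = make_demo_db()
print(raw_sum(conn, 1), total_quantity(conn, 1))  # 12 12
print(raw_sum(conn, 3), total_quantity(conn, 3))  # None 0
```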

Updating Many-to-Many relation

I have 3 models (simplified):
class Product(models.Model):
    category = models.ForeignKey('Category', related_name='products', to_field='category_name')
    brand = models.ForeignKey('Brand', related_name='products', to_field='brand_name')

class Brand(models.Model):
    brand_name = models.CharField(max_length=50)
    categories = models.ManyToManyField('Category', related_name='categories')

class Category(models.Model):
    category_name = models.CharField(max_length=128)
I want to change the Category of a bunch of products in the admin; I have a custom admin function written for that. After that, I need to update the Brand-Categories many-to-many relation to check whether that Category is still available for a specific Brand. I have written this function:
def brand_refresh():
brands = Brand.objects.all().prefetch_related('shops', 'categories')
products = Product.objects.select_related('shop', 'brand', 'category')
for brand in list(brands):
for category in brand.categories.all():
if not products.filter(category=category).exists():
brand.categories.remove(category)
for product in list(products.filter(brand=brand).distinct('category')):
if product.category not in [None, category]:
brand.categories.add(product.category)
This monster seems to work, but it takes 2 hours to go through all the loops (I have ~220k products, 4k+ brands, and ~500 categories). Is there a better way to update the M2M relation here? I think .prefetch_related() should help, but what I have now seems to have no effect.

Here's a solution for the first part of your loop:
You should try this on a disposable local copy of your database and check that everything works well before running these in production:
from django.db.models import Count
# get a list of all categories which have no products
empty_categories = Category.objects.annotate(product_count=Count('products')).filter(product_count=0).values_list('id', flat=True)
# delete association of empty categories in all brands
Brand.categories.through.objects.filter(category_id__in=list(empty_categories)).delete()
For the second part, perhaps you can do something like this, though I'm not convinced it's any faster (or even correct as written):
for brand in Brand.objects.all():
    # get a list of categories of all products in the brand
    brand_product_categories = brand.products.values_list('category__id', flat=True).distinct()
    # get the brand's current categories through the M2M field
    brand_categories = brand.categories.values_list('id', flat=True)
    # get elements from a not in b
    categories_to_add = set(brand_product_categories) - set(brand_categories)
    # add all missing categories in one call
    brand.categories.add(*categories_to_add)
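A different approach, not the one above, is to load the (brand, category) pairs once and compute the whole difference in memory. A minimal sketch, assuming the intended rule is that a brand's categories should be exactly the categories of its products; plan_brand_category_sync is a hypothetical helper. The inputs could come from two queries such as Product.objects.values_list('brand__id', 'category__id').distinct() and Brand.categories.through.objects.values_list('brand_id', 'category_id'); 'brand__id' is used on Product because its brand foreign key has to_field='brand_name', so Product.brand_id holds the name, not the id. The resulting to_add set could then be written with Brand.categories.through.objects.bulk_create(...).

```python
def plan_brand_category_sync(product_pairs, current_links):
    """Compute which (brand_id, category_id) links to add and which to remove.

    product_pairs: (brand_id, category_id) for every product; category_id may be None.
    current_links: (brand_id, category_id) rows currently in the M2M table.
    Assumes a brand's categories should be exactly the categories of its products.
    """
    wanted = {(brand, cat) for brand, cat in product_pairs if cat is not None}
    current = set(current_links)
    to_add = wanted - current
    to_remove = current - wanted
    return to_add, to_remove

products = [(1, 10), (1, 10), (1, 11), (2, None), (2, 12)]
links = [(1, 10), (1, 13), (2, 12)]
to_add, to_remove = plan_brand_category_sync(products, links)
print(sorted(to_add), sorted(to_remove))  # [(1, 11)] [(1, 13)]
```

This replaces hundreds of thousands of per-brand and per-category queries with two reads plus one bulk insert and one bulk delete, at the cost of holding the pairs in memory (about 220k tuples here).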
