Group queryset by related field values - python

I have the following model structure:
class Chapter:
    name = ...
class Atom:
    name = ...
    chapter = FK(Chapter)
class Question:
    title = ...
    atom = FK(Atom)
I want to get all Questions related to Chapter, grouped by Atoms in this chapter.
I know how to do it with many queries:
atoms = Question.objects.filter(atom__chapter__id=1).values_list("atom", flat=True).distinct()
result = {}
for atom in atoms:
    result[atom] = Question.objects.filter(atom__id=atom)
I suppose it could be done with one query using annotate, Subquery and other Django stuff. Is this possible?

You can fetch this in one query, but you need to do the grouping yourself:
from itertools import groupby
from operator import attrgetter
questions = Question.objects.filter(
    atom__chapter_id=1
).order_by('atom')
result = {
    k: list(vs)
    for k, vs in groupby(questions, attrgetter('atom_id'))
}
Here result is a dictionary that maps each atom's primary key to a list of the Question objects with that atom_id.
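The order_by('atom') call is doing real work here: itertools.groupby only merges consecutive items that share a key, so unsorted input produces repeated keys and the dict comprehension silently keeps only the last run. A minimal sketch in plain Python, with a hypothetical Row namedtuple standing in for Question instances:

```python
from collections import namedtuple
from itertools import groupby
from operator import attrgetter

# Hypothetical stand-in for Question rows; only the fields used below.
Row = namedtuple("Row", ["title", "atom_id"])

rows = [Row("q1", 2), Row("q2", 1), Row("q3", 2)]

def group_by_atom(items):
    # Same comprehension as in the answer, applied to any iterable.
    return {k: list(vs) for k, vs in groupby(items, attrgetter("atom_id"))}

unsorted_result = group_by_atom(rows)
sorted_result = group_by_atom(sorted(rows, key=attrgetter("atom_id")))

# Unsorted: the second run for atom 2 overwrites the first one.
print([r.title for r in unsorted_result[2]])
# Sorted: both questions of atom 2 end up in the same group.
print([r.title for r in sorted_result[2]])
```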

Related

Problem with aggregation by annotated fields

I have models:
class Publisher(Model):
    name = TextField()
class Author(Model):
    name = TextField()
class Book(Model):
    publisher = ForeignKey("Publisher")
    author = ForeignKey("Author")
class Magazine(Model):
    publisher = ForeignKey("Publisher")
    writer = ForeignKey("Author")
I want to know which authors wrote for publishers. My version is this:
from django.db.models import TextField, F, Subquery, OuterRef
from django.contrib.postgres.aggregates import StringAgg # I use postgres
# annotate both querysets so the author name has the same field name
books = Book.objects.annotate(author_name=F("author__name"))
magazines = Magazine.objects.annotate(author_name=F("writer__name"))
books = books.values("publisher_id", "author_name")
magazines = magazines.values("publisher_id", "author_name")
product = books.union(magazines)
# !! here I have a problem with grouping
product = product.group_by(
    "publisher_id"
).annotate(
    author_names=StringAgg("author_name", ";")
)
publishers = Publisher.objects.all().annotate(
    author_names=Subquery(
        product.filter(publisher_id=OuterRef("id")).values("author_names")[:1],
        output_field=TextField()
    )
)
# I was expecting something like
# name | author_names
# ------------------------------------------
# Publisher1 | Author1;Author2;Author3
# Publisher2 | Author2
# Publisher3 | Author2;Author3
The problem is that QuerySet has no .group_by() method; the .values() method is suggested instead (product.values("publisher_id").annotate(...)).
But this is complicated by the fact that I had previously called .values("publisher_id", "author_name") to bring the two different models into the same shape.
I also tried using .only("publisher_id", "author_name"), but (maybe it's a Django bug) this method does not work with a mix of annotated and normal fields.
Is there any way to fix this problem or some other way to get a list of authors for a publisher?
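One possible workaround, offered as a sketch rather than a tested answer: let the database produce the union and do the grouping in Python. The rows list below is a hypothetical stand-in for iterating over product (a .values() queryset yields dictionaries of this shape), and join_authors is an illustrative helper, not a Django API.

```python
from collections import defaultdict

# Hypothetical rows, shaped like the dicts yielded by the
# .values("publisher_id", "author_name") union in the question.
rows = [
    {"publisher_id": 1, "author_name": "Author1"},
    {"publisher_id": 1, "author_name": "Author2"},
    {"publisher_id": 2, "author_name": "Author2"},
    {"publisher_id": 1, "author_name": "Author3"},
]

def join_authors(rows, sep=";"):
    # Collect distinct author names per publisher, then join them sorted.
    grouped = defaultdict(set)
    for row in rows:
        grouped[row["publisher_id"]].add(row["author_name"])
    return {pid: sep.join(sorted(names)) for pid, names in grouped.items()}

author_names = join_authors(rows)
print(author_names)
```

The trade-off is that the aggregation moves out of SQL, so the result is a plain dictionary keyed by publisher id rather than an annotated Publisher queryset.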

Django reverse m2m query

Using the models from https://docs.djangoproject.com/en/dev/topics/db/queries/#making-queries with minor modifications:
from django.db import models
class Blog(models.Model):
    name = models.CharField(max_length=100)
class Author(models.Model):
    name = models.CharField(max_length=200)
    joined = models.DateField()
    def __str__(self):
        return self.name
class Entry(models.Model):
    blog = models.ForeignKey(Blog, on_delete=models.CASCADE)
    headline = models.CharField(max_length=255)
    authors = models.ManyToManyField(Author)
    rating = models.IntegerField()
I would like to create a dictionary from Author to Entries, where the Author joined this year, and the Entry has a rating of 4 or better. The structure of the resulting dict should look like:
author_entries = {author1: [set of entries], author2: [set of entries], etc.}
while hitting the database fewer than three or so times (or at least a number of times that is not proportional to the number of Authors or Entries).
My first attempt (db hits == number of authors, 100 authors 100 db-hits):
res = {}
authors = Author.objects.filter(joined__year=date.today().year)
for author in authors:
    res[author] = set(author.entry_set.filter(rating__gte=4))
Second attempt, trying to read entries in one go:
res = {}
authors = Author.objects.filter(joined__year=date.today().year)
entries = Entry.objects.select_related().filter(rating__gte=4, authors__in=authors)
for author in authors:
    res[author] = {e for e in entries if e.authors.filter(pk=author.pk)}
This one is even worse: 100 authors, 198 db-hits (the original second attempt used {e for e in entries if author in e.authors}, but Django wouldn't have it).
The only method I've found involves raw-sql (4 db-hits):
res = {}
_authors = Author.objects.filter(joined__year=date.today().year)
_entries = Entry.objects.select_related().filter(rating__gte=4, authors__in=_authors)
authors = {a.id: a for a in _authors}
entries = {e.id: e for e in _entries}
c = connection.cursor()
c.execute("""
    select entry_id, author_id
    from sampleapp_entry_authors
    where author_id in (%s)
""" % ','.join(str(v) for v in authors.keys()))
res = {a: set() for a in _authors}
for eid, aid in c.fetchall():
    if eid in entries:
        res[authors[aid]].add(entries[eid])
(apologies for using string substitutions in the c.execute(..) call -- I couldn't find the syntax sqlite wanted for a where in ? call).
Is there a more Djangoesque way to do this?
I've created a git repo with the code I'm using (https://github.com/thebjorn/revm2m), the tests are in https://github.com/thebjorn/revm2m/blob/master/revm2m/sampleapp/tests.py
You can use a Prefetch object for that:
from datetime import date
from django.db.models import Prefetch
good_ratings = Prefetch(
    'entry_set',
    queryset=Entry.objects.filter(rating__gte=4),
    to_attr='good_ratings'
)
authors = Author.objects.filter(
    joined__year=date.today().year
).prefetch_related(
    good_ratings
)
Now the Author objects in authors will have an extra attribute good_ratings (the value of the to_attr of the Prefetch object) that is a preloaded list containing the Entry objects with a rating greater than or equal to four.
So you can post-process these like:
res = {
    author: set(author.good_ratings)
    for author in authors
}
Since the Author objects from this QuerySet (not Author objects in general) already carry the attribute, the dictionary is probably not of much use anyway.
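For intuition: prefetch_related runs a separate query for the related objects and joins the two result sets in Python, which is why the number of queries does not grow with the number of authors. A rough plain-Python sketch of that joining step, using hypothetical in-memory rows instead of real querysets:

```python
from collections import defaultdict

# Hypothetical data standing in for the query results.
authors = {1: "alice", 2: "bob"}  # author pk -> name
entries = {  # entry pk -> (headline, rating)
    10: ("Intro", 5),
    11: ("Notes", 4),
    12: ("Draft", 2),
}
# (entry pk, author pk) pairs, as they would come from the m2m join.
links = [(10, 1), (11, 1), (11, 2), (12, 2)]

def attach_good_entries(authors, entries, links, min_rating=4):
    # One pass over the link rows; no per-author lookup is needed.
    good = defaultdict(set)
    for entry_pk, author_pk in links:
        headline, rating = entries[entry_pk]
        if author_pk in authors and rating >= min_rating:
            good[author_pk].add(headline)
    # Authors without any qualifying entry still get an empty set.
    return {authors[pk]: good[pk] for pk in authors}

res = attach_good_entries(authors, entries, links)
print(res)
```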

icontains and getlist django python

We are trying to return a list of titles for the Django API, in which the title can have a few keywords.
So for instance, if we use the __icontains lookup to search for "money" and "world" (api.com/?keyword=money&keyword=world), this will return all records that contain money, world or both.
The related SQL statement is:
select * from news
where news_source = 1 or news_source = 2
and news_title like '%money%' or news_title like '%world%'
We are trying to use this code to allow the user to have multiple keywords for the __icontains as well as multiple sources, so the end goal is:
api.com/?keyword=money&keyword=world&source=1&source=2
Our code:
def get_queryset(self):
    queryset = News.objects.all()
    title = self.request.query_params.getlist('title')
    source = self.request.query_params.getlist('source')
    if title:
        queryset = queryset.filter(news_title__icontains=title, news_source__in=source)
    return queryset
The issue is that when several keywords are given, this only returns matches for the last one and ignores the keywords before it.
You cannot perform an __icontains lookup with a list, but you can, for example, write a function that constructs the logical OR of the lookups for each value in a list. For example:
from django.db.models import Q
from functools import reduce
from operator import or_
def or_fold(list_of_qs):
    # Start from an empty Q() so that an empty iterable yields a no-op filter.
    return reduce(or_, list_of_qs, Q())
def unroll_lists_or(qs, **kwargs):
    # One OR-group per keyword argument; filter() ANDs the groups together.
    return qs.filter(*[
        or_fold(Q(**{k: vi}) for vi in v)
        for k, v in kwargs.items()
    ])
You can then call unroll_lists_or with a queryset and keyword arguments whose values are iterables (for example lists). It applies OR logic between the items of each list, and AND logic between different keys. An empty iterable is ignored.
So we can then write the check as:
unroll_lists_or(queryset, news_title__icontains=title, news_source=source)
In case title contains two items (so title == [title1, title2]), and source contains three items (so source == [source1, source2, source3]), then this will result in:
qs.filter(
    Q(news_title__icontains=title1) | Q(news_title__icontains=title2),
    Q(news_source=source1) | Q(news_source=source2) | Q(news_source=source3)
)
You can, however, combine it with a .filter(..) for the __in check. For example:
queryset = News.objects.all()
if source:
    queryset = queryset.filter(news_source__in=source)
queryset = unroll_lists_or(queryset, news_title__icontains=title)
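To make the OR-within-a-key, AND-across-keys rule concrete without a database, here is a small plain-Python analogue. The matches function and the dictionary records are illustrative only; they mimic the logic of unroll_lists_or with substring matching, not the ORM itself.

```python
def matches(record, **criteria):
    # criteria maps a field name to a list of acceptable substrings.
    # Within one field any value may match (OR); all fields must match (AND).
    # An empty list places no constraint on its field.
    return all(
        not values
        or any(str(v).lower() in str(record[field]).lower() for v in values)
        for field, values in criteria.items()
    )

news = [
    {"title": "Money makes the world go round", "source": "1"},
    {"title": "World cup results", "source": "2"},
    {"title": "Local weather", "source": "1"},
    {"title": "Money talks", "source": "3"},
]

hits = [
    n["title"]
    for n in news
    if matches(n, title=["money", "world"], source=["1", "2"])
]
print(hits)
```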
I was able to solve this by creating two separate functions within the get_queryset() function, which is called when a GET request is made.
def get_queryset(self):
    queryset = News.objects.all()
    source_list = self.request.query_params.getlist('source')
    keyword_list = self.request.query_params.getlist('title')
    if source_list or keyword_list:
        def create_q_source(*args):
            # OR together one exact-match condition per source
            source = Q()
            for value in args:
                source.add(Q(news_source=value), Q.OR)
            return source
        def create_q_keyword(*args):
            # OR together one icontains condition per keyword
            keyword = Q()
            for value in args:
                keyword.add(Q(news_title__icontains=value), Q.OR)
            return keyword
        queryset = queryset.filter(create_q_source(*source_list), create_q_keyword(*keyword_list))
    return queryset
Edit:
When you go to the api link and pass in the parameters, filtering will occur based on what is passed in:
http://127.0.0.1:8000/api/notes/?keyword=trump&keyword=beyond&keyword=money&source=1
SQL Equivalent:
select * from news where news_source = 1 AND (news_title like '%trump%' OR news_title like '%beyond%' OR news_title like '%money%')
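Note that in SQL, AND binds more tightly than OR, so a WHERE clause that mixes the two needs parentheses around the OR group to express "the source matches and any keyword matches". A small check with Python's built-in sqlite3 module, using a hypothetical news table that mirrors the column names above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE news (news_source INTEGER, news_title TEXT)")
conn.executemany(
    "INSERT INTO news VALUES (?, ?)",
    [(1, "beyond the sea"), (1, "weather"), (2, "money talks")],
)

# Without parentheses this parses as (source = 1 AND beyond) OR money.
loose = conn.execute(
    "SELECT news_title FROM news WHERE news_source = 1 "
    "AND news_title LIKE '%beyond%' OR news_title LIKE '%money%' "
    "ORDER BY news_title"
).fetchall()

# With parentheses the source filter applies to every keyword.
strict = conn.execute(
    "SELECT news_title FROM news WHERE news_source = 1 "
    "AND (news_title LIKE '%beyond%' OR news_title LIKE '%money%') "
    "ORDER BY news_title"
).fetchall()

print(loose)
print(strict)
```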

Creating a queryset which represents a union of querysets

Let's say I have the following models:
class House(models.Model):
    address = models.CharField(max_length=255)
class Person(models.Model):
    name = models.CharField(max_length=50)
    home = models.ForeignKey(House, null=True, related_name='tenants')
class Car(models.Model):
    make = models.CharField(max_length=50)
    owner = models.ForeignKey(Person)
Let's say I have a need (albeit a strange one) to get:
list of people who live in a house or are named 'John'
list of cars of the above people
I would like to have two functions:
get_tenants_or_johns(house)
get_cars_of_tenants_or_johns(house)
I could define them as:
from django.db.models.query_utils import Q
def get_cars_of_tenants_or_johns(house):
    is_john = Q(owner__in=Person.objects.filter(name='John'))
    is_tenant = Q(owner__in=house.tenants.all())
    return Car.objects.filter(is_john | is_tenant)
def get_tenants_or_johns(house):
    johns = Person.objects.filter(name='John')
    tenants = house.tenants.all()
    return set(johns) | set(tenants)
The problem is that the logic is repeated in the above examples. If I could get get_tenants_or_johns(house) to return a queryset I could define get_cars_of_tenants_or_johns(house) as:
def get_cars_of_tenants_or_johns(house):
    return Car.objects.filter(owner__in=get_tenants_or_johns(house))
In order to do that, get_tenants_or_johns(house) would need to return a union of querysets, without turning them into other collections.
I cannot figure out how to implement get_tenants_or_johns(house) so that it would return a queryset containing a SQL UNION. Is there a way to do that? If not, is there an alternate way to achieve what I am trying to do?
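The | operator on two querysets of the same model returns a new queryset that contains the rows of either one. The result is still a lazy queryset (the two conditions are combined with OR in a single query), so it can be passed to owner__in.
The function needs to change to the following (the set() wrappers are gone):
def get_tenants_or_johns(house):
    johns = Person.objects.filter(name='John')
    tenants = house.tenants.all()
    return johns | tenants
and everything will work exactly as needed.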
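In SQL terms, combining the two querysets with | keeps everything in one SELECT whose WHERE clause ORs the two conditions, and passing that queryset to owner__in turns it into a subquery. The sqlite3 sketch below shows the rough shape of such a query; the table and column names are hypothetical, and the SQL Django actually generates will differ in detail.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, home_id INTEGER);
    CREATE TABLE car (id INTEGER PRIMARY KEY, make TEXT, owner_id INTEGER);
    INSERT INTO person VALUES (1, 'John', NULL), (2, 'Ann', 7), (3, 'Bob', 8);
    INSERT INTO car VALUES (1, 'Volvo', 1), (2, 'Saab', 2), (3, 'Fiat', 3);
""")

def cars_of_tenants_or_johns(conn, house_id):
    # People named John OR living in the given house, used as a subquery.
    sql = """
        SELECT make FROM car
        WHERE owner_id IN (
            SELECT id FROM person WHERE name = 'John' OR home_id = ?
        )
        ORDER BY make
    """
    return [row[0] for row in conn.execute(sql, (house_id,))]

print(cars_of_tenants_or_johns(conn, 7))
```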
You mention users who live in a house, but have no mention of your User model.
I think you really need to take a long look at the structure of your application - there are probably much easier ways to accomplish your goal.
But to answer your question let's set up three helper functions. Since, as I mentioned above, you haven't outlined what you want to do with the User class - I've assumed that the house that will be passed to these functions is an address:
helpers.py:
def get_johns(house):
    is_john = Person.objects.filter(name='John')
    return is_john
def get_cars_of_tenants(house):
    cars = Car.objects.filter(owner__home__address=house)
    return cars
def get_tenants(house):
    tenants = Person.objects.filter(home__address=house)
    return tenants
Now you could create a view for each of your combination queries:
views.py:
from itertools import chain
from django.shortcuts import render_to_response
from helpers import get_cars_of_tenants, get_johns, get_tenants
def get_cars_of_tenants_or_johns(request, house):
    results = list(chain(get_cars_of_tenants(house), get_johns(house)))
    return render_to_response('cars_or_johns.html', {"results": results,})
def get_tenants_or_johns(request, house):
    results = list(chain(get_tenants(house), get_johns(house)))
    return render_to_response('tenants_or_johns.html', {"results": results,})
And this can go on for all of the various combinations. What is returned is results which is a list of all of the matches that you can iterate over.

Python search list of objects that contain objects, partial matches

I'm trying to build a simple search engine for a small website. My initial thought is to avoid using larger packages such as Solr, Haystack, etc. because of the simplistic nature of my search needs.
My hope is that with some guidance I can make my code more pythonic, efficient, and most importantly function properly.
Intended functionality: return product results based on full or partial matches of item_number, product name, or category name (currently no implementation of category matching)
Some code:
import pymssql
import utils #My utilities
class Product(object):
    def __init__(self, item_number, name, description, category, msds):
        self.item_number = str(item_number).strip()
        self.name = name
        self.description = description
        self.category = category
        self.msds = str(msds).strip()
class Category(object):
    def __init__(self, name, categories):
        self.name = name
        self.categories = categories
        self.slug = utils.slugify(name)
        self.products = []
categories = (
    Category('Food', ('123', '12A')),
    Category('Tables', ('354', '35A', '310', '31G')),
    Category('Chemicals', ('845', '85A', '404', '325'))
)
products = []
conn = pymssql.connect(...)
curr = conn.cursor()
# loop variable is lowercase so it does not shadow the Category class
for category in categories:
    for c in category.categories:
        curr.execute('SELECT item_number, name, CAST(description as text), category, msds from tblProducts WHERE category=%s', c)
        for row in curr:
            product = Product(row[0], row[1], row[2], row[3], row[4])
            products.append(product)
            category.products.append(product)
conn.close()
def product_search(*params):
    results = []
    for product in products:
        for param in params:
            name = str(product.name)
            if (name.find(param.capitalize())) != -1:
                results.append(product)
            item_number = str(product.item_number)
            if (item_number.find(param.upper())) != -1:
                results.append(product)
    print results
product_search('something')
MS SQL database with tables and fields I cannot change.
At most I will pull in about 200 products.
Some things that jump out at me. Nested for loops. Two different if statements in the product search which could result in duplicate products being added to the results.
My thought was that if I had the products in memory (the products will rarely change) I could cache them, reducing database dependence and possibly providing an efficient search.
...posting for now... will come back and add more thoughts
Edit:
The reason I have a Category object holding a list of Products is that I want to show html pages of Products organized by Category. Also, the actual category numbers may change in the future and holding a tuple seemed like simple painless solution. That and I have read-only access to the database.
The reason for a separate list of products was somewhat of a cheat. I have a page that shows all products with the ability to view MSDS (safety sheets). Also it provided one less level to traverse while searching.
Edit 2:
def product_search(*params):
    results = []
    lowerParams = [ param.lower() for param in params ]
    for product in products:
        item_number = (str(product.item_number)).lower()
        name = (str(product.name)).lower()
        for param in lowerParams:
            if param in item_number or param in name:
                results.append(product)
    print results
Prepare all variables outside of the loops and use in instead of .find if you don't need the position of the substring:
def product_search(*params):
    results = []
    upperParams = [ param.upper() for param in params ]
    for product in products:
        name = str(product.name).upper()
        item_number = str(product.item_number).upper()
        for upperParam in upperParams:
            if upperParam in name or upperParam in item_number:
                results.append(product)
    print results
If both the name and the number match the search parameters, the product will appear twice in the result list.
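To avoid such duplicates, the inner loop can be collapsed so that each product is tested once against all parameters. A sketch for Python 3, with a hypothetical Product namedtuple in place of the class above and a return value instead of print:

```python
from collections import namedtuple

# Hypothetical, reduced stand-in for the Product class.
Product = namedtuple("Product", ["item_number", "name"])

def product_search(products, *params):
    # Case-insensitive substring match on name or item number.
    needles = [p.lower() for p in params]
    results = []
    for product in products:
        haystacks = (
            str(product.name).lower(),
            str(product.item_number).lower(),
        )
        # any() stops at the first hit, so a product is added at most once.
        if any(n in h for n in needles for h in haystacks):
            results.append(product)
    return results

catalog = [
    Product("12A-001", "Apple sauce"),
    Product("35A-200", "Folding table"),
    Product("845-APP", "Apple cleaner"),
]
found = product_search(catalog, "apple", "app")
print([p.item_number for p in found])
```

Passing the product list in as an argument, rather than reading a module-level variable, is a design choice of this sketch that keeps the function easy to test.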
Since the product count is small, I recommend constructing a SELECT query like:
def search(*args):
    import operator
    cats = reduce(operator.add, [list(c.categories) for c in categories], [])
    query = "SELECT * FROM tblProducts WHERE category IN (" + ','.join('?' * len(cats)) + ") AND name LIKE '%?%' or CAST(item_number AS TEXT) LIKE '%?%' ..."
    curr.execute(query, cats + list(args)) # Not actual code
    return list(curr)
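One detail to watch in a query like this: a placeholder inside quotes ('%?%') is treated as literal text, so the wildcards have to be part of the bound value instead. A sketch using sqlite3 as a stand-in database (pymssql uses %s placeholders rather than ?, and the table below is a hypothetical, simplified tblProducts):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tblProducts (item_number TEXT, name TEXT, category TEXT)"
)
conn.executemany(
    "INSERT INTO tblProducts VALUES (?, ?, ?)",
    [
        ("100", "Apple sauce", "123"),
        ("200", "Oak table", "354"),
        ("300", "Apple soap", "999"),
    ],
)

def search(conn, categories, term):
    # One "?" per category for the IN list; wildcards go into the bound value.
    marks = ",".join("?" * len(categories))
    sql = (
        "SELECT item_number FROM tblProducts "
        "WHERE category IN (" + marks + ") "
        "AND (name LIKE ? OR item_number LIKE ?) "
        "ORDER BY item_number"
    )
    pattern = "%" + term + "%"
    params = list(categories) + [pattern, pattern]
    return [row[0] for row in conn.execute(sql, params)]

print(search(conn, ["123", "354"], "apple"))
```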
