Hey all, I have a simple model like this:
class Article(models.Model):
    upvotes = models.ManyToManyField(User, related_name='article_upvotes')
    downvotes = models.ManyToManyField(User, related_name='article_downvotes')

    def votes(self):
        # Count the related users; the managers themselves can't be subtracted.
        return self.upvotes.count() - self.downvotes.count()
In the view I can do things like
article_votes = article.votes()
Am I able to order by the votes function? Something like
article = Article.objects.order_by('votes')
EDIT
I'm not near my dev system at the moment, so the syntax might be a little off.
You can sort the list after the query returns the results:
articles = sorted(Article.objects.all(), key=lambda a: a.votes())
sorted takes any iterable and returns a new sorted list. You can provide a custom function that takes an element and returns the value to use when comparing elements. lambda a: a.votes() is an anonymous function that takes an article and returns the number of votes on the article.
If you are going to retrieve all the articles anyway, there's no downside to this solution. On the other hand, if you wanted only the top 10 articles by votes, then you're pulling all the articles from the db instead of letting the db do the sort, and only returning the top ten. Compared to a pure SQL solution, this is retrieving much more data from the database.
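To make the trade-off concrete, here is a minimal sketch with plain Python objects standing in for articles. FakeArticle and its two count fields are hypothetical and not part of the model above. It shows a full sort next to heapq.nlargest, which picks the top N without sorting the whole list, although both still need every article loaded into memory first.

```python
import heapq
from dataclasses import dataclass


@dataclass
class FakeArticle:
    """Hypothetical stand-in for an Article instance."""
    title: str
    upvote_count: int
    downvote_count: int

    def votes(self):
        return self.upvote_count - self.downvote_count


articles = [
    FakeArticle("a", 5, 1),
    FakeArticle("b", 2, 2),
    FakeArticle("c", 9, 3),
]

# Full sort, highest score first.
by_votes = sorted(articles, key=lambda a: a.votes(), reverse=True)

# Only the two best; every article was still fetched beforehand.
top_two = heapq.nlargest(2, articles, key=lambda a: a.votes())
```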
This is a faster version of what Ned Batchelder suggested - as it does the counting in the database:
articles = list(Article.objects.annotate(
    upvote_num=models.Count('upvotes', distinct=True),  # distinct avoids double counting across the two joins
    downvote_num=models.Count('downvotes', distinct=True)))
articles.sort(key=lambda a: a.upvote_num - a.downvote_num)
You can also do this completely inside the database:
articles = Article.objects.raw("""
    SELECT a.id,
           COUNT(DISTINCT upv.id) AS num_upvotes,
           COUNT(DISTINCT downv.id) AS num_downvotes,
           COUNT(DISTINCT upv.id) - COUNT(DISTINCT downv.id) AS score
    FROM articles_article AS a
    LEFT JOIN [article_user_upvotes_m2m_table_name] AS upv
        ON upv.article_id = a.id
    LEFT JOIN [article_user_downvotes_m2m_table_name] AS downv
        ON downv.article_id = a.id
    GROUP BY a.id
    ORDER BY score
""")
I'm not sure the double join is a good idea in your case, though, or whether all those DISTINCTs are needed. This query can quite likely be rewritten in a better way, but I don't have an idea at the moment.
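If you want to experiment with this kind of query outside Django, a rough sketch using the standard library's sqlite3 module could look like the following. The table names (article, upvote, downvote) and the sample rows are made up for the example and are not the tables Django would create. It uses LEFT JOINs so that articles without any votes still appear with a score of 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE article (id INTEGER PRIMARY KEY);
    CREATE TABLE upvote (id INTEGER PRIMARY KEY, article_id INTEGER, user_id INTEGER);
    CREATE TABLE downvote (id INTEGER PRIMARY KEY, article_id INTEGER, user_id INTEGER);
    INSERT INTO article (id) VALUES (1), (2), (3);
    INSERT INTO upvote (article_id, user_id) VALUES (1, 10), (1, 11), (1, 12), (2, 10);
    INSERT INTO downvote (article_id, user_id) VALUES (1, 13), (2, 11), (2, 12);
""")

# Joining both vote tables multiplies rows, so each side is counted
# with DISTINCT on its own primary key.
rows = conn.execute("""
    SELECT a.id,
           COUNT(DISTINCT upv.id) - COUNT(DISTINCT downv.id) AS score
    FROM article AS a
    LEFT JOIN upvote AS upv ON upv.article_id = a.id
    LEFT JOIN downvote AS downv ON downv.article_id = a.id
    GROUP BY a.id
    ORDER BY score DESC, a.id
""").fetchall()
```

Here rows holds one (id, score) pair per article, best score first.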
Related
I have two examples of code which accomplish the same thing. One is using python, the other is in SQL.
Exhibit A (Python):
surveys = Survey.objects.all()
consumer = Consumer.objects.get(pk=24)
consumer_ballot_list = []
consumer_survey_list = []
for ballot in consumer.ballot_set.all():
    consumer_ballot_list.append(ballot.question_id)
for survey in surveys:
    if survey.id not in consumer_ballot_list:
        consumer_survey_list.append(survey.id)
Exhibit B (SQL):
SELECT * FROM clients_survey WHERE id NOT IN (SELECT question_id FROM consumers_ballot WHERE consumer_id=24) ORDER BY id;
I want to know how I can make exhibit A much cleaner and more efficient using Django's ORM and subqueries.
In this example:
I have ballots which contain a question_id that refers to the survey which a consumer has answered.
I want to find all of the surveys that the consumer hasn't answered. So I need to check each question_id (survey.id) in the consumer's set of ballots against the survey model's ids and make sure that only the surveys for which the consumer does NOT have a ballot are returned.
You more or less have the correct idea. To replicate your SQL code using Django's ORM you just have to break the SQL into its discrete parts:
1. Create a table of the question_ids that consumer 24 has answered.
2. Filter the surveys for all ids not in the aforementioned table.
consumer = Consumer.objects.get(pk=24)
# step 1
answered_survey_ids = consumer.ballot_set.values_list('question_id', flat=True)
# step 2
unanswered_surveys_ids = Survey.objects.exclude(id__in=answered_survey_ids).values_list('id', flat=True)
This is basically what you did in your current Python-based approach, except I took advantage of a few of Django's nice ORM features.
.values_list() - this allows you to extract a specific field from all the objects in the given queryset.
.exclude() - this is the opposite of .filter() and returns all items in the queryset that don't match the condition.
__in - this is useful if we have a list of values and we want to filter/exclude all items that match those values.
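If it helps to see the two steps run against real tables without a Django project, here is a small sketch with the standard library's sqlite3 module. The table names come from Exhibit B; the columns and sample rows are assumptions made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE clients_survey (id INTEGER PRIMARY KEY);
    CREATE TABLE consumers_ballot (
        id INTEGER PRIMARY KEY, consumer_id INTEGER, question_id INTEGER
    );
    INSERT INTO clients_survey (id) VALUES (1), (2), (3), (4);
    INSERT INTO consumers_ballot (consumer_id, question_id)
        VALUES (24, 1), (24, 3), (7, 2);
""")

# Step 1: ids of the surveys consumer 24 has answered.
answered = [row[0] for row in conn.execute(
    "SELECT question_id FROM consumers_ballot WHERE consumer_id = ?", (24,))]

# Step 2: surveys whose id is not in that set, done in a single query.
unanswered = [row[0] for row in conn.execute(
    "SELECT id FROM clients_survey WHERE id NOT IN "
    "(SELECT question_id FROM consumers_ballot WHERE consumer_id = ?) "
    "ORDER BY id", (24,))]
```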
Hope this helps!
What I want to accomplish is to merge an unknown number of querysets in the admin. I have a list of the authors a user can view, and depending on the authors in that list, the user should be able to see only those authors' articles. What I have is:
def get_queryset(self, request):
    # getting all the lists and doing not important stuff
    return (qs.filter(author__name = list(list_of_authors)[0]) | qs.filter(author__name = list(list_of_authors)[len(list_of_authors)-1])).distinct()
This works if the user can view articles from two authors, however, for three it does not work. I tried using:
for index in list_of_authors:
    return qs.filter(author__name = list(list_of_authors)[index])
The Author class has a name = CharField(max_length=50).
Sadly I got only the last queryset. Is it even possible to merge querysets when the number is unknown? After a decent amount of searching I did not find anything.
You are looking for the __in lookup.
Your name field is not a container, yet you're comparing it with a container. Doing that merge by hand is tedious, so Django has done the work for you with that lookup.
The short version: qs.filter(author__name__in=list_of_authors)
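For what it's worth, merging an unknown number of results by hand is possible too. Below is a minimal sketch that uses plain Python sets in place of querysets (the sample data and the by_author helper are hypothetical). Querysets support | as well, so the same functools.reduce pattern should carry over, but the single membership test gives the same result with far less work.

```python
from functools import reduce
from operator import or_

# Hypothetical sample data: (article id, author name).
articles = [(1, "alice"), (2, "bob"), (3, "carol"), (4, "alice"), (5, "dave")]


def by_author(name):
    """Ids of the articles written by one author."""
    return {article_id for article_id, author in articles if author == name}


allowed = ["alice", "carol", "dave"]

# Fold any number of per-author results together with `|`.
merged = reduce(or_, (by_author(name) for name in allowed), set())

# The same result from one membership test, which is what __in expresses.
with_membership = {article_id for article_id, author in articles
                   if author in allowed}
```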
I'm running into an issue that I can't find an explanation for.
Given one object (in this case, an "Article"), I want to use another type of object (in this case, a "Category") to determine which other articles are most similar to article X, as measured by the number of categories they have in common. The relationship between Article and Category is Many-to-Many. The use case is to get a quick list of related Objects to present as links.
I know exactly how I would write the SQL by hand:
select
    ac.article_id
from
    Article_Category ac
where
    ac.category_id in
    (
        select
            category_id
        from
            Article_Category
        where
            article_id = 1 -- get all categories for article in question
    )
    and ac.article_id <> 1
group by
    ac.article_id
order by
    count(ac.category_id) desc, random() limit 5
What I'm struggling with is how to use Django's model aggregation to match this logic and run only one query. I'd obviously prefer to do it within the framework if possible. Does anybody have pointers on this?
Adding this in now that I've found a way within the model framework to do this.
related_article_list = Article.objects.filter(category=self.category.all())\
                                      .exclude(id=self.id)
related_article_ids = related_article_list.values('id')\
                                          .annotate(count=models.Count('id'))\
                                          .order_by('-count','?')
In the related_article_list part, other Article objects that match on 2 or more Categories will be included separate times. Thus, when using annotation to count them the number will be > 1 and they can be ordered that way.
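The counting idea can be tried on plain data first. This is only a sketch with a hypothetical in-memory mapping of article ids to category names: each other article is listed once per shared category, just like the repeated rows from the join, and collections.Counter then ranks them.

```python
from collections import Counter

# Hypothetical data: article id -> set of category names.
categories = {
    1: {"python", "django", "orm"},
    2: {"python", "django"},
    3: {"orm"},
    4: {"cooking"},
    5: {"python", "django", "orm"},
}


def related(article_id, limit=5):
    """Other articles ranked by the number of categories they share."""
    own = categories[article_id]
    # One entry per (article, shared category) pair, like the joined rows.
    hits = [other
            for other, cats in categories.items()
            if other != article_id
            for _ in cats & own]
    return Counter(hits).most_common(limit)
```

For example, related(1) returns (article id, shared count) pairs and leaves out article 4, which shares nothing with article 1.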
I think the correct answer, if you really want to filter articles on all of the categories, should look like this:
related_article_list = Article.objects.filter(category__in=self.category.all())\
                                      .exclude(id=self.id)
I need to find the orders in which every order item has status = completed. It looks like this:
FINISHED_STATUSES = [17,18,19]
if active_tab == 'outstanding':
    orders = orders.exclude(items__status__in=FINISHED_STATUSES)
However, this query only gives me orders with any order item with a completed status. How would I do the query such that I retrieve only those orders with ALL order items with a completed status?
I think you need a raw query here.
Assuming your order and item models are named Orders and Items:
# raw query: keep only the orders that have no item outside FINISHED_STATUSES
sql = """\
select `orders`.* from `%(orders_table)s` as `orders`
join `%(items_table)s` as `items`
on `items`.`%(item_order_fk)s` = `orders`.`%(order_pk)s`
group by `orders`.`%(order_pk)s`
having sum(case when `items`.`%(status_field)s` in (%(status_list)s) then 0 else 1 end) = 0;
""" % {
    "orders_table": Orders._meta.db_table,
    "items_table": Items._meta.db_table,
    "order_pk": Orders._meta.pk.column,
    "item_order_fk": Items._meta.get_field("order").column,
    "status_field": Items._meta.get_field("status").column,
    "status_list": str(FINISHED_STATUSES)[1:-1],
}
orders = Orders.objects.raw(sql)
I was able to get this done in a somewhat hackish way. First, I added an additional Boolean column, is_finished. Then, to find an order with at least one non-finished item:
orders = orders.filter(items__status__is_finished=False)
This gives me all un-finished orders.
Doing the opposite of that gets the finished orders:
orders = orders.exclude(items__status__is_finished=False)
Adding the boolean field is a good idea. That way you have your business rules clearly defined in the model.
Now, let's say that you still wanted to do it without resorting to adding fields. This may very well be a requirement given a different set of circumstances. Unfortunately, you can't really use subqueries or arbitrary joins in the Django ORM. You could, however, build up Q objects and make an implicit join in the having clause using filter() and annotate().
from django.db.models.aggregates import Count
from django.db.models import Q
from functools import reduce
from operator import or_

total_items_by_orders = Orders.objects.annotate(
    item_count=Count('items'))
finished_items_by_orders = Orders.objects.filter(
    items__status__in=FINISHED_STATUSES).annotate(
    item_count=Count('items'))
orders = total_items_by_orders.exclude(
    reduce(or_, (Q(id=o.id, item_count=o.item_count)
                 for o in finished_items_by_orders)))
Note that using raw SQL, while less elegant, would usually be more efficient.
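As a rough illustration of what "every item is finished" can look like in plain SQL, here is a sketch using the standard library's sqlite3 module. The orders and items tables, their columns, and the sample rows are assumptions for the example, not your real schema. It phrases the rule as "the order has items, and none of them is outside the finished statuses".

```python
import sqlite3

FINISHED_STATUSES = [17, 18, 19]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY);
    CREATE TABLE items (id INTEGER PRIMARY KEY, order_id INTEGER, status INTEGER);
    INSERT INTO orders (id) VALUES (1), (2), (3);
    INSERT INTO items (order_id, status) VALUES
        (1, 17), (1, 18),   -- every item finished
        (2, 17), (2, 5),    -- one item still open
        (3, 4);             -- nothing finished
""")

# One "?" per status so the values are passed as query parameters.
placeholders = ", ".join("?" for _ in FINISHED_STATUSES)

finished = [row[0] for row in conn.execute(f"""
    SELECT o.id FROM orders AS o
    WHERE EXISTS (SELECT 1 FROM items AS i WHERE i.order_id = o.id)
      AND NOT EXISTS (
          SELECT 1 FROM items AS i
          WHERE i.order_id = o.id AND i.status NOT IN ({placeholders})
      )
    ORDER BY o.id
""", FINISHED_STATUSES)]
```

Only order 1 qualifies here, because order 2 has a single open item and order 3 has no finished items at all.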
I've enjoyed building out a couple simple applications on the GAE, but now I'm stumped about how to architect a music collection organizer on the app engine. In brief, I can't figure out how to filter on multiple properties while sorting on another.
Let's assume the core model is an Album that contains several properties, including:
Title
Artist
Label
Publication Year
Genre
Length
List of track names
List of moods
Datetime of insertion into database
Let's also assume that I would like to filter the entire collection using those properties, and then sort the results by one of:
Publication year
Length of album
Artist name
When the info was added into the database
I don't know how to do this without running into the exploding index conundrum. Specifically, I'd love to do something like:
Albums.all().filter('publication_year <', 1980).order('artist_name')
I know that's not possible, but what's the workaround?
This seems like a fairly general type of application. The music albums could be restaurants, bottles of wine, or hotels. I have a collection of items with descriptive properties that I'd like to filter and sort.
Is there a best practice data model design that I'm overlooking? Any advice?
There's a couple of options here: You can filter as best as possible, then sort the results in memory, as Alex suggests, or you can rework your data structures for equality filters instead of inequality filters.
For example, assuming you only want to filter by decade, you can add a field encoding the decade in which the album was published. To find everything before or after a decade, do an IN query for the decades you want to span. This will require one underlying query per decade included, but if the number of records is large, this can still be cheaper than fetching all the results and sorting them in memory.
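A small sketch of the decade idea in plain Python follows. The helper names are hypothetical; the first one computes the value you would store in the extra decade field, and the second builds the list of decades to hand to the IN query.

```python
def decade_of(year):
    """Decade bucket for a year, e.g. 1979 -> 1970."""
    return year - year % 10


def decades_between(first_year, last_year):
    """Every decade bucket from first_year to last_year, inclusive."""
    return list(range(decade_of(first_year), decade_of(last_year) + 1, 10))


# "Everything before 1980", assuming the collection starts in the 1950s.
wanted = decades_between(1950, 1979)
```

Because every value in the list becomes an equality filter, the inequality on the year disappears and the sort order is free to use another property.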
Since storage is cheap, you could create your own ListProperty-based index models with key_names that reflect the sort criteria.
import time
from google.appengine.ext import db

class album_pubyear_List(db.Model):
    words = db.StringListProperty()

class album_length_List(db.Model):
    words = db.StringListProperty()

class album_artist_List(db.Model):
    words = db.StringListProperty()

class Album(db.Model):
    blah...

    def save(self):
        super(Album, self).save()
        # you could do this at save time or batch it and do
        # it with a cronjob or taskqueue
        words = []
        for field in ["title", "artist", "label", "genre", ...]:
            words.append("%s:%s" % (field, getattr(self, field)))
        word_records = []
        now = repr(time.time())
        # the key_name starts with the sort value, so each index model sorts by it
        word_records.append(album_pubyear_List(parent=self, key_name="%s_%s" % (self.pubyear, now), words=words))
        word_records.append(album_length_List(parent=self, key_name="%s_%s" % (self.album_length, now), words=words))
        word_records.append(album_artist_List(parent=self, key_name="%s_%s" % (self.artist_name, now), words=words))
        db.put(word_records)
Now when it's time to search you create an appropriate WHERE clause and call the appropriate model
where = "WHERE words = '%s:%s' AND words = '%s:%s'" % (field_a, value_a, field_b, value_b)  # etc.
aModel = "album_pubyear_List" # or any one of the other key_name sorted wordlist models
indexes = db.GqlQuery("""SELECT __key__ from %s %s""" % (aModel, where))
keys = [k.parent() for k in indexes[offset:offset + numresults + 1]] # +1 for pagination
object_list = db.get(keys) # returns a sorted by key_name list of Albums
As you say, you can't have an inequality condition on one field and an order by another (or inequalities on two fields, etc, etc). The workaround is simply to use the "best" inequality condition to get data in memory (where "best" means the one that's expected to yield the least data) and then further refine it and order it by Python code in your application.
Python's list comprehensions (and other forms of loops), the list sort method and the sorted built-in function, the itertools module in the standard library, and so on, all help a lot to make these kinds of tasks quite simple to perform in Python itself.
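As a sketch of that second stage, assume the datastore query has already applied the inequality (publication year before 1980) and handed back a list of candidates. The dictionaries and the refine_and_sort helper below are hypothetical stand-ins for real Album entities; the rest is a list comprehension plus sorted.

```python
from operator import itemgetter

# Hypothetical candidates, already narrowed to publication_year < 1980.
candidates = [
    {"title": "T1", "artist_name": "Zeta", "publication_year": 1975, "genre": "rock"},
    {"title": "T2", "artist_name": "Alpha", "publication_year": 1969, "genre": "rock"},
    {"title": "T3", "artist_name": "Mid", "publication_year": 1978, "genre": "jazz"},
]


def refine_and_sort(albums, genre, sort_key):
    """Apply a further equality filter in memory, then order the result."""
    matching = [album for album in albums if album["genre"] == genre]
    return sorted(matching, key=itemgetter(sort_key))


rock_by_artist = refine_and_sort(candidates, "rock", "artist_name")
```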