What is the most efficient way to get a ranking in a QuerySet?

I'm trying to rank a QuerySet efficiently (keeping it a QuerySet so I can still use the filter and order_by functions), but I can't seem to find any way other than iterating through the QuerySet and tacking on a rank. I don't want to add rank to my model if I don't have to.
I know how to get the values I need through a SQL query, but can't seem to translate that into Django:
SET @rank = 0, @prev_val = NULL;
SELECT rank, name, school, points FROM
    (SELECT @rank := IF(@prev_val = points, @rank, @rank+1) AS rank, @prev_val := points, points, CONCAT(users.first_name, ' ', users.last_name) AS name, school.name AS school
    FROM accounts_userprofile
    JOIN schools_school school ON school_id = school.id
    JOIN auth_user users ON user_id = users.id
    ORDER BY points DESC) AS profile
ORDER BY rank DESC
I found that if I iterated through the QuerySet, tacked on 'rank' manually and then filtered the results further, my 'rank' would disappear, unless I turned it into a list (which made filtering and sorting a bit of a pain). Is there any other way you can think of to add rank to my QuerySet? Is there any way I could run the above query and get a QuerySet with the filter and order_by functions still intact? I'm currently using jQuery DataTables with Django to generate a leaderboard with pagination (which is why I need to preserve filtering and order_by).
Thanks in advance! Sorry if I did not post my question correctly - any help would be much appreciated.

I haven't used it myself, but I'm pretty sure you can do that with the extra() method.

Looking at your raw SQL I don't see anything special in your logic. You are simply enumerating all SQL rows of the join, ordered by points, with a counter starting from 1, while collapsing equal point values to the same rank.
My suggestion would be to write a custom manager that uses the raw() or extra() method. In your manager you would order the model instances by points and use Python's enumerate over them to produce the rank. Of course you would have to keep track of the previous points value and override what enumerate returns when two instances have the same number of points. Look here for an example of something similar.
Then you could do something like:
YourQuerySet.objects.with_rankings().all()
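A minimal sketch of the ranking logic described above, in plain Python rather than inside a Django manager. The helper name with_rankings only mirrors the call above; the rows are plain dicts, the sample data is made up, and it assumes ties share a rank and the next distinct value gets the next rank, as in the SQL from the question.

```python
def with_rankings(rows, key="points"):
    """Return copies of rows sorted by `key` descending, each with a dense 'rank'."""
    ranked = []
    rank = 0
    prev_val = object()  # sentinel that never equals a real value
    for row in sorted(rows, key=lambda r: r[key], reverse=True):
        if row[key] != prev_val:
            # New points value: move on to the next rank.
            rank += 1
            prev_val = row[key]
        ranked.append({**row, "rank": rank})
    return ranked


# Hypothetical leaderboard rows, standing in for model instances.
profiles = [
    {"name": "Ann", "points": 90},
    {"name": "Bob", "points": 75},
    {"name": "Cy", "points": 90},
    {"name": "Di", "points": 60},
]
ranked = with_rankings(profiles)
```

Note that this produces a list, not a QuerySet, so any filtering would have to happen before the ranking step.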

Related

How to do a Django subquery

I have two examples of code which accomplish the same thing. One is using python, the other is in SQL.
Exhibit A (Python):
surveys = Survey.objects.all()
consumer = Consumer.objects.get(pk=24)
consumer_ballot_list = []
consumer_survey_list = []
for ballot in consumer.ballot_set.all():
    consumer_ballot_list.append(ballot.question_id)
for survey in surveys:
    if survey.id not in consumer_ballot_list:
        consumer_survey_list.append(survey.id)
Exhibit B (SQL):
SELECT * FROM clients_survey WHERE id NOT IN (SELECT question_id FROM consumers_ballot WHERE consumer_id=24) ORDER BY id;
I want to know how I can make exhibit A much cleaner and more efficient using Django's ORM and subqueries.
In this example:
I have ballots which contain a question_id that refers to the survey which a consumer has answered.
I want to find all of the surveys that the consumer hasn't answered. So I need to check each question_id(survey.id) in the consumer's set of ballots against the survey model's id's and make sure that only the surveys that the consumer does NOT have a ballot of are returned.

You more or less have the correct idea. To replicate your SQL code using Django's ORM you just have to break the SQL into its discrete parts:
1. Create a table of the question_ids that consumer 24 has answered
2. Filter the surveys for all ids not in the aforementioned table
consumer = Consumer.objects.get(pk=24)
# step 1
answered_survey_ids = consumer.ballot_set.values_list('question_id', flat=True)
# step 2
unanswered_surveys_ids = Survey.objects.exclude(id__in=answered_survey_ids).values_list('id', flat=True)
This is basically what you did in your current Python-based approach, except that I took advantage of a few of Django's nice ORM features:
.values_list() - this allows you to extract a specific field from all the objects in the given queryset.
.exclude() - this is the opposite of .filter() and returns all items in the queryset that don't match the condition.
__in - this is useful if we have a list of values and we want to filter/exclude all items that match those values.
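To see the SQL side of this in isolation, here is a small sketch using Python's built-in sqlite3 module. As far as I know, passing an unevaluated QuerySet to __in makes Django emit a nested SELECT, so the ORM version should come out close to Exhibit B. The table and column names are taken from the question's SQL; the rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE clients_survey (id INTEGER PRIMARY KEY);
    CREATE TABLE consumers_ballot (
        id INTEGER PRIMARY KEY,
        consumer_id INTEGER,
        question_id INTEGER
    );
    INSERT INTO clients_survey (id) VALUES (1), (2), (3), (4);
    INSERT INTO consumers_ballot (consumer_id, question_id)
    VALUES (24, 1), (24, 3), (7, 2);
""")

# Same shape as Exhibit B: surveys with no ballot from consumer 24.
rows = conn.execute(
    "SELECT id FROM clients_survey "
    "WHERE id NOT IN ("
    "    SELECT question_id FROM consumers_ballot WHERE consumer_id = ?"
    ") ORDER BY id",
    (24,),
).fetchall()
unanswered_survey_ids = [row[0] for row in rows]
```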
Hope this helps!

Django filter and get the whole record back when using a .values() column-based annotation

This may be a common query but I've struggled to find an answer. This answer to an earlier question gets me half-way using .annotate() and Count but I can't figure out how then to get the full record for the filtered results.
I'm working with undirected networks and would like to limit the query based on a subset of target nodes.
Sample model:
class Edges(Model):
    id = models.AutoField(primary_key=True)
    source = models.BigIntegerField()
    target = models.BigIntegerField()
I want to get a queryset of Edges where the .target exists within a list passed to filter. I then want to exclude any Edges where the source is not greater than a number (1 in the example below but may change).
Here's the query so far (parentheses added just for better legibility):
(Edges.objects.filter(target__in=[1234, 5678, 9012])
    .values('source')
    .annotate(source_count=Count("source"))
    .filter(source_count__gt=1)
)
This query just delivers the source and new source_count fields but I want the whole record (id, source and target) for the subset.
Should I be using this as a subquery or am I missing some obvious Django-foo?

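I would suggest filtering on the sources that occur more than once, using your grouped query as a subquery:
repeated_sources = (Edges.objects.filter(target__in=[1234, 5678, 9012])
    .values('source')
    .annotate(source_count=Count('source'))
    .filter(source_count__gt=1)
    .values('source'))
Edges.objects.filter(target__in=[1234, 5678, 9012], source__in=repeated_sources)
to get a QuerySet of Edges instances, where you not only get the id, source and target values but can also call any methods you have defined on them (this might be heavy on the database, though), or append
.values('id', 'source', 'target')
to get only those values as dictionaries. Note that a filter on source_count has to come after the annotate() that defines it, and that annotating Edges rows directly (without values('source')) counts per row rather than per source.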
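In SQL terms, getting whole rows back for the sources that appear more than once amounts to filtering the table on a grouped subquery. The sketch below shows that with Python's built-in sqlite3 module; the edges table name and the sample rows are assumptions for illustration, and this is not claimed to be the exact SQL Django generates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE edges (id INTEGER PRIMARY KEY, source INTEGER, target INTEGER);
    INSERT INTO edges (source, target) VALUES
        (1, 1234), (1, 5678), (2, 1234), (3, 9012), (3, 1234), (1, 4321);
""")

targets = (1234, 5678, 9012)
placeholders = ", ".join("?" for _ in targets)
sql = f"""
    SELECT id, source, target FROM edges
    WHERE target IN ({placeholders})
      AND source IN (
          -- sources that hit the target list more than once
          SELECT source FROM edges
          WHERE target IN ({placeholders})
          GROUP BY source
          HAVING COUNT(source) > 1
      )
    ORDER BY id
"""
# The target list is bound twice: once for the outer filter, once for the subquery.
full_rows = conn.execute(sql, targets + targets).fetchall()
```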

Django postgres order_by distinct on field

We have a limitation for order_by/distinct fields.
From the docs: "fields in order_by() must start with the fields in distinct(), in the same order"
Now here is the use case:
class Course(models.Model):
    is_vip = models.BooleanField()
    ...
class CourseEvent(models.Model):
    date = models.DateTimeField()
    course = models.ForeignKey(Course)
The goal is to fetch the courses, ordered by nearest date but vip goes first.
The solution could look like this:
CourseEvent.objects.order_by('-course__is_vip', '-date',).distinct('course_id',).values_list('course')
But it raises an error because of that limitation.
Yeah I understand why ordering is necessary when using distinct - we get the first row for each value of course_id so if we don't specify an order we would get some arbitrary row.
But what's the purpose of limiting order to the same field that we have distinct on?
If I change order_by to something like ('course_id', '-course__is_vip', 'date',) it gives me one row per course, but the order of the courses has nothing in common with the goal.
Is there any way to bypass this limitation besides walking through the entire queryset and filtering it in a loop?

You can use a nested query using id__in. In the inner query you single out the distinct events and in the outer query you custom-order them:
CourseEvent.objects.filter(
    id__in=CourseEvent.objects
        .order_by('course_id', '-date').distinct('course_id')
).order_by('-course__is_vip', '-date')
From the docs on distinct(*fields):
When you specify field names, you must provide an order_by() in the QuerySet, and the fields in order_by() must start with the fields in distinct(), in the same order.
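The two-stage idea (first pick one event per course, then reorder the survivors freely) can be sketched in plain Python, without a database. The tuples and dates below are made up, and the tuple layout is just an assumption for the illustration.

```python
from datetime import date

events = [
    # (event id, course_id, course is_vip, date)
    (1, 10, False, date(2024, 5, 1)),
    (2, 10, False, date(2024, 6, 1)),
    (3, 20, True, date(2024, 4, 15)),
    (4, 30, False, date(2024, 7, 1)),
    (5, 20, True, date(2024, 3, 1)),
]

# Stage 1: like order_by('course_id', '-date').distinct('course_id'),
# keep only the latest event of each course.
latest_per_course = {}
for event in sorted(events, key=lambda e: e[3], reverse=True):
    latest_per_course.setdefault(event[1], event)

# Stage 2: like order_by('-course__is_vip', '-date') on the outer query.
ordered = sorted(
    latest_per_course.values(),
    key=lambda e: (e[2], e[3]),
    reverse=True,
)
course_ids = [e[1] for e in ordered]
```

Because the second sort only sees one row per course, it is no longer constrained by the field used to deduplicate.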

Django - Following ForeignKey relationships "backward" for entire QuerySet

Is it possible to follow ForeignKey relationships backward for an entire QuerySet?
I mean something like this:
x = table1.objects.select_related().filter(name='foo')
x.table2.all()
when table1 has a ForeignKey to table2.
In
https://docs.djangoproject.com/en/1.2/topics/db/queries/#following-relationships-backward
I can see that it works only with get() and not with filter().
Thanks

You basically want a QuerySet of a different type than the one you start with.
class Kid(models.Model):
    mom = models.ForeignKey('Mom')
    name = models.CharField…
class Mom(models.Model):
    name = models.CharField…
Let's say you want to get all moms having any son named Johnny.
Mom.objects.filter(kid__name='Johnny')
Let's say you want to get all kids of any Lucy.
Kid.objects.filter(mom__name='Lucy')

You should be able to use something like:
for y in x:
    y.table2.all()
But you could also use get() on the unique values (which will be id, unless you have specified a different primary key), after finding them with a query.
So,
x = table1.objects.select_related().filter(name='foo')
for y in x:
    z = table1.objects.select_related().get(pk=y.id)
    z.table2.all()
should also work.

You can also use values() to fetch specific fields across a foreign key reference. With values(), the SELECT issued to the database is reduced to only those fields, and the appropriate joins are done.
To re-use the example from Krzysztof Szularz:
johnny_moms = Kid.objects.filter(name='Johnny').values('mom__id', 'mom__name').distinct()
This returns a QuerySet of dictionaries of Mom attributes, obtained through the Kid manager.

SQLAlchemy filter query by related object

Using SQLAlchemy, I have a one to many relation with two tables - users and scores. I am trying to query the top 10 users sorted by their aggregate score over the past X amount of days.
users:
    id
    user_name
    score
scores:
    user
    score_amount
    created
My current query is:
top_users = DBSession.query(User).options(eagerload('scores')).filter_by(User.scores.created > somedate).order_by(func.sum(User.scores).desc()).all()
I know this is clearly not correct, it's just my best guess. However, after looking at the documentation and googling I cannot find an answer.
EDIT:
Perhaps it would help if I sketched what the MySQL query would look like:
SELECT user.*, SUM(scores.amount) as score_increase
FROM user LEFT JOIN scores ON scores.user_id = user.user_id
WITH scores.created_at > someday
ORDER BY score_increase DESC

The single-joined-row way, with a group_by added in for all user columns although MySQL will let you group on just the "id" column if you choose:
sess.query(User, func.sum(Score.amount).label('score_increase')).\
    join(User.scores).\
    filter(Score.created_at > someday).\
    group_by(User).\
    order_by("score_increase desc")
Or if you just want the users in the result:
sess.query(User).\
    join(User.scores).\
    filter(Score.created_at > someday).\
    group_by(User).\
    order_by(func.sum(Score.amount))
The above two have an inefficiency in that you're grouping on all columns of "user" (or you're using MySQL's "group on only a few columns" thing, which is MySQL only). To minimize that, the subquery approach:
subq = sess.query(Score.user_id, func.sum(Score.amount).label('score_increase')).\
    filter(Score.created_at > someday).\
    group_by(Score.user_id).subquery()
sess.query(User).join((subq, subq.c.user_id==User.user_id)).order_by(subq.c.score_increase)
An example of the identical scenario is in the ORM tutorial at: http://docs.sqlalchemy.org/en/latest/orm/tutorial.html#selecting-entities-from-subqueries
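For reference, the SQL shape that the subquery approach aims for can be tried out with Python's built-in sqlite3 module. This is only a sketch: the users and scores table layouts, the sample rows and the cutoff date are assumptions, and the descending order and LIMIT 10 are added here to match the "top 10" goal from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, user_name TEXT);
    CREATE TABLE scores (
        id INTEGER PRIMARY KEY,
        user_id INTEGER,
        amount INTEGER,
        created_at TEXT
    );
    INSERT INTO users (user_id, user_name) VALUES (1, 'ann'), (2, 'bob'), (3, 'cy');
    INSERT INTO scores (user_id, amount, created_at) VALUES
        (1, 5, '2024-01-10'), (1, 7, '2024-02-01'),
        (2, 20, '2024-02-03'), (3, 50, '2023-12-01'), (3, 1, '2024-02-05');
""")

someday = "2024-01-01"  # hypothetical cutoff; ISO dates compare correctly as text
top_users = conn.execute(
    """
    SELECT users.user_name, subq.score_increase
    FROM users
    JOIN (
        -- aggregate per user first, so only user_id is grouped on
        SELECT user_id, SUM(amount) AS score_increase
        FROM scores
        WHERE created_at > ?
        GROUP BY user_id
    ) AS subq ON subq.user_id = users.user_id
    ORDER BY subq.score_increase DESC
    LIMIT 10
    """,
    (someday,),
).fetchall()
```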

You will need to use a subquery in order to compute the aggregate score for each user. Subqueries are described here: http://www.sqlalchemy.org/docs/05/ormtutorial.html?highlight=subquery#using-subqueries

I am assuming the column (not the relation) you're using for the join is called Score.user_id, so change it if this is not the case.
You will need to do something like this:
DBSession.query(Score.user_id, func.sum(Score.score_amount).label('total_score')).group_by(Score.user_id).filter(Score.created > somedate).order_by('total_score DESC')[:10]
However this will result in tuples of (user_id, total_score). I'm not sure if the computed score is actually important to you, but if it is, you will probably want to do something like this:
users_scores = []
q = DBSession.query(Score.user_id, func.sum(Score.score_amount).label('total_score')).group_by(Score.user_id).filter(Score.created > somedate).order_by('total_score DESC')[:10]
for user_id, total_score in q:
    # one extra query per user to load the full User object
    user = DBSession.query(User).get(user_id)
    users_scores.append((user, total_score))
This will result in 11 queries being executed, however. It is possible to do it all in a single query, but due to various limitations in SQLAlchemy, it will likely create a very ugly multi-join query or subquery (dependent on engine) and it won't be very performant.
If you plan on doing something like this often and you have a large amount of scores, consider denormalizing the current score onto the user table. It's more work to upkeep, but will result in a single non-join query like:
DBSession.query(User).order_by(User.computed_score.desc())
Hope that helps.
