django sort queryset by newest dates from different fields


How can I sort a queryset the way forums do?
I mean ordering by one date field, with a second date also taken into account. For example, in forums, subjects are sorted by created date (or updated date), but if an old subject gets a new reply, it is shown before the other subjects. This is exactly what I'm trying to do.
What I tried so far:
subjects = Subject.objects.filter(active=True).order_by('-sticky', '-updated_date', 'reply__updated_date')
but in this case each subject is repeated once per reply.
I've also tried:
subjects = Subject.objects.filter(active=True).order_by('-sticky', '-updated_date').order_by('reply__updated_date')
but in this case the second order_by overrides the first one.
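A minimal sketch of why the rows repeat, using Python's built-in sqlite3 module rather than Django: ordering on a field across the reverse foreign key makes the ORM join Reply onto Subject, and without any grouping the join yields one row per reply. The table names, columns, and sample data below are hypothetical stand-ins for the two models, and the SQL only approximates what Django would generate.

```python
import sqlite3

# Hypothetical tables standing in for the Subject and Reply models.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE subject (id INTEGER PRIMARY KEY, title TEXT, updated_date TEXT);
    CREATE TABLE reply (id INTEGER PRIMARY KEY,
                        subject_id INTEGER REFERENCES subject(id),
                        updated_date TEXT);
    INSERT INTO subject VALUES (1, 'old topic', '2017-01-01'),
                               (2, 'new topic', '2017-03-01');
    INSERT INTO reply (subject_id, updated_date) VALUES
        (1, '2017-04-01'), (1, '2017-04-02'), (1, '2017-04-03');
""")

# A join with no GROUP BY: every reply contributes its own row,
# so 'old topic' comes back once per reply.
rows = conn.execute("""
    SELECT s.title
    FROM subject s LEFT JOIN reply r ON r.subject_id = s.id
    ORDER BY s.updated_date DESC, r.updated_date
""").fetchall()
titles = [title for (title,) in rows]
print(titles)  # ['new topic', 'old topic', 'old topic', 'old topic']
```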
Is there any guideline to follow?
BTW, I'm pretty sure you got it, but just to be clear, Subject is a model and Reply is another model, connected by a foreign key.

If I understood correctly, you can achieve this with annotations and database functions.
One thing at a time.
You want to sort Subjects by reply date in descending order: in Django's own terms, you need to annotate() each Subject with the maximum updated_date of its related Reply objects, and sort the Subjects by that new field in reverse order, like this:
from django.db.models import Max
subjects = Subject.objects.annotate(lr=Max('reply__updated_date'))\
.order_by('-lr')
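Roughly speaking, annotating with Max() over the reverse relation adds a GROUP BY, so each subject comes back once, carrying the date of its newest reply. The sketch below shows that SQL shape with sqlite3; the tables and data are hypothetical. Note the subject with no replies: its lr is NULL, and where NULL lands in a descending sort depends on the database (SQLite and MySQL put it last, PostgreSQL puts it first).

```python
import sqlite3

# Hypothetical tables standing in for the Subject and Reply models.
# ISO-8601 date strings sort chronologically when compared as text.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE subject (id INTEGER PRIMARY KEY, title TEXT, updated_date TEXT);
    CREATE TABLE reply (id INTEGER PRIMARY KEY,
                        subject_id INTEGER REFERENCES subject(id),
                        updated_date TEXT);
    INSERT INTO subject VALUES (1, 'old topic', '2017-01-01'),
                               (2, 'new topic', '2017-03-01'),
                               (3, 'quiet topic', '2017-02-01');
    INSERT INTO reply (subject_id, updated_date) VALUES
        (1, '2017-04-01'), (1, '2017-04-03'), (3, '2017-02-15');
""")

# Roughly the shape of annotate(lr=Max('reply__updated_date')).order_by('-lr'):
# LEFT JOIN + GROUP BY collapses each subject back to a single row.
rows = conn.execute("""
    SELECT s.title, MAX(r.updated_date) AS lr
    FROM subject s LEFT JOIN reply r ON r.subject_id = s.id
    GROUP BY s.id, s.title
    ORDER BY lr DESC
""").fetchall()
print(rows)
# [('old topic', '2017-04-03'), ('quiet topic', '2017-02-15'), ('new topic', None)]
```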
Now you have all subjects with at least one reply sorted, but what about the subjects without replies?
We can rephrase our sorting like this: "sort by last reply date if there are replies, and by last subject update otherwise".
The "otherwise" part can be achieved with the Coalesce() function, which returns the first of its arguments that is not NULL. In our case, when Max() yields NULL (a subject without replies), Coalesce() falls back to the subject's own updated_date field:
from django.db.models import Max
from django.db.models.functions import Coalesce
subjects = Subject.objects\
.annotate(lr=Coalesce(Max('reply__updated_date'), 'updated_date'))\
.order_by('-lr')
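The COALESCE fallback can be checked outside Django with sqlite3. The table layout, the sample data, and the sticky column (taken from the question's order_by) are assumptions for illustration; the point is that a subject with no replies now ranks by its own date instead of landing at one end of the list.

```python
import sqlite3

# Hypothetical tables; sticky mirrors the question's '-sticky' ordering.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE subject (id INTEGER PRIMARY KEY, title TEXT,
                          sticky INTEGER, updated_date TEXT);
    CREATE TABLE reply (id INTEGER PRIMARY KEY,
                        subject_id INTEGER REFERENCES subject(id),
                        updated_date TEXT);
    INSERT INTO subject VALUES (1, 'old topic', 0, '2017-01-01'),
                               (2, 'new topic', 0, '2017-03-01'),
                               (3, 'quiet topic', 0, '2017-02-01'),
                               (4, 'announcement', 1, '2016-01-01');
    INSERT INTO reply (subject_id, updated_date) VALUES
        (1, '2017-04-01'), (1, '2017-04-03'), (3, '2017-02-15');
""")

# COALESCE falls back to the subject's own date when MAX() over replies is NULL.
rows = conn.execute("""
    SELECT s.title,
           COALESCE(MAX(r.updated_date), s.updated_date) AS lr
    FROM subject s LEFT JOIN reply r ON r.subject_id = s.id
    GROUP BY s.id, s.title, s.sticky, s.updated_date
    ORDER BY s.sticky DESC, lr DESC
""").fetchall()
titles = [title for title, _ in rows]
print(titles)  # ['announcement', 'old topic', 'new topic', 'quiet topic']
```

In Django terms, combining this with the question's filter and sticky ordering would presumably read Subject.objects.filter(active=True).annotate(lr=Coalesce(Max('reply__updated_date'), 'updated_date')).order_by('-sticky', '-lr') (untested).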

Related

django-tables2 doesn't sort columns properly

In my project, I use RequestConfig(request).configure(table) to enable sorting on the columns. All of them are defined as ArrayField(models.CharField(max_length=50), null=True). The problem is that Title and Year sort correctly, but the other three columns do not: larger values show up among the smaller ones. I suppose Title and Year are strings, while the others are lists containing integers. Does anyone have a lead on how to sort the three middle columns properly?
(Screenshots: table headers; second column not sorted properly.)

This appears to be the result of sorting integers that are stored as strings: strings compare character by character, so "10" sorts before "9". You have pointed out that the columns of your table are defined using CharField, which according to the Django docs is for storing strings.
You should probably define each of the CoAuthors, Citations, and Ncitations columns using IntegerField instead; since they hold lists, that would be ArrayField(models.IntegerField(), null=True).
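To see the effect, a small Python sketch can compare list-of-strings values with list-of-integers values. It assumes the database compares arrays element by element and text elements character by character, which is how Python's sorted() behaves here; the exact text order in PostgreSQL also depends on the collation. The sample values are made up.

```python
# Values as a CharField-based ArrayField would hold them (strings).
as_strings = [["10"], ["9"], ["100"], ["2", "5"]]
# The same values as an IntegerField-based ArrayField would hold them.
as_ints = [[int(v) for v in row] for row in as_strings]

# Text comparison: "10" < "100" < "2" < "9", so larger numbers
# end up among the smaller ones.
print(sorted(as_strings))  # [['10'], ['100'], ['2', '5'], ['9']]
# Numeric comparison gives the expected order.
print(sorted(as_ints))     # [[2, 5], [9], [10], [100]]
```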

Django - Time aggregates of DatetimeField across queryset

(using django 1.11.2, python 2.7.10, mysql 5.7.18)
If we imagine a simple model:
from django.db import models

class Event(models.Model):
    happened_datetime = models.DateTimeField()
    value = models.IntegerField()
What would be the most elegant (and quickest) way to run something similar to:
res = Event.objects.all().aggregate(
Avg('happened_datetime')
)
But that would be able to extract the average time of day for all members of the queryset. Something like:
res = Event.objects.all().aggregate(
AvgTimeOfDay('happened_datetime')
)
Would it be possible to do this in the database directly, i.e., without running a long client-side loop over each queryset member?
EDIT:
There may be a solution, along those lines, using raw SQL:
select sec_to_time(avg(time_to_sec(extract(HOUR_SECOND from happened_datetime)))) from event_event;
Performance-wise, this runs in 0.015 seconds for ~23k rows on a laptop (not optimised). Assuming it yields accurate results, and since time is only a secondary factor, could I use that?

Add another integer field to your model that contains only the hour of the day extracted from the happened_datetime.
When creating or updating a model instance, you need to update this new field whenever happened_datetime is set or changed. You can get the hour of the day from the datetime's .hour attribute, for example, or use strftime to build a value to your liking.
Aggregation should then work as proposed by yourself.
EDIT:
Django's ORM has an Extract() function. The docs example, adapted to your use case:
>>> from django.db.models import Avg
>>> from django.db.models.functions import Extract
>>> # Average hour of the day across all events
>>> Event.objects.aggregate(
...     avg_hour=Avg(Extract('happened_datetime', 'hour')))
(Not tested!)
https://docs.djangoproject.com/en/1.11/ref/models/database-functions/#extract

So after a little search and tries.. the below seems to work. Any comments on how to improve (or hinting as to why it is completely wrong), are welcome! :-)
res = Event.objects.raw('''
SELECT id, sec_to_time(avg(time_to_sec(extract(HOUR_SECOND from happened_datetime)))) AS average_time_of_day
FROM event_event
WHERE happened_datetime BETWEEN %s AND %s;''', [start_datetime, end_datetime])
print res[0].__dict__
# {'average_time_of_day': datetime.time(18, 48, 10, 247700), '_state': <django.db.models.base.ModelState object at 0x0445B370>, 'id': 9397L}
Now the ID returned is that of the last object falling in the datetime range for the WHERE clause. I believe Django just inserts that because of "InvalidQuery: Raw query must include the primary key".
Quick explanation of the SQL series of function calls:
1. Extract HH:MM:SS from each datetime value (extract(HOUR_SECOND ...)).
2. Convert the time values to seconds via time_to_sec().
3. Average all the seconds values with avg().
4. Convert the averaged seconds value back into time format (HH:MM:SS) via sec_to_time().
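The same four steps can be mirrored in plain Python 3, which may help when checking the SQL result against a handful of rows. This is a sketch, not part of the original answer; average_time_of_day is a hypothetical helper name. It also shows where fractional seconds come from: the mean of whole seconds is usually fractional.

```python
from datetime import datetime, time


def average_time_of_day(datetimes):
    """Average the time-of-day part of some datetimes (hypothetical helper).

    Mirrors the SQL steps: take HH:MM:SS, convert to seconds,
    average, then convert the (possibly fractional) mean back to a time.
    """
    seconds = [dt.hour * 3600 + dt.minute * 60 + dt.second for dt in datetimes]
    mean = sum(seconds) / len(seconds)  # fractional, like AVG() in SQL
    # Work in whole microseconds so the fractional part survives.
    total_us = round(mean * 1000000)
    hours, rest = divmod(total_us, 3600 * 1000000)
    minutes, rest = divmod(rest, 60 * 1000000)
    secs, micros = divmod(rest, 1000000)
    return time(hours, minutes, secs, micros)


print(average_time_of_day([datetime(2017, 6, 1, 10, 0, 0),
                           datetime(2017, 6, 2, 11, 0, 1)]))
# 10:30:00.500000
```

One caveat that applies equally to the SQL: this is an ordinary arithmetic mean, so times on either side of midnight average badly. For 23:00 and 01:00 the result is 12:00, not midnight.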
I don't know why Django insists on returning microseconds, but that is not really relevant. (Most likely avg() returns a fractional number of seconds, and sec_to_time() keeps that fraction.)
Performance note: this seems to be extremely fast, but then again I haven't benchmarked it properly. Any insight would be kindly appreciated :)

Extracting a year from a DateField in django in the database

I know it's possible to use lookups to filter a DateField in the database by year, like so:
MyModel.objects.filter(date__year=2000) # Returns all objects with a year of 2000
But I want to extract the year when using a values() call:
MyModel.objects.all().values('label','date__year') # Fails!
FieldError: Cannot resolve keyword 'year' into field. Join on 'date' not permitted.
I've tried making a custom lookup, but that doesn't apply to a values call.
How can I extract just the year in this kind of query?
edit: To be clear, I know I could do this a million ways in Python by iterating over the queryset once it's back from the database. But I don't want to do this.

You can use a list comprehension if you want.
The syntax would be:
[(x['label'], x['date'].year) for x in MyModel.objects.all().values('label', 'date')]
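If the extraction has to happen in the database, as the question asks, the SQL only needs to compute the year in the SELECT list. A sketch with sqlite3, where the table name mymodel is hypothetical and strftime('%Y', ...) is SQLite-specific. On Django 1.10 and later, the ORM equivalent should be MyModel.objects.annotate(year=ExtractYear('date')).values('label', 'year'), with ExtractYear imported from django.db.models.functions (not tested here).

```python
import sqlite3

# Hypothetical table standing in for MyModel.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mymodel (id INTEGER PRIMARY KEY, label TEXT, date TEXT)")
conn.executemany("INSERT INTO mymodel (label, date) VALUES (?, ?)",
                 [("a", "2000-05-17"), ("b", "1999-12-31")])

# The year is computed by the database, not by iterating in Python.
rows = conn.execute(
    "SELECT label, CAST(strftime('%Y', date) AS INTEGER) AS year "
    "FROM mymodel ORDER BY id"
).fetchall()
print(rows)  # [('a', 2000), ('b', 1999)]
```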

Django aggregate count of records per day

I've got a Django app that does some logging. My model looks like this:
from django.db import models

class MessageLog(models.Model):
    logtime = models.DateTimeField(auto_now_add=True)
    user = models.CharField(max_length=50)
    message = models.CharField(max_length=512)
What I want to do is get the average number of messages logged per day of the week, so that I can see which days are the most active. I've managed to write a query that pulls the total number of messages per weekday:
for i in range(1, 8):
    MessageLog.objects.filter(logtime__week_day=i).count()
But I'm having trouble calculating the average in a query. What I have right now is:
from django.db.models import Avg, Count

for i in range(1, 8):
    MessageLog.objects.filter(logtime__week_day=i).annotate(num_msgs=Count('id')).aggregate(Avg('num_msgs'))
For some reason this is returning 1.0 for every day though. I looked at the SQL it is generating and it is:
SELECT AVG(num_msgs) FROM (
SELECT
`myapp_messagelog`.`id` AS `id`, `myapp_messagelog`.`logtime` AS `logtime`,
`myapp_messagelog`.`user` AS `user`, `myapp_messagelog`.`message` AS `message`,
COUNT(`myapp_messagelog`.`id`) AS `num_msgs`
FROM `myapp_messagelog`
WHERE DAYOFWEEK(`myapp_messagelog`.`logtime`) = 1
GROUP BY `myapp_messagelog`.`id` ORDER BY NULL
) subquery
I think the problem might be coming from the GROUP BY id but I'm not really sure. Anyone have any ideas or suggestions? Thanks in advance!

The reason your listed query always gives 1 is because you're not grouping by date. Basically, you've asked the database to take the MessageLog rows that fall on a given day of the week. For each such row, count how many ids it has (always 1). Then take the average of all those counts, which is of course also 1.
Normally, you would need to use a values clause to group your MessageLog rows prior to your annotate and aggregate parts. However, since your logtime field is a datetime rather than just a date, I am not sure you can express that directly with Django's ORM. You can definitely do it with an extra clause, as shown here. Or if you felt like it you could declare a view in your SQL with as much of the aggregating and average math as you liked and declare an unmanaged model for it, then just use the ORM normally.
So an extra field works to get the total number of records per actual day, but doesn't handle aggregating the average of the computed annotation. I think this may be sufficiently abstracted from the model that you'd have to use a raw SQL query, or at least I can't find anything that makes it work in one call.
That said, you already know how you can get the total number of records per weekday in a simple query as shown in your question.
And this query will tell you how many distinct date records there are on a given weekday:
MessageLog.objects.filter(logtime__week_day=i).dates('logtime', 'day').count()
So you could do the averaging math in Python instead, which might be simpler than trying to get the SQL right.
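A sketch of that Python-side averaging, assuming the datetimes have already been fetched (for instance with values_list('logtime', flat=True); how you load them is an assumption). For each weekday it divides the total message count by the number of distinct dates falling on that weekday, which is what the two queries above give you per weekday. average_per_weekday is a hypothetical helper.

```python
from collections import Counter
from datetime import datetime


def average_per_weekday(logtimes):
    """Average number of messages per weekday (hypothetical helper).

    total messages on a weekday / number of distinct dates on that weekday.
    Keys use Python's weekday(): Monday=0 ... Sunday=6 (Django's week_day
    lookup instead uses Sunday=1 ... Saturday=7).
    """
    totals = Counter(dt.weekday() for dt in logtimes)
    distinct_dates = {dt.date() for dt in logtimes}
    days_seen = Counter(d.weekday() for d in distinct_dates)
    return {wd: totals[wd] / days_seen[wd] for wd in totals}


logtimes = [
    datetime(2017, 1, 2, 9, 0), datetime(2017, 1, 2, 12, 0),
    datetime(2017, 1, 2, 17, 0),   # Monday: 3 messages
    datetime(2017, 1, 9, 10, 0),   # next Monday: 1 message
    datetime(2017, 1, 3, 8, 30),   # Tuesday: 1 message
]
print(average_per_weekday(logtimes))  # {0: 2.0, 1: 1.0}
```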
Alternately, this query will get you the raw number of messages for all weekdays in one query rather than a for loop:
MessageLog.objects.extra({'weekday': "dayofweek(logtime)"}).values('weekday').annotate(Count('id'))
But I haven't been able to get a nice query to give you the count of distinct dates for each weekday annotated to that - dates querysets lose the ability to handle annotate calls, and annotating over an extra value doesn't seem to work either.
This has been surprisingly tricky, given that it's not that hard a SQL expression.
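For comparison, the plain SQL is short: group by calendar date first, then average the per-date counts for each weekday. A sketch with sqlite3; the table and sample data are hypothetical, and strftime('%w') (Sunday=0) and date() are SQLite functions, so on MySQL you would use DAYOFWEEK() and DATE() instead.

```python
import sqlite3

# Hypothetical table standing in for MessageLog.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messagelog (id INTEGER PRIMARY KEY, logtime TEXT)")
conn.executemany("INSERT INTO messagelog (logtime) VALUES (?)", [
    ("2017-01-02 09:00:00",), ("2017-01-02 12:00:00",), ("2017-01-02 17:00:00",),
    ("2017-01-09 10:00:00",),   # a second Monday
    ("2017-01-03 08:30:00",),   # a Tuesday
])

# Inner query: one row per calendar date with its message count.
# Outer query: average those per-date counts for each weekday.
rows = conn.execute("""
    SELECT weekday, AVG(n) FROM (
        SELECT strftime('%w', logtime) AS weekday,
               date(logtime) AS day,
               COUNT(id) AS n
        FROM messagelog
        GROUP BY day, weekday
    ) AS per_day
    GROUP BY weekday
    ORDER BY weekday
""").fetchall()
print(rows)  # [('1', 2.0), ('2', 1.0)]
```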

I do something similar with a datetime field, but annotating over extra values does work for me. I have a Record model with a datetime field "created_at" and a "my_value" field I want to get the average for.
from django.db.models import Avg
qs = Record.objects.extra({'created_day': "date(created_at)"}).\
    values('created_day').\
    annotate(avg_value=Avg('my_value'))
The above will group by the day of the datetime value in "created_at" field.

queryset.extra(select={'day': 'date(logtime)'}).values('day').order_by('-day').annotate(Count('id'))

group by in django

How can I create a simple GROUP BY query in the trunk version of Django?
I need something like
SELECT name
FROM mytable
GROUP BY name
Actually, what I want to do is simply get all entries with distinct names.

If you need all the distinct names, just do this:
Foo.objects.values('name').distinct()
And you'll get a list of dictionaries, each one with a name key. If you need other data, just add more attribute names as parameters to the .values() call. Of course, if you add in attributes that may vary between rows with the same name, you'll break the .distinct().
This won't help if you want to get complete model objects back. But getting distinct names and getting full data are inherently incompatible goals anyway; how do you know which row with a given name you want returned in its entirety? If you want to calculate some sort of aggregate data for all the rows with a given name, aggregation support was recently added to Django trunk and can take care of that for you.
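A quick way to see that DISTINCT and GROUP BY on a single column return the same names, and how GROUP BY becomes useful once an aggregate is added, is a sqlite3 sketch with a hypothetical mytable. In the ORM, the aggregate version would presumably be Foo.objects.values('name').annotate(Count('id')), since values() followed by annotate() groups by the listed fields.

```python
import sqlite3

# Hypothetical table with a repeated name.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO mytable (name) VALUES (?)",
                 [("alice",), ("bob",), ("alice",)])

# Both forms return each name once.
distinct_names = conn.execute(
    "SELECT DISTINCT name FROM mytable ORDER BY name").fetchall()
grouped_names = conn.execute(
    "SELECT name FROM mytable GROUP BY name ORDER BY name").fetchall()

# GROUP BY also lets you aggregate over the rows sharing a name.
counts = conn.execute(
    "SELECT name, COUNT(id) FROM mytable GROUP BY name ORDER BY name").fetchall()
print(distinct_names, grouped_names, counts)
```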

Add .distinct to your queryset:
Entries.objects.filter(something='xxx').distinct()

This will not work because every row has a unique id, so every record is distinct.
To solve my problem I used:
foo = Foo.objects.all()
foo.query.group_by = ['name']
but this is not an official API.
