Django - SQL Query - Timestamp - python

Can anyone point me to a tutorial, code sample, or other resource that will help me with the following problem?
I have a table in a MySQL database. It contains an ID, a timestamp, another ID, and a value. I pass it the 'main' ID, which uniquely identifies a piece of data. However, I also want to search this data by time, using the timestamp field. Ideally I could say: between the hours of 12 and 1, show me all the values logged for ID = 1987.
How would I write this query in Django? I know in MySQL it would be something like less than/greater than comparisons, but how do I do this in Django? I've been using objects.filter() for most of my database handling so far. Finally, I'd like to stress that I'm new to Django and I'm genuinely stumped!

If the table in question maps to a Django model MyModel, e.g.
class MyModel(models.Model):
    ...
    primaryid = ...
    timestamp = ...
    secondaryid = ...
    valuefield = ...
then you can use
MyModel.objects.filter(
    primaryid=1987
).exclude(
    timestamp__lt=<min_timestamp>
).exclude(
    timestamp__gt=<max_timestamp>
).values_list('valuefield', flat=True)
This selects entries with the primaryid 1987 whose timestamp values lie between <min_timestamp> and <max_timestamp> (inclusive), and returns the corresponding values as a flat list.
Update: Corrected bug in query (filter -> exclude).

I don't think Vinay Sajip's answer is correct. The closest correct variant based on his code is:
MyModel.objects.filter(
    primaryid=1987
).exclude(
    timestamp__lt=min_timestamp
).exclude(
    timestamp__gt=max_timestamp
).values_list('valuefield', flat=True)
That's "exclude the ones less than the minimum timestamp and exclude the ones greater than the maximum timestamp." Alternatively, you can do this, which also leaves out rows whose timestamp is exactly the maximum:
MyModel.objects.filter(
    primaryid=1987
).filter(
    timestamp__gte=min_timestamp
).exclude(
    timestamp__gte=max_timestamp
).values_list('valuefield', flat=True)
exclude() and filter() are opposites: exclude() omits the identified rows and filter() includes them. You can use a combination of them to include/exclude whichever you prefer. In your case, you want to exclude() those below your minimum time stamp and to exclude() those above your maximum time stamp.
Here is the documentation on chaining QuerySet filters.
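To make the boundary behaviour concrete, here is a minimal sketch of the same time-window query run against an in-memory SQLite table instead of Django. The table name mymodel, the sample rows, and the helper names make_db and values_in_window are assumptions for illustration only. It uses a half-open window (start <= timestamp < end), which in the ORM would roughly correspond to filter(primaryid=1987, timestamp__gte=start, timestamp__lt=end).

```python
import sqlite3
from datetime import datetime


def make_db():
    """Create a throwaway table shaped like the model in the question (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE mymodel "
        "(primaryid INTEGER, timestamp TEXT, secondaryid INTEGER, valuefield REAL)"
    )
    conn.executemany(
        "INSERT INTO mymodel VALUES (?, ?, ?, ?)",
        [
            (1987, "2024-01-01 11:59:59", 1, 1.0),  # just before the window
            (1987, "2024-01-01 12:00:00", 1, 2.0),  # exactly at the start: included
            (1987, "2024-01-01 12:30:00", 1, 3.0),
            (1987, "2024-01-01 13:00:00", 1, 4.0),  # exactly at the end: excluded
            (2000, "2024-01-01 12:15:00", 1, 9.0),  # different primary id
        ],
    )
    return conn


def values_in_window(conn, primary_id, start, end):
    """Return valuefield for rows of primary_id with start <= timestamp < end."""
    rows = conn.execute(
        "SELECT valuefield FROM mymodel "
        "WHERE primaryid = ? AND timestamp >= ? AND timestamp < ? "
        "ORDER BY timestamp",
        # ISO-formatted strings compare in chronological order.
        (primary_id, start.isoformat(" "), end.isoformat(" ")),
    )
    return [value for (value,) in rows]
```

Calling values_in_window(make_db(), 1987, datetime(2024, 1, 1, 12), datetime(2024, 1, 1, 13)) keeps the 12:00 and 12:30 rows and drops the 13:00 row. Whether the upper bound should be included is a design choice; a half-open window avoids counting a row twice when consecutive hours are queried.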

Related

Is it possible to use queryset in the FROM clause

I have a model for user's points collection:
class Rating(models.Model):
    user = models.ForeignKey(User, on_delete=models.CASCADE, related_name='rating')
    points = models.IntegerField()
Each user could have multiple records in this model. I need to calculate a rank of each user by sum of collected points. For the listing it's easy:
Rating.objects.values('user__username').annotate(
    total_points=Sum('points')
).order_by('-total_points')
But how do I get the rank of a single user by user_id? I added an annotation with row numbers:
Rating.objects.values('user__username').annotate(
    total_points=Sum('points')
).annotate(
    rank=Window(
        expression=RowNumber(),
        order_by=[F('total_points').desc()]
    )
)
This does add correct ranking numbers, but when I try to get a single user by user_id, it returns a row with rank=1. That is because the filter condition goes into the WHERE clause, so only a single row is left to be numbered, and it gets the number 1. I mean this:
Rating.objects.values('user__username').annotate(
    total_points=Sum('points')
).annotate(
    rank=Window(
        expression=RowNumber(),
        order_by=[F('total_points').desc()]
    )
).filter(user_id=1)
I got the SQL query of this queryset (qs.query) like
SELECT ... FROM rating_rating WHERE ...
and inserted it into another SQL query as "rank_table" and added a condition into the outside WHERE clause:
SELECT * FROM (SELECT ... FROM rating_rating WHERE ...) AS rank_table WHERE user_id = 1;
and executed it in the MySQL console. This works exactly as I need. The question is: how can I implement the same thing using the Django ORM?
I have one workaround that gets what I need. I can annotate each row to mark whether it belongs to the required user, sort the result by that annotation, and then take the first row:
qs.annotate(
    required_user=Case(
        When(user_id=1, then=1),
        default=0,
        output_field=IntegerField(),
    )
).order_by('-required_user').first()
This works. But a SELECT within another SELECT seems more elegant, and I would like to know whether it is possible with Django.
Someone recently asked a similar question about filtering on window functions. What you want is essentially a subquery (a SELECT inside a SELECT), but filtering on top of a window-function annotation is not supported (see https://code.djangoproject.com/ticket/28333), because the filter ends up inside the subquery together with the annotated fields :'(.
One can fall back to raw SQL built from the queryset's query.sql_with_params(), but that is not really elegant.
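The difference between filtering inside and outside the ranked query can be reproduced without Django. The sketch below is only an illustration: it uses Python's sqlite3 module (window functions need SQLite 3.25 or newer), a made-up rating table with invented rows, and hypothetical helper names. It is not the SQL Django generates.

```python
import sqlite3


def make_db():
    """Build a throwaway in-memory table standing in for rating_rating."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE rating (user_id INTEGER, points INTEGER)")
    conn.executemany(
        "INSERT INTO rating VALUES (?, ?)",
        # Totals: user 1 -> 15, user 2 -> 40, user 3 -> 21
        [(1, 10), (1, 5), (2, 40), (3, 20), (3, 1)],
    )
    return conn


def rank_filtered_inside(conn, user_id):
    """Filter before ranking: the window only ever sees one row."""
    row = conn.execute(
        """
        SELECT ROW_NUMBER() OVER (ORDER BY total_points DESC) AS user_rank
        FROM (
            SELECT user_id, SUM(points) AS total_points
            FROM rating
            WHERE user_id = ?
            GROUP BY user_id
        ) AS totals
        """,
        (user_id,),
    ).fetchone()
    return row[0]


def rank_filtered_outside(conn, user_id):
    """Rank every user first, then pick one row from the ranked result."""
    row = conn.execute(
        """
        SELECT user_rank
        FROM (
            SELECT user_id,
                   ROW_NUMBER() OVER (ORDER BY total_points DESC) AS user_rank
            FROM (
                SELECT user_id, SUM(points) AS total_points
                FROM rating
                GROUP BY user_id
            ) AS totals
        ) AS rank_table
        WHERE user_id = ?
        """,
        (user_id,),
    ).fetchone()
    return row[0]
```

With this sample data, the inner filter always yields rank 1 for user 1, while the outer filter yields the real position among all users. In a Django project, the same wrapped statement could be sent through a database cursor as raw SQL, at the cost of bypassing the ORM.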

Django / Postgres Group By Aggregate

I have an issue regarding queries with a GROUP BY clause.
Let's assume I have the following Django model:
class SomeModel(models.Model):
    date = models.DateField()
    value = models.FloatField()
    relation = models.ForeignKey('OtherModel', on_delete=models.CASCADE)
If I want to do a query where I group SomeModel instances by OtherModel and annotate the latest date:
SomeModel.objects.values('relation').annotate(Max('date'))
This is all great, but as soon as I want to add a filter on the already annotated queryset I am getting nowhere:
SomeModel.objects.values('relation').annotate(Max('date')).filter(value__gt=0)
This does filter out every row whose value is not greater than 0, but the filter is applied before the grouping. I only want it applied after the annotation has taken place: if the row with the latest date for a relation has the value 0, I want that relation to be filtered out!
You need to add the value in the annotate() call, and then you can filter on it. Your ORM query becomes
SomeModel.objects.values('relation').annotate(Max('date'), value=F('value')).filter(value__gt=0)
This should give the value you require.
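To check what "filter after the latest row has been chosen" should return, it can help to compare both orders of operations in plain SQL. This is a rough sketch against an in-memory SQLite table; the table name somemodel, the sample rows, and the function names are assumptions for illustration. In the ORM, a correlated Subquery with OuterRef (Django 1.11+) is one possible way to express the second shape.

```python
import sqlite3


def make_db():
    """Hypothetical rows: two relations, each with an older and a newer entry."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE somemodel (relation_id INTEGER, date TEXT, value REAL)")
    conn.executemany(
        "INSERT INTO somemodel VALUES (?, ?, ?)",
        [
            (1, "2024-01-01", 5.0),
            (1, "2024-02-01", 0.0),  # latest row for relation 1 has value 0
            (2, "2024-01-10", 0.0),
            (2, "2024-03-01", 7.5),  # latest row for relation 2 has value > 0
        ],
    )
    return conn


def filter_then_group(conn):
    """WHERE runs before GROUP BY, so MAX(date) only sees rows with value > 0."""
    return conn.execute(
        "SELECT relation_id, MAX(date) FROM somemodel "
        "WHERE value > 0 GROUP BY relation_id ORDER BY relation_id"
    ).fetchall()


def latest_then_filter(conn):
    """Pick each relation's latest row first, then keep it only if value > 0."""
    return conn.execute(
        """
        SELECT s.relation_id, s.date, s.value
        FROM somemodel AS s
        WHERE s.date = (
                SELECT MAX(s2.date)
                FROM somemodel AS s2
                WHERE s2.relation_id = s.relation_id
              )
          AND s.value > 0
        ORDER BY s.relation_id
        """
    ).fetchall()
```

With the sample rows, filter_then_group still reports relation 1 (with its older date), whereas latest_then_filter drops relation 1 entirely because its most recent row has the value 0.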

Django ORM: Get latest record for distinct field

I'm having loads of trouble translating some SQL into Django.
Imagine we have some cars, each with a unique VIN, and we record the dates that they are in the shop with some other data. (Please ignore the reason one might structure the data this way. It's specifically for this question. :-) )
class ShopVisit(models.Model):
    vin = models.CharField(...)
    date_in_shop = models.DateField(...)
    mileage = models.DecimalField(...)
    boolfield = models.BooleanField(...)
We want a single query to return a Queryset with the most recent record for each vin and update it!
special_vins = [...]
# Doesn't work
ShopVisit.objects.filter(vin__in=special_vins).annotate(max_date=Max('date_in_shop')).filter(date_in_shop=F('max_date')).update(boolfield=True)
# Distinct doesn't work with update
ShopVisit.objects.filter(vin__in=special_vins).order_by('vin', '-date_in_shop').distinct('vin').update(boolfield=True)
Yes, I could iterate over a queryset. But that's not very efficient and it takes a long time when I'm dealing with around 2M records. The SQL that could do this is below (I think!):
SELECT *
FROM cars
INNER JOIN (
    SELECT MAX(dateInShop) AS maxtime, vin
    FROM cars
    GROUP BY vin
) AS latest_record ON (cars.dateInShop = maxtime)
    AND (latest_record.vin = cars.vin)
So how can I make this happen with Django?
This is somewhat untested, and relies on Django 1.11 for Subquery expressions, but perhaps something like:
# Correlate on vin so the subquery picks the latest visit of the outer row's car.
latest_visits = Subquery(
    ShopVisit.objects.filter(vin=OuterRef('vin'))
    .order_by('-date_in_shop')
    .values('id')[:1]
)
ShopVisit.objects.filter(id__in=latest_visits)
I had a similar model, so I went to test it, but got this error:
"This version of MySQL doesn't yet support 'LIMIT & IN/ALL/ANY/SOME subquery'"
The SQL it generated looked reasonably like what you want, so I think the idea is sound. If you use PostgreSQL, perhaps it supports that type of subquery.
Here's the SQL it produced (trimmed up a bit and with the actual names replaced by fake ones):
SELECT `mymodel_activity`.* FROM `mymodel_activity` WHERE `mymodel_activity`.`id` IN (SELECT U0.`id` FROM `mymodel_activity` U0 WHERE U0.`vin` = (`mymodel_activity`.`vin`) ORDER BY U0.`date_in_shop` DESC LIMIT 1)
I wonder if you found the solution yourself.
I could only come up with a raw SQL query (see the Django manual on performing raw SQL queries):
UPDATE "yourapplabel_shopvisit" AS v
SET boolfield = TRUE
WHERE date_in_shop = (
    -- Correlate on vin so each row is compared with its own car's latest date.
    SELECT MAX(date_in_shop) FROM "yourapplabel_shopvisit" WHERE vin = v.vin
);
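A "latest row per vin" update is easy to get subtly wrong, so it may be worth sanity-checking it on a few rows before running it on 2M records. The sketch below does that with an in-memory SQLite table; the table name shopvisit, the sample data, and the helper names are assumptions, and the SQL dialect details may differ on MySQL or PostgreSQL. The subquery is correlated on vin so that a date that happens to be the maximum for one car does not flag another car's older visit.

```python
import sqlite3


def make_db():
    """Hypothetical visits; car B has an older visit on car A's latest date."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE shopvisit (vin TEXT, date_in_shop TEXT, boolfield INTEGER DEFAULT 0)"
    )
    conn.executemany(
        "INSERT INTO shopvisit (vin, date_in_shop) VALUES (?, ?)",
        [
            ("A", "2024-01-01"),
            ("A", "2024-02-01"),  # latest for A
            ("B", "2024-02-01"),  # same date as A's latest, but not B's latest
            ("B", "2024-03-01"),  # latest for B
            ("C", "2024-01-15"),  # not in the special list
        ],
    )
    return conn


def flag_latest_visits(conn, vins):
    """Set boolfield on the most recent visit of each listed vin."""
    placeholders = ", ".join("?" for _ in vins)
    conn.execute(
        f"""
        UPDATE shopvisit
        SET boolfield = 1
        WHERE vin IN ({placeholders})
          AND date_in_shop = (
              SELECT MAX(v2.date_in_shop)
              FROM shopvisit AS v2
              WHERE v2.vin = shopvisit.vin
          )
        """,
        list(vins),
    )


def flagged_rows(conn):
    """Return (vin, date_in_shop) for every flagged row, in a stable order."""
    return conn.execute(
        "SELECT vin, date_in_shop FROM shopvisit "
        "WHERE boolfield = 1 ORDER BY vin, date_in_shop"
    ).fetchall()
```

After flag_latest_visits(conn, ["A", "B"]), only A's 2024-02-01 visit and B's 2024-03-01 visit are flagged; B's visit on 2024-02-01 and car C stay untouched.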

Django postgres order_by distinct on field

We have a limitation for order_by/distinct fields.
From the docs: "fields in order_by() must start with the fields in distinct(), in the same order"
Now here is the use case:
class Course(models.Model):
    is_vip = models.BooleanField()
    ...

class CourseEvent(models.Model):
    date = models.DateTimeField()
    course = models.ForeignKey(Course, on_delete=models.CASCADE)
The goal is to fetch the courses, ordered by nearest date but vip goes first.
The solution could look like this:
CourseEvent.objects.order_by('-course__is_vip', '-date',).distinct('course_id',).values_list('course')
But it causes an error because of this limitation.
Yeah I understand why ordering is necessary when using distinct - we get the first row for each value of course_id so if we don't specify an order we would get some arbitrary row.
But what's the purpose of limiting order to the same field that we have distinct on?
If I change order_by to something like ('course_id', '-course__is_vip', 'date',) it would give me one row for course but the order of courses will have nothing in common with the goal.
Is there any way to bypass this limitation besides walking through the entire queryset and filtering it in a loop?
You can use a nested query using id__in. In the inner query you single out the distinct events, and in the outer query you apply your custom ordering:
CourseEvent.objects.filter(
    id__in=CourseEvent.objects
        .order_by('course_id', '-date')
        .distinct('course_id')
).order_by('-course__is_vip', '-date')
From the docs on distinct(*fields):
When you specify field names, you must provide an order_by() in the QuerySet, and the fields in order_by() must start with the fields in distinct(), in the same order.
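The two-stage idea in this answer (first keep one event per course, then re-order the survivors) can be illustrated in plain Python. This is only a sketch of the logic, not of what the database does: the Event tuple, the sample events, and both helper functions are hypothetical.

```python
from datetime import date
from typing import NamedTuple


class Event(NamedTuple):
    id: int
    course_id: int
    is_vip: bool  # stands in for course__is_vip
    date: date


EVENTS = [
    Event(1, 1, False, date(2024, 5, 1)),
    Event(2, 1, False, date(2024, 6, 1)),
    Event(3, 2, True, date(2024, 3, 1)),
    Event(4, 2, True, date(2024, 4, 1)),
    Event(5, 3, False, date(2024, 7, 1)),
]


def pick_one_per_course(events):
    """Mimic order_by('course_id', '-date').distinct('course_id'): latest event per course."""
    latest = {}
    for event in events:
        current = latest.get(event.course_id)
        if current is None or event.date > current.date:
            latest[event.course_id] = event
    return list(latest.values())


def reorder(events):
    """Mimic order_by('-course__is_vip', '-date'): VIP first, then newest first."""
    return sorted(events, key=lambda e: (e.is_vip, e.date), reverse=True)


ordered_ids = [e.id for e in reorder(pick_one_per_course(EVENTS))]
```

Because the selection and the final ordering are separate steps, the ordering is free to use any fields, which is exactly what the nested id__in query achieves in SQL.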

Django: Order by evaluation of whether or not a date is empty

In Django, is it possible to order by whether or not a field is None, instead of the value of the field itself?
I know I can send the QuerySet to python sorted() but I want to keep it as a QuerySet for subsequent filtering. So, I'd prefer to order in the QuerySet itself.
For example, I have a termination_date field and I want to first sort the ones without a termination_date, then I want to order by a different field, like last_name, first_name.
Is this possible or am I stuck using sorted() and then having to do an entire new Query with the included ids and run sorted() on the new QuerySet? I can do this, but would prefer not to waste the overhead and use the beauty of QuerySets that they don't run until evaluated.
Translation, how can I get this SQL from Django assuming my app is employee, my model is Employee and it has three fields 'first_name (varchar)', 'last_name (varchar)', and 'termination_date (date)':
SELECT
    "employee_employee"."last_name",
    "employee_employee"."first_name",
    "employee_employee"."termination_date"
FROM "employee_employee"
ORDER BY
    "employee_employee"."termination_date" IS NOT NULL,
    "employee_employee"."last_name",
    "employee_employee"."first_name"
You should be able to order by query expressions, like this:
from django.db.models import IntegerField, Case, Value, When

MyModel.objects.all().order_by(
    Case(
        When(some_field=None, then=Value(1)),
        default=Value(0),
        output_field=IntegerField(),
    ).asc(),
    'some_other_field'
)
I cannot test this here, so it might require a bit of fiddling around, but this should put rows that have a NULL some_field after those that have a value in some_field. Each set of rows should then be sorted by some_other_field.
Granted, the CASE/WHEN is a bit more cumbersome than what you put in your question, but I don't know how to get the Django ORM to output that. Maybe someone else will have a better answer.
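To see that a CASE expression and the IS NOT NULL form order rows the same way, here is a small sketch using Python's sqlite3 module. The table mirrors the employee_employee example from the question, but the rows and the function names are invented for illustration. Note that this sketch puts rows without a termination_date first, as the question asks, so the 0 and 1 are swapped relative to the Case() above.

```python
import sqlite3


def make_db():
    """Hypothetical employees; None means no termination date."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE employee_employee "
        "(first_name TEXT, last_name TEXT, termination_date TEXT)"
    )
    conn.executemany(
        "INSERT INTO employee_employee VALUES (?, ?, ?)",
        [
            ("Amy", "Zed", None),
            ("Bob", "Adams", "2020-01-01"),
            ("Cal", "Baker", None),
            ("Al", "Baker", None),
        ],
    )
    return conn


def order_with_case(conn):
    """Explicit CASE: 0 for rows without a termination date, 1 otherwise."""
    rows = conn.execute(
        """
        SELECT last_name, first_name
        FROM employee_employee
        ORDER BY
            CASE WHEN termination_date IS NULL THEN 0 ELSE 1 END,
            last_name,
            first_name
        """
    )
    return rows.fetchall()


def order_with_is_not_null(conn):
    """Boolean expression in ORDER BY, as in the SQL from the question."""
    rows = conn.execute(
        """
        SELECT last_name, first_name
        FROM employee_employee
        ORDER BY termination_date IS NOT NULL, last_name, first_name
        """
    )
    return rows.fetchall()
```

Both functions return the three current employees sorted by name, followed by the terminated one.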
Spectras' answer works fine, but it only orders your records by 'null or not'. There is a shorter way that allows you to put empty dates wherever you want them in your date ordering - Coalesce:
from datetime import datetime

from django.db.models import Value
from django.db.models.functions import Coalesce

wayback = datetime(year=1, month=1, day=1)  # or whatever date you want
(
    MyModel.objects
    .annotate(null_date=Coalesce('date_field', Value(wayback)))
    .order_by('null_date')
)
This essentially sorts by the field 'date_field', with all records where date_field is None ordered as if they had the date wayback. This works well with PostgreSQL, but might need some raw SQL casting in MySQL, as described in the documentation.
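The effect of the substitute date is easiest to see with two different sentinels. The following sketch runs the equivalent COALESCE ordering in an in-memory SQLite database; the table mymodel, its rows, and the helper ids_by_date are assumptions for illustration, and dates are stored as ISO strings so that text comparison matches date order.

```python
import sqlite3


def make_db():
    """Three hypothetical rows, one of them without a date."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mymodel (id INTEGER PRIMARY KEY, date_field TEXT)")
    conn.executemany(
        "INSERT INTO mymodel VALUES (?, ?)",
        [(1, "2020-05-01"), (2, None), (3, "2019-01-01")],
    )
    return conn


def ids_by_date(conn, null_as):
    """Order by date_field, treating NULL as the given ISO date string."""
    rows = conn.execute(
        "SELECT id FROM mymodel ORDER BY COALESCE(date_field, ?), id",
        (null_as,),
    )
    return [row[0] for row in rows]
```

Passing a very early date such as "0001-01-01" sorts the empty rows first, while a far-future date such as "9999-12-31" sorts them last, so the sentinel decides where the gaps land.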
