SELECT DISTINCT ON (geometry column) equivalent with GeoDjango

I'm trying to create a Django query that will do the equivalent of the following PostgreSQL/PostGIS query:
SELECT DISTINCT ON (site) * FROM some_table;
site is a POINT type geometry column. How can this be done?
Basically, many of the records in some_table share the same POINT geometry; I just want a list of the geometries with no duplicates. I don't care about the rest of the some_table columns.
The rest of my query is pretty simple; it looks something like this:
qs = models.SomeTable.objects.filter(foo='bar', site__contained=some_polygon)
Side note:
The 'manager' for SomeTable (SomeTable.objects) is a django.contrib.gis.db.models.GeoManager. I don't know if that helps at all.
Relevant version info:
Django 1.3
PostgreSQL 9.1.1
PostGIS 1.5.3

I figured it out. I had overlooked distinct: https://docs.djangoproject.com/en/dev/ref/models/querysets/#django.db.models.query.QuerySet.distinct
Here's the django query that does exactly what I need:
qs = models.SomeTable.objects.filter(foo='bar', site__contained=some_polygon).values('site').distinct()

Related

How to write subquery in django

Is it possible to write the following SQL query in Django?
select * from (
select * from users
) order by id
This is just a minimal example; I have a long subquery instead of select * from users, but I can't work out how to use it as a subquery in Django.
UPDATED:
Subquery from the docs doesn't suit, because it builds the following query:
SELECT "post"."id", (
SELECT U0."email"
FROM "comment" U0
WHERE U0."post_id" = ("post"."id")
ORDER BY U0."created_at" DESC LIMIT 1
) AS "newest_commenter_email" FROM "post"
and this subquery can return only one value (.values('email')).
That is, it produces select (subquery) as value from table rather than select value from (subquery).

I would use a Python connector for PostgreSQL (http://www.postgresqltutorial.com/postgresql-python/query/). That is what I do for MySQL, though I have not tried it with PostgreSQL.

Making a subquery is essentially setting up two queries and using one query to "feed" another:
from django.db.models import Subquery
all_users = User.objects.all()
User.objects.annotate(the_user=Subquery(all_users.values('email')[:1]))
This is more or less the same as what you provided. You can get about as complicated as you'd like here, but the best source for getting started with subqueries is the docs.

Django ORM: Get latest record for distinct field

I'm having loads of trouble translating some SQL into Django.
Imagine we have some cars, each with a unique VIN, and we record the dates that they are in the shop with some other data. (Please ignore the reason one might structure the data this way. It's specifically for this question. :-) )
class ShopVisit(models.Model):
    vin = models.CharField(...)
    date_in_shop = models.DateField(...)
    mileage = models.DecimalField(...)
    boolfield = models.BooleanField(...)
We want a single query to return a Queryset with the most recent record for each vin and update it!
special_vins = [...]
# Doesn't work
ShopVisit.objects.filter(vin__in=special_vins).annotate(max_date=Max('date_in_shop')).filter(date_in_shop=F('max_date')).update(boolfield=True)
# Distinct doesn't work with update
ShopVisit.objects.filter(vin__in=special_vins).order_by('vin', '-date_in_shop').distinct('vin').update(boolfield=True)
Yes, I could iterate over a queryset. But that's not very efficient and it takes a long time when I'm dealing with around 2M records. The SQL that could do this is below (I think!):
SELECT *
FROM cars
INNER JOIN (
    SELECT MAX(dateInShop) AS maxtime, vin
    FROM cars
    GROUP BY vin
) AS latest_record
    ON (cars.dateInShop = maxtime)
    AND (latest_record.vin = cars.vin)
So how can I make this happen with Django?

This is somewhat untested, and relies on Django 1.11 for Subqueries, but perhaps something like:
latest_visits = Subquery(ShopVisit.objects.filter(id=OuterRef('id')).order_by('-date_in_shop').values('id')[:1])
ShopVisit.objects.filter(id__in=latest_visits)
I had a similar model, so I went to test it but got this error:
"This version of MySQL doesn't yet support 'LIMIT & IN/ALL/ANY/SOME subquery'"
The SQL it generated looked reasonably like what you want, so I think the idea is sound. If you use PostgreSQL, perhaps it supports that type of subquery.
Here's the SQL it produced (trimmed up a bit and replaced actual names with fake ones):
SELECT `mymodel_activity`.* FROM `mymodel_activity` WHERE `mymodel_activity`.`id` IN (SELECT U0.`id` FROM `mymodel_activity` U0 WHERE U0.`id` = (`mymodel_activity`.`id`) ORDER BY U0.`date_in_shop` DESC LIMIT 1)

I wonder if you found the solution yourself.
I could only come up with a raw query string (see the Django manual on raw SQL queries):
UPDATE "yourapplabel_shopvisit"
SET boolfield = True WHERE date_in_shop
IN (SELECT MAX(date_in_shop) FROM "yourapplabel_shopvisit" GROUP BY vin);
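One caveat with the uncorrelated IN (...) form above: a row matches whenever its date equals the latest date of any vin, not necessarily its own. The following is a minimal sketch using Python's standard sqlite3 module. The shopvisit table and its sample rows are made up for illustration and are not the asker's schema. It compares that form with a correlated alternative:

```python
import sqlite3

# Hypothetical table and rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE shopvisit (
        id INTEGER PRIMARY KEY,
        vin TEXT,
        date_in_shop TEXT,
        boolfield INTEGER DEFAULT 0
    );
    INSERT INTO shopvisit (vin, date_in_shop) VALUES
        ('A', '2017-01-01'),
        ('A', '2017-03-01'),
        ('B', '2017-03-01'),
        ('B', '2017-05-01');
""")

# Uncorrelated form: matches any row whose date is the maximum for *some* vin.
uncorrelated_ids = [row[0] for row in conn.execute("""
    SELECT id FROM shopvisit
    WHERE date_in_shop IN (
        SELECT MAX(date_in_shop) FROM shopvisit GROUP BY vin
    )
    ORDER BY id
""")]

# Correlated form: compares each row only with the maximum for its own vin.
conn.execute("""
    UPDATE shopvisit
    SET boolfield = 1
    WHERE date_in_shop = (
        SELECT MAX(latest.date_in_shop)
        FROM shopvisit AS latest
        WHERE latest.vin = shopvisit.vin
    )
""")
flagged_ids = [row[0] for row in conn.execute(
    "SELECT id FROM shopvisit WHERE boolfield = 1 ORDER BY id"
)]

print(uncorrelated_ids)  # [2, 3, 4]
print(flagged_ids)       # [2, 4]
```

Row 3 (vin B, 2017-03-01) is not B's latest visit, but the uncorrelated query picks it up because that date happens to be A's latest. If two visits for the same vin share the latest date, both forms flag both rows.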

SQLAlchemy: How to select max from several tables

I am starting to use SQLAlchemy in an ORM way rather than in an SQL way. I have been through the docs quickly, but I can't find how to easily do the equivalent of this SQL:
select max(Table1.Date) from Table1, Table2
where...
I can do:
session.query(Table1, Table2)
...
order_by(Table1.c.Date.desc())
and then select the first row but it must be quite inefficient. Could anyone tell me what is the proper way to select the max?
Many thanks

Ideally one would know the other parts of the query, but without any additional information, the following should do it:
import sqlalchemy as sa

q = (
    session
    .query(sa.func.max(Table1.c.Date))
    .select_from(Table1, Table2)  # or any other `.join(Table2)` would do
    .filter(...)
)
# q.scalar() then returns the single maximum value

Django query with AVG and GROUP BY

My Django-foo isn't quite up to par to translate certain raw sql into the ORM.
Currently I am executing:
SELECT avg(<value_to_be_averaged>), <id_to group_on>
FROM <table_name>
WHERE start_time >= <timestamp>
GROUP BY <id_to group_on>;
In Django I can do:
Model.objects.filter(start_time__gte=<timestamp>).aggregate(Avg('<value_to_be_averaged>'))
but that is for all objects in the query and doesn't return a query set that is grouped by the id like in the raw SQL above. I've been fiddling with .annotate() but haven't made much progress. Any help would be appreciated!

Can I get table names along with column names using .description() in Python's DB API?

I am using Python with SQLite 3. I have user entered SQL queries and need to format the results of those for a template language.
So, basically, I need to use .description of the DB API cursor (PEP 249), but I need to get both the column names and the table names, since the users often do joins.
The obvious answer, i.e. to read the table definitions, is not possible -- many of the tables have the same column names.
I also need some intelligent behaviour on the column/table names for aggregate functions like avg(field)...
The only solution I can come up with is to use an SQL parser and analyse the SELECT statement (sigh), but I haven't found any SQL parser for Python that seems really good.
I haven't found anything in the documentation or anyone else with the same problem, so I might have missed something obvious.
Edit: To be clear -- the problem is to find the result of an SQL select, where the select statement is supplied by a user in a user interface. I have no control of it. As I noted above, it doesn't help to read the table definitions.

Python's DB API only specifies column names for the cursor.description (and none of the RDBMS implementations of this API will return table names for queries...I'll show you why).
What you're asking for is very hard, and only even approachable with an SQL parser...and even then there are many situations where even the concept of which "Table" a column is from may not make much sense.
Consider these SQL statements:
Which table is today from?
SELECT DATE('now') AS today FROM TableA FULL JOIN TableB
ON TableA.col1 = TableB.col1;
Which table is myConst from?
SELECT 1 AS myConst;
Which table is myCalc from?
SELECT a+b AS myCalc FROM (select t1.col1 AS a, t2.col2 AS b
FROM table1 AS t1
LEFT OUTER JOIN table2 AS t2 on t1.col2 = t2.col2);
Which table is myCol from?
SELECT SUM(a) as myCol FROM (SELECT a FROM table1 UNION SELECT b FROM table2);
The above were very simple SQL statements for which you either have to make up a "table", or arbitrarily pick one...even if you had an SQL parser!
What SQL gives you is a set of data back as results. The elements in this set can not necessarily be attributed to specific database tables. You probably need to rethink your approach to this problem.
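The limitation is easy to see with the standard sqlite3 module. In this minimal sketch (the tables table_a and table_b are made up for illustration), two identically named columns from different tables come back indistinguishable, and the remaining six fields of each description entry are None:

```python
import sqlite3

# Hypothetical tables, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (id INTEGER, name TEXT);
    CREATE TABLE table_b (id INTEGER, name TEXT);
    INSERT INTO table_a VALUES (1, 'a');
    INSERT INTO table_b VALUES (1, 'b');
""")

cur = conn.execute(
    "SELECT table_a.name, table_b.name, 1 AS my_const "
    "FROM table_a JOIN table_b ON table_a.id = table_b.id"
)

# Each description entry is a 7-tuple; only the first item (the name) is set.
column_names = [entry[0] for entry in cur.description]
other_fields = {field for entry in cur.description for field in entry[1:]}

print(column_names)  # ['name', 'name', 'my_const']
print(other_fields)  # {None}
```

If the SQL can be influenced at all, asking users to alias columns explicitly (for example, table_a.name AS a_name) is one way to get unambiguous names without parsing the statement.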
