How to create SQL Pypika Query with "Min()" - python

I am trying to create a Pypika query which uses the SQL MIN() function. Pypika supports the function, but I don't know how to use it.
Basically I want to create this SQL statement in Pypika:
select "ID", "Car", "Road", "House"
from "thingsTable"
where "ID" not in
(
    select MIN("ID")
    from "thingsTable"
    group by "Car", "Road", "House"
)
order by "ID"
I have tried something like this:
from pypika import Query, Table, Field, Function
query = Query.from_(table).select(min(table.ID)).groupby(table.Car, table.Road, table.House)
And variations of it, but I can't figure out how to use this function. There are not a lot of examples around.
Thanks in advance.

Try this one; the code is based on the "Selecting Data" section of the pypika docs:
from pypika import Query, Table, functions as fn

tbl = Table('thingsTable')
# subquery: the minimum "ID" per (Car, Road, House) group
subquery = (Query.from_(tbl)
            .groupby(tbl.Car, tbl.Road, tbl.House)
            .select(fn.Min(tbl.ID)))
q = (Query.from_(tbl)
     .where(tbl.ID.notin(subquery))
     .select(tbl.ID, tbl.Car, tbl.Road, tbl.House)
     .orderby(tbl.ID))
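To double-check the result, pypika can render the SQL for you; quoting details may vary by version, but it should come out roughly like this:
print(q.get_sql())
# SELECT "ID","Car","Road","House" FROM "thingsTable"
# WHERE "ID" NOT IN (SELECT MIN("ID") FROM "thingsTable" GROUP BY "Car","Road","House")
# ORDER BY "ID"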

Related

Python peewee: How to select distinct values on one column before a join?

I am trying to join a second table (PageLikes) to a first table (PageVisits) after selecting only distinct values in one column of the first table, using the Python ORM peewee.
In pure SQL I can do this:
SELECT DISTINCT(pagevisits.visitor_id), pagelikes.liked_item FROM pagevisits
INNER JOIN pagelikes on pagevisits.visitor_id = pagelikes.user_id
In peewee with Python I have tried:
query = (PageVisits
         .select(fn.Distinct(PageVisits.visitor_id),
                 PageLikes.liked_item)
         .join(PageLikes))
This gives me an error:
distinct() takes 1 positional argument but 2 were given
The only way I have been able to use distinct with peewee is like this:
query = (PageVisits
         .select(PageVisits.visitor_id,
                 PageLikes.liked_item)
         .distinct())
which does not seem to work for my scenario.
So how can I select only distinct values in one table based on one column before I join another table with peewee?
I don't believe you should be encountering an error using fn.DISTINCT() in that way. I'm curious to see the full traceback. In my testing locally, I have no problems running something like:
query = (PageVisits
         .select(fn.DISTINCT(PageVisits.visitor_id), PageLikes.liked_item)
         .join(PageLikes))
This produces SQL equivalent to what you're after. I'm using the latest peewee code, btw.
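If you want to verify what peewee actually generates, you can ask the query for its SQL; a small sketch using the models above (aliases and the join condition depend on your model definitions):
sql, params = query.sql()
print(sql)
# roughly: SELECT DISTINCT("t1"."visitor_id"), "t2"."liked_item"
#          FROM "pagevisits" AS "t1"
#          INNER JOIN "pagelikes" AS "t2" ON ("t1"."visitor_id" = "t2"."user_id")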
As Papooch suggested, calling distinct on the Model seems to work:
distinct_visitors = (PageVisits
                     .select(PageVisits.visitor_id.distinct().alias("visitor"))
                     .where(PageVisits.page_id == "Some specific page")
                     .alias('distinct_visitors'))

query = (PageLikes
         .select(fn.Count(PageLikes.liked_item))
         .join(distinct_visitors, on=(distinct_visitors.c.visitor == PageLikes.user_id))
         .group_by(PageLikes.liked_item))
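A minimal usage sketch under the same assumptions; iterating the query executes it:
for row in query.dicts():
    print(row)   # one dict per liked_item group, containing the count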

Selecting the first item of an ARRAY with PostgreSQL/SqlAlchemy

Trying to move some queries I run daily into an automated script. I have one in Postgres like the below:
SELECT (regexp_split_to_array(col1, '|'))[1] AS item, COUNT(*) AS itemcount FROM Table1 GROUP BY item ORDER BY itemcount
In SqlAlchemy I have this:
session.query((func.regexp_split_to_array(model.table1.col1, "|")[1]).label("item"),
              func.count().label("itemcount")).group_by("item").order_by("itemcount")
Python can't index it (there is no __getitem__) since it's not actually a collection. I've looked through the docs and can't seem to find something that would let me do this without running raw SQL through execute (which I can do and which works, but I was looking for a solution for next time).
SQLAlchemy does support indexing with [...]. If you declare the column you have as type postgresql.ARRAY, then it works:
from sqlalchemy import Table, Column, String, MetaData
from sqlalchemy.dialects import postgresql

meta = MetaData()
table2 = Table("table2", meta, Column("col1", postgresql.ARRAY(String)))
q = session.query(table2.c.col1[1])
print(q.statement.compile(dialect=postgresql.dialect()))
# SELECT table2.col1[%(col1_1)s] AS anon_1
# FROM table2
The reason why your code doesn't work is that SQLAlchemy does not know that func.regexp_split_to_array(...) returns an array, since func.foo produces a generic function for convenience. To make it work, we need to make sure SQLAlchemy knows the return type of the function, by specifying the type_ parameter:
q = session.query(
    func.regexp_split_to_array(table1.c.col1, "|",
                               type_=postgresql.ARRAY(String))[1].label("item")
)
print(q.statement.compile(dialect=postgresql.dialect()))
# SELECT (regexp_split_to_array(table1.col1, %(regexp_split_to_array_1)s))[%(regexp_split_to_array_2)s] AS item
# FROM table1
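Putting it back into the original daily query, a sketch under the same assumptions (a table1 with a col1 column; the string-label group_by/order_by mirrors the question):
item_expr = func.regexp_split_to_array(
    table1.c.col1, "|", type_=postgresql.ARRAY(String)
)[1].label("item")
q = (session.query(item_expr, func.count().label("itemcount"))
     .group_by("item")
     .order_by("itemcount"))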

How to make a subquery in sqlalchemy

SELECT *
FROM Residents
WHERE apartment_id IN (SELECT ID
FROM Apartments
WHERE postcode = 2000)
I'm using sqlalchemy and am trying to execute the above query. I haven't been able to execute it as raw SQL using db.engine.execute(sql), since it complains that my relations don't exist... But I successfully query my database using this format: session.query(Residents).filter_by(???).
I can't figure out how to build my wanted query with this format, though.
You can create a subquery with the subquery() method:
subquery = session.query(Apartments.id).filter(Apartments.postcode==2000).subquery()
query = session.query(Residents).filter(Residents.apartment_id.in_(subquery))
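A quick sanity check, a sketch assuming the Residents/Apartments models map to residents/apartments tables:
residents = query.all()   # executes the query and returns Resident objects
print(query)               # compiled SQL, roughly:
# SELECT ... FROM residents
# WHERE residents.apartment_id IN
#     (SELECT apartments.id FROM apartments WHERE apartments.postcode = :postcode_1)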
I just wanted to add that if you are using this method to update your DB, make sure you add the synchronize_session='fetch' kwarg. It will look something like:
subquery = session.query(Apartments.id).filter(Apartments.postcode == 2000).subquery()
query = (session.query(Residents)
         .filter(Residents.apartment_id.in_(subquery))
         .update({"key": value}, synchronize_session='fetch'))
Otherwise the update typically fails, because the default synchronize_session strategy cannot evaluate the IN (subquery) criterion against the objects already in the session.

SQLAlchemy: How to select max from several tables

I am starting to use sqlalchemy in an ORM way rather than in an SQL way. I have been through the docs quickly, but I can't find how to easily do the equivalent of this SQL:
select max(Table1.Date) from Table1, Table2
where...
I can do:
session.query(Table1, Table2)
...
order_by(Table1.c.Date.desc())
and then select the first row, but that must be quite inefficient. Could anyone tell me the proper way to select the max?
Many thanks
Ideally one would know the other parts of the query, but without any additional information the following should do it:
import sqlalchemy as sa

q = (
    session
    .query(sa.func.max(Table1.Date))
    .select_from(Table1, Table2)  # or any other `.join(Table2)` would do
    .filter(...)
)
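Since the statement yields a single row with a single column, a small usage sketch:
max_date = q.scalar()   # runs the query and returns the MAX value (or None if there are no rows)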

sqlalchemy: get max/min/avg values from a table

I have this query:
mps = (
    session.query(mps)
    .filter_by(idc=int(c.idc))
    .filter_by(idmp=int(m.idmp))
    .group_by(func.day(mps.tschecked))
).all()
My problem is that I don't know how to extract (with sqlalchemy) the max/min/avg value from a table...
I found this: Database-Independent MAX() Function in SQLAlchemy
But I don't know where to use func.max/min/avg...
Can someone tell me how to do this? Can you give me an example?
The following functions are available with from sqlalchemy import func:
func.min
func.max
func.avg
Documentation is available in the SQLAlchemy docs.
You can use them, for example, in the query() method.
Example:
session.query(self.stats.c.ID, func.max(self.stats.c.STA_DATE))
(just like you use aggregate functions in plain SQL)
Or just use an order_by() and select the first or last element...
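Adapted to the query from the question, a sketch (mps, c and m are the asker's own objects, so the names here are assumptions):
from sqlalchemy import func

# min/max (and avg) go inside query(); the column names are taken from the question.
stats = (
    session.query(func.min(mps.tschecked), func.max(mps.tschecked))
    .filter(mps.idc == int(c.idc), mps.idmp == int(m.idmp))
    .group_by(func.day(mps.tschecked))
    .all()
)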
