Using CTE in Python with Postgresql and psycopg2


I'm trying to create a query using CTEs: I define two sub-tables and then run the final SELECT statement. I believe the following syntax would work in plain SQL, but it isn't working with psycopg2 in Python.
The idea is that the query should return the name of every event (E.Event), its E.EDate, its E.ETemp and SmithTime. In other words, it should contain the full list of events, but the time column should only show times recorded for Smith (Smith did not take part in every event).
query = ("""WITH cte AS (SELECT E.Event, O.Time AS "SmithTime"
FROM event E JOIN outcome O ON E.EventID = O.EventID
JOIN name N ON N.ID = O.ID
WHERE Name = 'Smith'),
WITH cte2 AS (SELECT E.Event, O.Time, E.EDate, E.ETemp
FROM event E JOIN outcome O ON E.EventID = O.EventID
JOIN name N ON N.ID = O.ID)
SELECT cte2.Event, cte2.EDate, cte2.ETemp, cte.SmithTime
FROM cte JOIN cte2 ON cte.Event = cte2.Event
ORDER BY 2 ASC""")
query = pd.read_sql(query, conn)
print(query)
This is just my latest iteration, and I'm not sure what else to try. It currently raises a DatabaseError:
DatabaseError: Execution failed on sql 'WITH cte AS (SELECT E.Event, O.Time AS "SmithTime"
FROM event E JOIN outcome O ON E.EventID = O.EventID
JOIN name N ON N.ID = O.ID
WHERE Name = 'Smith'),
WITH cte2 AS (SELECT E.Event, O.Time, E.EDate, E.ETemp
FROM event E JOIN outcome O ON E.EventID = O.EventID
JOIN name N ON N.ID = O.ID)
SELECT cte2.Event, cte2.EDate, cte2.ETemp, cte.SmithTime
FROM cte JOIN cte2 ON cte.Event = cte2.Event
ORDER BY 2 ASC': syntax error at or near "WITH"
LINE 6: WITH cte2 AS (SELECT E.Event, O.Time, E.EDate, E.ETemp

I have no idea whether your current query is even logically correct. But we can get around the SQL error by inlining the common table expressions:
SELECT cte2.Event, cte2.EDate, cte2.ETemp, cte.SmithTime
FROM (
SELECT E.Event, O.Time AS "SmithTime"
FROM event E
INNER JOIN outcome O ON E.EventID = O.EventID
INNER JOIN name N ON N.ID = O.ID
WHERE Name = 'Smith'
) cte
INNER JOIN (
SELECT E.Event, O.Time, E.EDate, E.ETemp
FROM event E
INNER JOIN outcome O ON E.EventID = O.EventID
INNER JOIN name N ON N.ID = O.ID
) cte2
ON cte.Event = cte2.Event
ORDER BY 2;

It's an SQL syntax error, nothing specific to psycopg2.
There's only one WITH in a CTE query. It should be WITH cte AS (...), cte2 AS (...) SELECT ..., not WITH cte AS (...), WITH cte2 AS (...) SELECT ....
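To illustrate the single-WITH form, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in so it runs without a PostgreSQL server; the CTE syntax involved is the same in PostgreSQL, and with psycopg2 the same string could be passed to pd.read_sql(query, conn). The tiny schema and rows are hypothetical, the CTE names smith and events are illustrative, and the LEFT JOIN on EventID is an assumption about the stated goal ("full list of events, Smith's time only where recorded").

```python
import sqlite3

# Hypothetical in-memory data mirroring the question's tables (event, outcome, name).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE event (EventID INTEGER PRIMARY KEY, Event TEXT, EDate TEXT, ETemp REAL);
CREATE TABLE name (ID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE outcome (EventID INTEGER, ID INTEGER, Time REAL);
INSERT INTO event VALUES (1, 'Sprint', '2021-01-01', 10.0), (2, 'Relay', '2021-01-02', 12.5);
INSERT INTO name VALUES (1, 'Smith'), (2, 'Jones');
INSERT INTO outcome VALUES (1, 1, 11.2), (1, 2, 11.9), (2, 2, 45.0);
""")

# One WITH keyword; every further CTE is introduced by a comma only.
query = """
WITH smith AS (
    SELECT O.EventID, O.Time AS SmithTime
    FROM outcome O JOIN name N ON N.ID = O.ID
    WHERE N.Name = 'Smith'
),
events AS (
    SELECT E.EventID, E.Event, E.EDate, E.ETemp
    FROM event E
)
SELECT events.Event, events.EDate, events.ETemp, smith.SmithTime
FROM events LEFT JOIN smith ON smith.EventID = events.EventID
ORDER BY 2 ASC
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The second event has no Smith outcome, so its SmithTime comes back as None instead of the row disappearing, which is what an inner JOIN would do.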

Related

Peewee: Relation does not exist when querying with CTE

I want to query the count of bookings for a given event; if the event has bookings, I also want the name of the "first" person to book it.
The tables look something like this: an Event has zero or more Bookings, and Booking.attendee is a 1:1 link to the User table. In pure SQL I can easily do what I want with window functions and a CTE, something like:
WITH booking AS (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY b.event_id ORDER BY b.created DESC) rn,
COUNT(*) OVER (PARTITION BY b.event_id) count
FROM
booking b JOIN "user" u on u.id = b.attendee_id
WHERE
b.status != 'cancelled'
)
SELECT e.*, a.count, a.first_name, a.last_name FROM event e LEFT JOIN booking a ON a.event_id = e.id WHERE e.seats > COALESCE(a.count, 0) and (a.rn = 1 or a.rn is null) and e.cancelled != true;
This gets everything I want. When I try to turn this into a CTE with Peewee, however, I get a "relation does not exist" error.
This isn't the exact code, but I'm doing something like this, with some dynamic WHERE clauses for filtering based on parameters:
cte = (
BookingModel.select(
BookingModel,
peewee.fn.ROW_NUMBER().over(partition_by=[BookingModel.event_id], order_by=[BookingModel.created.desc()]).alias("rn"),
peewee.fn.COUNT(BookingModel.id).over(partition_by=[BookingModel.event_id]).alias("count"),
UserModel.first_name,
UserModel.last_name
)
.join(
UserModel,
peewee.JOIN.LEFT_OUTER,
on=(UserModel.id == BookingModel.attendee)
)
.where(BookingModel.status != "cancelled")
.cte("test"))
query = (
EventModel.select(
EventModel,
UserModel,
cte.c.event_id,
cte.c.first_name,
cte.c.last_name,
cte.c.rn,
cte.c.count
)
.join(UserModel, on=(EventModel.host == UserModel.id))
.switch(EventModel)
.join(cte, peewee.JOIN.LEFT_OUTER, on=(EventModel.id == cte.c.event_id))
.where(where_clause)
.order_by(EventModel.start_time.asc(), EventModel.id.asc())
.limit(10)
.with_cte(cte))
After reading the docs twenty-plus times, I can't figure out what is wrong with this. It looks like the samples, but the query fails with: relation "test" does not exist. I've tried defining the CTE's columns explicitly, but then it throws an error that "rn" is ambiguous.
I'm stuck and not sure how to get a Peewee CTE to work.
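The pure-SQL pattern described in the question (window functions inside a CTE, then keeping only rn = 1) can be checked on its own. This is a minimal sketch, not a fix for the Peewee error: it uses the standard-library sqlite3 module (window functions need SQLite 3.25 or later) and a hypothetical miniature schema. It names the CTE ranked instead of reusing booking, selects explicit columns rather than SELECT * (which would yield two id columns from the join), and moves rn = 1 into the LEFT JOIN condition so events without bookings are kept.

```python
import sqlite3

# Hypothetical miniature schema modelled on the question (event, booking, "user").
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE "user" (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
CREATE TABLE event (id INTEGER PRIMARY KEY, seats INTEGER, cancelled INTEGER);
CREATE TABLE booking (id INTEGER PRIMARY KEY, event_id INTEGER, attendee_id INTEGER,
                      status TEXT, created TEXT);
INSERT INTO "user" VALUES (1, 'Ada', 'Lovelace'), (2, 'Alan', 'Turing');
INSERT INTO event VALUES (10, 5, 0), (20, 5, 0);
INSERT INTO booking VALUES
  (1, 10, 1, 'confirmed', '2021-01-01'),
  (2, 10, 2, 'confirmed', '2021-01-02'),
  (3, 10, 2, 'cancelled', '2021-01-03');
""")

query = """
WITH ranked AS (
    SELECT b.event_id, u.first_name, u.last_name,
           -- newest booking per event gets rn = 1 (same ordering as the question)
           ROW_NUMBER() OVER (PARTITION BY b.event_id ORDER BY b.created DESC) AS rn,
           -- every row of an event carries that event's total active bookings
           COUNT(*) OVER (PARTITION BY b.event_id) AS booking_count
    FROM booking b JOIN "user" u ON u.id = b.attendee_id
    WHERE b.status != 'cancelled'
)
SELECT e.id, COALESCE(r.booking_count, 0), r.first_name, r.last_name
FROM event e LEFT JOIN ranked r ON r.event_id = e.id AND r.rn = 1
WHERE e.seats > COALESCE(r.booking_count, 0) AND e.cancelled = 0
ORDER BY e.id
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Event 10 reports two active bookings and its most recent booker; event 20 has none and still appears, with NULLs for the name columns.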

Can I intersect two QuerySets of the same table built with different queries?

minimum_likes_queryset = PostInLanguages.objects.annotate(likes=Count('like_model', distinct=True)).filter(likes__gte=minimum_likes)
recouched_posts_ids = PostInLanguages.objects.values('parent_post_language_id').annotate(recouch_count=Count('parent_post_language_id')).filter(recouch_count__gte=minimum_recouch, is_post_language=False).order_by().values_list('parent_post_language_id', flat=True)
recouched_post_queryset = PostInLanguages.objects.filter(id__in=recouched_posts_ids)
These are the generated queries:
SELECT "api_postinlanguages"."id", "api_postinlanguages"."post_in_language_uuid", "api_postinlanguages"."post_id", "api_postinlanguages"."language_id", "api_postinlanguages"."is_post_language", "api_postinlanguages"."parent_post_language_id", "api_postinlanguages"."description", "api_postinlanguages"."created_on", COUNT(DISTINCT "api_postlanguagelike"."id") AS "likes" FROM "api_postinlanguages" LEFT OUTER JOIN "api_postlanguagelike" ON ("api_postinlanguages"."id" = "api_postlanguagelike"."post_language_id") GROUP BY "api_postinlanguages"."id" HAVING COUNT(DISTINCT "api_postlanguagelike"."id") >= 1
SELECT "api_postinlanguages"."id", "api_postinlanguages"."post_in_language_uuid", "api_postinlanguages"."post_id", "api_postinlanguages"."language_id", "api_postinlanguages"."is_post_language", "api_postinlanguages"."parent_post_language_id", "api_postinlanguages"."description", "api_postinlanguages"."created_on" FROM "api_postinlanguages" WHERE "api_postinlanguages"."id" IN (SELECT U0."parent_post_language_id" FROM "api_postinlanguages" U0 WHERE NOT U0."is_post_language" GROUP BY U0."parent_post_language_id" HAVING COUNT(U0."parent_post_language_id") >= 1)
This is the exception:
An exception occurred: column "api_postinlanguages.id" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: SELECT COUNT(*) FROM (SELECT "api_postinlanguages"."id" AS "...
^

How to query multiple tables using join in sqlalchemy

select count(DISTINCT(a.cust_id)) as count ,b.code, b.name from table1 as a inner join table2 as b on a.par_id = b.id where a.data = "present" group by a.par_id order by b.name asc;
How can I write this in SQLAlchemy so that it returns the same results as the SQL query above?
Thanks for any input.

Hope this works...
from sqlalchemy import distinct, func
session.query(
func.count(distinct(table1.cust_id)).label('count'),
table2.code,
table2.name
).join(
table2,
table1.par_id == table2.id
).filter(
table1.data == "present"
).group_by(
table1.par_id
).order_by(
table2.name.asc()
).all()

Building a psycopg2 query using a list of the column names to fetch

A rather simple question, but one for which we surprisingly didn't find a solution.
Here is my current code, for executing a simple SQL query on a PostgreSQL database from Python 3.6.9 using psycopg2 ('2.9.1 (dt dec pq3 ext lo64)'):
import psycopg2
from psycopg2.extras import DictCursor
myid = 100
fields = ('p.id', 'p.name', 'p.type', 'p.price', 'p.warehouse', 'p.location', )
sql_query = ("SELECT " + ', '.join(fields) + " FROM product p "
             "INNER JOIN owner o ON p.id = o.product_id "
             "WHERE p.id = {} AND (o.dateof_purchase IS NOT NULL "
             "OR o.state = 'checked_out' );"
             ).format(myid)
try:
    with psycopg2.connect(**DB_PARAMS) as conn:
        with conn.cursor(cursor_factory=DictCursor) as curs:
            curs.execute(sql_query, )
            row = curs.fetchone()
except psycopg2.Error as error:
    raise ValueError(f"ERR: something went wrong with the query :\n{sql_query}") from None
We increasingly think that this is... not good (awfully bad, to be honest): formatting values straight into the SQL string exposes the query to SQL injection.
Therefore, we're trying to use a modern f-string notation:
sql_query = (f"""SELECT {fields} FROM product p
INNER JOIN owner o ON p.id = o.product_id
WHERE p.id = {myid} AND (o.dateof_purchase IS NOT NULL
OR o.state = 'checked_out' );""")
But then, the query looks like:
SELECT ('p.id', 'p.name', 'p.type', 'p.price', 'p.warehouse', 'p.location', ) FROM ...;
which is not valid PostgreSQL because of 1. the parentheses and 2. the single-quoted column names.
We'd like to figure out a way to get rid of these.
In the meantime, we went back to the docs and remembered this:
https://www.psycopg.org/docs/usage.html
Ooops! So we refactored it this way:
sql_query = """SELECT %s FROM product p
INNER JOIN owner o ON p.id = o.product_id
WHERE p.id = %s AND (o.dateof_purchase IS NOT NULL
OR o.state = 'checked_out' );"""
try:
    with psycopg2.connect(**DB_PARAMS) as conn:
        with conn.cursor(cursor_factory=DictCursor) as curs:
            # passing a tuple, as execute() accepts only one more argument after the query!
            curs.execute(sql_query, (fields, myid))
            row = curs.fetchone()
except psycopg2.Error as error:
    raise ValueError(f"ERR: something went wrong with the query :\n{sql_query}") from None
and mogrify() says:
"SELECT ('p.id', 'p.name', 'p.type', 'p.price', 'p.warehouse', 'p.location', ) FROM ...;"
Here again, the parentheses and the single quotes cause trouble, but no error is actually raised.
The only thing is that row evaluates to this strange result:
['('p.id', 'p.name', 'p.type', 'p.price', 'p.warehouse', 'p.location', )']
So, how could we cleverly and dynamically build a psycopg2 query using a list of parameters for column names without neglecting the security?
(A trick could be to fetch all columns and filter them afterwards, but there are too many columns, some holding quite large amounts of data that we don't need. That's why we want to run a query with a precisely defined selection of columns, which may get dynamically extended by some function; otherwise we would simply have hard-coded these column names.)
OS: Ubuntu 18.04
PostgreSQL: 13.3 (Debian 13.3-1.pgdg100+1)

The '%s' placeholder turns every argument into an SQL string literal, as @AdamKG pointed out. Instead, you can use the psycopg2.sql module, which lets you insert identifiers into queries, not just strings:
import psycopg2
from psycopg2 import sql
from psycopg2.extras import DictCursor
myid = 100
fields = ('id', 'name', 'type', 'price', 'warehouse', 'location', )
sql_query = sql.SQL(
"""SELECT {} FROM product p
INNER JOIN owner o ON p.id = o.product_id
WHERE p.id = %s AND (o.dateof_purchase IS NOT NULL
OR o.state = 'checked_out' );""")
# Fill the single {} with one comma-separated list of identifiers;
# sql.Identifier('p', field) renders the qualified name "p"."field" (psycopg2 >= 2.8)
columns = sql.SQL(', ').join([sql.Identifier('p', field) for field in fields])
try:
    with psycopg2.connect(**DB_PARAMS) as conn:
        with conn.cursor(cursor_factory=DictCursor) as curs:
            # the column names are composed into the query; only myid is a parameter
            curs.execute(sql_query.format(columns), (myid,))
            row = curs.fetchone()
except psycopg2.Error as error:
    raise ValueError("ERR: something went wrong with the query") from error
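The '%s' behaviour is easy to reproduce. psycopg2 needs a PostgreSQL server, so this minimal sketch uses the standard-library sqlite3 driver (which uses ? instead of %s) to show the same effect: a bound parameter is always treated as a value, so a column name passed that way comes back as a string literal, while names composed into the SQL text (here checked against a hypothetical allow-list) are read as columns. The table and rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT, price REAL)")
conn.executemany(
    "INSERT INTO product VALUES (?, ?, ?)",
    [(100, "lamp", 19.5), (101, "desk", 120.0)],
)

# A placeholder binds a value: the column name comes back as a plain string.
bound = conn.execute("SELECT ? FROM product WHERE id = ?", ("name", 100)).fetchone()
print(bound)  # ('name',)

# Column names go into the SQL text (after an allow-list check); the id stays a parameter.
ALLOWED = {"id", "name", "price"}  # hypothetical allow-list
fields = ["name", "price"]
if not set(fields) <= ALLOWED:
    raise ValueError("unexpected column name")
composed = conn.execute(
    "SELECT " + ", ".join(fields) + " FROM product WHERE id = ?", (100,)
).fetchone()
print(composed)  # ('lamp', 19.5)
```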

I finally found a solution. It uses map() to turn a list or tuple of column names into SQL fragments, and sql.Literal to embed the given id; this is maybe cleaner:
import psycopg2
from psycopg2 import sql
conn = psycopg2.connect(**DB_PARAMS)
myid = 100
# using the simple column identifiers
fields_1 = ('id', 'name', 'type', 'price', 'warehouse', 'location',)
# using the dot notation with the table alias 'p' as the prefix:
fields_2 = ('p.id', 'p.name', 'p.type', 'p.price', 'p.warehouse', 'p.location',)
sql_query_1 = sql.SQL("""
SELECT {f} FROM product p
INNER JOIN owner o ON p.id = o.product_id
WHERE p.id = {j} AND (o.dateof_purchase IS NOT NULL
OR o.state = 'checked_out' );"""
).format(
f = sql.SQL(',').join(map(sql.Identifier, fields_1)),
j = sql.Literal(myid)
)
sql_query_2 = sql.SQL("""
SELECT {f} FROM product p
INNER JOIN owner o ON p.id = o.product_id
WHERE p.id = {j} AND (o.dateof_purchase IS NOT NULL
OR o.state = 'checked_out' );"""
).format(
f = sql.SQL(',').join(map(sql.SQL, fields_2)), # use sql.SQL! (raw SQL, no quoting: only safe for trusted, hard-coded names)
j = sql.Literal(myid)
)
sql_query_2b = sql.SQL("""
SELECT {f} FROM product p
INNER JOIN owner o ON p.id = o.product_id
WHERE p.id = {j} AND (o.dateof_purchase IS NOT NULL
OR o.state = 'checked_out' );"""
).format(
f = sql.SQL(',').join(map(sql.Identifier, fields_2)), # DON'T use sql.Identifier!
j = sql.Literal(myid)
)
# VALID SQL QUERY:
print(sql_query_1.as_string(conn))
# will print:
# SELECT "id","name","type","price","warehouse","location" FROM product p
# INNER JOIN owner o ON p.id = o.product_id
# WHERE p.id = 100 AND (o.dateof_purchase IS NOT NULL
# OR o.state = 'checked_out' );
# VALID SQL QUERY:
print(sql_query_2.as_string(conn))
# will print:
# SELECT p.id,p.name,p.type,p.price,p.warehouse,p.location FROM product p
# INNER JOIN owner o ON p.id = o.product_id
# WHERE p.id = 100 AND (o.dateof_purchase IS NOT NULL
# OR o.state = 'checked_out' );
# /!\ INVALID SQL QUERY /!\:
print(sql_query_2b.as_string(conn))
# will print:
# SELECT "p.id","p.name","p.type","p.price","p.warehouse","p.location" FROM product p
# INNER JOIN owner o ON p.id = o.product_id
# WHERE p.id = 100 AND (o.dateof_purchase IS NOT NULL
# OR o.state = 'checked_out' );
The reason is that:
simple (lowercase) column names are evaluated correctly when double-quoted, e.g. id is equivalent to "id" and name to "name" in PostgreSQL,
a column name prefixed with a table alias or table name using dot notation, e.g. p.id or product.id, becomes a single identifier once the whole string is double-quoted ("p.id"), so the query fails miserably with the following error:
UndefinedColumn: column "p.id" does not exist
LINE 1: SELECT "p.id","p.type","p.price","p.warehouse","p.location",...
^
HINT: Perhaps you meant to reference the column "p.id".
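The failure comes from quoting the whole dotted name as one identifier. A minimal, stdlib-only sketch of the quoting rule: each part of a qualified name is double-quoted separately (embedded double quotes are doubled), so p.id becomes "p"."id". The helpers quote_ident, qualified, build_select and ALLOWED_COLUMNS are hypothetical, not psycopg2 API; in psycopg2 2.8 and later, sql.Identifier('p', 'id') composes the same qualified name. The query is built and printed rather than executed, since running it needs a PostgreSQL server; the id remains a %s placeholder for curs.execute(query, (myid,)).

```python
from typing import Sequence

# Hypothetical allow-list: only these column names may be requested.
ALLOWED_COLUMNS = {"id", "name", "type", "price", "warehouse", "location"}


def quote_ident(part: str) -> str:
    """Double-quote one identifier part, doubling any embedded double quote."""
    return '"' + part.replace('"', '""') + '"'


def qualified(*parts: str) -> str:
    """Quote each part on its own, then join with dots: ('p', 'id') -> "p"."id"."""
    return ".".join(quote_ident(part) for part in parts)


def build_select(columns: Sequence[str]) -> str:
    """Build the question's query for alias p; the id stays a %s placeholder."""
    unknown = set(columns) - ALLOWED_COLUMNS
    if unknown:
        raise ValueError(f"unexpected column(s): {sorted(unknown)}")
    select_list = ", ".join(qualified("p", col) for col in columns)
    return (
        f"SELECT {select_list} FROM product p "
        "INNER JOIN owner o ON p.id = o.product_id "
        "WHERE p.id = %s AND (o.dateof_purchase IS NOT NULL "
        "OR o.state = 'checked_out');"
    )


query = build_select(["id", "name", "price"])
print(query)
```

The allow-list is a design choice: quoting alone keeps a hostile name from breaking out of the identifier, but checking against known columns also rejects typos before the query reaches the server.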

sqlalchemy exists() - how to avoid extra From

exists() containing another exists() results in an extra FROM clause.
model.session.query(Table1.id).\
filter(~ exists().\
where(Table2.table1_id==Table1.id).\
where(~ exists().\
where(Table3.contract_id==Table2.contract_id).\
where(Table3.session_id==Table1.session_id))
)
this is generating:
SELECT table1.id AS table1_id FROM table1
WHERE NOT (EXISTS (SELECT * FROM table2
WHERE table2.table1_id = table1.id
AND NOT (EXISTS (SELECT * FROM table3, table1
WHERE table3.contract_id = table2.contract_id
AND table3.session_id = table1.session_id))))
Here, "FROM table1" in the last "exists" is not required because table1 is already in the topmost query. How can I force sqlalchemy not to add this extra "FROM table1"?
What I really want is:
SELECT table1.id AS table1_id FROM table1
WHERE NOT (EXISTS (SELECT * FROM table2
WHERE table2.table1_id = table1.id
AND NOT (EXISTS (SELECT * FROM table3
WHERE table3.contract_id = table2.contract_id
AND table3.session_id = table1.session_id))))
I wonder how to achieve that.
Can somebody help me please?
Using SQLAlchemy 0.7.9.

from sqlalchemy import exists, select
q = (session.query(Table1.id)
.filter(~exists(
select([Table2.id])
.where(Table2.table1_id == Table1.id)
.where(~exists(
# an explicit select() inside exists(), so .correlate() can be applied to it below
select([Table3.id])
.where(Table3.contract_id == Table2.contract_id)
.where(Table3.session_id == Table1.session_id)
# this is important: correlate to the enclosing queries so table1 and table2 are not added to this FROM clause
.correlate(Table1)
.correlate(Table2)
))
)))
