Using COUNT(*) OVER() in current query with SQLAlchemy over PostgreSQL - python

In a prototype application that uses Python and SQLAlchemy with a PostgreSQL database I have the following schema (excerpt):
class Guest(Base):
    __tablename__ = 'guest'
    id = Column(Integer, primary_key=True)
    name = Column(String(50))
    surname = Column(String(50))
    email = Column(String(255))
    [..]
    deleted = Column(Date, default=None)
I want to build a query, using SQLAlchemy, that retrieves the list of guests, to be displayed in the back-office.
To implement pagination I will use LIMIT and OFFSET, together with COUNT(*) OVER() to get the total number of records in the same query (rather than with a separate query).
An example of the SQL query could be:
SELECT id, name, surname, email,
COUNT(*) OVER() AS total
FROM guest
WHERE (deleted IS NULL)
ORDER BY id ASC
LIMIT 50
OFFSET 0
If I were to build the query using SQLAlchemy, I could do something like:
query = session.query(Guest)
query = query.filter(Guest.deleted == None)
query = query.order_by(Guest.id.asc())
query = query.offset(0)
query = query.limit(50)
result = query.all()
And if I wanted to count all the rows in the guests table, I could do something like this:
from sqlalchemy import func
query = session.query(func.count(Guest.id))
query = query.filter(Guest.deleted == None)
result = query.scalar()
My question is how to execute a single query with SQLAlchemy, similar to the SQL above, that returns both the first 50 rows and the total row count needed to build the pagination links.
The interesting bit is PostgreSQL's support for window functions, which makes this possible and saves a second query.
Is it possible?
Thanks in advance.

So I could not find any examples in the SQLAlchemy documentation, but I found these functions:
count()
over()
label()
And I managed to combine them to produce exactly the result I was looking for:
from sqlalchemy import func
query = session.query(Guest, func.count(Guest.id).over().label('total'))
query = query.filter(Guest.deleted == None)
query = query.order_by(Guest.id.asc())
query = query.offset(0)
query = query.limit(50)
result = query.all()
Cheers!
P.S. I also found this question on Stack Overflow, which was unanswered.
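One detail worth handling when consuming such a result: the total travels on each returned row, so if the requested offset lies past the last row, the query returns nothing and no total is available. Below is a minimal sketch of that handling. It uses the standard-library sqlite3 module as a stand-in for PostgreSQL, an assumption made only so the example runs anywhere (window functions need SQLite 3.25 or newer). The fetch_page helper and the sample data are hypothetical.

```python
import sqlite3


def fetch_page(conn, limit, offset):
    """Return (rows, total) for one page of non-deleted guests.

    total is None when the page is empty, because COUNT(*) OVER() is
    attached to result rows and an empty result carries no count.
    """
    rows = conn.execute(
        "SELECT id, name, COUNT(*) OVER() AS total "
        "FROM guest WHERE deleted IS NULL "
        "ORDER BY id ASC LIMIT ? OFFSET ?",
        (limit, offset),
    ).fetchall()
    if not rows:
        return [], None
    # Every row carries the same total, so read it from the first one.
    total = rows[0][2]
    return [(r[0], r[1]) for r in rows], total


# Hypothetical sample data: 120 active guests and 5 soft-deleted ones.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE guest (id INTEGER PRIMARY KEY, name TEXT, deleted TEXT)")
conn.executemany(
    "INSERT INTO guest (name, deleted) VALUES (?, ?)",
    [("guest%d" % i, None) for i in range(120)] + [("gone", "2020-01-01")] * 5,
)

# Third page of 50: only 20 rows remain, but total still counts all 120.
page, total = fetch_page(conn, limit=50, offset=100)

# Offset past the end: no rows, hence no total.
empty_page, empty_total = fetch_page(conn, limit=50, offset=500)
```

The window count is computed over the filtered rows before LIMIT and OFFSET are applied, which is why the short last page still reports the full total. For the empty-page case, a caller might fall back to a plain count query.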

Related

HOW TO DO OBJECT RELATIONAL MAPPING(ORM) FOR THIS QUERY?

I am trying to convert my SQL query into Python code using Flask-SQLAlchemy.
I am stuck on one query.
There are two tables, flights and passengers, where passengers.flight_id is a foreign key referencing flights.id, the primary key of flights.
My SQL query is:
SELECT * FROM flights JOIN
passengers ON flights.id=passengers.flight_id;
Please help me convert it to Python.
Try this:
db.session.query(flights, passengers).filter(flights.id == passengers.flight_id).all()
Assuming your models look something like these (this is pure sqlalchemy, rather than flask-sqlalchemy):
import sqlalchemy as sa
from sqlalchemy import orm
class Flight(Base):
    __tablename__ = 'flights'
    id = sa.Column(sa.Integer, primary_key=True)
    name = sa.Column(sa.String(64))
    passengers = orm.relationship('Passenger', back_populates='flight')
class Passenger(Base):
    __tablename__ = 'passengers'
    id = sa.Column(sa.Integer, primary_key=True)
    name = sa.Column(sa.String(64))
    flight_id = sa.Column(sa.Integer, sa.ForeignKey('flights.id'))
    flight = orm.relationship('Flight', back_populates='passengers')
then this query:
session.query(Flight).join(Passenger)
will generate this SQL when executed:
SELECT flights.id AS flights_id, flights.name AS flights_name
FROM flights JOIN passengers ON flights.id = passengers.flight_id
and will return all the flight objects which have at least one passenger.
By contrast, this query:
session.query(Flight, Passenger).filter(Flight.id == Passenger.flight_id)
generates this SQL:
SELECT flights.id AS flights_id, flights.name AS flights_name, passengers.id AS passengers_id, passengers.name AS passengers_name, passengers.flight_id AS passengers_flight_id
FROM flights, passengers
WHERE flights.id = passengers.flight_id
which returns a (Flight, Passenger) tuple for each passenger that has a flight.
See the SQLAlchemy ORM tutorial for creating relationships and querying with joins.
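At the SQL level, the explicit JOIN ... ON form and the comma-separated FROM with a WHERE condition match the same flight/passenger pairs; the two queries above differ in which columns they select. A small sketch of that equivalence, using the standard-library sqlite3 module instead of SQLAlchemy (an assumption made so it runs without extra packages), with hypothetical sample rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE flights (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE passengers (
        id INTEGER PRIMARY KEY,
        name TEXT,
        flight_id INTEGER REFERENCES flights(id)
    );
    INSERT INTO flights (id, name) VALUES (1, 'AB100'), (2, 'CD200'), (3, 'EF300');
    INSERT INTO passengers (id, name, flight_id) VALUES
        (1, 'Ann', 1), (2, 'Bob', 1), (3, 'Cy', 2), (4, 'Dee', NULL);
""")

# Explicit JOIN ... ON, as in the first generated statement.
explicit_join = conn.execute(
    "SELECT flights.id, passengers.id "
    "FROM flights JOIN passengers ON flights.id = passengers.flight_id "
    "ORDER BY flights.id, passengers.id"
).fetchall()

# Comma-separated FROM plus WHERE, as in the second generated statement.
implicit_join = conn.execute(
    "SELECT flights.id, passengers.id "
    "FROM flights, passengers "
    "WHERE flights.id = passengers.flight_id "
    "ORDER BY flights.id, passengers.id"
).fetchall()

# Distinct flights that have at least one passenger.
flights_with_passengers = sorted({flight_id for flight_id, _ in explicit_join})
```

Flight 3 has no passengers and passenger 4 has no flight, so neither appears in either result.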

Executemany SELECT queries with psycopg2

I have a large PostgreSQL database of users that I connect to using psycopg2. I need to retrieve (SELECT) the information of a specific large subset of users (>200). I am given a list of ids and I need to return the age of each of those users. I wrote a working solution:
conn = psycopg2.connect("dbname= bla bla bla")
cur = conn.cursor()
for user_id in interesting_users:
    qr = "SELECT age FROM users WHERE country_code = {0} AND user_id = {1}".format(1, user_id)
    cur.execute(qr)
    fetched_row = cur.fetchall()
    #parse results
This solution works fine, however it is not ideal when the length of interesting_users is large. I am looking for a more efficient approach than executing multiple queries. One solution would be to create a single query by appending all the user ids:
for user_id in interesting_users:
    query += " OR user_id = {0}".format(user_id)
But I was hoping for a more elegant solution.
I found that psycopg2 provides the executemany() method. So, I tried to apply to my problem. However, I can't manage to make it work. This:
cur.executemany("SELECT age FROM users WHERE country_code = %s AND user_id = %s",[(1, user_id) for user_id in interesting_users])
r = cur.fetchall()
returns:
r = cur.fetchall()
psycopg2.ProgrammingError: no results to fetch
So, can executemany() be used for a SELECT statement? If yes, what's wrong with my code? If no, how can I perform multiple SELECT queries at once?
Note: ids in interesting_users have no order so I can't use something like WHERE id < ...
SOLUTION:
query = "SELECT age FROM users WHERE country_code = {0} AND user_id IN ({1});".format(1, ",".join(map(str, interesting_users)))
cur.execute(query)
fetched_rows = cur.fetchall()
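executemany() discards any result set, so it is only useful for statements that modify data (such as INSERT), not for SELECT. Use a single execute() with IN and one placeholder per id:
cur.execute(
    "SELECT age FROM users WHERE country_code = %s AND user_id IN ({})".format(
        ','.join(['%s'] * len(interesting_users))),
    [1] + interesting_users)
r = cur.fetchall()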
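The placeholder-building step can be packaged into a small helper so that the ids are always passed as parameters rather than formatted into the SQL text. This is a minimal sketch: build_age_query is a hypothetical name, and the code only builds the statement and its parameter list without connecting to a database.

```python
def build_age_query(country_code, user_ids, placeholder="%s"):
    """Build a parameterized SELECT with one placeholder per user id.

    Returns (sql, params), ready to pass to cursor.execute(sql, params).
    """
    if not user_ids:
        # "IN ()" is not valid SQL, so refuse an empty id list up front.
        raise ValueError("user_ids must not be empty")
    marks = ", ".join([placeholder] * len(user_ids))
    sql = (
        "SELECT age FROM users "
        "WHERE country_code = {0} AND user_id IN ({1})".format(placeholder, marks)
    )
    return sql, [country_code] + list(user_ids)


sql, params = build_age_query(1, [42, 7, 19])
# With psycopg2 this would be run as: cur.execute(sql, params)
```

Only the placeholder markers are formatted into the string; the values themselves go through the driver's parameter binding. Since psycopg2 adapts Python lists to PostgreSQL arrays, a condition of the form user_id = ANY(%s) with the list as a single parameter should be an alternative that avoids building the placeholder string at all.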

Sqlalchemy complex NOT IN another table query

First of all, I would like to apologize, as my SQL knowledge is still very limited. The problem is the following: I have two distinct tables with no direct relationship between them, but they share two columns: storm_id and userid.
I would like to query all posts for a given storm_id that are not from a banned user, with some extra filters.
Here are the models:
Post
class Post(db.Model):
    id = db.Column(db.Integer, primary_key = True)
    ...
    userid = db.Column(db.String(100))
    ...
    storm_id = db.Column(db.Integer, db.ForeignKey('storm.id'))
Banneduser
class Banneduser(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    sn = db.Column(db.String(60))
    userid = db.Column(db.String(100))
    name = db.Column(db.String(60))
    storm_id = db.Column(db.Integer, db.ForeignKey('storm.id'))
Both Post and Banneduser are children of another table (Storm). Here is the query I am trying to build. As you can see, I am trying to filter:
verified posts
by descending order
with a limit (I apply it outside the base query, as the elif branches add other filters)
# we query banned users id
bannedusers = db.session.query(Banneduser.userid)
# we do the query except the limit, as in the if..elif there are more filtering queries
joined = (
    db.session.query(Post, Banneduser)
    .filter(Post.storm_id == stormid)
    .filter(Post.verified == True)
    # here comes the trouble
    .filter(~Post.userid.in_(bannedusers))
    .order_by(Post.timenow.desc())
)
try:
    if contentsettings.filterby == 'all':
        posts = joined.limit(contentsettings.maxposts)
        print((posts.all()))
        # i am not sure if this is pythonic
        posts = [item[0] for item in posts]
        return render_template("stream.html", storm=storm, wall=posts)
    elif ... other queries
I have two problems, one immediate and one underlying:
1/ .filter(~Post.userid.in_(bannedusers)) produces one row EACH TIME post.userid is not in bannedusers, so I get N repeated posts. I tried to filter this with distinct, but it does not work.
2/ Underlying problem: I am not sure whether my approach is the right one (the database model structure/relationships plus the queries).
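Use SQL EXISTS, and select only Post: selecting Post and Banneduser together without a join condition yields one row per combination, which is where the repeated posts come from. The subquery must be correlated on userid (and on storm_id, assuming bans apply per storm). Your query should look like this:
from sqlalchemy import exists
db.session.query(Post)\
    .filter(Post.storm_id==stormid)\
    .filter(Post.verified==True)\
    .filter(~exists().where(Banneduser.userid==Post.userid)
                     .where(Banneduser.storm_id==Post.storm_id))\
    .order_by(Post.timenow.desc())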
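To see why the original query repeated each post, and how a correlated NOT EXISTS avoids it, here is a small sketch in plain SQL run through the standard-library sqlite3 module (a stand-in for the real database, chosen so the example is self-contained). The table contents are hypothetical, and correlating on both userid and storm_id is an assumption about how bans are scoped.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE post (id INTEGER PRIMARY KEY, userid TEXT, storm_id INTEGER);
    CREATE TABLE banneduser (id INTEGER PRIMARY KEY, userid TEXT, storm_id INTEGER);
    INSERT INTO post (id, userid, storm_id) VALUES
        (1, 'alice', 10), (2, 'bob', 10), (3, 'carol', 10);
    INSERT INTO banneduser (id, userid, storm_id) VALUES
        (1, 'bob', 10), (2, 'dave', 10);
""")

# Selecting from both tables without a join condition pairs every
# surviving post with every banneduser row (a cartesian product).
cross = conn.execute(
    "SELECT post.id FROM post, banneduser "
    "WHERE post.storm_id = 10 "
    "AND post.userid NOT IN (SELECT userid FROM banneduser) "
    "ORDER BY post.id"
).fetchall()

# Selecting from post only, with a correlated NOT EXISTS subquery,
# returns each non-banned post exactly once.
not_exists = conn.execute(
    "SELECT post.id FROM post "
    "WHERE post.storm_id = 10 "
    "AND NOT EXISTS (SELECT 1 FROM banneduser "
    "WHERE banneduser.userid = post.userid "
    "AND banneduser.storm_id = post.storm_id) "
    "ORDER BY post.id"
).fetchall()
```

With two rows in banneduser, the first query returns every surviving post twice, which matches the "N repeated posts" symptom; the second returns posts 1 and 3 once each.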

Make a SQL Alchemy Expression return value from a different Table's column for filtering

How can you use the SQLAlchemy expression language to make filter_by look through a hybrid property that returns a value from a column in another table?
Example Code
(Using Flask-SQLAlchemy, so you'll see things like Device.query.get(203).)
class Service(Model):
    id = Column(Integer)
    client_id = Column(Integer)
class Device(Model):
    id = Column(Integer)
    owner = Column(Integer)
    @hybrid_property
    def client_id(self):
        return Service.query.get(self.owner).client_id
    @client_id.expression
    def client_id(self):
        # ???
        # Make this return a useful query
Device.query.filter_by(client_id=124)
SQL QUERY
This is the SQL that returns the proper values:
SELECT service.client_id FROM device INNER JOIN service ON device.owner = service.id;
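Not the desired SQL, but it should produce the same result:
@client_id.expression
def client_id(cls):
    # Correlated scalar subquery: the client_id of the Service row this device points to.
    # On SQLAlchemy 1.4+, scalar_subquery() replaces the deprecated as_scalar().
    return (
        select([Service.client_id])
        .where(Service.id == cls.owner)
        .as_scalar()
    )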

GeoAlchemy2: find a set of Geometry items that doesn't intersect with a separate set

I have a postgis database table called tasks, mapped to a python class Task using geoalchemy2/sqlalchemy - each entry has a MultiPolygon geometry and an integer state. Collectively, entries in my database cover a geographic region. I want to select a random entry of state=0 which is not geographically adjacent to any entry of state=1.
Here's code which selects a random entry of state=0:
class Task(Base):
    __tablename__ = "tasks"
    id = Column(Integer, primary_key=True, index=True)
    geometry = Column(Geometry('MultiPolygon', srid=4326))
    state = Column(Integer, default=0)
session = DBSession()
taskgetter = session.query(Task).filter_by(state=0)
count = taskgetter.count()
if count != 0:
    atask = taskgetter.offset(random.randint(0, count-1)).first()
So far so good. But now, how to make sure that they are not adjacent to another set of entries?
GeoAlchemy has a function ST_Union, which merges geometries, and ST_Disjoint, which tests whether two geometries do not intersect. So it seems I should be able to select the items of state=1, union them into a single geometry, and then filter my original query (above) to keep only the items that are disjoint from it. But I can't find a way to express this in GeoAlchemy. Here's one way I tried:
session = DBSession()
taskgetter = session.query(Task).filter_by(state=0) \
    .filter(Task.geometry.ST_Disjoint(session.query( \
    Task.geometry.ST_Union()).filter_by(state=1)))
count = taskgetter.count()
if count != 0:
    atask = taskgetter.offset(random.randint(0, count-1)).first()
and it yields an error like this:
ProgrammingError: (ProgrammingError) subquery in FROM must have an alias
LINE 3: FROM tasks, (SELECT ST_Union(tasks.geometry) AS "ST_Union_1"...
^
HINT: For example, FROM (SELECT ...) [AS] foo.
'SELECT count(*) AS count_1
FROM (SELECT tasks.id AS tasks_id
FROM tasks, (SELECT ST_Union(tasks.geometry) AS "ST_Union_1"
FROM tasks
WHERE tasks.state = %(state_1)s)
WHERE tasks.state = %(state_2)s AND ST_Disjoint(tasks.geometry, (SELECT ST_Union(tasks.geometry) AS "ST_Union_1"
FROM tasks
WHERE tasks.state = %(state_1)s))) AS anon_1' {'state_1': 1, 'state_2': 0}
A shot in the dark, as I don't have the setup to test it:
This seems to be related to SQLAlchemy's subqueries more than to GeoAlchemy. Try adding .subquery() at the end of your subquery to generate an alias (cf. http://docs.sqlalchemy.org/en/rel_0_9/orm/tutorial.html#using-subqueries)
Edit:
Still using info from the linked tutorial, I think this may work:
state1 = session.query(
    Task.geometry.ST_Union().label('taskunion')
).filter_by(state=1).subquery()
taskgetter = session.query(Task)\
    .filter_by(state=0)\
    .filter(Task.geometry.ST_Disjoint(state1.c.taskunion))
Add a label to the column you're creating on your subquery to reference it in your super-query.
