Flask-SQLAlchemy query for count - python

I'm using Flask-SQLAlchemy with a one-to-many relationship between two models:
class Request(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    r_time = db.Column(db.DateTime, index=True, default=datetime.utcnow)
    org = db.Column(db.String(120))
    dest = db.Column(db.String(120))
    buyer_id = db.Column(db.Integer, db.ForeignKey('buyer.id'))
    sale_id = db.Column(db.Integer, db.ForeignKey('sale.id'))
    cost = db.Column(db.Integer)
    sr = db.Column(db.Integer)
    profit = db.Column(db.Integer)

    def __repr__(self):
        return '<Request {} by {}>'.format(self.org, self.buyer_id)

class Buyer(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    name = db.Column(db.String(120), unique=True)
    email = db.Column(db.String(120), unique=True)
    requests = db.relationship('Request', backref='buyer', lazy='dynamic')

    def __repr__(self):
        return '<Buyer {}>'.format(self.name)
I need to identify which Buyer has the minimum number of requests among all buyers.
I could do it manually by collecting all requests into lists and searching through them, but I believe there is a simpler way to do it with a SQLAlchemy query.

You can do this with a CTE (common table expression): a select that produces buyer ids together with their request counts, like so:
buyer_id | request_count
:------- | :------------
1 | 5
2 | 3
3 | 1
4 | 1
Here you can filter on the count, e.g. requiring it to be greater than 0 for a buyer to be listed.
You can then join the buyers table against that to produce:
buyer_id | buyer_name | buyer_email | request_count
:------- | :--------- | :--------------- | :------------
1 | foo | foo@example.com | 5
2 | bar | bar@example.com | 3
3 | baz | baz@example.com | 1
4 | spam | spam@example.com | 1
but because we are using a CTE, you can also query the CTE for the lowest count value. In the above example, that's 1, and you can add a WHERE clause to the joined buyer-with-cte-counts query to filter the results down to only rows where the request_count value is equal to that minimum number.
The SQL query for this is
WITH request_counts AS (
    SELECT request.buyer_id AS buyer_id, count(request.id) AS request_count
    FROM request
    GROUP BY request.buyer_id
    HAVING count(request.id) > ?
)
SELECT buyer.*
FROM buyer
JOIN request_counts ON buyer.id = request_counts.buyer_id
WHERE request_counts.request_count = (
    SELECT min(request_counts.request_count)
    FROM request_counts
)
The WITH request_counts AS (...) part defines the CTE, and it is that part that produces the first table with buyer_id and request_count. The request_counts CTE is then joined with buyer, and the WHERE clause does the filtering on the min(request_counts.request_count) value.
Translating the above to Flask-SQLAlchemy code:
request_count = db.func.count(Request.id).label("request_count")
cte = (
    db.select([Request.buyer_id.label("buyer_id"), request_count])
    .group_by(Request.buyer_id)
    .having(request_count > 0)
    .cte('request_counts')
)
min_request_count = db.select([db.func.min(cte.c.request_count)]).as_scalar()
buyers_with_least_requests = Buyer.query.join(
    cte, Buyer.id == cte.c.buyer_id
).filter(cte.c.request_count == min_request_count).all()
Demo:
>>> __ = db.session.bulk_insert_mappings(
... Buyer, [{"name": n} for n in ("foo", "bar", "baz", "spam", "no requests")]
... )
>>> buyers = Buyer.query.order_by(Buyer.id).all()
>>> requests = [
... Request(buyer_id=b.id)
... for b in [*([buyers[0]] * 3), *([buyers[1]] * 5), *[buyers[2], buyers[3]]]
... ]
>>> __ = db.session.add_all(requests)
>>> request_count = db.func.count(Request.id).label("request_count")
>>> cte = (
... db.select([Request.buyer_id.label("buyer_id"), request_count])
... .group_by(Request.buyer_id)
... .having(request_count > 0)
... .cte("request_counts")
... )
>>> buyers_w_counts = Buyer.query.join(cte, cte.c.buyer_id == Buyer.id)
>>> for buyer, count in buyers_w_counts.add_column(cte.c.request_count):
... # print out buyer and request count for this demo
... print(buyer, count, sep=": ")
<Buyer foo>: 3
<Buyer bar>: 5
<Buyer baz>: 1
<Buyer spam>: 1
>>> min_request_count = db.select([db.func.min(cte.c.request_count)]).as_scalar()
>>> buyers_w_counts.filter(cte.c.request_count == min_request_count).all()
[<Buyer baz>, <Buyer spam>]
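If you want to sanity-check the same SQL outside SQLAlchemy, here is a minimal sketch using Python's standard-library sqlite3 module; the table layout and sample data mirror the demo above (a trimmed request table with just id and buyer_id is an assumption made for brevity):

```python
import sqlite3

# In-memory schema mirroring the Buyer/Request models above (trimmed columns).
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE buyer (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE request (id INTEGER PRIMARY KEY, buyer_id INTEGER REFERENCES buyer(id));
INSERT INTO buyer (id, name) VALUES
    (1, 'foo'), (2, 'bar'), (3, 'baz'), (4, 'spam'), (5, 'no requests');
-- foo: 3 requests, bar: 5, baz: 1, spam: 1, 'no requests': 0
INSERT INTO request (buyer_id) VALUES (1), (1), (1), (2), (2), (2), (2), (2), (3), (4);
""")

rows = con.execute("""
WITH request_counts AS (
    SELECT request.buyer_id AS buyer_id, count(request.id) AS request_count
    FROM request
    GROUP BY request.buyer_id
    HAVING count(request.id) > 0
)
SELECT buyer.name
FROM buyer
JOIN request_counts ON buyer.id = request_counts.buyer_id
WHERE request_counts.request_count = (
    SELECT min(request_count) FROM request_counts
)
ORDER BY buyer.id
""").fetchall()
print(rows)  # [('baz',), ('spam',)]
```

Both buyers tied at the minimum count of 1 come back, which is exactly why the CTE approach is preferable to an ORDER BY ... LIMIT 1 query that would drop ties.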
I've also created a db<>fiddle here, containing the same queries, to play with.

Related

SQLAlchemy ORM: Merge two rows based on one common value

How can I merge two rows with the same value in one column? Let's say I have a model with ~40 columns like below:
class Model(Base):
    __tablename__ = "table"
    id = Column(Integer, primary_key=True)
    value_a = Column(String)
    value_b = Column(String)
    value_c = Column(String)
    ...
I need to process ~500k rows of new data each time, and each run creates a new table.
After the initial insert (using session.bulk_insert_mappings(Model, data)) there are duplicated value_c values (at most 2 per value), but in each pair one row has value_a set with value_b empty, and the other has value_b set with value_a empty.
After initial insert:
| id | value_a | value_b | value_c |
| -- | ------- | ------- | ------- |
| 1 | foo | None | xyz |
| 2 | None | bar | xyz |
Having all rows I need to merge the rows with common value_c values together to get rid of duplicates.
After update:
| id | value_a | value_b | value_c |
| -- | ------- | ------- | ------- |
| 3 | foo | bar | xyz |
What is the most efficient way to do that? From the beginning I was using session.merge(row) for each row, but it is too slow, so I decided to split the work into insert and update stages.
You should be able to insert from a SELECT statement that joins the not-null-a rows to the not-null-b rows. Then, after inserting the combined rows, you can delete the old ones. This matches exactly the case you outlined; you might need to add more conditions to ignore other entries that you don't want inserted or deleted (i.e. (a, b, c) == (None, None, 'value')).
I used aliased so that I can join the table against itself.
import sys

from sqlalchemy import (
    create_engine,
    Integer,
    String,
)
from sqlalchemy.schema import (
    Column,
)
from sqlalchemy.orm import Session, declarative_base, aliased
from sqlalchemy.sql import select, or_, and_, delete, insert

username, password, db = sys.argv[1:4]

Base = declarative_base()
engine = create_engine(f"postgresql+psycopg2://{username}:{password}@/{db}", echo=True)
metadata = Base.metadata

class Model(Base):
    __tablename__ = "table"
    id = Column(Integer, primary_key=True)
    value_a = Column(String)
    value_b = Column(String)
    value_c = Column(String)

metadata.create_all(engine)

def print_models(session):
    for (model,) in session.execute(select(Model)).all():
        print(model.id, model.value_a, model.value_b, model.value_c)

with Session(engine) as session, session.begin():
    for (a, b, c) in [('foo', None, 'xyz'), (None, 'bar', 'xyz'), ('leave', 'it', 'asis')]:
        session.add(Model(value_a=a, value_b=b, value_c=c))
    session.flush()
    print_models(session)

with Session(engine) as session, session.begin():
    #
    # Insert de-nulled entries.
    #
    left = aliased(Model)
    right = aliased(Model)
    nulls_joined_q = select(
        left.value_a,
        right.value_b,
        left.value_c
    ).distinct().select_from(
        left
    ).join(
        right,
        left.value_c == right.value_c
    ).where(
        and_(
            # Ignore entries with no C value.
            left.value_c != None,
            left.value_b == None,
            right.value_a == None))
    stmt = insert(
        Model.__table__
    ).from_select([
        "value_a",
        "value_b",
        "value_c"
    ], nulls_joined_q)
    session.execute(stmt)
    #
    # Remove null entries: all rows where value_c is NOT NULL and either
    # value_a or value_b is empty.
    #
    # NOTE: This also deletes entries where value_a and value_b are BOTH
    # null in the same row.
    #
    stmt = delete(Model.__table__).where(and_(
        # Ignore these like we did in the insert.
        Model.value_c != None,
        or_(
            Model.value_a == None,
            Model.value_b == None),
    ))
    session.execute(stmt)
    session.flush()
    # Output
    print_models(session)
Output
1 foo None xyz
2 None bar xyz
3 leave it asis
#... then
3 leave it asis
4 foo bar xyz
Docs
https://docs.sqlalchemy.org/en/14/core/dml.html#sqlalchemy.sql.expression.Insert.from_select
https://docs.sqlalchemy.org/en/14/orm/query.html#sqlalchemy.orm.aliased
https://docs.sqlalchemy.org/en/14/core/dml.html#sqlalchemy.sql.expression.delete
https://docs.sqlalchemy.org/en/14/core/dml.html#sqlalchemy.sql.expression.insert
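For reference, the same insert-from-select-then-delete flow can be exercised with plain sqlite3 from the standard library (the table name t and the in-memory database are assumptions of this sketch; the WHERE conditions match the SQLAlchemy statements above):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE t (id INTEGER PRIMARY KEY, value_a TEXT, value_b TEXT, value_c TEXT);
INSERT INTO t (value_a, value_b, value_c) VALUES
    ('foo', NULL, 'xyz'), (NULL, 'bar', 'xyz'), ('leave', 'it', 'asis');
""")

# Step 1: insert the merged rows, joining the a-only half to the b-only half on value_c.
con.execute("""
INSERT INTO t (value_a, value_b, value_c)
SELECT DISTINCT l.value_a, r.value_b, l.value_c
FROM t AS l JOIN t AS r ON l.value_c = r.value_c
WHERE l.value_c IS NOT NULL AND l.value_b IS NULL AND r.value_a IS NULL
""")

# Step 2: delete the original half-rows (one of value_a/value_b still NULL).
con.execute("""
DELETE FROM t
WHERE value_c IS NOT NULL AND (value_a IS NULL OR value_b IS NULL)
""")

rows = con.execute("SELECT value_a, value_b, value_c FROM t ORDER BY id").fetchall()
print(rows)  # [('leave', 'it', 'asis'), ('foo', 'bar', 'xyz')]
```

The merged row survives the delete because both value_a and value_b are populated, while the untouched 'asis' row never matched either statement.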

Query filter over ForeignKey in SQLAlchemy

To simplify, I have two tables (SQLAlchemy ORM, used with FastAPI):
class Object(Base):
    __tablename__ = "object"
    id = Column('id', Integer, Identity(start=1), primary_key=True)
    name = Column(VARCHAR2(30), unique=True, index=True)
    attributes = relationship("Attributes", backref="parent", cascade="all, delete", passive_deletes=True)

class Attributes(Base):
    __tablename__ = "attributes"
    id = Column('id', Integer, Identity(start=1), primary_key=True)
    attribute = Column(VARCHAR2(200), index=True)
    value = Column(VARCHAR2(2000), index=True)
    parent_id = Column(Integer, ForeignKey("object.id", ondelete="CASCADE"), nullable=False)
An object can have multiple attributes (a 1-N relationship).
Attributes are dynamic: depending on the object, some objects have 10 attributes, others 50...
For example:
Object | Attributes
---------------------------------
Object 1 | color = red
| form = round
| level = 5
| ...
| attribute alpha
---------------------------------
Object 2 | color = red
| form = square
| level = 2
| ...
| attribute beta
I would like to do something like :
"find all Objects where attribute.color = red and attribute.level >= 2 and attribute.X is defined"
I tried :
query = db.query(Object).options(
    joinedload(Attributes, innerjoin=False)).join(Attributes)
query = query.filter(Attributes.attribute == 'color')
query = query.filter(Attributes.value == 'red')
...
return query.all()
But I don't know how to chain further filters on the Attributes table.
Thanks for your help...
To implement the filters, I would use any():
query = (
    session.query(Object)
    # NOTE: the join below is not needed for the filter part;
    # .options(joinedload(Attributes, innerjoin=False))
    # .join(Attributes)
    # add additional criteria
    .filter(
        Object.attributes.any(
            and_(
                Attributes.attribute == "color",
                Attributes.value == "red",
            )
        )
    )
    .filter(
        Object.attributes.any(
            and_(
                Attributes.attribute == "level",
                cast(Attributes.value, Integer) >= 2,
            )
        )
    )
    .filter(Object.attributes.any(Attributes.attribute == "X"))  # exists
)
which will produce SQL statement (exact one depends on the DB engine):
SELECT object.id,
object.name
FROM object
WHERE (EXISTS
(SELECT 1
FROM attributes
WHERE object.id = attributes.parent_id
AND attributes.attribute = 'color'
AND attributes.value = 'red'))
AND (EXISTS
(SELECT 1
FROM attributes
WHERE object.id = attributes.parent_id
AND attributes.attribute = 'level'
AND CAST(attributes.value AS INTEGER) >= 2))
AND (EXISTS
(SELECT 1
FROM attributes
WHERE object.id = attributes.parent_id
AND attributes.attribute = 'X'))
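As a quick check of that generated SQL, here is a sketch with the standard-library sqlite3 module, using sample data matching the Object 1 / Object 2 example above (only Object 1 has the X attribute, so only it should match):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE object (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE attributes (
    id INTEGER PRIMARY KEY, attribute TEXT, value TEXT,
    parent_id INTEGER REFERENCES object(id));
INSERT INTO object (id, name) VALUES (1, 'Object 1'), (2, 'Object 2');
INSERT INTO attributes (attribute, value, parent_id) VALUES
    ('color', 'red', 1), ('form', 'round', 1), ('level', '5', 1), ('X', 'alpha', 1),
    ('color', 'red', 2), ('form', 'square', 2), ('level', '2', 2);
""")

# One EXISTS subquery per attribute condition, exactly as any() renders it.
rows = con.execute("""
SELECT object.name FROM object
WHERE EXISTS (SELECT 1 FROM attributes
              WHERE object.id = attributes.parent_id
              AND attributes.attribute = 'color' AND attributes.value = 'red')
  AND EXISTS (SELECT 1 FROM attributes
              WHERE object.id = attributes.parent_id
              AND attributes.attribute = 'level'
              AND CAST(attributes.value AS INTEGER) >= 2)
  AND EXISTS (SELECT 1 FROM attributes
              WHERE object.id = attributes.parent_id
              AND attributes.attribute = 'X')
""").fetchall()
print(rows)  # [('Object 1',)]
```

Object 2 satisfies the color and level conditions but has no X attribute, so its third EXISTS fails and it is filtered out.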

Get top level parent id of a hierarchy with SQLAlchemy via recursive CTE

I have this case:
Note table:

| id | parent_id |
| -- | --------- |
| 1  | Null      |
| 2  | 1         |
| 3  | 2         |
| 4  | 3         |
What I want to achieve is to get the top-level parent id.
In this case, if I pass id 4 I should get id 1, since id 1 is the top-level parent.
When parent_id is null, that id is the top-level parent.
I've tried this, but it returns the same id that I pass to the function:
def get_top_level_Note(self, id: int):
    hierarchy = self.db.session.query(Note).filter(Note.id == id).cte(name="hierarchy", recursive=True)
    parent = aliased(hierarchy, name="p")
    children = aliased(Note, name="c")
    hierarchy = hierarchy.union_all(
        self.db.session.query(children).filter(children.parent_id == parent.c.id)
    )
    result = self.db.session.query(Note).select_entity_from(hierarchy).all()
With an existing table named "note"
id parent_id
----------- -----------
11 NULL
22 11
33 22
44 33
55 NULL
66 55
a bit of messing around in PostgreSQL showed that
WITH RECURSIVE parent (i, id, parent_id)
AS (
SELECT 0, id, parent_id FROM note WHERE id=44
UNION ALL
SELECT i + 1, n.id, n.parent_id
FROM note n INNER JOIN parent p ON p.parent_id = n.id
WHERE p.parent_id IS NOT NULL
)
SELECT * FROM parent ORDER BY i;
returned
i id parent_id
----------- ----------- -----------
0 44 33
1 33 22
2 22 11
3 11 NULL
and therefore we could get the top-level parent by changing the last line to
WITH RECURSIVE parent (i, id, parent_id)
AS (
SELECT 0, id, parent_id FROM note WHERE id=44
UNION ALL
SELECT i + 1, n.id, n.parent_id
FROM note n INNER JOIN parent p ON p.parent_id = n.id
WHERE p.parent_id IS NOT NULL
)
SELECT id FROM parent ORDER BY i DESC LIMIT 1 ;
returning
id
-----------
11
So to convert that into SQLAlchemy (1.4):
from sqlalchemy import (
    create_engine,
    Column,
    Integer,
    select,
    literal_column,
)
from sqlalchemy.orm import declarative_base

connection_uri = "postgresql://scott:tiger@192.168.0.199/test"
engine = create_engine(connection_uri, echo=False)

Base = declarative_base()

class Note(Base):
    __tablename__ = "note"
    id = Column(Integer, primary_key=True)
    parent_id = Column(Integer)

def get_top_level_note_id(start_id):
    note_tbl = Note.__table__
    parent_cte = (
        select(
            literal_column("0").label("i"), note_tbl.c.id, note_tbl.c.parent_id
        )
        .where(note_tbl.c.id == start_id)
        .cte(name="parent_cte", recursive=True)
    )
    parent_cte_alias = parent_cte.alias("parent_cte_alias")
    note_tbl_alias = note_tbl.alias()
    parent_cte = parent_cte.union_all(
        select(
            literal_column("parent_cte_alias.i + 1"),
            note_tbl_alias.c.id,
            note_tbl_alias.c.parent_id,
        )
        .where(note_tbl_alias.c.id == parent_cte_alias.c.parent_id)
        .where(parent_cte_alias.c.parent_id.is_not(None))
    )
    stmt = select(parent_cte.c.id).order_by(parent_cte.c.i.desc()).limit(1)
    with engine.begin() as conn:
        result = conn.execute(stmt).scalar()
    return result

if __name__ == "__main__":
    test_id = 44
    print(
        f"top level id for note {test_id} is {get_top_level_note_id(test_id)}"
    )
    # top level id for note 44 is 11
    test_id = 66
    print(
        f"top level id for note {test_id} is {get_top_level_note_id(test_id)}"
    )
    # top level id for note 66 is 55
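The same recursive CTE can also be verified quickly against an in-memory SQLite database with the standard-library sqlite3 module (same note table and ids as above):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE note (id INTEGER PRIMARY KEY, parent_id INTEGER);
INSERT INTO note (id, parent_id) VALUES
    (11, NULL), (22, 11), (33, 22), (44, 33), (55, NULL), (66, 55);
""")

def top_level_note_id(start_id):
    # Walk up the chain: i counts the steps, so the highest i is the root.
    return con.execute("""
        WITH RECURSIVE parent (i, id, parent_id) AS (
            SELECT 0, id, parent_id FROM note WHERE id = ?
            UNION ALL
            SELECT i + 1, n.id, n.parent_id
            FROM note n JOIN parent p ON p.parent_id = n.id
            WHERE p.parent_id IS NOT NULL
        )
        SELECT id FROM parent ORDER BY i DESC LIMIT 1
    """, (start_id,)).fetchone()[0]

print(top_level_note_id(44))  # 11
print(top_level_note_id(66))  # 55
```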

How to order nested SQL SELECT on SqlAlchemy

I have written an SQL query that gets the X entries before a given user id, ordered by registration_date descending, from a table that is itself ordered by registration date.
To be more concrete, let's say these are some entries in the ordered table:
id | Name | Email | registration_date
3939 | Barbara Hayes | barbara.hayes@example.com | 2019-09-15T23:39:26.910Z
689 | Noémie Harris | noemie.harris@example.com | 2019-09-14T21:39:15.641Z
2529 | Andrea Iglesias | andrea.iglesias@example.com | 2019-09-13T02:59:08.821Z
3890 | Villads Andersen | villads.andersen@example.com | 2019-09-12T06:29:48.708Z
3685 | Houssine Van Sabben | houssine.vansabben@example.com | 2019-09-12T02:27:08.396Z
I would like to get the users over id 3890. So the query should return
689 | Noémie Harris | noemie.harris@example.com | 2019-09-14T21:39:15.641Z
2529 | Andrea Iglesias | andrea.iglesias@example.com | 2019-09-13T02:59:08.821Z
The raw SQL that I wrote is this:
SELECT * FROM (
    SELECT id, name, email, registration_date FROM public.users
    WHERE users.registration_date > (SELECT registration_date FROM users WHERE id = 3890)
    ORDER BY registration_date
    LIMIT 2
) AS a
ORDER BY registration_date DESC
See this dbfiddle.
I tried to implement the SqlAlchemy code with no luck. I believe that I am making a mistake on the subquery. This is what i have done so far.
registration_date_min = db.query(User.registration_date) \
    .order_by(User.registration_date) \
    .filter(User.id == ending_before).first()

users_list = db.query(User) \
    .filter(User.registration_date > registration_date_min) \
    .order_by('registration_date').limit(limit).subquery('users_list')

return users_list.order_by(desc('registration_date'))
P.s the ending_before represents a user_id. Like 3890 in the example.
Any ideas on the SqlAlchemy part would be very helpful!
First of all, your registration_date_min query has already been executed; you have a row with one column there. Remove the first() call; it executes the SELECT and returns the first row.
As you are selecting by the primary key, there is only ever going to be a single row and you don't need to order it. Just use:
registration_date_min = db.query(User.registration_date).filter(
User.id == ending_before
)
That's now a query object and can be used directly in a comparison:
users_list = (
db.query(User)
.filter(User.registration_date > registration_date_min)
.order_by(User.registration_date)
.limit(limit)
)
You can then self-select with Query.from_self() from that query to apply the final ordering:
return users_list.from_self().order_by(User.registration_date.desc())
This produces the following SQL (on SQLite, other dialects can differ):
SELECT anon_1.users_id AS anon_1_users_id, anon_1.users_name AS anon_1_users_name, anon_1.users_email AS anon_1_users_email, anon_1.users_registration_date AS anon_1_users_registration_date
FROM (SELECT users.id AS users_id, users.name AS users_name, users.email AS users_email, users.registration_date AS users_registration_date
FROM users
WHERE users.registration_date > (SELECT users.registration_date AS users_registration_date
FROM users
WHERE users.id = ?) ORDER BY users.registration_date
LIMIT ? OFFSET ?) AS anon_1 ORDER BY anon_1.users_registration_date DESC
If I use the following model with __repr__:
class User(db.Model):
    __tablename__ = "users"
    id = db.Column(db.Integer, primary_key=True)
    name = db.Column(db.String)
    email = db.Column(db.String)
    registration_date = db.Column(db.DateTime)

    def __repr__(self):
        return f"<User({self.id}, {self.name!r}, {self.email!r}, {self.registration_date!r})>"
and print the query result instances I get:
<User(689, 'Noémie Harris', 'noemie.harris@example.com', datetime.datetime(2019, 9, 14, 21, 39, 15, 641000))>
<User(2529, 'Andrea Iglesias', 'andrea.iglesias@example.com', datetime.datetime(2019, 9, 13, 2, 59, 8, 821000))>
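For completeness, the nested SELECT itself can be tested with the standard-library sqlite3 module; registration dates are stored here as ISO-8601 text, which compares correctly as strings (an assumption of this sketch):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT, registration_date TEXT);
INSERT INTO users (id, name, email, registration_date) VALUES
    (3939, 'Barbara Hayes', 'barbara.hayes@example.com', '2019-09-15T23:39:26.910Z'),
    (689,  'Noémie Harris', 'noemie.harris@example.com', '2019-09-14T21:39:15.641Z'),
    (2529, 'Andrea Iglesias', 'andrea.iglesias@example.com', '2019-09-13T02:59:08.821Z'),
    (3890, 'Villads Andersen', 'villads.andersen@example.com', '2019-09-12T06:29:48.708Z'),
    (3685, 'Houssine Van Sabben', 'houssine.vansabben@example.com', '2019-09-12T02:27:08.396Z');
""")

# Inner query: the `limit` users registered right after user 3890, ascending;
# outer query: re-sort that page descending for presentation.
rows = con.execute("""
SELECT id, name FROM (
    SELECT id, name, registration_date FROM users
    WHERE registration_date > (SELECT registration_date FROM users WHERE id = ?)
    ORDER BY registration_date
    LIMIT 2
) AS a
ORDER BY registration_date DESC
""", (3890,)).fetchall()
print(rows)  # [(689, 'Noémie Harris'), (2529, 'Andrea Iglesias')]
```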

How to avoid adding duplicates in a many-to-many relationship table in SQLAlchemy - python?

I am dealing with a many-to-many relationship in SQLAlchemy. My question is how to avoid adding duplicate pairs of values to the many-to-many association table.
To make things clearer, I will use the example from the official SQLAlchemy documentation.
Base = declarative_base()

Parents2children = Table('parents2children', Base.metadata,
    Column('parents_id', Integer, ForeignKey('parents.id')),
    Column('children_id', Integer, ForeignKey('children.id'))
)

class Parent(Base):
    __tablename__ = 'parents'
    id = Column(Integer, primary_key=True)
    parent_name = Column(String(45))
    child_rel = relationship("Child", secondary=Parents2children, backref="parents_backref")

    def __init__(self, parent_name=""):
        self.parent_name = parent_name

    def __repr__(self):
        return "<parents(id:'%i', parent_name:'%s')>" % (self.id, self.parent_name)

class Child(Base):
    __tablename__ = 'children'
    id = Column(Integer, primary_key=True)
    child_name = Column(String(45))

    def __init__(self, child_name=""):
        self.child_name = child_name

    def __repr__(self):
        return "<experiments(id:'%i', child_name:'%s')>" % (self.id, self.child_name)

###########################################

def setUp():
    global Session
    engine = create_engine('mysql://root:root@localhost/db_name?charset=utf8', pool_recycle=3600, echo=False)
    Session = sessionmaker(bind=engine)

def add_data():
    session = Session()
    name_father1 = Parent(parent_name="Richard")
    name_mother1 = Parent(parent_name="Kate")
    name_daughter1 = Child(child_name="Helen")
    name_son1 = Child(child_name="John")
    session.add(name_father1)
    session.add(name_mother1)
    name_father1.child_rel.append(name_son1)
    name_daughter1.parents_backref.append(name_father1)
    name_son1.parents_backref.append(name_father1)
    session.commit()
    session.close()

setUp()
add_data()
With this code, the data inserted in the tables is the following:
Parents table:
+----+-------------+
| id | parent_name |
+----+-------------+
| 1 | Richard |
| 2 | Kate |
+----+-------------+
Children table:
+----+------------+
| id | child_name |
+----+------------+
| 1 | Helen |
| 2 | John |
+----+------------+
Parents2children table
+------------+-------------+
| parents_id | children_id |
+------------+-------------+
| 1 | 1 |
| 1 | 2 |
| 1 | 1 |
+------------+-------------+
As you can see, there's a duplicate in the last table... how could I prevent SQLAlchemy from adding these duplicates?
I've tried to put relationship("Child", secondary=..., collection_class=set) but this error is displayed:
AttributeError: 'InstrumentedSet' object has no attribute 'append'
Add a PrimaryKeyConstraint (or a UniqueConstraint) to your relationship table:
Parents2children = Table('parents2children', Base.metadata,
    Column('parents_id', Integer, ForeignKey('parents.id')),
    Column('children_id', Integer, ForeignKey('children.id')),
    PrimaryKeyConstraint('parents_id', 'children_id'),
)
and your code will raise an error when you try to commit the relationship added from both sides. Doing this is highly recommended.
To avoid even raising an error, just check first:
if name_father1 not in name_son1.parents_backref:
    name_son1.parents_backref.append(name_father1)
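A small sqlite3 sketch shows the effect of the composite primary key at the database level: the duplicate pair is rejected outright (the in-memory SQLite database is an assumption of this sketch; MySQL raises the equivalent duplicate-key error):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE parents (id INTEGER PRIMARY KEY, parent_name TEXT);
CREATE TABLE children (id INTEGER PRIMARY KEY, child_name TEXT);
CREATE TABLE parents2children (
    parents_id INTEGER REFERENCES parents(id),
    children_id INTEGER REFERENCES children(id),
    PRIMARY KEY (parents_id, children_id)
);
INSERT INTO parents2children VALUES (1, 1), (1, 2);
""")

try:
    con.execute("INSERT INTO parents2children VALUES (1, 1)")  # duplicate pair
    outcome = "inserted"
except sqlite3.IntegrityError:
    outcome = "rejected"
print(outcome)  # rejected
```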
