How can I construct a count aggregation over a join with SqlAlchemy?


I have a table of users, a table of groups that those users may belong to, and a join table between users and groups.
This is represented in SQLAlchemy as follows:
class User(Base):
    __tablename__ = 'user'
    user_id = Column(Integer, primary_key=True)
    name = Column(String(250), nullable=False)
    email = Column(String(250), nullable=False)
    groups = relationship('Group', secondary='user_group_pair')

class Group(Base):
    __tablename__ = 'group'
    group_id = Column(Integer, primary_key=True)
    name = Column(String(250), nullable=False)
    date_created = Column(String(250), nullable=False)
    members = relationship('User', secondary='user_group_pair')

class User_Group_Pair(Base):
    __tablename__ = 'user_group_pair'
    user_group_pair_id = Column(Integer, primary_key=True)
    user_id = Column(Integer, ForeignKey('user.user_id'))
    group_id = Column(Integer, ForeignKey('group.group_id'))
    user = relationship(User, backref=backref("group_assoc"))
    group = relationship(Group, backref=backref("user_assoc"))
I'm trying to solve the following simple problem:
I want to write a query that will return a list of users along with the number of groups that each of them belongs to.
This requires data from both User and User_Group_Pair (hence the join in my question's title), and a count aggregation grouped by user_id.
I'm not sure why this won't work:
subq = session.query(User_Group_Pair.user_id.label('user_id'), func.count(User_Group_Pair.user_group_pair_id).label('count')).\
    group_by(User_Group_Pair.user_id).order_by('count ASC').subquery()
result = session.query(User).join(subq, User.user_id == subq.user_id).all()
I get this error:
'Alias' object has no attribute 'user_id'
However, note that I have labelled User_Group_Pair.user_id with the label 'user_id'... Any thoughts?
Thank you

Just change subq.user_id to subq.c.user_id (c stands for columns) to make it work:
result = session.query(User).join(subq, User.user_id == subq.c.user_id).all()
However, you will still get only the users who belong to at least one group, and the number of groups is not actually returned in the query result. The query below is one approach to solving this:
q = (session.query(User, func.count(Group.group_id).label("num_groups"))
     .outerjoin(Group, User.groups)
     .group_by(User.user_id)
     )
for user, num_groups in q:
    print(user, num_groups)
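To see the difference the outer join makes, here is a minimal runnable sketch, assuming SQLAlchemy 1.4 or later and an in-memory SQLite database. The models are trimmed versions of the ones in the question (only User.groups is kept as a relationship), it selects User.name instead of the whole User for readable output, and the users and groups are made-up sample data. COUNT over the outer-joined group_id counts only non-NULL values, so a user with no groups gets 0 instead of disappearing.

```python
# Sketch assuming SQLAlchemy 1.4+; trimmed models and made-up sample data.
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine, func
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()


class User(Base):
    __tablename__ = "user"
    user_id = Column(Integer, primary_key=True)
    name = Column(String(250), nullable=False)
    groups = relationship("Group", secondary="user_group_pair")


class Group(Base):
    __tablename__ = "group"
    group_id = Column(Integer, primary_key=True)
    name = Column(String(250), nullable=False)


class UserGroupPair(Base):
    __tablename__ = "user_group_pair"
    user_group_pair_id = Column(Integer, primary_key=True)
    user_id = Column(Integer, ForeignKey("user.user_id"))
    group_id = Column(Integer, ForeignKey("group.group_id"))


def groups_per_user(session):
    # LEFT OUTER JOIN user -> user_group_pair -> group, then count non-NULL group ids
    q = (
        session.query(User.name, func.count(Group.group_id).label("num_groups"))
        .outerjoin(User.groups)
        .group_by(User.user_id, User.name)
        .order_by(User.name)
    )
    return [(name, num_groups) for name, num_groups in q]


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    admins, devs = Group(name="admins"), Group(name="devs")
    session.add_all([
        User(name="alice", groups=[admins, devs]),
        User(name="bob", groups=[devs]),
        User(name="carol"),  # belongs to no group
    ])
    session.commit()
    result = groups_per_user(session)

print(result)  # [('alice', 2), ('bob', 1), ('carol', 0)]
```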

http://docs.sqlalchemy.org/en/rel_1_0/orm/tutorial.html#using-subqueries
The subquery() method on Query produces a SQL expression construct representing a SELECT statement embedded within an alias. The columns of that statement are accessible through an attribute called c.
So you can refer to the columns with .c.column_name in your query:
result = session.query(User).join(subq, User.user_id == subq.c.user_id).all()
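If you prefer to keep the subquery, a sketch along these lines also returns the count itself: select the labelled subquery column next to the user, outer-join the subquery so users without pairs are kept, and coalesce their NULL count to 0. It assumes SQLAlchemy 1.4 or later, an in-memory SQLite database and hypothetical sample data; the count label is renamed to group_count, an arbitrary name chosen for the example.

```python
# Sketch assuming SQLAlchemy 1.4+; trimmed models and made-up sample data.
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine, func
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class User(Base):
    __tablename__ = "user"
    user_id = Column(Integer, primary_key=True)
    name = Column(String(250), nullable=False)


class UserGroupPair(Base):
    __tablename__ = "user_group_pair"
    user_group_pair_id = Column(Integer, primary_key=True)
    user_id = Column(Integer, ForeignKey("user.user_id"))
    group_id = Column(Integer)


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add_all([User(user_id=1, name="alice"), User(user_id=2, name="bob")])
    session.add_all([
        UserGroupPair(user_id=1, group_id=10),
        UserGroupPair(user_id=1, group_id=20),
    ])
    session.commit()

    # One row per user_id; each label becomes an entry in subq.c
    subq = (
        session.query(
            UserGroupPair.user_id.label("user_id"),
            func.count(UserGroupPair.user_group_pair_id).label("group_count"),
        )
        .group_by(UserGroupPair.user_id)
        .subquery()
    )

    # Outer join keeps users absent from subq; coalesce turns their NULL count into 0
    rows = (
        session.query(User.name, func.coalesce(subq.c.group_count, 0))
        .outerjoin(subq, User.user_id == subq.c.user_id)
        .order_by(User.name)
        .all()
    )
    result = [tuple(row) for row in rows]

print(result)  # [('alice', 2), ('bob', 0)]
```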

Related

How to access column values in SQLAlchemy result list after a join a query

I need to access the columns of a query result. I have these models:
class Order(Base):
    __tablename__ = "orders"
    internal_id = Column(Integer, primary_key=True)
    total_cost = Column(Float, nullable=False)
    created_at = Column(TIMESTAMP(timezone=True), nullable=False, server_default=text("now()"))
    customer_id = Column(Integer, ForeignKey("customers.id", ondelete="CASCADE"), nullable=False)
    customer = relationship("Customer")

class Item(Base):
    __tablename__ = "items"
    id = Column(Integer, primary_key=True, nullable=False)
    internal_id = Column(Integer, nullable=False)
    price = Column(Float, nullable=False)
    description = Column(String, nullable=False)
    order_id = Column(Integer, ForeignKey("orders.internal_id", ondelete="CASCADE"), nullable=False)
    order = relationship("Order")
Now I run this left join query, which gives me all the columns from both tables:
result = db.query(Order, Item).join(Item, Item.order_id == Order.internal_id, isouter=True).filter(Item.order_id == order_id).all()
I get back a list of tuples. How do I access a particular column of the result list? I'm doing something like this:
for i in result:
    print(i.???)  # NOW WHAT?
I get AttributeError: Could not locate column in row for column any time I try to fetch a column by the name I declared.
This is the full function where I need to use it:
@router.get("/{order_id}")
def get_orders(order_id: int, db: Session = Depends(get_db)):
    """ Get one order by id. """
    # select * from orders left join items on orders.internal_id = items.order_id where orders.internal_id = {order_id};
    result = db.query(Order, Item).join(Item, Item.order_id == Order.internal_id, isouter=True).filter(Item.order_id == order_id).all()
    for i in result:
        print(i.description)  # whatever value i put here it errors out
This is the traceback:
...
print(i.description) # whatever value i put here it errors out
AttributeError: Could not locate column in row for column 'description'
If I could at least get the column names... but I just can't get them. I've tried keys(), _metadata.keys, etc. Nothing works so far.

If additional implicit queries are not an issue for you, you can do something like this:
class Order(Base):
    __tablename__ = "orders"
    internal_id = Column(Integer, primary_key=True)
    total_cost = Column(Float, nullable=False)
    created_at = Column(TIMESTAMP(timezone=True), nullable=False, server_default=text("now()"))
    customer_id = Column(Integer, ForeignKey("customers.id", ondelete="CASCADE"), nullable=False)
    customer = relationship("Customer")
    items = relationship("Item", lazy="dynamic")

order = session.query(Order).join(Item, Order.internal_id == Item.order_id, isouter=True).filter(Order.internal_id == order_id).first()
if order:
    for i in order.items:
        print(i.description)
    print(order.total_cost)
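However, to avoid the additional query when accessing items, you can use the contains_eager option. Eager loading cannot be applied to a lazy="dynamic" relationship, so for this variant items must be a regular relationship (drop lazy="dynamic"):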
from sqlalchemy.orm import contains_eager
orders = session.query(Order).join(Item, Order.internal_id == Item.order_id, isouter=True).options(contains_eager(Order.items)).filter(Order.internal_id == order_id).all()
Here you have some examples: https://jorzel.hashnode.dev/an-orm-can-bite-you
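A minimal sketch of the contains_eager variant, assuming SQLAlchemy 1.4 or later, an in-memory SQLite database, and trimmed-down models in which Order.items is a regular relationship. Before touching the collection it checks, via inspect(...).unloaded, that the joined SELECT has already populated it. The order and item values are made up.

```python
# Sketch assuming SQLAlchemy 1.4+; trimmed models and made-up sample data.
from sqlalchemy import Column, Float, ForeignKey, Integer, String, create_engine, inspect
from sqlalchemy.orm import Session, contains_eager, declarative_base, relationship

Base = declarative_base()


class Order(Base):
    __tablename__ = "orders"
    internal_id = Column(Integer, primary_key=True)
    total_cost = Column(Float, nullable=False)
    # Regular (not lazy="dynamic") collection, so contains_eager() can fill it
    items = relationship("Item", back_populates="order")


class Item(Base):
    __tablename__ = "items"
    id = Column(Integer, primary_key=True)
    description = Column(String, nullable=False)
    order_id = Column(Integer, ForeignKey("orders.internal_id"), nullable=False)
    order = relationship("Order", back_populates="items")


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add(Order(internal_id=1, total_cost=9.5,
                      items=[Item(description="pen"), Item(description="ink")]))
    session.commit()

# A fresh session, so nothing is cached from the inserts above
with Session(engine) as session:
    orders = (
        session.query(Order)
        .outerjoin(Order.items)
        .options(contains_eager(Order.items))  # fill Order.items from the joined rows
        .filter(Order.internal_id == 1)
        .all()
    )
    order = orders[0]
    # True if the collection was loaded by the SELECT above, not lazily
    items_preloaded = "items" not in inspect(order).unloaded
    descriptions = sorted(item.description for item in order.items)

print(items_preloaded, descriptions)  # True ['ink', 'pen']
```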

OK, so the answer is actually quite simple. Each row holds the queried entities, so one simply needs dot notation through the entity name, like i.Order.total_cost or any other field of the Order model (and i.Item.description for the Item model):
result = db.query(Order, Item).join(Item, Item.order_id == Order.internal_id, isouter=True).filter(Item.order_id == order_id).all()
for i in result:
    print(i.Order.total_cost)
    print(i.Item.description)
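A self-contained sketch of this approach, assuming SQLAlchemy 1.4 or later and in-memory SQLite with made-up data: each result row exposes the queried entities under their class names, Row._fields lists those names, and a row can also be unpacked positionally.

```python
# Sketch assuming SQLAlchemy 1.4+; trimmed models and made-up sample data.
from sqlalchemy import Column, Float, ForeignKey, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class Order(Base):
    __tablename__ = "orders"
    internal_id = Column(Integer, primary_key=True)
    total_cost = Column(Float, nullable=False)


class Item(Base):
    __tablename__ = "items"
    id = Column(Integer, primary_key=True)
    description = Column(String, nullable=False)
    order_id = Column(Integer, ForeignKey("orders.internal_id"), nullable=False)


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add(Order(internal_id=1, total_cost=9.5))
    session.add_all([Item(description="pen", order_id=1), Item(description="ink", order_id=1)])
    session.commit()

    rows = (
        session.query(Order, Item)
        .outerjoin(Item, Item.order_id == Order.internal_id)
        .filter(Order.internal_id == 1)
        .order_by(Item.id)
        .all()
    )
    names = list(rows[0]._fields)  # the keys of each row are the entity names
    pairs = [(row.Order.total_cost, row.Item.description) for row in rows]
    first_order, first_item = rows[0]  # rows also unpack like tuples

print(names, pairs)  # ['Order', 'Item'] [(9.5, 'pen'), (9.5, 'ink')]
```

Because the row's attributes are the Order and Item objects rather than individual columns, i.description fails while i.Item.description works.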

sqlalchemy - Limit for joined table as if they were not joined

I am using SQLAlchemy, and I want to take the following inputs from the user and apply the corresponding operations to a table, in the order given:
a keyword to filter the data with, a column to order by, a limit, and a page number.
I have many tables. For most of them, the "child" tables (tables that have no children of their own), this works. However, I have one table with many relationships of all kinds: one-to-many on both sides, one-to-one, and many-to-many.
To achieve the operations above, I joined all the tables beforehand. Filtering and ordering work fine, but limit does not give me the result I want.
Join statement:
records = m.Activity.query.join(m.Event, m.Activity.events) \
    .join(m.DateLocation, m.Activity.date_locations) \
    .join(m.Goal, m.Activity.goals) \
    .join(m.Type, m.Activity.type)
The filtering and ordering code contains a lot of unnecessary detail; basically it is something like this:
# filtering if column == event
records = records.filter(m.Event.name == keyword)
# ordering if column == type and desc was chosen
records = records.order_by(m.Type.name.desc())
And finally, limit and pagination:
records = records.limit(limit)
records = records.offset((page - 1) * limit)
Let me explain limit behavior vs what I want:
limit in this code works fine: since I joined all the tables, it returns the number of joined rows I asked for. For example, if the join produced 5 extra rows and I asked for a limit of 5, it returns the first 5 rows regardless of the original table's id.
What I want is the limit behavior from before the join. I only joined the tables to filter or order by them; after that, when I say limit(5), I want the first 5 results with distinct ids.
I tried the following (one at a time), but none of them worked:
records = records.distinct(m.Activity.id).limit(limit)
records = records.group_by(m.Activity.id).limit(limit)
records = records.from_self().limit(limit)
I tried the solution presented here. It does work; however, it limits the data set BEFORE joining, which doesn't work in my case, since I need to limit the filtered data.
EDIT: The models:
EventsInActivities = db.Table(
    'events_in_activities',
    db.Column('activity_id', db.String, db.ForeignKey('activity.id')),
    db.Column('event_id', db.Integer(), db.ForeignKey('event.id'))
)

class Event(db.Model, BaseMixin):
    id = db.Column(db.Integer, primary_key=True, autoincrement=True)
    name = db.Column(db.String)

class Type(db.Model, BaseMixin):
    id = db.Column(db.Integer, primary_key=True, autoincrement=True)
    name = db.Column(db.String, unique=True)
    activities = db.relationship("Activity", backref="type", lazy='dynamic')

class Goal(db.Model, BaseMixin):
    id = db.Column(db.Integer, primary_key=True, autoincrement=True)
    activity_id = db.Column(db.String, db.ForeignKey('activity.id'), primary_key=True)
    name = db.Column(db.String())

class DateLocation(db.Model, BaseMixin):
    id = db.Column(db.Integer, primary_key=True, autoincrement=True)
    activity_id = db.Column(db.String, db.ForeignKey('activity.id'), primary_key=True)
    start_date = db.Column(db.DateTime)
    end_date = db.Column(db.DateTime)
    location = db.Column(db.String())

class Activity(db.Model, BaseMixin):
    id = db.Column(db.String, primary_key=True)
    name = db.Column(db.String())
    type_id = db.Column(db.Integer, db.ForeignKey('type.id'))
    date_locations = db.relationship("DateLocation", order_by='DateLocation.start_date', cascade="all, delete", backref="activity", lazy='dynamic')
    goals = db.relationship("Goal", cascade="all, delete", backref="activity", lazy='dynamic')
    events = db.relationship('Event', secondary=EventsInActivities, backref=db.backref('activities', lazy='dynamic'))

You could replace at least some of the joins used for filtering with EXISTS subquery expressions (semijoins, in a way). This way your query avoids producing multiple rows for a single activity. It is OK to still join against Type, since it's a many-to-one relationship:
records = m.Activity.query.\
    join(m.Activity.type).\
    filter(m.Activity.events.any(name=keyword)).\
    filter(m.Activity.goals.any(name=...)).\
    filter(...).\
    order_by(m.Type.name.desc()).\
    limit(limit).\
    offset((page - 1) * limit)
Passing keyword arguments to any() is a shorthand similar to filter_by(). It also accepts an arbitrary criterion expression as a positional argument.
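A runnable sketch of the EXISTS approach, assuming SQLAlchemy 1.4 or later: plain declarative models and an in-memory SQLite database stand in for the Flask-SQLAlchemy setup, only the Event and Type relationships are kept, and the activities are hypothetical. It contrasts the number of rows a plain join produces with the page returned when the many-to-many filter uses any(); page_of_activities is a name made up for this example.

```python
# Sketch assuming SQLAlchemy 1.4+; plain declarative models stand in for the
# Flask-SQLAlchemy ones, and the sample data below is hypothetical.
from sqlalchemy import Column, ForeignKey, Integer, String, Table, create_engine
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()

events_in_activities = Table(
    "events_in_activities",
    Base.metadata,
    Column("activity_id", String, ForeignKey("activity.id")),
    Column("event_id", Integer, ForeignKey("event.id")),
)


class Event(Base):
    __tablename__ = "event"
    id = Column(Integer, primary_key=True)
    name = Column(String)


class Type(Base):
    __tablename__ = "type"
    id = Column(Integer, primary_key=True)
    name = Column(String, unique=True)


class Activity(Base):
    __tablename__ = "activity"
    id = Column(String, primary_key=True)
    type_id = Column(Integer, ForeignKey("type.id"))
    type = relationship(Type)
    events = relationship(Event, secondary=events_in_activities)


def page_of_activities(session, keyword, limit, page):
    return (
        session.query(Activity)
        .join(Activity.type)  # many-to-one: still one row per activity
        .filter(Activity.events.any(name=keyword))  # EXISTS subquery, no extra rows
        .order_by(Type.name.desc(), Activity.id)
        .limit(limit)
        .offset((page - 1) * limit)
        .all()
    )


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    workshop, lecture = Type(name="workshop"), Type(name="lecture")
    session.add_all([
        # a1 is linked to two events named "talk", so a JOIN repeats it
        Activity(id="a1", type=workshop, events=[Event(name="talk"), Event(name="talk")]),
        Activity(id="a2", type=lecture, events=[Event(name="talk")]),
        Activity(id="a3", type=lecture, events=[Event(name="demo")]),
    ])
    session.commit()

    # Joining the many-to-many collection yields one row per matching event
    joined_rows = (
        session.query(Activity.id)
        .join(Activity.events)
        .filter(Event.name == "talk")
        .count()
    )
    first_page = [a.id for a in page_of_activities(session, "talk", limit=2, page=1)]

print(joined_rows, first_page)  # 3 ['a1', 'a2']
```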
The distinct(m.Activity.id), i.e. DISTINCT ON (PostgreSQL only), should have worked as well, as long as you then use the results as a subquery, to which you then apply the ordering and limit:
records = m.Activity.query.\
    join(m.Activity.events).\
    join(m.Activity.date_locations).\
    join(m.Activity.goals).\
    filter(m.Event.name == keyword).\
    filter(...).\
    distinct(m.Activity.id).\
    from_self().\
    join(m.Activity.type).\
    order_by(m.Type.name.desc()).\
    limit(limit).\
    offset((page - 1) * limit)

Using sqlalchemy to define relationships in MySQL

I am in the process of working with sqlalchemy and MySQL to build a database. I am currently having trouble defining a specific relationship between two tables.
class Experiment_Type(Base):
    __tablename__='experiment_types'
    id = Column(Integer, primary_key=True)
    type = Column(String(100))

class Experiments(Base):
    __tablename__ = 'experiments'
    id = Column(Integer, primary_key=True)
    type = Column(String(100))
    sub_id = Column(Integer, ForeignKey('experiment_types.id'))
    experiments = relationship('Experiments',
                               primaryjoin="and_(Experiment_Type.id == Experiments.sub_id, "
                                           "'Experiments.type' == 'Experiment_Type.type')",
                               backref=backref('link'))
What I want to do is have the values of sub_id in experiments match the id in experiment_types based on type (if an entry in experiment_types with type = 'type1' has id = 1, then an entry in experiments with type = 'type1' should have sub_id = 1). I am not even sure this is the best way to define the relationship in this situation, so any advice is welcome.
The current error message is this:
sqlalchemy.exc.ArgumentError: Could not locate any relevant foreign key columns for primary join condition '0 = 1' on relationship Experiments.experiments. Ensure that referencing columns are associated with a ForeignKey or ForeignKeyConstraint, or are annotated in the join condition with the foreign() annotation.

The whole point of setting up relationships in relational dbs is to not have to duplicate data across tables. Just do something like this:
class ExperimentType(Base):
    __tablename__='experiment_types'
    id = Column(Integer, primary_key=True)
    name = Column(String(100))

class Experiments(Base):
    __tablename__ = 'experiments'
    id = Column(Integer, primary_key=True)
    description = Column(String(100))
    type_id = Column(Integer, ForeignKey('experiment_types.id'))
    type = relationship("ExperimentType")
Then, if you do need to display the experiment type later, you can access it with something like:
exp = session.query(Experiments).first()
print(exp.type.name)
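A minimal runnable version of this answer, assuming SQLAlchemy 1.4 or later and using in-memory SQLite instead of MySQL for brevity; the sample experiments are made up. Both experiments reference one ExperimentType row through type_id, so the type name is stored only once.

```python
# Sketch assuming SQLAlchemy 1.4+; SQLite instead of MySQL, made-up sample data.
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()


class ExperimentType(Base):
    __tablename__ = "experiment_types"
    id = Column(Integer, primary_key=True)
    name = Column(String(100))


class Experiments(Base):
    __tablename__ = "experiments"
    id = Column(Integer, primary_key=True)
    description = Column(String(100))
    type_id = Column(Integer, ForeignKey("experiment_types.id"))
    type = relationship("ExperimentType")  # many-to-one via type_id


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    type1 = ExperimentType(name="type1")
    session.add_all([
        Experiments(description="run A", type=type1),
        Experiments(description="run B", type=type1),
    ])
    session.commit()

    exp = session.query(Experiments).order_by(Experiments.id).first()
    type_name = exp.type.name
    # Both experiments reference the same experiment_types row
    type_ids = {e.type_id for e in session.query(Experiments)}

print(type_name, type_ids)  # type1 {1}
```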

Long chained exists query with multiple one-to-many mappings in the chain

Edit: The following seems to be the right way:
session.query(User).join("userGadgets", "gadget", "components","gadgetComponentMetals")
Original:
I have the following tables configured:
class User(Base):
    __tablename__ = "user"
    id = Column(Integer, primary_key=True)
    name = Column(String)

class Gadget(Base):
    __tablename__ = "gadget"
    id = Column(Integer, primary_key=True)
    brand = Column(String)

class UserGadget(Base):
    __tablename__ = "user_gadget"
    user_id = Column(Integer, ForeignKey('user.id'), primary_key=True)
    gadget_id = Column(Integer, ForeignKey('gadget.id'), primary_key=True)
    user = relationship("User", backref=backref('userGadgets', order_by=user_id))
    gadget = relationship("Gadget", backref=backref('userGadgets', order_by=gadget_id))

class GadgetComponent(Base):
    __tablename__ = "gadget_component"
    id = Column(String, primary_key=True)
    gadget_id = Column(Integer, ForeignKey('gadget.id'))
    component_maker = Column(String)
    host = relationship("Gadget", backref=backref('components', order_by=id))

class ComponentUsingMetal(Base):
    __tablename__ = "component_metal"
    id = Column(Integer, primary_key=True)
    # ForeignKey takes the table name, and the type matches GadgetComponent.id
    component_id = Column(String, ForeignKey('gadget_component.id'))
    metal = Column(String)
    component = relationship("GadgetComponent", backref=backref('gadgetComponentMetals', order_by=id))
I want to find the names of all users who own gadgets having at least one component that contains some kind of metal. The SQL query for this would be something along the lines of:
SELECT distinct u.name FROM user u
join user_gadget ug on (u.id = ug.user_id)
join gadget_component gc on (ug.gadget_id = gc.gadget_id)
join component_metal cm on (gc.id = cm.component_id)
order by u.name
I have tried various versions along the lines of: session.query(User).filter(User.userGadgets.any(UserGadget.gadget.components.any(GadgetComponent.gadgetComponentMetals.exists())))
I get the below error:
AttributeError: Neither 'InstrumentedAttribute' object nor 'Comparator' object associated with UserGadget.gadget has an attribute 'gadgetComponents'
Any ideas on what I am doing wrong or is there a better way to do this kind of query in SQLAlchemy?

The join() is the better way to go here, since any() is going to produce lots of expensive nested subqueries. But the mistake you made with any() is using syntax like UserGadget.gadget.components. SQLAlchemy doesn't continue the namespace of attributes in a series like that: there is no UserGadget.gadget.components; there is just UserGadget.gadget and Gadget.components, separately. Just as SQL won't let you say "SELECT * from user_gadget.gadget_id.gadget.component_id", SQLAlchemy needs you to tell it how you want to join together the multiple tables you're querying from. With any() here it would be something like any(and_(UserGadget.gadget_id == GadgetComponent.gadget_id)), but using JOIN is better in any case.
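A sketch of both forms with the question's relationship names, assuming SQLAlchemy 1.4 or later and in-memory SQLite; the models are trimmed (GadgetComponent.id is an Integer here), and the users and gadgets are hypothetical. The join version walks one relationship at a time, each starting from the class that owns it. The EXISTS version is one way to write what the question attempted: has() for the many-to-one UserGadget.gadget and any() for the collections, rather than the and_() criterion mentioned above.

```python
# Sketch assuming SQLAlchemy 1.4+; trimmed models and made-up sample data.
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine
from sqlalchemy.orm import Session, configure_mappers, declarative_base, relationship

Base = declarative_base()


class User(Base):
    __tablename__ = "user"
    id = Column(Integer, primary_key=True)
    name = Column(String)


class Gadget(Base):
    __tablename__ = "gadget"
    id = Column(Integer, primary_key=True)
    brand = Column(String)


class UserGadget(Base):
    __tablename__ = "user_gadget"
    user_id = Column(Integer, ForeignKey("user.id"), primary_key=True)
    gadget_id = Column(Integer, ForeignKey("gadget.id"), primary_key=True)
    user = relationship("User", backref="userGadgets")
    gadget = relationship("Gadget", backref="userGadgets")


class GadgetComponent(Base):
    __tablename__ = "gadget_component"
    id = Column(Integer, primary_key=True)
    gadget_id = Column(Integer, ForeignKey("gadget.id"))
    host = relationship("Gadget", backref="components")


class ComponentUsingMetal(Base):
    __tablename__ = "component_metal"
    id = Column(Integer, primary_key=True)
    component_id = Column(Integer, ForeignKey("gadget_component.id"))
    metal = Column(String)
    component = relationship("GadgetComponent", backref="gadgetComponentMetals")


configure_mappers()  # create the backref attributes (User.userGadgets, ...) now


def owners_via_join(session):
    # One join per relationship hop
    q = (
        session.query(User.name)
        .join(User.userGadgets)
        .join(UserGadget.gadget)
        .join(Gadget.components)
        .join(GadgetComponent.gadgetComponentMetals)
        .distinct()
        .order_by(User.name)
    )
    return [name for (name,) in q]


def owners_via_exists(session):
    # has() for the many-to-one hop, any() for the one-to-many hops
    q = (
        session.query(User.name)
        .filter(User.userGadgets.any(UserGadget.gadget.has(
            Gadget.components.any(GadgetComponent.gadgetComponentMetals.any())
        )))
        .order_by(User.name)
    )
    return [name for (name,) in q]


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    steel = ComponentUsingMetal(metal="steel")
    phone = Gadget(brand="phone", components=[GadgetComponent(gadgetComponentMetals=[steel])])
    radio = Gadget(brand="radio", components=[GadgetComponent()])  # no metal
    alice, bob = User(name="alice"), User(name="bob")
    session.add_all([
        UserGadget(user=alice, gadget=phone),
        UserGadget(user=alice, gadget=radio),
        UserGadget(user=bob, gadget=radio),
    ])
    session.commit()
    via_join = owners_via_join(session)
    via_exists = owners_via_exists(session)

print(via_join, via_exists)  # ['alice'] ['alice']
```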

sqlalchemy foreign keys / query joins

Hi, I'm having some trouble with a foreign key in SQLAlchemy: my primary key ID is not auto-incrementing.
I'm using Python 2.7, Pyramid 1.3 and SQLAlchemy 0.7.
Here are my models:
class Page(Base):
    __tablename__ = 'page'
    id = Column(Integer, ForeignKey('mapper.object_id'), autoincrement=True, primary_key=True)
    title = Column(String(30), unique=True)
    title_slug = Column(String(75), unique=True)
    text = Column(Text)
    date_added = Column(DateTime)

class User(Base):
    __tablename__ = 'user'
    id = Column(Integer, primary_key=True)
    name = Column(String(100), unique=True)
    email = Column(String(100), unique=True)
    password = Column(String(100))

class Group(Base):
    __tablename__ = 'groups'
    id = Column(Integer, primary_key=True)
    name = Column(String(100), unique=True)

class Member(Base):
    __tablename__ = 'members'
    user_id = Column(Integer, ForeignKey('user.id'), primary_key=True)
    group_id = Column(Integer, ForeignKey('groups.id'), primary_key=True)

class Resource(Base):
    __tablename__ = 'resource'
    id = Column(Integer, primary_key=True)
    tablename = Column(Text)
    action = Column(Text)

class Mapper(Base):
    __tablename__ = 'mapper'
    resource_id = Column(Integer, ForeignKey('resource.id'), primary_key=True)
    group_id = Column(Integer, ForeignKey('groups.id'), primary_key=True)
    object_id = Column(Integer, primary_key=True)
And here is my raw SQL query, followed by how I've written it with SQLAlchemy's ORM:
'''
SELECT g.name, r.action
FROM groups AS g
INNER JOIN mapper AS m
ON m.group_id = g.id
INNER JOIN resource AS r
ON m.resource_id = r.id
INNER JOIN page AS p
ON p.id = m.object_id
WHERE p.id = ? AND
r.tablename = ?;
'''
obj = Page
query = DBSession().query(Group.name, Resource.action)\
    .join(Mapper)\
    .join(obj)\
    .join(Resource)\
    .filter(obj.id == obj_id, Resource.tablename == obj.__tablename__).all()
The raw SQL query works fine without any relation between Page and Mapper, but SQLAlchemy's ORM seems to require a ForeignKey link to be able to join them. So I decided to put the ForeignKey on Page.id, since Mapper.object_id will link to several different tables.
This makes the ORM query with the joins work as expected, but adding new data to the Page table results in an exception:
FlushError: Instance <Page at 0x3377c90> has a NULL identity key. If this is an auto-generated value, check that the database table allows generation of new primary key values, and that the mapped Column object is configured to expect these generated values. Ensure also that this flush() is not occurring at an inappropriate time, such as within a load() event.
Here is my view code:
try:
    session = DBSession()
    with transaction.manager:
        page = Page(title, text)
        session.add(page)
    return HTTPFound(location=request.route_url('home'))
except Exception as e:
    print e
    pass
finally:
    session.close()
I really don't know why this happens, but I'd rather have the solution in SQLAlchemy than use raw SQL, since I'm making this project for learning purposes :)

I do not think autoincrement=True and ForeignKey(...) play together well.
In any case, for the join to work without any ForeignKey, you can just specify the join condition as the second argument of join(...):
obj = Page
query = DBSession().query(Group.name, Resource.action)\
    .join(Mapper)\
    .join(Resource)\
    .join(obj, obj.id == Mapper.object_id)\
    .filter(obj.id == obj_id, Resource.tablename == obj.__tablename__)\
    .all()
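A runnable sketch of joining on explicit ON clauses with no ForeignKey on Page.id, assuming SQLAlchemy 1.4 or later and in-memory SQLite; the trimmed models follow the question, permissions_for is a hypothetical helper, and the page, groups and mapper rows are made up. Joining Page on Mapper.object_id is what restricts the result to the mapper rows of the requested page, and Page keeps a plain autoincrementing primary key, so inserting pages works normally.

```python
# Sketch assuming SQLAlchemy 1.4+; trimmed models and made-up sample data.
from sqlalchemy import Column, ForeignKey, Integer, String, Text, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class Page(Base):
    __tablename__ = "page"
    id = Column(Integer, primary_key=True)  # plain autoincrementing key, no ForeignKey
    title = Column(String(30), unique=True)


class Group(Base):
    __tablename__ = "groups"
    id = Column(Integer, primary_key=True)
    name = Column(String(100), unique=True)


class Resource(Base):
    __tablename__ = "resource"
    id = Column(Integer, primary_key=True)
    tablename = Column(Text)
    action = Column(Text)


class Mapper(Base):
    __tablename__ = "mapper"
    resource_id = Column(Integer, ForeignKey("resource.id"), primary_key=True)
    group_id = Column(Integer, ForeignKey("groups.id"), primary_key=True)
    object_id = Column(Integer, primary_key=True)  # may point at rows of any table


def permissions_for(session, obj_cls, obj_id):
    # Hypothetical helper: every ON clause is explicit, so no ForeignKey is needed
    return (
        session.query(Group.name, Resource.action)
        .join(Mapper, Mapper.group_id == Group.id)
        .join(Resource, Resource.id == Mapper.resource_id)
        .join(obj_cls, obj_cls.id == Mapper.object_id)
        .filter(obj_cls.id == obj_id, Resource.tablename == obj_cls.__tablename__)
        .order_by(Group.name)
        .all()
    )


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    page = Page(title="home")
    editors, viewers = Group(name="editors"), Group(name="viewers")
    edit = Resource(tablename="page", action="edit")
    view = Resource(tablename="page", action="view")
    session.add_all([page, editors, viewers, edit, view])
    session.flush()  # the database assigns page.id and the other ids here
    session.add_all([
        Mapper(resource_id=edit.id, group_id=editors.id, object_id=page.id),
        Mapper(resource_id=view.id, group_id=viewers.id, object_id=page.id),
        Mapper(resource_id=view.id, group_id=editors.id, object_id=page.id + 1),  # another page
    ])
    session.commit()
    result = [tuple(row) for row in permissions_for(session, Page, page.id)]

print(result)  # [('editors', 'edit'), ('viewers', 'view')]
```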
