How to order nested SQL SELECT on SqlAlchemy - python

I have written an SQL query that should return the X entries that come just before a given user id in a users table ordered by registration_date descending.
To be more concrete, let's say that these are some entries in the ordered table:
id | Name | Email | registration_date
3939 | Barbara Hayes | barbara.hayes@example.com | 2019-09-15T23:39:26.910Z
689 | Noémie Harris | noemie.harris@example.com | 2019-09-14T21:39:15.641Z
2529 | Andrea Iglesias | andrea.iglesias@example.com | 2019-09-13T02:59:08.821Z
3890 | Villads Andersen | villads.andersen@example.com | 2019-09-12T06:29:48.708Z
3685 | Houssine Van Sabben | houssine.vansabben@example.com | 2019-09-12T02:27:08.396Z
I would like to get the users over id 3890. So the query should return
689 | Noémie Harris | noemie.harris@example.com | 2019-09-14T21:39:15.641Z
2529 | Andrea Iglesias | andrea.iglesias@example.com | 2019-09-13T02:59:08.821Z
The raw SQL that I wrote is this:
SELECT * from (
SELECT id, name, email, registration_date FROM public.users
WHERE users.registration_date > (SELECT registration_date FROM users WHERE id = 3890)
order by registration_date
limit 2 )
as a
order by registration_date desc
See this dbfiddle.
I tried to implement this in SqlAlchemy with no luck. I believe that I am making a mistake in the subquery. This is what I have done so far:
registration_date_min = db.query(User.registration_date) \
.order_by(User.registration_date) \
.filter(User.id == ending_before).first()
users_list = db.query(User) \
.filter(User.registration_date > registration_date_min) \
.order_by('registration_date').limit(limit).subquery('users_list')
return users_list.order_by(desc('registration_date'))
P.S. ending_before is a user id, such as 3890 in the example.
Any ideas on the SqlAlchemy part would be very helpful!

First of all, your registration_date_min query has already been executed; you have a row with one column there. Remove the first() call; it executes the SELECT and returns the first row.
As you are selecting by the primary key, there is only ever going to be a single row and you don't need to order it. Just use:
registration_date_min = db.query(User.registration_date).filter(
User.id == ending_before
)
That's now a query object and can be used directly in a comparison:
users_list = (
db.query(User)
.filter(User.registration_date > registration_date_min)
.order_by(User.registration_date)
.limit(limit)
)
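You can then self-select with Query.from_self() from that query to apply the final ordering (note that from_self() is a legacy Query method: it was deprecated in SQLAlchemy 1.4 and removed in 2.0):
return users_list.from_self().order_by(User.registration_date.desc())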
This produces the following SQL (on SQLite, other dialects can differ):
SELECT anon_1.users_id AS anon_1_users_id, anon_1.users_name AS anon_1_users_name, anon_1.users_email AS anon_1_users_email, anon_1.users_registration_date AS anon_1_users_registration_date
FROM (SELECT users.id AS users_id, users.name AS users_name, users.email AS users_email, users.registration_date AS users_registration_date
FROM users
WHERE users.registration_date > (SELECT users.registration_date AS users_registration_date
FROM users
WHERE users.id = ?) ORDER BY users.registration_date
LIMIT ? OFFSET ?) AS anon_1 ORDER BY anon_1.users_registration_date DESC
If I use the following model with __repr__:
class User(db.Model):
    __tablename__ = "users"
    id = db.Column(db.Integer, primary_key=True)
    name = db.Column(db.String)
    email = db.Column(db.String)
    registration_date = db.Column(db.DateTime)
    def __repr__(self):
        return f"<User({self.id}, {self.name!r}, {self.email!r}, {self.registration_date!r}>"
and print the query result instances I get:
<User(689, 'Noémie Harris', 'noemie.harris@example.com', datetime.datetime(2019, 9, 14, 21, 39, 15, 641000)>
<User(2529, 'Andrea Iglesias', 'andrea.iglesias@example.com', datetime.datetime(2019, 9, 13, 2, 59, 8, 821000)>
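To sanity-check the nested SELECT independently of the ORM, the sketch below runs the raw SQL from the question against the five sample rows using Python's built-in sqlite3 module. The in-memory database, the helper name users_before and storing the timestamps as ISO-8601 text are assumptions made for illustration; they are not part of the original setup.

```python
import sqlite3

# Sample rows from the question (email column omitted for brevity).
SAMPLE_USERS = [
    (3939, "Barbara Hayes", "2019-09-15T23:39:26.910Z"),
    (689, "Noémie Harris", "2019-09-14T21:39:15.641Z"),
    (2529, "Andrea Iglesias", "2019-09-13T02:59:08.821Z"),
    (3890, "Villads Andersen", "2019-09-12T06:29:48.708Z"),
    (3685, "Houssine Van Sabben", "2019-09-12T02:27:08.396Z"),
]


def users_before(conn, ending_before, limit):
    """Return up to `limit` users registered just after `ending_before`, newest first."""
    sql = """
        SELECT id, name, registration_date FROM (
            SELECT id, name, registration_date FROM users
            WHERE registration_date > (
                SELECT registration_date FROM users WHERE id = ?
            )
            ORDER BY registration_date
            LIMIT ?
        ) AS a
        ORDER BY registration_date DESC
    """
    return conn.execute(sql, (ending_before, limit)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, registration_date TEXT)"
)
conn.executemany("INSERT INTO users VALUES (?, ?, ?)", SAMPLE_USERS)

page = users_before(conn, ending_before=3890, limit=2)
print([row[0] for row in page])  # [689, 2529]
```

The inner query picks the two closest later registrations in ascending order; the outer query only flips them back to descending order for display.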

Related

PostgreSQL JOIN on JSON Object column

I'm supposed to join 3 different tables on postgres:
lote_item (which holds some book ids)
lote_item_log (which has a column "attributes" holding a JSON object such as {"aluno_id": "2823", "aluno_email": "someemail@outlook.com", "aluno_unidade": 174, "livro_codigo": "XOZK-0NOYP0Z1EMJ"}). Note: some aluno_unidade values are null.
and finally
company (which holds the school name for each aluno_unidade, e.g. aluno_unidade = 174 ==> nome_fantasia = mySchoolName).
Joining the first two tables was easy, since lote_item_log has a foreign key which I could match like this:
SELECT * FROM lote_item JOIN lote_item_log ON lote_item.id = lote_item_log.lote_item_id
Now, I need to get the School Name, contained on table company, with the aluno_unidade ID from table lote_item_log.
My current query is:
SELECT
*
FROM
lote_item
JOIN
lote_item_log
ON
lote_item.id = lote_item_log.lote_item_id
JOIN
company
ON
(
SELECT
JSON_EXTRACT_PATH_TEXT(attributes, 'aluno_unidade')::int
FROM
lote_item_log
WHERE
operation_id = 6
) = company.senior_id
WHERE
item_id = {book_id};
operation_id determines which school is active.
ERROR I'M GETTING:
sqlalchemy.exc.ProgrammingError: (psycopg2.errors.CardinalityViolation) more than one row returned by a subquery used as an expression
I tried LIMIT 1, but then I got just an empty array.
What I need is:
lote_item.created_at | lote_item.updated_at | lote_item.item_id | uuid | aluno_email | c014_id | nome_fantasia | cnpj | is_franchise | is_active
somedate | somedate | some_item_id | XJW4 | someemail#a | some_id | SCHOOL NAME | cnpj | t | t
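I got it.
I'm not sure it's the best way, but it worked: I join company directly on the value extracted from each lote_item_log row's attributes, instead of on a subquery that returns one value for every row with operation_id = 6.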
SELECT
*
FROM
lote_item
JOIN
lote_item_log
ON
lote_item.id = lote_item_log.lote_item_id
JOIN
company
ON
JSON_EXTRACT_PATH_TEXT(attributes, 'aluno_unidade')::int = company.senior_id
WHERE
item_id = {book_id};
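The row-by-row join can be tried out without a PostgreSQL server. The sketch below uses Python's built-in sqlite3 module and registers a small Python function as a stand-in for PostgreSQL's JSON_EXTRACT_PATH_TEXT; the stand-in, the reduced table definitions and the sample rows are assumptions made for illustration only.

```python
import json
import sqlite3


def json_extract_path_text(document, key):
    """Stand-in for the PostgreSQL function: return the value at `key` as text, or NULL."""
    value = json.loads(document).get(key)
    return None if value is None else str(value)


conn = sqlite3.connect(":memory:")
conn.create_function("json_extract_path_text", 2, json_extract_path_text)
conn.executescript(
    """
    CREATE TABLE lote_item (id INTEGER PRIMARY KEY, item_id INTEGER);
    CREATE TABLE lote_item_log (lote_item_id INTEGER, operation_id INTEGER, attributes TEXT);
    CREATE TABLE company (senior_id INTEGER, nome_fantasia TEXT);
    INSERT INTO lote_item VALUES (1, 10), (2, 10);
    INSERT INTO company VALUES (174, 'mySchoolName'), (175, 'otherSchool');
    """
)
conn.executemany(
    "INSERT INTO lote_item_log VALUES (?, ?, ?)",
    [
        (1, 6, json.dumps({"aluno_unidade": 174})),
        (2, 6, json.dumps({"aluno_unidade": None})),  # null unit: no school to join
    ],
)

rows = conn.execute(
    """
    SELECT lote_item.id, company.nome_fantasia
    FROM lote_item
    JOIN lote_item_log ON lote_item.id = lote_item_log.lote_item_id
    JOIN company
      ON CAST(json_extract_path_text(lote_item_log.attributes, 'aluno_unidade') AS INTEGER)
         = company.senior_id
    WHERE lote_item.item_id = ?
    ORDER BY lote_item.id
    """,
    (10,),
).fetchall()
print(rows)  # [(1, 'mySchoolName')]
```

Because the join condition is evaluated once per log row, each row is matched with its own school, and rows whose aluno_unidade is null simply drop out of the inner join.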

SQLAlchemy: select most recent row for all ids in a single table with composite primary key

I want to do this but in SQLAlchemy. The only difference is that, rather than only being able to get the most recent record, I want to be able to get the most recent record before a given timestamp. As long as I ensure rows are never deleted, this allows me to view the database as it was at a particular timestamp.
Let's say my model looks like this:
from datetime import datetime
from sqlalchemy import Column, Integer, DateTime
from sqlalchemy.ext.declarative import declarative_base
Base = declarative_base()
class User(Base):
    __tablename__ = "users"
    id_ = Column("id", Integer, primary_key=True, index=True, nullable=False)
    # Pass the callable itself so the default is evaluated per insert, not once at import time.
    timestamp = Column(DateTime, primary_key=True, index=True, nullable=False, default=datetime.utcnow)
    # other non-primary attributes would go here
And I have this users table (timestamps simplified):
| id_ | timestamp |
-------------------
| 0   | 1         |
| 0   | 4         |
| 0   | 6         |
| 1   | 3         |
| 2   | 7         |
| 2   | 3         |
For example, if I request a snapshot at timestamp = 4, I want to get:
| id_ | timestamp |
-------------------
| 0   | 4         |
| 1   | 3         |
| 2   | 3         |
The best I can come up with is doing it procedurally:
from sqlalchemy import create_engine, desc
from sqlalchemy.orm import sessionmaker
db_engine = create_engine(...)
SessionLocal = sessionmaker(bind=db_engine, ...)
db_session = SessionLocal()
def get_snapshot(timestamp: datetime):
    all_versions = db_session.query(User).filter(User.timestamp <= timestamp).order_by(desc(User.timestamp))
    snapshot = []
    for v in all_versions:
        # Rows arrive newest first, so the first row seen for each id is the one to keep.
        if v.id_ not in (i.id_ for i in snapshot):
            snapshot.append(v)
    return snapshot
However, this gives me a list of model objects rather than a sqlalchemy.orm.query.Query, so I have to treat the result differently to standard queries in
other parts of my code. Can this be done all in the ORM?
Thanks in advance
Have you tried:
all_versions = db_session.query(User, func.max(User.timestamp)).\
filter(User.timestamp <= timestamp).\
group_by(User.id_)
You can read more about generic functions in the SQLAlchemy documentation.
An alternative to Matteo's solution is to use a subquery and join it to the table, which gives the result in my preferred format of a sqlalchemy.orm.query.Query object. Credit to Matteo for the code for the subquery:
subq = db_session.query(User.id_, func.max(User.timestamp).label("maxtimestamp")).filter(User.timestamp < timestamp).group_by(User.id_).subquery()
q = db_session.query(User).join(subq, and_(User.id_ == subq.c.id, User.timestamp == subq.c.maxtimestamp))
SQL generation
Note that this is probably less efficient than Matteo's solution:
SQL generated by subquery solution
SELECT users.id AS users_id, users.timestamp AS users_timestamp, users.name AS users_name, users.notes AS users_notes, users.active AS users_active
FROM users JOIN (SELECT users.id AS id, max(users.timestamp) AS maxtimestamp
FROM users
WHERE users.timestamp < ? GROUP BY users.id) AS anon_1 ON users.id = anon_1.id AND users.timestamp = anon_1.maxtimestamp
SQL generated by Matteo's solution:
SELECT users.id AS users_id, users.timestamp AS users_timestamp, users.name AS users_name, users.notes AS users_notes, users.active AS users_active, max(users.timestamp) AS max_1
FROM users
WHERE users.timestamp <= ? GROUP BY users.id
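The join-on-subquery shape can be checked against the sample table from the question. The sketch below runs equivalent SQL through Python's built-in sqlite3 module rather than through SQLAlchemy; the integer timestamps, the alias latest and the helper name snapshot are assumptions made for illustration. It uses <= so that a snapshot at timestamp 4 includes the row (0, 4), as the question's expected output requires.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER NOT NULL, timestamp INTEGER NOT NULL, "
    "PRIMARY KEY (id, timestamp))"
)
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [(0, 1), (0, 4), (0, 6), (1, 3), (2, 7), (2, 3)],
)


def snapshot(conn, as_of):
    """Return the newest (id, timestamp) row per id with timestamp <= as_of."""
    sql = """
        SELECT users.id, users.timestamp
        FROM users
        JOIN (
            SELECT id, max(timestamp) AS maxtimestamp
            FROM users
            WHERE timestamp <= ?
            GROUP BY id
        ) AS latest
          ON users.id = latest.id AND users.timestamp = latest.maxtimestamp
        ORDER BY users.id
    """
    return conn.execute(sql, (as_of,)).fetchall()


print(snapshot(conn, 4))  # [(0, 4), (1, 3), (2, 3)]
```

Joining back on both id and the per-id maximum returns whole rows from users, which is why the ORM version of this query can yield plain User objects.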
Previous content of this answer
@Matteo Di Napoli
Thanks, your post is more or less what I need. The output of this is an sqlalchemy.util._collections.result, which behaves like a tuple from what I can see. In my application I need the full User objects, not just id / timestamp pairs, so the better fit for me is:
from sqlalchemy import func
all_versions = db_session.query(User, func.max(User.timestamp)).\
filter(User.timestamp <= timestamp).\
group_by(User.id_)
Returning something like:
> for i in all_versions: print(i)
...
(<User "my test user v2", id 0, modified 2019-06-19 14:42:16.380381>, datetime.datetime(2019, 6, 19, 14, 42, 16, 380381))
(<User "v2", id 1, modified 2019-06-19 15:53:53.147039>, datetime.datetime(2019, 6, 19, 15, 53, 53, 147039))
(<User "a user", id 2, modified 2019-06-20 12:34:56>, datetime.datetime(2019, 6, 20, 12, 34, 56))
I can then access the User objects with all_versions[n][0] or get a list with l = [i[0] for i in all_versions] (thanks to Matteo Di Napoli for the nicer syntax there).
The perfect end result would be if I could get a result that is still a sqlalchemy.orm.query.Query (like all_versions), but with each item a User object rather than a sqlalchemy.util._collections.result. Is that possible?

SQLAlchemy relationships populating foreign key fields

I have the following tables with their respective sqlalchemy classes:
class Enrolled(Base):
    __tablename__ = 'enrolled'
    id = Column(Integer, primary_key=True, nullable=False, autoincrement=True)
    student_fk = Column(Integer, ForeignKey('students.id'))
    student = relationship('Students', foreign_keys=[student_fk], uselist=False, backref="enrolled", innerjoin=False, post_update=False)
    subject = Column(String(5, convert_unicode=True), nullable=False)
    # __init__ for id and subject is here.
class Students(Base):
    __tablename__ = 'students'
    id = Column(Integer, primary_key=True, nullable=False, autoincrement=True)
    name = Column(String(50, convert_unicode=True), nullable=False)
    # __init__ for name is here.
The relationship between students and enrolled is one-to-many, i.e. one student can enroll in more than one subject.
Now, I know how to insert a couple of subjects into 'Enrolled' and names into 'Students':
DBSession.add(Enrolled(subject="maths"))
In the end this is how my tables look
Enrolled:
+----+------------+---------+
| id | student_fk | subject |
+----+------------+---------+
| 1 | | Maths |
| 2 | | Physics |
| 3 | | Art |
+----+------------+---------+
Students:
+----+------+
| id | name |
+----+------+
| 1 | Jim |
| 2 | Bob |
| 3 | Cara |
+----+------+
Now, how do I get the student ids into the Enrolled table as foreign keys?
I have the information about which student is enrolled in which subject as a .csv file:
mycsv: name,subject,name1,subject1,name2,subject2
Should I have a manual dictionary like dict {jim:maths,Bob:Art,Cara:Physics} and then map like
query=Enrolled(subject="maths")
for k, v in dict.items():
if subject in v:
list.append(k)
for i in list:
query.student=DBSession.query(Students).filter(name=i).first()
DBSession.add(query)
Please help.. How do I get the student_fk field populated properly?
Your one-to-many enrollment table should have a composite primary key on student ID and subject. Assuming you want to keep subjects as an ENUM (which works with a small list of subjects; otherwise you should move them to a separate table), your tables should look something like:
subjects = [ 'Maths', 'Physics', 'Art', ]
class Student(Base):
    __tablename__ = 'Student'
    student_id = Column(Integer, primary_key=True, autoincrement=True)
    name = Column(String(50, convert_unicode=True), nullable=False)
class StudentEnrollment(Base):
    __tablename__ = 'StudentEnrollment'
    student_id = Column(Integer, ForeignKey('Student.student_id', ondelete='CASCADE'), primary_key=True)
    subject = Column(Enum(*subjects), primary_key=True)
    student = relationship("Student", primaryjoin='StudentEnrollment.student_id==Student.student_id', uselist=True, backref="enrollments")
which will result in:
root@localhost [inDB]> show create table Student\G
*************************** 1. row ***************************
Table: Student
Create Table: CREATE TABLE `Student` (
`student_id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(50) NOT NULL,
PRIMARY KEY (`student_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
1 row in set (0.00 sec)
root@localhost [inDB]> show create table StudentEnrollment\G
*************************** 1. row ***************************
Table: StudentEnrollment
Create Table: CREATE TABLE `StudentEnrollment` (
`student_id` int(11) NOT NULL,
`subject` enum('Maths','Physics','Art') NOT NULL,
PRIMARY KEY (`student_id`,`subject`),
CONSTRAINT `StudentEnrollment_ibfk_1` FOREIGN KEY (`student_id`) REFERENCES `Student` (`student_id`) ON DELETE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=latin1
1 row in set (0.00 sec)
Then, to insert a few enrollments for the student Jim:
student = Student(name='Jim')
session.add(student)
session.flush()
for enr in ('Maths', 'Physics', 'Art'):
    session.add(StudentEnrollment(student_id=student.student_id, subject=enr))
    session.flush()
session.commit()
which will result in:
root@localhost [inDB]> select * from Student;
+------------+------+
| student_id | name |
+------------+------+
| 3 | Jim |
+------------+------+
1 row in set (0.00 sec)
root@localhost [inDB]> select * from StudentEnrollment;
+------------+---------+
| student_id | subject |
+------------+---------+
| 3 | Maths |
| 3 | Physics |
| 3 | Art |
+------------+---------+
3 rows in set (0.00 sec)
This is a very basic example with two tables. A better option would be to normalize Enrollments into separate table and use association proxy pattern, see http://docs.sqlalchemy.org/en/rel_0_9/orm/extensions/associationproxy.html
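The question also asks how to fill in the foreign keys from a .csv file. One possible approach is to look up each student's id by name and insert one enrollment row per (name, subject) pair. The sketch below shows that flow with Python's built-in csv and sqlite3 modules instead of SQLAlchemy; the lower-case table names, the UNIQUE constraint on name and the one-pair-per-line CSV layout are assumptions, since the original file format is only described loosely.

```python
import csv
import io
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE student (
        student_id INTEGER PRIMARY KEY AUTOINCREMENT,
        name TEXT NOT NULL UNIQUE
    );
    CREATE TABLE student_enrollment (
        student_id INTEGER NOT NULL REFERENCES student(student_id) ON DELETE CASCADE,
        subject TEXT NOT NULL,
        PRIMARY KEY (student_id, subject)
    );
    INSERT INTO student (name) VALUES ('Jim'), ('Bob'), ('Cara');
    """
)

# Assumed CSV layout: one "name,subject" pair per line.
csv_text = "Jim,Maths\nBob,Art\nCara,Physics\n"

for name, subject in csv.reader(io.StringIO(csv_text)):
    row = conn.execute(
        "SELECT student_id FROM student WHERE name = ?", (name,)
    ).fetchone()
    if row is None:
        continue  # unknown student: skipped in this sketch
    conn.execute("INSERT INTO student_enrollment VALUES (?, ?)", (row[0], subject))
conn.commit()

enrollments = conn.execute(
    """
    SELECT s.name, e.subject
    FROM student_enrollment AS e
    JOIN student AS s ON s.student_id = e.student_id
    ORDER BY s.student_id
    """
).fetchall()
print(enrollments)  # [('Jim', 'Maths'), ('Bob', 'Art'), ('Cara', 'Physics')]
```

With the ORM models above, the same loop would query Student by name and add a StudentEnrollment for each pair.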

How to avoid adding duplicates in a many-to-many relationship table in SQLAlchemy - python?

I am dealing with a many-to-many relationship with sqlalchemy. My question is how to avoid adding duplicate pair values in a many-to-many relational table.
To make things clearer, I will use the example from the official SQLAlchemy documentation.
from sqlalchemy import Column, ForeignKey, Integer, String, Table, create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import relationship, sessionmaker
Base = declarative_base()
Parents2children = Table('parents2children', Base.metadata,
    Column('parents_id', Integer, ForeignKey('parents.id')),
    Column('children_id', Integer, ForeignKey('children.id'))
)
class Parent(Base):
    __tablename__ = 'parents'
    id = Column(Integer, primary_key=True)
    parent_name = Column(String(45))
    child_rel = relationship("Child", secondary=Parents2children, backref="parents_backref")
    def __init__(self, parent_name=""):
        self.parent_name = parent_name
    def __repr__(self):
        return "<parents(id:'%i', parent_name:'%s')>" % (self.id, self.parent_name)
class Child(Base):
    __tablename__ = 'children'
    id = Column(Integer, primary_key=True)
    child_name = Column(String(45))
    def __init__(self, child_name=""):
        self.child_name = child_name
    def __repr__(self):
        return "<experiments(id:'%i', child_name:'%s')>" % (self.id, self.child_name)
###########################################
def setUp():
    global Session
    engine = create_engine('mysql://root:root@localhost/db_name?charset=utf8', pool_recycle=3600, echo=False)
    Session = sessionmaker(bind=engine)
def add_data():
    session = Session()
    name_father1 = Parent(parent_name="Richard")
    name_mother1 = Parent(parent_name="Kate")
    name_daughter1 = Child(child_name="Helen")
    name_son1 = Child(child_name="John")
    session.add(name_father1)
    session.add(name_mother1)
    name_father1.child_rel.append(name_son1)
    name_daughter1.parents_backref.append(name_father1)
    name_son1.parents_backref.append(name_father1)
    session.commit()
    session.close()
setUp()
add_data()
With this code, the data inserted in the tables is the following:
Parents table:
+----+-------------+
| id | parent_name |
+----+-------------+
| 1 | Richard |
| 2 | Kate |
+----+-------------+
Children table:
+----+------------+
| id | child_name |
+----+------------+
| 1 | Helen |
| 2 | John |
+----+------------+
Parents2children table
+------------+-------------+
| parents_id | children_id |
+------------+-------------+
| 1 | 1 |
| 1 | 2 |
| 1 | 1 |
+------------+-------------+
As you can see, there's a duplicate in the last table... how could I prevent SQLAlchemy from adding these duplicates?
I've tried to put relationship("Child", secondary=..., collection_class=set) but this error is displayed:
AttributeError: 'InstrumentedSet' object has no attribute 'append'
Add a PrimaryKeyConstraint (or a UniqueConstraint) to your relationship table:
Parents2children = Table('parents2children', Base.metadata,
Column('parents_id', Integer, ForeignKey('parents.id')),
Column('children_id', Integer, ForeignKey('children.id')),
PrimaryKeyConstraint('parents_id', 'children_id'),
)
and your code will raise an error when you try to commit a relationship that was added from both sides. Adding this constraint is highly recommended.
To avoid triggering the error in the first place, check before appending:
if name_father1 not in name_son1.parents_backref:
    name_son1.parents_backref.append(name_father1)
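The effect of the composite primary key can be seen at the database level with a few lines of Python's built-in sqlite3 module. This is a sketch on an in-memory SQLite table, not the MySQL setup from the question, and the variable names are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE parents2children (
        parents_id INTEGER NOT NULL,
        children_id INTEGER NOT NULL,
        PRIMARY KEY (parents_id, children_id)
    )
    """
)
conn.execute("INSERT INTO parents2children VALUES (1, 1)")
conn.execute("INSERT INTO parents2children VALUES (1, 2)")

# Inserting the same pair again violates the composite primary key.
try:
    conn.execute("INSERT INTO parents2children VALUES (1, 1)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

pairs = conn.execute(
    "SELECT parents_id, children_id FROM parents2children "
    "ORDER BY parents_id, children_id"
).fetchall()
print(duplicate_rejected, pairs)  # True [(1, 1), (1, 2)]
```

As for the AttributeError quoted in the question: with collection_class=set the relationship collection is a set, which offers add() rather than append().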

SQLAlchemy Column to Row Transformation and vice versa -- is it possible?

I'm looking for a SQLAlchemy only solution for converting a dict received from a form submission into a series of rows in the database, one for each field submitted. This is to handle preferences and settings that vary widely across applications. But, it's very likely applicable to creating pivot table like functionality. I've seen this type of thing in ETL tools but I was looking for a way to do it directly in the ORM. I couldn't find any documentation on it but maybe I missed something.
Example:
Submitted from form: {"UniqueId":1, "a":23, "b":"Hello", "c":"World"}
I would like it to be transformed (in the ORM) so that it is recorded in the database like this:
_______________________________________
|UniqueId| ItemName | ItemValue |
---------------------------------------
| 1 | a | 23 |
---------------------------------------
| 1 | b | Hello |
---------------------------------------
| 1 | c | World |
---------------------------------------
Upon a select the result would be transformed (in the ORM) back into a row of data from each of the individual values.
---------------------------------------------------
| UniqueId | a | b | c |
---------------------------------------------------
| 1 | 23 | Hello | World |
---------------------------------------------------
I would assume on an update that the best course of action would be to wrap a delete/create in a transaction so the current records would be removed and the new ones inserted.
The definitive list of ItemNames will be maintained in a separate table.
Totally open to more elegant solutions but would like to keep out of the database side if at all possible.
I'm using the declarative_base approach with SQLAlchemy.
Thanks in advance...
Cheers,
Paul
Here is a slightly modified example from the documentation that maps such a table structure to a dictionary on the model:
from sqlalchemy import *
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm.collections import attribute_mapped_collection
from sqlalchemy.ext.associationproxy import association_proxy
from sqlalchemy.orm import relation, sessionmaker
metadata = MetaData()
Base = declarative_base(metadata=metadata, name='Base')
class Item(Base):
    __tablename__ = 'Item'
    UniqueId = Column(Integer, ForeignKey('ItemSet.UniqueId'),
                      primary_key=True)
    ItemSet = relation('ItemSet')
    ItemName = Column(String(10), primary_key=True)
    ItemValue = Column(Text) # Use PickleType?
def _create_item(ItemName, ItemValue):
    return Item(ItemName=ItemName, ItemValue=ItemValue)
class ItemSet(Base):
    __tablename__ = 'ItemSet'
    UniqueId = Column(Integer, primary_key=True)
    _items = relation(Item,
                      collection_class=attribute_mapped_collection('ItemName'))
    items = association_proxy('_items', 'ItemValue', creator=_create_item)
engine = create_engine('sqlite://', echo=True)
metadata.create_all(engine)
session = sessionmaker(bind=engine)()
data = {"UniqueId": 1, "a": 23, "b": "Hello", "c": "World"}
s = ItemSet(UniqueId=data.pop("UniqueId"))
s.items = data
session.add(s)
session.commit()
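Independently of the ORM mapping, the column-to-row transformation and its inverse are small enough to sketch in plain Python. The helper names dict_to_rows and rows_to_dicts are hypothetical and only illustrate the shape of the data; they are not part of SQLAlchemy or of the example above.

```python
def dict_to_rows(form, key="UniqueId"):
    """Unpivot one submitted dict into (UniqueId, ItemName, ItemValue) rows."""
    unique_id = form[key]
    return [(unique_id, name, value) for name, value in form.items() if name != key]


def rows_to_dicts(rows, key="UniqueId"):
    """Pivot (UniqueId, ItemName, ItemValue) rows back into one dict per UniqueId."""
    result = {}
    for unique_id, name, value in rows:
        result.setdefault(unique_id, {key: unique_id})[name] = value
    return list(result.values())


form = {"UniqueId": 1, "a": 23, "b": "Hello", "c": "World"}
rows = dict_to_rows(form)
print(rows)  # [(1, 'a', 23), (1, 'b', 'Hello'), (1, 'c', 'World')]
print(rows_to_dicts(rows) == [form])  # True
```

In the mapped version, the association proxy does this work implicitly: assigning a dict to s.items creates one Item row per key.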
