Question
What is the most elegant way to insert a row if it is new and update it if it already exists, between two related tables in the same Postgres DB, using SQLAlchemy?
Context
Users review a stage table and can accept changes, which are then sent to the production table after review.
The stage table may contain new records or edited versions of records already in the production table.
I figured out how to do this in pieces from other questions, but all of them involve making a new object and specifying each column name, e.g.
rutable.ru = rustage.ru
...etc....
My model
class rutable(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    ru = db.Column(db.String(6), db.ForeignKey('rustage.ru'), unique=True)

class rustage(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    ru = db.Column(db.String(6), index=True, unique=True)
    rutablelink = db.relationship('rutable', lazy='dynamic', backref='ruback')
Pseudo code
# check to see if the record already exists in production
if db.session.query(exists().where(models.rutable.id == rurow)).scalar():
    rustage = models.rustage.query.filter_by(id=rurow).first()
    ruprod = models.rutable.query.filter_by(id=rurow).first()
    form = rutable(obj=rustage)
    # import pdb; pdb.set_trace()
    form.populate_obj(ruprod)
else:
    rustage = models.rustage.query.filter_by(id=rurow).first()
    # how do I insert rustage into rutable and reset the primary key,
    # without creating a new object in which I have to specify each column name?
    # e.g. rutable(a=rustage.a, ...), a million column names (could I use a form?)
db.session.commit()
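One way to avoid spelling out every column is to copy the matching columns generically. A minimal sketch against the models above (the copy_columns helper is illustrative and assumes the shared column names match between rustage and rutable):

def copy_columns(src, dst):
    # copy every matching column value from src to dst,
    # skipping the primary key so the target keeps (or gets) its own id
    for col in src.__table__.columns:
        if col.primary_key:
            continue
        if hasattr(dst, col.name):
            setattr(dst, col.name, getattr(src, col.name))

rustage = models.rustage.query.filter_by(id=rurow).first()
ruprod = models.rutable()
copy_columns(rustage, ruprod)
db.session.add(ruprod)
db.session.commit()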
UPDATE
I decided to go with a form. Here is what I came up with:
rustage = models.rustage.query.filter_by(id=rurow).first()
if db.session.query(exists().where(models.rutable.id == rurow)).scalar():
    ruprod = models.rutable.query.filter_by(id=rurow).first()
    form = rutable(obj=rustage)
    form.populate_obj(ruprod)
else:
    ruprod = models.rutable(ru=rustage.ru)
    form = rutable(obj=rustage)
    form.populate_obj(ruprod)
    db.session.add(ruprod)  # the new row has to be added to the session
rustage.reviewEdit = False
ruprod.reviewEdit = False
db.session.commit()
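Since the database is Postgres, another option that sidesteps the ORM objects and forms entirely is a single INSERT ... ON CONFLICT statement via the postgresql dialect. A sketch, assuming ru is the unique key to upsert on and that the shared column names match:

from sqlalchemy.dialects.postgresql import insert

rustage = models.rustage.query.filter_by(id=rurow).first()
# build the value dict from the production table's non-PK columns
values = {c.name: getattr(rustage, c.name)
          for c in models.rutable.__table__.columns
          if not c.primary_key and hasattr(rustage, c.name)}

stmt = insert(models.rutable.__table__).values(**values)
stmt = stmt.on_conflict_do_update(
    index_elements=['ru'],  # the unique column on rutable
    set_={name: stmt.excluded[name] for name in values}
)
db.session.execute(stmt)
db.session.commit()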
Related
I am building a system for experiment data organization. What is the most efficient way to insert new entries into a table that has a many-to-many relationship with another table?
Simple example
I have two tables: Subject & Task. Subjects can run multiple Tasks (experiments) and Tasks can be run by many Subjects. They are associated by an association table r_Subject_Task.
Questions
Q1. How would I insert multiple Subject entries at a time that share the same Task, when that Task does not yet exist in the Task table?
Q2. Same as Q1, but when the Task already exists in the Task table?
Q3. How can I add a Task that relates to an existing Subject?
Code Example
Setup engine and classes
import sqlalchemy as db
from sqlalchemy import Column, ForeignKey, Integer, String, Table
from sqlalchemy.orm import declarative_base, relationship
Base = declarative_base()
engine = db.create_engine('sqlite:///test.db')
r_Subject_Task = Table(
    'r_subject_task',
    Base.metadata,
    Column('subject_id', ForeignKey('subject.id'), primary_key=True),
    Column('task_id', ForeignKey('task.id'), primary_key=True)
)
class Subject(Base):
    __tablename__ = 'subject'
    id = Column(String(30), primary_key=True)
    tasks = relationship("Task", secondary=r_Subject_Task, back_populates='subjects')

    def __repr__(self):
        return f'Subject(id = {self.id}, tasks = {self.tasks})'

class Task(Base):
    __tablename__ = 'task'
    id = Column(String(30), primary_key=True)
    subjects = relationship("Subject", secondary=r_Subject_Task, back_populates='tasks')

    def __repr__(self):
        return f'Task(id = {self.id}, subjects = {self.subjects})'

Base.metadata.create_all(engine)
Great, the tables are set up. Now I'd like to insert data.
from sqlalchemy.orm import Session
with Session(engine) as session:
    subject001 = Subject(
        id = 'subj001',
        tasks = [Task(id = 'task1')]
    )
    subject002 = Subject(
        id = 'subj002',
        tasks = [Task(id = 'task1'), Task(id = 'task2')]
    )
    subject003 = Subject(
        id = 'subj003'
    )
    task3 = Task(
        id = 'task3',
        subjects = [Subject(id = 'subj003')]
    )
    session.add_all([subject001, subject002, subject003, task3])
    session.commit()
This results in two errors.
Error 1: committing multiple subjects with matching tasks. When I try to create multiple Subjects with matching Tasks, the UNIQUE constraint fails because I am passing in multiple instances of 'task1'.
Error 2: committing a task with an existing subject. When I try to create Task 'task3' and attach it to 'subj003', the UNIQUE constraint fails because I am adding a Subject that already exists.
What is the most efficient way to insert many-to-many data dynamically? Should I go one by one? This seems like a standard use case but I'm having difficulty finding good resources in the docs. Any links to relevant tutorials or examples would be much appreciated.
Thanks
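For reference, one common way around both errors is to look up each Task (or Subject) before attaching it, rather than constructing a new instance every time. A minimal sketch against the models above, assuming SQLAlchemy 1.4+ (matching the imports used here); the get_or_create_task helper is illustrative:

from sqlalchemy.orm import Session

def get_or_create_task(session, task_id):
    # reuse the existing Task row if it is already in the database,
    # otherwise create a new (pending) one and let commit() insert it
    task = session.get(Task, task_id)
    if task is None:
        task = Task(id=task_id)
        session.add(task)
    return task

with Session(engine) as session:
    task1 = get_or_create_task(session, 'task1')
    subject001 = Subject(id='subj001', tasks=[task1])
    subject002 = Subject(id='subj002',
                         tasks=[task1, get_or_create_task(session, 'task2')])
    session.add_all([subject001, subject002])
    session.commit()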
First of all, I would like to apologize, as my SQL knowledge is still very basic. The problem is the following: I have two distinct tables with no direct relationship between them, but they share two columns: storm_id and userid.
Basically, I would like to query all posts for a given storm_id that are not from a banned user, plus some extra filters.
Here are the models:
Post
class Post(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    ...
    userid = db.Column(db.String(100))
    ...
    storm_id = db.Column(db.Integer, db.ForeignKey('storm.id'))
Banneduser
class Banneduser(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    sn = db.Column(db.String(60))
    userid = db.Column(db.String(100))
    name = db.Column(db.String(60))
    storm_id = db.Column(db.Integer, db.ForeignKey('storm.id'))
Both Post and Banneduser are children of another table (Storm). Here is the query I am trying to build. As you can see, I am trying to filter:
verified posts
in descending order
with a limit (I keep it apart from the query, as the elif branches add other filters)
# we query the banned users' ids
bannedusers = db.session.query(Banneduser.userid)

# we build the query except for the limit, as the if..elif branches add more filtering
joined = (
    db.session.query(Post, Banneduser)
    .filter(Post.storm_id == stormid)
    .filter(Post.verified == True)
    # here comes the trouble
    .filter(~Post.userid.in_(bannedusers))
    .order_by(Post.timenow.desc())
)

try:
    if contentsettings.filterby == 'all':
        posts = joined.limit(contentsettings.maxposts)
        print(posts.all())
        # I am not sure if this is pythonic
        posts = [item[0] for item in posts]
        return render_template("stream.html", storm=storm, wall=posts)
    elif ...:  # other filtering queries
I have two problems, one basic and one underlying:
1. .filter(~Post.userid.in_(bannedusers)) gives one row EACH TIME Post.userid is not in bannedusers, so I get N repeated posts. I tried to filter this with distinct, but it does not work.
2. Underlying problem: I am not sure whether my approach is the correct one (the DB model structure/relationships plus the queries).
Use SQL EXISTS, correlated on both storm_id and userid so that a post is excluded only when its own author is banned in that storm. Your query should look like this:
db.session.query(Post)\
    .filter(Post.storm_id == stormid)\
    .filter(Post.verified == True)\
    .filter(~exists().where(
        (Banneduser.storm_id == Post.storm_id) &
        (Banneduser.userid == Post.userid)))\
    .order_by(Post.timenow.desc())
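If you prefer to keep the in_()/notin_() style from the question, querying Post alone (rather than Post, Banneduser, whose cross join is what produces the repeated rows) and restricting the banned-user subquery to the same storm also works. A sketch, assuming the same models and variables as above:

# subquery of banned user ids for this storm only
banned = db.session.query(Banneduser.userid)\
    .filter(Banneduser.storm_id == stormid)

posts = db.session.query(Post)\
    .filter(Post.storm_id == stormid)\
    .filter(Post.verified == True)\
    .filter(Post.userid.notin_(banned))\
    .order_by(Post.timenow.desc())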
I am creating a website using Flask and SQLAlchemy. This website keeps track of classes that a student has taken. I would like to find a way to search my database using SQLAlchemy to find all unique classes that have been entered. Here is code from my models.py for Class:
class Class(db.Model):
    __tablename__ = 'classes'
    id = db.Column(db.Integer, primary_key=True)
    title = db.Column(db.String(100))
    body = db.Column(db.Text)
    created = db.Column(db.DateTime, default=datetime.datetime.now)
    user_email = db.Column(db.String(100), db.ForeignKey(User.email))
    user = db.relationship(User)
In other words, I would like to get all unique values from the title column and pass that to my views.py.
Using the model query structure, you could do this:
Class.query.with_entities(Class.title).distinct()
query = session.query(Class.title.distinct().label("title"))
titles = [row.title for row in query.all()]
titles = [r.title for r in session.query(Class.title).distinct()]
As @van has pointed out, what you are looking for is:
session.query(your_table.column1.distinct()).all()  # SELECT DISTINCT(column1) FROM your_table
but I will add that in most cases you also want to apply another filter to the results, in which case you can do
session.query(your_table.column1.distinct()).filter_by(column2='some_column2_value').all()
which translates to the SQL
SELECT DISTINCT(column1) FROM your_table WHERE column2 = 'some_column2_value';
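In newer SQLAlchemy (1.4/2.0 style) the same DISTINCT query can also be written with select(); a small sketch, assuming the Class model above and SQLAlchemy 1.4+:

from sqlalchemy import select

# one row per distinct title
stmt = select(Class.title).distinct()
titles = db.session.execute(stmt).scalars().all()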
I have three tables: UserTypeMapper, User, and SystemAdmin. In my get_user method, depending on the UserTypeMapper.is_admin flag, I query either the User or the SystemAdmin table. The user_id column corresponds to the primary key id in the User and SystemAdmin tables.
class UserTypeMapper(Base):
    __tablename__ = 'user_type_mapper'
    id = Column(BigInteger, primary_key=True)
    is_admin = Column(Boolean, default=False)
    user_id = Column(BigInteger, nullable=False)

class SystemAdmin(Base):
    __tablename__ = 'system_admin'
    id = Column(BigInteger, primary_key=True)
    name = Column(Unicode)
    email = Column(Unicode)

class User(Base):
    __tablename__ = 'user'
    id = Column(BigInteger, primary_key=True)
    name = Column(Unicode)
    email = Column(Unicode)
I want to be able to get any user – system admin or regular user – from one query, so I do a join on either User or SystemAdmin, depending on the is_admin flag. For example:
DBSession.query(UserTypeMapper, SystemAdmin).join(SystemAdmin, UserTypeMapper.user_id==SystemAdmin.id).first()
and
DBSession.query(UserTypeMapper, User).join(User, UserTypeMapper.user_id==User.id).first()
This works fine; however, I would then like to be able to access these, like so:
>>> my_admin_obj.is_admin
True
>>> my_admin_obj.name
Bob Smith
versus
>>> my_user_obj.is_admin
False
>>> my_user_obj.name
Bob Stevens
Currently, I have to specify: my_user_obj.UserTypeMapper.is_admin and my_user_obj.User.name. From what I've been reading, I need to map the tables so that I don't need to specify which table the attribute belongs to. My problem is that I do not understand how I can specify this given that I have two potential tables that the name attribute, for example, may come from.
This is the example I am referring to: Mapping a Class against Multiple Tables
How can I achieve this? Thank you.
You have discovered why a "dual purpose foreign key" is an antipattern.
There is a related problem here that you haven't quite pointed out: there's no way to use a foreign key constraint to enforce that the data is in a valid state. You want to be sure that there's exactly one of something for each row in UserTypeMapper, but that 'something' is not any one table. Formally, you want a functional dependence:
user_type_mapper → (system_admin × {1}) ∪ (user × {0})
But most SQL databases won't allow you to write a foreign key constraint expressing that.
It looks complicated because it is complicated.
Instead, let's consider what we really want to say: "every system_admin should be a user", or:
system_admin → user
In SQL, that would be written:
CREATE TABLE user (
    id INTEGER PRIMARY KEY,
    name VARCHAR,
    email VARCHAR
);

CREATE TABLE system_admin (
    user_id INTEGER PRIMARY KEY REFERENCES user(id)
);
Or, in SQLAlchemy declarative style:
class User(Base):
    __tablename__ = 'user'
    id = Column(Integer, primary_key=True)
    name = Column(String)
    email = Column(String)

class SystemAdmin(Base):
    __tablename__ = 'system_admin'
    user_id = Column(ForeignKey(User.id), primary_key=True)
What sort of questions does this schema allow us to ask?
"Is there a SystemAdmin by the name of 'john doe'"?
>>> print session.query(User).join(SystemAdmin).filter(User.name == 'john doe').exists()
EXISTS (SELECT 1
FROM "user" JOIN system_admin ON "user".id = system_admin.user_id
WHERE "user".name = :name_1)
"How many users are there? How many sysadmins?"
>>> print session.query(func.count(User.id), func.count(SystemAdmin.user_id)).outerjoin(SystemAdmin)
SELECT count("user".id) AS count_1, count(system_admin.user_id) AS count_2
FROM "user" LEFT OUTER JOIN system_admin ON "user".id = system_admin.user_id
I hope you can see why the above is preferable to the design you describe in your question; but in the off chance you don't have a choice (and only in that case; if you still feel what you've got is better, please refine your question), you can still cram that data into a single Python object, which will be very difficult to work with, by providing an alternate mapping to the tables; specifically, one which follows the rough structure in the first equation.
We need to mention UserTypeMapper twice, once for each side of the union; for that, we need aliases.
>>> from sqlalchemy.orm import aliased
>>> utm1 = aliased(UserTypeMapper)
>>> utm2 = aliased(UserTypeMapper)
For the union bodies, join each alias to the appropriate table. Since SystemAdmin and User have the same columns in the same order, we don't need to describe them in detail; if they differed at all, we would need to make them "union compatible" by mentioning each column explicitly, which is left as an exercise.
>>> utm_sa = Query([utm1, SystemAdmin]).join(SystemAdmin, (utm1.user_id == SystemAdmin.id) & (utm1.is_admin == True))
>>> utm_u = Query([utm2, User]).join(User, (utm2.user_id == User.id) & (utm2.is_admin == False))
And then we join them together...
>>> print utm_sa.union(utm_u)
SELECT anon_1.user_type_mapper_1_id AS anon_1_user_type_mapper_1_id, anon_1.user_type_mapper_1_is_admin AS anon_1_user_type_mapper_1_is_admin, anon_1.user_type_mapper_1_user_id AS anon_1_user_type_mapper_1_user_id, anon_1.system_admin_id AS anon_1_system_admin_id, anon_1.system_admin_name AS anon_1_system_admin_name, anon_1.system_admin_email AS anon_1_system_admin_email
FROM (SELECT user_type_mapper_1.id AS user_type_mapper_1_id, user_type_mapper_1.is_admin AS user_type_mapper_1_is_admin, user_type_mapper_1.user_id AS user_type_mapper_1_user_id, system_admin.id AS system_admin_id, system_admin.name AS system_admin_name, system_admin.email AS system_admin_email
FROM user_type_mapper AS user_type_mapper_1 JOIN system_admin ON user_type_mapper_1.user_id = system_admin.id AND user_type_mapper_1.is_admin = 1 UNION SELECT user_type_mapper_2.id AS user_type_mapper_2_id, user_type_mapper_2.is_admin AS user_type_mapper_2_is_admin, user_type_mapper_2.user_id AS user_type_mapper_2_user_id, "user".id AS user_id, "user".name AS user_name, "user".email AS user_email
FROM user_type_mapper AS user_type_mapper_2 JOIN "user" ON user_type_mapper_2.user_id = "user".id AND user_type_mapper_2.is_admin = 0) AS anon_1
While it's theoretically possible to wrap this all up in a Python class that looks a bit like standard SQLAlchemy ORM stuff, I would certainly not do that. Working with non-table mappings, especially when they are more than simple joins (this is a union), is lots of work for zero payoff.
I am a relative newcomer to SQLAlchemy and have read the basic docs. I'm currently following Mike Driscoll's MediaLocker tutorial and modifying/extending it for my own purpose.
I have three tables (loans, people, cards). Card to Loan and Person to Loan are both one-to-many relationships and modelled as such:
from sqlalchemy import Table, Column, DateTime, Integer, ForeignKey, Unicode, Boolean
from sqlalchemy.orm import backref, relation
from sqlalchemy import create_engine
from sqlalchemy.ext.declarative import declarative_base
engine = create_engine("sqlite:///cardsys.db", echo=True)
DeclarativeBase = declarative_base(engine)
metadata = DeclarativeBase.metadata
class Loan(DeclarativeBase):
    """
    Loan model
    """
    __tablename__ = "loans"

    id = Column(Integer, primary_key=True)
    card_id = Column(Unicode, ForeignKey("cards.id"))
    person_id = Column(Unicode, ForeignKey("people.id"))
    date_issued = Column(DateTime)
    date_due = Column(DateTime)
    date_returned = Column(DateTime)
    issue_reason = Column(Unicode(50))
    person = relation("Person", backref="loans", cascade_backrefs=False)
    card = relation("Card", backref="loans", cascade_backrefs=False)

class Card(DeclarativeBase):
    """
    Card model
    """
    __tablename__ = "cards"

    id = Column(Unicode(50), primary_key=True)
    active = Column(Boolean)

class Person(DeclarativeBase):
    """
    Person model
    """
    __tablename__ = "people"

    id = Column(Unicode(50), primary_key=True)
    fname = Column(Unicode(50))
    sname = Column(Unicode(50))
When I try to create a new loan (using the method below in my controller) it works fine for unique cards and people, but as soon as I try to add a second loan for a particular person or card it gives me a "non-unique" error. Obviously it's not unique; that's the point. But I thought SQLAlchemy would take care of the behind-the-scenes work for me and add the correct existing person or card id as the FK in the new loan, rather than trying to create new person and card records. Is it up to me to query the DB to check PK uniqueness and handle this manually? I was under the impression that this is something SQLAlchemy could handle automatically.
def addLoan(session, data):
    loan = Loan()
    loan.date_due = data["loan"]["date_due"]
    loan.date_issued = data["loan"]["date_issued"]
    loan.issue_reason = data["loan"]["issue_reason"]

    person = Person()
    person.id = data["person"]["id"]
    person.fname = data["person"]["fname"]
    person.sname = data["person"]["sname"]
    loan.person = person

    card = Card()
    card.id = data["card"]["id"]
    loan.card = card

    session.add(loan)
    session.commit()
In the MediaLocker example, new rows are created with an auto-increment PK (even for duplicates, which does not conform to normalisation rules). I want a normalised database (even in a small project, just as best practice while learning) but can't find any examples online to study.
How can I achieve the above?
It's up to you to retrieve and assign the existing Person or Card object to the relationship before attempting to add a new one with a duplicate primary key. You can do this with a couple of small changes to your code.
def addLoan(session, data):
    loan = Loan()
    loan.date_due = data["loan"]["date_due"]
    loan.date_issued = data["loan"]["date_issued"]
    loan.issue_reason = data["loan"]["issue_reason"]

    person = session.query(Person).get(data["person"]["id"])
    if not person:
        person = Person()
        person.id = data["person"]["id"]
        person.fname = data["person"]["fname"]
        person.sname = data["person"]["sname"]
    loan.person = person

    card = session.query(Card).get(data["card"]["id"])
    if not card:
        card = Card()
        card.id = data["card"]["id"]
    loan.card = card

    session.add(loan)
    session.commit()
There are also some solutions for get_or_create functions, if you want to wrap it into one step.
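A minimal get_or_create helper along those lines, assuming the models above (the helper name and keyword handling are illustrative, not part of the tutorial):

def get_or_create(session, model, pk, **kwargs):
    # look the row up by primary key; build a pending instance if it is missing
    instance = session.query(model).get(pk)
    if instance is None:
        instance = model(id=pk, **kwargs)
        session.add(instance)
    return instance

# usage inside addLoan():
# loan.person = get_or_create(session, Person, data["person"]["id"],
#                             fname=data["person"]["fname"],
#                             sname=data["person"]["sname"])
# loan.card = get_or_create(session, Card, data["card"]["id"])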
If you're loading large numbers of records into a new database from scratch, and your lookup is more complex than a get() (the session is supposed to cache get() lookups on its own), you could avoid the queries altogether, at the cost of memory, by adding each new Person and Card object to a temporary dict keyed by ID and retrieving the existing objects from there instead of hitting the database.
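A sketch of that caching idea for a bulk load into an empty database (bulk_add_loans and the rows structure are illustrative, mirroring the data dict used above):

def bulk_add_loans(session, rows):
    people = {}  # id -> Person seen so far in this load
    cards = {}   # id -> Card seen so far in this load

    for data in rows:
        pid = data["person"]["id"]
        if pid not in people:
            people[pid] = Person(id=pid,
                                 fname=data["person"]["fname"],
                                 sname=data["person"]["sname"])
        cid = data["card"]["id"]
        if cid not in cards:
            cards[cid] = Card(id=cid)

        loan = Loan(date_due=data["loan"]["date_due"],
                    date_issued=data["loan"]["date_issued"],
                    issue_reason=data["loan"]["issue_reason"],
                    person=people[pid],
                    card=cards[cid])
        session.add(loan)

    session.commit()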