I am new to ORM with SQLAlchemy; until now I have only worked with raw SQL. I have three database tables, Label, Position, and DataSet, with the following corresponding Python classes:
class Label(Base):
    __tablename__ = 'Label'
    id = Column(Integer, primary_key=True)
    name = Column(String, nullable=False, unique=True)

class Position(Base):
    __tablename__ = 'Position'
    id = Column(Integer, primary_key=True)
    name = Column(String, nullable=False, unique=True)

class DataSet(Base):
    __tablename__ = 'DataSet'
    id = Column(Integer, primary_key=True)
    label_id = Column(Integer, ForeignKey('Label.id'))
    position_id = Column(Integer, ForeignKey('Position.id'))
    timestamp = Column(Integer, nullable=False)
But in my service, I don't expose label_id and position_id, so I made a new class, Data, that holds the label and position as strings.
# Not a full class to only show my concept
class Data:
    # data dictionary will have data
    def __init__(self, **kwargs):
        # So it doesn't have ids. Label and Position as string
        keys = {'label', 'position', 'timestamp'}
        self.data = {k: kwargs[k] for k in keys}

    # An example of inserting data.
    # skipped detail and error handling to clarify
    def insert(self):
        session = Session()
        # get id of label and position
        # remember that it returns a tuple, not a single value
        self.data['label_id'] = session.query(Label.id).\
            filter(Label.name == self.data['label']).one_or_none()
        self.data['position_id'] = session.query(Position.id).\
            filter(Position.name == self.data['position']).one_or_none()
        # add new dataset
        self.data.pop('label')
        self.data.pop('position')
        new_data = DataSet(**self.data)
        session.add(new_data)
        session.commit()
But it looks somewhat ugly, and I think there should be a simpler way to do it. Is there any way to combine these table classes using SQLAlchemy APIs?
You can use relationships and association proxies to make links from DataSet to Label and Position objects:
from sqlalchemy.orm import relationship
from sqlalchemy.ext.associationproxy import association_proxy
class DataSet(Base):
    __tablename__ = 'DataSet'
    id = Column(Integer, primary_key=True)
    label_id = Column(Integer, ForeignKey('Label.id'))
    label = relationship('Label')
    label_name = association_proxy('label', 'name')
    position_id = Column(Integer, ForeignKey('Position.id'))
    position = relationship('Position')
    position_name = association_proxy('position', 'name')
    timestamp = Column(Integer, nullable=False)
After this you can access Label and Position objects linked to DataSet (and their names) through new attributes:
>>> d = session.query(DataSet).first()
>>> d.position
<Position object at 0x7f3021a9ed30>
>>> d.position_name
'position1'
Inserting DataSet objects is unfortunately less elegant. You can pass a creator function to association_proxy that takes a name and creates or retrieves the corresponding object (found in this answer):
def _label_creator(name):
    session = Session()
    label = session.query(Label).filter_by(name=name).first()
    if not label:
        label = Label(name=name)
        session.add(label)
        session.commit()
    session.close()
    return label
label_name = association_proxy('label', 'name', creator=_label_creator)
After specifying creator functions for both proxies you can create new DataSet objects this way:
dataset = DataSet(
    label_name='label1',
    position_name='position2',
    timestamp=datetime.datetime.now()
)
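As a rough end-to-end sketch of the creator approach, the snippet below runs against an in-memory SQLite database. The helper name get_or_create_label, the scoped_session setup, and the integer timestamps are assumptions made for illustration; unlike the creator above, it neither commits nor closes the session inside the helper and leaves that to the caller.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine
from sqlalchemy.ext.associationproxy import association_proxy
from sqlalchemy.orm import declarative_base, relationship, scoped_session, sessionmaker

Base = declarative_base()
engine = create_engine("sqlite://")  # in-memory database, for illustration only
Session = scoped_session(sessionmaker(bind=engine))


class Label(Base):
    __tablename__ = "Label"
    id = Column(Integer, primary_key=True)
    name = Column(String, nullable=False, unique=True)


def get_or_create_label(name):
    # Hypothetical helper: reuse an existing Label row or stage a new one.
    session = Session()  # scoped_session returns the same session each time
    label = session.query(Label).filter_by(name=name).first()
    if label is None:
        label = Label(name=name)
        session.add(label)
    return label


class DataSet(Base):
    __tablename__ = "DataSet"
    id = Column(Integer, primary_key=True)
    label_id = Column(Integer, ForeignKey("Label.id"))
    label = relationship("Label")
    # Assigning a name when no Label is attached calls the creator with that name.
    label_name = association_proxy("label", "name", creator=get_or_create_label)
    timestamp = Column(Integer, nullable=False)


Base.metadata.create_all(engine)

session = Session()
first = DataSet(label_name="label1", timestamp=1)
second = DataSet(label_name="label1", timestamp=2)
session.add_all([first, second])
session.commit()

label_count = session.query(Label).count()
same_label = first.label is second.label
```

Both DataSet rows end up pointing at a single Label row, because the second lookup autoflushes the pending Label before querying.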
The data in my source code has the format below:
arrays nested in dicts, nested in arrays, and so on.
# data structure
user_list = [{'user_name': 'A',
              'email': 'aaa@aaa.com',
              'items': [{'name': 'a_item1', 'properties': [{1....}, {2....}...]}
                        ]}] * 100
I'm trying to put the above data into a postgresql db with SQLAlchemy.
There is a user table, an item (entity) table, and a properties (attribute) table.
And there are tables that link users and items, and items and properties respectively.
for u in user_list:
    new_user = User(user_name=u.get('user_name'),....)
    session.add(new_user)
    session.flush()
    for item in u.get('items'):
        new_item = Item(name=item.get('name'),.....)
        session.add(new_item)
        session.flush()
        new_item_link = UserItemLink(user_id=new_user.id, item_id=new_item.id,...)
        session.add(new_item_link)
        session.flush()
        # item is a dict, so read its properties with .get()
        for prop in item.get('properties'):
            new_properties = Properties(name=prop.get('name'),...)
            session.add(new_properties)
            session.flush()
            new_prop_link = ItemPropLink(item_id=new_item.id, prop_id=new_properties.id,...)
            session.add(new_prop_link)
            session.flush()
session.commit()
My models look like this:
class User(Base):
    __tablename__ = 'user'
    id = Column(Integer, Identity(always=True, start=1, increment=1, minvalue=1, maxvalue=2147483647, cycle=False, cache=1), primary_key=True)
    name = Column(String(20))
    email = Column(String(50))
    user_item_link = relationship('UserItemLink', back_populates='user')

class Item(Base):
    __tablename__ = 'item'
    id = Column(Integer, Identity(always=True, start=1, increment=1, minvalue=1, maxvalue=2147483647, cycle=False, cache=1), primary_key=True)
    name = Column(String(50))
    note = Column(String(50))
    user_item_link = relationship('UserItemLink', back_populates='item')

class Properties(Base):
    __tablename__ = 'properties'
    id = Column(Integer, Identity(always=True, start=1, increment=1, minvalue=1, maxvalue=2147483647, cycle=False, cache=1), primary_key=True)
    name = Column(String(50))
    value = Column(String(50))
    item_prop_link = relationship('ItemPropLink', back_populates='properties')

class UserItemLink(Base):
    __tablename__ = 'user_item_link'
    id = Column(Integer, Identity(always=True, start=1, increment=1, minvalue=1, maxvalue=2147483647, cycle=False, cache=1), primary_key=True)
    user_id = Column(ForeignKey('db.user.id'), nullable=False)
    item_id = Column(ForeignKey('db.item.id'), nullable=False)
The code above has been simplified for clarity.
When session.add() and session.flush() are called sequentially like this, the insert takes a long time.
Inserting 100 users takes 8 seconds or more.
Please advise how to improve the speed of the Python and SQLAlchemy code.
As you have relationships configured on the models you can compose complex objects using these relationships instead of relying on ids:
with Session() as s, s.begin():
    for u in user_list:
        user_item_links = []
        for item in u.get('items'):
            item_prop_links = []
            for prop in item['properties']:
                item_prop_link = ItemPropLink()
                item_prop_link.properties = Properties(name=prop.get('name'), value=prop.get('value'))
                item_prop_links.append(item_prop_link)
            item = Item(name=item.get('name'), item_prop_link=item_prop_links)
            user_item_link = UserItemLink()
            user_item_link.item = item
            user_item_links.append(user_item_link)
        new_user = User(name=u.get('user_name'), email=u.get('email'), user_item_link=user_item_links)
        s.add(new_user)
SQLAlchemy will automatically set the foreign keys when the session is flushed at commit time, removing the need to manually flush.
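To see this behaviour in isolation, here is a reduced sketch with only users, items, and the link table, run against in-memory SQLite. The model and relationship names (item_links, user_links) and the sample data are simplified assumptions, not the asker's real schema.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()


class User(Base):
    __tablename__ = "user"
    id = Column(Integer, primary_key=True)
    name = Column(String(20))
    item_links = relationship("UserItemLink", back_populates="user")


class Item(Base):
    __tablename__ = "item"
    id = Column(Integer, primary_key=True)
    name = Column(String(50))
    user_links = relationship("UserItemLink", back_populates="item")


class UserItemLink(Base):
    __tablename__ = "user_item_link"
    id = Column(Integer, primary_key=True)
    user_id = Column(ForeignKey("user.id"), nullable=False)
    item_id = Column(ForeignKey("item.id"), nullable=False)
    user = relationship("User", back_populates="item_links")
    item = relationship("Item", back_populates="user_links")


engine = create_engine("sqlite://")  # in-memory database, for illustration only
Base.metadata.create_all(engine)

user_list = [
    {"user_name": "A", "items": [{"name": "a_item1"}, {"name": "a_item2"}]},
    {"user_name": "B", "items": [{"name": "b_item1"}]},
]

# Build the whole object graph in memory; no ids are read and no flush is called.
with Session(engine) as s, s.begin():
    for u in user_list:
        links = [UserItemLink(item=Item(name=i["name"])) for i in u["items"]]
        s.add(User(name=u["user_name"], item_links=links))

# The foreign keys were filled in when the transaction flushed at commit time.
with Session(engine) as s:
    link_count = s.query(UserItemLink).count()
    item_names = sorted(name for (name,) in s.query(Item.name))
```

Only the User objects are added explicitly; the links and items reach the session through the default save-update cascade on the relationships.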
Suppose we have a simple one-to-many relationship between Company and Employee, is there a way to query all companies and have a list of employees in the attribute of each company?
class Company(Base):
    __tablename__ = 'company'
    id = Column(Integer, primary_key=True)
    name = Column(String)

class Employee(Base):
    __tablename__ = 'employee'
    id = Column(Integer, primary_key=True)
    first_name = Column(String)
    last_name = Column(String)
    company_id = Column(Integer, ForeignKey(Company.id))
I'm looking for something like this:
>>> result = db.session.query(Company).join(Employee).all()
>>> result[0].Employee
[<Employee object at 0x...>, <Employee object at 0x...>]
The size of result should be the same as the number of rows in the company table.
I tried the following and it gives tuples of objects (which makes sense) instead of a nice parent / child structure:
>>> db.session.query(Company, Employee).filter(Company.id == Employee.company_id).all()
It's not hard to convert this into my desired object structure, but I wanted to see if there's a shortcut.
You have to configure the relationship in the parent class:
class Company(Base):
    __tablename__ = 'company'
    id = Column(Integer, primary_key=True)
    name = Column(String)
    employees = relationship('Employee', lazy='joined') # <<< Add this line
Then you can query it without a join:
companies = session.query(Company).all()
print(companies[0].employees)
Documentation:
https://docs.sqlalchemy.org/en/13/orm/loading_relationships.html
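If eager loading should not be the default for every query, a loader option can request it per query instead. The sketch below swaps lazy='joined' for the selectinload option on an otherwise default relationship; the sample companies and the in-memory SQLite engine are assumptions for illustration.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base, relationship, selectinload

Base = declarative_base()


class Company(Base):
    __tablename__ = "company"
    id = Column(Integer, primary_key=True)
    name = Column(String)
    employees = relationship("Employee")  # default (lazy) loading


class Employee(Base):
    __tablename__ = "employee"
    id = Column(Integer, primary_key=True)
    first_name = Column(String)
    last_name = Column(String)
    company_id = Column(Integer, ForeignKey("company.id"))


engine = create_engine("sqlite://")  # in-memory database, for illustration only
Base.metadata.create_all(engine)

with Session(engine) as session, session.begin():
    session.add_all([
        Company(name="Acme", employees=[
            Employee(first_name="Ann", last_name="Lee"),
            Employee(first_name="Bob", last_name="Ray"),
        ]),
        Company(name="Empty Inc."),
    ])

with Session(engine) as session:
    # One row per company; employees are fetched eagerly in a second SELECT.
    companies = (
        session.query(Company)
        .options(selectinload(Company.employees))
        .order_by(Company.id)
        .all()
    )
    staff = {c.name: sorted(e.first_name for e in c.employees) for c in companies}
```

The result has one entry per company, including companies without employees, which matches the size requirement in the question.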
You could do something like this:
class Company(Base):
    __tablename__ = 'company'
    id = Column(Integer, primary_key=True)
    name = Column(String)

    @property
    def employee_list(self):
        # Look up this company's employees and format their full names
        employees = db.session.query(Employee).filter(Employee.company_id == self.id).all()
        return ['{0} {1}'.format(e.first_name, e.last_name) for e in employees]
Then you can access an employee name with company.employee_list[0], where company is a Company instance.
I have 2 tables with the same column structure.
The script pulls from 2 different json sources with slightly different keys.
My Item class identifies the source and then parses the data.
In my Item class I want to be able to change the __tablename__ based on the data source.
Is this possible or do I need to write a separate class for each data source?
Thanks,
Code:
Base = declarative_base()
class Item(Base):
    __tablename__ = 'products'
    timestamp = Column(TIMESTAMP)
    itemid = Column(String, primary_key=True, index=True, unique=True)
    name = Column(String)

    def __init__(self, item):
        if type(item) == Product_A:
            self.__tablename__ = "A_products"
            # Parse Data
        elif type(item) == Product_B:
            self.__tablename__ = "B_products"
            # Parse Data
This is not a good idea: in SQLAlchemy, each class should be mapped to a single table. A solution is to make two classes and a free function to dispatch between them:
Base = declarative_base()
class Item_A(Base):
    __tablename__ = 'A_products'
    timestamp = Column(TIMESTAMP)
    itemid = Column(String, primary_key=True, index=True, unique=True)
    name = Column(String)

class Item_B(Base):
    __tablename__ = 'B_products'
    timestamp = Column(TIMESTAMP)
    itemid = Column(String, primary_key=True, index=True, unique=True)
    name = Column(String)

def create_item_object(item):
    if isinstance(item, Product_A):
        result = Item_A()
        #... more stuff
    elif isinstance(item, Product_B):
        result = Item_B()
        #... more stuff
    return result
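Since both tables share the same columns, the duplication between the two classes can probably be reduced with a declarative mixin. This is a variation on the answer above rather than part of it; ItemColumns, ProductA, ProductB, and MODEL_FOR_SOURCE are hypothetical names, and the two Product classes are empty stand-ins for the question's source types.

```python
from sqlalchemy import TIMESTAMP, Column, String
from sqlalchemy.orm import declarative_base

Base = declarative_base()


class ItemColumns:
    """Mixin holding the columns shared by both product tables."""
    timestamp = Column(TIMESTAMP)
    itemid = Column(String, primary_key=True)
    name = Column(String)


class ItemA(ItemColumns, Base):
    __tablename__ = "A_products"


class ItemB(ItemColumns, Base):
    __tablename__ = "B_products"


# Hypothetical stand-ins for the question's Product_A / Product_B source types.
class ProductA:
    pass


class ProductB:
    pass


# Dispatch table: which mapped class stores data from which source type.
MODEL_FOR_SOURCE = {ProductA: ItemA, ProductB: ItemB}


def create_item_object(source, **fields):
    # Pick the mapped class for this source, then build an (unsaved) row object.
    model = MODEL_FOR_SOURCE[type(source)]
    return model(**fields)


item = create_item_object(ProductB(), itemid="42", name="widget")
```

Each mapped class still corresponds to exactly one table; the mixin only avoids repeating the column definitions.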
How can I add objects to a relationship in the constructor? The id is not yet available when the constructor runs. In simpler cases it is possible to just provide a list computed beforehand. In the example below, complex_cls_method stands for a more involved factory method that should be treated as a black box.
from sqlalchemy import create_engine, MetaData, Column, Integer, String, ForeignKey
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import relationship
from sqlalchemy.orm import sessionmaker
DB_URL = "mysql://user:password@localhost/exampledb?charset=utf8"
engine = create_engine(DB_URL, encoding='utf-8', convert_unicode=True, pool_recycle=3600, pool_size=10)
session = sessionmaker(autocommit=False, autoflush=False, bind=engine)()
Model = declarative_base()
class User(Model):
    __tablename__ = 'user'
    id = Column(Integer, primary_key=True)
    simple = Column(String(255))
    main_address = Column(String(255))
    addresses = relationship("Address",
                             cascade="all, delete-orphan")

    def __init__(self, addresses, simple):
        self.simple = simple
        self.main_address = addresses[0]
        return  # because the following does not work
        self.addresses = Address.complex_cls_method(
            user_id_=self.id,  # <-- this does not work of course
            key_="address",
            value_=addresses
        )
class Address(Model):
    __tablename__ = 'address'
    id = Column(Integer, primary_key=True)
    keyword = Column(String(255))
    value = Column(String(255))
    user_id = Column(Integer, ForeignKey('user.id'), nullable=False)
    parent_id = Column(Integer, ForeignKey('address.id'), nullable=True)

    @classmethod
    def complex_cls_method(cls, user_id_, key_, value_):
        main = Address(keyword=key_, value="", user_id=user_id_, parent_id=None)
        session.add_all([main])
        session.flush()
        addrs = [Address(keyword=key_, value=item, user_id=user_id_, parent_id=main.id) for item in value_]
        session.add_all(addrs)
        return [main] + addrs
if __name__ == "__main__":
    # Model.metadata.create_all(engine)
    user = User([u"address1", u"address2"], "simple")
    session.add(user)
    session.flush()
    # as it can't be done in constructor, these additional statements needed
    user.addresses = Address.complex_cls_method(
        user_id_=user.id,
        key_="address",
        value_=[u"address1", u"address2"]
    )
    session.commit()
The question is: is there a syntactically elegant (and technically sound) way to do this in User's constructor, or is it safer to call a separate method of the User class after session.flush to add the desired objects to the relationship (as in the example code)?
Giving up on the constructor altogether is still possible, but it is a less desirable option because the resulting signature change would require significant refactoring.
Instead of manually flushing and setting ids etc. you could let SQLAlchemy handle persisting your object graph. You'll just need one more adjacency list relationship in Address and you're all set:
class User(Model):
    __tablename__ = 'user'
    id = Column(Integer, primary_key=True)
    simple = Column(String(255))
    main_address = Column(String(255))
    addresses = relationship("Address",
                             cascade="all, delete-orphan")

    def __init__(self, addresses, simple):
        self.simple = simple
        self.main_address = addresses[0]
        self.addresses = Address.complex_cls_method(
            key="address",
            values=addresses
        )
class Address(Model):
    __tablename__ = 'address'
    id = Column(Integer, primary_key=True)
    keyword = Column(String(255))
    value = Column(String(255))
    user_id = Column(Integer, ForeignKey('user.id'), nullable=False)
    parent_id = Column(Integer, ForeignKey('address.id'), nullable=True)
    # For handling parent/child relationships in factory method
    parent = relationship("Address", remote_side=[id])

    @classmethod
    def complex_cls_method(cls, key, values):
        main = cls(keyword=key, value="")
        addrs = [cls(keyword=key, value=item, parent=main) for item in values]
        return [main] + addrs
if __name__ == "__main__":
    user = User([u"address1", u"address2"], "simple")
    session.add(user)
    session.commit()
    print(user.addresses)
Note the absence of manual flushes etc. SQLAlchemy automatically figures out the required order of insertions based on the object relationships, so that dependencies between rows can be honoured. This is a part of the Unit of Work pattern.
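The ordering behaviour can be checked with a stripped-down, self-referential Address model. This is a sketch under assumptions: the user relationship is left out, the database is in-memory SQLite, and expire_on_commit=False is used only so the attributes stay readable after the session closes.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()


class Address(Base):
    __tablename__ = "address"
    id = Column(Integer, primary_key=True)
    value = Column(String(255))
    parent_id = Column(Integer, ForeignKey("address.id"), nullable=True)
    # Many-to-one side of the adjacency list: each child points at its parent.
    parent = relationship("Address", remote_side=[id])


engine = create_engine("sqlite://")  # in-memory database, for illustration only
Base.metadata.create_all(engine)

main = Address(value="")
children = [Address(value=v, parent=main) for v in ("address1", "address2")]

with Session(engine, expire_on_commit=False) as session:
    # Only the children are added; the parent follows via the save-update cascade
    # and is inserted first so that its id exists when the children are written.
    session.add_all(children)
    session.commit()

parent_ids = {child.parent_id for child in children}
```

After the commit, every child carries the parent's generated id even though no id was ever assigned by hand.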
I have two related classes as below:
class IP(Base):
    __tablename__ = 'ip'
    id = Column(Integer, primary_key=True)
    value = Column(String, unique=True)
    topics = relationship('Topic')

class Topic(Base):
    __tablename__ = 'topic'
    id = Column(Integer, primary_key=True)
    value = Column(String)
    ip_id = Column(Integer, ForeignKey('ip.id'))
    ip = relationship('IP')

if __name__ == '__main__':
    Base.metadata.create_all(engine)
    topics = [
        Topic(value='t1', ip=IP(value='239.255.48.1')),
        Topic(value='t2', ip=IP(value='239.255.48.1')),
        Topic(value='t3', ip=IP(value='239.255.48.1'))
    ]
    session.add_all(topics)
The above doesn't work because it tries to add different IP entries with the same value. Is it possible to create a new entry or get the existing one, so that I can write something like the code below?
topics = [
    Topic(value='t1', ip=create_or_get(value='239.255.48.1')),
    Topic(value='t2', ip=create_or_get(value='239.255.48.1')),
    Topic(value='t3', ip=create_or_get(value='239.255.48.1'))
]
Sure, just create the function:
def create_or_get(value):
    obj = session.query(IP).filter(IP.value==value).first()
    if not obj:
        obj = IP(value=value)
        session.add(obj)
    return obj
Of course, it needs a session, but if you use the scoped_session factory, this is straightforward. Alternatively, you might look into events, but that gets too complicated for the problem at hand.
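A minimal sketch of the scoped_session variant follows, assuming an in-memory SQLite engine and a module-level Session registry (both are illustrative choices). The IP.topics relationship from the question is left out to keep the example short.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine
from sqlalchemy.orm import declarative_base, relationship, scoped_session, sessionmaker

Base = declarative_base()
engine = create_engine("sqlite://")  # in-memory database, for illustration only
# The registry hands out one shared session per thread, so helpers need no argument.
Session = scoped_session(sessionmaker(bind=engine))


class IP(Base):
    __tablename__ = "ip"
    id = Column(Integer, primary_key=True)
    value = Column(String, unique=True)


class Topic(Base):
    __tablename__ = "topic"
    id = Column(Integer, primary_key=True)
    value = Column(String)
    ip_id = Column(Integer, ForeignKey("ip.id"))
    ip = relationship("IP")


Base.metadata.create_all(engine)


def create_or_get(value):
    # The query autoflushes, so an IP staged by an earlier call is found here.
    obj = Session.query(IP).filter(IP.value == value).first()
    if obj is None:
        obj = IP(value=value)
        Session.add(obj)
    return obj


topics = [Topic(value=f"t{n}", ip=create_or_get("239.255.48.1")) for n in (1, 2, 3)]
Session.add_all(topics)
Session.commit()

ip_count = Session.query(IP).count()
topic_ip_ids = {t.ip_id for t in topics}
Session.remove()  # dispose of the thread-local session when done
```

All three topics share one IP row. Note that this relies on the session's default autoflush; with autoflush disabled, a pending IP would not be visible to the second lookup.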