Using SQLAlchemy with an SQLite engine, I've got a self-referential hierarchal table that describes a directory structure.
from sqlalchemy import Column, Integer, String, ForeignKey, Index
from sqlalchemy.orm import column_property, aliased, join
from sqlalchemy.ext.declarative import declarative_base
Base = declarative_base()
class Dr(Base):
__tablename__ = 'directories'
id = Column(Integer, primary_key=True)
name = Column(String)
parent_id = Column(Integer, ForeignKey('directories.id'))
Each Dr row only knows it's own "name" and its "parent_id". I've added a recursive column_property called "path" that returns a string containing all of a Dr's ancestors from the root Dr.
root_anchor = (
select([Dr.id, Dr.name, Dr.parent_id,Dr.name.label('path')])
.where(Dr.parent_id == None).cte(recursive=True)
)
dir_alias = aliased(Dr)
cte_alias = aliased(root_anchor)
path_table = root_anchor.union_all(
select([
dir_alias.id, dir_alias.name,
dir_alias.parent_id, cte_alias.c.path + "/" + dir_alias.name
]).select_from(join(
dir_alias, cte_alias, onclause=cte_alias.c.id==dir_alias.parent_id)
))
)
Dr.path = column_property(
select([path_table.c.path]).where(path_table.c.id==Dr.id)
)
Here's an example of the output:
"""
-----------------------------
| id | name | parent_id |
-----------------------------
| 1 | root | NULL |
-----------------------------
| 2 | kid | 1 |
-----------------------------
| 3 | grandkid | 2 |
-----------------------------
"""
sqllite_engine = create_engine('sqlite:///:memory:')
Session = sessionmaker(bind=sqllite_engine)
session = Session()
instance = session.query(Dr).filter(Dr.name=='grandkid').one()
print(instance.path)
# Outputs: "root/kid/grandkid"
I'd like to be able to add an index, or a least a unique constraint, on the "path" property so that unique paths cannot exist more than once in the table. I've tried:
Index('pathindex', Directory.path, unique=True)
...with no luck. No error is raised, but SQLAlchemy doesn't seem to register the index, it just silently ignores it. It still allows adding a duplicate path, e.g.:
session.add(Dr(name='grandkid', parent_id=2))
session.commit()
As further evidence that the Index() was ignored, inspecting the "indexes" property of the table results in an empty set:
print(Dr.__table__.indexes)
#Outputs: set([])
It's essential to me that duplicate paths cannot exist in the database. I'm not sure whether what I'm trying to do with column_property is possible in SQLAlchemy, and if not I'd love to hear some suggestions on how else I can go about this.
I think unique index should suffice, in class Db
__table_args__ = (UniqueConstraint('parent_id', 'name'), )
Related
How can I merge two rows with same value in one column. Lets say I have a model with ~40 columns like below:
class Model(Base):
__tablename__ = "table"
id = Column(Integer, primary_key=True)
value_a = Column(String)
value_b = Column(String)
value_c = Column(String)
...
And I need to process each time ~500k rows of new data. Also each process creates a new table.
Once inserting the data first time(using session.bulk_insert_mappings(Model, data)) there are duplicated value_c values(max 2), but each time either it has value_a with some string and value_b is empty or value_b with some string and value_a is empty.
After initial insert:
| id | value_a | value_b | value_c |
| -- | ------- | ------- | ------- |
| 1 | foo | None | xyz |
| 2 | None | bar | xyz |
Having all rows I need to merge the rows with common value_c values together to get rid of duplicates.
After update:
| id | value_a | value_b | value_c |
| -- | ------- | ------- | ------- |
| 3 | foo | bar | xyz |
What is the most efficient way to do that? I was using from beginning session.merge(row) for each row but it is to slow and I decided to split it into insert and update stages.
You should be able to insert from a select statement that joins the not null a to the not null b. Then after inserted the combined rows you can delete the old rows. This matches the case you outlined exactly you might need to add more conditions to ignore other entries you might not want inserted or not deleted. (ie. (a, b, c) == (None, None, 'value'))
I used aliased so that i can join the same table against itself.
import sys
from sqlalchemy import (
create_engine,
Integer,
String,
)
from sqlalchemy.schema import (
Column,
)
from sqlalchemy.orm import Session, declarative_base, aliased
from sqlalchemy.sql import select, or_, and_, delete, insert
username, password, db = sys.argv[1:4]
Base = declarative_base()
engine = create_engine(f"postgresql+psycopg2://{username}:{password}#/{db}", echo=True)
metadata = Base.metadata
class Model(Base):
__tablename__ = "table"
id = Column(Integer, primary_key=True)
value_a = Column(String)
value_b = Column(String)
value_c = Column(String)
metadata.create_all(engine)
def print_models(session):
for (model,) in session.execute(select(Model)).all():
print(model.id, model.value_a, model.value_b, model.value_c)
with Session(engine) as session, session.begin():
for (a, b, c) in [('foo', None, 'xyz'), (None, 'bar', 'xyz'), ('leave', 'it', 'asis')]:
session.add(Model(value_a=a, value_b=b, value_c=c))
session.flush()
print_models(session)
with Session(engine) as session, session.begin():
#
# Insert de-nulled entires.
#
left = aliased(Model)
right = aliased(Model)
nulls_joined_q = select(
left.value_a,
right.value_b,
left.value_c
).distinct().select_from(
left
).join(
right,
left.value_c == right.value_c
).where(
and_(
# Ignore entries with no C value.
left.value_c != None,
left.value_b == None,
right.value_a == None))
stmt = insert(
Model.__table__
).from_select([
"value_a",
"value_b",
"value_c"
], nulls_joined_q)
session.execute(stmt)
#
# Remove null entries: All rows where value_c is NOT NULL and either value_a is empty or value b is empty.
#
# #NOTE: This deletes entries where value_a and value_b are BOTH null in the same row as well.
#
stmt = delete(Model.__table__).where(and_(
# Ignore these like we did in insert.
Model.value_c != None,
or_(
Model.value_a == None,
Model.value_b == None),
))
session.execute(stmt)
session.flush()
# Output
print_models(session)
Output
1 foo None xyz
2 None bar xyz
3 leave it asis
#... then
3 leave it asis
4 foo bar xyz
Docs
https://docs.sqlalchemy.org/en/14/core/dml.html#sqlalchemy.sql.expression.Insert.from_select
https://docs.sqlalchemy.org/en/14/orm/query.html#sqlalchemy.orm.aliased
https://docs.sqlalchemy.org/en/14/core/dml.html#sqlalchemy.sql.expression.delete
https://docs.sqlalchemy.org/en/14/core/dml.html#sqlalchemy.sql.expression.insert
I'm using SQLAlchemy to query a number of similar tables, and union the results. The tables are rows of customer information, but our current database structures it so that different groups of customers are in their own tables e.g. client_group1, client_group2, client_group3:
client_group1:
| id | name | email |
| 1 | john | johnsmith#gmail.com |
| 2 | greg | gregjones#gmail.com |
Each of the other tables have identical columns. If I'm using SQLAlchemy declarative_base, I can have a class for client_group1 like the following:
def ClientGroup1(Base):
__tablename__ = 'client_group1'
__table_args__ = {u'schema': 'clients'}
id = Column(Integer, primary_key=True)
name = Column(String(32))
email = Column(String(32))
Then I can do queries such as:
session.query(ClientGroup1.name)
However, if I use union_all to combine a bunch of client tables into a viewport, such as:
query1 = session.query(ClientGroup1.name)
query2 = session.query(ClientGroup2.name)
viewport = union_all(query1, query2)
then I'm not sure how to map a viewport to an object, and instead I have to access viewport columns using:
viewport.c.name
Is there any way to map the viewport to a specific table structure? Especially considering the fact that each class points to a different __table_name__
Read Concrete Table Inheritance documentation for the idea how this can be done. The code below is a running example of how this can be done:
from sqlalchemy import create_engine, Column, String, Integer
from sqlalchemy.orm import sessionmaker, configure_mappers
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.ext.declarative import AbstractConcreteBase
engine = create_engine('sqlite:///:memory:', echo=True)
Session = sessionmaker(bind=engine)
session = Session()
Base = declarative_base(engine)
class ClientGroupBase(AbstractConcreteBase, Base):
pass
class ClientGroup1(ClientGroupBase):
__tablename__ = 'client_group1'
# __table_args__ = {'schema': 'clients'}
__mapper_args__ = {
'polymorphic_identity': 'client_group1',
'concrete': True,
}
id = Column(Integer, primary_key=True)
name = Column(String(32))
email = Column(String(32))
class ClientGroup2(ClientGroupBase):
__tablename__ = 'client_group2'
# __table_args__ = {'schema': 'clients'}
__mapper_args__ = {
'polymorphic_identity': 'client_group2',
'concrete': True,
}
id = Column(Integer, primary_key=True)
name = Column(String(32))
email = Column(String(32))
def _test_model():
# generate classes for all tables
Base.metadata.create_all()
print('-'*80)
# configure mappers (see documentation)
configure_mappers()
print('-'*80)
# add some test data
session.add(ClientGroup1(name="name1"))
session.add(ClientGroup1(name="name1"))
session.add(ClientGroup2(name="name1"))
session.add(ClientGroup2(name="name1"))
session.commit()
print('-'*80)
# perform a query
q = session.query(ClientGroupBase).all()
for r in q:
print(r)
if __name__ == '__main__':
_test_model()
The above example has an added benefit that you can also create new objects, as well as query only some tables.
You could do it mapping an SQL VIEW to a class, but you need to specify a primary key explicitly (see Is possible to mapping view with class using mapper in SqlAlchemy?). In you case, I am afraid, this might not work because of the same PK value in multiple tables, and using a multi-column PK might not be the best idea.
SQLAlchemy is generating, but not enabling, sequences for columns in postgresql. I suspect I may be doing something wrong in engine setup.
Using an example from the SQLAlchemy tutorial (http://docs.sqlalchemy.org/en/rel_0_9/orm/tutorial.html):
#!/usr/bin/env python
from sqlalchemy import create_engine, Column, Integer, String, Sequence
from sqlalchemy.ext.declarative import declarative_base
Base = declarative_base()
class User(Base):
__tablename__ = 'users'
id = Column(Integer, Sequence('user_id_seq'), primary_key=True)
name = Column(String(50))
fullname = Column(String(50))
password = Column(String(12))
def __repr__(self):
return "<User(name='%s', fullname='%s', password='%s')>" % (
self.name, self.fullname, self.password)
db_url = 'postgresql://localhost/serial'
engine = create_engine(db_url, echo=True)
Base.metadata.create_all(engine)
With this script, the following table is generated:
serial=# \d+ users
Table "public.users"
Column | Type | Modifiers | Storage | Stats target | Description
----------+-----------------------+-----------+----------+--------------+-------------
id | integer | not null | plain | |
name | character varying(50) | | extended | |
fullname | character varying(50) | | extended | |
password | character varying(12) | | extended | |
Indexes:
"users_pkey" PRIMARY KEY, btree (id)
Has OIDs: no
However, a sequence was created:
serial=# select sequence_schema,sequence_name,data_type from information_schema.sequences ;
sequence_schema | sequence_name | data_type
-----------------+---------------+-----------
public | user_id_seq | bigint
SQLAlchemy 0.9.1, Python 2.7.5+, Postgresql 9.3.1, Ubuntu 13.10
-Reece
this is because you provided it with an explicit Sequence. The SERIAL datatype in postgresql generates its own sequence, which SQLAlchemy knows how to locate - so if you omit the Sequence, SQLAlchemy will render SERIAL, assuming the intent is that the column is auto-incrementing (which is determined by the autoincrement argument in conjunction with Integer primary_key; it defaults to True). But when Sequence is passed, SQLAlchemy sees the intent that you don't want the sequence implicitly created by SERIAL but instead the one you are specifying:
from sqlalchemy import create_engine, Column, Integer, String, Sequence
from sqlalchemy.ext.declarative import declarative_base
Base = declarative_base()
class T1(Base):
__tablename__ = 't1'
# emits CREATE SEQUENCE + INTEGER
id = Column(Integer, Sequence('user_id_seq'), primary_key=True)
class T2(Base):
__tablename__ = 't2'
# emits SERIAL
id = Column(Integer, primary_key=True)
class T3(Base):
__tablename__ = 't3'
# emits INTEGER
id = Column(Integer, autoincrement=False, primary_key=True)
engine = create_engine("postgresql://scott:tiger#localhost/test", echo=True)
Base.metadata.create_all(engine)
output:
CREATE SEQUENCE user_id_seq
CREATE TABLE t1 (
id INTEGER NOT NULL,
PRIMARY KEY (id)
)
CREATE TABLE t2 (
id SERIAL NOT NULL,
PRIMARY KEY (id)
)
CREATE TABLE t3 (
id INTEGER NOT NULL,
PRIMARY KEY (id)
)
If you need to create the sequence explicitly for some reason, like setting a start value, and still want the same default value behavior as when using the Column(Integer, primary_key=True) notation, it can be accomplished with the following code:
#!/usr/bin/env python
from sqlalchemy import create_engine, Column, Integer, String, Sequence
from sqlalchemy.ext.declarative import declarative_base
Base = declarative_base()
USER_ID_SEQ = Sequence('user_id_seq') # define sequence explicitly
class User(Base):
__tablename__ = 'users'
# use sequence in column definition, and pass .next_value() as server_default
id = Column(Integer, USER_ID_SEQ, primary_key=True, server_default=USER_ID_SEQ.next_value())
name = Column(String(50))
fullname = Column(String(50))
password = Column(String(12))
def __repr__(self):
return "<User(name='%s', fullname='%s', password='%s')>" % (
self.name, self.fullname, self.password)
db_url = 'postgresql://localhost/serial'
engine = create_engine(db_url, echo=True)
Base.metadata.create_all(engine)
Reece
I also used that tutorial as a model, and just could not get it to work with any Postgres tables that already existed and had key ID columns with serial sequences to generate the new key ID values.
Like David, I found the Sequence had to be defined separately to the class. For anyone using the "db.Model" approach, here's one example.
from flask.ext.sqlalchemy import SQLAlchemy
from sqlalchemy import Sequence
db = SQLAlchemy()
pageimpression_imp_id_seq = Sequence('pageimpression_imp_id_seq')
class PageImpression(db.Model):
__tablename__ = 'pageimpression'
imp_id = db.Column(db.Integer,
pageimpression_imp_id_seq,
server_default=usersession_sessionid_seq.next_value(),primary_key=True)
logdate = db.Column(db.DateTime)
sessionid = db.Column(db.String)
path = db.Column(db.String)
referrer = db.Column(db.String)
def __init__(self, imp_id, logdate, sessionid, path, referrer):
self.imp_id = imp_id
self.logdate = logdate
self.sessionid = sessionid
self.path = path
self.referrer = referrer
def __repr__(self):
return "<PageImpression(imp_id='%s', logdate='%s',sessionid='%s', path='%s', referrer='%s')>" % (self.imp_id, self.logdate, self.sessionid, self.path, self.referrer)
def PageImpressionAdd(sessionid):
sessionid = 0 # dummy value for unit testing
current_time = datetime.now().isoformat()
if CurrentConfig.IMPRESSION_LOGGING_ON == True:
path = request.path
if request.environ.get('HTTP_REFERER') and not request.environ.get('HTTP_REFERER').isspace():
referrer = request.environ.get('HTTP_REFERER') # the string is not-empty
else:
referrer = '' # the string is empty
from website.models import PageImpression
thisPageImpression = PageImpression(None,current_time,sessionid, path, referrer)
db.session.add(thisPageImpression)
db.session.commit()
# get the values created by the Postgres table defaults
imp_id = thisPageImpression.imp_id
logdate = thisPageImpression.logdate
return current_time
You can also change Sequence without any SQL script by GUI pgAdmin as below:
select your DB -> Schemas -> public -> Sequences -> right click -> properties -> Definition -> Current value.
I'm using MySQL with SQLAlchemy. I have a class defined like so:
Base = sqlalchemy.ext.declarative.declarative_base()
class process(Base):
__tablename__ = 'processes'
process = sqlalchemy.Column(sqlalchemy.Integer, primary_key=True, nullable=False)
get_javascript = sqlalchemy.Column(sqlalchemy.types.Boolean, nullable=False)
With my schema defined like so:
CREATE TABLE processes
(
process Mediumint NOT NULL AUTO_INCREMENT,
get_javascript Varchar(1) NOT NULL,
PRIMARY KEY (process)
) ENGINE = InnoDB
In my database, I have the following rows:
+---------+----------------+
| process | get_javascript |
+---------+----------------+
| 17 | 0 |
| 18 | 1 |
+---------+----------------+
Querying them in Python always gives me true for the get_javascript field.
>>> for i in s.query(db_classes.process).all():
... print i.process, i.get_javascript
...
17 True
18 True
Apparently, SQLAlchemy doesn't like it when you use VarChar(1) for a boolean field. Switched it to BOOLEAN and recreated the database. It worked.
I'm looking for a SQLAlchemy only solution for converting a dict received from a form submission into a series of rows in the database, one for each field submitted. This is to handle preferences and settings that vary widely across applications. But, it's very likely applicable to creating pivot table like functionality. I've seen this type of thing in ETL tools but I was looking for a way to do it directly in the ORM. I couldn't find any documentation on it but maybe I missed something.
Example:
Submitted from form: {"UniqueId":1, "a":23, "b":"Hello", "c":"World"}
I would like it to be transformed (in the ORM) so that it is recorded in the database like this:
_______________________________________
|UniqueId| ItemName | ItemValue |
---------------------------------------
| 1 | a | 23 |
---------------------------------------
| 1 | b | Hello |
---------------------------------------
| 1 | c | World |
---------------------------------------
Upon a select the result would be transformed (in the ORM) back into a row of data from each of the individual values.
---------------------------------------------------
| UniqueId | a | b | c |
---------------------------------------------------
| 1 | 23 | Hello | World |
---------------------------------------------------
I would assume on an update that the best course of action would be to wrap a delete/create in a transaction so the current records would be removed and the new ones inserted.
The definitive list of ItemNames will be maintained in a separate table.
Totally open to more elegant solutions but would like to keep out of the database side if at all possible.
I'm using the declarative_base approach with SQLAlchemy.
Thanks in advance...
Cheers,
Paul
Here is a slightly modified example from documentation to work with such table structure mapped to dictionary in model:
from sqlalchemy import *
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm.collections import attribute_mapped_collection
from sqlalchemy.ext.associationproxy import association_proxy
from sqlalchemy.orm import relation, sessionmaker
metadata = MetaData()
Base = declarative_base(metadata=metadata, name='Base')
class Item(Base):
__tablename__ = 'Item'
UniqueId = Column(Integer, ForeignKey('ItemSet.UniqueId'),
primary_key=True)
ItemSet = relation('ItemSet')
ItemName = Column(String(10), primary_key=True)
ItemValue = Column(Text) # Use PickleType?
def _create_item(ItemName, ItemValue):
return Item(ItemName=ItemName, ItemValue=ItemValue)
class ItemSet(Base):
__tablename__ = 'ItemSet'
UniqueId = Column(Integer, primary_key=True)
_items = relation(Item,
collection_class=attribute_mapped_collection('ItemName'))
items = association_proxy('_items', 'ItemValue', creator=_create_item)
engine = create_engine('sqlite://', echo=True)
metadata.create_all(engine)
session = sessionmaker(bind=engine)()
data = {"UniqueId": 1, "a": 23, "b": "Hello", "c": "World"}
s = ItemSet(UniqueId=data.pop("UniqueId"))
s.items = data
session.add(s)
session.commit()