SQLAlchemy Column to Row Transformation and vice versa -- is it possible? - python

I'm looking for a SQLAlchemy-only solution for converting a dict received from a form submission into a series of rows in the database, one for each field submitted. This is to handle preferences and settings that vary widely across applications, but it's also likely applicable to creating pivot-table-like functionality. I've seen this type of thing in ETL tools, but I was looking for a way to do it directly in the ORM. I couldn't find any documentation on it, but maybe I missed something.
Example:
Submitted from form: {"UniqueId":1, "a":23, "b":"Hello", "c":"World"}
I would like it to be transformed (in the ORM) so that it is recorded in the database like this:
-----------------------------------
| UniqueId | ItemName | ItemValue |
-----------------------------------
|        1 | a        | 23        |
-----------------------------------
|        1 | b        | Hello     |
-----------------------------------
|        1 | c        | World     |
-----------------------------------
Upon a select the result would be transformed (in the ORM) back into a row of data from each of the individual values.
---------------------------------
| UniqueId | a  | b     | c     |
---------------------------------
|        1 | 23 | Hello | World |
---------------------------------
I would assume on an update that the best course of action would be to wrap a delete/create in a transaction so the current records would be removed and the new ones inserted.
The definitive list of ItemNames will be maintained in a separate table.
Totally open to more elegant solutions but would like to keep out of the database side if at all possible.
I'm using the declarative_base approach with SQLAlchemy.
Thanks in advance...
Cheers,
Paul

Here is a slightly modified example from the documentation, adapted so that such a table structure is mapped to a dictionary on the model:
from sqlalchemy import *
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm.collections import attribute_mapped_collection
from sqlalchemy.ext.associationproxy import association_proxy
from sqlalchemy.orm import relationship, sessionmaker

metadata = MetaData()
Base = declarative_base(metadata=metadata, name='Base')

class Item(Base):
    __tablename__ = 'Item'
    UniqueId = Column(Integer, ForeignKey('ItemSet.UniqueId'),
                      primary_key=True)
    ItemSet = relationship('ItemSet')
    ItemName = Column(String(10), primary_key=True)
    ItemValue = Column(Text)  # Use PickleType?

def _create_item(ItemName, ItemValue):
    return Item(ItemName=ItemName, ItemValue=ItemValue)

class ItemSet(Base):
    __tablename__ = 'ItemSet'
    UniqueId = Column(Integer, primary_key=True)

    # Dict of Item rows keyed by ItemName...
    _items = relationship(Item,
                          collection_class=attribute_mapped_collection('ItemName'))
    # ...exposed as a plain {ItemName: ItemValue} mapping.
    items = association_proxy('_items', 'ItemValue', creator=_create_item)

engine = create_engine('sqlite://', echo=True)
metadata.create_all(engine)
session = sessionmaker(bind=engine)()

data = {"UniqueId": 1, "a": 23, "b": "Hello", "c": "World"}
s = ItemSet(UniqueId=data.pop("UniqueId"))
s.items = data  # each remaining key/value pair becomes one Item row
session.add(s)
session.commit()
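Reading the rows back gives you the dict shape again through the same proxy. A minimal sketch of the round trip (note the values come back as strings, since ItemValue is a Text column; and for the update case, replacing the whole dict only cleans up the old rows if _items is declared with cascade='all, delete-orphan'):
s = session.query(ItemSet).get(1)
print(dict(s.items))  # {'a': '23', 'b': 'Hello', 'c': 'World'}

# With cascade='all, delete-orphan' on _items, assigning a new dict
# deletes the old Item rows and inserts the new ones in one transaction --
# the delete/create wrap described in the question:
s.items = {"a": "42", "d": "Again"}
session.commit()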

How to create an index on a SQLAlchemy column_property?

Using SQLAlchemy with an SQLite engine, I've got a self-referential hierarchical table that describes a directory structure.
from sqlalchemy import (Column, Integer, String, ForeignKey, Index,
                        select, create_engine)
from sqlalchemy.orm import column_property, aliased, join, sessionmaker
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class Dr(Base):
    __tablename__ = 'directories'
    id = Column(Integer, primary_key=True)
    name = Column(String)
    parent_id = Column(Integer, ForeignKey('directories.id'))
Each Dr row only knows its own "name" and its "parent_id". I've added a recursive column_property called "path" that returns a string containing all of a Dr's ancestors, starting from the root Dr.
root_anchor = (
    select([Dr.id, Dr.name, Dr.parent_id, Dr.name.label('path')])
    .where(Dr.parent_id == None).cte(recursive=True)
)
dir_alias = aliased(Dr)
cte_alias = aliased(root_anchor)
path_table = root_anchor.union_all(
    select([
        dir_alias.id, dir_alias.name,
        dir_alias.parent_id, cte_alias.c.path + "/" + dir_alias.name
    ]).select_from(join(
        dir_alias, cte_alias, onclause=cte_alias.c.id == dir_alias.parent_id
    ))
)
Dr.path = column_property(
    select([path_table.c.path]).where(path_table.c.id == Dr.id)
)
Here's an example of the output:
"""
-----------------------------
| id | name | parent_id |
-----------------------------
| 1 | root | NULL |
-----------------------------
| 2 | kid | 1 |
-----------------------------
| 3 | grandkid | 2 |
-----------------------------
"""
sqlite_engine = create_engine('sqlite:///:memory:')
Base.metadata.create_all(sqlite_engine)
Session = sessionmaker(bind=sqlite_engine)
session = Session()

instance = session.query(Dr).filter(Dr.name == 'grandkid').one()
print(instance.path)
# Outputs: "root/kid/grandkid"
I'd like to be able to add an index, or at least a unique constraint, on the "path" property so that the same path cannot exist more than once in the table. I've tried:
Index('pathindex', Dr.path, unique=True)
...with no luck. No error is raised, but SQLAlchemy doesn't seem to register the index; it just silently ignores it. It still allows adding a duplicate path, e.g.:
session.add(Dr(name='grandkid', parent_id=2))
session.commit()
As further evidence that the Index() was ignored, inspecting the "indexes" property of the table results in an empty set:
print(Dr.__table__.indexes)
#Outputs: set([])
It's essential to me that duplicate paths cannot exist in the database. I'm not sure whether what I'm trying to do with column_property is possible in SQLAlchemy, and if not I'd love to hear some suggestions on how else I can go about this.
I think a unique constraint should suffice here. In class Dr:
__table_args__ = (UniqueConstraint('parent_id', 'name'),)
Since a row's path is just its parent's path plus its own name, making (parent_id, name) unique at every level guarantees that full paths are unique too.
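A minimal sketch of where the constraint goes (UniqueConstraint is imported from sqlalchemy; the rest is the model from the question):
from sqlalchemy import UniqueConstraint

class Dr(Base):
    __tablename__ = 'directories'
    # No two siblings under the same parent may share a name; applied at
    # every level of the tree, this makes every full path unique as well.
    __table_args__ = (UniqueConstraint('parent_id', 'name'),)

    id = Column(Integer, primary_key=True)
    name = Column(String)
    parent_id = Column(Integer, ForeignKey('directories.id'))
One caveat: most databases treat NULLs as distinct in unique constraints, so two root rows (parent_id NULL) with the same name would still be allowed; if that matters, enforce root uniqueness separately.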

reflecting every schema from postgres DB using SQLAlchemy

I have an existing database that has two schemas, named schools and students, mapped through two different classes that inherit from the same declarative_base instance:
class DirectorioEstablecimiento(Base):
    __table_args__ = {'schema': 'schools'}
    __tablename__ = 'addresses'
    # some Columns are defined here

and

class Matricula(Base):
    __table_args__ = {'schema': 'students'}
    __tablename__ = 'enrollments'
    # some Columns are defined here
I can use the Base instance as Base.metadata.create_all(bind=engine) to recreate it in a test DB I have in Postgres, and I can confirm this was done without problems by querying pg_namespace:
In [111]: engine.execute("SELECT * FROM pg_namespace").fetchall()
2017-12-13 18:04:01,006 INFO sqlalchemy.engine.base.Engine SELECT * FROM pg_namespace
2017-12-13 18:04:01,006 INFO sqlalchemy.engine.base.Engine {}
Out[111]:
[('pg_toast', 10, None),
('pg_temp_1', 10, None),
('pg_toast_temp_1', 10, None),
('pg_catalog', 10, '{postgres=UC/postgres,=U/postgres}'),
('public', 10, '{postgres=UC/postgres,=UC/postgres}'),
('information_schema', 10, '{postgres=UC/postgres,=U/postgres}'),
('schools', 16386, None),
('students', 16386, None)]
and from the psql CLI
user# select * from pg_tables;
schemaname | tablename | tableowner | tablespace | hasindexes | hasrules | hastriggers | rowsecurity
--------------------+------------------------------+------------+------------+------------+----------+-------------+-------------
schools | addresses | diego | | t | f | f | f
students | enrollments | diego | | t | f | f | f
pg_catalog | pg_statistic | postgres | | t | f | f | f
pg_catalog | pg_type | postgres | | t | f | f | f
pg_catalog | pg_authid | postgres | pg_global | t | f | f | f
pg_catalog | pg_user_mapping | postgres | | t | f | f | f
-- other tables were omitted
However, if I try to reflect that database into some other instance of declarative_base, nothing is reflected.
Something like:
In [87]: Base.metadata.tables.keys()
Out[87]: dict_keys(['schools.addresses', 'students.enrollments'])
In [88]: new_base = declarative_base()
In [89]: new_base.metadata.reflect(bind=engine)
In [90]: new_base.metadata.tables.keys()
Out[90]: dict_keys([])
I understand that reflect accepts a schema as a parameter, but I would like to obtain all of the schemas at once during reflection; so far I can only do this one schema at a time.
Is there a way to do this?
When you call metadata.reflect() it will only reflect the default schema (the first schema in your search_path for which you have permissions). So if your search_path is public,students,schools, it will only reflect the tables in schema public. If you do not have permissions on schema public, that schema is skipped and only students is reflected.
The default schema is retrieved by SELECT current_schema();
In order to reflect other schemas, you need to call metadata.reflect() for each schema:
metadata.reflect(schema='public') # will reflect even if you do not have permissions on the tables in schema `public`, as long as you have access to pg_* system tables
metadata.reflect(schema='students')
metadata.reflect(schema='schools')
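If you don't want to hard-code schema names, here is a minimal sketch that discovers them via the inspector (the connection string is a placeholder; the filter skips PostgreSQL's system schemas):
from sqlalchemy import create_engine, inspect, MetaData

engine = create_engine('postgresql://user:password@localhost/test')  # placeholder DSN
metadata = MetaData()

# Ask the dialect for every schema name, then reflect the non-system ones
inspector = inspect(engine)
for schema in inspector.get_schema_names():
    if schema == 'information_schema' or schema.startswith('pg_'):
        continue
    metadata.reflect(bind=engine, schema=schema)

print(metadata.tables.keys())
# e.g. dict_keys(['public.some_table', 'schools.addresses', 'students.enrollments'])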
Note: when you reflect with an explicit schema:
Reflected tables in metadata.tables will have schema-qualified keys, as in schema1.mytable, schema2.mytable.
Any conflicting table names will be replaced by the later one. If you have tables with the same name in different schemas, you should implement the classname_for_table hook (part of the automap extension) to prefix the class names with the schema name.
An example of prefixing the class names with the schema:
def classname_for_table(base, tablename, table):
    schema_name = table.schema
    fqname = '{}.{}'.format(schema_name, tablename)
    return fqname

Base.prepare(classname_for_table=classname_for_table)
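For context, a minimal sketch of how that hook plugs into automap (assuming the same engine as above; note this Base is an automap base, not the declarative one from the question):
from sqlalchemy.ext.automap import automap_base

Base = automap_base()
Base.metadata.reflect(bind=engine, schema='schools')
Base.metadata.reflect(bind=engine, schema='students')

# prepare() maps the already-reflected tables; the hook controls class names
Base.prepare(classname_for_table=classname_for_table)

print(list(Base.classes.keys()))  # ['schools.addresses', 'students.enrollments']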
As a bonus, here is a small snippet which exposes all tables within a dynamic submodule per schema, so you can access them conveniently. Create a file, e.g. db.py, and place the following in it:
from types import ModuleType

def register_classes(base, module_dict):
    # Group the automapped classes into one synthetic module per schema
    for name, table in base.classes.items():
        schema_name, table_name = name.split('.')
        class_name = table_name.title().replace('_', '')
        if schema_name not in module_dict:
            module = module_dict[schema_name] = ModuleType(schema_name)
        else:
            module = module_dict[schema_name]
        setattr(module, class_name, table)
Call this function with the automap base and the __dict__ of the module which you would like to register the schemas with.
register_classes(base, globals())
or
import db
db.register_classes(base, db.__dict__)
and then you will get
import db
db.students.MyTable
db.schools.MyTable

Cassandra not setting default value for new column added later in Python model

I have code like below.
from uuid import uuid4
from uuid import uuid1
from cassandra.cqlengine import columns, connection
from cassandra.cqlengine.models import Model
from cassandra.cqlengine.management import sync_table

class BaseModel(Model):
    __abstract__ = True
    id = columns.UUID(primary_key=True, default=uuid4)
    created_timestamp = columns.TimeUUID(primary_key=True,
                                         clustering_order='DESC',
                                         default=uuid1)
    deleted = columns.Boolean(required=True, default=False)

class OtherModel(BaseModel):
    __table_name__ = 'other_table'

if __name__ == '__main__':
    connection.setup(hosts=['localhost'],
                     default_keyspace='test')
    sync_table(OtherModel)
    OtherModel.create()
After the first execution, I can see the record in the DB when I run a query:
cqlsh> select * from test.other_table;
id | created_timestamp | deleted
--------------------------------------+--------------------------------------+---------
febc7789-5806-44d8-bbd5-45321676def9 | 840e1b66-cc73-11e6-a66c-38c986054a88 | False
(1 rows)
After this, I added a new column, name, to OtherModel and ran the same program:
class OtherModel(BaseModel):
    __table_name__ = 'other_table'
    name = columns.Text(required=True, default='')

if __name__ == '__main__':
    connection.setup(hosts=['localhost'],
                     default_keyspace='test')
    sync_table(OtherModel)
    OtherModel.create(name='test')
Checking the DB entries:
cqlsh> select * from test.other_table;
id | created_timestamp | deleted | name
--------------------------------------+--------------------------------------+---------+------
936cfd6c-44a4-43d3-a3c0-fdd493144f4b | 4d7fd78c-cc74-11e6-bb49-38c986054a88 | False | test
febc7789-5806-44d8-bbd5-45321676def9 | 840e1b66-cc73-11e6-a66c-38c986054a88 | False | null
(2 rows)
There is one row with name as null, but I can't query on that null value:
cqlsh> select * from test.other_table where name=null;
InvalidRequest: code=2200 [Invalid query] message="Unsupported null value for indexed column name"
I found the reference How Can I Search for Records That Have A Null/Empty Field Using CQL?.
When I set default='' in the model, why is it not set for all the null values in the table?
Is there any way to set the null values in name to the default value '' with a query?
A null cell just means the value was never set: the default you added only applies to new writes, not to rows that already existed when the column was created. And the absence of data isn't something you can query on, since it's a filtering operation; that's not scalable or possible to do efficiently, so it's not something Cassandra will encourage (or, in this case, even allow).
Going back and retroactively setting values on all the previously created rows would be very expensive (it has to read everything, then do as many writes). It's pretty easy on the application side to just treat a null name as '', though.
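If you really do want to backfill, a minimal one-off sketch on the application side (this reads the whole table once and issues one write per unset row, so expect it to be slow on large tables):
# One-off backfill: page through every row and set name='' where it is null
for row in OtherModel.objects.all():
    if row.name is None:
        row.update(name='')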

My Class in SQLAlchemy has a boolean field that always returns True

I'm using MySQL with SQLAlchemy. I have a class defined like so:
import sqlalchemy
import sqlalchemy.ext.declarative

Base = sqlalchemy.ext.declarative.declarative_base()

class process(Base):
    __tablename__ = 'processes'
    process = sqlalchemy.Column(sqlalchemy.Integer, primary_key=True, nullable=False)
    get_javascript = sqlalchemy.Column(sqlalchemy.types.Boolean, nullable=False)
With my schema defined like so:
CREATE TABLE processes
(
    process Mediumint NOT NULL AUTO_INCREMENT,
    get_javascript Varchar(1) NOT NULL,
    PRIMARY KEY (process)
) ENGINE = InnoDB
In my database, I have the following rows:
+---------+----------------+
| process | get_javascript |
+---------+----------------+
| 17 | 0 |
| 18 | 1 |
+---------+----------------+
Querying them in Python always gives me true for the get_javascript field.
>>> for i in s.query(db_classes.process).all():
... print i.process, i.get_javascript
...
17 True
18 True
Apparently, SQLAlchemy doesn't like it when you use VARCHAR(1) for a boolean field: the driver hands back the strings '0' and '1', and both are truthy in Python, so the Boolean type reports True either way. Switching the column to BOOLEAN (TINYINT(1) in MySQL) and recreating the database fixed it.
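If changing the schema isn't an option, a minimal sketch of an alternative (a custom TypeDecorator; the name CharBoolean is illustrative) that keeps the VARCHAR(1) column but converts '0'/'1' to real booleans at the boundary:
import sqlalchemy.types as types

class CharBoolean(types.TypeDecorator):
    """Store Python bools as the single characters '0' / '1'."""
    impl = types.String

    def process_bind_param(self, value, dialect):
        # Python -> DB: True becomes '1', False becomes '0'
        if value is None:
            return None
        return '1' if value else '0'

    def process_result_value(self, value, dialect):
        # DB -> Python: only the character '1' counts as True
        if value is None:
            return None
        return value == '1'

# Usage: get_javascript = sqlalchemy.Column(CharBoolean(1), nullable=False)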

How to avoid adding duplicates in a many-to-many relationship table in SQLAlchemy - python?

I am dealing with a many-to-many relationship with sqlalchemy. My question is how to avoid adding duplicate pair values in a many-to-many relational table.
To make things clearer, I will use the example from the official SQLAlchemy documentation.
from sqlalchemy import (Table, Column, Integer, String, ForeignKey,
                        create_engine)
from sqlalchemy.orm import relationship, sessionmaker
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

Parents2children = Table('parents2children', Base.metadata,
    Column('parents_id', Integer, ForeignKey('parents.id')),
    Column('children_id', Integer, ForeignKey('children.id'))
)

class Parent(Base):
    __tablename__ = 'parents'
    id = Column(Integer, primary_key=True)
    parent_name = Column(String(45))
    child_rel = relationship("Child", secondary=Parents2children,
                             backref="parents_backref")

    def __init__(self, parent_name=""):
        self.parent_name = parent_name

    def __repr__(self):
        return "<parents(id:'%i', parent_name:'%s')>" % (self.id, self.parent_name)

class Child(Base):
    __tablename__ = 'children'
    id = Column(Integer, primary_key=True)
    child_name = Column(String(45))

    def __init__(self, child_name=""):
        self.child_name = child_name

    def __repr__(self):
        return "<experiments(id:'%i', child_name:'%s')>" % (self.id, self.child_name)
###########################################
def setUp():
    global Session
    engine = create_engine('mysql://root:root@localhost/db_name?charset=utf8',
                           pool_recycle=3600, echo=False)
    Session = sessionmaker(bind=engine)

def add_data():
    session = Session()
    name_father1 = Parent(parent_name="Richard")
    name_mother1 = Parent(parent_name="Kate")
    name_daughter1 = Child(child_name="Helen")
    name_son1 = Child(child_name="John")
    session.add(name_father1)
    session.add(name_mother1)
    name_father1.child_rel.append(name_son1)
    name_daughter1.parents_backref.append(name_father1)
    name_son1.parents_backref.append(name_father1)
    session.commit()
    session.close()

setUp()
add_data()
With this code, the data inserted in the tables is the following:
Parents table:
+----+-------------+
| id | parent_name |
+----+-------------+
| 1 | Richard |
| 2 | Kate |
+----+-------------+
Children table:
+----+------------+
| id | child_name |
+----+------------+
| 1 | Helen |
| 2 | John |
+----+------------+
Parents2children table
+------------+-------------+
| parents_id | children_id |
+------------+-------------+
| 1 | 1 |
| 1 | 2 |
| 1 | 1 |
+------------+-------------+
As you can see, there's a duplicate in the last table... how could I prevent SQLAlchemy from adding these duplicates?
I've tried to put relationship("Child", secondary=..., collection_class=set) but this error is displayed:
AttributeError: 'InstrumentedSet' object has no attribute 'append'
Add a PrimaryKeyConstraint (or a UniqueConstraint) to your relationship table:
from sqlalchemy import PrimaryKeyConstraint

Parents2children = Table('parents2children', Base.metadata,
    Column('parents_id', Integer, ForeignKey('parents.id')),
    Column('children_id', Integer, ForeignKey('children.id')),
    PrimaryKeyConstraint('parents_id', 'children_id'),
)
and your code will raise an error when you try to commit the relationship added from both sides. Doing this is highly recommended in any case, since without it the association table has no primary key at all.
In order to not even generate the error, just check first:
if name_father1 not in name_son1.parents_backref:
    name_son1.parents_backref.append(name_father1)
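As for the collection_class=set attempt from the question, the catch is simply that set collections have no append(); use add() instead, and in-memory duplicates are then silently ignored. A minimal sketch:
child_rel = relationship("Child", secondary=Parents2children,
                         collection_class=set, backref="parents_backref")

# ...
name_father1.child_rel.add(name_son1)  # not .append()
name_father1.child_rel.add(name_son1)  # second add is a no-op
Note that the backref side stays a plain list unless you declare it with backref("parents_backref", collection_class=set) (using backref from sqlalchemy.orm) as well.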
