I'm trying to create a database using SQLAlchemy to connect with Snowflake, and Alembic for migrations, for an app created in FastAPI. I created some models, and everything works fine when creating them in Snowflake, for example:
create or replace TABLE PRICE_SERVICE.FP7.LOCATION (
    ID NUMBER(38,0) NOT NULL autoincrement,
    CREATED_AT TIMESTAMP_NTZ(9),
    UPDATED_AT TIMESTAMP_NTZ(9),
    ADDRESS VARCHAR(16777216),
    LATITUDE VARCHAR(16777216) NOT NULL,
    LONGITUDE VARCHAR(16777216) NOT NULL,
    unique (LATITUDE),
    unique (LONGITUDE),
    primary key (ID)
);
but when I try to create a new object in this table, I get:
sqlalchemy.orm.exc.FlushError: Instance <Location at 0x7fead79677c0> has a NULL identity key. If this is an auto-generated value, check that the database table allows generation of new primary key values, and that the mapped Column object is configured to expect these generated values. Ensure also that this flush() is not occurring at an inappropriate time, such as within a load() event.
my model is:
class Location(Base):
    id = Column(Integer, primary_key=True)
    address = Column(String)
    latitude = Column(String, unique=True, nullable=False)
    longitude = Column(String, unique=True, nullable=False)

    buildings = relationship("Building", back_populates="location")
    quotes = relationship("Quote", back_populates="location")
    binds = relationship("Bind", back_populates="location")
and I'm trying to do this:
def create_location(db: Session, data: Dict[str, Any]) -> Location:
    location = Location(
        address=data["address"],  # type: ignore
        latitude=data["lat"],  # type: ignore
        longitude=data["lng"],  # type: ignore
    )
    db.add(location)
    db.commit()
    return location
I also tried using:
id = Column(Integer, Sequence("id_seq"), primary_key=True)
but I got:
sqlalchemy.exc.StatementError: (sqlalchemy.exc.ProgrammingError) (snowflake.connector.errors.ProgrammingError) 000904 (42000): SQL compilation error: error line 1 at position 7
backend_1 | invalid identifier 'ID_SEQ.NEXTVAL'
You forgot to define the Sequence in your model. When you define an autoincrement value at table creation in Snowflake, a Sequence is generated at the schema level.
from sqlalchemy import Column, Integer, Sequence
...

class Location(Base):
    id = Column(Integer, Sequence("Location_Id"), primary_key=True,
                autoincrement=True)
    address = Column(String)
    ...
Make sure your user role has usage permission for that sequence, and that should take care of setting the next value for your primary key.
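For example, a grant along these lines (the sequence and role names below are placeholders; use whatever sequence name Snowflake actually generated in your schema and whatever role your app connects with):

GRANT USAGE ON SEQUENCE PRICE_SERVICE.FP7.LOCATION_ID_SEQ TO ROLE MY_APP_ROLE;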
An approach that helps me with table primary keys is defining a mixin class that uses declared_attr to automatically define my primary keys based on the table name.
from sqlalchemy import Column, Integer, Sequence
from sqlalchemy.ext.declarative import declared_attr

class SomeMixin(object):

    @declared_attr
    def record_id(cls):
        """
        Use table name to define pk
        """
        return Column(
            f"{cls.__tablename__} Id",
            Integer(),
            Sequence(f"{cls.__tablename__}_id_seq"),  # schema-level sequence; the name here is an assumption
            primary_key=True,
            autoincrement=True
        )
Then you pass said mixin into your model
from sqlalchemy import Column, Integer, String, Sequence
from wherever import SomeMixin

class Location(Base, SomeMixin):
    address = Column(String)
    ...
Now Location.record_id gets set through the sequence you defined in the mixin.
Hope this helped
I need to create a PostgreSQL Full Text Search index in Python with SQLAlchemy. Here's what I want in SQL:
CREATE TABLE person ( id INTEGER PRIMARY KEY, name TEXT );
CREATE INDEX person_idx ON person USING GIN (to_tsvector('simple', name));
Now how do I do the second part with SQLAlchemy when using the ORM:
class Person(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    name = db.Column(db.String)
You could create the index using Index in __table_args__. Also, I use a function to create the tsvector, to make it tidier and reusable if more than one field is required. Something like below:
from sqlalchemy import cast, func, Index
from sqlalchemy.dialects import postgresql

def create_tsvector(*args):
    exp = args[0]
    for e in args[1:]:
        exp += ' ' + e
    return func.to_tsvector('english', exp)

class Person(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    name = db.Column(db.String)

    __ts_vector__ = create_tsvector(
        cast(func.coalesce(name, ''), postgresql.TEXT)
    )

    __table_args__ = (
        Index(
            'idx_person_fts',
            __ts_vector__,
            postgresql_using='gin'
        ),
    )
Update:
A sample query using index (corrected based on comments):
people = Person.query.filter(Person.__ts_vector__.match(expressions, postgresql_regconfig='english')).all()
The answer from @sharez is really useful (especially if you need to concatenate columns in your index). For anyone looking to create a tsvector GIN index on a single column, you can simplify the original answer approach with something like:
from sqlalchemy import Column, Index, Integer, String
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.sql import func

Base = declarative_base()

class Example(Base):
    __tablename__ = 'examples'
    id = Column(Integer, primary_key=True)
    textsearch = Column(String)
    __table_args__ = (
        Index(
            'ix_examples_tsv',
            func.to_tsvector('english', textsearch),
            postgresql_using='gin'
        ),
    )
Note that the comma following Index(...) in __table_args__ is not a style choice; the value of __table_args__ must be a tuple, dictionary, or None.
If you do need to create a tsvector GIN index on multiple columns, here is another way to get there using text().
from sqlalchemy import Column, Index, Integer, String, text
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.sql import func

Base = declarative_base()

def to_tsvector_ix(*columns):
    s = " || ' ' || ".join(columns)
    return func.to_tsvector('english', text(s))

class Example(Base):
    __tablename__ = 'examples'
    id = Column(Integer, primary_key=True)
    atext = Column(String)
    btext = Column(String)
    __table_args__ = (
        Index(
            'ix_examples_tsv',
            to_tsvector_ix('atext', 'btext'),
            postgresql_using='gin'
        ),
    )
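A minimal query sketch against that expression index (assuming a Session instance named session; for PostgreSQL to use the index, the filter expression must mirror the indexed expression):

from sqlalchemy import func

results = (
    session.query(Example)
    .filter(
        func.to_tsvector('english', Example.atext + ' ' + Example.btext)
        .match('fat & rat', postgresql_regconfig='english')
    )
    .all()
)
# On PostgreSQL, String + String compiles to atext || ' ' || btext,
# matching the to_tsvector_ix('atext', 'btext') expression above.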
Thanks for this question and answers.
I'd like to add a bit more for people using Alembic to manage versions via autogenerate, since creating this index does not seem to be detected. We might end up writing our own migration script, which looks like:
"""add fts idx
Revision ID: e3ce1ce23d7a
Revises: 079c4455d54d
Create Date:
"""
# revision identifiers, used by Alembic.
revision = 'e3ce1ce23d7a'
down_revision = '079c4455d54d'
from alembic import op
import sqlalchemy as sa
def upgrade():
op.create_index('idx_content_fts', 'table_name',
[sa.text("to_tsvector('english', content)")],
postgresql_using='gin')
def downgrade():
op.drop_index('idx_content_fts')
It has been answered already by @sharez and @benvc. I needed to make it work with weights though. This is how I did it based on their answers:
from sqlalchemy import Column, func, Index, Integer, String
from sqlalchemy.ext.declarative import declarative_base

CONFIG = 'english'

Base = declarative_base()

def create_tsvector(*args):
    field, weight = args[0]
    exp = func.setweight(func.to_tsvector(CONFIG, field), weight)
    for field, weight in args[1:]:
        exp = exp.op('||')(func.setweight(func.to_tsvector(CONFIG, field), weight))
    return exp

class Example(Base):
    __tablename__ = 'example'

    id = Column(Integer, primary_key=True)  # a mapped class needs a primary key
    foo = Column(String)
    bar = Column(String)

    __ts_vector__ = create_tsvector(
        (foo, 'A'),
        (bar, 'B')
    )

    __table_args__ = (
        Index('my_index', __ts_vector__, postgresql_using='gin'),
    )
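A sketch of how the weighted vector might then be used for ranked search (assuming a Session instance named session; the search term is a placeholder):

from sqlalchemy import desc, func

term = 'searchterm'  # placeholder query string
query = (
    session.query(Example)
    .filter(Example.__ts_vector__.match(term, postgresql_regconfig=CONFIG))
    # ts_rank weighs A-labelled matches (foo) above B-labelled ones (bar)
    .order_by(desc(func.ts_rank(Example.__ts_vector__, func.to_tsquery(CONFIG, term))))
)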
Previous answers here were helpful for pointing in the right direction.
Below is a distilled & simplified approach using the ORM and the TSVectorType helper from sqlalchemy-utils (which is quite basic and can simply be copy/pasted to avoid external dependencies if needed: https://sqlalchemy-utils.readthedocs.io/en/latest/_modules/sqlalchemy_utils/types/ts_vector.html):
Define a TSVECTOR column (TSVectorType) in your ORM model (declarative), populated automatically from the source text field(s):
import sqlalchemy as sa
from sqlalchemy_utils.types.ts_vector import TSVectorType
# ^-- https://sqlalchemy-utils.readthedocs.io/en/latest/_modules/sqlalchemy_utils/types/ts_vector.html

class MyModel(Base):
    __tablename__ = 'mymodel'

    id = sa.Column(sa.Integer, primary_key=True)
    content = sa.Column(sa.String, nullable=False)
    content_tsv = sa.Column(
        TSVectorType("content", regconfig="english"),
        sa.Computed("to_tsvector('english', \"content\")", persisted=True))
    # ^-- equivalent SQL:
    # COLUMN content_tsv TSVECTOR GENERATED ALWAYS AS (to_tsvector('english', "content")) STORED

    __table_args__ = (
        # Indexing the TSVector column
        sa.Index("idx_mymodel_content_tsv", content_tsv, postgresql_using="gin"),
    )
For additional details on querying using ORM, see https://stackoverflow.com/a/73999486/11750716 (there is an important difference between SQLAlchemy 1.4 and SQLAlchemy 2.0).
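As a minimal query sketch (assuming a Session instance named session), TSVectorType's match() picks up the regconfig given in the column type, so a filter can be as simple as:

results = (
    session.query(MyModel)
    .filter(MyModel.content_tsv.match("example"))
    .all()
)
# Note: under SQLAlchemy 1.4 match() renders "@@ to_tsquery(...)",
# under 2.0 it renders "@@ plainto_tsquery(...)", which is the
# difference the linked answer discusses.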
Per below, I am trying to initialize a SQLAlchemy mapped class from a Python dictionary that has extra keys. Is it possible to have the mapped class automatically ignore the extra keys instead of throwing an error? Likewise, can the mapped class have default values if the keys are not present?
from sqlalchemy import Column, Integer, String

class User(Base):
    __tablename__ = 'users'
    id = Column(Integer, primary_key=True)
    name = Column(String)
And here is the init part:
my_example_user = {'id': 1, 'name': 'john', 'extra_key': 1234}
User(**my_example_user)
Which throws an invalid key error
Thoughts?
SQLAlchemy Mapper objects have an attrs property which is a dictionary of the names of the fields of your mapped class.
from sqlalchemy import Column, Integer, String
from sqlalchemy.orm import class_mapper
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class User(Base):
    __tablename__ = 'user'
    id = Column(Integer, primary_key=True)
    name = Column(String)

user = {
    'name': 'Eihli',
    'skill': 11
}

user_mapper = class_mapper(User)

mapped_user = User(**user)
# Boom! TypeError: 'skill' is an invalid keyword argument for User

mapped_user = User(**{
    k: v for k, v in user.items()
    if k in user_mapper.attrs.keys()
})
# Success!
No need to mess around with maintaining exclude lists, mucking about with __dict__, or getting in the way of super calls.
If you're trying to generate models with nested data, you'll have to do things a little differently, or you'll get an "unhashable type: 'dict'" error.
Here's an example of a helper that inspects the mapper and also filters out the keys of relationships.
from sqlalchemy import inspect
from sqlalchemy.orm import class_mapper

def from_json(model, data):
    mapper = class_mapper(model)
    keys = mapper.attrs.keys()
    relationships = inspect(mapper).relationships
    args = {k: v for k, v in data.items()
            if k in keys and k not in relationships}
    return model(**args)
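A hypothetical usage sketch (the extra scalar 'skill' and nested 'friends' keys are made up):

payload = {'name': 'Eihli', 'skill': 11, 'friends': [{'name': 'Gnar'}]}
user = from_json(User, payload)
# Builds User(name='Eihli'): 'skill' is not a mapped key, and any key that
# names a relationship is skipped rather than passed through as raw dicts.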
In short, define a constructor which does not pass arguments up to its superclass:
class User(Base):
    # ...

    def __init__(self, **entries):
        # NOTE: Do not call superclass
        # (which is otherwise the default behaviour).
        # super(User, self).__init__(**entries)
        self.__dict__.update(entries)
I hit the same problem in transitioning from peewee, which requires the opposite: passing arguments up to its superclass (and, therefore, the constructor was already defined). So I just tried commenting the line out, and things started to work.
UPDATE
Also, make sure that entries do not contain (and, therefore, overwrite) any meta field defined on the User class for SQLAlchemy, for example ORM relationships. It's kind of obvious (it's SQLAlchemy), but when the mistake is made, it might not be easy to spot the problem.
Are we guaranteed that the __init__ of the superclass which is in place will never have desired effects other than setting the __dict__ entries? I didn't feel quite comfortable bypassing the superclass call completely, so my attempt at solving this was as follows, passing on only the entries which correspond to column names:
class User(Base):
    # ...

    def __init__(self, **entries):
        '''Override to avoid TypeError when passed spurious column names'''
        col_names = set([col.name for col in self.__table__.columns])
        superentries = {k: entries[k] for k in col_names.intersection(entries.keys())}
        super().__init__(**superentries)
Also, to pass extra keywords and still call the Base.__init__() method, you can exclude the extra keys from the super() call and after that do what you want with them:
from sqlalchemy import Column, Integer, String

class User(Base):
    __tablename__ = 'users'
    id = Column(Integer, primary_key=True)
    name = Column(String)

    def __init__(self, **kwargs):
        extra_kw_list = ['key1', 'key2']
        super(User, self).__init__(**{x: y for x, y in kwargs.items()
                                      if x not in extra_kw_list})
        # do something you need here
        item1, item2 = kwargs['key1'], kwargs['key2']
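A hypothetical call would then look like this (key1 and key2 are consumed inside __init__ rather than handed to the mapper):

user = User(id=1, name='john', key1='foo', key2='bar')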
If your model has relationships, you can use your model's Mapper object, as @eric-ihli mentioned. Here is another way (note the __init__ method):
from sqlalchemy import Column, ForeignKey, Integer, String
from sqlalchemy.orm import backref, relationship

from my_app.db_models import Base

class Employee(Base):
    __tablename__ = "employee"

    id = Column(Integer, primary_key=True, autoincrement=True)
    department_id = Column(Integer, ForeignKey("department.id"), index=True)
    email = Column(String, unique=True, index=True, nullable=False)
    name = Column(String)

    department = relationship(
        "Department", backref=backref("employees", cascade="all, delete-orphan")
    )

    def __init__(self, **kwargs):
        allowed_args = self.__mapper__.class_manager  # dict-like mapping of mapped attribute names
        kwargs = {k: v for k, v in kwargs.items() if k in allowed_args}
        super().__init__(**kwargs)
This way, you can create an employee model like this:
from contextlib import closing

from my_app.db_models import Department, Employee, SessionLocal

with closing(SessionLocal()) as db:
    dept = db.query(Department).filter(Department.name == 'HR').first()
    employee = Employee(name='John Smith', email='john@smith.com', department=dept)
    db.add(employee)
    db.commit()
Based on R Yakovlev's answer, you can make the list of the elements dynamic:
from sqlalchemy import Column, Integer, String

class User(Base):
    __tablename__ = 'users'
    id = Column(Integer, primary_key=True)
    name = Column(String)

    def __init__(self, **kwargs):
        keep_kwargs = {k: v for k, v in kwargs.items() if k in user_columns}
        super(User, self).__init__(**keep_kwargs)

user_columns = [_ for _ in User.__dict__.keys() if not _.startswith('_')]
I wanted to find a way to embed user_columns in the object, like with a @hybrid_property, yet not have it recomputed every time it's used.
I expect that is possible, but it exceeded my time limit.
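One possible sketch of that idea (my assumption, not part of the original answer): compute the allowed keys lazily from the mapper and cache them on the class, so the filter list is built only once:

from sqlalchemy import Column, Integer, String, inspect

class User(Base):
    __tablename__ = 'users'
    id = Column(Integer, primary_key=True)
    name = Column(String)

    _allowed_keys = None  # filled in on first instantiation

    def __init__(self, **kwargs):
        cls = type(self)
        if cls._allowed_keys is None:
            # mapped column names, computed once per class
            cls._allowed_keys = set(inspect(cls).columns.keys())
        super().__init__(**{k: v for k, v in kwargs.items()
                            if k in cls._allowed_keys})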
Does SQLAlchemy offer a generic way to get the primary key from a declaratively defined instance, so that if:
Base = declarative_base()

class MyClass(Base):
    __tablename__ = 'mytable'
    key = Column(Integer, primary_key=True)
I can do:
>>> a = MyClass(key=1)
>>> a.generic_get_primary_key() # <-- does it exist ??
1
You can use inspection for that purpose:
http://docs.sqlalchemy.org/en/latest/core/inspection.html
Passing an instance of a mapped object to inspect returns an InstanceState describing that object.
This state also contains the identity:
Base = declarative_base()

class MyClass(Base):
    __tablename__ = 'mytable'
    key = Column(Integer, primary_key=True)

a = MyClass(key=1)

from sqlalchemy.inspection import inspect
pk = inspect(a).identity
print(pk)
Will give:
(1,)
Since primary keys can consist of multiple columns, the identity in general is a tuple containing all the column values that are part of the primary key.
In your case, that's simply the key.
If you need to retrieve the primary keys of a class (not an instance), the following describes how to do it.
You can use the inspect function. This returns a Mapper, on which you can do your analysis as follows. In this example I use it to return the attribute key name of each primary key column of MyClass.
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy import Column, Integer, String

Base = declarative_base()

class MyClass(Base):
    __tablename__ = "myclass"
    key = Column(Integer, primary_key=True)
    name = Column(String, primary_key=True)

from sqlalchemy import inspect

ins = inspect(MyClass)
print("Tuple of primary keys: ", ins.primary_key)

# Let's loop over them and print the attribute key name of each
for x in ins.primary_key:
    print(x.key)
Returns
> Tuple of primary keys: (Column('key', Integer(), table=<myclass>, primary_key=True, nullable=False), Column('name', String(), table=<myclass>, primary_key=True, nullable=False))
> key
> name
I did it by getting the primary key name and then, with getattr, reading it from the class instance, like this:
from sqlalchemy.inspection import inspect

# I use this class for all my tables in sqlalchemy
class BaseTablesClass():

    # First, get the name of the primary key; just the first value of the
    # tuple, because a table can have more than one primary key
    def primary_key_name(self):
        return inspect(self).mapper.primary_key[0].name

    # Then get the value of the primary key via the name returned by
    # primary_key_name()
    def primary_key_value(self):
        return getattr(self, self.primary_key_name())
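A hypothetical usage sketch (the model name and columns here are made up):

from sqlalchemy import Column, Integer, String

class City(Base, BaseTablesClass):
    __tablename__ = 'city'
    id = Column(Integer, primary_key=True)
    name = Column(String)

c = City(id=5, name='Oslo')
print(c.primary_key_name())   # 'id'
print(c.primary_key_value())  # 5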
Use the inspect function:

inspect(obj).identity

This will also work for "transient" and "pending" objects:

inspect(obj.__class__).primary_key_from_instance(obj)
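For example (a minimal sketch, reusing MyClass from above):

from sqlalchemy import inspect

a = MyClass(key=1)  # transient: never added to a session
print(inspect(MyClass).primary_key_from_instance(a))
# prints the primary key values for a, even though it has no identity yet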
I have defined a few tables in Pyramid like this:
# coding: utf-8
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy import Integer, SmallInteger, Float, DateTime, ForeignKey, ForeignKeyConstraint, String, Column
from sqlalchemy.orm import scoped_session, sessionmaker, relationship, backref
from zope.sqlalchemy import ZopeTransactionExtension

DBSession = scoped_session(sessionmaker(extension=ZopeTransactionExtension()))
Base = declarative_base()

class Codes(Base):
    __tablename__ = 'Code'
    __table_args__ = {u'schema': 'Locations'}

    id = Column(Integer, nullable=False)
    code_str = Column(String(9), primary_key=True)
    name = Column(String(100))

    incoming = relationship(u'Voyages', primaryjoin='Voyages.call == Codes.code_str', backref=backref('Code'))

class Locations(Base):
    __tablename__ = 'Location'
    __table_args__ = {u'schema': 'Locations'}

    unit_id = Column(ForeignKey(u'Structure.Definition.unit_id', ondelete=u'RESTRICT', onupdate=u'CASCADE'), primary_key=True, nullable=False)
    timestamp = Column(DateTime, primary_key=True, nullable=False)
    longitude = Column(Float)
    latitude = Column(Float)

class Voyages(Base):
    __tablename__ = 'Voyage'
    __table_args__ = (
        ForeignKeyConstraint(['unit_id', 'voyage_id'], [u'Locations.Voyages.unit_id', u'Locations.Voyages.voyage_id'], ondelete=u'RESTRICT', onupdate=u'CASCADE'),
        {u'schema': 'Locations'}
    )

    uid = Column(Integer, primary_key=True)
    unit_id = Column(Integer)
    voyage_id = Column(Integer)
    departure = Column(ForeignKey(u'Locations.Code.code_str', ondelete=u'RESTRICT', onupdate=u'CASCADE'))
    call = Column(ForeignKey(u'Locations.Code.code_str', ondelete=u'RESTRICT', onupdate=u'CASCADE'))
    departure_date = Column(DateTime)

    voyage_departure = relationship(u'Codes', primaryjoin='Voyages.departure == Codes.code_str')
    voyage_call = relationship(u'Codes', primaryjoin='Voyages.call == Codes.code_str')

class Definitions(Base):
    __tablename__ = 'Definition'
    __table_args__ = {u'schema': 'Structure'}

    unit_id = Column(Integer, primary_key=True)
    name = Column(String(90))
    type = Column(ForeignKey(u'Structure.Type.id', ondelete=u'RESTRICT', onupdate=u'CASCADE'))

    locations = relationship(u'Locations', backref=backref('Definition'))
    dimensions = relationship(u'Dimensions', backref=backref('Definition'))
    types = relationship(u'Types', backref=backref('Definition'))
    voyages = relationship(u'Voyages', backref=backref('Definition'))

class Dimensions(Base):
    __tablename__ = 'Dimension'
    __table_args__ = {u'schema': 'Structure'}

    unit_id = Column(ForeignKey(u'Structure.Definition.unit_id', ondelete=u'RESTRICT', onupdate=u'CASCADE'), primary_key=True, nullable=False)
    length = Column(Float)

class Types(Base):
    __tablename__ = 'Type'
    __table_args__ = {u'schema': 'Structure'}

    id = Column(SmallInteger, primary_key=True)
    type_name = Column(String(255))
    type_description = Column(String(255))
What I am trying to do here is find a specific row in the Codes table (filtered by code_str) and get all related tables in return, under the condition that the Location table returns only its last row by timestamp, the Voyage table returns only its last row by departure, and all information from the Definition table is included.
I started to create a query from scratch and came up with something like this:
string_to_search = request.matchdict.get('code')

sub_dest = DBSession.query(func.max(Voyages.departure).label('latest_voyage_timestamp'), Voyages.unit_id, Voyages.call.label('destination_call')).\
    filter(Voyages.call == string_to_search).\
    group_by(Voyages.unit_id, Voyages.call).\
    subquery()

query = DBSession.query(Codes, Voyages).\
    join(sub_dest, sub_dest.c.destination_call == Codes.code_str).\
    outerjoin(Voyages, sub_dest.c.latest_voyage_timestamp == Voyages.departure_date)
but I have noticed that when I iterate through the results (like for code, voyage in query) I am actually iterating over every Voyage returned. In theory that is not a big problem for me, but I am trying to construct a JSON response with basic information from the Codes table which would include all related Voyages (if there are any at all).
For example:
code_data = {}
all_units = []

for code, voyage in query:
    if code_data is not {}:
        code_data = {
            'code_id': code.id,
            'code_str': code.code_str,
            'code_name': code.name,
        }
    single_unit = {
        'unit_id': voyage.unit_id,
        'unit_departure': str(voyage.departure_date) if voyage.departure_date else None,
    }
    all_units.append(single_unit)

return {
    'code_data': exception.message if exception else code_data,
    'voyages': exception.message if exception else all_units,
}
Now, this seems a bit wrong because I don't like rewriting code_data on every iteration, which is why I put the if code_data is not {} line there; but I suppose it would be much more logical to iterate in a way similar to this:
for code in query:
    code_data = {
        'code_id': code.id,
        'code_str': code.code_str,
        'code_name': code.name,
    }
    for voyage in code.voyages:
        single_unit = {
            'unit_id': voyage.unit_id,
            'unit_departure': str(voyage.departure) if voyage.departure else None,
        }
        all_units.append(single_unit)

return {
    'code_data': exception.message if exception else code_data,
}
So, I want to get only a single Code in return (since I queried the db for that specific Code), which would then have all Voyages related to it as a nested value, and, in each Voyage, all other information related to the Definition of the particular unit...
Is my approach good at all in the first place, and how could I construct my query in order to iterate it in this second way?
I'm using Python 2.7.6, SQLAlchemy 0.9.7 and Pyramid 1.5.1 with Postgres database.
Thanks!
Try changing the outer query like so:
query = DBSession.query(Codes).options(contains_eager('incoming')).\
    join(sub_dest, sub_dest.c.destination_call == Codes.code_str).\
    outerjoin(Voyages, sub_dest.c.latest_voyage_timestamp == Voyages.departure_date)
In case of problems, try calling the options(...) part like so:
(...) .options(contains_eager(Codes.incoming)). (...)
This should result in a single Codes instance being returned with Voyages objects accessible via the relationship you've defined (incoming), so you could proceed with:
results = query.all()
for code in results:
    print code
    # do something with code.incoming

# actually, you should get only one code so if it proves to work, you should
# use query.one() so that in case something else than a single Code is returned,
# an exception is thrown
Of course, you need an import, e.g.: from sqlalchemy.orm import contains_eager