Generic query in SQLAlchemy

Generic query in SQLAlchemy - python

I have following code:
class ArchaeologicalRecord(Base, ObservableMixin, ConcurrentMixin):
author_id = Column(Integer, ForeignKey('authors.id'))
author = relationship('Author', backref=backref('record'))
horizont_id = Column(Integer, ForeignKey('horizonts.id'))
horizont = relationship('Horizont', backref=backref('record'))
.....
somefield_id = Column(Integer, ForeignKey('somefields.id'))
somefield = relationship('SomeModel', backref=backref('record'))
At the moment I have one of entry (Author or Horizont or any other entry which related to arch.record). And I want to ensure that no one record has reference to this field. But I hate to write a lot of code for each case and want to do it most common way.
So, actually I have:
instance of ArchaeologicalRecord
instance of child entity, for example, Horizont
(from previous) it's class definition.
How to check whether any ArchaeologicalRecord contains (or does not) reference to Horizont (or any other child entity) without writing great chunk of copy-pasted code?

Are you asking how to find orphaned authors, horzonts, somefields etc?
Assuming all your relations are many-to-one (ArchaelogicalRecord-to-Author), you could try something like:
from sqlalchemy.orm.properties import RelationshipProperty
from sqlalchemy.orm import class_mapper
session = ... # However you setup the session
# ArchaelogicalRecord will have various properties defined,
# some of these are RelationshipProperties, which hold the info you want
for rp in class_mapper(ArchaeologicalRecord).iterate_properties:
if not isinstance(rp, RelationshipProperty):
continue
query = session.query(rp.mapper.class_)\
.filter(~getattr(rp.mapper.class_, rp.backref[0]).any())
orphans = query.all()
if orphans:
# Do something...
print rp.mapper.class_
print orphans
This will fail when rp.backref is None (i.e. where you've defined a relationship without a backref) - in this case you'd probably have to construct the query a bit more manually, but the RelationshipProperty, and it's .mapper and .mapper.class_ attributes should get you all the info you need to do this in a generic way.

Related

Performance one-to-many relationship in SQLAlchemy

I'm trying to define a one-to-many relationship with SqlAlchemy where I have Parent has many Child
class Parent(Base):
__tablename__ = "parent"
id = Column(String, primary_key = True)
children = relationship("Child")
class Child(Base):
__tablename__ = "child"
id = Column(Integer, primary_key = True)
feed_type_id = Column(String, ForeignKey("parent.id"))
From business rules, Parent has no much Child (between 10 and 30) and most of the time I will need access to all of them so I think that it's good idea that relationship() retrieve all children in memory in order to increase performance (First question: am I right?) but Few times I need to get a particular child but I won't do something like:
def search_bar_attr(some_value)
for bar in foo.bars:
if(bar.attr == some_value)
return bar
lazy="dynamic" returns a list that allows queries but I think it's slow against "eagerly" loaded because dynamic relationship always queries the database.
Second question: Is there some configuration that covers all my needs?

You can construct the same query that lazy="dynamic" does by using .with_parent.
class Parent(Base):
...
#property
def children_dynamic(self):
return object_session(self).query(Child).with_parent(self, Parent.children)
You can even add a function to reduce boilerplate if you have to write a lot of these:
def dynamicize(rel):
#property
def _getter(self):
return object_session(self).query(rel.parent).with_parent(self, rel)
return _getter
class Parent(Base):
...
children = relationship("Child")
children_dynamic = dynamicize(children)

You don't need to use a function like that one, you don't even need to load all of the child objects in memory.
When you want to search for a child with a certain attribute, you can do:
# get a session object, usually with sessionmaker() configured to bind to your engine instance
c = session.query(Child).filter_by(some_attribute="some value here").all() # returns a list of all child objects that match the filter
# or: to get a child who belongs to a certain parrent with a certain attribute:
# get the parent object (p)
c = session.query(Child).filter_by(feed_type_id=p.id).filter_by(some_attr="some attribute that belongs to children of the p parrent object")

No one strategy will give you everything. However, you can choose a default strategy and then override it.
My recommendation would be to:
Add lazy = "joined" to your relationship so that by default, you will get all the parents.
In cases where you want to query for a set of children dependent on properties of their parents but don't need the parent objects, use the join function on the query and filters referring both to the parent and child
In cases where you need to construct a query similar to what lazy = "dynamic" would do, use the sqlalchemy.orm.defer operator to turn off your lazy = "joined" eager loading and the loading interface( to override eager loading and then use with_parent to construct query. a query like you would have gotten with lazy = "dynamic"

SQLAlchemy; getting list of related tables/classes

Say I have a Thing class that is related to some other classes, Foo and Bar.
class Thing(Base):
FooKey = Column('FooKey', Integer,
ForeignKey('FooTable.FooKey'), primary_key=True)
BarKey = Column('BarKey', Integer, ForeignKey('BarTable.BarKey'), primary_key=True)
foo = db.relationship('Foo')
bar = db.relationship('Bar')
I want to get a list of the classes/tables related to Thing created by my relationships() e.g. [Foo, Bar]. Any way to do this?
This is a closely related question:
SQLAlchemy, Flask: get relationships from a db.Model. That identifies the string names of the relationships, but not the target classes.
Context:
I'm building unit tests for my declarative base mapping of a SQL database. A lot of dev work is going into it and I want robust checks in place.

Using the Mapper as described in that other question gets you on the right path. As mentioned on the doc [0], you will get a bunch of sqlalchemy.orm.relationships.RelationshipProperty, and then you can use class_ on the mapper associated with each RelationshipProperty to get to the class:
from sqlalchemy.inspection import inspect
rels = inspect(Thing).relationships
clss = [rel.mapper.class_ for rel in rels]

SQLAlchemy: How do I get an object from a relationship by object's PK?

Suppose I have a one-to-many relationship like this:
class Book(Base):
__tablename__ = "books"
id = Column(Integer)
...
library_id = Column(Integer, ForeignKey("libraries.id"))
class Library(Base):
__tablename__ = "books"
id = Column(Integer)
...
books = relationship(Book, backref="library")
Now, if I have an ID of a book, is there a way to retrieve it from the Library.books relationship, "get me a book with id=10 in this particular library"? Something like:
try:
the_book = some_library.books.by_primary_key(10)
except SomeException:
print "The book with id 10 is not found in this particular library"
Workarounds I can think of (but which I'd rather avoid using):
book = session.query(Book).get(10)
if book and book.library_id != library_id:
raise SomeException("The book with id 10 is not found in this particular library")
or
book = session.query(Book).filter(Book.id==10).filter(Book.library_id=library.id).one()
Reason: imagine there are several different relationships (scifi_books, books_on_loan etc.) which specify different primaryjoin conditions - manually querying would require writing individual queries for all of them, while SQLAlchemy already knows how to retrieve items for that relationship. Also, I'd prefer to load the books all at once (by accessing library.books) than issuing individual queries.
Another option, which works but is inefficient and inelegant is:
for b in library.books:
if b.id == book_id:
return b
What I'm currently using is:
library_books = {b.id:b for b in library.books}
for data in list_of_dicts_containing_book_id:
if data['id'] in library_books:
library_books[data['id']].do_something(data)
else:
print "Book %s is not in the library" % data['id']
I just hope there's a nicer built-in way of quickly retrieving items from a relationship by their id
UPD: I've asked the question in the sqlalchemy mail list.

SQLAlchemy's query object has with_parent method which does exactly that:
with_parent(instance, property=None)
Add filtering criterion that relates the given instance to a child object or collection, using its attribute state as well as an established relationship() configuration.
so in my example the code would look like
q = session.query(Book)
q = q.with_parent(my_library, "scifi_books")
q = q.filter(Book.id==10).one()
This will issue a separate query though, even if the my_library.scifi_books relation is already loaded. There seems to be no "built-in" way to retrieve an item from an already-loaded relation by its PK, so the easiest is to just convert the relation to a dict and use that to look up items:
book_lookup = {b.id: b for b in my_library.scifi_books}
book = books_lookup[10]

See SQLAlchemy docs on querying with joins. So you want something like this (be aware that this is untested):
query(Book, Library). \
filter(Book.id==10). \
filter(Book.library.id==needed_library_id).all()
If Book -> Library reference would be scalar, you could use has():
query.filter(Library.books.has(id=10))
To make batch queries for multiple books at once, you can use in_() operator:
query(Library).join('books', Book).filter(Book.id.in_([1, 2, 10])).all()

Entity groups, ReferenceProperty or key as a string

After building a few application on the gae platform I usually use some relationship between different models in the datastore in basically every application. And often I find my self the need to see what record is of the same parent (like matching all entry with same parent)
From the beginning I used the db.ReferenceProperty to get my relations going, like:
class Foo(db.Model):
name = db.StringProperty()
class Bar(db.Model):
name = db.StringProperty()
parentFoo = db.ReferanceProperty(Foo)
fooKey = someFooKeyFromSomePlace
bars = Bar.all()
for bar in bar:
if bar.parentFoo.key() == fooKey:
// do stuff
But lately I've abandoned this approch since the bar.parentFoo.key() makes a sub query to fetch Foo each time. The approach I now use is to store each Foo key as a string on Bar.parentFoo and this way I can string compare this with someFooKeyFromSomePlace and get rid of all the subquery overhead.
Now I've started to look at Entity groups and wondering if this is even a better way to go? I can't really figure out how to use them.
And as for the two approaches above I'm wondering is there any downsides to using them? Could using stored key string comeback and bit me in the * * *. And last but not least is there a faster way to do this?

Tip:
replace...
bar.parentFoo.key() == fooKey
with...
Bar.parentFoo.get_value_for_datastore(bar) == fooKey
To avoid the extra lookup and just fetch the key from the ReferenceProperty
See Property Class

I think you should consider this as well. This will help you fetch all the child entities of a single parent.
bmw = Car(brand="BMW")
bmw.put()
lf = Wheel(parent=bmw,position="left_front")
lf.put()
lb = Wheel(parent=bmw,position="left_back")
lb.put()
bmwWheels = Wheel.all().ancestor(bmw)
For more reference in modeling. you can refer this Appengine Data modeling

I'm not sure what you're trying to do with that example block of code, but I get the feeling it could be accomplished with:
bars = Bar.all().filter("parentFoo " = SomeFoo)
As for entity groups, they are mainly used if you want to alter multiple things in transactions, since appengine restricts that to entities within the same group only; in addition, appengine allows ancestor filters ( http://code.google.com/appengine/docs/python/datastore/queryclass.html#Query_ancestor ), which could be useful depending on what it is you need to do. With the code above, you could very easily also use an ancestor query if you set the parent of Bar to be a Foo.
If your purposes still require a lot of "subquerying" as you put it, there is a neat prefetch pattern that Nick Johnson outlines here: http://blog.notdot.net/2010/01/ReferenceProperty-prefetching-in-App-Engine which basically fetches all the properties you need in your entity set as one giant get instead of a bunch of tiny ones, which gets rid of a lot of the overhead. However do note his warnings, especially regarding altering the properties of entities while using this prefetch method.
Not very specific, but that's all the info I can give you until you be more specific about exactly what you're trying to do here.

When you design your modules you also need to consider whether you want to be able to save this within a transaction. However only do this if you need to use transactions.
An alternative approach is to assign the parent like so:
from google.appengine.ext import db
class Foo(db.Model):
name = db.StringProperty()
class Bar(db.Model):
name = db.StringProperty()
def _save_entities( foo_name, bar_name ):
"""Save the model data"""
foo_item = Foo( name = foo_name )
foo_item.put()
bar_item = Bar( parent = foo_item, name = bar_name )
bar_item.put()
def main():
# Run the save in a transaction, if any fail this should all roll back
db.run_in_transaction( _save_transaction, "foo name", "bar name" )
# to query the model data using the ancestor relationship
for item in bar_item.gql("WHERE ANCESTOR IS :ancestor", ancestor = foo_item.key()).fetch(1000):
# do stuff

SQLAlchemy: cascade delete

I must be missing something trivial with SQLAlchemy's cascade options because I cannot get a simple cascade delete to operate correctly -- if a parent element is a deleted, the children persist, with null foreign keys.
I've put a concise test case here:
from sqlalchemy import Column, Integer, ForeignKey
from sqlalchemy.orm import relationship
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
from sqlalchemy.ext.declarative import declarative_base
Base = declarative_base()
class Parent(Base):
__tablename__ = "parent"
id = Column(Integer, primary_key = True)
class Child(Base):
__tablename__ = "child"
id = Column(Integer, primary_key = True)
parentid = Column(Integer, ForeignKey(Parent.id))
parent = relationship(Parent, cascade = "all,delete", backref = "children")
engine = create_engine("sqlite:///:memory:")
Base.metadata.create_all(engine)
Session = sessionmaker(bind=engine)
session = Session()
parent = Parent()
parent.children.append(Child())
parent.children.append(Child())
parent.children.append(Child())
session.add(parent)
session.commit()
print "Before delete, children = {0}".format(session.query(Child).count())
print "Before delete, parent = {0}".format(session.query(Parent).count())
session.delete(parent)
session.commit()
print "After delete, children = {0}".format(session.query(Child).count())
print "After delete parent = {0}".format(session.query(Parent).count())
session.close()
Output:
Before delete, children = 3
Before delete, parent = 1
After delete, children = 3
After delete parent = 0
There is a simple, one-to-many relationship between Parent and Child. The script creates a parent, adds 3 children, then commits. Next, it deletes the parent, but the children persist. Why? How do I make the children cascade delete?

The problem is that sqlalchemy considers Child as the parent, because that is where you defined your relationship (it doesn't care that you called it "Child" of course).
If you define the relationship on the Parent class instead, it will work:
children = relationship("Child", cascade="all,delete", backref="parent")
(note "Child" as a string: this is allowed when using the declarative style, so that you are able to refer to a class that is not yet defined)
You might want to add delete-orphan as well (delete causes children to be deleted when the parent gets deleted, delete-orphan also deletes any children that were "removed" from the parent, even if the parent is not deleted)
EDIT: just found out: if you really want to define the relationship on the Child class, you can do so, but you will have to define the cascade on the backref (by creating the backref explicitly), like this:
parent = relationship(Parent, backref=backref("children", cascade="all,delete"))
(implying from sqlalchemy.orm import backref)

#Steven's asnwer is good when you are deleting through session.delete() which never happens in my case. I noticed that most of the time I delete through session.query().filter().delete() (which doesn't put elements in the memory and deletes directly from db).
Using this method sqlalchemy's cascade='all, delete' doesn't work. There is a solution though: ON DELETE CASCADE through db (note: not all databases support it).
class Child(Base):
__tablename__ = "children"
id = Column(Integer, primary_key=True)
parent_id = Column(Integer, ForeignKey("parents.id", ondelete='CASCADE'))
class Parent(Base):
__tablename__ = "parents"
id = Column(Integer, primary_key=True)
child = relationship(Child, backref="parent", passive_deletes=True)

Pretty old post, but I just spent an hour or two on this, so I wanted to share my finding, especially since some of the other comments listed aren't quite right.
TL;DR
Give the child table a foreign or modify the existing one, adding ondelete='CASCADE':
parent_id = db.Column(db.Integer, db.ForeignKey('parent.id', ondelete='CASCADE'))
And one of the following relationships:
a) This on the parent table:
children = db.relationship('Child', backref='parent', passive_deletes=True)
b) Or this on the child table:
parent = db.relationship('Parent', backref=backref('children', passive_deletes=True))
Details
First off, despite what the accepted answer says, the parent/child relationship is not established by using relationship, it's established by using ForeignKey. You can put the relationship on either the parent or child tables and it will work fine. Although, apparently on the child tables, you have to use the backref function in addition to the keyword argument.
Option 1 (preferred)
Second, SqlAlchemy supports two different kinds of cascading. The first, and the one I recommend, is built into your database and usually takes the form of a constraint on the foreign key declaration. In PostgreSQL it looks like this:
CONSTRAINT child_parent_id_fkey FOREIGN KEY (parent_id)
REFERENCES parent_table(id) MATCH SIMPLE
ON DELETE CASCADE
This means that when you delete a record from parent_table, then all the corresponding rows in child_table will be deleted for you by the database. It's fast and reliable and probably your best bet. You set this up in SqlAlchemy through ForeignKey like this (part of the child table definition):
parent_id = db.Column(db.Integer, db.ForeignKey('parent.id', ondelete='CASCADE'))
parent = db.relationship('Parent', backref=backref('children', passive_deletes=True))
The ondelete='CASCADE' is the part that creates the ON DELETE CASCADE on the table.
Gotcha!
There's an important caveat here. Notice how I have a relationship specified with passive_deletes=True? If you don't have that, the entire thing will not work. This is because by default when you delete a parent record SqlAlchemy does something really weird. It sets the foreign keys of all child rows to NULL. So if you delete a row from parent_table where id = 5, then it will basically execute
UPDATE child_table SET parent_id = NULL WHERE parent_id = 5
Why you would want this I have no idea. I'd be surprised if many database engines even allowed you to set a valid foreign key to NULL, creating an orphan. Seems like a bad idea, but maybe there's a use case. Anyway, if you let SqlAlchemy do this, you will prevent the database from being able to clean up the children using the ON DELETE CASCADE that you set up. This is because it relies on those foreign keys to know which child rows to delete. Once SqlAlchemy has set them all to NULL, the database can't delete them. Setting the passive_deletes=True prevents SqlAlchemy from NULLing out the foreign keys.
You can read more about passive deletes in the SqlAlchemy docs.
Option 2
The other way you can do it is to let SqlAlchemy do it for you. This is set up using the cascade argument of the relationship. If you have the relationship defined on the parent table, it looks like this:
children = relationship('Child', cascade='all,delete', backref='parent')
If the relationship is on the child, you do it like this:
parent = relationship('Parent', backref=backref('children', cascade='all,delete'))
Again, this is the child so you have to call a method called backref and putting the cascade data in there.
With this in place, when you delete a parent row, SqlAlchemy will actually run delete statements for you to clean up the child rows. This will likely not be as efficient as letting this database handle if for you so I don't recommend it.
Here are the SqlAlchemy docs on the cascading features it supports.

Alex Okrushko answer almost worked best for me. Used ondelete='CASCADE' and passive_deletes=True combined. But I had to do something extra to make it work for sqlite.
Base = declarative_base()
ROOM_TABLE = "roomdata"
FURNITURE_TABLE = "furnituredata"
class DBFurniture(Base):
__tablename__ = FURNITURE_TABLE
id = Column(Integer, primary_key=True)
room_id = Column(Integer, ForeignKey('roomdata.id', ondelete='CASCADE'))
class DBRoom(Base):
__tablename__ = ROOM_TABLE
id = Column(Integer, primary_key=True)
furniture = relationship("DBFurniture", backref="room", passive_deletes=True)
Make sure to add this code to ensure it works for sqlite.
from sqlalchemy import event
from sqlalchemy.engine import Engine
from sqlite3 import Connection as SQLite3Connection
#event.listens_for(Engine, "connect")
def _set_sqlite_pragma(dbapi_connection, connection_record):
if isinstance(dbapi_connection, SQLite3Connection):
cursor = dbapi_connection.cursor()
cursor.execute("PRAGMA foreign_keys=ON;")
cursor.close()
Stolen from here: SQLAlchemy expression language and SQLite's on delete cascade

Steven is correct in that you need to explicitly create the backref, this results in the cascade being applied on the parent (as opposed to it being applied to the child like in the test scenario).
However, defining the relationship on the Child does NOT make sqlalchemy consider Child the parent. It doesn't matter where the relationship is defined (child or parent), its the foreign key that links the two tables that determines which is the parent and which is the child.
It makes sense to stick to one convention though, and based on Steven's response, I'm defining all my child relationships on the parent.

Steven's answer is solid. I'd like to point out an additional implication.
By using relationship, you're making the app layer (Flask) responsible for referential integrity. That means other processes that access the database not through Flask, like a database utility or a person connecting to the database directly, will not experience those constraints and could change your data in a way that breaks the logical data model you worked so hard to design.
Whenever possible, use the ForeignKey approach described by d512 and Alex. The DB engine is very good at truly enforcing constraints (in an unavoidable way), so this is by far the best strategy for maintaining data integrity. The only time you need to rely on an app to handle data integrity is when the database can't handle them, e.g. versions of SQLite that don't support foreign keys.
If you need to create further linkage among entities to enable app behaviors like navigating parent-child object relationships, use backref in conjunction with ForeignKey.

I struggled with the documentation as well, but found that the docstrings themselves tend to be easier than the manual. For example, if you import relationship from sqlalchemy.orm and do help(relationship), it will give you all the options you can specify for cascade. The bullet for delete-orphan says:
if an item of the child's type with no parent is detected, mark it for deletion.
Note that this option prevents a pending item of the child's class from being
persisted without a parent present.
I realize your issue was more with the way the documentation for defining parent-child relationships. But it seemed that you might also be having a problem with the cascade options, because "all" includes "delete". "delete-orphan" is the only option that's not included in "all".

Even tho this question is very old, it comes up first when searched for in Google so I'll post my solution to add up to what others said (I've spent few hours even after reading all the answers in here).
As d512 explained, it is all about Foreign Keys. It was quite a surprise to me but not all databases / engines support Foreign Keys. I'm running a MySQL database. After long investigation, I noticed that when I create new table it defaults to an engine (MyISAM) that doesn't support Foreign Keys. All I had to do was to set it to InnoDB by adding mysql_engine='InnoDB' when defining a Table. In my project I'm using an imperative mapping and it looks like so:
db.Table('child',
Column('id', Integer, primary_key=True),
# other columns
Column('parent_id',
ForeignKey('parent.id', ondelete="CASCADE")),
mysql_engine='InnoDB')

Answer by Stevan is perfect. But if you are still getting the error. Other possible try on top of that would be -
http://vincentaudebert.github.io/python/sql/2015/10/09/cascade-delete-sqlalchemy/
Copied from the link-
Quick tip if you get in trouble with a foreign key dependency even if you have specified a cascade delete in your models.
Using SQLAlchemy, to specify a cascade delete you should have cascade='all, delete' on your parent table. Ok but then when you execute something like:
session.query(models.yourmodule.YourParentTable).filter(conditions).delete()
It actually triggers an error about a foreign key used in your children tables.
The solution I used it to query the object and then delete it:
session = models.DBSession()
your_db_object = session.query(models.yourmodule.YourParentTable).filter(conditions).first()
if your_db_object is not None:
session.delete(your_db_object)
This should delete your parent record AND all the children associated with it.

TLDR: If the above solutions don't work, try adding nullable=False to your column.
I'd like to add a small point here for some people who may not get the cascade function to work with the existing solutions (which are great). The main difference between my work and the example was that I used automap. I do not know exactly how that might interfere with the setup of cascades, but I want to note that I used it. I am also working with a SQLite database.
I tried every solution described here, but rows in my child table continued to have their foreign key set to null when the parent row was deleted. I'd tried all the solutions here to no avail. However, the cascade worked once I set the child column with the foreign key to nullable = False.
On the child table, I added:
Column('parent_id', Integer(), ForeignKey('parent.id', ondelete="CASCADE"), nullable=False)
Child.parent = relationship("parent", backref=backref("children", passive_deletes=True)
With this setup, the cascade functioned as expected.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Generic query in SQLAlchemy - python

Related

Performance one-to-many relationship in SQLAlchemy

SQLAlchemy; getting list of related tables/classes

SQLAlchemy: How do I get an object from a relationship by object's PK?

Entity groups, ReferenceProperty or key as a string

SQLAlchemy: cascade delete

Categories

Resources