sqlalchemy id equality vs reference equality - python

I'm working with SQLAlchemy for the first time and was wondering: generally speaking, is it enough to rely on Python's default equality semantics (reference equality) when working with SQLAlchemy, or should I compare by id (primary key)?
In other projects I've worked on in the past using ORM technologies like Java's Hibernate, we'd always override .equals() to check for equality of an object's primary key/id, but when I look back I'm not sure this was always necessary.
In most if not all cases I can think of, you only ever had one reference to a given object with a given id, and that object was always the attached object, so technically you could get away with reference equality.
Short question: Should I be overriding __eq__() and __hash__() for my business entities when using SQLAlchemy?

Short answer: No, unless you're working with multiple Session objects.
Longer answer, quoting the awesome documentation:
The ORM concept at work here is known as an identity map and ensures that all operations upon a particular row within a Session operate upon the same set of data. Once an object with a particular primary key is present in the Session, all SQL queries on that Session will always return the same Python object for that particular primary key; it also will raise an error if an attempt is made to place a second, already-persisted object with the same primary key within the session.
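To illustrate the quoted guarantee, a tiny sketch (User is a hypothetical mapped class and session an open Session):

    a = session.query(User).get(1)
    b = session.query(User).filter_by(id=1).one()
    assert a is b   # the Session hands back the same Python object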

I had a few situations where my SQLAlchemy application would load multiple instances of the same object (multithreading, different SQLAlchemy sessions, ...). It was absolutely necessary to override __eq__() for those objects or I would get various problems. This could be a problem in my application design, but it probably doesn't hurt to override __eq__() just to be sure.
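For what it's worth, a minimal sketch of such an override, assuming a declarative model with a single integer primary key column named id:

    from sqlalchemy import Column, Integer
    from sqlalchemy.orm import declarative_base

    Base = declarative_base()

    class User(Base):
        __tablename__ = "user"
        id = Column(Integer, primary_key=True)

        def __eq__(self, other):
            if not isinstance(other, User):
                return NotImplemented
            if self.id is None or other.id is None:
                # transient objects have no primary key yet; fall back to identity
                return self is other
            return self.id == other.id

        def __hash__(self):
            # caveat: the hash changes once a transient object is flushed and
            # gets its id, so avoid keeping unflushed objects in sets/dicts
            return hash(self.id) if self.id is not None else id(self)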

Related

SQLAlchemy insert multiple objects with relationships

I am trying to create many objects. Currently it is done by calling session.add() and then session.flush() as each object is created. This is bad because there are 500+ objects being created, so each of them performs a query, which hurts performance. I am trying to optimise this.
To give some background, here is roughly what my object looks like:
class Metric(Base):
    __tablename__ = "metric"
    id = Column(Integer, primary_key=True)
    data = Column(String)
    fk_document_id = Column(Integer, ForeignKey("document.id"))
    fk_user_id = Column(Integer, ForeignKey("user.id"))
    document = relationship("Document")
    user = relationship("User")
Key thing to highlight: there are two relationships in this object, implemented using the relationship() construct that SQLAlchemy offers.
My first go was to change the code to use bulk_save_objects(); however, this method does not save the relationships, and it also requires a session.commit(), which I'd ideally avoid because there is a chance this transaction could get rolled back after the creation of the Metric objects.
Then I found add_all(); however, this performs an add() for each object anyway, so I do not think it will improve the current performance much.
I can't see any other approach, so my question is whether there is a way in this case to use bulk_save_objects() to persist relationships and possibly not require a commit. Alternatively, if there is another method to do this that I do not see, please let me know.
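For reference, a rough sketch of the two patterns described in the question (pairs is hypothetical input data):

    metrics = [Metric(data="...", fk_document_id=d, fk_user_id=u)
               for d, u in pairs]

    # current approach: one flush (one round of INSERTs) per object
    for m in metrics:
        session.add(m)
        session.flush()

    # add_all() still add()s each object, but deferring to a single flush
    # lets the unit of work emit all the INSERTs in one pass, batching them
    # where the dialect allows
    session.add_all(metrics)
    session.flush()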

SQLAlchemy MetaData.reflect() vs. automap_base.prepare()

It seems to me that MetaData.reflect() and sqlalchemy.ext.automap.prepare() tables should be interchangeable for many use cases, but they aren't.
Passing metadata.tables['mytable'] into conn.execute(select(...)) returns a sqlalchemy.engine.cursor.CursorResult, and your iterator gets the columns directly (e.g. x.columnA).
But passing automap_base().classes.mytable into the same conn.execute(select(...)) returns a sqlalchemy.engine.result.ChunkedIteratorResult, and you need x.mytable.columnA to get at the column.
The sqlalchemy.engine.Result documentation says as much:
New in version 1.4: The Result object provides a completely updated
usage model and calling facade for SQLAlchemy Core and SQLAlchemy ORM.
In Core, it forms the basis of the CursorResult object which replaces
the previous ResultProxy interface. When using the ORM, a higher level
object called ChunkedIteratorResult is normally used.
Can I generically convert one to the other? That is, is there some wrapper that works for every table without needing the table name?
What's the best future-proof way to do this? I want my code to be forward-compatible with SQLAlchemy 2.0. Does that mean I should move away from either automap or MetaData?
sqlalchemy 1.4.35
This is the difference between the Core and the ORM.
select() from a Table vs. ORM class
While the SQL generated in these examples looks the same whether we
invoke select(user_table) or select(User), in the more general case
they do not necessarily render the same thing, as an ORM-mapped class
may be mapped to other kinds of “selectables” besides tables. The
select() that’s against an ORM entity also indicates that ORM-mapped
instances should be returned in a result, which is not the case when
SELECTing from a Table object.
Don't hesitate to use the ORM. It's higher level, Pythonic, cool, and automap is ORM.
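A sketch of the difference, assuming a table named mytable with a column columnA (1.4-style code, forward-compatible with 2.0; the database URL is a placeholder):

    from sqlalchemy import MetaData, create_engine, select
    from sqlalchemy.ext.automap import automap_base
    from sqlalchemy.orm import Session

    engine = create_engine("sqlite:///example.db")

    # Core: reflected Table -> rows expose columns directly
    metadata = MetaData()
    metadata.reflect(engine, only=["mytable"])
    mytable = metadata.tables["mytable"]
    with engine.connect() as conn:
        for row in conn.execute(select(mytable)):
            print(row.columnA)

    # ORM: automapped class -> each result row wraps an ORM instance;
    # Session.scalars() unwraps it so attribute access looks the same
    Base = automap_base()
    Base.prepare(autoload_with=engine)
    MyTable = Base.classes.mytable
    with Session(engine) as session:
        for obj in session.scalars(select(MyTable)):
            print(obj.columnA)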

SQLAlchemy get items from the identity map not only by primary key

Is it possible to use a couple of fields other than the primary key to retrieve items (already fetched earlier) from the identity map? For example, I often query a table by an (external_id, platform_id) pair, which is a unique key, but not the primary key. And I want to avoid unnecessary SQL queries in such cases.
A brief overview of identity_map and get():
An identity map is kept for the lifecycle of a SQLAlchemy Session object; i.e., in the case of a web service or a RESTful API, the Session object's lifecycle is no more than a single request (as recommended).
From http://martinfowler.com/eaaCatalog/identityMap.html:
An Identity Map keeps a record of all objects that have been read from
the database in a single business transaction. Whenever you want an object, you check the Identity Map first to see if you already have it.
In SQLAlchemy's ORM there's this special query method get(); it first looks into the identity_map using the pk (the only allowed argument) and, if the object is there, returns it from the identity map without executing a SQL query or hitting the database.
From the docs:
get(ident)
Return an instance based on the given primary key identifier, or None
if not found.
get() is special in that it provides direct access to the identity
map of the owning Session. If the given primary key identifier is
present in the local identity map, the object is returned directly
from this collection and no SQL is emitted, unless the object has been
marked fully expired. If not present, a SELECT is performed in order
to locate the object.
Only get() uses the identity_map - official docs:
It’s somewhat used as a cache, in that it implements the identity map
pattern, and stores objects keyed to their primary key. However, it
doesn’t do any kind of query caching. This means, if you say
session.query(Foo).filter_by(name='bar'), even if Foo(name='bar')
is right there, in the identity map, the session has no idea about
that. It has to issue SQL to the database, get the rows back, and
then when it sees the primary key in the row, then it can look in the
local identity map and see that the object is already there. It’s
only when you say query.get({some primary key}) that the Session
doesn’t have to issue a query.
P.S. If you query by anything other than the pk, you aren't hitting the identity_map in the first place.
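A quick sketch of the difference, using the Foo model from the quoted docs:

    foo = session.query(Foo).get(1)    # identity map first; may emit no SQL
    same = session.query(Foo).get(1)   # second call: no SQL, same object back

    # a filtered query always round-trips to the database, even if the
    # matching Foo is already sitting in the identity map
    also = session.query(Foo).filter_by(name='bar').first()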
A few relevant SO questions, helpful for clearing up the concept:
Forcing a sqlalchemy ORM get() outside identity map
It's possible to access the whole identity map sequentially:
for obj in session.identity_map.values():
    print(obj)
To get an object by arbitrary attributes, you then have to filter for the object type first and then check your attributes.
It's not a lookup in constant time, but can prevent unnecessary queries.
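Wrapped up as a helper, this might look like the following sketch (the model class Item and the attribute values are placeholders):

    def find_in_identity_map(session, cls, **attrs):
        # linear scan of the identity map; returns a match or None
        for obj in session.identity_map.values():
            if isinstance(obj, cls) and all(
                    getattr(obj, k) == v for k, v in attrs.items()):
                return obj
        return None

    # usage, e.g. for the (external_id, platform_id) pair from the question:
    item = find_in_identity_map(session, Item, external_id=1, platform_id=2)
    if item is None:
        item = session.query(Item).filter_by(external_id=1, platform_id=2).one()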
There is the argument that objects may have been modified by another process and the identity map doesn't hold the current state, but this argument is invalid: if your transaction isolation level is READ COMMITTED (or lower), and this is often the case, data may ALWAYS have been changed immediately after your query finishes.

Will SQLAlchemy update the content of objects in the middle of a session?

In the SQLAlchemy docs it says this:
"When using a Session, it’s important to note that the objects which are associated with it are proxy objects to the transaction being held by the Session - there are a variety of events that will cause objects to re-access the database in order to keep synchronized. It is possible to “detach” objects from a Session, and to continue using them, though this practice has its caveats. It’s intended that usually, you’d re-associate detached objects with another Session when you want to work with them again, so that they can resume their normal task of representing database state."
[http://docs.sqlalchemy.org/en/rel_0_9/orm/session.html]
If I am in the middle of a session in which I read some objects, do some manipulations and more queries, and save some objects before committing, is there a risk that changes to the database by other users will unexpectedly update my objects while I am working with them?
In other words, what are the "variety of events" referred to above?
Is the answer to set the transaction isolation level to maximum? (I am using PostgreSQL with Flask-SQLAlchemy and Flask-RESTful, if any of that matters.)
No, SQLAlchemy does not monitor the database for changes or update your objects whenever it feels like it. I imagine that would be a quite expensive operation. The "variety of events" refers more to SQLAlchemy's internal state. I'm not familiar with all the "events", but, for example, when objects are marked as expired, SQLAlchemy automatically reloads them from the database the next time they are accessed. One such case is calling session.commit() and then accessing any object's property again.
More here: Documentation about expiring objects
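A self-contained sketch of the commit case (hypothetical model; note that a Session's expire_on_commit option defaults to True):

    from sqlalchemy import Column, Integer, String, create_engine
    from sqlalchemy.orm import Session, declarative_base

    Base = declarative_base()

    class User(Base):
        __tablename__ = "user"
        id = Column(Integer, primary_key=True)
        name = Column(String)

    engine = create_engine("sqlite://")
    Base.metadata.create_all(engine)

    with Session(engine) as session:
        user = User(name="alice")
        session.add(user)
        session.commit()   # expires every object attached to the session
        print(user.name)   # this attribute access emits a fresh SELECT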

Discovering referers to SQLAlchemy object

I have a lot of model classes with relations between them, and a CRUD interface to edit them. The problem is that some objects can't be deleted since there are other objects referring to them. Sometimes I can set up an ON DELETE rule to handle this case, but in most cases I don't want automatic deletion of related objects until they are unbound manually. Anyway, I'd like to present the editor with a list of objects referring to the currently viewed one and highlight those that prevent its deletion due to a FOREIGN KEY constraint. Is there a ready solution to automatically discover referers?
Update
The task seems to be quite common (e.g. the Django ORM shows all dependencies), so I am surprised that there is no solution to it yet.
There are two directions suggested:
Enumerate all relations of the current object and go through their backrefs. But there is no guarantee that all relations have a backref defined. Moreover, there are some cases where a backref is meaningless. Although I could define one everywhere, I don't like doing it that way, and it's not reliable.
(Suggested by van and stephan) Check all tables of the MetaData object and collect dependencies from their foreign_keys property (the code of sqlalchemy_schemadisplay can be used as an example, thanks to stephan's comments). This will catch all dependencies between tables, but what I need is dependencies between model classes. Some foreign keys are defined in intermediate tables and have no models corresponding to them (they are used as secondary in relations). Sure, I could go further and find the related model (I have yet to find a way to do that), but it looks too complicated.
Solution
Below is a method of the base model class (designed for the declarative extension) that I use as a solution. It is not perfect and doesn't meet all my requirements, but it works for the current state of my project. The result is collected as a dictionary of dictionaries, so I can show the referers grouped by objects and their properties. I haven't decided yet whether that's a good idea, since the list of referers is sometimes huge and I'm forced to limit it to some reasonable number.
from sqlalchemy.orm import class_mapper, object_session
from sqlalchemy.orm.util import identity_key
# PropertyLoader was renamed in later SQLAlchemy versions
# (RelationshipProperty in modern releases)
from sqlalchemy.orm.properties import PropertyLoader

def _get_referers(self):
    db = object_session(self)
    cls, ident = identity_key(instance=self)
    metadata = cls.__table__.metadata
    result = {}
    # _mapped_models is my extension. It is collected by a metaclass, so I
    # didn't look for other ways to find all model classes.
    for other_class in metadata._mapped_models:
        queries = {}
        for prop in class_mapper(other_class).iterate_properties:
            # only relationship properties that point back at our class
            if not (isinstance(prop, PropertyLoader) and
                    issubclass(cls, prop.mapper.class_)):
                continue
            query = db.query(prop.parent)
            comp = prop.comparator
            if prop.uselist:
                query = query.filter(comp.contains(self))
            else:
                query = query.filter(comp == self)
            count = query.count()
            if count:
                queries[prop] = (count, query)
        if queries:
            result[other_class] = queries
    return result
Thanks to all who helped me, especially stephan and van.
SQL: I have to absolutely disagree with S.Lott's answer.
I am not aware of an out-of-the-box solution, but it is definitely possible to discover all the tables that have ForeignKey constraints to a given table. One needs to properly use the INFORMATION_SCHEMA views such as REFERENTIAL_CONSTRAINTS, KEY_COLUMN_USAGE, TABLE_CONSTRAINTS, etc. With some limitations and extensions, most modern relational databases support the INFORMATION_SCHEMA standard (see the SQL Server example). When you have all the FK information and the object (row) in the table, it is a matter of running a few SELECT statements to get all the other rows in other tables that refer to the given row and prevent it from being deleted.
SQLAlchemy: As noted by stephan in his comment, if you use the ORM with backref for relations, then it should be quite easy for you to get the list of parent objects that keep a reference to the object you are trying to delete, because those objects are basically mapped properties of your object (child1.Parent).
If you work with SQLAlchemy's Table objects (or don't always use backref for relations), then you would have to get the values of foreign_keys for all the tables, and then for all those ForeignKeys call the references(...) method, providing your table as a parameter. In this way you will find all the FKs (and tables) that reference the table your object maps to. Then you can query all the objects that keep a reference to your object by constructing a query for each of those FKs.
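A sketch of that table-level approach, which should work against any declared or reflected MetaData:

    def referring_tables(metadata, target_table):
        # yield (table, foreign_key) pairs that reference target_table
        for table in metadata.tables.values():
            for fk in table.foreign_keys:
                if fk.references(target_table):
                    yield table, fk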
In general, there's no way to "discover" all of the references in a relational database.
In some databases, the schema may use declarative referential integrity in the form of explicit FOREIGN KEY or CHECK constraints.
But there's no requirement to do this. It can be incomplete or inconsistent.
Any query can include a FK relationship that is not declared. Without the universe of all queries, you can't know the relationships which are used but not declared.
To find "referers" in general, you must actually know the database design and have all queries.
For each model class, you can easily see whether all of its one-to-many relations are empty, simply by asking for each list and seeing how many entries it contains. (There is probably a more efficient way implemented in terms of COUNT, too.) If there are any foreign keys relating to the object, and you have your object relations set up correctly, then at least one of these lists will be non-empty.
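As a sketch, iterating the mapper's relationships makes that check generic (each collection access here triggers a lazy load, so a COUNT-based variant would be cheaper):

    from sqlalchemy.orm import class_mapper

    def nonempty_collections(obj):
        # yield names of one-to-many collections on obj that are non-empty
        for prop in class_mapper(type(obj)).relationships:
            if prop.uselist and len(getattr(obj, prop.key)) > 0:
                yield prop.key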
