Let's say I have an object in memory from my database that I retrieved using sqlalchemy like this:
user_object = database_session.query(User).first()
I do a few checks on this object that don't change anything, like this (for this example my user_object has a name property equal to "John"):
if user_object.name == "John":
    pass  # do nothing
else:
    do_something()
I then add this object to a list that will get committed to the database all at once:
objects_list.append(user_object)
database_session.add_all(objects_list)
database_session.commit()
Some of the objects in objects_list will have changes, some will not.
Is this process of committing a number of unchanged objects wasteful, and if so what is a better way to handle this situation?
When you do
user_object = database_session.query(User).first()
the loaded object is already in the session. From there on the session tracks changes to the object through instrumentation. Session.add() and Session.add_all() simply ignore objects that are already in the session, so
objects_list.append(user_object)
database_session.add_all(objects_list)
is at least partly redundant. When you call Session.commit(), SQLAlchemy will first flush any pending changes held in the Session to the database and then commit. In other words, you're not committing any "unchanged objects": objects without pending changes generate no SQL at all.
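If you want to see this for yourself, you can inspect the session just before committing (a minimal sketch, reusing the database_session and objects from the question):
# Only objects with pending modifications show up here;
# unchanged objects are absent and will generate no UPDATE.
print(database_session.dirty)

# commit() flushes exactly those pending changes, then commits the transaction.
database_session.commit()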
I am new to Python and was studying FastAPI and SQLModel.
Reference link: https://sqlmodel.tiangolo.com/tutorial/fastapi/session-with-dependency/#the-with-block
Here, they have something like this
def create_hero(*, session: Session = Depends(get_session), hero: HeroCreate):
    db_hero = Hero.from_orm(hero)
    session.add(db_hero)
    session.commit()
    session.refresh(db_hero)
    return db_hero
Here I am unable to understand this part
session.add(db_hero)
session.commit()
session.refresh(db_hero)
What is it doing and how is it working?
I also couldn't understand this part of the tutorial:
In fact, you could think that all that block of code inside of the create_hero() function is still inside a with block for the session, because this is more or less what's happening behind the scenes.
But now, the with block is not explicitly in the function, but in the dependency above:
Here is the explanation from the docs of what a session is:
In the most general sense, the Session establishes all conversations
with the database and represents a “holding zone” for all the objects
which you’ve loaded or associated with it during its lifespan. It
provides the interface where SELECT and other queries are made that
will return and modify ORM-mapped objects. The ORM objects themselves
are maintained inside the Session, inside a structure called the
identity map - a data structure that maintains unique copies of each
object, where “unique” means “only one object with a particular
primary key”.
So
# This line simply creates a Python object
# that SQLAlchemy "understands".
db_hero = Hero.from_orm(hero)

# This line adds the object `db_hero` to the "holding zone".
session.add(db_hero)

# This line takes all objects from the "holding zone" and writes them
# to the database. In our case we have only one object in this zone,
# but it is possible to have several.
session.commit()

# This line reads the created row back from the database and updates
# the object with it. That means the object can gain new attributes,
# for example the id that the database assigned to the new row.
session.refresh(db_hero)
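So after the refresh, the database-generated primary key is available on the object (a small illustration, assuming Hero has an autogenerated id column):
print(db_hero.id)  # None before the insert, now set by the database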
I know I can use the Session.new, Session.dirty, Session.deleted attributes to check objects that have been added, modified, or deleted from the Session.
However, after I've added an object o to the Session and committed, o won't show up in any of those Session attributes above, although o is still being tracked by the Session, i.e. subsequent modifications are reflected in Session.dirty.
s.add(o)
s.commit()
# s.new, s.dirty, s.deleted are now all empty (none contain o)
How can I see all objects tracked by the Session?
Individually you can check whether a single object is in the session using in:
if some_obj in session:
    do_something()
To see all tracked instances, you can inspect the identity_map, from the docs:
Iterating through Session.identity_map.values() provides access to
the full set of persistent objects (i.e., those that have row
identity) currently in the session.
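For example (a minimal sketch, assuming an open session with some loaded objects):
# Iterate over every persistent object the session currently tracks.
for obj in session.identity_map.values():
    print(obj)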
I have an application in which I query against a SQL database and end up with a SQL Alchemy object representing a given row. Then, based on user input and a series of if/then statements I may perform an update on the SQLA object.
i.e.,
if 'fooinput1' in payload:
    sqla_instance.foo1 = validate_foo1(payload['fooinput1'])
if 'fooinput2' in payload:
    sqla_instance.foo2 = validate_foo2(payload['fooinput2'])
...
I now need to add modified_at and modified_by data to this system. Is it possible to check something on the SQLA instance like sqla_instance.was_modified or sqla_instance.update_pending to determine if a modification was performed?
(I recognize that I could maintain my own was_modified boolean, but since there are many of these if/then clauses that would lead to a lot of boilerplate which I'd like to avoid if possible.)
FWIW: this is a Python 2.7 Pyramid app reading from a MySQL db in the context of a web request.
The Session object the SQLAlchemy ORM provides has a method and an attribute that can help with what you are trying to do:
1) Session.is_modified()
2) Session.dirty
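For example (a minimal sketch, reusing the session and sqla_instance from the question; the modified_at assignment is illustrative):
from datetime import datetime

# True if the instance has attribute changes pending a flush.
if session.is_modified(sqla_instance):
    sqla_instance.modified_at = datetime.utcnow()

# Or inspect every modified-but-not-yet-flushed object at once.
for obj in session.dirty:
    print(obj)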
Depending on the number of fields whose modifications you want to track, you may achieve what you need using ORM events:
import sqlalchemy as sa
from datetime import datetime

@sa.event.listens_for(Thing.foo, 'set')
def foo_changed(target, value, oldvalue, initiator):
    target.last_modified = datetime.now()

@sa.event.listens_for(Thing.baz, 'set')
def baz_changed(target, value, oldvalue, initiator):
    target.last_modified = datetime.now()
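If there are many such fields, the listener registration itself can be looped to avoid boilerplate (a sketch, assuming Thing is your mapped class and you list the tracked attributes yourself):
def touch_last_modified(target, value, oldvalue, initiator):
    target.last_modified = datetime.now()

for attr in (Thing.foo, Thing.baz):
    sa.event.listen(attr, 'set', touch_last_modified)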
The session object also provides events, which may help you. Just review the session lifecycle states once, in particular the transient-to-pending transition.
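For instance, a before_flush listener can stamp every modified object in one place (a sketch, not a drop-in solution; it assumes your models have a modified_at column):
import sqlalchemy as sa
from datetime import datetime
from sqlalchemy.orm import Session

@sa.event.listens_for(Session, 'before_flush')
def stamp_modified(session, flush_context, instances):
    # session.dirty holds objects with pending modifications.
    for obj in session.dirty:
        if session.is_modified(obj):
            obj.modified_at = datetime.now()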
I have a little problem similar to Getting fields history before flush
Here is my code:
user = User.query.filter(User.id == user_id).first()
print(user.first_name)
# Rick
user.first_name = 'Anders'
print(get_history(user, 'first_name'))
# History(added=[u'Anders'], unchanged=(), deleted=[u'Rick'])
db.session.flush()
print(get_history(user, 'first_name'))
# History(added=(), unchanged=[u'Anders'], deleted=())
So, I can easily get the original value before the flush using get_history and the session's dirty attribute. But once I call session.flush(), get_history no longer returns the original value and dirty is empty.
However, I can rollback these changes using session.rollback() after session.flush().
That's why I think it's possible to get the original values. They just hide somewhere.
But where?
session.flush() writes out all pending object creations, deletions and modifications to the database as INSERTs, DELETEs, UPDATEs, etc.
So when you call session.flush(), the objects you added to the session (session.add(object)) go to the database, but they aren't written permanently until a commit (session.commit()) or a rollback happens.
To answer your question: after session.flush() your objects are in the database, inside the still-open transaction.
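You can see this with a quick experiment (a sketch, reusing the user object and db session from the question):
user.first_name = 'Anders'
db.session.flush()      # the UPDATE is sent, but the transaction is still open
db.session.rollback()   # the transaction is rolled back and the object expired
print(user.first_name)  # 'Rick' again, reloaded from the database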
In my Django app very often I need to do something similar to get_or_create(). E.g.,
User submits a tag. Need to see if that tag already is in the database. If not, create a new record for it. If it is, just update the existing record.
But looking into the doc for get_or_create() it looks like it's not threadsafe. Thread A checks and finds Record X does not exist. Then Thread B checks and finds that Record X does not exist. Now both Thread A and Thread B will create a new Record X.
This must be a very common situation. How do I handle it in a threadsafe way?
Since 2013 or so, get_or_create is atomic, so it handles concurrency nicely:
This method is atomic assuming correct usage, correct database
configuration, and correct behavior of the underlying database.
However, if uniqueness is not enforced at the database level for the
kwargs used in a get_or_create call (see unique or unique_together),
this method is prone to a race-condition which can result in multiple
rows with the same parameters being inserted simultaneously.
If you are using MySQL, be sure to use the READ COMMITTED isolation
level rather than REPEATABLE READ (the default), otherwise you may see
cases where get_or_create will raise an IntegrityError but the object
won’t appear in a subsequent get() call.
From: https://docs.djangoproject.com/en/dev/ref/models/querysets/#get-or-create
Here's an example of how you could do it:
Define a model with either unique=True:
class MyModel(models.Model):
    slug = models.SlugField(max_length=255, unique=True)
    name = models.CharField(max_length=255)

MyModel.objects.get_or_create(slug=<user_slug_here>, defaults={"name": <user_name_here>})
... or by using unique_together:
class MyModel(models.Model):
    prefix = models.CharField(max_length=3)
    slug = models.SlugField(max_length=255)
    name = models.CharField(max_length=255)

    class Meta:
        unique_together = ("prefix", "slug")

MyModel.objects.get_or_create(prefix=<user_prefix_here>, slug=<user_slug_here>, defaults={"name": <user_name_here>})
Note how the non-unique fields are in the defaults dict, NOT among the unique fields in get_or_create. This will ensure your creates are atomic.
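Since the original question also wants to update the record when it already exists, here is a small follow-up sketch using the placeholders from above:
obj, created = MyModel.objects.get_or_create(slug=<user_slug_here>, defaults={"name": <user_name_here>})
if not created:
    # The record already existed; update it as the question describes.
    obj.name = <user_name_here>
    obj.save()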
Here's how it's implemented in Django: https://github.com/django/django/blob/fd60e6c8878986a102f0125d9cdf61c717605cf1/django/db/models/query.py#L466 - try creating an object, catch the IntegrityError if one occurs, and return the existing copy in that case. In other words: handle atomicity in the database.
This must be a very common situation. How do I handle it in a threadsafe way?
Yes.
The "standard" solution in SQL is to simply attempt to create the record. If it works, that's good. Keep going.
If an attempt to create a record gets a "duplicate" exception from the RDBMS, then do a SELECT and keep going.
Django, however, has an ORM layer, with its own cache. So the logic is inverted to make the common case work directly and quickly and the uncommon case (the duplicate) raise a rare exception.
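A minimal sketch of that standard pattern in Django (assuming a hypothetical Tag model whose name field has unique=True):
from django.db import IntegrityError

def get_or_create_tag(name):
    try:
        return Tag.objects.create(name=name)
    except IntegrityError:
        # Another thread created it first; fetch the existing row instead.
        return Tag.objects.get(name=name)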
Try the transaction.commit_on_success decorator on the callable where you run get_or_create(**kwargs):
"Use the commit_on_success decorator to use a single transaction for all the work done in a function. If the function returns successfully, then Django will commit all work done within the function at that point. If the function raises an exception, though, Django will roll back the transaction."
Apart from that, in concurrent calls to get_or_create, both threads try to get the object with the arguments passed to it (except for the "defaults" arg, which is a dict used during the create call in case get() fails to retrieve any object). If the get fails, both threads try to create the object, resulting in duplicates, unless a unique/unique_together constraint is enforced at the database level on the field(s) used in the get() call.
It is similar to this post:
How do I deal with this race condition in django?
So many years have passed, but nobody has written about threading.Lock. If you don't have the opportunity to add a unique_together constraint via a migration, for legacy reasons, you can use Lock or threading.Semaphore objects. Here is the pseudocode:
from concurrent.futures import ThreadPoolExecutor
from threading import Lock

_lock = Lock()

def get_staff(data: dict):
    _lock.acquire()
    try:
        staff, created = MyModel.objects.get_or_create(**data)
        return staff
    finally:
        _lock.release()

with ThreadPoolExecutor(max_workers=50) as pool:
    pool.map(get_staff, get_list_of_some_data())
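Note that a threading.Lock only serializes threads within a single process; if the app runs in multiple processes (e.g. several web workers), it won't prevent duplicates across them, so a database-level unique constraint remains the only complete guarantee.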