SQLAlchemy Event interface - python

I'm using SQLAlchemy 0.7. I would like some 'post-processing' to occur after a session.flush(), namely, I need to access the instances involved in the flush() and iterate through them. The flush() call will update the database, but the instances involved also store some data in an LDAP database, I would like SQLAlchemy to trigger an update to that LDAP database by calling an instance method.
I figured I'd be using the after_flush(session, flush_context) event, detailed here, but how do I get a list of update()'d instances?
On a side note, how can I determine which columns have changed (or are 'dirty') on an instance? I've been able to find out whether an instance as a whole is dirty, but not which individual properties changed.

According to the link you provided:
Note that the session’s state is still in pre-flush, i.e. ‘new’, ‘dirty’, and ‘deleted’ lists still show pre-flush state as well as the history settings on instance attributes.
This means that you should be able to get access to all the dirty objects via the session.dirty list. You'll note that the first parameter of the event callback is the current session object.
As for the second part, you can use the sqlalchemy.orm.attributes.get_history function to figure out which columns have been changed. It returns a History object for a given attribute which contains a has_changes() method.
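Putting both pieces together, here is a minimal, self-contained sketch (written against SQLAlchemy 1.4+, using a hypothetical User model; the LDAP update is stubbed out as a list append):

```python
from sqlalchemy import Column, Integer, String, create_engine, event
from sqlalchemy.orm import Session, declarative_base
from sqlalchemy.orm.attributes import get_history

Base = declarative_base()

class User(Base):
    __tablename__ = "users"
    id = Column(Integer, primary_key=True)
    name = Column(String)

ldap_queue = []  # stand-in for the LDAP update you'd trigger

@event.listens_for(Session, "after_flush")
def after_flush(session, flush_context):
    # session.dirty still reflects the pre-flush state here
    for obj in session.dirty:
        for attr in ("name",):  # the attributes you care about
            if get_history(obj, attr).has_changes():
                ldap_queue.append((obj, attr))

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
session = Session(engine)
session.add(User(id=1, name="alice"))
session.commit()

user = session.get(User, 1)
user.name = "bob"
session.flush()  # fires after_flush; ldap_queue now holds (user, "name")
```

In a real application the listener would call your instance's LDAP-sync method instead of appending to a list.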
If you're trying to listen for changes on specific class attributes, consider using Attribute Events instead.
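For example, a sketch of an attribute-level listener (again with a hypothetical User model; the 'set' event fires on plain assignment, no flush required):

```python
from sqlalchemy import Column, Integer, String, event
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class User(Base):
    __tablename__ = "users"
    id = Column(Integer, primary_key=True)
    email = Column(String)

changes = []

@event.listens_for(User.email, "set")
def on_email_set(target, value, oldvalue, initiator):
    # fires on every assignment to User.email, including in the constructor
    changes.append((oldvalue, value))

u = User(email="a@example.com")
u.email = "b@example.com"
# changes now holds two (oldvalue, value) pairs; the first oldvalue is
# SQLAlchemy's NO_VALUE sentinel because the attribute was never set before
```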

Related

How to add values to an object attribute using setattr()?

I want to add a value to an attribute of an object. I would use setattr, but as far as I know it can only set an attribute to a value, not add anything to the existing value. I could of course call getattr first, compute the new value, and then set the attribute, but I was wondering if there is a more concise way of doing it.
Edit:
The reason why I need setattr/getattr is that I'm building a webapp, and depending on what the user answers, a different attribute needs to be accessed.
I was wondering if there was a better method, because the object I'm referring to is a sqlalchemy table. So I would need two calls to my database and would optimally only need one.
In my webapp there are radio buttons. I get the value using answer = request.form['value'], and then depending on the answer, which can be 1, 2, 3, 4 or 5, I access a different attribute.
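A read-modify-write helper along these lines is about as concise as it gets (the class and attribute names here are made up for illustration). Both getattr and setattr operate on the in-memory instance, so for a SQLAlchemy-mapped object this is still a single database round-trip at flush time:

```python
class UserRow:
    """Stand-in for the SQLAlchemy-mapped object in the question."""
    def __init__(self):
        self.score_1 = 0
        self.score_2 = 0

def add_to_attr(obj, name, amount):
    # there is no single-call "increment" form of setattr;
    # read-modify-write is the idiomatic pattern
    setattr(obj, name, getattr(obj, name) + amount)

row = UserRow()
answer = "1"  # e.g. request.form['value']
add_to_attr(row, "score_" + answer, 5)
print(row.score_1)  # 5
```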

Does django's `save()` create or update?

The documentation just says
To save an object back to the database, call save()
That does not make it clear. Experimenting, I found that if I include an id, it updates the existing entry, while if I don't, it creates a new row. Does the documentation specify what happens?
It's fully documented here:
https://docs.djangoproject.com/en/2.2/ref/models/instances/#how-django-knows-to-update-vs-insert
You may have noticed Django database objects use the same save() method for creating and changing objects. Django abstracts the need to use INSERT or UPDATE SQL statements. Specifically, when you call save(), Django follows this algorithm:
If the object's primary key attribute is set to a value that evaluates to True (i.e., a value other than None or the empty string), Django executes an UPDATE.
If the object's primary key attribute is not set, or if the UPDATE didn't update anything (e.g. if the primary key is set to a value that doesn't exist in the database), Django executes an INSERT.
The one gotcha here is that you should be careful not to specify a primary-key value explicitly when saving new objects, if you cannot guarantee the primary-key value is unused. For more on this nuance, see Explicitly specifying auto-primary-key values above and Forcing an INSERT or UPDATE below.
As a side note: django is OSS so when in doubt you can always read the source code ;-)
It depends on how the Model object was created. If it was queried from the database: UPDATE. If it's a new object that has not been saved before: INSERT.
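The documented decision boils down to a check on the primary-key value; as a plain-Python sketch of that rule (not Django's actual source):

```python
def save_strategy(pk_value):
    # mirrors the documented rule: a pk other than None or the empty
    # string means Django tries an UPDATE first
    if pk_value not in (None, ""):
        return "UPDATE, then INSERT if the UPDATE matched no rows"
    return "INSERT"

print(save_strategy(42))    # UPDATE path
print(save_strategy(None))  # INSERT path
```

If you need to override the choice, Django's real API offers save(force_insert=True) and save(force_update=True).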

Forcing a sqlalchemy ORM get() outside identity map

Background
The get() method is special in SQLAlchemy's ORM because it tries to return objects from the identity map before issuing a SQL query to the database (see the documentation).
This is great for performance, but can cause problems for distributed applications because an object may have been modified by another process, so the local process has no ability to know that the object is dirty and will keep retrieving the stale object from the identity map when get() is called.
Question
How can I force get() to ignore the identity map and issue a call to the DB every time?
Example
I have a Company object defined in the ORM.
I have a price_updater() process which updates the stock_price attribute of all the Company objects every second.
I have a buy_and_sell_stock() process which buys and sells stocks occasionally.
Now, inside this process, I may have loaded a microsoft = Company.query.get(123) object.
A few minutes later, I may issue another call for Company.query.get(123). The stock price has changed since then, but my buy_and_sell_stock() process is unaware of the change because it happened in another process.
Thus, the get(123) call returns the stale version of the Company from the session's identity map, which is a problem.
I've done a search on SO (under the [sqlalchemy] tag) and read the SQLAlchemy docs to try to figure out how to do this, but haven't found a way.
Using session.expire(my_instance) will cause the data to be re-selected on access. However, even if you use expire (or expunge), the next data that is fetched will be based on the transaction isolation level. See the PostgreSQL docs on isolation levels (it applies to other databases as well) and the SQLAlchemy docs on setting isolation levels.
You can test if an instance is in the session with in: my_instance in session.
You can use filter instead of get to bypass the cache, but it still has the same isolation level restriction.
Company.query.filter_by(id=123).one()
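A minimal sketch of the expire approach (SQLAlchemy 1.4+, with a toy Company model and a file-backed SQLite database; the "other process" is simulated with a second connection):

```python
import os
import tempfile

from sqlalchemy import Column, Float, Integer, create_engine, text
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class Company(Base):
    __tablename__ = "company"
    id = Column(Integer, primary_key=True)
    stock_price = Column(Float)

# file-backed SQLite so a second connection sees the same data
db_path = os.path.join(tempfile.mkdtemp(), "demo.db")
engine = create_engine("sqlite:///" + db_path)
Base.metadata.create_all(engine)

session = Session(engine)
session.add(Company(id=123, stock_price=10.0))
session.commit()

microsoft = session.get(Company, 123)  # the 1.4+ spelling of Query.get()
assert microsoft.stock_price == 10.0

# simulate another process updating the row behind this session's back
with engine.begin() as conn:
    conn.execute(text("UPDATE company SET stock_price = 20.0 WHERE id = 123"))

# without expire, get() keeps returning the stale identity-map copy;
# expire marks the instance stale so the next attribute access re-selects
session.expire(microsoft)
print(microsoft.stock_price)  # re-fetched, subject to the isolation level
```

Note that sessions also expire all instances on commit by default (expire_on_commit=True), and session.expire_all() expires everything at once.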

Does a JsonProperty deserialize only upon access?

In Google App Engine NDB, there is a property type JsonProperty which takes a Python list or dictionary and serializes it automatically.
The structure of my model depends on the answer to this question, so I want to know when exactly an object is deserialized? For example:
# a User model has a property "dictionary" which is of type JsonProperty
# will the following deserialize the dictionary?
object = User.get_by_id(someid)
# or will it not get deserialized until I actually access the dictionary?
val = object.dictionary['value']
ndb.JsonProperty follows the docs and does things the same way you would when defining a custom property: it defines make_value_from_datastore and get_value_for_datastore methods.
The documentation doesn't tell you when these methods get called, because it's up to the db implementation within the app engine to decide when to call these methods.
However, it's pretty likely they're going to get called whenever the model has to access the database. For example, from the documentation for get_value_for_datastore:
A property class can override this to use a different data type for the datastore than for the model instance, or to perform other data conversion just prior to storing the model instance.
If you really need to verify what's going on, you can provide your own subclass of JsonProperty like this:
import os

class LoggingJsonProperty(ndb.JsonProperty):
    def make_value_from_datastore(self, value):
        with open(os.path.expanduser('~/test.log'), 'a') as logfile:
            logfile.write('make_value_from_datastore called\n')
        return super(LoggingJsonProperty, self).make_value_from_datastore(value)
You can log the JSON string, the backtrace, etc. if you want. And obviously you can use a standard logging function instead of sticking things in a separate log. But this should be enough to see what's happening.
Another option, of course, is to read the code, which I believe is in appengine/ext/db/__init__.py.
Since it's not documented, the details could change from one version to the next, so you'll have to either re-run your tests or re-read the code each time you upgrade, if you need to be 100% sure.
The correct answer is that it does indeed load the item lazily, upon access:
https://groups.google.com/forum/?fromgroups=#!topic/appengine-ndb-discuss/GaUSM7y4XhQ
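NDB's actual implementation aside, the lazy pattern itself is easy to see in a toy stand-in (illustration only, not NDB code): the JSON string is only parsed when the value is first read, not when the object is constructed.

```python
import json

class LazyJson:
    """Toy stand-in for lazy deserialization of a stored JSON string."""
    def __init__(self, raw):
        self._raw = raw
        self._parsed = None
        self.parse_count = 0  # instrumented so we can see when parsing happens

    @property
    def value(self):
        if self._parsed is None:
            self.parse_count += 1
            self._parsed = json.loads(self._raw)
        return self._parsed

obj = LazyJson('{"value": 42}')
print(obj.parse_count)     # 0 -- nothing parsed yet
print(obj.value["value"])  # 42 -- parsed on first access
print(obj.parse_count)     # 1
```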

Which data load method is best for performance?

For example, I have a user object stored in a database (Redis).
It has several fields:
String nick
String password
String email
List posts
List comments
Set followers
and so on...
In my Python program I have a class (User) with the same fields for this object. Instances of this class map to objects in the database. The question is how to get the data from the DB for best performance:
1. Load the values for every field when the instance is created and initialize the fields with them.
2. Load the field's value each time it is requested.
3. As in (2), but after loading the value, replace the field property with the loaded value.
P.S. Redis runs on localhost.
The method entirely depends on the requirements.
If there is only one client reading and modifying the properties, this is a rather simple problem. When modifying data, just change the instance attributes in your current Python program and -- at the same time -- keep the DB in sync while keeping your program responsive. To that end, you should outsource blocking calls to another thread or make use of greenlets. If there is only one client, there definitely is no need to fetch a property from the DB on each value lookup.
If there are multiple clients reading the data and only one client modifying the data, you have to think about which level of synchronization you need. If you need 100 % synchronization, you will have to fetch data from the DB on each value lookup.
If there are multiple clients changing the data in the database you better look into a rock-solid industry standard solution rather than writing your own DB cache/mapper.
Your distinction between (2) and (3) does not really make sense. If you fetch data on every lookup, there is no need to 'store' data. You see, if there can be multiple clients involved these things quickly become quite complex and it's really hard to get it right.
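For what it's worth, option (3) is essentially what functools.cached_property gives you. Here's a sketch with a plain dict standing in for Redis (the fetch function and key scheme are made up; in real code you'd call redis-py instead):

```python
from functools import cached_property  # Python 3.8+

FAKE_DB = {"user:1:nick": "alice"}  # stand-in for Redis
calls = []

def fetch(key):
    # stand-in for redis_client.get(key); records each "network" hit
    calls.append(key)
    return FAKE_DB[key]

class User:
    """Option (3): fetch on first access, then cache the plain value."""
    def __init__(self, user_id):
        self.user_id = user_id

    @cached_property
    def nick(self):
        return fetch("user:%s:nick" % self.user_id)

u = User(1)
print(u.nick)       # first access hits the store
print(u.nick)       # second access is served from the instance
print(len(calls))   # only one fetch happened
```

As the answer notes, this only stays correct while your process is the sole writer; the cached value never sees changes made by other clients.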
