SQLAlchemy Commit Changes - python

Inside of the declarative base I define a function like this:
def update_me(self):
    if self.raw_info == 1:
        self.changed_info = 10
    else:
        self.changed_info = 20
I know this can be done with hybrid_property, but I actually do more complicated manipulations; the above is just for illustrative purposes, and this has to be done through a method. How can I commit these changes, from inside the declarative base, without passing it the session object? It seems logical that there would be a way: if I can access the object and change its values without a session object, then I should be able to save it here somehow. Of course, adding this code to the end of the function above fails:
self.commit()

It seems to me that you might want to reconsider your design. However, if you are sure you want to go this way, you can use the object_session() function from sqlalchemy.orm:
from sqlalchemy.orm import object_session

object_session(self).commit()
Warning: This will commit your whole session (not just one object) and it will only work if self is already attached to a session (by Session.add or having queried for it, for example). Thus, it will be a fragile operation and I would not recommend it unless absolutely necessary.
But you are right, when you say
It seems logical that there would be a way: if I can access the object and change its values without a session object, then I should be able to save it here somehow.
The object and the session are connected, and thus changes will be delivered to the database. We get this connected session with object_session, as shown above.
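Putting it together, a minimal sketch; the model follows the question's own illustration, and the table and column names are hypothetical:
from sqlalchemy import Column, Integer
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import object_session

Base = declarative_base()

class MyModel(Base):
    __tablename__ = 'my_model'        # hypothetical table name
    id = Column(Integer, primary_key=True)
    raw_info = Column(Integer)
    changed_info = Column(Integer)

    def update_me(self):
        if self.raw_info == 1:
            self.changed_info = 10
        else:
            self.changed_info = 20
        # works only if self is already attached to a session
        object_session(self).commit()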

Related

SQLAlchemy: What happens when calling list() on a Query object?

So I get that when I call list() on a Query object, that executes the query and returns the results as a list.
What I'm wondering is what exactly is happening in the source code to make the Query fire SQL and grab the results.
When you call list(query), Python invokes the __iter__() method of the Query class, just as it would for any container.
This method initializes the query context and eventually calls _execute_and_instances(), an internal method of the Query class, which gets a connection from the session and executes the query statement:
def _execute_and_instances(self, querycontext):
    conn = self._connection_from_session(
        mapper=self._mapper_zero_or_none(),
        clause=querycontext.statement,
        close_with_result=True)
    result = conn.execute(querycontext.statement, self._params)
    return loading.instances(self, result, querycontext)
So the query is executed at the moment the list object is built.
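For instance, with echo=True on a throwaway engine you can watch the SELECT fire only when list() runs; MyModel here stands in for any mapped class:
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

engine = create_engine('sqlite://', echo=True)   # echo logs every emitted statement
Base.metadata.create_all(engine)                 # Base is MyModel's declarative base
session = sessionmaker(bind=engine)()

query = session.query(MyModel)   # builds the Query object; no SQL yet
rows = list(query)               # __iter__() runs and the SELECT is logged now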
It's not much detail, but I hope it is enough to answer your question. Some context on why you are asking would let me go into more relevant detail. Hope it helps anyway.
EDIT: After clarifications via comments, as for your second question (i.e. how to add a traceback of your call as a comment in the query, so it gets logged in the server): I don't think it is feasible by modifying a single point.
Most queries go through the object Query, and its method _compile_context is a common point where the query is composed. If you launch either a usual query (a select with filters and so on) or a delete, it will go through this method.
However, session.add is completely different. At the moment of adding the object, not much really happens (query-wise, I mean; the new object is of course registered). The objects are only inserted when you commit (as expected, by the way); session.flush starts this process. At that point, SQLAlchemy's mapper is available, and you could get the instance of the object actually being added (e.g. an instance of your class User). If you added the traceback at the moment of creating your instance, you could recover it here and add it to the query as a comment.
At any rate, this would not be easy at all, and it would be difficult to maintain as well, all for something that is only for debugging. I understand what you are trying to do, but it is too much effort for a result which is not so great (in my opinion).
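If you do want to experiment, here is a rough sketch of only the first half: recording the traceback when the instance is created. Recovering it at flush time and attaching it to the SQL is the hard part described above. The class and table are hypothetical:
import traceback

class User(Base):                 # hypothetical mapped class
    __tablename__ = 'users'
    id = Column(Integer, primary_key=True)

    def __init__(self, **kwargs):
        super(User, self).__init__(**kwargs)
        # plain attribute, not a mapped column: lives only in memory
        self._creation_traceback = ''.join(traceback.format_stack())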

DB-Connections Class as a Singleton in Python

So there has been a lot of hating on singletons in Python. I generally see that having a singleton is usually no good, but what about stuff that has side effects, like using/querying a database? Why would I make a new instance for every simple query, when I could reuse an existing connection that is already set up? What would be a Pythonic approach/alternative to this?
Thank you!
Normally, you have some kind of object representing the thing that uses a database (e.g., an instance of MyWebServer), and you make the database connection a member of that object.
If you instead have all your logic inside some kind of function, make the connection local to that function. (This isn't too common in many other languages, but in Python, there are often good ways to wrap up multi-stage stateful work in a single generator function.)
If you have all the database stuff spread out all over the place, then just use a global variable instead of a singleton. Yes, globals are bad, but singletons are just as bad, and more complicated. There are a few cases where they're useful, but very rare. (That's not necessarily true for other languages, but it is for Python.) And the way to get rid of the global is to rethink your design. There's a good chance you're effectively using a module as a (singleton) object, and if you think it through, you can probably come up with a good class or function to wrap it up in.
Obviously just moving all of your globals into class attributes and @classmethods is just giving you globals under a different namespace. But moving them into instance attributes and methods is a different story. That gives you an object you can pass around, and, if necessary, an object you can have two of (or maybe even zero under some circumstances), attach a lock to, serialize, and so on.
In many types of applications, you're still going to end up with a single instance of something—every Qt GUI app has exactly one MyQApplication, nearly every web server has exactly one MyWebServer, etc. No matter what you call it, that's effectively a singleton or global. And if you want to, you can just move everything into attributes of that god object.
But just because you can do so doesn't mean you should. You've still got function parameters, local variables, globals in each module, other (non-megalithic) classes with their own instance attributes, etc., and you should use whatever is appropriate for each value.
For example, say your MyWebServer creates a new ClientConnection instance for each new client that connects. You could make the connections call MyWebServer.instance.db.execute whenever they want to execute a SQL query, but you could also just pass self.db to the ClientConnection constructor, and each connection then just does self.db.execute. So, which one is better? Well, if you do it the latter way, it makes your code a lot easier to extend and refactor. If you want to load-balance across 4 databases, you only need to change code in one place (where the MyWebServer initializes each ClientConnection) instead of 100 (every place a ClientConnection accesses the database). If you want to convert your monolithic web app into a WSGI container, you don't have to change any of the ClientConnection code, except maybe the constructor. And so on.
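A sketch of the pass-it-in style (all names here are illustrative):
class ClientConnection(object):
    def __init__(self, db):
        # the database handle is injected, never looked up globally
        self.db = db

    def handle_request(self, sql, params=()):
        return self.db.execute(sql, params)

class MyWebServer(object):
    def __init__(self, db):
        self.db = db

    def accept_client(self):
        # the one place that decides which database each client talks to
        return ClientConnection(self.db)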
If you're using an object oriented approach, then abamet's suggestion of attaching the database connection parameters as class attributes makes sense to me. The class can then establish a single database connection which all methods of the class refer to as self.db_connection, for example.
If you're not using an object oriented approach, a separate database connection module can provide a functional-style equivalent. Devote a module to establishing a database connection, and simply import that module everywhere you want to use it. Your code can then refer to the connection as db.connection, for example. Since modules are effectively singletons, and the module code is only run on the first import, you will be re-using the same database connection each time.
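For example, a db.py along these lines (sqlite3 stands in for whatever driver you actually use):
# db.py -- runs once, on first import; later imports reuse the module object
import sqlite3

connection = sqlite3.connect('app.db')

# elsewhere in the code base:
#   import db
#   db.connection.execute('SELECT 1')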

Is it possible to extend SQLAlchemy to add the ability to lock the database?

I'm beginning to develop a site in Pyramid and, before I commit to using SQLAlchemy, would like to know if it's possible to wrap/extend it to add in 'database lock' functionality.
One quick example as to why I'd like this functionality is for write throttling. My wrapper will be able to detect if a user is flooding the database with writes and, if they are, they'll be prevented from further writes for X amount of time.
I was looking into extending sqlalchemy.orm.session.Session and overriding the add method, which would perform this throttle check. If the user passes the check, it would simply pass the query off to super(MyWrapper, self).query(*args, **kwargs)
This is easy enough to do. However, it only adds the throttle functionality to DBSession.query. If somewhere in my code I use DBSession.execute, the throttle check is bypassed.
Is there a cleaner way to accomplish this?
Detecting excessive network traffic from particular clients is something you might be doing outside the ORM, even outside the Python app, like at the network or database client configuration level.
If within the Python app, definitely not in the ORM. add() doesn't correspond very cleanly to a SQL statement in any case (no SQL is emitted until flush(), and only if the given object was previously pending. add() also cascades to many objects and can result in any number of INSERT statements).
For a simple count on statements, cursor execute events are the best way to go. This gives you a hook at the point of calling execute() on the DBAPI cursor. See before_cursor_execute() at http://www.sqlalchemy.org/docs/core/events.html#connection-events.
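A minimal sketch of such a counter; before_cursor_execute() and its argument list are SQLAlchemy's, while the bookkeeping and any throttling policy are left to you:
from collections import defaultdict
from sqlalchemy import create_engine, event

engine = create_engine('sqlite://')
counts = defaultdict(int)

@event.listens_for(engine, 'before_cursor_execute')
def count_statement(conn, cursor, statement, parameters, context, executemany):
    # fires for every statement, whether it came from Session.query,
    # a flush triggered by add(), or a raw execute()
    counts[statement.split(None, 1)[0].upper()] += 1   # e.g. SELECT, INSERT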

Design question in Python: should this be one generic function or two specific ones?

I'm creating a basic database utility class in Python. I'm refactoring an old module into a class. I'm now working on an executeQuery() function, and I'm unsure of whether to keep the old design or change it. Here are the 2 options:
(The old design:) Have one generic executeQuery method that takes the query to execute and a boolean commit parameter that indicates whether to commit (insert, update, delete) or not (select), and determines with an if statement whether to commit or to select and return.
(This is the way I'm used to, but that might be because you can't have a function that sometimes returns something and sometimes doesn't in the languages I've worked with:) Have 2 functions, executeQuery and executeUpdateQuery (or something equivalent). executeQuery will execute a simple query and return a result set, while executeUpdateQuery will make changes to the DB (insert, update, delete) and return nothing.
Is it acceptable to use the first way? It seems unclear to me, but maybe it's more Pythonic...? Python is very flexible; maybe I should take advantage of this flexibility, which can't really be achieved this way in stricter languages...
And a second part of this question, unrelated to the main idea - what is the best way to return query results in Python? Using which function to query the database, in what format...?
It's probably just me and my FP fetish, but I think a function executed solely for its side effects is very different from a non-destructive function that fetches some data, and the two should therefore have different names. Especially if the generic function would do something different depending on exactly that (which is what the commit parameter seems to imply).
As for how to return results... I'm a huge fan of generators, but if the library you use for database connections returns a list anyway, you might as well pass this list on - a generator wouldn't buy you anything in this case. But if it allows you to iterate over the results (one at a time), seize the opportunity to save a lot of memory on larger queries.
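For example, a thin generator over a DB-API cursor (the batch size is arbitrary):
def iter_results(cursor, batch_size=1000):
    # yields rows one at a time instead of materializing the whole result set
    while True:
        rows = cursor.fetchmany(batch_size)
        if not rows:
            break
        for row in rows:
            yield row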
I don't know how to answer the first part of your question, it seems like a matter of style more than anything else. Maybe you could invoke the Single Responsibility Principle to argue that it should be two separate functions.
When you're going to return a sequence of indeterminate length, it's best to use a Generator.
I'd have two methods, one which updates the database and one which doesn't. Both could delegate to a common private method, if they share a lot of code.
By separating the two methods, it becomes clear to callers what the different semantics are between the two, makes documenting the different methods easier, and clarifies what return types to expect. Since you can pull out shared code into private methods on the object, there's no worry about duplicating code.
As for returning query results, it'll depend on whether you're loading all the results from the database before returning, or returning a cursor object. I'd be tempted to do something like the following:
with db.executeQuery('SELECT * FROM my_table') as results:
    for row in results:
        print row['col1'], row['col2']
... so the executeQuery method returns a ContextManager object (which cleans up any open connections, if needed), which also acts as a Generator. And the results from the generator act as read-only dicts.
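One possible shape for that executeQuery, sketched over a plain DB-API connection; the dict-per-row choice and all names are illustrative, not a fixed API:
class QueryResults(object):
    def __init__(self, cursor):
        self.cursor = cursor

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, tb):
        self.cursor.close()    # clean up when the with-block ends

    def __iter__(self):
        columns = [d[0] for d in self.cursor.description]
        for row in self.cursor:                 # most drivers allow iteration
            yield dict(zip(columns, row))       # one read-only row at a time

def executeQuery(connection, sql, params=()):
    cursor = connection.cursor()
    cursor.execute(sql, params)
    return QueryResults(cursor)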

Per-session transactions in Django

I'm making a Django web-app which allows a user to build up a set of changes over a series of GETs/POSTs before committing them to the database (or reverting) with a final POST. I have to keep the updates isolated from any concurrent database users until they are confirmed (this is a configuration front-end), ruling out committing after each POST.
My preferred solution is to use a per-session transaction. This keeps all the problems of remembering what's changed (and how it affects subsequent queries), together with implementing commit/rollback, in the database where it belongs. Deadlock and long-held locks are not an issue, as due to external constraints there can only be one user configuring the system at any one time, and they are well-behaved.
However, I cannot find documentation on setting up Django's ORM to use this sort of transaction model. I have thrown together a minimal monkey-patch (ew!) to solve the problem, but dislike such a fragile solution. Has anyone else done this before? Have I missed some documentation somewhere?
(My version of Django is 1.0.2 Final, and I am using an Oracle database.)
Multiple, concurrent, session-scale transactions will generally lead to deadlocks or worse (worse == livelock, long delays while locks are held by another session.)
This design is not the best policy, which is why Django discourages it.
The better solution is the following.
Design a Memento class that records the user's change. This could be a saved copy of their form input. You may need to record additional information if the state changes are complex. Otherwise, a copy of the form input may be enough.
Accumulate the sequence of Memento objects in their session. Note that each step in the transaction will involve fetches from the database and validation to see if the chain of mementos will still "work". Sometimes they won't work, because someone else changed something this chain of mementos depends on. What now?
When you present the 'ready to commit?' page, you've replayed the sequence of Mementos and are pretty sure they'll work. When they submit "Commit", you have to replay the Mementos one last time, hoping they're still going to work. If they do, great. If they don't, someone changed something, and you're back at step 2: what now?
This seems complex.
Yes, it does. However it does not hold any locks, allowing blistering speed and little opportunity for deadlock. The transaction is confined to the "Commit" view function which actually applies the sequence of Mementos to the database, saves the results, and does a final commit to end the transaction.
The alternative -- holding locks while the user steps out for a quick cup of coffee on step n-1 out of n -- is unworkable.
For more information on Memento, see this.
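A bare-bones Memento along those lines; the field-by-field replay is deliberately naive, and complex state changes will need more:
class Memento(object):
    """Records one step of the user's pending change."""
    def __init__(self, form_data):
        self.form_data = dict(form_data)   # a saved copy of the form input

    def replay(self, obj):
        # re-apply the recorded input; validate against current data first
        for field, value in self.form_data.items():
            setattr(obj, field, value)

# accumulate one per POST, e.g.:
#   request.session.setdefault('mementos', []).append(Memento(form.cleaned_data))
#   request.session.modified = True   # sessions don't notice in-place mutation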
In case anyone else ever has the exact same problem as me (I hope not), here is my monkeypatch. It's fragile and ugly, and changes private methods, but thankfully it's small. Please don't use it unless you really have to. As mentioned by others, any application using it effectively prevents multiple users doing updates at the same time, on penalty of deadlock. (In my application, there may be many readers, but multiple concurrent updates are deliberately excluded.)
I have a "user" object which persists across a user session, and contains a persistent connection object. When I validate a particular HTTP interaction is part of a session, I also store the user object on django.db.connection, which is thread-local.
def monkeyPatchDjangoDBConnection():
    import django.db

    def validConnection():
        # reuse the persistent connection stored on the user object
        if django.db.connection.connection is None:
            django.db.connection.connection = django.db.connection.user.connection
        return True

    def close():
        # never actually close; just detach so the next request revalidates
        django.db.connection.connection = None

    django.db.connection._valid_connection = validConnection
    django.db.connection.close = close

monkeyPatchDjangoDBConnection()

def setUserOnThisThread(user):
    import django.db
    django.db.connection.user = user
This last function is called automatically at the start of any method annotated with @login_required, so 99% of my code is insulated from the specifics of this hack.
I came up with something similar to the Memento pattern, but different enough that I think it bears posting. When a user starts an editing session, I duplicate the target object to a temporary object in the database. All subsequent editing operations affect the duplicate. Instead of saving the object state in a memento at each change, I store operation objects. When I apply an operation to an object, it returns the inverse operation, which I store.
Saving operations is much cheaper for me than mementos, since the operations can be described with a few small data items, while the object being edited is much bigger. Also I apply the operations as I go and save the undos, so that the temporary in the db always corresponds to the version in the user's browser. I never have to replay a collection of changes; the temporary is always only one operation away from the next version.
To implement "undo," I pop the last undo object off the stack (as it were--by retrieving the latest operation for the temporary object from the db) apply it to the temporary and return the transformed temporary. I could also push the resultant operation onto a redo stack if I cared to implement redo.
To implement "save changes," i.e. commit, I de-activate and time-stamp the original object and activate the temporary in it's place.
To implement "cancel," i.e. rollback, I do nothing! I could delete the temporary, of course, because there's no way for the user to retrieve it once the editing session is over, but I like to keep the canceled edit sessions so I can run stats on them before clearing them out with a cron job.
