So I get that when I call list() on a Query object, that executes the query and returns the results as a list.
What I'm wondering is what exactly is happening in the source code to make the Query fire SQL and grab the results.
When you call list(query), Python invokes the __iter__() method of the Query class, as it would for any container.
This method initializes a query context and eventually calls _execute_and_instances(), an internal method of the Query class, which gets a connection from the session and executes the query statement.
def _execute_and_instances(self, querycontext):
    conn = self._connection_from_session(
        mapper=self._mapper_zero_or_none(),
        clause=querycontext.statement,
        close_with_result=True)
    result = conn.execute(querycontext.statement, self._params)
    return loading.instances(self, result, querycontext)
So the query is executed just when the list object is created.
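A toy illustration of that point (not SQLAlchemy's actual code): list() calls __iter__(), and __iter__() is where the work happens.

```python
# Minimal stand-in for Query: __iter__ is where the "SQL" fires.
class FakeQuery:
    def __iter__(self):
        # In SQLAlchemy, this is where _execute_and_instances() is called.
        print("executing SQL now")
        return iter([("row1",), ("row2",)])

q = FakeQuery()   # nothing has executed yet
rows = list(q)    # list() invokes __iter__, firing the "query"
print(rows)       # [('row1',), ('row2',)]
```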
That's not much detail, but I hope it is enough to answer your question. Some context on why you are asking would let me go into more relevant details. Hope it helps anyway.
EDIT: After clarifications via comments, as for your second question (i.e. how to add a traceback of your call as a comment in the query, to be logged on the server): I don't think it is feasible by modifying a single point.
Most queries go through the object Query, and its method _compile_context is a common point where the query is composed. If you launch either a usual query (a select with filters and so on) or a delete, it will go through this method.
However, session.add is completely different. At the moment you add the object, not much really happens query-wise (of course the new object is registered with the session). Objects are only inserted when you commit, as expected; session.flush starts this process. At that point, SQLAlchemy's mapper is available and you can get the instance of the object you are actually adding (e.g. an instance of your class User). If you attached the traceback at the moment of creating your instance, you could recover it here and add it to the query as a comment.
At any rate, this would not be easy at all, and it would be difficult to maintain, all for something that is only used for debugging. I understand what you are trying to do, but in my opinion it is too much effort for a result that is not so great.
Related
I have a question regarding URL traversal in the Pyramid python web framework.
Imagine the following endpoints for a forum:
/forum/1 - Returns information about Forum #1
/forum/1/threads/1 - Returns Thread #1 in Forum #1
Here's how the traversal would work for the first URL:
A Root resource is created
Root.__getitem__("forum") is called. This returns a ForumDispatch resource.
ForumDispatch.__getitem__("1") is called. The database is queried for a Forum with ID 1. If it is not found, a KeyError is raised. If it is found, a Forum object is returned, and view lookup begins with Forum as a context.
Here's how the traversal would work for the second URL
A Root resource is created
Root.__getitem__("forum") is called. This returns a ForumDispatch resource.
ForumDispatch.__getitem__("1") is called. The database is queried for a Forum with ID 1. If it is not found, a KeyError is raised. If it is found, a Forum object is returned.
Forum.__getitem__("threads") is called. A ThreadsDispatch object is returned
ThreadsDispatch.__getitem__("1") is called. The database is queried for a Thread #1 in Forum #1. If it is found, a Thread object is returned and view lookup begins, or a KeyError is raised.
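The steps above can be sketched in plain Python (no Pyramid required; the FORUMS dict stands in for the database queries):

```python
# Fake "database": forum id -> its threads.
FORUMS = {1: {"threads": {1: "Thread #1"}}}

class Root:
    def __getitem__(self, key):
        if key == "forum":
            return ForumDispatch()
        raise KeyError(key)

class ForumDispatch:
    def __getitem__(self, key):
        forum = FORUMS.get(int(key))  # stands in for the SELECT on forums
        if forum is None:
            raise KeyError(key)
        return Forum(forum)

class Forum:
    def __init__(self, data):
        self.data = data
    def __getitem__(self, key):
        if key == "threads":
            return ThreadsDispatch(self.data["threads"])
        raise KeyError(key)

class ThreadsDispatch:
    def __init__(self, threads):
        self.threads = threads
    def __getitem__(self, key):
        thread = self.threads.get(int(key))  # stands in for the second SELECT
        if thread is None:
            raise KeyError(key)
        return thread

# Traversal of /forum/1/threads/1, segment by segment:
resource = Root()
for segment in ["forum", "1", "threads", "1"]:
    resource = resource[segment]
print(resource)  # Thread #1
```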
Now, for the first URL, a single query is issued. It would look like SELECT ... FROM forums WHERE forums.id = 1;. For the second URL, two queries are issued: SELECT ... FROM forums WHERE forums.id = 1; and SELECT ... FROM threads WHERE threads.id = 1 AND threads.forum_id = 1;.
I don't want two queries to be issued. For the second URL, the query SELECT ... FROM forums LEFT JOIN threads ON threads.forum_id = forums.id WHERE threads.id = 1 AND forums.id = 1; would return all the information needed. Then, I could return a KeyError if no rows are returned, or if a Forum is returned but not a Thread.
In order to accomplish this, the ForumDispatch.__getitem__ needs to behave differently (e.g. change the query, or don't query at all) if it knows that "threads" is also going to be called next.
Is there any way to accomplish this?
I could, instead of returning actual database objects, create "dummy" resources to be returned by ForumDispatch.__getitem__ and the like, and then have the view perform the necessary query. But, I feel like I'm losing out on some of the traversal functionality by making the view worry about querying/raising 404 errors. Thoughts?
Your problem is a textbook example of premature optimization :)
Fetching a single row from a database via its primary key is the fastest possible operation your database can do. I would expect it to take about 1 millisecond or less.
A query with a join is somewhat more complex, involves accessing two tables and an index and performing the actual join. Most surely it'll take a bit longer - let's say, it'll be just 50% slower than a single row fetch, about 1.5 ms. Depending on the number of rows, it may actually take more than that, because joins are not totally free.
So, the total time to make two simple queries would be ~2ms versus ~1.5ms for a join query. So you're looking at ~0.5 ms difference at best. Or none. Or maybe a bit of a slow-down, you never know. Anyway, if you put it in the context of a web-application, any savings will be totally negligible compared to the network latency, HTTP round-trips, browser page reflows etc. You would get much more bang for your buck spending time optimizing the areas where you can achieve some measurable benefit :)
Surely, when you're finding that your page is making tens or hundreds of queries (often when displaying a listing) it's time to spend time configuring eager-loading in SQLAlchemy. Replacing two simple queries with one more complex query is only going to complicate things without bringing any measurable benefits.
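If a listing page does start issuing 1 + N queries, a sketch of configuring eager loading (using hypothetical Forum/Thread models mirroring the question) could look like this:

```python
from sqlalchemy import Column, ForeignKey, Integer, create_engine
from sqlalchemy.orm import Session, declarative_base, joinedload, relationship

Base = declarative_base()

class Forum(Base):
    __tablename__ = "forums"
    id = Column(Integer, primary_key=True)
    threads = relationship("Thread")

class Thread(Base):
    __tablename__ = "threads"
    id = Column(Integer, primary_key=True)
    forum_id = Column(Integer, ForeignKey("forums.id"))

engine = create_engine("sqlite://")  # in-memory database for the demo
Base.metadata.create_all(engine)
session = Session(engine)
session.add(Forum(id=1, threads=[Thread(id=1)]))
session.commit()

# joinedload emits ONE query with a LEFT JOIN instead of one per forum.
forums = session.query(Forum).options(joinedload(Forum.threads)).all()
print(forums[0].threads[0].id)  # 1
```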
Inside of the declarative base I define a function like this:
def update_me(self):
    if self.raw_info == 1:
        self.changed_info = 10
    else:
        self.changed_info = 20
I know this could be done with hybrid_property, but I actually do more complicated manipulations; the above is just for illustration, and it has to be done through a method. How can I commit these changes from inside the declarative base, without passing in the session object? It seems logical that there would be a way: if I can access the object and change its values without a session object, then it seems like I should be able to save it here somehow. Of course, adding this code to the end of the function above fails:
self.commit()
It seems to me that you might want to reconsider your design. However, if you are sure you want to go this way, you can use the object_session method:
object_session(self).commit()
Warning: This will commit your whole session (not just one object) and it will only work if self is already attached to a session (by Session.add or having queried for it, for example). Thus, it will be a fragile operation and I would not recommend it unless absolutely necessary.
But you are right, when you say
It seems logical that there would be a way, if I can access the object and change its values without a session object, then it seems like I should be able to save it here somehow.
The object and the session are connected and thus changes will be delivered to the database. We get this connected session with the method above.
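Putting the answer together as a runnable sketch (User and its columns are hypothetical stand-ins for your model; note again that commit() applies to the entire session that self happens to be attached to):

```python
from sqlalchemy import Column, Integer, create_engine
from sqlalchemy.orm import Session, declarative_base, object_session

Base = declarative_base()

class User(Base):
    # Hypothetical mapped class standing in for your declarative model.
    __tablename__ = "users"
    id = Column(Integer, primary_key=True)
    raw_info = Column(Integer)
    changed_info = Column(Integer)

    def update_me(self):
        self.changed_info = 10 if self.raw_info == 1 else 20
        # Commits the WHOLE session -- only works if self is attached.
        object_session(self).commit()

engine = create_engine("sqlite://")  # in-memory database for the demo
Base.metadata.create_all(engine)
session = Session(engine)

user = User(raw_info=1)
session.add(user)  # attach first; otherwise object_session returns None
user.update_me()   # persists changed_info without touching `session` here
print(user.changed_info)  # 10
```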
I'm trying to call a stored procedure that returns multiple result sets using SQLAlchemy. If it matters, underneath I'm using PyODBC and FreeTDS. I call the execute() method using a raw query with "exec" calling my stored procedure on a session object and get a ResultProxy object back.
With a raw pyodbc cursor, I can call the nextset() function to advance to the next result set. I see no way to do the same using the ResultProxy I get back from SQLAlchemy. Indeed, the docs say:
The DBAPI cursor will be closed by the ResultProxy when all of its
result rows (if any) are exhausted.
Is there a way to read multiple result sets with SQLAlchemy, or will I have to perform this query with the raw DBAPI?
Support for nextset() is ticket 1635. It's two years old. It contains a partial patch which needs updating, in particular to work along with an execution option that passes along a hint that the statement will be returning multiple result sets, so that the ResultProxy's existing autoclose behavior can remain the default. The feature would also need a lot of tests.
There's no major technical hurdle to this feature but there's generally very little interest in this use case. So at the moment you need to stick with the raw cursor, until people express enough interest in this feature to put momentum behind it again.
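Until then, the raw-DBAPI pattern looks like the sketch below. StubCursor only mimics the fetchall()/nextset() surface of a real pyodbc cursor; with a real connection you would obtain the cursor via engine.raw_connection().cursor() and run your EXEC statement on it.

```python
class StubCursor:
    # Mimics just the fetchall()/nextset() surface of a DBAPI cursor.
    def __init__(self, result_sets):
        self._sets = iter(result_sets)
        self._current = next(self._sets)

    def fetchall(self):
        return self._current

    def nextset(self):
        try:
            self._current = next(self._sets)
            return True
        except StopIteration:
            return None  # DBAPI convention: no more result sets

def fetch_all_result_sets(cursor):
    # Drain every result set the stored procedure produced.
    sets = [cursor.fetchall()]
    while cursor.nextset():
        sets.append(cursor.fetchall())
    return sets

cursor = StubCursor([[(1, "a")], [(2, "b"), (3, "c")]])
print(fetch_all_result_sets(cursor))  # [[(1, 'a')], [(2, 'b'), (3, 'c')]]
```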
Have you read http://docs.sqlalchemy.org/en/rel_0_7/core/connections.html?highlight=resultproxy#basic-usage ? You can iterate over ResultProxy objects, close them, etc.
I'm familiar with Python generators, however I've just come across the term "generative method" which I am not familiar with and cannot find a satisfactory definition.
To put it in context, I found the term in SQLAlchemy's narrative documentation:
Full control of the “autocommit” behavior is available using the generative Connection.execution_options() method provided on Connection, Engine, Executable, using the “autocommit” flag which will turn on or off the autocommit for the selected scope.
What is a generative method? Trying to iterate the object returned by Connection.execution_options() doesn't work so I'm figuring it's something other than a standard generator.
It doesn't appear to be a common database concept, but SQLAlchemy uses the term generative in the sense "generated by your program iteratively at runtime" (so no relation to Python generators). An example from the tutorial:
The Query object is fully generative, meaning that most method calls
return a new Query object upon which further criteria may be added.
For example, to query for users named “ed” with a full name of “Ed
Jones”, you can call filter() twice, which joins criteria using AND:
>>> for user in session.query(User).\
...         filter(User.name=='ed').\
...         filter(User.fullname=='Ed Jones'):
...     print user
This call syntax is more commonly known as "method chaining", and the design that allows it as a "fluent interface".
So, in the case of Connection.execution_options(), "generative" means that it returns the modified connection object, so that you can chain the calls as above.
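A stripped-down sketch (hypothetical, not SQLAlchemy's real implementation) of what makes a method "generative":

```python
class Query:
    # Toy stand-in for sqlalchemy.orm.Query.
    def __init__(self, criteria=()):
        self._criteria = tuple(criteria)

    def filter(self, clause):
        # Generative: return a NEW Query carrying the extra criterion,
        # which is what makes the chained-call style possible.
        return Query(self._criteria + (clause,))

q = Query().filter("name = 'ed'").filter("fullname = 'Ed Jones'")
print(q._criteria)  # ("name = 'ed'", "fullname = 'Ed Jones'")
```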
Looking at the source code of Connection.execution_options (lib/sqlalchemy/engine/base.py), all that method does is add options to the connection.
The idea is that those options influence the future behaviour of e.g. queries.
As an example:
result = connection.execution_options(stream_results=True).\
    execute(stmt)
Here, the behaviour was changed in the middle of the connection for just this query.
In a way, it "generates" or clones itself as an object that has a slightly different behaviour.
Here you can also set autocommit to True. Example:
# obtain a connection
connection = ...
# do some stuff
# for the next section we want autocommit on
autocommitting_connection = connection.execution_options(autocommit=True)
autocommitting_connection.execute(some_insert)
result = autocommitting_connection.execute(some_query)
# done with this section. Continue using connection (no autocommit)
This is what is meant with that section of the documentation. "generative method" refers to a method that returns a modified copy of the same instance that you can continue working with. This is applicable to the classes Connection, Engine, Executable.
You would have to consult the specific documentation or source code of that project to really make sure, but I would guess that it returns a modified version of some object adapted to the requirements/behaviour defined by the arguments.
The documentation states:
The method returns a copy of this Connection which references the same
underlying DBAPI connection, but also defines the given execution
options which will take effect for a call to execute().
As #zzzeek comments above, this is now documented in the SQLAlchemy glossary.
generative means:
A term that SQLAlchemy uses to refer what’s normally known as method chaining; see that term for details.
And method chaining is:
An object-oriented technique whereby the state of an object is constructed by calling methods on the object. The object features any number of methods, each of which return a new object (or in some cases the same object) with additional state added to the object.
I'm creating a basic database utility class in Python. I'm refactoring an old module into a class. I'm now working on an executeQuery() function, and I'm unsure of whether to keep the old design or change it. Here are the 2 options:
(The old design:) Have one generic executeQuery method that takes the query to execute and a boolean commit parameter that indicates whether to commit (insert, update, delete) or not (select), and determines with an if statement whether to commit or to select and return.
(This is the way I'm used to, though that might be because in the languages I've worked with you can't have a function that sometimes returns something and sometimes doesn't:) Have 2 functions, executeQuery and executeUpdateQuery (or something equivalent). executeQuery will execute a simple query and return a result set, while executeUpdateQuery will make changes to the DB (insert, update, delete) and return nothing.
Is it accepted to use the first way? It seems unclear to me, but maybe it's more Pythonic...? Python is very flexible; maybe I should take advantage of this flexibility, which can't really be had in stricter languages...
And a second part of this question, unrelated to the main idea - what is the best way to return query results in Python? Using which function to query the database, in what format...?
It's probably just me and my FP fetish, but I think a function executed solely for side effects is very different from a non-destructive function that fetches some data, and therefore they should have different names. Especially if the generic function does something different depending on exactly that (which the commit parameter seems to imply).
As for how to return results... I'm a huge fan of generators, but if the library you use for database connections returns a list anyway, you might as well pass this list on - a generator wouldn't buy you anything in this case. But if it allows you to iterate over the results (one at a time), seize the opportunity to save a lot of memory on larger queries.
I don't know how to answer the first part of your question, it seems like a matter of style more than anything else. Maybe you could invoke the Single Responsibility Principle to argue that it should be two separate functions.
When you're going to return a sequence of indeterminate length, it's best to use a generator.
I'd have two methods, one which updates the database and one which doesn't. Both could delegate to a common private method, if they share a lot of code.
By separating the two methods, it becomes clear to callers what the different semantics are between the two, makes documenting the different methods easier, and clarifies what return types to expect. Since you can pull out shared code into private methods on the object, there's no worry about duplicating code.
As for returning query results, it'll depend on whether you're loading all the results from the database before returning, or returning a cursor object. I'd be tempted to do something like the following:
with db.executeQuery('SELECT * FROM my_table') as results:
    for row in results:
        print row['col1'], row['col2']
... so the executeQuery method returns a context manager (which cleans up any open connections, if needed), which also acts as a generator. And the rows it yields act as read-only dicts.
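One way to sketch that design with the stdlib's sqlite3 (Database, executeQuery and executeUpdateQuery are the hypothetical names discussed above):

```python
import sqlite3
from contextlib import contextmanager

class Database:
    def __init__(self, path):
        self._conn = sqlite3.connect(path)
        self._conn.row_factory = sqlite3.Row  # rows act like read-only dicts

    @contextmanager
    def executeQuery(self, sql, params=()):
        # Yields a cursor: callers iterate rows one at a time.
        cur = self._conn.execute(sql, params)
        try:
            yield cur
        finally:
            cur.close()  # clean up when the with-block exits

    def executeUpdateQuery(self, sql, params=()):
        # Side effects only, no return value.
        with self._conn:  # commits on success, rolls back on error
            self._conn.execute(sql, params)

db = Database(":memory:")
db.executeUpdateQuery("CREATE TABLE my_table (col1 TEXT, col2 TEXT)")
db.executeUpdateQuery("INSERT INTO my_table VALUES (?, ?)", ("a", "b"))
with db.executeQuery("SELECT * FROM my_table") as results:
    rows = [(row["col1"], row["col2"]) for row in results]
print(rows)  # [('a', 'b')]
```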