I've been trying to test this out, but haven't been able to come to a definitive answer. I'm using SQLAlchemy on top of MySQL and trying to prevent having threads that do a select, get a SHARED_READ lock on some table, and then hold on to it (preventing future DDL operations until it's released). This happens when queries aren't committed. I'm using SQLAlchemy Core, where as far as I could tell .execute() essentially works in autocommit mode, issuing a COMMIT after everything it runs unless explicitly told we're in a transaction. Nevertheless, in show processlist, I'm seeing sleeping threads that still have SHARED_READ locks on a table they once queried. What gives?
Assuming from your post you're operating in "non-transactional" mode, either using an SQLAlchemy Connection without an ongoing transaction, or the shorthand engine.execute(). In this mode of operation SQLAlchemy will detect INSERT, UPDATE, DELETE, and DDL statements and issue a commit after automatically, but not for everything, like SELECT statements. See "Understanding Autocommit". For selects of mutating stored procedures and such that do require a commit, use
conn.execute(text('SELECT ...').execution_options(autocommit=True))
You should also consider closing connections when the thread is done with them for the time being. Closing will call rollback() on the underlying DBAPI connection, which per PEP-0249 is (probably) always in transactional state. This will remove the transactional state and/or locks, and returns the connection to the connection pool. This way you shouldn't need to worry about selects not autocommitting.
Related
I have an SQLAlchemy session in a script. The script is running for a long time, and it only fetches data from database, never updates or inserts.
I get quite a lot of errors like
sqlalchemy.exc.DBAPIError: (TransactionRollbackError) terminating connection due to conflict with recovery
DETAIL: User was holding a relation lock for too long.
The way I understand it, SQLAlchemy creates a transaction with the first select issued, and then reuses it. As my script may run for about an hour, it is very likely that a conflict comes up during the lifetime of that transaction.
To get rid of the error, I could use autocommit in te deprecated mode (without doing anything more), but this is explicitly discouraged by the documentation.
What is the right way to deal with the error? Can I use ORM queries without transactions at all?
I ended up closing the session after (almost) every select, like
session.query(Foo).all()
session.close()
since I do not use autocommit, a new transaction is automatically opened.
If I want to start a transaction in my database through python do I have to execute the sql command 'BEGIN TRANSACTION' explicitly like this:
import sqlite3
conn = sqlite3.connect(db)
c = conn.cursor()
c.execute('BEGIN TRANSACTION;')
##... some updates on the database ...
conn.commit() ## or c.execute('COMMIT'). Are these two expressions the same?
Is the database locked for change from other clients when I establish the connection or when I begin the transaction or neither?
Only transactions lock the database.
However, Python tries to be clever and automatically begins transactions:
By default, the sqlite3 module opens transactions implicitly before a Data Modification Language (DML) statement (i.e. INSERT/UPDATE/DELETE/REPLACE), and commits transactions implicitly before a non-DML, non-query statement (i. e. anything other than SELECT or the aforementioned).
So if you are within a transaction and issue a command like CREATE TABLE ..., VACUUM, PRAGMA, the sqlite3 module will commit implicitly before executing that command. There are two reasons for doing that. The first is that some of these commands don’t work within transactions. The other reason is that sqlite3 needs to keep track of the transaction state (if a transaction is active or not).
You can control which kind of BEGIN statements sqlite3 implicitly executes (or none at all) via the isolation_level parameter to the connect() call, or via the isolation_level property of connections.
If you want autocommit mode, then set isolation_level to None.
Otherwise leave it at its default, which will result in a plain “BEGIN” statement, or set it to one of SQLite’s supported isolation levels: “DEFERRED”, “IMMEDIATE” or “EXCLUSIVE”.
From python docs:
When a database is accessed by multiple connections, and one of the processes modifies the database, the SQLite database is locked until that transaction is committed. The timeout parameter specifies how long the connection should wait for the lock to go away until raising an exception. The default for the timeout parameter is 5.0 (five seconds).
If I want to start a transaction in my database through python do I have to execute the sql command 'BEGIN TRANSACTION' explicitly like this:
It depends if you are in auto-commit mode (in which case yes) or in manual commit mode (in which case no before DML statements, but unfortunately yes before DDL or DQL statements, as the manual commit mode is incorrectly implemented in the current version of the SQLite3 database driver, see below).
conn.commit() ## or c.execute('COMMIT'). Are these two expressions the same?
Yes.
Is the database locked for change from other clients when I establish the connection or when I begin the transaction or neither?
When you begin the transaction (cf. SQLite3 documentation).
For a better understanding of auto-commit and manual commit modes, read this answer.
I am currently using scoped_session provided by sqlalchemy with autocommit=True and autoflush=True.
I notice that autoflush is no called properly as some of the updated results are not flushed when my script finishes executing.
Is autoflush not meant to be run with scoped_session in a multithreaded environment?
Is autoflush not meant to be run with scoped_session in a multithreaded environment?
there is no such restriction, no.
I notice that autoflush is no called properly as some of the updated results are not flushed when my script finishes executing.
This is a misunderstanding of autoflush. Autoflush is intended to flush pending data to the database before a query emits a SELECT to the database. It does not provide the feature however that data is flushed immediately as each attribute of an object is changed, as this would be very inefficient and is not feasible with any kind of ORM, unit of work or not. So if you modify a bunch of objects, then throw away the Session without further interaction with it, those pending changes are lost.
Autoflush is intended to be used within the context of a transaction. In its default mode of usage, the Session begins a transaction for you, and you only need call commit() when a series of changes are ready to be finalized. See the docs for background http://docs.sqlalchemy.org/en/rel_0_7/orm/session.html#flushing as well as the strong recommendations to avoid autocommit at http://docs.sqlalchemy.org/en/rel_0_7/orm/session.html#autocommit-mode .
I have run a few trials and there seems to be some improvement in speed if I set autocommit to False.
However, I am worried that doing one commit at the end of my code, the database rows will not be updated. So, for example, I do several updates to the database, none are committed, does querying the database then give me the old data? Or, does it know it should commit first?
Or, am I completely mistaken as to what commit actually does?
Note: I'm using pyodbc and MySQL. Also, the table I'm using are InnoDB, does that make a difference?
There are some situations which trigger an implicit commit. However under most situations not commiting means data will be unavailable to other connections.
It also means that if another connection tries to perform an action that conflicts with an ongoing transaction (another connection locked that resource) the last request will have to wait for the lock to be released.
As for performance concerns, autocommit causes every change to be immediate. The performance hit will be quite noticeable on big tables as on each commit indexes and constraints need to be updated/checked too. If you only commit after a series of queries, indexes/constraints will only be updated at that time.
On the other hand, not committing frequently enough might cause the server to have too much work trying to maintain consistency between the two sets of data. So there is a trade-off.
And yes, using InnoDB makes a difference. If you were using for instance MyISAM you wouldn't have transactions at all, so any change will be permanent (similarly to autocommit=True). On MyISAM you can play with the delay-key-write option.
For more information about transactions have a look at the official documentation. For more tips about optimization have a look at this article.
The default transaction mode for InnoDB is REPEATABLE READ, all the read will be consistent within a transaction. If you insert rows and query them in the same transaction, you will not see the newly inserted row, but they will be stored when you commit the transaction. If you want to see the newly inserted row before you commit the transaction, you can set the isolation level to READ COMMITTED.
As long as you use the same connection, the database should show you a consistent view on the data, e.g. with all changes made so far in this transaction.
Once you commit, the changes will be written to disk and be visible to other (new) transactions and connections.
I use MySQL with MySQLdb module in Python, in Django.
I'm running in autocommit mode in this case (and Django's transaction.is_managed() actually returns False).
I have several processes interacting with the database.
One process fetches all Task models with Task.objects.all()
Then another process adds a Task model (I can see it in a database management application).
If I call Task.objects.all() on the first process, I don't see anything. But if I call connection._commit() and then Task.objects.all(), I see the new Task.
My question is: Is there any caching involved at connection level? And is it a normal behaviour (it does not seems to me)?
This certainly seems autocommit/table locking - related.
If mysqldb implements the dbapi2 spec it will probably have a connection running as one single continuous transaction. When you say: 'running in autocommit mode': do you mean MySQL itself or the mysqldb module? Or Django?
Not intermittently commiting perfectly explains the behaviour you are getting:
i) a connection implemented as one single transaction in mysqldb (by default, probably)
ii) not opening/closing connections only when needed but (re)using one (or more) persistent database connections (my guess, could be Django-architecture-inherited).
ii) your selects ('reads') cause a 'simple read lock' on a table (which means other connections can still 'read' this table but connections wanting to 'write data' can't (immediately) because this lock prevents them from getting an 'exclusive lock' (needed 'for writing') on this table. The writing is thus postponed indefinitely (until it can get a (short) exclusive lock on the table for writing - when you close the connection or manually commit).
I'd do the following in your case:
find out which table locks are on your database during the scenario above
read about Django and transactions here. A quick skim suggests using standard Django functionality implicitely causes commits. This means sending handcrafted SQL maybe won't (insert, update...).