I'm writing my first SQLAlchemy (0.6.8)/Python (2.7.1) program, sitting on top of SQLite (3.7.6.3, I think), running on Windows Vista.
In order to perform unit-testing, I am pointing SQLite to a test database, and my unit-test scripts routinely delete the database file, so I am continuously working with a known initial state.
Sometimes my (single-threaded) unit-tests fail to remove the file:
WindowsError: [Error 32] The process cannot access the file because it is being used by another process
The only process that uses the file is the unit-test harness. Clearly, some lock is not being released by one of my completed unit-tests, preventing the next unit-test in the same process from deleting the file.
I have searched all the places I have created a session and confirmed there is a corresponding session.commit() or session.rollback().
I have searched for all session.commit() and session.rollback() calls in my code, and added a session.close() call immediately afterwards, in an attempt to explicitly release any transactional locks, but it hasn't helped.
Are there any secrets to ensuring the remaining locks are removed at the end of a transaction to permit the file to be deleted?
Someone had a similar problem: http://www.mail-archive.com/sqlalchemy@googlegroups.com/msg20724.html
You should use a NullPool when creating the engine, to ensure that no active connection remains after session.close():
from sqlalchemy import create_engine
from sqlalchemy.pool import NullPool
to_engine = create_engine('sqlite:///%s' % temp_file_name, poolclass=NullPool)
Reference: http://www.sqlalchemy.org/docs/06/core/pooling.html?highlight=pool#sqlalchemy.pool
This is only required in SQLAlchemy prior to 0.7.0. After 0.7.0, this became the default behaviour for SQLite. Reference: http://www.sqlalchemy.org/docs/core/pooling.html?highlight=pool#sqlalchemy.pool
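For the unit-test teardown itself, a minimal sketch of the order of operations that should leave the file deletable (the file name and session setup are placeholders for your own harness):
import os
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
from sqlalchemy.pool import NullPool

temp_file_name = 'test.db'
engine = create_engine('sqlite:///%s' % temp_file_name, poolclass=NullPool)
Session = sessionmaker(bind=engine)

session = Session()
# ... run the code under test ...
session.close()             # returns the connection; NullPool closes it outright

engine.dispose()            # belt and braces: drop anything still pooled
os.remove(temp_file_name)   # Windows should no longer report Error 32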
Do you require shared access to the database during unit tests? If not, use an in-memory SQLite database for those tests. From the SQLAlchemy documentation:
The sqlite :memory: identifier is the default if no filepath is present. Specify sqlite:// and nothing else:
# in-memory database
e = create_engine('sqlite://')
No need to manage temporary files, no locking semantics, a guaranteed clean slate between unit tests, etc.
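For example, a per-test in-memory setup might look like this (a sketch; the table and column names are made up):
import unittest
from sqlalchemy import create_engine, MetaData, Table, Column, Integer, String

class WidgetTest(unittest.TestCase):
    def setUp(self):
        # a brand-new, empty in-memory database for every test
        self.engine = create_engine('sqlite://')
        self.metadata = MetaData()
        self.widgets = Table('widgets', self.metadata,
                             Column('id', Integer, primary_key=True),
                             Column('name', String(50)))
        self.metadata.create_all(self.engine)

    def tearDown(self):
        self.engine.dispose()   # nothing on disk to clean up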
Related
I am using sqlite3 in a Flask app (actually Connexion).
I would like to stay in-memory but keep the db between the requests to the server.
So it should only be destroyed when the server is killed.
When I use sqlite3.connect(':memory:'), the db is destroyed after each response.
So I followed this approach: In memory SQLite3 shared database python, and ran sqlite3.connect('file::memory:?cache=shared&mode=memory', uri=True). But then a file called file::memory:?cache=shared&mode=memory appears in the app root and does not disappear when I kill the server. When I start the server again, the db-init routine which creates the tables fails, because the tables already exist.
I tried this out on Linux and Mac. Both show the same behaviour. It seems like the db is saved to a file instead of being mapped to memory.
My Python version is 3.9 and sqlite3.sqlite_version_info is (3, 37, 0).
I suspect that sqlite is treating 'file::memory:?cache=shared&mode=memory' as a file name, and therefore, on execution, creates a database file with that "name" in its root directory.
Now to the issue: I would try connecting via
sqlite3.connect(':memory:')
and, to keep it alive, you could try opening a connection before starting to serve the app, storing the connection object somewhere so it doesn't get garbage collected, and proceeding as usual, opening and closing other connections to it (on a per-request basis).
SOS: Keep in mind I have only tested this in a single-threaded script, to check whether a new sqlite3.connect(':memory:') connects to the same database we have already loaded (it does).
I do not know how well it would play with Flask's threads, or with sqlite itself.
UPDATE:
Here's my approach, more info below:
import sqlite3

class db_test:
    # DOES NOT INCLUDE LOADING THE FILE TO MEMORY AND VICE VERSA (out of the scope of the question)
    def __init__(self):
        self.db = sqlite3.connect(":memory:", check_same_thread=False)

    def execute_insert(self, query: str, data: tuple):
        cur = self.db.cursor()
        with self.db:
            cur.execute(query, data)
        cur.close()
The above class is instantiated once in the beginning of my flask app, right after imports like so:
from classes import db_test
db = db_test()
This avoids garbage collection.
To use it, simply call it where needed, like so:
@app.route("/db_test")
def db_test():
    db.execute_insert("INSERT INTO table (entry) VALUES (?)", ('hello', ))
    return render_template("db_test.html")
Notes:
You might have noticed the second argument in self.db = sqlite3.connect(":memory:", check_same_thread=False). This makes it possible to use connections and cursors created in different threads (as Flask does), but at the risk of collisions and corrupted data/entries.
From my understanding (regarding my setup: flask -> waitress -> nginx), unless explicitly set to some multithreaded/multiprocessing mode, Flask will process each request start-to-finish and then proceed to the next, which renders the above danger irrelevant.
I set up a rudimentary test to see if my theory holds up: I insert an incrementing number every time a page is requested. I then spammed refresh on a PC, a laptop and a mobile. The resulting 164 entries were checked for integrity manually and passed.
Finally: keep in mind that I might be missing something, that my methodology is not a proper stress test, and that there may be differences between our setups.
Hope this helps!
PS: The first approach I suggested could not be replicated inside Flask. I suspect that is due to Flask's thread activity.
I have an SQLAlchemy session in a script. The script runs for a long time, and it only fetches data from the database, never updates or inserts.
I get quite a lot of errors like
sqlalchemy.exc.DBAPIError: (TransactionRollbackError) terminating connection due to conflict with recovery
DETAIL: User was holding a relation lock for too long.
The way I understand it, SQLAlchemy creates a transaction with the first select issued, and then reuses it. As my script may run for about an hour, it is very likely that a conflict comes up during the lifetime of that transaction.
To get rid of the error, I could use autocommit in the deprecated mode (without doing anything more), but this is explicitly discouraged by the documentation.
What is the right way to deal with the error? Can I use ORM queries without transactions at all?
I ended up closing the session after (almost) every select, like
session.query(Foo).all()
session.close()
Since I do not use autocommit, a new transaction is opened automatically on the next query.
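If you prefer not to sprinkle session.close() calls around, contextlib.closing gives the same effect; a short sketch, assuming Session and Foo are whatever you already have defined:
from contextlib import closing

with closing(Session()) as session:
    foos = session.query(Foo).all()
# the session is closed here, so the long-lived transaction ends with it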
SQLite supports a "shared cache" for :memory: databases when they are opened with a special URI (according to sqlite.org):
[T]he same in-memory database can be opened by two or more database
connections as follows:
rc = sqlite3_open("file::memory:?cache=shared",&db);
I can take advantage of this in Python 3.4 by using the URI parameter for sqlite3.connect():
sqlite3.connect('file::memory:?cache=shared', uri=True)
However, I can't seem to get the same thing working for SQLAlchemy:
engine = sqlalchemy.create_engine('sqlite:///:memory:?cache=shared')
engine.connect()
...
TypeError: 'cache' is an invalid keyword argument for this function
Is there some way to get SQLAlchemy to make use of the shared cache?
Edit:
On Python 3.4, I can use the creator argument to create_engine to solve the problem, but the problem remains on other Python versions:
creator = lambda: sqlite3.connect('file::memory:?cache=shared', uri=True)
engine = sqlalchemy.create_engine('sqlite://', creator=creator)
engine.connect()
You should avoid passing uri=True on older Python versions, where sqlite3.connect() does not accept it, and the problem will be fixed:
import sqlite3
import sys

import sqlalchemy

DB_URI = 'file::memory:?cache=shared'
PY2 = sys.version_info.major == 2
if PY2:
    params = {}
else:
    params = {'uri': True}

creator = lambda: sqlite3.connect(DB_URI, **params)
engine = sqlalchemy.create_engine('sqlite:///:memory:', creator=creator)
engine.connect()
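As a quick sanity check (a sketch assuming Python 3, building on the code above), two engines using the same creator should see one and the same shared-cache database:
from sqlalchemy import text

engine_a = sqlalchemy.create_engine('sqlite:///:memory:', creator=creator)
engine_b = sqlalchemy.create_engine('sqlite:///:memory:', creator=creator)

with engine_a.begin() as conn:
    conn.execute(text("CREATE TABLE t (x INTEGER)"))
    conn.execute(text("INSERT INTO t (x) VALUES (1)"))

with engine_b.connect() as conn:
    print(conn.execute(text("SELECT x FROM t")).fetchall())   # [(1,)]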
SQLAlchemy docs about the SQLite dialect describe the problem and a solution in detail:
Threading/Pooling Behavior
Pysqlite’s default behavior is to prohibit
the usage of a single connection in more than one thread. This is
originally intended to work with older versions of SQLite that did not
support multithreaded operation under various circumstances. In
particular, older SQLite versions did not allow a :memory: database to
be used in multiple threads under any circumstances.
Pysqlite does include a now-undocumented flag known as
check_same_thread which will disable this check, however note that
pysqlite connections are still not safe to use concurrently in
multiple threads. In particular, any statement execution calls would
need to be externally mutexed, as Pysqlite does not provide for
thread-safe propagation of error messages among other things. So while
even :memory: databases can be shared among threads in modern SQLite,
Pysqlite doesn’t provide enough thread-safety to make this usage worth
it.
SQLAlchemy sets up pooling to work with Pysqlite’s default behavior:
When a :memory: SQLite database is specified, the dialect by default
will use SingletonThreadPool. This pool maintains a single connection
per thread, so that all access to the engine within the current thread
use the same :memory: database - other threads would access a
different :memory: database.
When a file-based database is specified, the dialect will use NullPool
as the source of connections. This pool closes and discards
connections which are returned to the pool immediately. SQLite
file-based connections have extremely low overhead, so pooling is not
necessary. The scheme also prevents a connection from being used again
in a different thread and works best with SQLite’s coarse-grained file
locking.
Using a Memory Database in Multiple Threads
To use a :memory: database
in a multithreaded scenario, the same connection object must be shared
among threads, since the database exists only within the scope of that
connection. The StaticPool implementation will maintain a single
connection globally, and the check_same_thread flag can be passed to
Pysqlite as False:
from sqlalchemy import create_engine
from sqlalchemy.pool import StaticPool

engine = create_engine('sqlite://',
                       connect_args={'check_same_thread': False},
                       poolclass=StaticPool)
Note that using a :memory: database in multiple threads requires a recent version of SQLite.
Source: https://docs.sqlalchemy.org/en/13/dialects/sqlite.html#threading-pooling-behavior
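As a small usage sketch of the quoted pattern (the table name is made up): every checkout from the StaticPool hands back the same underlying pysqlite connection, so separate Connection objects all see the same :memory: database.
from sqlalchemy import create_engine, text
from sqlalchemy.pool import StaticPool

engine = create_engine('sqlite://',
                       connect_args={'check_same_thread': False},
                       poolclass=StaticPool)

with engine.begin() as conn:
    conn.execute(text("CREATE TABLE t (x INTEGER)"))
    conn.execute(text("INSERT INTO t (x) VALUES (42)"))

# a second checkout still points at the very same in-memory database
with engine.connect() as conn:
    print(conn.execute(text("SELECT x FROM t")).scalar())   # 42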
I've run into a strange situation. I'm writing some test cases for my program. The program is written to work on SQLite or PostgreSQL depending on preferences. Now I'm writing my test code using unittest. Very basically, this is what I'm doing:
def setUp(self):
    """
    Reset the database before each test.
    """
    if os.path.exists(root_storage):
        shutil.rmtree(root_storage)
    reset_database()
    initialize_startup()

    self.project_service = ProjectService()
    self.structure_helper = FilesHelper()
    user = model.User("test_user", "test_pass", "test_mail@tvb.org",
                      True, "user")
    self.test_user = dao.store_entity(user)
In setUp I remove any folders that exist (created by some tests), then I reset my database (basically drop tables with cascade), then I initialize the database again and create some services that will be used for testing.
def tearDown(self):
    """
    Remove project folders and clean up database.
    """
    created_projects = dao.get_projects_for_user(self.test_user.id)
    for project in created_projects:
        self.structure_helper.remove_project_structure(project.name)
    reset_database()
tearDown does the same thing, except for creating the services, because this test module is part of the same suite as other modules and I don't want things to be left behind by some tests.
Now all my tests run fine with SQLite. With PostgreSQL I'm running into a very weird situation: at some point in the execution, which differs from run to run by a small margin (e.g. one or two extra calls), the program just halts. No error is generated, no exception is thrown; the program just stops.
The only thing I can think of is that somehow I leave a connection open somewhere and after a while it times out and something happens. But I have A LOT of connections, so before I start going through all that code, I would appreciate some suggestions/opinions.
What could cause this kind of behaviour? Where to start looking?
Regards,
Bogdan
PostgreSQL-based applications freeze because PG locks tables fairly aggressively; in particular, it will not allow a DROP command to continue while any connection is open in a pending transaction that has accessed that table in any way (SELECT included).
If you're on a unix system, the command "ps -ef | grep 'post'" will show you all the Postgresql processes and you'll see the status of current commands, including your hung "DROP TABLE" or whatever it is that's freezing. You can also see it if you select from the pg_stat_activity view.
So the key is to ensure that no pending transactions remain - this means, at a DBAPI level, that any result cursors are closed, and any connection that is currently open has rollback() called on it, or is otherwise explicitly closed. In SQLAlchemy, this means any result sets (i.e. ResultProxy) with pending rows are fully exhausted, and any Connection objects have been close()d, which returns them to the pool and calls rollback() on the underlying DBAPI connection. You'd want to make sure there is some kind of unconditional teardown code which makes sure this happens before any DROP TABLE type of command is emitted.
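In SQLAlchemy terms, that unconditional teardown might look roughly like this (a sketch; the URL and the metadata object are placeholders for your own setup):
from sqlalchemy import create_engine, MetaData

engine = create_engine('postgresql://user:password@localhost/test_db')
metadata = MetaData()

def drop_everything():
    # nothing of ours may still hold a pending transaction when DROP runs
    engine.dispose()                 # closes / rolls back all pooled DBAPI connections
    metadata.drop_all(bind=engine)   # the DROPs can no longer be blocked by our own locks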
As far as "I have A LOT of connections", you should get that under control. When the SQLA test suite runs through its 3000 something tests, we make sure we're absolutely in control of connections and typically only one connection is opened at a time (still, running on Pypy has some behaviors that still cause hangs with PG..its tough). There's a pool class called AssertionPool you can use for this which ensures only one connection is ever checked out at a time else an informative error is raised (shows where it was checked out).
One solution I found to this problem was to call db.session.close() before any attempt to call db.drop_all(). This will close the connection before dropping the tables, preventing Postgres from locking the tables.
See a much more in-depth discussion of the problem here.
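For the Flask-SQLAlchemy case this boils down to something like the following (a sketch; the database URI is a placeholder, and newer Flask-SQLAlchemy versions need the app context shown):
from flask import Flask
from flask_sqlalchemy import SQLAlchemy

app = Flask(__name__)
app.config['SQLALCHEMY_DATABASE_URI'] = 'postgresql://user:password@localhost/test_db'
db = SQLAlchemy(app)

def reset_database():
    db.session.close()   # release the session's connection and its locks first
    db.drop_all()        # PostgreSQL can now acquire the locks DROP needs
    db.create_all()

with app.app_context():
    reset_database()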
I use MySQL with MySQLdb module in Python, in Django.
I'm running in autocommit mode in this case (and Django's transaction.is_managed() actually returns False).
I have several processes interacting with the database.
One process fetches all Task models with Task.objects.all()
Then another process adds a Task model (I can see it in a database management application).
If I call Task.objects.all() on the first process, I don't see anything. But if I call connection._commit() and then Task.objects.all(), I see the new Task.
My question is: is there any caching involved at the connection level? And is this normal behaviour? (It does not seem so to me.)
This certainly seems autocommit/table-locking related.
If MySQLdb implements the DB-API 2 spec, it will probably have the connection running as one single continuous transaction. When you say 'running in autocommit mode', do you mean MySQL itself, the MySQLdb module, or Django?
Not committing intermittently perfectly explains the behaviour you are getting:
i) a connection is implemented as one single transaction in MySQLdb (probably by default);
ii) connections are not opened/closed only when needed, but one (or more) persistent database connections are (re)used (my guess; this could be inherited from Django's architecture);
iii) your selects ('reads') cause a 'simple read lock' on a table, which means other connections can still 'read' this table, but connections wanting to 'write data' can't (immediately), because this lock prevents them from getting the 'exclusive lock' (needed for writing) on this table. The writing is thus postponed indefinitely (until it can get a (short) exclusive lock on the table for writing - when you close the connection or manually commit).
I'd do the following in your case:
find out which table locks are on your database during the scenario above
read about Django and transactions here. A quick skim suggests that using standard Django functionality implicitly causes commits, which means sending handcrafted SQL (insert, update, ...) maybe won't.
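Whatever the exact mechanism, the practical point is that the reading connection sits inside one long transaction until it commits. A bare-bones illustration outside Django (connection parameters and the table name are placeholders):
import MySQLdb

conn = MySQLdb.connect(host='localhost', user='app', passwd='secret', db='appdb')
cur = conn.cursor()

cur.execute("SELECT COUNT(*) FROM tasks")
print(cur.fetchone())    # the transaction's view of the table starts here

# ... another process inserts and commits a new row ...

cur.execute("SELECT COUNT(*) FROM tasks")
print(cur.fetchone())    # same transaction: the new row may still be invisible

conn.commit()            # end the transaction (this is what connection._commit() did)
cur.execute("SELECT COUNT(*) FROM tasks")
print(cur.fetchone())    # a fresh transaction now sees the other process's row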