I am using sqlite3 in a Flask app (actually Connexion).
I would like to stay in-memory but keep the db between requests to the server, so it should only be destroyed when the server is killed.
When I use sqlite3.connect(':memory:') the db is destroyed after each response.
So I followed the approach from "In memory SQLite3 shared database python" and ran sqlite3.connect('file::memory:?cache=shared&mode=memory', uri=True). But then a file called file::memory:?cache=shared&mode=memory appears in the app root and does not disappear when I kill the server. When I start the server again, the db-init routine that creates the tables fails, because the tables already exist.
I tried this out on Linux and macOS; both show the same behaviour. It seems like the db is saved to a file instead of being kept in memory.
My Python version is 3.9 and sqlite3.sqlite_version_info is (3, 37, 0).
I suspect that sqlite is treating 'file::memory:?cache=shared&mode=memory' as a plain file name and therefore, on execution, creates a database file with that literal "name" in the app's root directory.
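The shared-cache form can work, but every single connect call has to pass uri=True; any connection opened without it treats the whole string as a file name and produces exactly the stray file described above. A minimal sketch of the intended usage (my illustration, not the original app):

import sqlite3

# Both connections pass uri=True, so they attach to the same shared-cache
# in-memory database instead of creating a file named after the URI.
a = sqlite3.connect("file::memory:?cache=shared", uri=True)
b = sqlite3.connect("file::memory:?cache=shared", uri=True)

a.execute("CREATE TABLE t (x)")
a.execute("INSERT INTO t VALUES (1)")
a.commit()
print(b.execute("SELECT x FROM t").fetchall())  # [(1,)] -> same database

# The database is destroyed once the last connection to it is closed.
a.close()
b.close()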
Now to the issue. I would try connecting via:
sqlite3.connect(':memory:')
and, to keep it alive, open a connection before starting to serve the app, store the connection object somewhere so it doesn't get garbage collected, and proceed as usual, opening and closing other connections on a per-request basis.
SOS: Keep in mind I have only tested this in a single-threaded script, to check whether a new sqlite3.connect(':memory:') connects to the same database that we have already loaded (it does).
I do not know how well it would play with Flask's threads, or with sqlite itself.
UPDATE:
Here's my approach, more info below:
import sqlite3

class db_test:
    # DOES NOT INCLUDE LOADING THE FILE TO MEMORY AND VICE VERSA (out of the scope of the question)
    def __init__(self):
        # check_same_thread=False lets connections/cursors be used from other threads
        self.db = sqlite3.connect(":memory:", check_same_thread=False)

    def execute_insert(self, query: str, data: tuple):
        cur = self.db.cursor()
        with self.db:  # commits on success, rolls back on exception
            cur.execute(query, data)
        cur.close()
The above class is instantiated once at the beginning of my Flask app, right after the imports, like so:
from classes import db_test
db = db_test()
This avoids garbage collection.
To use it, simply call it where needed, like so:
@app.route("/db_test")
def db_test():
    db.execute_insert("INSERT INTO table (entry) VALUES (?)", ('hello', ))
    return render_template("db_test.html")
Notes:
You might have noticed the 2nd argument in self.db = sqlite3.connect(":memory:", check_same_thread=False). This makes it possible to use connections and cursors created in different threads (as Flask does), but at the risk of collisions and corrupted data/entries.
From my understanding (regarding my setup: Flask -> waitress -> nginx), unless explicitly set to some multithreaded/multiprocessing mode, Flask will process each request start-to-finish and then proceed to the next, which renders the above danger irrelevant.
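If the server does end up running multiple threads, a simple mitigation (my own suggestion, not part of the original setup) is to serialize access to the shared connection with a lock:

import sqlite3
import threading

class db_test:
    def __init__(self):
        self.db = sqlite3.connect(":memory:", check_same_thread=False)
        self._lock = threading.Lock()  # guards the shared connection

    def execute_insert(self, query: str, data: tuple):
        # Only one thread at a time may use the shared connection.
        with self._lock, self.db:
            self.db.execute(query, data)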
I set up a rudimentary test to see if my theory holds up: I insert an incremental number every time the page is requested. I then spammed refresh on a PC, a laptop and a mobile. The resulting 164 entries were checked manually for integrity and passed.
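For illustration, the test route could look roughly like this (a sketch; the hits table and its n column are placeholders of mine, not the real schema):

import itertools

counter = itertools.count(1)

@app.route("/db_test")
def db_test():
    # Insert an incrementing number on every request; integrity is checked by hand afterwards.
    db.execute_insert("INSERT INTO hits (n) VALUES (?)", (next(counter),))
    return render_template("db_test.html")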
Finally: keep in mind that I might be missing something, that my methodology is not a stress test, and that our setups may differ.
Hope this helps!
PS: The first approach I suggested could not be replicated inside Flask. I suspect that is due to Flask's thread activity.
Related
I'm trying to serve database query results to ad-hoc client requests, but I do not want to open a connection for each individual query. I'm not sure if I'm doing it right.
The current solution is something like this on the "server" side (heavily cut down for clarity):
import rpyc
from rpyc.utils.server import ThreadedServer
import cx_Oracle

conn = cx_Oracle.connect('whatever connect string')
cursor = conn.cursor()

def get_some_data(barcode):
    # do something
    return cursor.execute("whatever query", {'barcode': barcode})

class data_service(rpyc.Service):
    def exposed_get_some_data(self, brcd):
        return get_some_data(brcd)

if __name__ == '__main__':
    s = ThreadedServer(data_service, port=12345, auto_register=False)
    s.start()
This runs okay for a while. However, from time to time the program crashes, and so far I haven't been able to track down when it does that.
What I wish to confirm: as you can see, the database connection is created outside of the data_service class. Is this in itself likely to cause problems?
Many thanks, any thoughts appreciated.
I don't think the problem is that you're creating the connection outside of the class, that should be fine.
I think the problem is that you are creating just one cursor and using it for a long time, which as far as I understand is not how cursors are meant to be used.
You can use conn.execute without manually creating a cursor, which should be fine for how you're using the database. If I remember correctly, behind the scenes this creates a new cursor for each SQL command. You could also do this yourself in get_some_data(): create a new cursor, use it once, and then close it before returning the data.
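A sketch of that second option, creating a short-lived cursor per call (the query and bind variable are the placeholders from the question):

def get_some_data(barcode):
    # Open a fresh cursor, use it once, and close it before returning the rows.
    cursor = conn.cursor()
    try:
        cursor.execute("whatever query", {'barcode': barcode})
        return cursor.fetchall()  # return the data, not the cursor object itself
    finally:
        cursor.close()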
In the long run, if you wish your server to be more robust, you'll need to add some error-handling for when database operations fail or the connection is lost.
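For instance, a deliberately simplistic reconnect-on-failure wrapper (the retry policy is an assumption of mine, not something from the original code):

def get_some_data_with_retry(barcode, retries=1):
    global conn
    for attempt in range(retries + 1):
        try:
            return get_some_data(barcode)
        except cx_Oracle.DatabaseError:
            if attempt == retries:
                raise
            # The connection may have dropped: reopen it and try again.
            conn = cx_Oracle.connect('whatever connect string')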
A final note: essentially you've written a very basic database proxy server. There are probably various existing solutions for this already, which handle many of the issues you are likely to run into. I recommend at least considering using an existing solution.
I've run into a strange situation. I'm writing some test cases for my program. The program is written to work on SQLite or PostgreSQL depending on preferences. Now I'm writing my test code using unittest. Very basically, this is what I'm doing:
def setUp(self):
    """
    Reset the database before each test.
    """
    if os.path.exists(root_storage):
        shutil.rmtree(root_storage)
    reset_database()
    initialize_startup()

    self.project_service = ProjectService()
    self.structure_helper = FilesHelper()
    user = model.User("test_user", "test_pass", "test_mail@tvb.org",
                      True, "user")
    self.test_user = dao.store_entity(user)
In setUp I remove any folders that exist (created by some tests), then I reset my database (basically drop tables cascade), then I initialize the database again and create some services that will be used for testing.
def tearDown(self):
    """
    Remove project folders and clean up database.
    """
    created_projects = dao.get_projects_for_user(self.test_user.id)
    for project in created_projects:
        self.structure_helper.remove_project_structure(project.name)
    reset_database()
tearDown does the same thing, except for creating the services, because this test module is part of the same suite as other modules and I don't want things to be left behind by some tests.
Now, all my tests run fine with SQLite. With PostgreSQL I'm running into a very weird situation: at some point in the execution, which actually differs from run to run by a small margin (e.g. one or two extra calls), the program just halts. I mean no error is generated, no exception is thrown; the program just stops.
The only thing I can think of is that somehow I leave a connection open somewhere and after a while it times out and something happens. But I have A LOT of connections, so before I start going through all that code, I would appreciate some suggestions/opinions.
What could cause this kind of behaviour? Where to start looking?
Regards,
Bogdan
PostgreSQL-based applications can freeze because PG locks tables fairly aggressively; in particular, it will not allow a DROP command to continue if any connections are open in a pending transaction that has accessed that table in any way (SELECT included).
If you're on a Unix system, the command "ps -ef | grep 'post'" will show you all the PostgreSQL processes and you'll see the status of current commands, including your hung "DROP TABLE" or whatever it is that's freezing. You can also see it if you select from the pg_stat_activity view.
So the key is to ensure that no pending transactions remain. At the DBAPI level this means that any result cursors are closed, and any connection that is currently open has rollback() called on it, or is otherwise explicitly closed. In SQLAlchemy, this means any result sets (i.e. ResultProxy) with pending rows are fully exhausted and any Connection objects have been close()d, which returns them to the pool and calls rollback() on the underlying DBAPI connection. You'd want to make sure there is some kind of unconditional teardown code which guarantees this happens before any DROP TABLE type of command is emitted.
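Such a teardown might look roughly like this (a sketch against the 0.6-era API; the session, engine and metadata names are assumed to exist in the test harness):

def tearDown(self):
    # Unconditionally release anything that could still hold a table lock
    # before any DROP TABLE statements are emitted.
    session.rollback()
    session.close()
    engine.dispose()           # returns/closes all pooled DBAPI connections
    metadata.drop_all(engine)  # the DROPs can no longer be blocked by our own connections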
As far as "I have A LOT of connections", you should get that under control. When the SQLA test suite runs through its 3000 something tests, we make sure we're absolutely in control of connections and typically only one connection is opened at a time (still, running on Pypy has some behaviors that still cause hangs with PG..its tough). There's a pool class called AssertionPool you can use for this which ensures only one connection is ever checked out at a time else an informative error is raised (shows where it was checked out).
One solution I found to this problem was to call db.session.close() before any attempt to call db.drop_all(). This will close the connection before dropping the tables, preventing Postgres from locking the tables.
See a much more in-depth discussion of the problem here.
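In code, that ordering looks roughly like this (a sketch assuming the usual Flask-SQLAlchemy db object and an active application context):

def tearDown(self):
    db.session.close()  # release the session's connection and any pending transaction
    db.drop_all()       # Postgres can now drop the tables without waiting on our own locks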
I'm writing my first SQLAlchemy (0.6.8)/Python (2.7.1) program, sitting on top of SQLite (3.7.6.3, I think), running on Windows Vista.
In order to perform unit-testing, I am pointing SQLite to a test database, and my unit-test scripts routinely delete the database file, so I am continuously working with a known initial state.
Sometimes my (single-threaded) unit-tests fail to remove the file:
WindowsError: [Error 32] The process cannot access the file because it is being used by another process
The only process that uses the file is the unit-test harness. Clearly, some lock is not being released by one of my completed unit-tests, preventing the next unit-test in the same process from deleting the file.
I have searched all the places I have created a session and confirmed there is a corresponding session.commit() or session.rollback().
I have searched for all session.commit() and session.rollback() calls in my code, and added a session.close() call immediately afterwards, in an attempt to explicitly release any transactional locks, but it hasn't helped.
Are there any secrets to ensuring the remaining locks are removed at the end of a transaction to permit the file to be deleted?
Someone had a similar problem: http://www.mail-archive.com/sqlalchemy@googlegroups.com/msg20724.html
You should use a NullPool when establishing the connection, to ensure that no active connections stay around after session.close():
from sqlalchemy import create_engine
from sqlalchemy.pool import NullPool
to_engine = create_engine('sqlite:///%s' % temp_file_name, poolclass=NullPool)
Reference: http://www.sqlalchemy.org/docs/06/core/pooling.html?highlight=pool#sqlalchemy.pool
This is only required in SQLAlchemy prior to 0.7.0. After 0.7.0, this became the default behaviour for SQLite. Reference: http://www.sqlalchemy.org/docs/core/pooling.html?highlight=pool#sqlalchemy.pool
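As a complementary measure (my addition, not from the linked answer), explicitly disposing the engine before deleting the file also returns any pooled connections that would otherwise keep the Windows file lock alive:

import os
from sqlalchemy import create_engine

temp_file_name = 'test.db'  # placeholder path, as in the snippet above

engine = create_engine('sqlite:///%s' % temp_file_name)
# ... run the tests ...
engine.dispose()           # closes every pooled DBAPI connection held by the engine
os.remove(temp_file_name)  # the file is no longer held open by this process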
Do you require shared access to the database during unit tests? If not, use an in-memory SQLite database for those tests. From the SQLAlchemy documentation:
The sqlite :memory: identifier is the default if no filepath is present. Specify sqlite:// and nothing else:
# in-memory database
e = create_engine('sqlite://')
No need to manage temporary files, no locking semantics, guaranteed a clean slate between unit tests, etc.
I'm using psycopg2 for the CherryPy app I'm currently working on, and the CLI & phpPgAdmin to handle some operations manually. Here's the Python code:
# One connection per thread
cherrypy.thread_data.pgconn = psycopg2.connect("...")
...

# Later, an object is created by a thread:
class dbobj(object):
    def __init__(self):
        self.connection = cherrypy.thread_data.pgconn
        self.curs = self.connection.cursor(cursor_factory=psycopg2.extras.DictCursor)
...

# Then:
try:
    blabla
    self.curs.execute(...)
    self.connection.commit()
except:
    self.connection.rollback()
    lalala
...

# Finally, the destructor is called:
def __del__(self):
    self.curs.close()
I'm having a problem with either psycopg or Postgres (although I think the latter is more likely). After having sent a few queries, my connections drop dead. Similarly, phpPgAdmin usually gets dropped as well; it prompts me to reconnect after I have made several requests. Only the CLI remains persistent.
The problem is that these failures happen very randomly and I can't track down the cause. I can either get locked out after a few page requests or never encounter anything after having requested hundreds of pages. The only error I've found in the Postgres log, after terminating the app, is:
...
LOG: unexpected EOF on client connection
LOG: could not send data to client: Broken pipe
LOG: unexpected EOF on client connection
...
I thought of creating a new connection every time a new dbobj instance is created, but I absolutely don't want to do this.
Also, I've read that one may run into similar problems unless all transactions are committed: I use the try/except block for every single INSERT/UPDATE query, but I never use it for SELECT queries, nor do I want to write even more boilerplate code (btw, do they need to be committed?). Even if that's the case, why would phpPgAdmin close down?
max_connections is set to 100 in the .conf file, so I don't think that's the reason either. A single CherryPy worker has only 10 threads.
Does anyone have an idea where I should look first?
Psycopg2 needs a commit or rollback after every transaction, including SELECT queries, or it leaves the connections "IDLE IN TRANSACTION". This is now a warning in the docs:
Warning: By default, any query execution, including a simple SELECT will start a transaction: for long-running programs, if no further action is taken, the session will remain “idle in transaction”, an undesirable condition for several reasons (locks are held by the session, tables bloat...). For long lived scripts, either ensure to terminate a transaction as soon as possible or use an autocommit connection.
It's a bit difficult to see exactly where you're populating and accessing cherrypy.thread_data. I'd recommend investigating psycopg2.pool.ThreadedConnectionPool instead of trying to bind one conn to each thread yourself.
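A rough sketch of that approach (the DSN, pool sizes and query handling are placeholders, not taken from the original code):

import psycopg2
from psycopg2.pool import ThreadedConnectionPool

pool = ThreadedConnectionPool(minconn=1, maxconn=10,
                              dsn="dbname=app user=app password=secret host=localhost")

def run_query(query, params=None):
    conn = pool.getconn()
    try:
        with conn.cursor() as curs:
            curs.execute(query, params)
            rows = curs.fetchall() if curs.description else None
        conn.commit()  # end the transaction so the connection is never left "idle in transaction"
        return rows
    finally:
        pool.putconn(conn)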
Even though I don't have any idea why successful SELECT queries block the connection, sprinkling .commit() after pretty much every single query that doesn't have to work in conjunction with another one solved the problem.
I use MySQL with MySQLdb module in Python, in Django.
I'm running in autocommit mode in this case (and Django's transaction.is_managed() actually returns False).
I have several processes interacting with the database.
One process fetches all Task models with Task.objects.all()
Then another process adds a Task model (I can see it in a database management application).
If I call Task.objects.all() in the first process, I don't see the new Task. But if I call connection._commit() and then Task.objects.all(), I see the new Task.
My question is: is there any caching involved at the connection level? And is this normal behaviour (it does not seem so to me)?
This certainly seems autocommit/table-locking related.
If MySQLdb implements the DB-API 2 spec, it will probably have the connection running as one single continuous transaction. When you say 'running in autocommit mode', do you mean MySQL itself, the MySQLdb module, or Django?
Not committing intermittently perfectly explains the behaviour you are getting:
i) a connection implemented as one single transaction in MySQLdb (by default, probably);
ii) not opening/closing connections only when needed, but (re)using one (or more) persistent database connections (my guess; possibly inherited from the Django architecture);
iii) your selects ('reads') cause a 'simple read lock' on a table, which means other connections can still 'read' this table, but connections wanting to 'write data' can't do so (immediately), because this lock prevents them from getting an 'exclusive lock' (needed for writing) on this table. The writing is thus postponed indefinitely (until a (short) exclusive lock on the table can be obtained for writing, i.e. when you close the connection or manually commit).
I'd do the following in your case:
find out which table locks are on your database during the scenario above
read about Django and transactions here. A quick skim suggests that using standard Django functionality implicitly causes commits; this means sending handcrafted SQL (insert, update, ...) maybe won't.
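To make the commit-then-requery behaviour from the question concrete, here is a hedged MySQLdb-level sketch (connection parameters and the task table are placeholders); committing ends the current transaction, so the next SELECT sees rows inserted and committed by other processes:

import MySQLdb

conn = MySQLdb.connect(host="localhost", user="app", passwd="secret", db="appdb")
cur = conn.cursor()

cur.execute("SELECT COUNT(*) FROM task")
print(cur.fetchone())  # count as of the start of this transaction

# ... another process inserts a Task and commits ...

cur.execute("SELECT COUNT(*) FROM task")
print(cur.fetchone())  # still the old count: we are inside the same transaction

conn.commit()          # end the transaction (roughly what connection._commit() ends up doing)
cur.execute("SELECT COUNT(*) FROM task")
print(cur.fetchone())  # the newly committed row is visible now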