"Database is Locked" error while deploying django webapp on azure - python

I am trying to deploy a Django web app on Azure. It appears on the web fully functional, except for one thing: it doesn't let me create a superuser from the web SSH console. Every time I run python manage.py createsuperuser and supply all the credentials, it throws an error:
django.db.utils.OperationalError: database is locked
I am using Django's default database. What could be the reason for this?

SQLite is meant to be a lightweight database, and thus can't support a high level of concurrency. OperationalError: database is locked errors indicate that your application is experiencing more concurrency than SQLite can handle in its default configuration. This error means that one thread or process has an exclusive lock on the database connection and another thread timed out waiting for the lock to be released.
Python's SQLite wrapper has a default timeout value that determines how long the second thread is allowed to wait on the lock before it times out and raises the OperationalError: database is locked error.
If you're getting this error, you can solve it by:
Switching to another database backend. At a certain point SQLite becomes too "lite" for real-world applications, and these sorts of concurrency errors indicate you've reached that point.
Rewriting your code to reduce concurrency and ensure that database transactions are short-lived.
Increasing the default timeout value by setting the timeout database option.
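For that last option, Django passes everything in the OPTIONS dict straight to sqlite3.connect(), so the timeout can be raised in settings.py. A minimal sketch (the 20-second value is only an example):

# settings.py -- let SQLite wait longer for the lock before raising
# "database is locked"; the value is forwarded to sqlite3.connect(..., timeout=20).
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.sqlite3',
        'NAME': BASE_DIR / 'db.sqlite3',
        'OPTIONS': {
            'timeout': 20,  # seconds; the sqlite3 default is 5
        },
    }
}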

Related

Log Stacktrace of current Python Interpreter via PostgreSQL trigger

I am trying to find a bug which happens from time to time on our production server but cannot be reproduced otherwise: some value in the DB gets changed in a way I don't want it to.
I could write a PostgreSQL trigger which fires when this bug happens and raises an exception from said trigger. Then I would see the Python traceback of the code that executed the unwanted SQL statement.
But in this case I don't want to stop the processing of the request.
Is there a way to log the Python/Django traceback from within a PostgreSQL trigger?
I know that this is not trivial, since the DB code runs in a different Linux process with a different user ID.
I am using Python, Django, PostgreSQL, Linux.
I guess this is not easy since the DB trigger runs in a different context than the python interpreter.
Please ask if you need further information.
Update
One solution might be to overwrite connection.notices of psycopg2.
Is there a way to log the Python/Django traceback from within a PostgreSQL trigger?
No, there is not.
The (SQL) query is executed on the DBMS server, and so is the code inside the trigger.
The Python code is executed on the client, which is a different process, possibly run by a different user, and maybe even on a different machine.
The only connection between the server (which detects the condition) and the client (which needs to produce the stack dump) is the connected socket. You could try to extend the server's reply (if there is one) with some status code, which the client then uses to dump its own stack. This will only work if the trigger is part of the current transaction, not of some unrelated process.
The other way is massive logging: make the DBMS write every submitted SQL statement to its logfile. This can produce huge amounts of log entries, which you have to inspect.
Given this setup
(django/python) -[SQL connection]-> (PostgreSQL server)
your intuition that
I guess this is not easy since the DB trigger runs in a different context than the python interpreter.
is correct. At least, we won't be able to do this exactly the way you want it; not without much acrobatics.
However, there are options, each with drawbacks:
If you are using django with SQLAlchemy, you can register event listeners (either ORM events or Core Events) that detect this bad SQL statement you are hunting, and log a traceback.
Write a wrapper around your SQL driver, check for the bad SQL statement you are hunting, and log the traceback every time it's detected (see the sketch after this list).
Give every SQL transaction, or every django request, an ID (could just be some UUID in werkzeug's request-bound storage manager). From here, we gain more options:
Configure the logger to log this request ID everywhere, and log all SQL statements in SQLAlchemy. This lets you correlate Django requests, and specific function invocations, with SQL statements. You can do this with echo= in SQLAlchemy.
Include this request ID in every SQL statement (extra column?), then log this ID in the PostgreSQL trigger with RAISE NOTICE. This lets you correlate client-side activity in django against server-side activity in PostgreSQL.
In the spirit of "Test in Production" espoused by Charity Majors, send every request to a sandbox copy of your Django app that reads/writes a sandboxed copy of your production database. In the sandbox database, raise the exception and log your traceback.
You can take this idea further and create smaller "async" setups. For example, you can, for each request, trigger an async duplicate of the same request (say, with Celery) that hits a DB configured with your PostgreSQL trigger to fail and log the traceback.
Use RAISE EXCEPTION in the PostgreSQL trigger to rollback the current transaction. In Python, catch that specific exception, log it, then repeat the transaction, changing the data slightly (extra column?) to indicate that this is a retry and the trigger should not fail.
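As a sketch of the driver-wrapper option above: since Django 2.0 you can install an execute wrapper on the connection, which sees every SQL statement before it runs. The SUSPECT_SQL fragment and the middleware name below are hypothetical placeholders:

import logging
import traceback
from django.db import connection

logger = logging.getLogger(__name__)
SUSPECT_SQL = "UPDATE myapp_order SET status"  # hypothetical fragment being hunted

def log_suspect_sql(execute, sql, params, many, context):
    # called for every query while the wrapper is installed
    if SUSPECT_SQL in sql:
        logger.warning("suspect SQL:\n%s\n%s", sql, "".join(traceback.format_stack()))
    return execute(sql, params, many, context)

class SqlTracebackMiddleware:
    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        # wrap all queries made while handling this request
        with connection.execute_wrapper(log_suspect_sql):
            return self.get_response(request)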
Is there a reason you can't SELECT all row values into Python, then do the detection in Python entirely?
So if you're able to detect the condition after the queries execute, then you can log the condition and/or throw an exception.
Then what you need is tooling like Sentry or New Relic.
You could use LISTEN+NOTIFY.
First, have a daemon thread LISTEN on a channel, and in the DB trigger execute a NOTIFY on that channel.
The daemon thread receives the notify event and can dump the stack trace of the main thread.
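A minimal sketch of the listening side with psycopg2; the channel name bug_detected and the assumption that the trigger runs NOTIFY bug_detected are mine:

import select
import sys
import threading
import traceback
import psycopg2
import psycopg2.extensions

def listen_for_trigger(dsn, main_thread_id):
    conn = psycopg2.connect(dsn)
    conn.set_isolation_level(psycopg2.extensions.ISOLATION_LEVEL_AUTOCOMMIT)
    cur = conn.cursor()
    cur.execute("LISTEN bug_detected;")  # the trigger executes NOTIFY bug_detected
    while True:
        if select.select([conn], [], [], 60) == ([], [], []):
            continue  # timeout, keep waiting
        conn.poll()
        while conn.notifies:
            conn.notifies.pop(0)
            # dump the main thread's stack at the moment the notify arrives
            frame = sys._current_frames()[main_thread_id]
            print("".join(traceback.format_stack(frame)))

# started as a daemon thread, e.g.:
# threading.Thread(target=listen_for_trigger,
#                  args=(DSN, threading.main_thread().ident),
#                  daemon=True).start()

Note that by the time the notify arrives, the main thread may already have moved past the statement that fired the trigger, so the dumped stack is only an approximation.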
If you use psycopg2, you can use this
# Overwriting connection.notices via Django (AppConfig lives in myapp/apps.py)
import logging
import traceback
from django.apps import AppConfig
from django.db.backends.signals import connection_created

logger = logging.getLogger(__name__)

class MyAppConfig(AppConfig):
    def ready(self):
        connection_created.connect(connection_created_check_for_notice_in_connection)

class ConnectionNoticeList(object):
    # psycopg2 appends every notice (e.g. from RAISE NOTICE) to connection.notices
    def append(self, message):
        if 'some_magic_of_db_trigger' not in message:
            return
        logger.warning('%s %s' % (message, ''.join(traceback.format_stack())))

def connection_created_check_for_notice_in_connection(sender, connection, **kwargs):
    # connection.connection is the raw psycopg2 connection
    connection.connection.notices = ConnectionNoticeList()

sqlite3: avoiding "database locked" collision

I am running two Python programs on one CPU in parallel, both of which make use of the same SQLite database. I am handling the SQLite database using SQLAlchemy, and my understanding is that SQLAlchemy handles all the threading database issues within one app. My question is how to handle the access from the two different apps?
One of my two programs is a flask application and the other is a cronjob which updates the database from time to time.
It seems that even read-only tasks on the sqlite database lock the database, meaning that if both apps want to read or write at the same time I get an error.
OperationalError: (sqlite3.OperationalError) database is locked
Let's assume that my cronjob app runs every 5 minutes. How can I make sure that there are no collisions between my two apps? I could write some read flag into a file which I check before accessing the database, but it seems to me there should be a standard way to do this?
Furthermore I am running my app with gunicorn and in principle it is possible to have multiple jobs running... so I eventually want more than 2 parallel jobs for my flask app...
thanks
carl
It's true. SQLite isn't built for this kind of application. SQLite is really for lightweight, single-threaded, single-instance applications.
SQLite connections are one per instance, and if you start getting into some kind of threaded multiplexer (see https://www.sqlite.org/threadsafe.html) it'd be possible, but it's more trouble than it's worth. There are other solutions that provide that function: take a look at PostgreSQL or MySQL. Those DBs are open source, well documented, well supported, and support the kind of concurrency you need.
I'm not sure how SQLAlchemy handles connections, but if you were using Peewee ORM then the solution is quite simple.
When your Flask app initiates a request, you will open a connection to the DB. Then when Flask sends the response, you close the DB.
Similarly, in your cron script, open a connection when you start to use the DB, then close it when the process is finished.
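A minimal sketch of that pattern, assuming Flask with Peewee (the file and function names are illustrative):

from flask import Flask
from peewee import SqliteDatabase

app = Flask(__name__)
db = SqliteDatabase('app.db')

@app.before_request
def open_db_connection():
    # one connection per request
    db.connect(reuse_if_open=True)

@app.teardown_request
def close_db_connection(exc):
    # release the database (and any lock) as soon as the response is done
    if not db.is_closed():
        db.close()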
Another thing you might consider is using SQLite in WAL mode. This can improve concurrency. You set the journaling mode with a PRAGMA query when you open your connection.
For more info, see http://charlesleifer.com/blog/sqlite-small-fast-reliable-choose-any-three-/
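Since the question uses SQLAlchemy, here is a sketch of enabling WAL mode on every new connection; WAL lets readers and a single writer work concurrently, which avoids most of these lock errors:

from sqlalchemy import create_engine, event

engine = create_engine("sqlite:///app.db")

@event.listens_for(engine, "connect")
def set_sqlite_wal(dbapi_connection, connection_record):
    # switch the journal mode once per (pooled) connection
    cursor = dbapi_connection.cursor()
    cursor.execute("PRAGMA journal_mode=WAL;")
    cursor.close()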

The right way to use postgresql with concurrent processes

I am accessing a PostgreSQL table from Python with psycopg2, and I am doing this from several processes. I've been using the serializable transaction isolation level to maintain the integrity of the data: if a TransactionRollback exception occurs while updating / inserting, I try again until the process gets through. While doing this I am experiencing many errors of the form current transaction is aborted, commands ignored until end of transaction block. More than half of the data is successfully written to the database; the rest fails due to the above error, which occurs in all of the processes attempting to write.
Am I approaching PostgreSQL concurrency / transaction isolation with Python and psycopg2 the correct way? Phrased another way: is it acceptable to use PostgreSQL's serializable transaction isolation while accessing the table from multiple separate processes concurrently?
At a guess, you are trapping the exception but not then issuing a ROLLBACK or conn.rollback() on the underlying PostgreSQL connection, so the connection still has an open, aborted transaction.
The key thing to understand is that catching a psycopg2 exception does not issue a rollback on the underlying connection. The transaction is marked aborted by PostgreSQL, and the connection can't process new work until you issue a ROLLBACK on it.
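A minimal retry loop illustrating the point, assuming psycopg2; do_update is a placeholder for your own INSERT/UPDATE statements:

import psycopg2
import psycopg2.extensions
from psycopg2.extensions import TransactionRollbackError  # raised on serialization failures

def write_serializable(conn, params, max_retries=5):
    conn.set_session(isolation_level=psycopg2.extensions.ISOLATION_LEVEL_SERIALIZABLE)
    for _ in range(max_retries):
        try:
            with conn.cursor() as cur:
                do_update(cur, params)  # placeholder for the real statements
            conn.commit()
            return
        except TransactionRollbackError:
            conn.rollback()  # without this the connection stays in the aborted state
    raise RuntimeError("gave up after %d serialization failures" % max_retries)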

Python storm/web.py: Correctly handling DisconnectionError with MySQL database

I am writing a web service based on web.py, using storm as an ORM layer for querying records from a MySQL database. The web service is deployed via mod_wsgi using Apache2 on a Linux box. I create a connection to the MySQL database server when the script is started using storm's create_database() method. This is also the point where I create a Store object, which is used later on to perform queries when a request comes in.
After some hours of inactivity, store.find() throws a DisconnectionError: (2006, 'MySQL server has gone away'). I am not surprised that the database connection is dropped as Apache/mod_wsgi reuses the Python processes without reinitializing them for a long time. My question is how to correctly deal with this?
I have tried setting up a mechanism to keep alive the connection to the MySQL server by sending it a recurring "SELECT 1" (every 300 seconds). Unfortunately that fixed the problem on our testing machine, but not on our demo deployments (ouch) while both share the same MySQL configuration (wait_timeout is set to 8 hours).
I have searched for solutions for reconnecting the storm store to the database, but didn't find anything sophisticated. The only recommendation seems to be that one has to catch the exception, treat it like an inconsistency, call rollback() on the store and then retry. However, this would imply that I either have to wrap the whole Store class or implement the same retry-mechanism over and over. Is there a better solution or am I getting something completely wrong here?
Update: I have added a web.py processor that gracefully handles the disconnection error by recreating the storm Store if the exception is caught and then retrying the operation, as recommended by Andrey. However, this is an incomplete and suboptimal solution because (a) the store is referenced by a handful of objects for re-use, which requires an additional mechanism to re-wire the store reference on each of these objects, and (b) it doesn't cover transaction handling (rollbacks) when performing writes on the database. Still, at least it's an acceptable fix for all read operations on the store for now.
Perhaps you can use web.py's application processor to wrap your controller methods and catch DisconnectionError in them. Something like this:
from storm.exceptions import DisconnectionError

def my_processor(handler):
    tries = 3
    while True:
        try:
            return handler()
        except DisconnectionError:
            # retry the whole request a few times before giving up
            tries -= 1
            if tries == 0:
                raise
Or you can check the cookbook entry on how an application processor is used to combine SQLAlchemy with web.py: http://webpy.org/cookbook/sqlalchemy and make something similar for Storm:
def load_storm(handler):
    web.ctx.store = Store(database)
    try:
        return handler()
    except web.HTTPError:
        web.ctx.store.commit()
        raise
    except:
        web.ctx.store.rollback()
        raise
    finally:
        web.ctx.store.commit()

app.add_processor(load_storm)
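A sketch combining both snippets with the retry idea from the update above: recreate the Store and retry when the connection has gone away. Here database is the object returned by create_database(); otherwise the names follow the snippets above:

from storm.exceptions import DisconnectionError

def load_storm_with_retry(handler):
    tries = 3
    while True:
        web.ctx.store = Store(database)  # fresh Store (and connection state) per attempt
        try:
            result = handler()
            web.ctx.store.commit()
            return result
        except DisconnectionError:
            web.ctx.store.rollback()  # reset before retrying
            tries -= 1
            if tries == 0:
                raise
        except web.HTTPError:
            web.ctx.store.commit()
            raise
        except:
            web.ctx.store.rollback()
            raise

app.add_processor(load_storm_with_retry)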

caching issues in MySQL response with MySQLdb in Django

I use MySQL with the MySQLdb module in Python, in Django.
I'm running in autocommit mode in this case (and Django's transaction.is_managed() actually returns False).
I have several processes interacting with the database.
One process fetches all Task models with Task.objects.all()
Then another process adds a Task model (I can see it in a database management application).
If I call Task.objects.all() again in the first process, I don't see the new Task. But if I call connection._commit() and then Task.objects.all(), I see the new Task.
My question is: is there any caching involved at the connection level? And is this normal behaviour (it does not seem normal to me)?
This certainly seems autocommit/table-locking related.
If MySQLdb implements the DB-API 2 spec, it will probably run each connection as one single continuous transaction. When you say "running in autocommit mode": do you mean MySQL itself, the MySQLdb module, or Django?
Not committing intermittently perfectly explains the behaviour you are getting:
i) a connection implemented as one single transaction in MySQLdb (by default, probably);
ii) not opening/closing connections only when needed but (re)using one (or more) persistent database connections (my guess, could be Django-architecture-inherited);
iii) your selects ("reads") cause a simple read lock on a table, which means other connections can still read this table, but connections wanting to write data can't do so (immediately), because this lock prevents them from getting the exclusive lock needed for writing on this table. The writing is thus postponed indefinitely (until they can get a (short) exclusive lock on the table for writing - when you close the connection or manually commit).
I'd do the following in your case:
find out which table locks are held on your database during the scenario above
read about Django and transactions here. A quick skim suggests that using standard Django functionality implicitly causes commits; handcrafted SQL (INSERT, UPDATE, ...) maybe won't.
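To see the transaction boundary at the driver level (outside Django), a small sketch with MySQLdb, assuming an InnoDB table and the driver's default of autocommit off:

import MySQLdb

conn = MySQLdb.connect(db="mydb", user="me", passwd="secret")
cur = conn.cursor()

cur.execute("SELECT COUNT(*) FROM myapp_task")
print(cur.fetchone())   # a transaction (and its consistent read view) starts here

# ... another process INSERTs a Task row and commits ...

cur.execute("SELECT COUNT(*) FROM myapp_task")
print(cur.fetchone())   # still the old count: same open transaction

conn.commit()           # or conn.rollback(): ends the transaction
cur.execute("SELECT COUNT(*) FROM myapp_task")
print(cur.fetchone())   # fresh transaction, the new row is now visible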
