The right way to use postgresql with concurrent processes - python

I am accessing a PostgreSQL table from Python with psycopg2, from several processes. I've been using the SERIALIZABLE transaction isolation level to maintain the integrity of the data: if a TransactionRollback exception is raised while updating or inserting, I retry until the process gets through. Even so, I am seeing many errors of the form current transaction is aborted, commands ignored until end of transaction block. More than half the data is successfully written to the database; the rest fails with the above error (which occurs in all of the processes attempting to write).
Am I approaching PostgreSQL concurrency / transaction isolation with Python and psycopg2 the correct way? Put another way: is it acceptable to use PostgreSQL's SERIALIZABLE transaction isolation when accessing the table from multiple separate processes concurrently?

At a guess, you are trapping a connection exception but not then issuing a ROLLBACK or conn.rollback() on the underlying PostgreSQL connection. So the connection still has an open aborted transaction.
The key thing to understand is that catching a psycopg2 exception does not roll back the underlying transaction. PostgreSQL marks the transaction as aborted, and the connection can't process new work until you issue a ROLLBACK (or call conn.rollback()).
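That retry-with-rollback loop can be sketched as follows. This is a minimal, self-contained sketch: SerializationFailure and FakeConn are stand-ins for psycopg2.errors.SerializationFailure and a real psycopg2 connection, so the shape of the loop is shown without needing a server.

```python
class SerializationFailure(Exception):
    """Stand-in for psycopg2.errors.SerializationFailure."""

def run_serializable(conn, work, max_attempts=5):
    """Run work(conn) inside a transaction, retrying on serialization failures.

    The crucial step is conn.rollback() in the except branch: catching the
    exception in Python does NOT clear the aborted transaction on the server,
    so without it every later statement fails with "current transaction is
    aborted, commands ignored until end of transaction block".
    """
    for _ in range(max_attempts):
        try:
            result = work(conn)
            conn.commit()
            return result
        except SerializationFailure:
            conn.rollback()  # clear the aborted transaction, then retry
    raise RuntimeError("transaction failed after %d attempts" % max_attempts)

class FakeConn:
    """Records commit/rollback calls in place of a real psycopg2 connection."""
    def __init__(self):
        self.calls = []
    def commit(self):
        self.calls.append("commit")
    def rollback(self):
        self.calls.append("rollback")

attempts = []
def flaky(conn):
    attempts.append(1)
    if len(attempts) < 3:  # fail twice, then succeed
        raise SerializationFailure()
    return "done"

conn = FakeConn()
result = run_serializable(conn, flaky)
# result == "done"; conn.calls == ["rollback", "rollback", "commit"]
```

With a real connection you would set conn.isolation_level to SERIALIZABLE once, and catch the psycopg2 error class instead of the stand-in; the loop itself is unchanged.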

"Database is Locked" error while deploying django webapp on azure

I am trying to deploy a Django webapp on Azure. It appears on the web fully functional, except for one thing: it doesn't let me create a superuser over webssh. Every time I run python manage.py createsuperuser, after I enter all the credentials it throws an error:
django.db.utils.OperationalError: database is locked
I am using default database of django. What could be the reason for this?
SQLite is meant to be a lightweight database, and thus can't support a high level of concurrency. OperationalError: database is locked errors indicate that your application is experiencing more concurrency than SQLite can handle in its default configuration. This error means that one thread or process has an exclusive lock on the database connection and another thread timed out waiting for the lock to be released.
Python's SQLite wrapper has a default timeout value that determines how long the second thread is allowed to wait on the lock before it times out and raises the OperationalError: database is locked error.
If you're getting this error, you can solve it by:
Switching to another database backend. At a certain point SQLite becomes too "lite" for real-world applications, and these sorts of concurrency errors indicate you've reached that point.
Rewriting your code to reduce concurrency and ensure that database transactions are short-lived.
Increasing the default timeout value by setting the timeout database option.
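For the third option, the timeout is the timeout argument of sqlite3.connect; in a Django settings file the same value can be passed through the OPTIONS dict of the database settings. A small self-contained sketch using a throwaway temp file (the Django snippet in the comment is the assumed equivalent):

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "app.db")

# timeout=20 makes a blocked writer wait up to 20 seconds for the
# competing lock to be released before raising "database is locked".
# Django settings equivalent (assumed):
#   DATABASES["default"]["OPTIONS"] = {"timeout": 20}
conn = sqlite3.connect(path, timeout=20)
conn.execute("CREATE TABLE note (body TEXT)")
conn.execute("INSERT INTO note VALUES ('hello')")
conn.commit()
rows = conn.execute("SELECT body FROM note").fetchall()
# rows == [('hello',)]
```

A longer timeout only masks contention; if you hit it regularly, the first two options (another backend, or shorter transactions) are the real fix.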

SQLAlchemy long running script: User was holding a relation lock for too long

I have an SQLAlchemy session in a script. The script runs for a long time, and it only ever fetches data from the database, never updates or inserts.
I get quite a lot of errors like
sqlalchemy.exc.DBAPIError: (TransactionRollbackError) terminating connection due to conflict with recovery
DETAIL: User was holding a relation lock for too long.
The way I understand it, SQLAlchemy creates a transaction with the first select issued, and then reuses it. As my script may run for about an hour, it is very likely that a conflict comes up during the lifetime of that transaction.
To get rid of the error, I could use the deprecated autocommit mode (without doing anything more), but this is explicitly discouraged by the documentation.
What is the right way to deal with the error? Can I use ORM queries without transactions at all?
I ended up closing the session after (almost) every select, like
session.query(Foo).all()
session.close()
Since I do not use autocommit, a new transaction is opened automatically on the next query.
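That close-after-each-query pattern is easy to centralize in a context manager, so the close() can't be forgotten when a query raises. A minimal sketch; StubSession is a hypothetical stand-in for a real SQLAlchemy Session so the example is self-contained:

```python
from contextlib import contextmanager

@contextmanager
def short_session(session_factory):
    """Yield a session and guarantee close() even if a query raises.

    With a real SQLAlchemy sessionmaker as session_factory, close()
    ends the open transaction, so the next use starts a fresh one
    instead of holding locks for the script's whole lifetime.
    """
    session = session_factory()
    try:
        yield session
    finally:
        session.close()

class StubSession:
    """Hypothetical stand-in for a SQLAlchemy Session; records close()."""
    def __init__(self):
        self.closed = False
    def close(self):
        self.closed = True

s = StubSession()
with short_session(lambda: s):
    pass  # session.query(Foo).all() would go here
# s.closed is True
```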

Psycopg / Postgres : Connections hang out randomly

I'm using psycopg2 for the cherrypy app I'm currently working on, and the CLI & phpgadmin to handle some operations manually. Here's the Python code:
# One connection per thread
cherrypy.thread_data.pgconn = psycopg2.connect("...")
...
# Later, an object is created by a thread:
class dbobj(object):
    def __init__(self):
        self.connection = cherrypy.thread_data.pgconn
        self.curs = self.connection.cursor(cursor_factory=psycopg2.extras.DictCursor)
    ...
    # Then,
    try:
        blabla
        self.curs.execute(...)
        self.connection.commit()
    except:
        self.connection.rollback()
        lalala
    ...
    # Finally, the destructor is called:
    def __del__(self):
        self.curs.close()
I'm having a problem with either psycopg or postgres (although I think the latter is more likely). After having sent a few queries, my connections drop dead. Similarly, phpgadmin usually gets dropped as well; it prompts me to reconnect after I have made requests several times. Only the CLI remains persistent.
The problem is, these drops happen very randomly and I can't even track down the cause. I can either get locked out after a few page requests or never encounter anything after having requested hundreds of pages. The only error I've found in the postgres log, after terminating the app, is:
...
LOG: unexpected EOF on client connection
LOG: could not send data to client: Broken pipe
LOG: unexpected EOF on client connection
...
I thought of creating a new connection every time a new dbobj instance is created but I absolutely don't want to do this.
Also, I've read that one may run into similar problems unless all transactions are committed: I use the try/except block for every single INSERT/UPDATE query, but I never use it for SELECT queries, nor do I want to write even more boilerplate code (by the way, do SELECTs need to be committed?). Even if that's the case, why would phpgadmin close down?
max_connections is set to 100 in the .conf file, so I don't think that's the reason either. A single cherrypy worker has only 10 threads.
Does anyone have an idea where I should look first?
Psycopg2 needs a commit or rollback after every transaction, including SELECT queries, or it leaves the connections "IDLE IN TRANSACTION". This is now a warning in the docs:
Warning: By default, any query execution, including a simple SELECT will start a transaction: for long-running programs, if no further action is taken, the session will remain “idle in transaction”, an undesirable condition for several reasons (locks are held by the session, tables bloat...). For long lived scripts, either ensure to terminate a transaction as soon as possible or use an autocommit connection.
It's a bit difficult to see exactly where you're populating and accessing cherrypy.thread_data. I'd recommend investigating psycopg2.pool.ThreadedConnectionPool instead of trying to bind one conn to each thread yourself.
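The commit-or-rollback-after-every-transaction rule can also be wrapped in a context manager so SELECTs are covered too, instead of repeating try/except blocks around every query. A minimal sketch; StubConn is a stand-in for a real psycopg2 connection so the example runs on its own:

```python
from contextlib import contextmanager

@contextmanager
def transaction(conn):
    """Commit on success, roll back on error -- around SELECTs too,
    so the session never lingers 'idle in transaction'."""
    try:
        yield conn.cursor()
        conn.commit()
    except Exception:
        conn.rollback()
        raise

class StubConn:
    """Stand-in for a psycopg2 connection; records commit/rollback."""
    def __init__(self):
        self.log = []
    def cursor(self):
        return None  # a real connection returns a cursor object here
    def commit(self):
        self.log.append("commit")
    def rollback(self):
        self.log.append("rollback")

ok = StubConn()
with transaction(ok) as cur:
    pass  # cur.execute("SELECT ...") in real code

failed = StubConn()
try:
    with transaction(failed):
        raise ValueError("boom")
except ValueError:
    pass
# ok.log == ["commit"]; failed.log == ["rollback"]
```

With a pooled setup, the same wrapper would take the connection from getconn() and return it with putconn() in a finally block.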
Even though I don't have any idea why successful SELECT queries would block the connection, adding .commit() after pretty much every single query that doesn't have to work in conjunction with another solved the problem.

Python + SQLAlchemy problem: The transaction is inactive due to a rollback in a subtransaction

I have a problem with Python + SQLAlchemy.
When something goes wrong (in my case it is an integrity error, due to a race condition) and the database error is raised, all following requests result in the error being raised:
InvalidRequestError: The transaction is inactive due to a rollback in a subtransaction. Issue rollback() to cancel the transaction.
While I can prevent the original error (the race condition) from happening, I would like a more robust solution: I want to prevent a single error from crashing the entire application.
What is the best way to do this? Is there a way to tell Python to rollback the failed transaction?
The easiest thing is to make sure you are using a new SQLAlchemy Session when you start work in your controller. In /project/lib/base.py, add a method to BaseController:
def __before__(self):
    model.Session.close()
Session.close() will clear out the session and close any open transactions if there are any. You want to make sure that each time you use a session it's cleared when you're done with your work in the controller. Doing it at the start of the controller's handling of the request will make sure that it's always cleared, even if the thread's previous request had an exception and there is a rollback waiting.
Do you use yoursapp.lib.base.BaseController in your controllers?
You can look at:
Handle mysql restart in SQLAlchemy
You can also catch the SA exception in the BaseController in a try/finally block and do a session rollback().
In BaseController the SA Session is removed; see http://www.sqlalchemy.org/docs/05/session.html#lifespan-of-a-contextual-session

caching issues in MySQL response with MySQLdb in Django

I use MySQL with MySQLdb module in Python, in Django.
I'm running in autocommit mode in this case (and Django's transaction.is_managed() actually returns False).
I have several processes interacting with the database.
One process fetches all Task models with Task.objects.all()
Then another process adds a Task model (I can see it in a database management application).
If I call Task.objects.all() on the first process, I don't see anything. But if I call connection._commit() and then Task.objects.all(), I see the new Task.
My question is: Is there any caching involved at connection level? And is it a normal behaviour (it does not seems to me)?
This certainly seems autocommit/table-locking related.
If mysqldb implements the DB-API 2 spec, it will probably have a connection running as one single continuous transaction. When you say 'running in autocommit mode': do you mean MySQL itself, the mysqldb module, or Django?
Not committing intermittently perfectly explains the behaviour you are getting:
i) a connection implemented as one single transaction in mysqldb (by default, probably);
ii) not opening/closing connections only when needed, but (re)using one (or more) persistent database connections (my guess; could be Django-architecture-inherited);
iii) your selects ('reads') cause a 'simple read lock' on a table, which means other connections can still 'read' this table, but connections wanting to 'write data' can't (immediately), because this lock prevents them from getting the 'exclusive lock' needed for writing on this table. The writing is thus postponed indefinitely (until it can get a (short) exclusive lock on the table for writing, that is, when you close the connection or manually commit).
I'd do the following in your case:
find out which table locks are held on your database during the scenario above
read about Django and transactions here. A quick skim suggests that using standard Django functionality implicitly causes commits; this means that sending handcrafted SQL (insert, update, ...) may not.
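The stale-reads-until-commit symptom from the question can be reproduced with the standard library's sqlite3 as a stand-in (not MySQLdb itself): in WAL mode, a connection holding an open read transaction keeps seeing its original snapshot until it commits, exactly like the first process above that only sees the new Task after connection._commit().

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.db")

# isolation_level=None: the sqlite3 module issues no implicit BEGIN,
# so transactions are controlled explicitly below.
reader = sqlite3.connect(path, isolation_level=None)
writer = sqlite3.connect(path, isolation_level=None)
reader.execute("PRAGMA journal_mode=WAL")  # WAL lets a writer commit while a reader holds a snapshot
reader.execute("CREATE TABLE task (id INTEGER PRIMARY KEY)")

reader.execute("BEGIN")  # open a read transaction ...
stale = reader.execute("SELECT COUNT(*) FROM task").fetchone()[0]  # ... and pin its snapshot

writer.execute("INSERT INTO task DEFAULT VALUES")  # 'another process' adds a row

# Still inside the old transaction: the new row is invisible.
still_stale = reader.execute("SELECT COUNT(*) FROM task").fetchone()[0]

reader.execute("COMMIT")  # ending the transaction releases the snapshot
fresh = reader.execute("SELECT COUNT(*) FROM task").fetchone()[0]
# stale == 0, still_stale == 0, fresh == 1
```

The fix is the same in both worlds: end the read transaction (commit or close) before re-querying, or run the connection in genuine autocommit mode.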
