python sqlalchemy + postgresql program freezes - python

I've ran into a strange situation. I'm writing some test cases for my program. The program is written to work on sqllite or postgresqul depending on preferences. Now I'm writing my test code using unittest. Very basically what I'm doing:
def setUp(self):
"""
Reset the database before each test.
"""
if os.path.exists(root_storage):
shutil.rmtree(root_storage)
reset_database()
initialize_startup()
self.project_service = ProjectService()
self.structure_helper = FilesHelper()
user = model.User("test_user", "test_pass", "test_mail#tvb.org",
True, "user")
self.test_user = dao.store_entity(user)
In the setUp I remove any folders that exist(created by some tests) then I reset my database (drop tables cascade basically) then I initialize the database again and create some services that will be used for testing.
def tearDown(self):
"""
Remove project folders and clean up database.
"""
created_projects = dao.get_projects_for_user(self.test_user.id)
for project in created_projects:
self.structure_helper.remove_project_structure(project.name)
reset_database()
Tear down does the same thing except creating the services, because this test module is part of the same suite with other modules and I don't want things to be left behind by some tests.
Now all my tests run fine with sqllite. With postgresql I'm running into a very weird situation: at some point in the execution, which actually differs from run to run by a small margin (ex one or two extra calls) the program just halts. I mean no error is generated, no exception thrown, the program just stops.
Now only thing I can think of is that somehow I forget a connection opened somewhere and after I while it timesout and something happens. But I have A LOT of connections so before I start going trough all that code, I would appreciate some suggestions/ opinions.
What could cause this kind of behaviour? Where to start looking?
Regards,
Bogdan

PostgreSQL based applications freeze because PG locks tables fairly aggressively, in particular it will not allow a DROP command to continue if any connections are open in a pending transaction, which have accessed that table in any way (SELECT included).
If you're on a unix system, the command "ps -ef | grep 'post'" will show you all the Postgresql processes and you'll see the status of current commands, including your hung "DROP TABLE" or whatever it is that's freezing. You can also see it if you select from the pg_stat_activity view.
So the key is to ensure that no pending transactions remain - this means at a DBAPI level that any result cursors are closed, and any connection that is currently open has rollback() called on it, or is otherwise explicitly closed. In SQLAlchemy, this means any result sets (i.e. ResultProxy) with pending rows are fully exhausted and any Connection objects have been close()d, which returns them to the pool and calls rollback() on the underlying DBAPI connection. you'd want to make sure there is some kind of unconditional teardown code which makes sure this happens before any DROP TABLE type of command is emitted.
As far as "I have A LOT of connections", you should get that under control. When the SQLA test suite runs through its 3000 something tests, we make sure we're absolutely in control of connections and typically only one connection is opened at a time (still, running on Pypy has some behaviors that still cause hangs with PG..its tough). There's a pool class called AssertionPool you can use for this which ensures only one connection is ever checked out at a time else an informative error is raised (shows where it was checked out).

One solution I found to this problem was to call db.session.close() before any attempt to call db.drop_all(). This will close the connection before dropping the tables, preventing Postgres from locking the tables.
See a much more in-depth discussion of the problem here.

Related

Log Stacktrace of current Python Interpreter via PostgreSQL trigger

I am trying to find a bug which happens from time to time on our production server, but could not be reproduced otherwise: some value in the DB gets changed in a way which I don't want it to.
I could write a PostgreSQL trigger which fires if this bug happens, and raise an exception from said trigger. I would see the Python traceback which executes the unwanted SQL statement.
But in this case I don't want to stop the processing of the request.
Is there a way to log the Python/Django traceback from within a PostgreSQL trigger?
I know that this is not trival since the DB code runs under a different linux process with a different user id.
I am using Python, Django, PostgreSQL, Linux.
I guess this is not easy since the DB trigger runs in a different context than the python interpreter.
Please ask if you need further information.
Update
One solution might be to overwrite connection.notices of psycopg2.
Is there a way to log the Python/Django traceback from within a PostgreSQL trigger?
No, there is not
The (SQL) query is executed on the DBMS-server, and so is the code inside the trigger
The Python code is executed on the client which is a different process, possibly executed by a different user, and maybe even on a different machine.
The only connection between the server (which detects the condition) and the client (which needs to perform the stackdump) is the connected socket. You could try to extend the server's reply (if there is one) by some status code, which is used by the client to stackddump itself. This will only work if the trigger is part of the current transaction, not of some unrelated process.
The other way is: massive logging. Make the DBMS write every submitted SQL to its logfile. This can cause huge amounts of log entries, which you have to inspect.
Given this setup
(django/python) -[SQL connection]-> (PostgreSQL server)
your intuition that
I guess this is not easy since the DB trigger runs in a different context than the python interpreter.
is correct. At least, we won't be able to do this exactly the way you want it; not without much acrobatics.
However, there are options, each with drawbacks:
If you are using django with SQLAlchemy, you can register event listeners (either ORM events or Core Events) that detect this bad SQL statement you are hunting, and log a traceback.
Write a wrapper around your SQL driver, check for the bad SQL statement you are hunting, and log the traceback every time it's detected.
Give every SQL transaction, or every django request, an ID (could just be some UUID in werkzeug's request-bound storage manager). From here, we gain more options:
Configure the logger to log this request ID everywhere, and log all SQL statements in SQLAlchemy. This lets you correlate Django requests, and specific function invocations, with SQL statements. You can do this with echo= in SQLAlchemy.
Include this request ID in every SQL statement (extra column?), then log this ID in the PostgreSQL trigger with RAISE NOTICE. This lets you correlate client-side activity in django against server-side activity in PostgreSQL.
In the spirit of "Test in Production" espoused by Charity Majors, send every request to a sandbox copy of your Django app that reads/writes a sandboxed copy of your production database. In the sandbox database, raise the exception and log your traceback.
You can take this idea further and create smaller "async" setups. For example, you can, for each request, trigger a async duplicate (say, with celery) of the same request that hits a DB configured with your PostgreSQL trigger to fail and log the traceback.
Use RAISE EXCEPTION in the PostgreSQL trigger to rollback the current transaction. In Python, catch that specific exception, log it, then repeat the transaction, changing the data slightly (extra column?) to indicate that this is a retry and the trigger should not fail.
Is there a reason you can't SELECT all row values into Python, then do the detection in Python entirely?
So if you're able to detect the condition after the queries execute, then you can log the condition and/or throw an exception.
Then what you need is tooling like Sentry or New Relic.
You could use LISTEN+NOTIFY.
First let some daemon thread LISTEN and in the db trigger you can execute a NOTIFY.
The daemon thread receives the notify event and can dump the stacktrace of the main thread.
If you use psycopg2, you can use this
# Overwriting connetion.notices via Django
class MyAppConfig(AppConfig):
def ready(self):
connection_created.connect(connection_created_check_for_notice_in_connection)
class ConnectionNoticeList(object):
def append(self, message):
if not 'some_magic_of_db_trigger' in message:
return
logger.warn('%s %s' % (message, ''.join(traceback.format_stack())))
def connection_created_check_for_notice_in_connection(sender, connection, **kwargs):
connection.connection.notices=ConnectionNoticeList()

How to diagnose extra SQLAlchemy connections in Pyramid

When my app runs, I'm very frequently getting issues around the connection pooling (one is "QueuePool limit of size 5 overflow 10 reached", another is "FATAL: remaining connection slots are reserved for non-replication superuser connections").
I have a feeling that it's due to some code not closing connections properly, or other code greedily trying to open new ones when it shouldn't, but I'm using the default SQL Alchemy settings so I assume the pool connection defaults shouldn't be unreasonable. We are using the scoped_session(sessionmaker()) way of creating the session so multiple threads are supported.
So my main question is if there is a tool or way to find out where the connections are going? Short of being able to see as soon as a new one is created (that is not supposed to be created), are there any obvious anti-patterns that might result in this effect?
Pyramid is very un-opinionated and with DB connections, there seem to be two main approaches (equally supported by Pyramid it would seem). In our case, the code base when I started the job used one approach (I'll call it the "globals" approach) and we've agreed to switch to another approach that relies less on globals and more on Pythonic idioms.
About our architecture: the application comprises one repo which houses the Pyramid project and then sources a number of other git modules, each of which had their own connection setup. The "globals" way connects to the database in a very non-ORM fashion, eg.:
(in each repo's __init__ file)
def load_database:
global tables
tables['table_name'] = Table(
'table_name', metadata,
Column('column_name', String),
)
There are related globals that are frequently peppered all over the code:
def function_needing_data(field_value):
global db, tables
select = sqlalchemy.sql.select(
[tables['table_name'].c.data], tables['table_name'].c.name == field_value)
return db.execute(select)
This tables variable is latched onto within each git repo which adds some more tables definitions and somehow the global tables manages to work, providing access to all of the tables.
The approach that we've moved to (although at this time, there are parts of both approaches still in the code) is via a centralised connection, binding all of the metadata to it and then querying the db in an ORM approach:
(model)
class ModelName(MetaDataBase):
__tablename__ = "models_table_name"
... (field values)
(function requiring data)
from models.db import DBSession
from models.model_name import ModelName
def function_needing_data(field_value):
return DBSession.query(ModelName).filter(
ModelName.field_value == field_value).all()
We've largely moved the code over to the latter approach which feels right, but perhaps I'm mistaken in my intentions. I don't know if there is anything inherently good or bad in either approach but could this (one of the approaches) be part of the problem so we keep running out of connections? Is there a telltale sign that I should look out for?
It appears that Pyramid functions best (in terms of handling the connection pool) when you use the Pyramid transaction manager (pyramid_tm). This excellent article by Jon Rosebaugh provides some helpful insight into both how Pyramid apps typically set up their database connections and how they should set them up.
In my case, it was necessary to include the pyramid_tm package and then remove a few occurrences where we were manually committing session changes since pyramid_tm will automatically commit changes if it doesn't see a reason not to.
[Update]
I continued to have connection pooling issues although much fewer of them. After a lot of debugging, I found that the pyramid transaction manager (if you're using it correctly) should not be the issue at all. The issue to the other connection pooling issues I had had to do with scripts that ran via cron jobs. A script will release it's connections when it's finished, but bad code design may result in situations where the same script can be opened up and starts running while the previous one is running (causing them both to run slower, slow enough to have both running while a third instance of the script starts and so on).
This is a more language- and database-agnostic error since it stems from poor job-scripting design but it's worth keeping in mind. In my case, the script had an "&" at the end so that each instance started as a background process, waited 10 seconds, then spawned another, rather than making sure the first job started AND completed, then waited 10 seconds, then started another.
Hope this helps when debugging this very frustrating and thorny issue.

SQLAlchemy long running script: User was holding a relation lock for too long

I have an SQLAlchemy session in a script. The script is running for a long time, and it only fetches data from database, never updates or inserts.
I get quite a lot of errors like
sqlalchemy.exc.DBAPIError: (TransactionRollbackError) terminating connection due to conflict with recovery
DETAIL: User was holding a relation lock for too long.
The way I understand it, SQLAlchemy creates a transaction with the first select issued, and then reuses it. As my script may run for about an hour, it is very likely that a conflict comes up during the lifetime of that transaction.
To get rid of the error, I could use autocommit in te deprecated mode (without doing anything more), but this is explicitly discouraged by the documentation.
What is the right way to deal with the error? Can I use ORM queries without transactions at all?
I ended up closing the session after (almost) every select, like
session.query(Foo).all()
session.close()
since I do not use autocommit, a new transaction is automatically opened.

Psycopg / Postgres : Connections hang out randomly

I'm using psycopg2 for the cherrypy app I'm currently working on and cli & phpgadmin to handle some operations manually. Here's the python code :
#One connection per thread
cherrypy.thread_data.pgconn = psycopg2.connect("...")
...
#Later, an object is created by a thread :
class dbobj(object):
def __init__(self):
self.connection=cherrypy.thread_data.pgconn
self.curs=self.connection.cursor(cursor_factory=psycopg2.extras.DictCursor)
...
#Then,
try:
blabla
self.curs.execute(...)
self.connection.commit()
except:
self.connection.rollback()
lalala
...
#Finally, the destructor is called :
def __del__(self):
self.curs.close()
I'm having a problem with either psycopg or postgres (altough I think the latter is more likely). After having sent a few queries, my connections drop dead. Similarly, phpgadmin -usually- gets dropped as well ; it prompts me to reconnect after having made requests several times. Only the CLI remains persistent.
The problem is, these happen very randomly and I can't even track down what the cause is. I can either get locked down after a few page requests or never really encounter anything after having requested hundreds of pages. The only error I've found in postgres log, after terminating the app is :
...
LOG: unexpected EOF on client connection
LOG: could not send data to client: Broken pipe
LOG: unexpected EOF on client connection
...
I thought of creating a new connection every time a new dbobj instance is created but I absolutely don't want to do this.
Also, I've read that one may run into similar problems unless all transactions are committed : I use the try/except block for every single INSERT/UPDATE query, but I never use it for SELECT queries nor do I want to write even more boilerplate code (btw, do they need to be committed ?). Even if that's the case, why would phpgadmin close down ?
max_connections is set to 100 in the .conf file, so I don't think that's the reason either. A single cherrypy worker has only 10 threads.
Does anyone have an idea where I should look first ?
Psycopg2 needs a commit or rollback after every transaction, including SELECT queries, or it leaves the connections "IDLE IN TRANSACTION". This is now a warning in the docs:
Warning: By default, any query execution, including a simple SELECT will start a transaction: for long-running programs, if no further action is taken, the session will remain “idle in transaction”, an undesirable condition for several reasons (locks are held by the session, tables bloat...). For long lived scripts, either ensure to terminate a transaction as soon as possible or use an autocommit connection.
It's a bit difficult to see exactly where you're populating and accessing cherrypy.thread_data. I'd recommend investigating psycopg2.pool.ThreadedConnectionPool instead of trying to bind one conn to each thread yourself.
Even though I don't have any idea why successful SELECT queries block the connection, spilling .commit() after pretty much every single query that doesn't have to work in conjunction with another solved the problem.

caching issues in MySQL response with MySQLdb in Django

I use MySQL with MySQLdb module in Python, in Django.
I'm running in autocommit mode in this case (and Django's transaction.is_managed() actually returns False).
I have several processes interacting with the database.
One process fetches all Task models with Task.objects.all()
Then another process adds a Task model (I can see it in a database management application).
If I call Task.objects.all() on the first process, I don't see anything. But if I call connection._commit() and then Task.objects.all(), I see the new Task.
My question is: Is there any caching involved at connection level? And is it a normal behaviour (it does not seems to me)?
This certainly seems autocommit/table locking - related.
If mysqldb implements the dbapi2 spec it will probably have a connection running as one single continuous transaction. When you say: 'running in autocommit mode': do you mean MySQL itself or the mysqldb module? Or Django?
Not intermittently commiting perfectly explains the behaviour you are getting:
i) a connection implemented as one single transaction in mysqldb (by default, probably)
ii) not opening/closing connections only when needed but (re)using one (or more) persistent database connections (my guess, could be Django-architecture-inherited).
ii) your selects ('reads') cause a 'simple read lock' on a table (which means other connections can still 'read' this table but connections wanting to 'write data' can't (immediately) because this lock prevents them from getting an 'exclusive lock' (needed 'for writing') on this table. The writing is thus postponed indefinitely (until it can get a (short) exclusive lock on the table for writing - when you close the connection or manually commit).
I'd do the following in your case:
find out which table locks are on your database during the scenario above
read about Django and transactions here. A quick skim suggests using standard Django functionality implicitely causes commits. This means sending handcrafted SQL maybe won't (insert, update...).

Categories

Resources