After enabling Write-Ahead Log mode for a volatile SQLite3 database (PRAGMA journal_mode = WAL), my concurrency tests began raising this error. I discovered that this happens when the Python process is forked and a connection is left open to a database in WAL mode. Any subsequent execute() on that database, even with a new connection, throws this 'locking protocol' exception.
Disabling WAL mode (PRAGMA journal_mode = DELETE) makes the problem disappear, and no 'database is locked' error occurs either. The 'locking protocol' exception seems to reflect the underlying SQLITE_PROTOCOL result code, which is documented as:
The SQLITE_PROTOCOL result code indicates a problem with the file locking protocol used by SQLite.
I'm using Python 2.7.10 on Mac OS X 10.12.6 Sierra. I think the problem is in Python's sqlite3 module and how it deals with being forked, rather than an issue in SQLite3 itself. I know now how to work around the issue but as per the main question, what is the root cause of this issue?
P.S. - I'm not using any threads and am forking by spawning a daemon child.
SQLite3 is thread-safe as per the FAQ but, as CL pointed out in the comments to my question, the same FAQ also has a line relating to forking:
Under Unix, you should not carry an open SQLite database across a fork() system call into the child process.
This doesn't exactly explain the cause, but it does point to a solution: close ALL SQLite connections before (or immediately after) calling fork()! Holding onto a connection that was carried across the fork prevents new connections from working in any process.
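A minimal sketch of that rule, with a hypothetical run_child() standing in for whatever the daemon child actually does (the database file name is also a placeholder):

import os
import sqlite3

def run_child(db_path):
    # the child opens its own connection after the fork
    conn = sqlite3.connect(db_path)
    try:
        conn.execute("SELECT 1")
    finally:
        conn.close()

db_path = 'app.db'      # hypothetical database file
conn = sqlite3.connect(db_path)
conn.execute("PRAGMA journal_mode = WAL")
# ... use conn in the parent ...
conn.close()            # close BEFORE forking so no WAL locks are carried across fork()

pid = os.fork()
if pid == 0:
    # child: never reuse the parent's connection object
    run_child(db_path)
    os._exit(0)
else:
    # parent: reopen its own connection after the fork
    conn = sqlite3.connect(db_path)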
Problem
I am working on a long-running python process that performs a lot of database access (mostly reads, occasional writes). Sometimes it may be necessary to terminate the process before it finishes (e.g. by using the kill command) and when this happens I would like to log a value to the database indicating that the particular run was canceled. (I am also logging the occurrence to a log file; I would like to have the information in both places.)
I have found that if I interrupt the process while the database connection is active, the connection becomes unusable; specifically, it hangs the process if I try to use it in any way.
Minimum working example
The actual application is rather large and complex, but this snippet reproduces the problem reliably.
The table test in the database has two columns, id (serial) and message (text). I prepopulated it with one row so the UPDATE statement below would have something to change.
import psycopg2
import sys
import signal

pg_host = 'localhost'
pg_user = 'redacted'
pg_password = 'redacted'
pg_database = 'test_db'

def write_message(msg):
    print "Writing: " + msg
    cur.execute("UPDATE test SET message = %s WHERE id = 1", (msg,))
    conn.commit()

def signal_handler(signal, frame):
    write_message('Interrupted!')
    sys.exit(0)

signal.signal(signal.SIGINT, signal_handler)
signal.signal(signal.SIGTERM, signal_handler)

if __name__ == '__main__':
    conn = psycopg2.connect(host=pg_host, user=pg_user, password=pg_password, database=pg_database)
    cur = conn.cursor()

    write_message("Starting")

    for i in xrange(10000):
        # I press ^C somewhere in here
        cur.execute("SELECT * FROM test")
        cur.fetchall()

    write_message("Finishing")
When I run this script without interruption, it completes as expected. That is, the row in the database is updated to say "Starting" then "Finishing".
If I press ctrl-C during the loop indicated by the comment, python hangs indefinitely. It no longer responds to keyboard input, and the process has to be killed from elsewhere. Looking in my postgresql log, the UPDATE statement with "Interrupted!" is never received by the database server.
If I add a debugging breakpoint at the beginning of signal_handler() I can see that doing almost anything with the database connection at that point causes the same hang. Trying to execute a SELECT, issuing a conn.rollback(), conn.commit(), conn.close() or conn.reset() all cause the hang. Executing conn.cancel() does not cause a hang, but it doesn't improve the situation; subsequent use of the connection still causes a hang. If I remove the database access from write_message() then the script is able to exit gracefully when interrupted, so the hang is definitely database connection related.
Also worth noting: if I change the script so that I am interrupting something other than database activity, it works as desired, logging "Interrupted!" to the database. E.g., if I replace the for i in xrange(10000) loop with a simple sleep(10) and interrupt that, it works fine. So the problem seems to be specifically related to interrupting psycopg2 with a signal while it is performing database access, then trying to use the connection.
Questions
Is there any way to salvage the existing psycopg2 connection and use it to update the database after this kind of interruption?
If not, is there at least a way to terminate it cleanly so if some subsequent code tried to use it, it wouldn't cause a hang?
Finally, is this somehow expected behavior, or is it a bug that should be reported? It makes sense to me that the connection could be in a bad state after this kind of interruption, but ideally it would throw an exception indicating the problem rather than hanging.
Workaround
In the meantime, I have discovered that if I create an entirely new connection with psycopg2.connect() after the interrupt and am careful not to access the old one, I can still update the database from the interrupted process. This is probably what I'll do for now, but it feels untidy.
Environment
OS X 10.11.6
python 2.7.11
psycopg2 2.6.1
postgresql 9.5.1.0
I filed an issue for this on the psycopg2 github and received a helpful response from the developer. In summary:
The behavior of an existing connection within a signal handler is OS dependent and there's probably no way to use the old connection reliably; creating a new one is the recommended solution.
Using psycopg2.extensions.set_wait_callback(psycopg2.extras.wait_select) improves the situation a bit (at least in my environment) by causing execute() statements called from within the signal handler to throw an exception rather than hang. However, doing other things with the connection (e.g. reset()) still caused a hang for me, so ultimately it's still best to just create a new connection within the signal handler rather than trying to salvage the existing one.
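A minimal sketch of that recommendation, reusing the connection parameters from the example above (the set_wait_callback line is the optional improvement mentioned in the second point):

import sys
import signal
import psycopg2
import psycopg2.extensions
import psycopg2.extras

# optional: in my environment this made execute() on the wedged connection
# raise an exception instead of hanging
psycopg2.extensions.set_wait_callback(psycopg2.extras.wait_select)

def signal_handler(signum, frame):
    # do NOT touch the old conn/cur here; open a fresh connection for the final write
    cleanup_conn = psycopg2.connect(host=pg_host, user=pg_user,
                                    password=pg_password, database=pg_database)
    cleanup_cur = cleanup_conn.cursor()
    cleanup_cur.execute("UPDATE test SET message = %s WHERE id = 1", ('Interrupted!',))
    cleanup_conn.commit()
    cleanup_conn.close()
    sys.exit(0)

signal.signal(signal.SIGINT, signal_handler)
signal.signal(signal.SIGTERM, signal_handler)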
On some occasions, my Python program won't respond; there seems to be a deadlock. Since I have no idea where this deadlock happens, I'd like to set a breakpoint or dump the stack of all threads after 10 seconds in order to learn what my program is waiting for.
Use the logging module and put e.g. Logger.debug() calls in strategic places throughout your program. You can disable these messages with a single setting (Logger.setLevel) if you want to, and you can choose whether to write them to e.g. stderr or to a file.
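A minimal sketch of that approach (the logger name, file name, and checkpoint messages are just placeholders):

import logging

logging.basicConfig(filename='app.log', level=logging.DEBUG,
                    format='%(asctime)s %(name)s %(levelname)s %(message)s')
log = logging.getLogger('myapp')

log.debug('about to acquire lock A')   # strategic checkpoints around suspect code
# ... the code that might deadlock ...
log.debug('lock A acquired')

# one setting silences the debug chatter again
log.setLevel(logging.WARNING)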
import pdb
from your_test_module import TestCase
testcase = TestCase()
testcase.setUp()
pdb.runcall(testcase.specific_test)
And then press ctrl-c at your leisure. The KeyboardInterrupt will cause pdb to drop into the debugger prompt.
Well, as it turns out, it was because my database was locked (a connection wasn't closed) and when the tests were tearing down (and the database schema was being erased so that the database is clean for the next tests), psycopg2 just ignored the KeyboardInterrupt exception.
I solved my problem using the faulthandler module (for earlier Python versions there is a PyPI package). faulthandler lets me dump the stack traces of all threads to any file (including sys.stderr) after a period of time, repeatedly, using faulthandler.dump_traceback_later(3, repeat=True). That allowed me to set a breakpoint where my program stopped responding and track the issue down effectively.
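A minimal sketch of that technique (on Python 2 it needs the faulthandler package from PyPI; on Python 3.3+ it is in the standard library; run_suspect_code() is a hypothetical entry point):

import sys
import faulthandler

# every 10 seconds, dump the stack of every thread to stderr, until cancelled
faulthandler.dump_traceback_later(10, repeat=True, file=sys.stderr)

run_suspect_code()   # hypothetical: the code that sometimes deadlocks

faulthandler.cancel_dump_traceback_later()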
I am fairly new to databases and have just figured out how to use MongoDB in python2.7 on Ubuntu 12.04. An application I'm writing uses multiple python modules (imported into a main module) that connect to the database. Basically, each module starts by opening a connection to the DB, a connection which is then used for various operations.
However, when the program exits, the main module is the only one that 'knows' about the exiting, and closes its connection to MongoDB. The other modules do not know this and have no chance of closing their connections. Since I have little experience with databases, I wonder if there are any problems leaving connections open when exiting.
Should I:
1. Leave it like this?
2. Instead open the connection before and close it after each operation?
3. Change my application structure completely?
4. Solve this in a different way?
You can use one pymongo connection across different modules. You can open it in a separate module and import it into the other modules on demand (see the sketch after the list below). After the program finishes, you can close it there. This is the best option.
About other questions:
1. You can leave it like this (all connections will be closed when the script finishes execution), but leaving something unclosed is bad form.
2. You can open/close a connection for each operation, but establishing a connection is a time-expensive operation.
3. That is what I'd advise (see the first paragraph of this answer).
4. I think this point can be merged with 3.
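A minimal sketch of that layout, assuming a reasonably recent pymongo and a hypothetical shared module named db.py (the database and collection names are placeholders):

# db.py -- the one module that owns the MongoDB connection
from pymongo import MongoClient

client = MongoClient('localhost', 27017)
database = client['my_app_db']            # hypothetical database name

def close():
    client.close()

# some_other_module.py -- just imports the shared handle
from db import database

def save_record(doc):
    database.records.insert_one(doc)      # 'records' is a hypothetical collection

# main module, once the program is done
import db
db.close()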
I've run into a strange situation. I'm writing some test cases for my program. The program is written to work on SQLite or PostgreSQL, depending on preferences. Now I'm writing my test code using unittest. Very basically, this is what I'm doing:
def setUp(self):
    """
    Reset the database before each test.
    """
    if os.path.exists(root_storage):
        shutil.rmtree(root_storage)
    reset_database()
    initialize_startup()

    self.project_service = ProjectService()
    self.structure_helper = FilesHelper()
    user = model.User("test_user", "test_pass", "test_mail#tvb.org",
                      True, "user")
    self.test_user = dao.store_entity(user)
In setUp I remove any folders that exist (created by some tests), then I reset my database (basically drop tables cascade), then I initialize the database again and create some services that will be used for testing.
def tearDown(self):
    """
    Remove project folders and clean up database.
    """
    created_projects = dao.get_projects_for_user(self.test_user.id)
    for project in created_projects:
        self.structure_helper.remove_project_structure(project.name)
    reset_database()
tearDown does the same thing, minus creating the services, because this test module is part of the same suite as other modules and I don't want things to be left behind by some tests.
Now all my tests run fine with SQLite. With PostgreSQL I'm running into a very weird situation: at some point in the execution, which actually differs from run to run by a small margin (e.g. one or two extra calls), the program just halts. I mean no error is generated, no exception is thrown, the program just stops.
The only thing I can think of is that somehow I leave a connection open somewhere, and after a while it times out and something happens. But I have A LOT of connections, so before I start going through all that code, I would appreciate some suggestions/opinions.
What could cause this kind of behaviour? Where to start looking?
Regards,
Bogdan
PostgreSQL-based applications freeze because PG locks tables fairly aggressively; in particular, it will not allow a DROP command to continue if any connection holds a pending transaction that has accessed that table in any way (SELECT included).
If you're on a Unix system, the command "ps -ef | grep 'post'" will show you all the PostgreSQL processes, and you'll see the status of current commands, including your hung "DROP TABLE" or whatever it is that's freezing. You can also see it by selecting from the pg_stat_activity view.
So the key is to ensure that no pending transactions remain - this means at a DBAPI level that any result cursors are closed, and any connection that is currently open has rollback() called on it, or is otherwise explicitly closed. In SQLAlchemy, this means any result sets (i.e. ResultProxy) with pending rows are fully exhausted and any Connection objects have been close()d, which returns them to the pool and calls rollback() on the underlying DBAPI connection. You'd want to make sure there is some kind of unconditional teardown code which makes sure this happens before any DROP TABLE type of command is emitted.
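A minimal sketch of such an unconditional teardown, assuming a test class that keeps its ResultProxy and Connection on self, plus the usual metadata and engine objects (all of these names are assumptions, not part of the question's code):

def tearDown(self):
    # close any pending result set so no cursor is left holding a lock
    if self.result is not None:
        self.result.close()
    # close() returns the connection to the pool and rolls back the
    # underlying DBAPI connection, so no pending transaction survives
    self.connection.close()
    # only now is it safe to emit DROP TABLE statements
    metadata.drop_all(bind=engine)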
As far as "I have A LOT of connections", you should get that under control. When the SQLA test suite runs through its 3000 something tests, we make sure we're absolutely in control of connections and typically only one connection is opened at a time (still, running on Pypy has some behaviors that still cause hangs with PG..its tough). There's a pool class called AssertionPool you can use for this which ensures only one connection is ever checked out at a time else an informative error is raised (shows where it was checked out).
One solution I found to this problem was to call db.session.close() before any attempt to call db.drop_all(). This will close the connection before dropping the tables, preventing Postgres from locking the tables.
See a much more in-depth discussion of the problem here.
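For example, in a tearDown using a Flask-SQLAlchemy style db object (assumed here, not taken from the question):

def tearDown(self):
    db.session.close()   # end our own pending transaction and release the connection
    db.drop_all()        # the DROP TABLE statements are no longer blocked by it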
Basically, i have a list of 30,000 URLs.
The script goes through the URLs and downloads them (with a 3 second delay in between).
And then it stores the HTML in a database.
And it loops and loops...
Why does it randomly get "Killed."? I didn't touch anything.
Edit: this happens on 3 of my linux machines.
The machines are on a Rackspace cloud with 256 MB memory. Nothing else is running.
Looks like you might be running out of memory -- that can easily happen in a long-running program if you have a "leak" (e.g., due to accumulating circular references). Does Rackspace offer any easily usable tools to keep track of a process's memory, so you can confirm whether this is the case? Otherwise, this kind of thing is not hard to monitor with normal Linux tools from outside the process. Once you have determined that "out of memory" is the likely cause of death, Python-specific tools such as pympler can help you track exactly where the problem is coming from (and thus determine how to avoid those references -- be it by changing them to weak references, or other simpler approaches -- or otherwise remove the leaks).
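If you want to confirm the out-of-memory theory from inside the process (standard library only, no Rackspace tooling assumed), a simple option is to log the resident set size as the loop runs; urls and download_and_store below are hypothetical stand-ins for your own list and download code:

import resource

def log_memory(label):
    # ru_maxrss is reported in kilobytes on Linux
    rss_kb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    print "%s: peak RSS %d KB" % (label, rss_kb)

for i, url in enumerate(urls):
    download_and_store(url)        # hypothetical download + DB insert
    if i % 100 == 0:
        log_memory("after %d urls" % i)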
In cases like this, you should check the log files.
I use Debian and Ubuntu, so the main log file for me is: /var/log/syslog
If you use Red Hat, I think that log is: /var/log/messages
If something happens that is as exceptional as the kernel killing your process, there will be a log event explaining it.
I suspect you are being hit by the Out Of Memory Killer.
Is it possible that it's hitting an uncaught exception? Are you running this from a shell, or is it being run from cron or in some other automated way? If it's automated, the output may not be displayed anywhere.
Are you using a queue manager or process manager of some sort?
I got apparently random killed messages when the batch queue manager I was using was sending SIGUSR2 when the time was up.
Otherwise I strongly favor the out of memory option.
For those who came here with MySQL, I found these answers may be helpful:
use SSCursor as suggested by this
conn = MySQLdb.connect(host=DB_HOST, user=DB_USER, db=DB_NAME,
                       passwd=DB_PASSWORD, charset="utf8",
                       cursorclass=MySQLdb.cursors.SSCursor)
and iterate over the cursor as suggested by this
cursor = conn.cursor()
cursor.execute("select * from very_big_table;")
for row in cursor:
    # do what you want here
    pass
Do pay attention to what the docs say: "You MUST retrieve the entire result set and close() the cursor before additional queries can be performed on the connection." So if you want to write at the same time, you should use another connection, or you will get
`_mysql_exceptions.ProgrammingError: (2014, "Commands out of sync; you can't run this command now")`
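A minimal sketch of that two-connection pattern, reusing the placeholder credentials above (the INSERT target table is hypothetical):

import MySQLdb
import MySQLdb.cursors

# streaming read connection: rows are fetched from the server lazily
read_conn = MySQLdb.connect(host=DB_HOST, user=DB_USER, db=DB_NAME,
                            passwd=DB_PASSWORD, charset="utf8",
                            cursorclass=MySQLdb.cursors.SSCursor)
# a second, ordinary connection for writes while the streaming read is open
write_conn = MySQLdb.connect(host=DB_HOST, user=DB_USER, db=DB_NAME,
                             passwd=DB_PASSWORD, charset="utf8")

read_cur = read_conn.cursor()
write_cur = write_conn.cursor()

read_cur.execute("select * from very_big_table;")
for row in read_cur:
    # hypothetical target table for the processed data
    write_cur.execute("INSERT INTO processed_rows VALUES (%s)", (row[0],))
    write_conn.commit()

read_cur.close()    # the SSCursor must be closed before read_conn is reused
read_conn.close()
write_conn.close()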