Concurrent writing with sqlite3 [duplicate] - python

This question already has answers here:
SQLite Concurrent Access
(8 answers)
Closed 9 years ago.
I'm using the sqlite3 python module to write the results from batch jobs to a common .db file. I chose SQLite because multiple processes may try to write at the same time, and as I understand it SQLite should handle this well. What I'm unsure of is what happens when multiple processes finish and try to write at the same time. So if several processes that look like this
from sqlite3 import connect

conn = connect('test.db')
with conn:
    for v in xrange(10):
        tup = (str(v), v)
        conn.execute("insert into sometable values (?,?)", tup)
execute at once, will they throw an exception? Wait politely for the other processes to write? Is there some better way to do this?

The sqlite library will lock the database per process when writing to it, and each process will wait for the lock to be released to get its turn.
The database doesn't need to be written to until commit time, however. You are using the connection as a context manager (good!), so the commit takes place after your loop has completed and all the insert statements have been executed.
If your database has uniqueness constraints in place, it may be that the commit fails because one process has already added rows that another process conflicts with.
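A minimal sketch of that failure mode (the table and column values are illustrative): each worker catches sqlite3.IntegrityError around the context-managed block, which commits on a clean exit and rolls back when an exception is raised.

import sqlite3

def write_batch(rows):
    conn = sqlite3.connect('test.db')
    try:
        with conn:  # commits on clean exit, rolls back if an exception is raised
            conn.executemany("insert into sometable values (?, ?)", rows)
    except sqlite3.IntegrityError:
        # another process already committed a conflicting row
        pass
    finally:
        conn.close()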

If each process holds its own connection then it should be fine.
What will happen is that when writing, the process will lock the DB,
so all other processes will block. They will throw an exception if the timeout
to wait for the DB to be free is exceeded. The timeout can be configured through the connect call:
http://docs.python.org/2/library/sqlite3.html#sqlite3.connect
It is not recommended to keep your DB file on a network share.
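A small sketch of the timeout parameter mentioned above (the 30-second value is just an example): if another process holds the write lock for longer than timeout seconds, sqlite3 raises OperationalError ("database is locked").

import sqlite3

conn = sqlite3.connect('test.db', timeout=30.0)  # default is 5 seconds
try:
    with conn:
        conn.execute("insert into sometable values (?, ?)", ('a', 1))
except sqlite3.OperationalError as exc:
    # raised when the write lock could not be obtained within the timeout
    print("write failed:", exc)
finally:
    conn.close()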
Update:
You may also want to check the isolation level: http://docs.python.org/2/library/sqlite3.html#sqlite3.Connection.isolation_level

The good news is that the SQLite library implicitly uses a transaction that locks the database whenever executing a DML statement. This means that other concurrent accesses to the database will wait until the executing DML request completes by committing/rolling back its transaction. Note however that multiple processes can perform SELECTs at the same time.
Also, please refer to the Python sqlite3 module documentation, section 11.13.6 - Controlling Transactions, which details how transactions can be controlled.
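A brief sketch of the behaviour that section describes (the table name is reused from the question above): with the default isolation level the module opens an implicit transaction before a DML statement, while isolation_level=None puts the connection in autocommit mode.

import sqlite3

default_conn = sqlite3.connect('test.db')
default_conn.execute("insert into sometable values (?, ?)", ('b', 2))
default_conn.commit()  # other processes only see this row after the commit

autocommit_conn = sqlite3.connect('test.db', isolation_level=None)
autocommit_conn.execute("insert into sometable values (?, ?)", ('c', 3))  # committed immediately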

Related

Redshift unload terminates when called by sqlalchemy

I'm running a few large UNLOAD queries from Redshift to S3 from a python script using SQLAlchemy (along with the sqlalchemy-redshift package).
The first couple work but the last, which runs the longest (~30 minutes), is marked Terminated in the Redshift Query Dashboard. Some data is loaded to S3 but I suspect it's not ALL of it.
I'm fairly confident the query itself works because I've used it to download locally in the past.
Does SQLAlchemy close queries that take too long? Is there a way to set or lengthen the query-timeout? The script itself continues as if nothing went wrong and the Redshift logs don't indicate a problem either but when a query is marked Terminated it usually means something external has killed the process.
There are two places where you can control timeouts in Redshift:
In the workload manager console, you get an option to specify timeout for each queue.
The ODBC/JDBC driver settings. Update your registry based on the steps in the link below:
http://docs.aws.amazon.com/redshift/latest/mgmt/connecting-firewall-guidance.html
It turned out to be more an issue with sqlalchemy than AWS/Redshift.
SQLAlchemy does not implicitly "Commit Transactions" so if the connection is closed while uncommitted transactions are still open (even if the query itself appears to be finished), all transactions within that connection are marked Terminated.
The solution is to finish your connection or each transaction with "commit transaction;":
conn = engine.connect()
conn.execute("""SELECT .... """)
conn.execute("""COMMIT TRANSACTION""")

SQLAlchemy long running script: User was holding a relation lock for too long

I have an SQLAlchemy session in a script. The script is running for a long time, and it only fetches data from database, never updates or inserts.
I get quite a lot of errors like
sqlalchemy.exc.DBAPIError: (TransactionRollbackError) terminating connection due to conflict with recovery
DETAIL: User was holding a relation lock for too long.
The way I understand it, SQLAlchemy creates a transaction with the first select issued, and then reuses it. As my script may run for about an hour, it is very likely that a conflict comes up during the lifetime of that transaction.
To get rid of the error, I could use autocommit in the deprecated mode (without doing anything more), but this is explicitly discouraged by the documentation.
What is the right way to deal with the error? Can I use ORM queries without transactions at all?
I ended up closing the session after (almost) every select, like
session.query(Foo).all()
session.close()
Since I do not use autocommit, a new transaction is automatically opened on the next query.
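A fuller sketch of that pattern (the engine URL and the Foo model are placeholders): each read opens a short-lived transaction, and the close() right after it ends that transaction, so the long-running script never holds a relation lock for long.

from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

engine = create_engine("postgresql://user:pass@standby-host/db")  # placeholder DSN
Session = sessionmaker(bind=engine)

def fetch_all_foo():
    session = Session()
    try:
        return session.query(Foo).all()  # Foo stands in for the mapped class from the question
    finally:
        session.close()  # ends the implicit transaction started by the query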

django connections shared across threads cause ora-01000 maximum open cursors exceeded

I want to stop executing an SQL statement if it takes too long to run.
To achieve this I hacked django.core.db.backends.oracle.base. In FormatStylePlaceholderCursor.execute and executemany instead of:
return self.cursor.execute(query, self._param_generator(params))
I do:
return timelimited(TIMEOUT, self.cursor.execute, query, self._param_generator(params))
And timelimited is a function from this recipe: http://code.activestate.com/recipes/576780-timeout-for-nearly-any-callable/. It wraps a function (i.e. cursor.execute) in a separate thread and waits TIMEOUT seconds. If the function doesn't return in time, the thread is stopped.
With this modification the application I'm running starts throwing ORA-01000 maximum open cursors exceeded after some short period of time. I'm wondering why wrapping cursor.execute is causing this problem, how to fix it, and what other solutions to this problem are available.
I'm not familiar with Django or Python, but I can tell you what OCI drivers offer to users.
You must close the query handle - or whatever its name is in Python. Otherwise you're leaking resources on the database side.
If the query is still active, you can interrupt it using the OCIBreak call. This one is thread safe and can be called from any thread, regardless of what the background thread is doing with the connection.
Try to check whether the Python drivers for Oracle allow you to call OCIBreak and OCIReset.
This is what you need: Connection.cancel()
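As a sketch of how Connection.cancel() could replace the timelimited() hack (cx_Oracle documents cancel() as safe to call from another thread; the function below and its parameters are illustrative), a timer thread interrupts the running statement instead of abandoning the cursor:

import threading

def execute_with_timeout(connection, cursor, sql, params, timeout):
    # cancel() interrupts whatever statement is currently running on this
    # connection; the blocked execute() then raises a database error instead
    # of leaking an open cursor.
    timer = threading.Timer(timeout, connection.cancel)
    timer.start()
    try:
        return cursor.execute(sql, params)
    finally:
        timer.cancel()  # statement finished or was interrupted; stop the timer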

Psycopg / Postgres : Connections hang out randomly

I'm using psycopg2 for the cherrypy app I'm currently working on, and the CLI & phpgadmin to handle some operations manually. Here's the Python code:
#One connection per thread
cherrypy.thread_data.pgconn = psycopg2.connect("...")
...
#Later, an object is created by a thread :
class dbobj(object):
    def __init__(self):
        self.connection = cherrypy.thread_data.pgconn
        self.curs = self.connection.cursor(cursor_factory=psycopg2.extras.DictCursor)
...
#Then,
    try:
        blabla
        self.curs.execute(...)
        self.connection.commit()
    except:
        self.connection.rollback()
        lalala
...
#Finally, the destructor is called :
    def __del__(self):
        self.curs.close()
I'm having a problem with either psycopg or postgres (although I think the latter is more likely). After having sent a few queries, my connections drop dead. Similarly, phpgadmin (usually) gets dropped as well; it prompts me to reconnect after having made requests several times. Only the CLI remains persistent.
The problem is, these happen very randomly and I can't even track down what the cause is. I can either get locked down after a few page requests or never really encounter anything after having requested hundreds of pages. The only error I've found in the postgres log, after terminating the app, is:
...
LOG: unexpected EOF on client connection
LOG: could not send data to client: Broken pipe
LOG: unexpected EOF on client connection
...
I thought of creating a new connection every time a new dbobj instance is created but I absolutely don't want to do this.
Also, I've read that one may run into similar problems unless all transactions are committed: I use the try/except block for every single INSERT/UPDATE query, but I never use it for SELECT queries, nor do I want to write even more boilerplate code (btw, do they need to be committed?). Even if that's the case, why would phpgadmin close down?
max_connections is set to 100 in the .conf file, so I don't think that's the reason either. A single cherrypy worker has only 10 threads.
Does anyone have an idea where I should look first ?
Psycopg2 needs a commit or rollback after every transaction, including SELECT queries, or it leaves the connections "IDLE IN TRANSACTION". This is now a warning in the docs:
Warning: By default, any query execution, including a simple SELECT will start a transaction: for long-running programs, if no further action is taken, the session will remain “idle in transaction”, an undesirable condition for several reasons (locks are held by the session, tables bloat...). For long lived scripts, either ensure to terminate a transaction as soon as possible or use an autocommit connection.
It's a bit difficult to see exactly where you're populating and accessing cherrypy.thread_data. I'd recommend investigating psycopg2.pool.ThreadedConnectionPool instead of trying to bind one conn to each thread yourself.
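A minimal sketch of that pool-based approach (the DSN, pool sizes, and helper name are placeholders):

import psycopg2
import psycopg2.extras
import psycopg2.pool

pool = psycopg2.pool.ThreadedConnectionPool(minconn=1, maxconn=10, dsn="dbname=app user=app")

def run_query(sql, params=None):
    conn = pool.getconn()
    try:
        with conn.cursor(cursor_factory=psycopg2.extras.DictCursor) as curs:
            curs.execute(sql, params)
            rows = curs.fetchall()
        conn.commit()  # also needed after SELECTs, per the warning above
        return rows
    finally:
        pool.putconn(conn)  # always hand the connection back to the pool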
Even though I don't have any idea why successful SELECT queries block the connection, sprinkling .commit() after pretty much every single query that doesn't have to work in conjunction with another solved the problem.

caching issues in MySQL response with MySQLdb in Django

I use MySQL with MySQLdb module in Python, in Django.
I'm running in autocommit mode in this case (and Django's transaction.is_managed() actually returns False).
I have several processes interacting with the database.
One process fetches all Task models with Task.objects.all()
Then another process adds a Task model (I can see it in a database management application).
If I call Task.objects.all() on the first process, I don't see anything. But if I call connection._commit() and then Task.objects.all(), I see the new Task.
My question is: Is there any caching involved at the connection level? And is this normal behaviour (it does not seem so to me)?
This certainly seems related to autocommit/table locking.
If mysqldb implements the dbapi2 spec, it will probably have a connection running as one single continuous transaction. When you say 'running in autocommit mode': do you mean MySQL itself, or the mysqldb module, or Django?
Not committing intermittently perfectly explains the behaviour you are getting:
i) a connection implemented as one single transaction in mysqldb (by default, probably);
ii) not opening/closing connections only when needed, but (re)using one (or more) persistent database connections (my guess, could be inherited from the Django architecture);
iii) your selects ('reads') cause a 'simple read lock' on a table, which means other connections can still 'read' this table, but connections wanting to 'write data' can't (immediately), because this lock prevents them from getting an 'exclusive lock' (needed 'for writing') on this table. The writing is thus postponed indefinitely (until it can get a (short) exclusive lock on the table for writing - when you close the connection or manually commit).
I'd do the following in your case:
find out which table locks are on your database during the scenario above
read about Django and transactions here. A quick skim suggests that using standard Django functionality implicitly causes commits. This means sending handcrafted SQL may not (insert, update, ...).
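A hedged illustration of what committing intermittently would look like for the first process (Task is the model from the question; the import path and helper name are hypothetical; connection.connection is the underlying MySQLdb connection Django keeps open):

from django.db import connection
from myapp.models import Task  # hypothetical import path for the question's model

def fresh_tasks():
    # Ending the transaction that the earlier SELECT opened lets the next
    # query see rows committed by other processes in the meantime.
    if connection.connection is not None:
        connection.connection.commit()
    return list(Task.objects.all())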
