I am stuck with the following database concurrency problem.
I am building a Django web app that uses psycopg2 to communicate with a PostgreSQL database. The queries are run from different processes.
In order to lock rows, all the queries are inside a transaction.atomic() block:
with transaction.atomic():
    self.analysis_obj = Analysis.objects.select_for_update().get(pk=self.analysis_id)
However, I sometimes get seemingly random errors like:
'no results to fetch',
'pop from empty list',
'error with status PGRES_TUPLES_OK and no message from the libpq'.
Any idea how to deal with this problem?
Many thanks.
For anyone interested, I found a solution.
When using multiprocessing in Python, you should close all connections every time a process is spawned:
from django.db import connection
connection.close()
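A minimal sketch of how that fits together, assuming the per-process work lives in a run_analysis function started with multiprocessing.Process (the function name and the model's import path are placeholders):

from multiprocessing import Process
from django.db import connection, transaction
from myapp.models import Analysis  # placeholder import path for the model

def run_analysis(analysis_id):
    # Close the connection inherited from the parent process; Django will
    # lazily open a fresh one in this process on the first query.
    connection.close()
    with transaction.atomic():
        analysis = Analysis.objects.select_for_update().get(pk=analysis_id)
        # ... work with the locked row ...

p = Process(target=run_analysis, args=(analysis_id,))
p.start()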
Related
I'm playing around with SQLAlchemy core in Python, and I've read over the documentation numerous times and still need clarification about engine.execute() vs connection.execute().
As I understand it, engine.execute() is the same as doing connection.execute(), followed by connection.close().
The tutorials I've followed led me to use this in my code:
Initial setup in script
try:
    engine = db.create_engine("postgres://user:pass@ip/dbname", connect_args={'connect_timeout': 5})
    connection = engine.connect()
    metadata = db.MetaData()
except exc.OperationalError:
    print_error(f":: Could not connect to {db_ip}!")
    sys.exit()
Then, I have functions that handle my database access, for example:
def add_user(a_username):
    query = db.insert(table_users).values(username=a_username)
    connection.execute(query)
Am I supposed to be calling connection.close() before my script ends? Or is that handled efficiently enough by itself? Would I be better off closing the connection at the end of add_user(), or is that inefficient?
If I do need to be calling connection.close() before the script ends, does that mean interrupting the script will cause hanging connections on my Postgres DB?
I found this post helpful to better understand the different interaction paradigms in sqlalchemy, in case you haven't read it yet.
Regarding your question as to when to close your DB connection: it is indeed very inefficient to create and close connections for every statement execution. However, you should make sure that your application does not have connection leaks in its global flow.
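For what it's worth, here is a minimal sketch of one pattern that avoids leaks (reusing the names from your script; the URL is a placeholder and table_users is assumed to be defined as in your setup): check a connection out of the engine's pool per unit of work and dispose of the engine when the script exits.

import sqlalchemy as db

engine = db.create_engine("postgresql://user:pass@ip/dbname")

def add_user(a_username):
    # engine.begin() checks a connection out of the pool, wraps the block in a
    # transaction, commits on success, and returns the connection to the pool.
    with engine.begin() as connection:
        connection.execute(db.insert(table_users).values(username=a_username))

# ... rest of the script ...
engine.dispose()  # on shutdown, close every connection still held by the pool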
I am accessing a PostgreSQL table from Python with psycopg2, from several processes. I've been using serializable transaction isolation to maintain the integrity of the data: when an update or insert raises a TransactionRollback exception, I retry until the process gets through. Doing this, I get many errors of the form current transaction is aborted, commands ignored until end of transaction block. More than half the data is successfully written to the database; the rest fails due to the above error, which occurs in all of the processes attempting to write.
Am I approaching PostgreSQL concurrency / transaction isolation with Python and psycopg2 the correct way? Put another way: is it acceptable to use PostgreSQL's serializable transaction isolation while accessing the table from multiple separate processes concurrently?
At a guess, you are trapping a connection exception but not then issuing a ROLLBACK or conn.rollback() on the underlying PostgreSQL connection. So the connection still has an open aborted transaction.
The key thing to understand is that catching a psycopg2 exception does not issue a rollback on the underlying connection. The transaction is marked as aborted by PostgreSQL, and the connection can't process new work until you issue a ROLLBACK on it.
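A hedged sketch of that retry-and-rollback pattern (the DSN and the UPDATE statement are placeholders):

import psycopg2
from psycopg2 import extensions

conn = psycopg2.connect("dbname=mydb user=me")  # placeholder DSN
conn.set_session(isolation_level=extensions.ISOLATION_LEVEL_SERIALIZABLE)

def write_with_retry(max_attempts=5):
    for attempt in range(max_attempts):
        try:
            with conn.cursor() as cur:
                cur.execute("UPDATE counters SET value = value + 1 WHERE id = 1")
            conn.commit()
            return
        except extensions.TransactionRollbackError:
            # Serialization failure: without this rollback the connection stays
            # in the aborted state and every later statement fails with
            # "current transaction is aborted, commands ignored ...".
            conn.rollback()
    raise RuntimeError("gave up after %d serialization failures" % max_attempts)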
I'm using the concurrent.futures module to run jobs in parallel, and it runs fine.
The start time and completion time get updated in the MySQL database whenever a job starts or ends. Also, each job gets its input files from the database and saves its output files to the database. I'm getting the errors
"Error 2006:MySQL server has gone away"
and
"Error 2013: Lost connection to MySQL server during query" while running the script.
I don't face these errors while running a single job.
Sample Script:
import concurrent.futures

executor = concurrent.futures.ThreadPoolExecutor(max_workers=pool_size)
futures = []
for i in self.parent_job.child_jobs:
    futures.append(executor.submit(invokeRunCommand, i))

def invokeRunCommand(self):
    self.saveStartTime()
    self.getInputFiles()
    runShellCommand()
    self.saveEndTime()
    self.saveOutputFiles()
I'm using a single database connection and cursor to execute all the queries. Some of the queries are time-consuming. I'm not sure why I'm hitting this error. Could someone clarify?
-Thanks
Yes, a single connection to the database is not thread-safe, so if you're using the same database connection for multiple threads, things will fail.
If your pseudocode is representative, just open and use a separate database connection for each thread inside invokeRunCommand and things should be fine, as sketched below.
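A rough sketch of that change, assuming mysql.connector as the driver (any DB-API driver looks much the same; the credentials, table, and job attributes are placeholders, while pool_size and parent_job are as in your sample):

import concurrent.futures
import mysql.connector  # assumption: substitute whichever driver you actually use

def invokeRunCommand(job):
    # One connection per worker thread instead of the shared module-level one.
    conn = mysql.connector.connect(host="localhost", user="me",
                                   password="secret", database="jobs")
    try:
        cur = conn.cursor()
        cur.execute("UPDATE jobs SET start_time = NOW() WHERE id = %s", (job.id,))
        conn.commit()
        # ... fetch input files, run the shell command, save the outputs ...
    finally:
        conn.close()

with concurrent.futures.ThreadPoolExecutor(max_workers=pool_size) as executor:
    futures = [executor.submit(invokeRunCommand, job)
               for job in parent_job.child_jobs]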
I am working on an online judge. I am using Python 2.7 and MySQL (I am working on the back-end part).
My Method:
I create a main thread which pulls submissions from the database (10 at a time) and puts them in a queue. Then I have multiple worker threads that take submissions from the queue, evaluate them, and write the results back to the database.
Now I have some doubts (I know they touch different topics, but advice on any of them is highly appreciated).
Currently, when I start the threads, I give each of them its own DB connection, which it uses. Is giving one connection per thread good practice? Does sharing connections between threads create problems? How do I go about this?
My main thread uses a single connection, since its only work is to pull submissions from the database and put them in the queue (and update their status in the database to Assessing Submission). But sometimes I get the error Lost connection to MySQL server while querying, and I keep getting it even when I stop the program and start it again. What do I do about it? Also, should I implement a pool of connections just for the main thread?
Also, does a DB connection stay alive forever? What should I do when its session memory etc. gets exhausted; how should I handle that?
Use a connection pool. Sharing a database connection is not always bad, but you have to be careful about it. You can try SQLAlchemy to manage a lot of this for you: http://docs.sqlalchemy.org/en/rel_0_8/orm/session.html#unitofwork-contextual
The server might be out of connections, or your connection might have been killed because it uses too many resources, etc. A connection pool could help you solve this; see the sketch below.
It all depends; in theory a connection could stay alive indefinitely, but usually there is a timeout somewhere.
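A rough sketch of such a pooled engine in SQLAlchemy (the URL, pool sizes, and timeouts are placeholders to tune against your MySQL settings; pool_pre_ping needs SQLAlchemy 1.2 or newer):

from sqlalchemy import create_engine, text

engine = create_engine(
    "mysql+pymysql://user:pass@localhost/judge",
    pool_size=10,        # connections kept open in the pool
    max_overflow=5,      # extra connections allowed during bursts
    pool_recycle=3600,   # recycle before MySQL's wait_timeout closes them
    pool_pre_ping=True,  # transparently replace connections the server dropped
)

def mark_assessing(submission_id):
    # Each call borrows a connection from the pool and returns it afterwards.
    with engine.begin() as conn:
        conn.execute(text("UPDATE submissions SET status = 'Assessing Submission' "
                          "WHERE id = :id"), {"id": submission_id})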
If you give the same connection to every thread, their queries will interleave on that single connection and race conditions will occur. So you do need to provide a separate connection to every thread, and it is indeed a good idea. Use a connection pool for this purpose; it will hand out distinct connections.
A connection pool will surely help.
Release the connection once your work is over. Connections do not live forever; there is a limit, termed the connection timeout. You can use a third-party library to handle that; c3p0 is a good library that can help with this.
Please refer to the link below to configure it:
Best configuration of c3p0
I am using the Django ORM in my Python script in a decoupled fashion, i.e. it's not running in the context of a normal Django project.
I am also using the multiprocessing module, and the different processes in turn make queries.
The process ran successfully for an hour and then exited with this message:
"IOError: [Errno 32] Broken pipe"
Upon further diagnosis and debugging, the error pops up when I call save() on the model instance.
I am wondering:
Is the Django ORM process-safe?
Why else would this error arise?
Cheers
Ankur
Found the answer: I was calling a return after starting the process. This error sneaked in as I did a small cut-and-paste of a function.
It's a little hard to say without more information, but the problem is probably caused by having an open database connection as you spawn new processes, and then trying to use that database connection in the separate processes. Don't re-use database connections from the parent process in multiprocessing workers you spawn; always recreate database connections.
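A hedged sketch of that advice using a multiprocessing pool: close the parent's connections before forking and again in each worker's initializer, so every process lazily builds its own. The model and worker names are placeholders.

from multiprocessing import Pool
from django.db import connections

def _init_worker():
    # Connection objects inherited through fork() point at the parent's
    # sockets; close them so this process opens fresh ones on first use.
    connections.close_all()

def process_item(item_id):
    from myapp.models import MyModel  # placeholder model
    obj = MyModel.objects.get(pk=item_id)
    obj.save()

if __name__ == "__main__":
    connections.close_all()  # don't carry open connections across the fork
    with Pool(processes=4, initializer=_init_worker) as pool:
        pool.map(process_item, [1, 2, 3, 4])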