I'm using the concurrent.futures module to run jobs in parallel. It runs fine.
The start time and completion time get updated in the MySQL database whenever a job starts/ends. Also, each job gets its input files from the database and saves its output files to the database. I'm getting the errors
"Error 2006:MySQL server has gone away"
and
"Error 2013: Lost connection to MySQL server during query" while running the script.
I don't face these errors while running a single job.
Sample script:

import concurrent.futures

def invokeRunCommand(self):
    self.saveStartTime()
    self.getInputFiles()
    runShellCommand()
    self.saveEndTime()
    self.saveOutputFiles()

executor = concurrent.futures.ThreadPoolExecutor(max_workers=pool_size)
futures = []
for i in self.parent_job.child_jobs:
    futures.append(executor.submit(invokeRunCommand, i))
I'm using a single database connection and cursor to execute all the queries. Some queries are time-consuming. I'm not sure why I'm hitting this error. Could someone clarify?
-Thanks
Yes, a single connection to the database is not thread-safe, so if you're using the same database connection for multiple threads, things will fail.
If your pseudocode is representative, just open and use a separate database connection for each thread in your invokeRunCommand and things should be fine.
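For instance, a minimal sketch, assuming the PyMySQL driver and placeholder credentials and table/column names (pool_size and parent_job are reused from your snippet; any DB-API driver works the same way):

import concurrent.futures
import pymysql  # assumption: PyMySQL; substitute your own driver

def invokeRunCommand(job):
    # each thread opens its own connection instead of sharing one
    conn = pymysql.connect(host="localhost", user="user",
                           password="secret", database="jobs_db")
    try:
        with conn.cursor() as cur:
            cur.execute("UPDATE jobs SET start_time = NOW() WHERE id = %s",
                        (job.id,))
        conn.commit()
        # ... fetch input files, run the shell command, save outputs ...
        with conn.cursor() as cur:
            cur.execute("UPDATE jobs SET end_time = NOW() WHERE id = %s",
                        (job.id,))
        conn.commit()
    finally:
        conn.close()

executor = concurrent.futures.ThreadPoolExecutor(max_workers=pool_size)
futures = [executor.submit(invokeRunCommand, job)
           for job in parent_job.child_jobs]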
I am using cx_Oracle with Python 3.7 to connect to an Oracle database and execute stored procedures stored in that database.
Right now I am connecting to the database as follows:
import cx_Oracle

dbconstr = "username/password@databaseip/sid"
db_connection = cx_Oracle.connect(dbconstr)
cursor = db_connection.cursor()
# calling the stored procedure here
cursor.close()
db_connection.close()
In this code, the call to cx_Oracle.connect(dbconstr) takes about 250 ms, and the whole code runs in about 500 ms; what I want is to reduce that 250 ms connection time.
I am using this in a Flask REST API in Python, and 250 ms for the connection is too long when the entire response time is 500 ms.
I have also tried maintaining a connection for the lifetime of the application by declaring a global variable for the connection object and creating and closing only the cursors, as shown below; that brings the response time down to 250 ms:
dbconstr = "username/password@databaseip/sid"
db_connection = cx_Oracle.connect(dbconstr)

def api_response():
    cursor = db_connection.cursor()
    # calling the stored procedure here
    cursor.close()
    return result
With this method the response time is reduced, but a connection is kept open even when no one is using the application. And after the application has been idle for a while, the first request becomes slow again, taking seconds, which is very bad.
So I want help creating stable code with a good response time.
Creating a connection involves a lot of work on the database server: process startup, memory allocation, authentication, etc.
Your solution, or using a connection pool, is the way to reduce connection times in Oracle applications. A pool with an acquire and release around the point of use in the app has benefits for planned and unplanned DB maintenance, due to the internal implementation of the pool.
What's the load on your service? You probably want to start a pool and acquire/release connections; see
How to use cx_Oracle session pool with Flask gracefuly? and Unresponsive requests- understanding the bottleneck (Flask + Oracle + Gunicorn), among others. Pro tip: keep the pool small, and make the minimum and maximum sizes the same.
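For illustration, a minimal sketch of that approach, with placeholder credentials and a hypothetical stored procedure name; the pool is created once at application startup and sized per the pro tip above:

import cx_Oracle

# created once at application startup; min == max per the pro tip
pool = cx_Oracle.SessionPool(user="username", password="password",
                             dsn="databaseip/sid",
                             min=4, max=4, increment=0, threaded=True)

def api_response():
    connection = pool.acquire()   # fast: reuses an already-open session
    try:
        cursor = connection.cursor()
        cursor.callproc("my_stored_proc")  # placeholder stored procedure
        cursor.close()
    finally:
        pool.release(connection)  # hand the session back to the pool
    return "ok"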
Is there a problem with having connections open? What is it impacting? There are some solutions, such as Shared Servers or DRCP, but generally there shouldn't be any need for them unless your database server is short of memory.
I am stuck with the following database concurrency problem.
I am building a Django web app that uses psycopg2 to communicate with a PostgreSQL database. The queries are run from different processes.
In order to lock rows, all the queries are inside a transaction.atomic() block:
with transaction.atomic():
    self.analysis_obj = Analysis.objects.select_for_update().get(pk=self.analysis_id)
However, sometimes I get random errors like:
'no results to fetch',
'pop from empty list',
'error with status PGRES_TUPLES_OK and no message from the libpq'.
Any idea how to deal with this problem?
Many thanks.
For anyone interested, I found a solution.
When using multiprocessing in Python, you should close all connections every time a process is spawned:
from django.db import connection
connection.close()
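For example, a sketch of where that call fits, assuming a hypothetical worker function and Process-based spawning:

import multiprocessing
from django.db import connections

def worker(analysis_id):
    # drop connections inherited from the parent process; Django will
    # open a fresh, process-local one on the next query
    connections.close_all()
    run_analysis(analysis_id)  # hypothetical task function

for analysis_id in pending_ids:  # pending_ids is a placeholder
    multiprocessing.Process(target=worker, args=(analysis_id,)).start()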
I have a Django app and Celery workers.
One Celery task is quite heavy and can run for over 15 minutes. When the main calculations are done and I try to save the results to the DB, I get the error psycopg2.OperationalError: server closed the connection unexpectedly.
@celery_app.task
def task(param):
    Model1.objects.create(...)
    huge_calculations(param)  # may run for over 15 minutes
    Model2.objects.create(...)  # <- error here
Everything I managed to google refers to the simple solution "update everything", but I already did; I have the latest version of every package in the project and still get this error.
For a short task (even the same task with different params) everything works fine.
I've also tried to adjust the DB connection timeout, but no luck :/
Did you try this?
I'm having the same issue. For now, as a workaround, I had to create a wrapper around the connection object and hook the methods I use. In the cursor() method I do a SELECT 1 "ping" check and reconnect if needed, so it always returns a valid working cursor.
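A rough sketch of that workaround in Django terms (the broad exception handling is just for illustration):

from django.db import connection

def get_cursor():
    # "ping" the connection with a cheap query; reconnect if it is dead
    try:
        cursor = connection.cursor()
        cursor.execute("SELECT 1")
    except Exception:
        connection.close()            # discard the broken connection
        cursor = connection.cursor()  # Django reopens it on demand
    return cursor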
I am using the Snowflake database-as-a-service to store and process our data. Because we handle huge amounts of data, I want to run a query, get the query ID, and let the query execute asynchronously. Another part of the system will monitor the status of the query by checking the query history table using that query ID.
I am using the Snowflake Python Connector.
Here is a sample of what I have so far:
from __future__ import print_function
import io, os, sys, time, datetime

modules_path = os.path.join(os.path.dirname(__file__), 'modules')
sys.path.append(modules_path)
import snowflake.connector

def async_query(data):
    connection = snowflake.connector.connect(
        user=data['user'],
        password=data['password'],
        account=data['account'],
        region=data['region'],
        database=data['database'],
        warehouse=data['warehouse'],
        schema=data['schema']
    )
    cursor = connection.cursor()
    cursor.execute(data['query'], _no_results=True)
    print(cursor.sfqid)
    return cursor.sfqid
This piece of code seems to be working, i.e. I am getting the query ID, but there is one problem: the SQL query fails with the error "SQL execution canceled." in Snowflake. If I remove the _no_results=True parameter, the query works well, but then I have to wait for it to complete, which is not the desired behaviour.
Any ideas what is causing the "SQL execution canceled" failure?
A little more info: the reason I don't want to wait for it is that I am running the code on AWS Lambda, and Lambdas have a maximum running time of 5 minutes.
If _no_results=True is not specified, the execution is synchronous, so the application has to wait for the query to finish. If it is specified, the query becomes asynchronous, so the application continues running, but the destructor of the connection closes the session at the end, and all active queries are canceled. That seems to be the cause of "SQL execution canceled".
AWS Lambda limits the execution time to 5 minutes, so if the query takes more than the limit, it won't work.
By the way, _no_results=True is an internal parameter used by SnowSQL, and its behavior is subject to change in the future.
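For the monitoring part mentioned in the question, a hedged sketch that checks the status by query ID through the QUERY_HISTORY table function, reusing the question's connection parameters:

import snowflake.connector

def query_status(data, query_id):
    # a separate, short-lived session just for the status check
    connection = snowflake.connector.connect(
        user=data['user'], password=data['password'],
        account=data['account'], region=data['region'],
        database=data['database'], warehouse=data['warehouse'],
        schema=data['schema'])
    try:
        cursor = connection.cursor()
        cursor.execute(
            "SELECT execution_status "
            "FROM TABLE(information_schema.query_history()) "
            "WHERE query_id = %s", (query_id,))
        row = cursor.fetchone()
        return row[0] if row else None
    finally:
        connection.close()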
I am using the Python MySQLdb library to connect to a MySQL DB. I have a web server with 4 worker processes, each of which has one connection and one cursor to the MySQL DB, so every worker process uses its own connection/cursor to execute SQL statements.
Now I have several clients simultaneously sending requests to the server; the server queries MySQL and returns results to the clients. I encounter the error 2014, "Commands out of sync; you can't run this command now".
I have checked the SQL; it is as simple as SELECT a, b, c FROM table WHERE a = 1. There is no semicolon or stored procedure, and I also tried the code below, as Python, "commands out of sync; you can't run this command now" suggests, but I still get the same error.
self.cursor.execute(sql, data)
self.conn.commit()
result = result + self.cursor.fetchall()
self.cursor.close()
self.cursor = self.conn.cursor()
Finally, I fixed this issue. My app had multiple threads using the same connection, which it seems is not a proper way to access MySQL; when I stopped sharing the connection, the issue was gone.
Under 'threadSafety' in the MySQLdb User Guide:
The MySQL protocol can not handle multiple threads using the same
connection at once. Some earlier versions of MySQLdb utilized locking
to achieve a threadsafety of 2. While this is not terribly hard to
accomplish using the standard Cursor class (which uses
mysql_store_result()), it is complicated by SSCursor (which uses
mysql_use_result()); with the latter you must ensure all the rows have
been read before another query can be executed. It is further
complicated by the addition of transactions, since transactions start
when a cursor executes a query, but end when COMMIT or ROLLBACK is
executed by the Connection object. Two threads simply cannot share a
connection while a transaction is in progress, in addition to not
being able to share it during query execution. This excessively
complicated the code to the point where it just isn't worth it.
The general upshot of this is: Don't share connections between
threads. It's really not worth your effort or mine, and in the end,
will probably hurt performance, since the MySQL server runs a separate
thread for each connection. You can certainly do things like cache
connections in a pool, and give those connections to one thread at a
time. If you let two threads use a connection simultaneously, the
MySQL client library will probably upchuck and die. You have been
warned.
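One way to follow that advice is one connection per thread, e.g. via threading.local; a sketch with placeholder connection parameters and table name:

import threading
import MySQLdb

local = threading.local()

def get_connection():
    # lazily open one connection per worker thread
    if not hasattr(local, "conn"):
        local.conn = MySQLdb.connect(host="localhost", user="user",
                                     passwd="secret", db="mydb")
    return local.conn

def run_query(a):
    conn = get_connection()
    cursor = conn.cursor()
    cursor.execute("SELECT a, b, c FROM my_table WHERE a = %s", (a,))
    rows = cursor.fetchall()
    cursor.close()
    conn.commit()
    return rows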