Python MySQL connection randomly lost

I have the following problem:
I am using Python with MySQLdb and an SSDictCursor to iterate over a pretty large database table (250M rows). Because I am unable to load everything into RAM, I am using the streaming API. On the MySQL server, max_allowed_packet is already set to 512M.
I have run my script from different computers (including the server machine itself), and it keeps crashing at random times, after a random number of processed rows, with the following error:
_mysql_exceptions.OperationalError: (2013, 'Lost connection to MySQL server during query')
Exception _mysql_exceptions.OperationalError: (2013, 'Lost connection to MySQL server during query') in <bound method SSDictCursor.__del__ of <MySQLdb.cursors.SSDictCursor object at 0x7fa360e1a690>> ignored
I'm not using threads or anything fancy. I'm totally at a loss as to why this happens. Does anyone have an idea how to solve this?
EDIT: Some sample code for you.
import MySQLdb
import MySQLdb.cursors

mysql = MySQLdb.connect("host", "user", "pass", "db")
cursor = mysql.cursor(MySQLdb.cursors.SSDictCursor)
cursor.execute("select stuff from database order by date asc")
for row in cursor:
    # just repacking all the information in the cursor row into some dict
    pass
cursor.close()
Hopefully that piece of code helps. In the for loop, I'm only doing some lookups in a local defaultdict.
Somewhere around 80M entries (or maybe 40M, it varies), my program stops with the above-mentioned error. The data must be transferred in sequential order. Because the number of processed rows varies from run to run, I'm pretty sure it's not caused by a faulty row in the database. As far as I checked, only the MySQL server instance is running on the server machine, no other programs.
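One workaround sometimes suggested for very long streaming reads (not from the original post): fetch in bounded chunks with keyset pagination, so no single query streams for hours and a dropped connection only costs the current chunk. A minimal sketch, assuming an indexed, monotonically increasing id column whose order matches the required transfer order:

import MySQLdb
import MySQLdb.cursors

conn = MySQLdb.connect("host", "user", "pass", "db")
cursor = conn.cursor(MySQLdb.cursors.DictCursor)
last_id = 0
while True:
    # each chunk is a short, self-contained query; "id" and the
    # chunk size of 100000 are assumptions for illustration
    cursor.execute(
        "SELECT id, stuff FROM database WHERE id > %s ORDER BY id ASC LIMIT 100000",
        (last_id,))
    rows = cursor.fetchall()
    if not rows:
        break
    for row in rows:
        last_id = row["id"]
        # ... repack the row into the local defaultdict ...
cursor.close()
conn.close()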

Related

mysql.connector to AWS RDS database timing out

I have an RDS database that a program I created using Python and MySQL connects to, in order to keep track of usage of the program. Any time the program is used, it adds 1 to a counter in the RDS database. Just this week, the program has started throwing an error when connecting to the RDS database after about an hour of use. Before this, I could leave the software running for days without it ever timing out. Closing the software and re-opening it to re-establish the connection lets me connect for approximately another hour before it times out again.
I am connecting using the following parameters:
awsConn = mysql.connector.connect(host='myDatabase.randomStringofChars.us-east-1.rds.amazonaws.com', database='myDatabase', port=3306, user='username', password='password')
Did something recently change with AWS/RDS? Do I need to pass a different parameter in the connection string, or do I need to add code to my program to re-establish the connection every so often?
Thanks
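A pattern often suggested for this kind of idle timeout (not from the original post) is to validate the connection before each use and let the connector re-establish it transparently. A minimal sketch using mysql.connector's ping():

import mysql.connector

awsConn = mysql.connector.connect(
    host='myDatabase.randomStringofChars.us-east-1.rds.amazonaws.com',
    database='myDatabase', port=3306, user='username', password='password')

def get_cursor():
    # ping() checks the link; with reconnect=True it retries up to
    # three times, five seconds apart, before raising
    awsConn.ping(reconnect=True, attempts=3, delay=5)
    return awsConn.cursor()

Calling get_cursor() before each query turns an hour-long idle gap into a transparent reconnect instead of an error.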

Django can't close persistent Mysql connection

Problem:
Our Django application zips large folders, which takes too long (up to 48 hours), so the Django connection with the database times out and throws "MySQL server has gone away".
Description:
We have a Django 3.2.1 application whose CONN_MAX_AGE is set to 1500 seconds. The default wait_timeout in MySQL (MariaDB) is 8 hours.
ExportStatus table has the following attributes:
package_size
zip_time
Our application works this way:
Zip the folders
Set the 'package_size' attribute of the ExportStatus table after zipping and save it in the database.
Set the 'zip_time' attribute of the ExportStatus table after zipping and save it in the database.
Note that setting these column values in the database requires a Django connection, which gets timed out during the long zipping process and thus throws the "MySQL server has gone away" error.
What we have tried so far:
from django.db import close_old_connections
close_old_connections()
This solution doesn't work.
Just after zipping, if the time taken is more than 25 minutes, we close all the connections and ensure new connections as follows:
from django.db import connections

for connection in connections.all():
    try:
        # hack to check if the connection still persists with the database
        with connection.cursor() as c:
            c.execute("DESCRIBE auth_group;")
            c.fetchall()
    except Exception:
        connection.close()
        connection.ensure_connection()
Printing len(connections.all()) gives 2. What we don't understand is how Django persists those old connections and retrieves them from the connection pool. When we close the connections from connections.all(), aren't we closing all the connections in the thread pool?
We first set package_size and then set zip_time. The problem is that occasionally (not always) the same error is thrown when setting the zip_time attribute; sometimes this solution does seem to work. There is never a problem setting package_size, but setting zip_time occasionally throws the error. So our question is: if we already reset connections after zipping, why does this still take a stale connection from the connection pool and throw the "MySQL server has gone away" error? Is there a way to close all of the old persistent connections and create new ones?
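A variant worth sketching (the helper name and model calls are assumptions, not from the post): close every connection unconditionally immediately before the post-zip writes, rather than probing them, and let Django open fresh ones lazily on the next query:

from django.db import connections

def save_export_status(export_status, package_size, zip_time):
    # hypothetical helper: any connection may have gone stale during
    # the long zip step, so drop them all; Django reconnects
    # automatically on the next ORM query
    for conn in connections.all():
        conn.close()
    export_status.package_size = package_size
    export_status.zip_time = zip_time
    export_status.save(update_fields=["package_size", "zip_time"])

Closing unconditionally also avoids the race in the DESCRIBE probe above, where a connection can die between the check and the actual write.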

pymysql SELECT * only detecting changes made externally after instantiating a new connection

I have two applications that access the same DB. One application inserts data into a table. The other sits in a loop and waits for the data to be available. If I open a new connection and close it before I run the SELECT query, I find the data in the table without issues. I am trying to reduce the number of connections, so I tried leaving the connection open and just looping through and sending the query. When I do this, I do not get any of the data that was inserted into the table after the original connection was made. I understand I can just reconnect and close each time, but this is a lot of overhead if I am connecting and closing every second or two. Any ideas how to get data that was added to the DB from an external source with a SELECT query, without having to connect and close every time in a loop?
Do you commit your insert?
Normally the best approach is to close your connection; opening a new connection for the SELECT query does not generate much overhead.
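The likely mechanism behind that answer (my explanation, not part of the thread): with autocommit off, which is pymysql's default, the first SELECT opens a transaction, and under InnoDB's default REPEATABLE READ isolation the connection keeps reading the same snapshot forever. Committing between polls refreshes the snapshot without reconnecting. A minimal sketch, with credentials and table name assumed:

import time
import pymysql

conn = pymysql.connect(host="localhost", user="user",
                       password="pass", database="mydb")
try:
    while True:
        conn.commit()  # end the open transaction; the next SELECT sees fresh rows
        with conn.cursor() as cur:
            cur.execute("SELECT * FROM my_table")
            rows = cur.fetchall()
        # ... hand rows to the waiting application ...
        time.sleep(2)
finally:
    conn.close()

Passing autocommit=True to pymysql.connect() achieves the same effect without the explicit commit() calls.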

MySQLConnector (Python): New DB connection for each query vs. one single connection

I have this problem: I'm writing some Python scripts, and while up until now I had no problems at all using a single MySQLConnector connection throughout an entire script (closing it only at the end of the script), lately I'm running into errors.
If I create a connection at the beginning of the script, something like (ignore the security concerns, I know):
db_conn = mysql.connector.connect(user='root', password='myPassword', host='127.0.0.1', database='my_db', autocommit=True)
and then always use it like:
db_conn.cursor(buffered=True).execute(...)
or fetch and other methods, I will get errors like:
Failed executing the SQL query: MySQL Connection not available.
OR
Failed executing the SQL query: No result set to fetch from.
OR
OperationalError: (2013, 'Lost connection to MySQL server during query')
The code is correct; I just don't understand why this happens. Maybe it's because I'm concurrently running the same function multiple times (I tried with 2) with async, so perhaps the concurrent access to the cursor causes this?
I found someone fixed it by using a different DB connection every time (here).
I tried creating a new connection for every single query to the DB, and now there are no errors at all. It works fine, but it seems overkill. Imagine calling the async function 10 or 100 times...there would be a lot of DB connections created. Will that cause problems? Will it run out of memory? I also guess it will slow things down.
Is there a way to solve it by keeping the same connection for all the queries? Why does that happen?
MySQL is a stateful protocol (more like FTP than HTTP in this way). This means that if you are running multiple async threads sending and receiving packets on the same MySQL connection, the protocol can't handle that. The server and client will get confused, because messages will arrive in the wrong order.
What I mean is if different async routines are trying to use the database connection at the same time, you can easily get into trouble:
async1: sends query "select * from table1"
async2: sends query "insert into table2 ..."
async1: expects to fetch rows of a result set, but receives only a rows-affected count and last insert id
It gets worse from there. For example, a query cannot execute while an existing query still has an open result set. Or even worse, you could prepare two queries that have parameters, then subsequently send parameters for the wrong query.
You can use the same database connection for many queries, but DO NOT share the same connection among concurrently executing async threads. To be safe, each async routine should open its own connection. Then the thread that opened a given connection can use that connection for multiple queries.
Think of it like a call center, where dozens of people each have their own phone line. They certainly should not try to share a single phone line and carry on multiple conversations! The only way that could work is if every word uttered on the phone carried some identifying information about which conversation it belonged to. "Hi, this is Mr. Smith calling about case #1234, and the answer to the question you were just asking me is..."
But MySQL's protocol doesn't do that. It assumes that each message is a continuation of the previous one, and both client and server remember what that is.
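If opening a fresh connection per call feels too heavy, one common middle ground (my suggestion, not from the answer above) is a connection pool: each concurrently executing routine borrows a dedicated connection and returns it when done. A minimal sketch using mysql.connector's built-in pooling, reusing the credentials from the question:

from mysql.connector import pooling

pool = pooling.MySQLConnectionPool(
    pool_name="mypool", pool_size=5,
    user='root', password='myPassword', host='127.0.0.1',
    database='my_db', autocommit=True)

def run_query(query):
    conn = pool.get_connection()  # borrow a connection no one else is using
    try:
        cur = conn.cursor(buffered=True)
        cur.execute(query)
        return cur.fetchall()
    finally:
        conn.close()  # returns the connection to the pool, not to the server

Each borrowed connection carries exactly one conversation at a time, which is the phone-line rule from the answer above.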

When using Celery, SQLAlchemy keeps suffering from database connection issues

In our dev environment, we started exploring the use of Celery. The problem is that when a task is launched, SQLAlchemy often has a hard time connecting to our Amazon AWS RDS instance. This tends to happen after a period of time regardless of what settings I've tried, but it's hard to say just how long. Our database is a snapshot of our production database on AWS RDS with all of the same parameters/settings.
The errors include...
OperationalError: (_mysql_exceptions.OperationalError) (2013, 'Lost connection to MySQL server during query')...(Background on this error at: http://sqlalche.me/e/e3q8)
...or...
OperationalError: MySQL Connection not available.
Our engine...
engine = sa.create_engine(SA_ENGINE, echo=False, pool_recycle=90, pool_pre_ping=True)
(I've tried tons of variations of pool_recycle)
On the database side, I've changed the following parameters (though some are extreme, I've tried all sorts of variations)...
interactive_timeout = 28800
wait_timeout = 28800
max_heap_table_size = 32000000000
I tried wrapping each query to reconnect and this didn't work either. Note this code is taken from a StackOverflow answer on similar topics...
def db_execute(conn, query):
    try:
        result = conn.execute(query)
        print(result)
    except sa.exc.OperationalError:   # may need more exceptions here (or trap all)
        conn = engine.connect()       # replace the dead connection (rebinds only the local name)
        result = conn.execute(query)  # and retry
    return result
This has been a three-day wheel spin and I'm stuck... I hope someone out there has some insight or guidance.
UPDATE
I've completely removed Celery from the equation, and the connection is still randomly dropping, even between queries within the same function flow. The software on the production server is nearly identical now.
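A pattern worth sketching for both the Celery and Celery-free runs (an assumption on my part, not something from the post): never hold one checked-out connection across long gaps. Checking a connection out of the pool per unit of work lets pool_pre_ping validate it at checkout and silently replace a dead one:

import sqlalchemy as sa

engine = sa.create_engine(SA_ENGINE, echo=False, pool_recycle=90, pool_pre_ping=True)

def db_execute(query):
    # check out per call; pool_pre_ping tests the connection at
    # checkout, so a stale one is replaced before the query runs
    with engine.connect() as conn:
        result = conn.execute(query)
        return result.fetchall()  # fetch while the connection is still open

If Celery returns to the stack, it is also common to call engine.dispose() from a celery.signals.worker_process_init handler, since pooled connections inherited across a process fork tend to fail in exactly this random-looking way.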
