On our dev environment, we started exploring the use of Celery. The problem is that when a task is launched, SQLAlchemy often has a hard time connecting to our Amazon RDS instance. This tends to happen after some period of time, regardless of what settings I've tried, but it's hard to say exactly how long. Our database is a snapshot of our production database on AWS RDS, with all of the same parameters/settings.
The errors include...
OperationalError: (_mysql_exceptions.OperationalError) (2013, 'Lost connection to MySQL server during query')...(Background on this error at: http://sqlalche.me/e/e3q8)
...or...
OperationalError: MySQL Connection not available.
Our engine...
engine = sa.create_engine(SA_ENGINE, echo=False, pool_recycle=90, pool_pre_ping=True)
(I've tried tons of variations of pool_recycle)
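One frequent cause of exactly this error with Celery's prefork workers is the pooled engine being shared across os.fork(): parent and children then reuse the same MySQL sockets. SQLAlchemy's documented remedy is to call engine.dispose() in each child process so it opens fresh connections. A minimal sketch of that pattern, with a stub standing in for the real engine created above:

```python
import os

class _StubEngine:
    """Stands in for the engine from sa.create_engine(...) above;
    only the dispose() call matters for this sketch."""
    def __init__(self):
        self.disposed = False

    def dispose(self):
        # The real engine.dispose() discards all pooled connections, so the
        # child process opens fresh ones instead of reusing the parent's.
        self.disposed = True

engine = _StubEngine()

# Each forked child (e.g. a Celery prefork worker) starts with a clean pool:
if hasattr(os, "register_at_fork"):  # POSIX only
    os.register_at_fork(after_in_child=engine.dispose)
```

With Celery specifically, the same dispose() call can instead be wired to the worker_process_init signal.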
On the database side, I've changed the following parameters (though some are extreme, I've tried all sorts of variations)...
interactive_timeout = 28800
wait_timeout = 28800
max_heap_table_size = 32000000000
I tried wrapping each query to reconnect and this didn't work either. Note this code is taken from a StackOverflow answer on similar topics...
def db_execute(conn, query):
try:
result = conn.execute(query)
print(result)
except sa.exc.OperationalError: # may need more exceptions here (or trap all)
conn = engine.connect() # replace your connection
result = conn.execute(query) # and retry
return result
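For what it's worth, the snippet above has a subtle flaw: `conn = engine.connect()` rebinds only the local name, so the caller keeps holding the dead connection and the next call fails again. A variant that hands the fresh connection back might look like this (sketched against stub objects, since no live server is assumed here):

```python
class OperationalError(Exception):
    """Stands in for sa.exc.OperationalError in this sketch."""

class StubEngine:
    """Stands in for the SQLAlchemy engine created above."""
    def connect(self):
        return StubConnection()

class StubConnection:
    def __init__(self, fail_once=False):
        self.fail_once = fail_once

    def execute(self, query):
        if self.fail_once:
            self.fail_once = False
            raise OperationalError("Lost connection to MySQL server")
        return "result of " + query

def db_execute(engine, conn, query):
    """Run `query`, reconnecting once on OperationalError.

    Returns (conn, result) so the caller keeps using the fresh
    connection instead of the dead one.
    """
    try:
        return conn, conn.execute(query)
    except OperationalError:
        conn = engine.connect()           # replace the dead connection
        return conn, conn.execute(query)  # and retry once

conn, result = db_execute(StubEngine(), StubConnection(fail_once=True), "SELECT 1")
```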
This has been a three-day wheel spin and I'm stuck... I hope someone out there has some insight or guidance?
UPDATE
I've completely removed Celery from the equation and it's still randomly dropping out, even between queries within the same function flow. On the production server, the software is nearly identical now.
Related
I have an RDS database that a program I created using Python and MySQL connects to, in order to keep track of usage of the program. Any time the program is used, it adds 1 to a counter in the RDS database. Just this week, the program has started throwing an error connecting to the RDS SQL database after about an hour of use. Previously, I could leave the software running for days without it ever timing out. Closing the software and re-opening it to re-establish the connection lets me connect for approximately another hour before it times out again.
I am connecting using the following parameters:
awsConn = mysql.connector.connect(host='myDatabase.randomStringofChars.us-east-1.rds.amazonaws.com', database='myDatabase', port=3306, user='username', password='password')
Did something recently change with AWS/RDS, do I just need to pass a different parameter into the connection string, or do I just need to add somewhere into my program to attempt to re-establish the connection every so often?
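One option, assuming you stay on mysql-connector-python: the connection object has a ping(reconnect=True, attempts=..., delay=...) method that transparently re-establishes a dropped connection. A sketch of a small guard around it (the stub class stands in for the real connection so this runs without a server; the helper name is mine):

```python
class StubMySQLConnection:
    """Stands in for mysql.connector's connection object in this sketch."""
    def __init__(self):
        self.pings = 0

    def ping(self, reconnect=False, attempts=1, delay=0):
        # mysql-connector-python's real ping() re-opens the connection
        # when reconnect=True and the server has gone away.
        self.pings += 1

def ensure_alive(cnx, attempts=3, delay=1):
    """Call before each burst of queries to revive a dropped connection."""
    cnx.ping(reconnect=True, attempts=attempts, delay=delay)
    return cnx

# With the real driver, roughly:
#   ensure_alive(awsConn)
#   cursor = awsConn.cursor()
```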
Thanks
We have an Airflow instance running in AWS Fargate. It connects to an on-premise Postgres server (on Windows) and tries to load data from a (complicated) view. It uses a PostgresHook for that. However, the task in the DAG fails in Airflow with this error:
File "/usr/local/lib/python3.7/site-packages/airflow/hooks/dbapi_hook.py", line 120, in get_records
cur.execute(sql)
psycopg2.OperationalError: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
A while ago, the error occurred after some 10-15 minutes. Now it occurs sooner, after 5 minutes or even less.
I have looked in the Postgres logs, that shows (confusingly) that it was the client that closed the connection:
LOG: could not send data to client: An existing connection was forcibly closed by the remote host.
FATAL: connection to client lost
I have tried a bunch of potential solutions already.
Without Airflow
Connecting to the server outside of Airflow, using psycopg2 directly: works (using the complicated view).
Different table
Trying to load data from a different table from Airflow in the cloud: works, finishes quickly too. So this "timeout" only occurs because the query takes a while.
Running the Airflow container locally
At first I could reproduce this issue, but I (think I) solved it by adding some extra parameters to the Postgres connection string: keepalives=1&keepalives_idle=60&keepalives_interval=60. However, I cannot reproduce this fix in the Airflow instance in the cloud: when I add these parameters there, the error remains.
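For reference, the same keepalive settings can also be passed as keyword arguments to psycopg2.connect(), which forwards unrecognized keywords to libpq. keepalives_count is a related libpq knob not in the DSN above; its value here is illustrative:

```python
# libpq TCP-keepalive settings, as connect() keyword arguments instead of
# DSN parameters. Values mirror the ones tried in the DSN above.
keepalive_kwargs = {
    "keepalives": 1,            # enable TCP keepalives
    "keepalives_idle": 60,      # seconds of idle before the first probe
    "keepalives_interval": 60,  # seconds between probes
    "keepalives_count": 5,      # failed probes before dropping (illustrative)
}

# With a real server:
#   conn = psycopg2.connect(host=..., dbname=..., **keepalive_kwargs)
```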
Increase timeouts
See above: I added keepalives, but I also tried to reason about other potential timeouts. I added an execution_timeout to the DAG arguments, to no avail. We also checked networking timeouts, but given the irregular pattern of the connection failures, it doesn't really look like such a hard timeout...
I am at a loss here. Any suggestions?
Update: we have solved this problem through a workaround. Instead of keeping the connection open while the complex view is being queried, we have turned the connection into an asynchronous connection (i.e., aconn = psycopg2.connect(database='test', async_=1), per the psycopg docs; async_ is the spelling required since async became a reserved word in Python 3.7). Furthermore, we have turned the view into a materialized view, such that we only issue a REFRESH MATERIALIZED VIEW through the asynchronous connection, and then we can just SELECT * on the materialized view a while later, which is very fast.
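A sketch of that workaround, assuming psycopg2 (the helper names are mine, and refresh_matview_async is not executed here since it needs a live server):

```python
def refresh_sql(view_name):
    """Build the REFRESH statement issued over the async connection."""
    return "REFRESH MATERIALIZED VIEW {}".format(view_name)

def refresh_matview_async(dsn, view_name):
    """Fire the refresh over an asynchronous psycopg2 connection and wait
    for it with psycopg2.extras.wait_select, per psycopg2's async support
    docs. Requires a live server, so it is only defined here."""
    import psycopg2
    import psycopg2.extras

    aconn = psycopg2.connect(dsn, async_=1)
    psycopg2.extras.wait_select(aconn)   # wait for connection setup
    cur = aconn.cursor()
    cur.execute(refresh_sql(view_name))
    psycopg2.extras.wait_select(aconn)   # wait for the refresh to finish
    aconn.close()
```

Afterwards, a plain synchronous SELECT * on the materialized view is fast, as described above.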
I have this problem: I'm writing some Python scripts and while, up until now, I had no problems at all using a single MySQLConnector connection throughout the entire script (only closing it at the end of the script), lately I'm having some problems.
If I create a connection at the beginning of the script, something like (ignore the security concerns, I know):
db_conn = mysql.connector.connect(user='root', password='myPassword', host='127.0.0.1', database='my_db', autocommit=True)
and then always use it like:
db_conn.cursor(buffered=True).execute(...)
or fetch and other methods, I will get errors like:
Failed executing the SQL query: MySQL Connection not available.
OR
Failed executing the SQL query: No result set to fetch from.
OR
OperationalError: (2013, 'Lost connection to MySQL server during query')
The code is correct; I just don't understand why this happens. Maybe it's because I'm concurrently running the same function multiple times (tried with 2), async, so maybe the concurrent access to the cursor causes this?
I found someone fixed it by using a different DB connection every time (here).
I tried creating a new connection for every single query to the DB, and now there are no errors at all. It works fine, but it seems like overkill. Imagine calling the async function 10 or 100 times... there would be a lot of DB connections created. Will that cause problems? Will it run out of memory? And I guess it will slow things down.
Is there a way to solve it by keeping the same connection for all the queries? Why does that happen?
MySQL's client/server protocol is stateful (more like FTP than HTTP in this way). This means that if you have multiple async tasks sending and receiving packets on the same MySQL connection, the protocol can't handle that: the server and client get confused, because messages arrive in the wrong order.
What I mean is if different async routines are trying to use the database connection at the same time, you can easily get into trouble:
async1: sends query "select * from table1"
async2: sends query "insert into table2 ..."
async1: expects to fetch rows of its result set, but instead receives only a rows-affected count and the last insert id
It gets worse from there: for example, a new query cannot execute while a previous query's result set is still open. Or even worse, you could prepare two queries that have parameters, then subsequently send parameters for the wrong query.
You can use the same database connection for many queries, but DO NOT share the same connection among concurrently executing async threads. To be safe, each async routine should open its own connection. Then the thread that opened a given connection can use that connection for multiple queries.
Think of it like a call center, where dozens of people each have their own phone line. They certainly should not try to share a single phone line and carry on multiple conversations! The only way that could work is if every word uttered on the phone carried some identifying information for which conversation it belonged to. "Hi, this is Mr. Smith calling about case #1234, and the answer to the question you just asked me is..."
But MySQL's protocol doesn't do that. It assumes that each message is a continuation of the previous one, and both client and server remember what that is.
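The "one connection per async routine" rule above can be sketched without any real driver; each task below opens its own (stub) connection, so their packets can never interleave:

```python
import asyncio

class StubConnection:
    """Stands in for a real async DB connection; one per task."""
    def __init__(self, task_name):
        self.task_name = task_name

    async def execute(self, query):
        await asyncio.sleep(0)  # pretend to talk to the server
        return "{} ran: {}".format(self.task_name, query)

async def worker(task_name, query):
    # Each coroutine opens its OWN connection, never a shared one.
    conn = StubConnection(task_name)
    return await conn.execute(query)

async def main():
    return await asyncio.gather(
        worker("async1", "select * from table1"),
        worker("async2", "insert into table2 ..."),
    )

results = asyncio.run(main())
```

With a real async driver the shape is the same: acquire or open the connection inside the coroutine (or from a pool sized to your concurrency), not at module level.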
I'm connecting my API layer to Oracle DB using the cx_oracle connector, the issue here is that my DB machine keeps on restarting due to some other reasons.
I want to immune my API Layer to reestablish the connection or try to reconnect, what's the best possible solution to this?
Please don't suggest plain try/except.
My connection code :
import cx_Oracle
connection_string = "{user}/{password}@{server}:{port}/{sid}".format(
    user=config.DB_USER,
    password=config.DB_PASSWORD,
    server=config.DB_HOST,
    port=config.DB_PORT,
    sid=config.DB_SID)

db_conn = cx_Oracle.connect(connection_string)
cursor = db_conn.cursor()
I don't know much about this, but would having a session/connection pool help here?
If you use a session pool (cx_Oracle.SessionPool), dead sessions will be replaced whenever they are requested from the pool. That will not help with existing sessions that have already been acquired from the pool; but if you get an error, release the session back to the pool and then acquire a session again, and you will get one that can be used. If you want more advanced protection from database failure, you will need to explore some of the more advanced techniques that Oracle Database offers, like RAC (Real Application Clusters).
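A sketch of that release-and-reacquire loop; the stubs stand in for cx_Oracle.SessionPool and its sessions, and real code would catch cx_Oracle.DatabaseError rather than a stub exception:

```python
class StubDead(Exception):
    """Stands in for cx_Oracle.DatabaseError on a dead session."""

class StubSession:
    def __init__(self, dead=False):
        self.dead = dead

    def execute(self, sql):
        if self.dead:
            raise StubDead("session lost")
        return "ok: " + sql

class StubPool:
    """Stands in for cx_Oracle.SessionPool: the first acquire() hands out a
    dead session; later acquires return live ones, as a real pool does
    after replacing dead sessions."""
    def __init__(self):
        self.acquires = 0

    def acquire(self):
        self.acquires += 1
        return StubSession(dead=(self.acquires == 1))

    def release(self, session):
        pass

def run_with_retry(pool, sql, retries=1):
    """Acquire a session and run `sql`; on failure, release the session
    back to the pool and acquire a fresh one, as described above."""
    session = pool.acquire()
    for attempt in range(retries + 1):
        try:
            result = session.execute(sql)
            pool.release(session)
            return result
        except StubDead:
            pool.release(session)
            if attempt == retries:
                raise
            session = pool.acquire()
```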
I'm using Postgres and psycopg2 as my driver in a multiprocessing application. In only 2 processes I am getting this error (I've tried 8 and it blows up pretty fast).
cursor.execute("SELECT EXISTS(SELECT * FROM users WHERE name='{0}');".format(name))
DatabaseError: error with no message from the libpq
LOG: unexpected EOF on client connection with an open transaction
Googling this error message was no help since there are several reasons why that error can occur. It is possible that other transactions are happening on other processes, but they each create their own database connection. I'm also closing the database connection after each process is complete and reconnecting when it is restarted.
My theory is that there are a lot of database commands happening at the same time and postgres doesn't like this for whatever reason. I'm not sure how to solve this since the application has to run this way.
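A common cause of "unexpected EOF on client connection" in multiprocessing apps is a libpq connection created before the fork being shared (or implicitly closed) by the children. The usual pattern is to open the connection inside the worker, after the fork; sketched here with a stub in place of psycopg2.connect, and with the query parameterized, since building SQL with str.format is also an injection risk:

```python
import multiprocessing as mp

def check_user_exists(name):
    # Open the connection HERE, inside the worker, after the fork --
    # never create it in the parent and reuse it in the children.
    conn = object()  # stands in for psycopg2.connect(dsn)
    # With a real cursor, parameterize instead of str.format:
    # cur.execute("SELECT EXISTS(SELECT 1 FROM users WHERE name=%s)", (name,))
    return name, conn is not None

# Usage (guarded so it also works under the 'spawn' start method):
# if __name__ == "__main__":
#     with mp.Pool(2) as pool:
#         print(pool.map(check_user_exists, ["alice", "bob"]))
```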