I am running several processes in Python using multiprocessing. I am hitting a PostgreSQL database and I keep getting this error:
(DatabaseError) server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The DB admin tells me he is not seeing any errors on his side and I can't figure out what is causing this.
Related
Problem:
Our Django application zips large folders, which takes too long (up to 48 hours), so the Django connection to the database times out and the application throws: "MySQL server has gone away".
Description:
We have a Django 3.2.1 application whose CONN_MAX_AGE value is set to 1500 seconds. The default wait_timeout in MySQL (MariaDB) is 8 hours.
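For context, CONN_MAX_AGE lives in the Django database settings; a minimal sketch with placeholder names and credentials (not our actual settings):

DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.mysql",
        "NAME": "exports_db",   # placeholder database name
        "USER": "app",
        "PASSWORD": "secret",
        "HOST": "db-host",
        "CONN_MAX_AGE": 1500,   # reuse a persistent connection for up to 1500 s
    }
}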
ExportStatus table has the following attributes:
package_size
zip_time
Our application works this way:
Zip the folders
Set the 'package_size' attribute of the ExportStatus record after zipping and save it to the database.
Set the 'zip_time' attribute of the ExportStatus record after zipping and save it to the database.
Notice that setting these column values requires a Django connection to the database, which has timed out during the long zipping process and therefore throws the "MySQL server has gone away" error.
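A rough sketch of that flow (zip_folders() and the model fields are assumed from the description above, not our actual code):

import time
from exports.models import ExportStatus  # assumed app/model path

def export_package(status_id, folder):
    start = time.time()
    size = zip_folders(folder)                # assumed helper; can run for up to 48 hours

    status = ExportStatus.objects.get(pk=status_id)
    status.package_size = size                # first write after zipping
    status.save()

    status.zip_time = time.time() - start     # second write; this is where
    status.save()                             # "MySQL server has gone away" shows up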
What we have tried so far:
from django.db import close_old_connections
close_old_connections()
This solution doesn't work.
Just after zipping, if the time taken is more than 25 minutes, we close all the connections and ensure new connections as follows:
from django.db import connections

for connection in connections.all():
    try:
        # hack to check if the connection still persists with the database.
        with connection.cursor() as c:
            c.execute("DESCRIBE auth_group;")
            c.fetchall()
    except:
        connection.close()
        connection.ensure_connection()
Upon printing the length of connections.all(), it is 2. What we don't understand is how Django persists those old connections and retrieves them from the connection pool. When we close connections from connections.all(), aren't we closing all the connections in the thread pool?
We first set the package_size and then set the zip_time. The problem with this solution is that occasionally (not always) it throws the same error when setting the zip_time attribute; sometimes the solution does seem to work. Setting package_size is never a problem, but setting 'zip_time' occasionally fails. So our question is: if we already reset connections after zipping, why does this still pick up a stale connection from the connection pool and throw the "MySQL server has gone away" error? Is there any way to close all the old persistent connections and recreate new ones?
I want some clarification on how the pre-ping feature works with SQLAlchemy DB pools. Let's say I make a SQL query to my database through the pool. If the pool sends a pre-ping to check the connection and the connection is broken, does it handle this automatically? By handling I mean that it reconnects and then sends the SQL query. Or do I have to handle this myself in my code?
Thanks!
From the docs, yes, stale connections are handled transparently:
The calling application does not need to be concerned about organizing operations to be able to recover from stale connections checked out from the pool.
... unless:
If the database is still not available when "pre ping" runs, then the initial connect will fail and the error for failure to connect will be propagated normally. In the uncommon situation that the database is available for connections, but is not able to respond to a "ping", the "pre_ping" will try up to three times before giving up, propagating the database error last received.
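In other words, enabling it on the engine is enough; a minimal sketch with a placeholder connection URL:

from sqlalchemy import create_engine, text

engine = create_engine(
    "postgresql+psycopg2://user:password@db-host/mydb",  # placeholder URL
    pool_pre_ping=True,  # ping the connection at checkout, replace it if dead
)

with engine.connect() as conn:
    # If the pooled connection had gone stale, it is recycled transparently
    # before this statement runs.
    conn.execute(text("SELECT 1"))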
We have an Airflow instance running in AWS Fargate. It connects to an on-premise Postgres server (on Windows) and tries to load data from a (complicated) view. It uses a PostgresHook for that. However, the task in the DAG fails in Airflow with this error:
File "/usr/local/lib/python3.7/site-packages/airflow/hooks/dbapi_hook.py", line 120, in get_records
cur.execute(sql)
psycopg2.OperationalError: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
A while ago, the error occurred after some 10-15 minutes. Now, it occurs faster, after 5 minutes or even faster.
I have looked in the Postgres logs, which (confusingly) show that it was the client that closed the connection:
LOG: could not send data to client: An existing connection was forcibly closed by the remote host.
FATAL: connection to client lost
I have tried a bunch of potential solutions already.
Without Airflow
Connecting to the server outside of Airflow, using psycopg2 directly: works (using the complicated view).
Different table
Trying to load data from a different table from Airflow in the cloud: works, finishes quickly too. So this "timeout" only occurs because the query takes a while.
Running the Airflow container locally
At first I could reproduce this issue, but I (think I) solved it by adding some extra parameters to the Postgres connection string: keepalives=1&keepalives_idle=60&keepalives_interval=60. However, I cannot reproduce this fix in the Airflow instance in the cloud, because when I add these parameters there, the error remains.
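For reference, these are libpq keepalive parameters, so they can also be passed straight to psycopg2; a sketch with placeholder host/database, mirroring the connection string above:

import psycopg2

conn = psycopg2.connect(
    host="onprem-postgres",     # placeholder
    dbname="test",              # placeholder
    keepalives=1,               # enable TCP keepalives
    keepalives_idle=60,         # seconds of idle time before the first keepalive
    keepalives_interval=60,     # seconds between keepalives
    keepalives_count=5,         # assumed value; unanswered probes before dropping
)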
Increase timeouts
See above: I added keepalives, but I also tried to reason about other potential timeouts. I added an execution_timeout to the DAG arguments, to no avail. We also checked networking timeouts, but given the irregular pattern of the connection failures, it doesn't really look like such a hard timeout...
I am at a loss here. Any suggestions?
Update: we have solved this problem through a workaround. Instead of keeping the connection open while the complex view is being queried, we have turned the connection into an asynchronous connection (i.e., aconn = psycopg2.connect(database='test', async=1) from the psycopg docs). Furthermore, we have turned the view into a materialized view, so that we only issue a REFRESH MATERIALIZED VIEW through the asynchronous connection and can then simply SELECT * from the materialized view a while later, which is very fast.
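A minimal sketch of that workaround, assuming a placeholder materialized view name reporting_mv and using the polling loop from the psycopg2 documentation (async_ is the keyword-safe spelling of the async=1 flag above):

import select
import psycopg2
import psycopg2.extensions

def wait(conn):
    # Poll an asynchronous connection until the current operation finishes.
    while True:
        state = conn.poll()
        if state == psycopg2.extensions.POLL_OK:
            return
        elif state == psycopg2.extensions.POLL_WRITE:
            select.select([], [conn.fileno()], [])
        elif state == psycopg2.extensions.POLL_READ:
            select.select([conn.fileno()], [], [])
        else:
            raise psycopg2.OperationalError("poll() returned %s" % state)

aconn = psycopg2.connect(database='test', async_=1)
wait(aconn)                     # wait for the connection itself to be established
acur = aconn.cursor()
acur.execute("REFRESH MATERIALIZED VIEW reporting_mv;")
wait(aconn)                     # poll until the long refresh finishes

# Later, a fast query against the pre-computed data over a normal connection:
conn = psycopg2.connect(database='test')
with conn.cursor() as cur:
    cur.execute("SELECT * FROM reporting_mv;")
    rows = cur.fetchall()
conn.close()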
I am using a Python script to insert records into a MySQL database table.
The script fails with the following error message.
MySQL version is 8.0.17, Python version is 3.6.5.
(pymysql.err.OperationalError) (2013, 'Lost connection to MySQL server during query ([WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond)')
(Background on this error at: http://sqlalche.me/e/e3q8)
The issue occurs for only a few tables.
MySQL automatically closes connections that have been idle for a specific period of time (wait_timeout for non-interactive connections). Therefore it may happen that your connections are closed if there is too much idle time and they are not renewed, or that connections are invalidated because of server restarts.
SQLAlchemy mentions several strategies for tackling automatic disconnects and database restarts in its documentation on how to deal with pool disconnects.
One option you should have a look at is the pool_pre_ping parameter, which checks (e.g., with a SELECT 1) that a connection is still valid each time it is checked out of the pool; otherwise the connection is recycled.
The other option is pool_recycle, a time that should always be less than your MySQL wait_timeout. After this time the connection is automatically recycled so it does not run into the wait_timeout.
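A hedged sketch combining both options, with a placeholder connection URL and a recycle time chosen to stay well under MySQL's default wait_timeout of 28800 seconds:

from sqlalchemy import create_engine

engine = create_engine(
    "mysql+pymysql://user:password@db-host/mydb",  # placeholder URL
    pool_pre_ping=True,  # validate a connection at checkout, recycle it if dead
    pool_recycle=1800,   # proactively recycle connections older than 30 minutes
)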
You can check your connections in MySQL using the command
SHOW PROCESSLIST;
where you should see all open connections and the status they are in.
I'm using Postgres and psycopg2 as my driver in a multiprocessing application. With only 2 processes I am getting this error (I've tried 8 and it blows up pretty fast).
cursor.execute("SELECT EXISTS(SELECT * FROM users WHERE name='{0}');".format(name))
DatabaseError: error with no message from the libpq
LOG: unexpected EOF on client connection with an open transaction
Googling this error message was no help, since there are several reasons why it can occur. It is possible that other transactions are happening in other processes, but each process creates its own database connection. I'm also closing the database connection after each process completes and reconnecting when it is restarted.
My theory is that a lot of database commands are happening at the same time and Postgres doesn't like this for whatever reason. I'm not sure how to solve this, since the application has to run this way.
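For what it's worth, this is roughly the per-process pattern described above, sketched with placeholder connection parameters and a parameterized query instead of string formatting:

import multiprocessing
import psycopg2

def worker(name):
    # Each process opens its own connection after the fork and closes it when done.
    conn = psycopg2.connect(host="localhost", dbname="test")  # placeholders
    try:
        with conn.cursor() as cur:
            cur.execute("SELECT EXISTS(SELECT 1 FROM users WHERE name = %s);", (name,))
            exists = cur.fetchone()[0]
        conn.commit()
        return exists
    finally:
        conn.close()

if __name__ == "__main__":
    with multiprocessing.Pool(processes=2) as pool:
        print(pool.map(worker, ["alice", "bob"]))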