I'm using Postgres and psycopg2 as my driver in a multiprocessing application. In only 2 processes I am getting this error (I've tried 8 and it blows up pretty fast).
cursor.execute("SELECT EXISTS(SELECT * FROM users WHERE name='{0}');".format(name))
DatabaseError: error with no message from the libpq
LOG: unexpected EOF on client connection with an open transaction
Googling this error message was no help since there are several reasons why that error can occur. It is possible that other transactions are happening on other processes, but they each create their own database connection. I'm also closing the database connection after each process is complete and reconnecting when it is restarted.
My theory is that there are a lot of database commands happening at the same time and postgres doesn't like this for whatever reason. I'm not sure how to solve this since the application has to run this way.
We have an Airflow instance running in AWS Fargate. It connects to an on-premise Postgres server (on Windows) and tries to load data from a (complicated) view. It uses a PostgresHook for that. However, the task in the DAG fails in Airflow with this error:
File "/usr/local/lib/python3.7/site-packages/airflow/hooks/dbapi_hook.py", line 120, in get_records
cur.execute(sql)
psycopg2.OperationalError: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
A while ago, the error occurred after some 10-15 minutes. Now it occurs sooner, after 5 minutes or less.
I have looked in the Postgres logs, which show (confusingly) that it was the client that closed the connection:
LOG: could not send data to client: An existing connection was forcibly closed by the remote host.
FATAL: connection to client lost
I have tried a bunch of potential solutions already.
Without Airflow
Connecting to the server outside of Airflow, using psycopg2 directly: works (using the complicated view).
Different table
Trying to load data from a different table from Airflow in the cloud: works, finishes quickly too. So this "timeout" only occurs because the query takes a while.
Running the Airflow container locally
At first I could reproduce this issue, but I (think I) solved it by adding some extra parameters in the postgres connection string: keepalives=1&keepalives_idle=60&keepalives_interval=60. However, I cannot reproduce this fix in the Airflow in the cloud, because when I add these parameters there, the error remains.
Increase timeouts
See above, I added keepalives, but I also tried to reason about other potential timeouts. I added a timeout execution_timeout to the DAG arguments, to no avail. We also checked networking timeouts, but given the irregular pattern of the connection failures, it doesn't really sound like such a hard timeout...
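For reference, the keepalive settings mentioned above can be merged into a libpq-style connection URI as in the minimal sketch below. The helper name with_keepalives and the example host are made up for illustration, and whether a given Airflow hook actually forwards these parameters to the driver is not confirmed here.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_keepalives(uri, idle=60, interval=60):
    """Return `uri` with libpq TCP keepalive parameters added to its query string.

    Hypothetical helper: existing query parameters are kept, keepalive ones
    are added or overwritten.
    """
    parts = urlsplit(uri)
    params = dict(parse_qsl(parts.query))
    params.update({
        "keepalives": "1",                    # turn TCP keepalives on
        "keepalives_idle": str(idle),         # idle seconds before the first probe
        "keepalives_interval": str(interval), # seconds between probes
    })
    return urlunsplit(parts._replace(query=urlencode(params)))


# Placeholder credentials and host, not taken from the question.
print(with_keepalives("postgresql://user:secret@db.example.com:5432/mydb"))
```

Keepalives only help if something on the network path drops connections it considers idle; they do not change any timeout on the server itself.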
I am at a loss here. Any suggestions?
Update: we have solved this problem through a workaround. Instead of keeping the connection open while the complex view is being queried, we have turned the connection into an asynchronous connection (i.e., aconn = psycopg2.connect(database='test', async_=1), adapted from the psycopg docs; the keyword is spelled async_ because async is a reserved word from Python 3.7 on). Furthermore, we have turned the view into a materialized view, so that we only issue a REFRESH MATERIALIZED VIEW through the asynchronous connection, and then we can just SELECT * on the materialized view a while later, which is very fast.
I am using Python script to insert records into MySQL database table.
The script fails with the following error message.
MySQL version is 8.0.17, Python version is 3.6.5.
(pymysql.err.OperationalError) (2013, 'Lost connection to MySQL server during query ([WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond)')
(Background on this error at: http://sqlalche.me/e/e3q8)
The issue occurs for only a few tables.
MySQL automatically closes connections that have been idle for a specific period of time (wait_timeout for non-interactive connections). Your connections may therefore be closed if they sit idle too long without being renewed, or be invalidated by a server restart.
SQL-Alchemy mentions several strategies on how to tackle the issue of automatic disconnects and database restarts in its documentation on how to deal with pool disconnects.
Two options are worth a look. The first is the pool_pre_ping parameter, which emits a lightweight test statement (such as SELECT 1) each time a connection is checked out of the pool; if the check fails, the connection is recycled before your query runs.
The other option is pool_recycle, a time in seconds that should always be less than your MySQL wait_timeout. A connection older than this is replaced the next time it is checked out, so it never runs into wait_timeout.
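As a rough sketch of how the two settings fit together, the helper below derives a pool_recycle value from the server's wait_timeout. The function name pool_settings and the 60-second margin are assumptions for illustration; only pool_pre_ping and pool_recycle are actual create_engine parameters.

```python
def pool_settings(wait_timeout_s, margin_s=60):
    """Build keyword arguments for sqlalchemy.create_engine.

    pool_recycle is kept below the server's wait_timeout so the pool replaces
    a connection before MySQL closes it for being idle. The margin is an
    arbitrary illustrative choice.
    """
    if wait_timeout_s <= margin_s:
        raise ValueError("wait_timeout must be larger than the safety margin")
    return {
        "pool_pre_ping": True,                     # test the connection on checkout
        "pool_recycle": wait_timeout_s - margin_s, # seconds before a connection is replaced
    }


kwargs = pool_settings(28800)  # 28800 s is MySQL's default wait_timeout
print(kwargs)

# Usage (needs SQLAlchemy and a MySQL driver; the URL is a placeholder):
# import sqlalchemy as sa
# engine = sa.create_engine("mysql+pymysql://user:secret@host/dbname", **kwargs)
```

You can read the actual value on your server with SHOW VARIABLES LIKE 'wait_timeout'; and pass that in instead of the default.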
You can check your connections in MySQL using the command
SHOW PROCESSLIST;
where you should see all open connections and the status they are in.
On our dev environment, we started exploring the use of celery. The problem is when a task is launched, SQLAlchemy often has a hard time connecting to our Amazon/AWS RDS instance. This tends to happen after a period of time regardless of what settings I've tried but it's hard to say just how long. Our database is a snapshot of our production database on AWS RDS with all of the same parameters/settings.
The errors include...
OperationalError: (_mysql_exceptions.OperationalError) (2013, 'Lost connection to MySQL server during query')...(Background on this error at: http://sqlalche.me/e/e3q8)
...or...
OperationalErrorOperation : MySQL Connection not available.
Our engine...
engine = sa.create_engine(SA_ENGINE, echo=False, pool_recycle=90, pool_pre_ping=True)
(I've tried tons of variations of pool_recycle)
On the database side, I've changed the following parameters (though some are extreme, I've tried all sorts of variations)...
interactive_timeout = 28800
wait_timeout = 28800
max_heap_table_size = 32000000000
I tried wrapping each query to reconnect and this didn't work either. Note this code is taken from a StackOverflow answer on similar topics...
    def db_execute(conn, query):
        try:
            result = conn.execute(query)
            print(result)
        except sa.exc.OperationalError:  # may need more exceptions here (or trap all)
            conn = engine.connect()  # replace your connection
            result = conn.execute(query)  # and retry
        return result
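One thing worth noting about the wrapper above: it rebinds conn only inside the function, so the caller keeps holding the dead connection and every later call pays for a reconnect. A minimal sketch of a variant that hands the working connection back is below. The name execute_with_retry and its parameters are hypothetical, and it is driver-agnostic: connect is any zero-argument factory (for SQLAlchemy, engine.connect) and retry_on is the exception type that signals a dropped connection (for SQLAlchemy, sa.exc.OperationalError).

```python
def execute_with_retry(connect, query, conn=None, retries=1,
                       retry_on=(ConnectionError,)):
    """Run `query`, reconnecting up to `retries` times on a dropped connection.

    Returns (result, conn) so the caller can keep the connection that
    actually worked instead of a stale one.
    """
    if conn is None:
        conn = connect()
    for attempt in range(retries + 1):
        try:
            return conn.execute(query), conn
        except retry_on:
            if attempt == retries:
                raise                 # out of retries: surface the error
            try:
                conn.close()          # best effort, the connection is already broken
            except Exception:
                pass
            conn = connect()          # replace the broken connection and retry


# Usage with SQLAlchemy (assumes `engine` and `sa` as in the question):
# result, conn = execute_with_retry(engine.connect, query,
#                                   retry_on=(sa.exc.OperationalError,))
```

This only papers over occasional drops. If connections die in the middle of a flow as described, the retry will not explain why they are being closed.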
This has been a three-day wheel spin and I'm stuck... I hope someone out there has some insight or guidance?
UPDATE
I've completely removed celery from the equation and now it's still randomly dropping out and even in between queries within the same function flow. On the production server, the software is nearly identical now.
I am running several processes in Python using multiprocessing. I am hitting a PostgreSQL database and I keep getting this error:
(DatabaseError) server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request.
The db admin tells me he is not seeing any errors on his side, and I can't figure out what is causing this.
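One common cause of this error in multiprocessing programs, though nothing in the question confirms it applies here, is a connection opened in the parent process and then used by forked children. A minimal sketch of a guard against that is below: it keeps one connection per process id, so a child never reuses a connection it inherited. The names get_connection and _conn_by_pid are made up for illustration, and connect stands for any zero-argument factory such as lambda: psycopg2.connect(dsn).

```python
import os

_conn_by_pid = {}  # process id -> connection opened by that process


def get_connection(connect, pid=None):
    """Return a connection owned by the current process, opening it on first use.

    `pid` exists only so the sketch can be exercised without forking; in real
    use leave it as None.
    """
    pid = os.getpid() if pid is None else pid
    if pid not in _conn_by_pid:
        # A forked child inherits the parent's entry under the parent's pid,
        # but its own pid is missing, so it opens a fresh connection here.
        _conn_by_pid[pid] = connect()
    return _conn_by_pid[pid]


# Usage inside a worker function (psycopg2 and `dsn` are assumptions):
# conn = get_connection(lambda: psycopg2.connect(dsn))
```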
I am getting the error OperationalError: FATAL: sorry, too many clients already when using psycopg2. I am calling the close method on my connection instance after I am done with it. I am not sure what could be causing this, it is my first experience with python and postgresql, but I have a few years experience with php, asp.net, mysql, and sql server.
EDIT: I am running this locally, if the connections are closing like they should be then I only have 1 connection open at a time. I did have a GUI open to the database but even closed I am getting this error. It is happening very shortly after I run my program. I have a function I call that returns a connection that is opened like:
psycopg2.connect(connectionString)
Thanks
Final Edit:
It was my mistake: I was accidentally calling the same method recursively, so it kept opening connections over and over. It has been a long day.
This error means what it says: there are too many clients connected to PostgreSQL.
Questions you should ask yourself:
Are you the only one connected to this database?
Are you running a graphical IDE?
What method are you using to connect?
Are you testing queries at the same time that you are running the code?
Any of these things could be the problem. If you are the admin, you can up the number of clients, but if a program is hanging it open, then that won't help for long.
There are many reasons why you could be having too many clients running at the same time.
Make sure your db connection command isn't in any kind of loop. I was getting the same error from my script until I moved my db.database() out of my program's repeating execution loop.
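A minimal sketch of that advice: open the connection once, reuse it inside the loop, and make sure it is closed when the work is done. It uses the standard library's sqlite3 as a stand-in so it runs without a server; with psycopg2 you would go through a cursor and use %s placeholders instead of ?. The function name lookup_all and the sample table are made up for illustration.

```python
import sqlite3
from contextlib import closing


def lookup_all(names, connect):
    """Check whether each name exists, using one connection for the whole loop."""
    results = {}
    with closing(connect()) as conn:      # opened once, closed even on error
        # Demo setup only, so the sketch has something to query.
        conn.execute("CREATE TABLE IF NOT EXISTS users (name TEXT)")
        conn.execute("INSERT INTO users VALUES ('alice')")
        for name in names:                # the loop reuses `conn`
            row = conn.execute(
                "SELECT EXISTS(SELECT 1 FROM users WHERE name = ?)", (name,)
            ).fetchone()
            results[name] = bool(row[0])
    return results


print(lookup_all(["alice", "bob"], lambda: sqlite3.connect(":memory:")))
```

Passing the name as a parameter rather than formatting it into the SQL string also avoids quoting problems and SQL injection.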
It simply means many clients are making transactions to PostgreSQL at the same time.
I was running a PostGIS container and Django in different Docker containers. In my case, restarting both the db and system containers solved the problem.