Postgres "CREATE TABLE AS (SELECT ...)" stuck - python

I am using Python with psycopg2 2.8.6 against PostgreSQL 11.6 (also tried on 11.9).
When I run the query
CREATE TABLE tbl AS (SELECT (row_number() over())::integer "id", "col" FROM tbl2)
the code gets stuck (cursor.execute never returns). Killing the transaction with pg_terminate_backend removes the query from the server, but the code is still not released. Even so, the target table is created.
Nothing is locking the transaction. The inner SELECT query was tested on its own and works fine.
I tried analysing clues on the server and found the following in pg_stat_activity:
Transaction state is idle in transaction
wait_event_type is Client
wait_event is ClientRead
The same effect happens when I run the query from an SQL editor (pgModeler), but in that case the query is stuck in the Idle state and the target table is created.
I am not sure what is wrong or how to proceed from here.
Thanks!

I am answering my own question here to make it helpful for others.
The problem was solved by changing the tcp_keepalives_idle Postgres setting from its default of 2 hours to 5 minutes.
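If changing the server configuration is not an option, the same idea can be applied from the client side. Here is a minimal sketch, assuming psycopg2; it passes libpq keepalive parameters so the OS probes an idle connection instead of letting it die silently (the DSN and values are illustrative):
import psycopg2

connection = psycopg2.connect(
    "dbname=myDB user=myUser host=myHost password=myPassword",  # placeholder DSN
    keepalives=1,            # enable client-side TCP keepalives
    keepalives_idle=300,     # seconds of inactivity before the first probe
    keepalives_interval=10,  # seconds between probes
    keepalives_count=5,      # failed probes before the connection is dropped
)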

The problem is not reproducible as described, so you have to investigate more. You should share more details about your database table, your Python code and the server OS.
You could also attach strace to the Python process and share the output, so we can see what actually happens during the query.
wait_event_type = Client: The server process is waiting for some activity on a socket connected to a user application; the server expects something to happen that is independent of its internal processes. wait_event identifies the specific wait point.
wait_event = ClientRead: A session that waits for ClientRead is done processing the last query and waits for the client to send the next request. The only way such a session can block anything is if its state is idle in transaction, since all locks are held until the transaction ends.
Idle in transaction: The activity can be idle (i.e., waiting for a client command), idle in transaction (waiting for client inside a BEGIN block), or a command type name such as SELECT. Also, waiting is appended if the server process is presently waiting on a lock held by another session.
The problem could be related to:
Network problems
An uncommitted transaction somewhere else that has created a table with the same name
Your own transaction not being committed
You pointed out that it is not a commit problem because the SQL editor does the same, but in your question you specify that the editor successfully creates the table.
In pgModeler you see idle, which means the session is idle, not the query.
If the session is idle, the "query" column of pg_stat_activity shows the last executed statement in that session.
So this simply means all those sessions properly ended their transactions (for example with a ROLLBACK statement).
If sessions remain in state idle in transaction for a longer time, that is always an application bug where the application is not ending the transaction.
You can do two things:
Set idle_in_transaction_session_timeout so that these transactions are automatically rolled back by the server after a while (see the sketch after this list). This keeps locks from being held indefinitely, but your application will receive an error.
Fix the application as shown below.
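A minimal sketch of the first option, assuming sufficient privileges; the same setting can also go into postgresql.conf or be set per database/role:
import psycopg2

connection = psycopg2.connect("dbname=myDB user=myUser host=localhost password=myPassword")
with connection, connection.cursor() as cursor:
    # Roll back any transaction in this session that stays idle for over 5 minutes.
    cursor.execute("SET idle_in_transaction_session_timeout = '5min'")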
.commit() solution
The only way I found to reproduce the problem is to omit the commit action.
The psycopg2 module is Python DB API-compliant, so the auto-commit feature is off by default.
With this option set to False, you need to call conn.commit() to commit any pending transaction to the database.
Enable auto-commit
You can enable auto-commit as follows:
import psycopg2

connection = None
try:
    connection = psycopg2.connect("dbname='myDB' user='myUser' host='localhost' password='myPassword'")
    connection.autocommit = True  # every statement is committed as soon as it runs
except psycopg2.Error:
    print("Connection failed.")

if connection is not None:
    cursor = connection.cursor()
    try:
        cursor.execute("""CREATE TABLE tbl AS (SELECT (row_number() over())::integer "id", "col" FROM tbl2)""")
    except psycopg2.Error:
        print("Failed to create table.")
with statement
You can also use the with statement to commit the transaction automatically:
with connection, connection.cursor() as cursor:  # commits on success, rolls back on error
    cursor.execute("""CREATE TABLE tbl AS (SELECT (row_number() over())::integer "id", "col" FROM tbl2)""")
Traditional way
If you don't want to auto-commit the transaction, you need to do it manually by calling .commit() after your execute.
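A minimal sketch of the manual approach (same hypothetical connection string as above):
import psycopg2

connection = psycopg2.connect("dbname='myDB' user='myUser' host='localhost' password='myPassword'")
cursor = connection.cursor()
cursor.execute("""CREATE TABLE tbl AS (SELECT (row_number() over())::integer "id", "col" FROM tbl2)""")
connection.commit()  # without this, the session stays "idle in transaction"
cursor.close()
connection.close()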

Just remove the ( ) around the SELECT...
https://www.postgresql.org/docs/11/sql-createtableas.html

Related

Simultaneous queries in SQL

I have started writing a Telegram bot, which is accessed by many users at the same time. The bot runs numerous SQL queries simultaneously for each user while the user is using the bot.
When multiple users are using the bot, the bot crashes due to the following error:
"psycopg2.ProgrammingError: no results to fetch"
I think this happens when the bot runs a query for one user while also running a different query for another user.
Example: the cursor has executed an INSERT INTO for one user while also trying to fetch the results for the second user from the same cursor.
Two simultaneous transactions:
FIRST:
cursor.execute('''INSERT INTO USER_DATA(USER_ID, TRIAL_START) VALUES (%s, %s) ''', (m1, trial_date,))
conn.commit()
cursor.close()
SECOND:
cursor = conn.cursor()
cursor.execute('''SELECT * FROM USER_DATA WHERE USER_ID = %s''', (m1,))
conn.commit()
result = cursor.fetchall()
cursor.close()
My guess is that the cursor ran the SELECT statement and the INSERT statement at almost the same time, and the subsequent fetch for the second transaction (the SELECT) raises the error because the cursor has just executed the INSERT statement.
Is there a possibility to handle such cases somehow?
Maybe you need a connection pool, threading, and a queue.
You basically put a connection handler between program and database server.
Very simplified, it manages the connections to the database server and keeps them open, so that you don't have to establish a (time- and resource-consuming) connection every time. These connections are made available to you as a connection pool. If you need a connection, you take one from the pool; if no connection is available, you have to wait until one becomes free. When you no longer need a connection, you give it back to the pool.
You can either manage it directly in your code (e.g. psycopg2.pool), on the database server (e.g. PgBouncer) or as a separate service in between.
To use the connections simultaneously, you could use e.g. multithreading or multiprocessing. But be careful what you do with it in your database.
The user requests could then be queued until they get processed.
Not sure if that helps.
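To make the in-code option concrete, here is a minimal sketch using psycopg2.pool.ThreadedConnectionPool; the DSN is a placeholder and the query is borrowed from your example:
from psycopg2.pool import ThreadedConnectionPool

pool = ThreadedConnectionPool(minconn=1, maxconn=10, dsn="dbname=mydb user=myuser")  # placeholder DSN

def get_user(user_id):
    conn = pool.getconn()  # borrow a connection; each thread gets its own
    try:
        with conn, conn.cursor() as cur:  # commits on success, rolls back on error
            cur.execute("SELECT * FROM USER_DATA WHERE USER_ID = %s", (user_id,))
            return cur.fetchall()
    finally:
        pool.putconn(conn)  # always hand the connection back to the pool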
Connection pool:
https://www.psycopg.org/docs/pool.html
http://www.pgbouncer.org/
A brief PgBouncer overview:
https://betterprogramming.pub/database-connection-pooling-with-pgbouncer-d8766a8a2c85
If you don't know what connection pools are, maybe this article helps:
https://pynative.com/psycopg2-python-postgresql-connection-pooling/

Redshift unload terminates when called by sqlalchemy

I'm running a few large UNLOAD queries from Redshift to S3 from a Python script using SQLAlchemy (along with the sqlalchemy-redshift package).
The first couple work, but the last one, which runs the longest (~30 minutes), is marked Terminated in the Redshift Query Dashboard. Some data is loaded to S3, but I suspect it's not ALL of it.
I'm fairly confident the query itself works because I've used it to download locally in the past.
Does SQLAlchemy close queries that take too long? Is there a way to set or lengthen the query timeout? The script itself continues as if nothing went wrong, and the Redshift logs don't indicate a problem either, but when a query is marked Terminated it usually means something external has killed the process.
There are two places where you can control timeouts in Redshift:
In the Workload Manager console, you get an option to specify a timeout for each queue.
The ODBC/JDBC driver settings. Update your registry based on the steps in the link below:
http://docs.aws.amazon.com/redshift/latest/mgmt/connecting-firewall-guidance.html
It turned out to be more of an issue with SQLAlchemy than with AWS/Redshift.
SQLAlchemy does not implicitly commit transactions, so if the connection is closed while uncommitted transactions are still open (even if the query itself appears to be finished), all transactions within that connection are marked Terminated.
The solution is to finish your connection, or each transaction, with "commit transaction;":
conn = engine.connect()
conn.execute("""SELECT .... """)
conn.execute("""COMMIT TRANSACTION""")
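Alternatively (a sketch, assuming SQLAlchemy 1.x or later with the sqlalchemy-redshift dialect mentioned above; the URL is a placeholder), engine.begin() opens a transaction that is committed automatically when the block exits without an error:
from sqlalchemy import create_engine, text

engine = create_engine("redshift+psycopg2://user:password@host:5439/db")  # placeholder URL
with engine.begin() as conn:  # BEGIN ... COMMIT are handled for you
    conn.execute(text("UNLOAD ('SELECT ...') TO 's3://bucket/prefix' IAM_ROLE '...'"))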

Tracking out of database connection issues with Python and PostgreSQL

I periodically get connection issues with PostgreSQL: either "FATAL: remaining connection slots are reserved for non-replication superuser connections" or "QueuePool limit of size 5 overflow 10 reached, connection timed out, timeout 30", depending on whether psycopg2 or Pyramid is raising the exception. I have established that the transaction manager is properly installed, so it's frustrating not to know why I am still running out of connections.
I know the connection data is in pg_stat_activity, but that's only a single snapshot. Is there any way of seeing connections over time, so that I can see what is actually running over a period (ideally from before the issue starts up until the application requires a restart)?
The first part is properly identifying all of the queries running at a point in time. For that I used this query:
SELECT (SELECT COUNT(1) FROM pg_stat_activity) AS total_connections,
       (SELECT COUNT(1) FROM pg_stat_activity
        WHERE current_query IN ('<IDLE>', '<IDLE> in transaction')) AS idle_connections,
       current_query
FROM pg_stat_activity
WHERE current_query NOT IN ('<IDLE>', '<IDLE> in transaction')
  AND NOT procpid = pg_backend_pid();
NOTE! "current_query" is simply called "query" in later versions of PostgreSQL (from 9.2 on), and "procpid" likewise became "pid".
This strips out all idle database connections (seeing IDLE connections is not going to help you fix anything), and the "NOT procpid=pg_backend_pid()" bit excludes this query itself from the results (which would otherwise bloat your output considerably). You can also filter by datname if you want to isolate a particular database.
I needed these results in a way that was really easy to query them and so I used a table on the database. This should work:
CREATE TABLE connection_audit
(
snapshot timestamp without time zone NOT NULL DEFAULT now(),
total_connections integer,
idle_connections integer,
query text
)
WITH (
OIDS=FALSE
);
This stores the current timestamp in "snapshot", the total and idle connection counts, and the query itself.
I wrote a script that inserts the results of the query above into the table and saved it into a file called "pg_connections.sql".
I ran a script to insert these results into the table every second:
while true ; do psql -U user -d database_name -f 'pg_connections.sql' >> connections.log ; sleep 1; done
What this effectively does is write all CURRENTLY executing queries to the table.
Tailing the connections.log file showed me whether the script was running as expected (this isn't strictly required). Obviously, running a script like this every second can be taxing on a system, but it's a short-term measure when you have no other way of finding this information, so it should be worth it. Run the script for as long as you need to accumulate sufficient data and hopefully it will hit pay dirt.
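If you'd rather stay in Python than shell out to psql, the same polling loop might look like this (a sketch; it assumes the connection_audit table above and that pg_connections.sql contains an INSERT ... SELECT based on the monitoring query):
import time
import psycopg2

conn = psycopg2.connect("dbname=database_name user=user")  # placeholder DSN
conn.autocommit = True  # each snapshot INSERT commits immediately

with open("pg_connections.sql") as f:
    snapshot_sql = f.read()

while True:
    with conn.cursor() as cur:
        cur.execute(snapshot_sql)  # record the currently executing queries
    time.sleep(1)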

python sqlite "BEGIN TRANSACTION" and "COMMIT" commands

If I want to start a transaction in my database through python do I have to execute the sql command 'BEGIN TRANSACTION' explicitly like this:
import sqlite3
conn = sqlite3.connect(db)
c = conn.cursor()
c.execute('BEGIN TRANSACTION;')
##... some updates on the database ...
conn.commit() ## or c.execute('COMMIT'). Are these two expressions the same?
Is the database locked for change from other clients when I establish the connection or when I begin the transaction or neither?
Only transactions lock the database.
However, Python tries to be clever and automatically begins transactions:
By default, the sqlite3 module opens transactions implicitly before a Data Modification Language (DML) statement (i.e. INSERT/UPDATE/DELETE/REPLACE), and commits transactions implicitly before a non-DML, non-query statement (i.e. anything other than SELECT or the aforementioned).
So if you are within a transaction and issue a command like CREATE TABLE ..., VACUUM, PRAGMA, the sqlite3 module will commit implicitly before executing that command. There are two reasons for doing that. The first is that some of these commands don’t work within transactions. The other reason is that sqlite3 needs to keep track of the transaction state (if a transaction is active or not).
You can control which kind of BEGIN statements sqlite3 implicitly executes (or none at all) via the isolation_level parameter to the connect() call, or via the isolation_level property of connections.
If you want autocommit mode, then set isolation_level to None.
Otherwise leave it at its default, which will result in a plain “BEGIN” statement, or set it to one of SQLite’s supported isolation levels: “DEFERRED”, “IMMEDIATE” or “EXCLUSIVE”.
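A small sketch of the difference (the database file and table are hypothetical):
import sqlite3

# Autocommit mode: no implicit BEGIN, every statement takes effect immediately.
auto = sqlite3.connect("app.db", isolation_level=None)
auto.execute("CREATE TABLE IF NOT EXISTS t (x INTEGER)")

# Default mode (isolation_level=""): a plain BEGIN is issued before the first
# DML statement, so the change is invisible to other connections until commit.
manual = sqlite3.connect("app.db")
manual.execute("INSERT INTO t VALUES (1)")  # implicit BEGIN happens here
manual.commit()  # equivalent to manual.execute("COMMIT")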
From the Python docs:
When a database is accessed by multiple connections, and one of the processes modifies the database, the SQLite database is locked until that transaction is committed. The timeout parameter specifies how long the connection should wait for the lock to go away until raising an exception. The default for the timeout parameter is 5.0 (five seconds).
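So if other connections hold write locks for longer than that, you can raise the timeout when connecting (a sketch; the value is arbitrary):
import sqlite3

conn = sqlite3.connect("app.db", timeout=15.0)  # wait up to 15 s for a lock before raising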
If I want to start a transaction in my database through python do I have to execute the sql command 'BEGIN TRANSACTION' explicitly like this:
It depends: in auto-commit mode, yes; in manual commit mode, no before DML statements, but unfortunately yes before DDL or DQL statements, because manual commit mode is incorrectly implemented in the current version of the SQLite3 database driver (see below).
conn.commit() ## or c.execute('COMMIT'). Are these two expressions the same?
Yes.
Is the database locked for change from other clients when I establish the connection or when I begin the transaction or neither?
When you begin the transaction (cf. SQLite3 documentation).
For a better understanding of auto-commit and manual commit modes, read this answer.

Psycopg / Postgres : Connections hang out randomly

I'm using psycopg2 for the CherryPy app I'm currently working on, and the CLI & phpPgAdmin to handle some operations manually. Here's the Python code:
# One connection per thread
cherrypy.thread_data.pgconn = psycopg2.connect("...")
...
# Later, an object is created by a thread:
class dbobj(object):
    def __init__(self):
        self.connection = cherrypy.thread_data.pgconn
        self.curs = self.connection.cursor(cursor_factory=psycopg2.extras.DictCursor)
...
# Then,
try:
    blabla
    self.curs.execute(...)
    self.connection.commit()
except:
    self.connection.rollback()
    lalala
...
# Finally, the destructor is called:
def __del__(self):
    self.curs.close()
I'm having a problem with either psycopg2 or Postgres (although I think the latter is more likely). After sending a few queries, my connections drop dead. Similarly, phpPgAdmin usually gets dropped as well; it prompts me to reconnect after I have made several requests. Only the CLI remains persistent.
The problem is that this happens very randomly and I can't track down the cause. I can either get locked out after a few page requests, or never encounter anything after requesting hundreds of pages. The only error I've found in the Postgres log, after terminating the app, is:
...
LOG: unexpected EOF on client connection
LOG: could not send data to client: Broken pipe
LOG: unexpected EOF on client connection
...
I thought of creating a new connection every time a new dbobj instance is created, but I absolutely don't want to do this.
Also, I've read that one may run into similar problems unless all transactions are committed: I use the try/except block for every single INSERT/UPDATE query, but I never use it for SELECT queries, nor do I want to write even more boilerplate code (by the way, do SELECTs need to be committed?). Even if that's the case, why would phpPgAdmin close down?
max_connections is set to 100 in the .conf file, so I don't think that's the reason either. A single CherryPy worker has only 10 threads.
Does anyone have an idea where I should look first ?
Psycopg2 needs a commit or rollback after every transaction, including SELECT queries, or it leaves the connections "IDLE IN TRANSACTION". This is now a warning in the docs:
Warning: By default, any query execution, including a simple SELECT will start a transaction: for long-running programs, if no further action is taken, the session will remain “idle in transaction”, an undesirable condition for several reasons (locks are held by the session, tables bloat...). For long lived scripts, either ensure to terminate a transaction as soon as possible or use an autocommit connection.
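A minimal illustration of ending the transaction after reads as well (a sketch; the DSN and query are placeholders):
import psycopg2

conn = psycopg2.connect("dbname=mydb user=myuser")  # placeholder DSN
cur = conn.cursor()
cur.execute("SELECT 1")
rows = cur.fetchall()
conn.rollback()  # ends the read-only transaction; commit() would work too
cur.close()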
It's a bit difficult to see exactly where you're populating and accessing cherrypy.thread_data. I'd recommend investigating psycopg2.pool.ThreadedConnectionPool instead of trying to bind one connection to each thread yourself.
Even though I have no idea why successful SELECT queries block the connection, sprinkling .commit() after pretty much every query that doesn't have to work in conjunction with another solved the problem.
