I have started writing a Telegram bot that is accessed by many users at the same time. The bot runs numerous SQL queries for each user while that user is interacting with it.
When multiple users are using the bot, the bot crashes due to the following error:
"psycopg2.ProgrammingError: no results to fetch"
I think this happens when the bot runs a query for one user while also running a different query for another user.
Example: the cursor has executed an INSERT INTO for one user while the bot is also trying to fetch the results for a second user from the same cursor.
Two simultaneous transactions:
FIRST:
cursor.execute('''INSERT INTO USER_DATA(USER_ID, TRIAL_START) VALUES (%s, %s) ''', (m1, trial_date,))
conn.commit()
cursor.close()
SECOND:
cursor = conn.cursor()
cursor.execute('''SELECT * FROM USER_DATA WHERE USER_ID = %s''', (m1,))
conn.commit()
result = cursor.fetchall()
cursor.close()
My guess is that the shared cursor executed the SELECT, then the INSERT, and when the code then fetches the result for the second transaction (the SELECT), it gets the error because the cursor has just executed the INSERT.
Is there a way to handle such cases?
Maybe you need a connection pool, threading, and a queue.
You basically put a connection handler between your program and the database server.
Very simplified, it manages the connections to the database server and keeps them open so that you don't have to establish a (time- and resource-consuming) connection every time. These connections are made available to you as a connection pool. If you need a connection, you take one from the pool. If no connection is available, you have to wait until one becomes free. If you have used a connection and don't need it anymore, you give it back to the pool.
You can either manage it directly in your code (e.g. psycopg2.pool), on the database server (e.g. PgBouncer) or as a separate service in between.
To use the connections simultaneously, you could use e.g. multithreading or multiprocessing. But be careful what you do with it in your database.
The user requests could then be queued until they get processed.
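As a rough sketch of what that could look like with psycopg2's built-in pool (the connection parameters, pool sizes, and the get_user_data helper are placeholders, not your actual code):

import psycopg2
from psycopg2 import pool

# Create the pool once at startup; min/max sizes are placeholders.
db_pool = pool.ThreadedConnectionPool(
    minconn=1,
    maxconn=10,
    dbname="mydb", user="myuser", password="mypassword", host="localhost",
)

def get_user_data(user_id):
    conn = db_pool.getconn()           # borrow a connection from the pool
    try:
        with conn.cursor() as cursor:  # a fresh cursor per request, never shared
            cursor.execute("SELECT * FROM USER_DATA WHERE USER_ID = %s", (user_id,))
            return cursor.fetchall()
    finally:
        db_pool.putconn(conn)          # always return the connection, even on error

Each bot handler would then call such a helper instead of sharing one global cursor between users.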
Not sure if that helps.
Connection pool:
https://www.psycopg.org/docs/pool.html
http://www.pgbouncer.org/
A brief PgBouncer overview:
https://betterprogramming.pub/database-connection-pooling-with-pgbouncer-d8766a8a2c85
If you don't know what connection pools are, maybe this article helps:
https://pynative.com/psycopg2-python-postgresql-connection-pooling/
Related
I'm trying to refactor some code and have come up with this
def get_inpatients():
    """
    Getting all the inpatients currently sitting in A&E
    """
    cnxn = pyodbc.connect(f'DRIVER={DB_DRIVER};SERVER={DB_SERVER};DATABASE={DB_NAME};UID={DB_USER};PWD={DB_PASS}')
    cursor = cnxn.cursor()
    cursor.execute('EXEC spGetInpatients')
    row = cursor.fetchone()
    while row is not None:
        yield row[0]
        row = cursor.fetchone()
In the main file I then do this
for nhs_number in get_inpatients():
    ...  # This then goes and grabs details from several APIs meaning
         # it will be a few seconds for each loop
My question is whether a generator is a good choice here. I previously had the function return a list. Thinking about it now, would the generator mean the connection stays open for as long as the for loop in the main file is running, in which case am I better off returning a list?
Yes, the connection will remain open. Whether that is a good idea depends on the circumstances. Normally it is a good idea to use the generator because it allows the processing in your application to run concurrently with the fetching of more rows by the database. It also reduces memory consumption and improves CPU cache efficiency in your application. When done right, it also reduces latency which is very user-visible.
But of course you could run into the maximum connection limit sooner. I'd argue that increasing the connection limit is better than artificially making your application perform worse.
Also note that you can have multiple cursors per connection. See for example
Max SQL connections with Python and pyodbc on a local database showing as 1
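If you do keep the generator, it may be worth making sure the connection is closed once the loop finishes or the generator is abandoned, for example with try/finally. A minimal sketch of that idea (not the original code):

def get_inpatients():
    """Yield NHS numbers one at a time and close the connection when done."""
    cnxn = pyodbc.connect(f'DRIVER={DB_DRIVER};SERVER={DB_SERVER};DATABASE={DB_NAME};UID={DB_USER};PWD={DB_PASS}')
    try:
        cursor = cnxn.cursor()
        cursor.execute('EXEC spGetInpatients')
        for row in cursor:   # the pyodbc cursor is itself an iterator
            yield row[0]
    finally:
        cnxn.close()         # runs when the loop ends, breaks early, or the generator is discarded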
I am adding this answer for 2 reasons.
To point out that the cursor is an iterator
To make clearer that the "maximum connection limit" (as per @Homer512's answer) is a client-side setting, not a server-side one, and that it defaults to 0 both for the database connection and for the queries.
So:
According to the pyodbc wiki, you can avoid the boilerplate code:
The fetchall() function returns all remaining rows in a list. Bear in mind those rows will all be stored in memory, so if there are a lot of rows, you may run out of memory. If you are going to process the rows one at a time, you can use the cursor itself as an iterator:
for row in cursor.execute("select user_id, user_name from users"):
    print(row.user_id, row.user_name)
The connection limit lies on the client side, not the server side.
The comment on that answer reads:
You should clarify what server scoped means. SQL Server has a remote
query timeout value that refers to its queries issued on over linked
servers, not to queries issued by clients to it. I believe the query
timeout is a client property, not a server property. The server runs
the query indefinitely. There is such a thing as a query governor for
addressing this issue which is disabled by default.
Indeed, the docs verify:
This value applies to an outgoing connection initiated by the Database
Engine as a remote query. This value has no effect on queries received
by the Database Engine. A query will wait until it completes.
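To illustrate that the query timeout is a client property, here is a small pyodbc sketch; the connection string, driver name, and timeout values are only placeholders:

import pyodbc

cnxn = pyodbc.connect(
    'DRIVER={ODBC Driver 17 for SQL Server};SERVER=myserver;DATABASE=mydb;UID=me;PWD=secret',
    timeout=5,                       # login timeout, in seconds
)
cnxn.timeout = 30                    # query timeout, in seconds, applied to cursors on this connection

cursor = cnxn.cursor()
cursor.execute('EXEC spGetInpatients')   # raises an error if the driver enforces the 30-second limit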
Regarding the question of whether it is safe to keep a database connection open for a long time, I found this old but relevant question, which has an extended answer in favor of "yes, if you know what you are doing".
I am using Python with psycopg2 2.8.6 against Postgresql 11.6 (also tried on 11.9)
When I am running a query
CREATE TABLE tbl AS (SELECT (row_number() over())::integer "id", "col" FROM tbl2)
the code gets stuck (cursor.execute never returns); killing the transaction with pg_terminate_backend removes the query from the server, but the code is still not released. Yet in this case, the target table is created.
Nothing is locking the transaction. The inner SELECT query was tested on its own and works fine.
I tried analysing clues on the server and found out the following inside pg_stat_activity:
Transaction state is idle in transaction
wait_event_type is Client
wait_event is ClientRead
The same effect happens when I run the query from within an SQL editor (pgModeler), but in that case the query is stuck in the idle state and the target table is created.
I am not sure what is wrong and how to proceed from here.
Thanks!
I am answering my own question here, to make it helpful for others.
The problem was solved by changing the tcp_keepalives_idle Postgres setting from the default of 2 hours to 5 minutes.
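For completeness, the same keepalive behaviour can also be requested from the client side through libpq connection parameters, which psycopg2 passes straight through; the values below are only examples:

import psycopg2

conn = psycopg2.connect(
    dbname="mydb", user="myuser", password="mypassword", host="myhost",
    keepalives=1,            # enable TCP keepalives for this connection
    keepalives_idle=300,     # start probing after 5 minutes of inactivity
    keepalives_interval=30,  # seconds between probes
    keepalives_count=3,      # probes before the connection is considered dead
)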
The problem is not reproducible as described, so you have to investigate further. You should share more details about your database table, your Python code, and the server OS.
You can also attach strace to the Python process and share the output, so we can see what actually happens during the query.
wait_event_type = Client: The server process is waiting for some activity on a socket from user applications, and that the server expects something to happen that is independent from its internal processes. wait_event will identify the specific wait point.
wait_event = ClientRead: A session that waits for ClientRead is done processing the last query and waits for the client to send the next request. The only way that such a session can block anything is if its state is idle in transaction. All locks are held until the transaction ends, and no locks are held once the transaction finishes.
Idle in transaction: The activity can be idle (i.e., waiting for a client command), idle in transaction (waiting for client inside a BEGIN block), or a command type name such as SELECT. Also, waiting is appended if the server process is presently waiting on a lock held by another session.
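To look for such sessions yourself, a query against pg_stat_activity (run here through psycopg2, but any client works) might look like this; the connection parameters are placeholders:

import psycopg2

conn = psycopg2.connect(dbname="mydb", user="myuser", host="myhost")
with conn.cursor() as cur:
    # List sessions that are holding a transaction open without doing anything.
    cur.execute(
        "SELECT pid, state, wait_event_type, wait_event, query "
        "FROM pg_stat_activity "
        "WHERE state = 'idle in transaction'"
    )
    for pid, state, wait_event_type, wait_event, query in cur.fetchall():
        print(pid, state, wait_event_type, wait_event, query)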
The problem could be related to:
Network problems
Uncommitted transaction someplace that has created the same table name.
The transaction is not committed
You pointed out that it is not a commit problem because the SQL editor does the same, but in your question you specify that the editor successfully creates the table.
In pgModeler you see idle, which means the session is idle, not the query.
If the session is idle, the "query" column of pg_stat_activity shows the last executed statement in that session.
So this simply means all those sessions properly ended their transaction using a ROLLBACK statement.
If sessions remain in state idle in transaction for a longer time, that is always an application bug where the application is not ending the transaction.
You can do two things:
Set idle_in_transaction_session_timeout so that these transactions are automatically rolled back by the server after a while. This keeps locks from being held indefinitely, but your application will receive an error.
Fix the application as shown below
.commit() solution
The only way that I found to reproduce the problem is to omit the commit action.
The psycopg2 module is Python DB API-compliant, so the auto-commit feature is off by default.
With this option set to False, you need to call conn.commit() to commit any pending transaction to the database.
Enable auto-commit
You can enable the auto-commit as follow:
import psycopg2

connection = None
try:
    connection = psycopg2.connect("dbname='myDB' user='myUser' host='localhost' password='myPassword'")
    connection.autocommit = True
except psycopg2.Error:
    print("Connection failed.")

if connection is not None:
    cursor = connection.cursor()
    try:
        cursor.execute('CREATE TABLE tbl AS (SELECT (row_number() over())::integer "id", "col" FROM tbl2)')
    except psycopg2.Error:
        print("Failed to create table.")
with statement
You can also use the with statement so the transaction is committed automatically when the block exits:
with connection, connection.cursor() as cursor:  # start a transaction and create a cursor
    cursor.execute('CREATE TABLE tbl AS (SELECT (row_number() over())::integer "id", "col" FROM tbl2)')
Traditional way
If you don't want to auto-commit the transaction, you need to do it manually by calling .commit() after your execute.
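A minimal sketch of that, reusing the connection from above with autocommit left at its default of False:

cursor = connection.cursor()
cursor.execute('CREATE TABLE tbl AS (SELECT (row_number() over())::integer "id", "col" FROM tbl2)')
connection.commit()   # end the transaction explicitly so the session does not stay "idle in transaction"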
just remove the ( ) around the SELECT...
https://www.postgresql.org/docs/11/sql-createtableas.html
I'm currently using mysql.connector in a python Flask project and, after users enter their information, the following query is executed:
"SELECT first, last, email, {} FROM {} WHERE {} <= {} AND ispaired IS NULL".format(key, db, class_data[key], key)
It would pose a problem if this query was executed in 2 threads concurrently, and returned the same row in both threads. I was wondering if there was a way to prevent SELECT mysql queries from executing concurrently, or if this was already the default behavior of mysql.connector? For additional information, all mysql.connector queries are executed after being authenticated with the same account credentials.
It is hard to say from your description, but if you're using Flask, you're most probably using (or will use in production) multiple processes, and you probably have a connection pool (i.e. multiple connections) in each process. So while each connection executes queries sequentially, this query can be run concurrently by multiple connections at the same time.
To prevent your application from obtaining the same row at the same time while handling different requests, you should use transactions and techniques like SELECT FOR UPDATE. The exact solution depends on your exact use case.
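A rough sketch of that pattern with mysql.connector; the table and column names below are only illustrative, not your actual schema:

import mysql.connector

# One connection per request/thread; autocommit is off by default,
# so the SELECT ... FOR UPDATE below runs inside a transaction.
conn = mysql.connector.connect(user="appuser", password="secret", database="mydb")
cursor = conn.cursor()
try:
    # Lock the matching row so a concurrent request blocks until we commit.
    cursor.execute(
        "SELECT first, last, email FROM students "
        "WHERE ispaired IS NULL LIMIT 1 FOR UPDATE"
    )
    row = cursor.fetchone()
    if row is not None:
        cursor.execute("UPDATE students SET ispaired = 1 WHERE email = %s", (row[2],))
    conn.commit()   # releases the row lock
except mysql.connector.Error:
    conn.rollback()
    raise
finally:
    cursor.close()
    conn.close()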
I am using the Python MySQLdb library to connect to a MySQL database. I have a web server with 4 worker processes, each of which has one connection and one cursor to the MySQL database, so every worker process uses its own connection/cursor to execute SQL statements.
Now I have several clients simultaneously sending requests to the server; the server queries the MySQL database and returns results to the clients. I encounter the error: 2014, "Commands out of sync; you can't run this command now"
I have checked the SQL; it is as simple as SELECT a, b, c FROM table WHERE a = 1. There is no semicolon or stored procedure. I also tried the code below, as suggested in Python, "commands out of sync; you can't run this command now", but I still get the same error.
self.cursor.execute(sql, data)
self.conn.commit()
result = result + self.cursor.fetchall()
self.cursor.close()
self.cursor = self.conn.cursor()
Finally, I fixed this issue. My app had multiple threads using the same connection, which it seems is not a proper way to access MySQL; once I stopped sharing the connection, the issue was gone.
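One simple way to avoid sharing is to give each thread its own connection, for example via threading.local. This is only a sketch of the idea, not the original server code, and the connection parameters are placeholders:

import threading
import MySQLdb

_local = threading.local()

def get_conn():
    # Each thread lazily opens and keeps its own connection.
    if not hasattr(_local, "conn"):
        _local.conn = MySQLdb.connect(host="localhost", user="me", passwd="secret", db="mydb")
    return _local.conn

def query(sql, data):
    conn = get_conn()
    cursor = conn.cursor()
    try:
        cursor.execute(sql, data)
        rows = cursor.fetchall()
        conn.commit()
        return rows
    finally:
        cursor.close()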
Under 'threadSafety' in the MySQLdb User Guide:
The MySQL protocol can not handle multiple threads using the same
connection at once. Some earlier versions of MySQLdb utilized locking
to achieve a threadsafety of 2. While this is not terribly hard to
accomplish using the standard Cursor class (which uses
mysql_store_result()), it is complicated by SSCursor (which uses
mysql_use_result(); with the latter you must ensure all the rows have
been read before another query can be executed. It is further
complicated by the addition of transactions, since transactions start
when a cursor execute a query, but end when COMMIT or ROLLBACK is
executed by the Connection object. Two threads simply cannot share a
connection while a transaction is in progress, in addition to not
being able to share it during query execution. This excessively
complicated the code to the point where it just isn't worth it.
The general upshot of this is: Don't share connections between
threads. It's really not worth your effort or mine, and in the end,
will probably hurt performance, since the MySQL server runs a separate
thread for each connection. You can certainly do things like cache
connections in a pool, and give those connections to one thread at a
time. If you let two threads use a connection simultaneously, the
MySQL client library will probably upchuck and die. You have been
warned.
[Python/MySQLdb] - CentOS - Linux - VPS
I have a page that parses a large file and queries the database up to 100 times for each run. The database is pretty large and I'm trying to reduce the execution time of this script.
My SQL functions are inside a class, currently the connection object is a class variable created when the class is instantiated. I have various fetch and query functions that create a cursor from the connection object every time they are called. Would it be faster to create the cursor when the connection object is created and reuse it or would it be better practice to create the cursor every time it's called?
import MySQLdb as mdb

class parse:
    con = mdb.connect( server, username, password, dbname )
    #cur = con.cursor() ## create here?

    def q( self, q ):
        cur = self.con.cursor() ## it's currently here
        cur.execute( q )
Any other suggestions on how to speed up the script are welcome too. The insert statement is the same for all the queries in the script.
Opening and closing connections is never free, it always wastes some amount of performance.
The reason you wouldn't want to just leave the connection open is that if two requests were to come in at the same time the second request would have to wait till the first request had finished before it could do any work.
One way to solve this is to use connection pooling. You create a bunch of open connections and then reuse them. Every time you need to run a query you check a connection out of the pool, perform the request and then put it back into the pool.
Setting all this up can be quite tedious, so I would recommend using SQLAlchemy. It has built in connection pooling, relatively low overhead and supports MySQL.
Since you care about speed, I would only use the Core part of SQLAlchemy, since the ORM part is a bit slower.
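A small sketch of what the Core approach with built-in pooling might look like; the connection URL, pool sizes, and the example query are placeholders:

from sqlalchemy import create_engine, text

# The engine keeps a pool of open connections behind the scenes.
engine = create_engine(
    "mysql+mysqldb://username:password@server/dbname",
    pool_size=5,        # connections kept open in the pool
    max_overflow=10,    # extra connections allowed under load
)

def run_query(sql, params=None):
    # Checking a connection out of the pool and returning it is handled by the context manager.
    with engine.connect() as conn:
        return conn.execute(text(sql), params or {}).fetchall()

rows = run_query("SELECT id, name FROM items WHERE id = :id", {"id": 1})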