I'm trying to serve database query results to ad hoc client requests, but I don't want to open a connection for each individual query. I'm not sure if I'm doing it right.
My current solution is something like this on the "server" side (heavily cut down for clarity):
import rpyc
from rpyc.utils.server import ThreadedServer
import cx_Oracle

conn = cx_Oracle.connect('whatever connect string')
cursor = conn.cursor()

def get_some_data(barcode):
    # do something
    return cursor.execute("whatever query", {'barcode': barcode})

class data_service(rpyc.Service):
    def exposed_get_some_data(self, brcd):
        return get_some_data(brcd)

if __name__ == '__main__':
    s = ThreadedServer(data_service, port=12345, auto_register=False)
    s.start()
This runs okay for a while. However, from time to time the program crashes, and so far I haven't been able to track down when it does that.
What I wish to confirm is this: notice how the database connection is created outside of the data_service class. Is this, in itself, likely to cause problems?
Many thanks, any thoughts appreciated.
I don't think the problem is that you're creating the connection outside of the class; that should be fine.
I think the problem is that you are creating just one cursor and using it for a long time, which as far as I understand is not how cursors are meant to be used.
You can use conn.execute without manually creating a cursor, which should be fine for how you're using the database. If I remember correctly, behind the scenes this creates a new cursor for each SQL command. You could also do this yourself in get_some_data(): create a new cursor, use it once, and then close it before returning the data.
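For example, a minimal sketch of the cursor-per-call version of get_some_data() (assuming you also want to fetch the rows before the cursor is closed, rather than returning the cursor itself over RPyC):

def get_some_data(barcode):
    # Open a short-lived cursor, run the query, fetch the results,
    # and always close the cursor before returning.
    cursor = conn.cursor()
    try:
        cursor.execute("whatever query", {'barcode': barcode})
        return cursor.fetchall()
    finally:
        cursor.close()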
In the long run, if you wish your server to be more robust, you'll need to add some error-handling for when database operations fail or the connection is lost.
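As a rough, untested sketch of what that error handling could look like (reconnecting once when cx_Oracle reports a database error, reusing the question's placeholder connect string and query):

def get_some_data(barcode):
    global conn
    for attempt in range(2):
        try:
            cursor = conn.cursor()
            try:
                cursor.execute("whatever query", {'barcode': barcode})
                return cursor.fetchall()
            finally:
                cursor.close()
        except cx_Oracle.DatabaseError:
            if attempt == 1:
                raise
            # The connection may have been dropped; reconnect and retry once.
            conn = cx_Oracle.connect('whatever connect string')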
A final note: essentially you've written a very basic database proxy server. There are probably various existing solutions for this, which already handle many of the issues you are likely to run into. I recommend at least considering using an existing solution.
I am using sqlite3 in a flask app (actually connexion).
I would like to stay in-memory but keep the db between the requests to the server.
So it should only be destroyed after the server is killed.
When I use sqlite3.connect(':memory:'), the db is destroyed after each response.
So I followed the approach from "In memory SQLite3 shared database python" and ran sqlite3.connect('file::memory:?cache=shared&mode=memory', uri=True). But then a file called file::memory:?cache=shared&mode=memory appears in the app root and does not disappear when I kill the server. When I start the server again, the db-init routine which creates the tables fails, because the tables already exist.
I tried this out on Linux and Mac. Both show the same behaviour. It seems like the db is saved to a file instead of being mapped to memory.
My python version is 3.9 and sqlite3.sqlite_version_info is (3, 37, 0)
I suspect that sqlite is treating 'file::memory:?cache=shared&mode=memory' as a plain file name and therefore, on execution, creates a database file with that "name" in its root directory.
Now, to the issue: I would try connecting via:
sqlite3.connect(':memory:')
and, to keep it alive, open a connection before starting to serve the app, store the connection object somewhere so it doesn't get garbage collected, and proceed as usual, opening and closing other connections to it on a per-request basis.
SOS: Keep in mind I have only tested this in a single-threaded script, to check whether a new sqlite3.connect(':memory:') connects to the same database we have already loaded (it does).
I do not know how well it would play with flask's threads, or with SQLite itself.
UPDATE:
Here's my approach, more info below:
import sqlite3

class db_test:
    # DOES NOT INCLUDE LOADING THE FILE TO MEMORY AND VICE VERSA
    # (out of the scope of the question)
    def __init__(self):
        self.db = sqlite3.connect(":memory:", check_same_thread=False)

    def execute_insert(self, query: str, data: tuple):
        cur = self.db.cursor()
        with self.db:  # commits on success, rolls back on exception
            cur.execute(query, data)
        cur.close()
The above class is instantiated once at the beginning of my flask app, right after the imports, like so:
from classes import db_test
db = db_test()
This avoids garbage collection.
To use it, simply call it where needed, like so:
@app.route("/db_test")
def db_test():
    db.execute_insert("INSERT INTO table (entry) VALUES (?)", ('hello',))
    return render_template("db_test.html")
Notes:
You might have noticed the 2nd argument in self.db = sqlite3.connect(":memory:", check_same_thread=False). This makes it possible to use connections and cursors created in different threads (as flask does), but at the risk of collisions and corrupted data/entries.
From my understanding (regarding my setup, flask -> waitress -> nginx), unless explicitly set to some multithreaded/multiprocessing mode, flask will process each request start-to-finish and then proceed to the next, which makes the above danger irrelevant.
I set up a rudimentary test to see if my theory holds up: insert an incremental number every time the page is requested. I then spammed refresh on a PC, a laptop and a mobile. The resulting 164 entries were checked for integrity manually and passed.
Finally: keep in mind that I might be missing something, that my methodology is not that of a stress test, and that there may be differences between our setups.
Hope this helps!
PS: The first approach I suggested could not be replicated inside flask. I suspect that is due to flask's thread activity.
I want to execute multiple queries without each blocking the other. I created multiple cursors and did the following, but got: mysql.connector.errors.OperationalError: 2013 (HY000): Lost connection to MySQL server during query
import mysql.connector as mc
from threading import Thread
conn = mc.connect()  # ...username, password
cur1 = conn.cursor()
cur2 = conn.cursor()
e1 = Thread(target=cur1.execute, args=("do sleep(30)",)) # A 'time taking' task
e2 = Thread(target=cur2.execute, args=("show databases",)) # A simple task
e1.start()
e2.start()
But I got that OperationalError. Reading a few other questions, some suggest that using multiple connections is better than using multiple cursors. So should I use multiple connections?
I don't have the full context of your situation to understand the performance considerations. Yes, starting a new connection could be considered heavy if you are operating under strict timing constraints that are short relative to the time it takes to start a new connection and you were forced to do that for every query...
But you can mitigate that with a shared connection pool that you create ahead of time, and then distribute your queries (in separate threads) over those connections as resources allow.
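For instance, mysql.connector ships with a simple built-in pool; a minimal sketch (the pool size and connection parameters here are placeholders):

from threading import Thread
from mysql.connector import pooling

# Created once, up front; the credentials here are placeholders.
pool = pooling.MySQLConnectionPool(pool_name="mypool", pool_size=5,
                                   host="localhost", user="user",
                                   password="password", database="test")

def run_query(sql):
    conn = pool.get_connection()  # check a connection out of the pool
    try:
        cur = conn.cursor()
        cur.execute(sql)
        if cur.with_rows:
            cur.fetchall()  # consume any result set
        cur.close()
    finally:
        conn.close()  # returns the connection to the pool

Thread(target=run_query, args=("do sleep(30)",)).start()
Thread(target=run_query, args=("show databases",)).start()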
On the other hand, if all of your query times are fairly long relative to the time it takes to create a new connection, and you aren't looking to run more than a handful of queries in parallel, then it can be a reasonable option to create connections on demand. Just be aware that you will run into limits with the number of open connections if you try to go too far, as well as resource limitations on the database system itself. You probably don't want to do something like that against a shared database. Again, this is only a reasonable option within some very specific contexts.
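A connection-per-thread version of your example would look something like this (still just a sketch, with the connection parameters omitted as in your snippet):

import mysql.connector as mc
from threading import Thread

def run_query(sql):
    # Each thread opens and closes its own connection on demand.
    conn = mc.connect()  # ...username, password
    cur = conn.cursor()
    cur.execute(sql)
    if cur.with_rows:
        cur.fetchall()  # consume any result set
    cur.close()
    conn.close()

e1 = Thread(target=run_query, args=("do sleep(30)",))   # the 'time taking' task
e2 = Thread(target=run_query, args=("show databases",))  # the simple task
e1.start()
e2.start()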
I'm playing around with SQLAlchemy core in Python, and I've read over the documentation numerous times and still need clarification about engine.execute() vs connection.execute().
As I understand it, engine.execute() is the same as doing connection.execute(), followed by connection.close().
The tutorials I've followed led me to use this in my code:
Initial setup in script
try:
    engine = db.create_engine("postgres://user:pass@ip/dbname", connect_args={'connect_timeout': 5})
    connection = engine.connect()
    metadata = db.MetaData()
except exc.OperationalError:
    print_error(f":: Could not connect to {db_ip}!")
    sys.exit()
Then, I have functions that handle my database access, for example:
def add_user(a_username):
    query = db.insert(table_users).values(username=a_username)
    connection.execute(query)
Am I supposed to be calling connection.close() before my script ends? Or is that handled efficiently enough by itself? Would I be better off closing the connection at the end of add_user(), or is that inefficient?
If I do need to be calling connection.close() before the script ends, does that mean interrupting the script will cause hanging connections on my Postgres DB?
I found this post helpful to better understand the different interaction paradigms in sqlalchemy, in case you haven't read it yet.
Regarding your question as to when to close your db connection: it is indeed very inefficient to create and close connections for every statement execution. However, you should make sure that your application does not have connection leaks in its global flow.
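For example, one way to avoid leaks (a sketch reusing the engine and table_users from your question) is to let the engine's built-in pool hand out a connection per unit of work instead of holding one global connection open:

def add_user(a_username):
    query = db.insert(table_users).values(username=a_username)
    # engine.begin() checks a connection out of the engine's pool, wraps the
    # work in a transaction, commits on success (rolls back on error), and
    # returns the connection to the pool when the block exits.
    with engine.begin() as conn:
        conn.execute(query)

With this pattern there is no long-lived connection object you have to remember to close before the script ends.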
[Python/MySQLdb] - CentOS - Linux - VPS
I have a page that parses a large file and queries the database up to 100 times for each run. The database is pretty large and I'm trying to reduce the execution time of this script.
My SQL functions are inside a class, currently the connection object is a class variable created when the class is instantiated. I have various fetch and query functions that create a cursor from the connection object every time they are called. Would it be faster to create the cursor when the connection object is created and reuse it or would it be better practice to create the cursor every time it's called?
import MySQLdb as mdb

class parse:
    con = mdb.connect( server, username, password, dbname )
    #cur = con.cursor() ## create here?

    def q( self, q ):
        cur = self.con.cursor() ## it's currently here
        cur.execute( q )
Any other suggestions on how to speed up the script are welcome too. The insert statement is the same for all the queries in the script.
Opening and closing connections is never free, it always wastes some amount of performance.
The reason you wouldn't want to just leave the connection open is that if two requests were to come in at the same time the second request would have to wait till the first request had finished before it could do any work.
One way to solve this is to use connection pooling. You create a bunch of open connections and then reuse them. Every time you need to do a query, you check a connection out of the pool, perform the request, and then put it back into the pool.
Setting all this up can be quite tedious, so I would recommend using SQLAlchemy. It has built in connection pooling, relatively low overhead and supports MySQL.
Since you care about speed, I would use only the core part of SQLAlchemy, since the ORM part is a bit slower.
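A rough sketch of what that could look like with SQLAlchemy core (the connection URL, pool sizes and query are placeholders):

from sqlalchemy import create_engine, text

# The engine keeps a pool of open MySQL connections behind the scenes.
engine = create_engine("mysql+mysqldb://username:password@server/dbname",
                       pool_size=5, max_overflow=10)

def q(sql, **params):
    # Checks a connection out of the pool, runs the statement inside a
    # transaction (committed on success), and hands the connection back.
    with engine.begin() as conn:
        result = conn.execute(text(sql), params)
        return result.fetchall() if result.returns_rows else None

# e.g. q("SELECT * FROM some_table WHERE id = :id", id=1)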
I'm using psycopg2 for the cherrypy app I'm currently working on, and the CLI & phpgadmin to handle some operations manually. Here's the Python code:
#One connection per thread
cherrypy.thread_data.pgconn = psycopg2.connect("...")
...
#Later, an object is created by a thread :
class dbobj(object):
    def __init__(self):
        self.connection = cherrypy.thread_data.pgconn
        self.curs = self.connection.cursor(cursor_factory=psycopg2.extras.DictCursor)
...
#Then,
try:
    blabla
    self.curs.execute(...)
    self.connection.commit()
except:
    self.connection.rollback()
    lalala
...
#Finally, the destructor is called :
def __del__(self):
    self.curs.close()
I'm having a problem with either psycopg or postgres (although I think the latter is more likely). After having sent a few queries, my connections drop dead. Similarly, phpgadmin usually gets dropped as well; it prompts me to reconnect after having made several requests. Only the CLI remains persistent.
The problem is, these happen very randomly and I can't even track down what the cause is. I can either get locked down after a few page requests or never really encounter anything after having requested hundreds of pages. The only error I've found in the postgres log, after terminating the app, is:
...
LOG: unexpected EOF on client connection
LOG: could not send data to client: Broken pipe
LOG: unexpected EOF on client connection
...
I thought of creating a new connection every time a new dbobj instance is created but I absolutely don't want to do this.
Also, I've read that one may run into similar problems unless all transactions are committed: I use the try/except block for every single INSERT/UPDATE query, but I never use it for SELECT queries, nor do I want to write even more boilerplate code (by the way, do they need to be committed?). Even if that's the case, why would phpgadmin close down?
max_connections is set to 100 in the .conf file, so I don't think that's the reason either. A single cherrypy worker has only 10 threads.
Does anyone have an idea where I should look first ?
Psycopg2 needs a commit or rollback after every transaction, including SELECT queries, or it leaves the connections "IDLE IN TRANSACTION". This is now a warning in the docs:
Warning: By default, any query execution, including a simple SELECT will start a transaction: for long-running programs, if no further action is taken, the session will remain “idle in transaction”, an undesirable condition for several reasons (locks are held by the session, tables bloat...). For long lived scripts, either ensure to terminate a transaction as soon as possible or use an autocommit connection.
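In other words, either finish each transaction explicitly or turn on autocommit right after connecting; for example (using conn and curs as stand-ins for your connection and cursor objects):

# Option 1: end the transaction yourself after every statement, SELECTs included.
curs.execute("SELECT 1")
curs.fetchall()
conn.commit()  # or conn.rollback()

# Option 2: let psycopg2 commit each statement implicitly.
conn.autocommit = True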
It's a bit difficult to see exactly where you're populating and accessing cherrypy.thread_data. I'd recommend investigating psycopg2.pool.ThreadedConnectionPool instead of trying to bind one conn to each thread yourself.
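A minimal sketch of that (the DSN and pool bounds are placeholders):

from psycopg2.pool import ThreadedConnectionPool
import psycopg2.extras

pool = ThreadedConnectionPool(minconn=1, maxconn=10, dsn="...")

def run_query(sql, params=None):
    # Borrow a connection, finish the transaction either way, and
    # always hand the connection back to the pool.
    conn = pool.getconn()
    try:
        with conn.cursor(cursor_factory=psycopg2.extras.DictCursor) as curs:
            curs.execute(sql, params)
            rows = curs.fetchall() if curs.description else None
        conn.commit()
        return rows
    except Exception:
        conn.rollback()
        raise
    finally:
        pool.putconn(conn)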
Even though I don't have any idea why successful SELECT queries would block the connection, adding .commit() after pretty much every single query that doesn't have to work in conjunction with another solved the problem.