I have recently been exploring the Tornado web framework to serve a large number of persistent connections from many different clients.
I have a request handler that basically takes an RSA encrypted string and decrypts it. The decrypted text is an XML string that gets parsed by a SAX document handler that I have written. Everything works perfectly fine and the execution time (per HTTP request) was roughly 100 milliseconds (with decryption and parsing).
The XML contains the Username and Password hash of the user. I want to connect to a MySQL server to verify that the username matches the password hash supplied by the application.
When I add basically the following code:
import MySQLdb

conn = MySQLdb.connect(host="192.168.1.12",
                       user="<useraccount>",
                       passwd="<Password>",
                       db="<dbname>")
cursor = conn.cursor()

# escape the values pulled out of the decrypted XML before building the query
safe_username = MySQLdb.escape_string(XMLLoginMessage.username)
safe_pass_hash = MySQLdb.escape_string(XMLLoginMessage.pass_hash)

sql = "SELECT * FROM `mrad`.`users` WHERE `username` = '" + safe_username + \
      "' AND `password` = '" + safe_pass_hash + "' LIMIT 1;"
cursor.execute(sql)

cursor.close()
conn.close()
The time it takes to execute the HTTP request shoots up to 4-5 seconds! I believe the extra time is spent establishing the connection to the MySQL database server itself.
My question is how can I speed this up? Can I declare the MySQL connection in the global scope and access it in the request handlers by creating a new cursor, or will that run into concurrency issues because of the asynchronous design of Tornado?
Basically, how can I avoid opening a new connection to the MySQL server on EVERY HTTP request, so that each request takes a fraction of a second instead of multiple seconds?
Also, please note that the SQL server is actually on the same physical machine as the Tornado web server instance.
Update
I just ran the same simple MySQL code shown above through a profiler.
The call to the __init__ function in connections.py alone took 4.944 seconds to execute. That doesn't seem right, does it?
Update 2
I think that running with one connection (or even a few with a very simple DB conn pool) will be fast enough to handle the throughput I'm expecting per tornado web server instance.
If 1,000 clients need to access a query, with typical query times in the thousandths of a second, the unluckiest client would only have to wait about one second to retrieve the data.
Consider SQLAlchemy, which provides a nicer abstraction over DBAPI and also provides connection pooling, etc. (You can happily ignore its ORM and just use the SQL-toolkit)
(Also, you're not doing blocking database calls in the asynchronous request handlers, are you?)
An SQL connection should not take 5 seconds. Try skipping the query entirely and see if that improves your performance - it should.
The MySQLdb module has a threadsafety of 1, which means the module is thread safe but connections cannot be shared among threads. You can implement a connection pool as an alternative (a minimal sketch follows the query example below).
Lastly, the DB-API has a parameter substitution form for queries, so you don't have to concatenate and escape the parameters manually (MySQLdb uses %s placeholders):
cur.execute("SELECT * FROM blach WHERE x = %s AND y = %s", (x, y))
Declare it in the base handler, it will be called once per application.
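For the "declare it once per application" approach, a rough sketch in Tornado might look like the following (handler names and request arguments are illustrative; in the real app the username and password hash come out of the decrypted XML, and the queries are still blocking calls on the IOLoop thread):

import MySQLdb
import tornado.ioloop
import tornado.web

class BaseHandler(tornado.web.RequestHandler):
    @property
    def db(self):
        # every handler reaches the single shared connection through the Application
        return self.application.db

class LoginHandler(BaseHandler):
    def post(self):
        # illustrative: in the real app these come from the decrypted XML message
        username = self.get_argument("username")
        pass_hash = self.get_argument("pass_hash")
        cursor = self.db.cursor()
        try:
            cursor.execute("SELECT * FROM `mrad`.`users` "
                           "WHERE `username` = %s AND `password` = %s LIMIT 1",
                           (username, pass_hash))
            row = cursor.fetchone()
        finally:
            cursor.close()
        self.write("ok" if row else "denied")

class Application(tornado.web.Application):
    def __init__(self):
        tornado.web.Application.__init__(self, [(r"/login", LoginHandler)])
        # one connection for the whole process, created once at startup;
        # MySQLdb connections must not be shared across threads, but Tornado
        # handlers all run on the single IOLoop thread, so serial reuse is fine
        self.db = MySQLdb.connect(host="192.168.1.12", user="<useraccount>",
                                  passwd="<Password>", db="<dbname>")

if __name__ == "__main__":
    Application().listen(8888)
    tornado.ioloop.IOLoop.instance().start()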
Related
I'm migrating tasks that manage BigQuery tables from Google's BigQuery Python client library to their SQLAlchemy plugin for the BigQuery dialect. I've run into several issues while migrating some of the logic:
Query configuration -- in the original Python client, the user can pass a configuration instance (e.g., QueryJobConfig, LoadJobConfig, etc.) when issuing a query through the client instance. The SQLAlchemy plugin's documentation shows how to pass configuration information when initializing the Engine, but says nothing about running a query through Connection.execute(), Session.query(), or similar methods. Is it possible to pass a query configuration the way the client library does when issuing queries through SQLAlchemy Core/ORM?
Example 1:
Say I want to query a dataset in location us-west1, but my default location setup is us-west3. I also want to time out when there's no response within a minute. In the original library, I can pass the location when calling Client.query(), and set the timeout when collecting the job's results through result() --
from google.cloud import bigquery
client = bigquery.Client(project='my-project-id', location='us-west3')
query_job = client.query('SELECT id, name FROM my_dataset.my_table;', location='us-west1')
result_set = query_job.result(timeout=60)
But native SQLAlchemy doesn't provide these options when issuing queries.
Example 2:
I want to label a query job through QueryJobConfig, so that later I can query analytics of similar jobs from system table.
from google.cloud import bigquery
client = bigquery.Client(project='my-project-id', location='us-west3')
config = bigquery.job.QueryJobConfig(labels={"query_category": "common"})
query_job = client.query('SELECT id, name FROM my_dataset.my_table;', job_config=config, location='us-west1')
result_set = query_job.result()
Again, the SQLAlchemy plugin's documentation doesn't mention how to pass a configuration when issuing queries. The only place it documents for setting up a job config is create_engine(), for providing default job settings.
Accessing the underlying query job instance, or at least the job ID, during query execution.
For example, if under some circumstance I need to cancel a query job, with the original Python client I can simply call cancel() on the job instance. At the DB-API / SQLAlchemy end I can't do that by just calling close() on the connection/cursor instance, because judging from the underlying implementation they only mark the connection/cursor as closed without touching the ongoing job. It would therefore be helpful to access the query job directly, or at least get the job ID so I can spin up a native BigQuery client and grab the details.
Example: Say I don't want to rely on the client library's timeout mechanism and would rather cancel the request myself. In the client library, you can send a cancel request through the QueryJob instance:
from time import sleep
from google.cloud import bigquery

client = bigquery.Client(project='my-project-id', location='us-west3')

long_query = ...  # some long-running query
query_job = client.query(long_query)  # with no timeout
sleep(10)
if not query_job.done():
    query_job.cancel()
However, SQLAlchemy doesn't offer a way to cancel a running query, so you need access to the underlying job details (either the job instance or the job ID) to cancel the query elsewhere.
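For the job-access part, one possible workaround is to drop down to the raw DB-API connection that SQLAlchemy pools. This is only a sketch under the assumption that the BigQuery DB-API cursor exposes its running job (some versions expose a query_job attribute; treat that as an assumption and check your installed version):

from sqlalchemy import create_engine

engine = create_engine('bigquery://my-project-id')

raw_conn = engine.raw_connection()        # DB-API connection from SQLAlchemy's pool
try:
    cursor = raw_conn.cursor()
    cursor.execute('SELECT id, name FROM my_dataset.my_table')
    # Assumption: the google-cloud-bigquery DB-API cursor keeps a reference
    # to its QueryJob; if it does, you can grab the job ID or cancel from here.
    job = getattr(cursor, 'query_job', None)
    if job is not None:
        print(job.job_id)                 # hand this to a native bigquery.Client if needed
        # job.cancel()                    # e.g. cancel yourself instead of relying on a timeout
    rows = cursor.fetchall()
finally:
    raw_conn.close()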
I have an MS-SQL deployed on AWS RDS, that I'm writing a Flask front end for.
I've been following some intro Flask tutorials, all of which seem to pass the DB credentials in the connection string URI. I'm following the tutorial here:
https://medium.com/@rodkey/deploying-a-flask-application-on-aws-a72daba6bb80#.e6b4mzs1l
For deployment, do I prompt for the DB login info and add it to the connection string? If so, where? Using SQLAlchemy, I don't see any calls to create_engine (using the code in the tutorial); I just see an initialization using config.from_object, referencing the config.py where SQLALCHEMY_DATABASE_URI is stored, which points to the DB location. Calling config.update(dict(UID='****', PASSWORD='******')) from my application has no effect, and the config dict doesn't seem to have any applicable entries to set for this purpose. What am I doing wrong?
Or should I be authenticating using Flask-User, and then get rid of the DB level authentication? I'd prefer authenticating at the DB layer, for ease of use.
The tutorial you are using relies on Flask-SQLAlchemy to abstract the database setup; that's why you don't see engine.connect().
Frameworks like Flask-SQLAlchemy are designed around the idea that you create a connection pool to the database at launch and share that pool among your various worker threads. You will not be able to use that for what you are doing... it takes care of initializing the session and related machinery early in the process.
Because of your requirements, I don't know that you'll be able to make any use of things like connection pooling. Instead, you'll have to handle that yourself. The actual connection isn't too hard...
engine = create_engine('dialect://username:password@host/db')
connection = engine.connect()
result = connection.execute("SOME SQL QUERY")
for row in result:
    print(row)  # do something with each row
connection.close()
The issue is that you're going to have to do that in every endpoint. A database connection isn't something you can store in the session - you'll have to store the credentials there and do a connect/disconnect cycle in every endpoint you write. Worse, you'll have to figure out either encrypted sessions or server-side sessions (without a DB connection!) to keep those stored credentials from becoming a horrible security leak.
I promise you, it will be easier both now and in the long run to figure out a simple way to authenticate users so that they can share a connection pool that is abstracted out of your app endpoints. But if you HAVE to do it this way, this is how you will do it. (make sure you are closing those connections every time!)
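For what it's worth, here is a rough sketch of that per-endpoint pattern, assuming the DB username and password were stashed in the signed session at login; the URI format, session keys, and query are illustrative, not from the tutorial:

from flask import Flask, session
from sqlalchemy import create_engine

app = Flask(__name__)
app.secret_key = 'change-me'   # required for Flask's signed cookie sessions

@app.route('/tables')
def list_tables():
    # rebuild a throwaway engine from per-user credentials on every request
    uri = 'mssql+pyodbc://{user}:{pwd}@my_dsn'.format(
        user=session['db_user'], pwd=session['db_password'])
    engine = create_engine(uri)
    connection = engine.connect()
    try:
        # string SQL works on SQLAlchemy 1.x; wrap in text() on 2.0
        result = connection.execute("SELECT name FROM sys.tables")
        names = [row[0] for row in result]
    finally:
        connection.close()
        engine.dispose()   # discard the pool; it is tied to this user's credentials
    return ', '.join(names)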
I'm working on an API with Flask and SQLAlchemy, and here's what I would like to do :
I have a client application, working on multiple tablets, that have to send several requests to add content to the server.
But I don't want to use auto rollback at the end of each API request (default behavior with flask-sqlalchemy), because the sending of data is done with multiple requests, like in this very simplified example :
1. beginTransaction/?id=transactionId -> opens a new session for the client making that request. SessionManager.new_session() in the code below.
2. addObject/?id=objectAid -> adds an object to the PostgreSQL database and flushes
3. addObject/?id=objectBid -> adds an object to the PostgreSQL database and flushes
4. commitTransaction/?id=transactionId -> commits what happened since the beginTransaction. SessionManager.commit() in the code below.
The point here is to not add the data to the server if the client app crashed or lost its connection before the "commitTransaction" was sent, thus preventing incomplete data from ending up on the server.
Since I don't want to use auto rollback I can't really use Flask-SQLAlchemy, so I'm wiring SQLAlchemy into my Flask application myself, but I'm not sure how to handle the sessions.
Here's the implementation I did in __init__.py:
from sqlalchemy import create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import scoped_session, sessionmaker

db = create_engine('postgresql+psycopg2://admin:pwd@localhost/postgresqlddb',
                   pool_reset_on_return=False,
                   echo=True, pool_size=20, max_overflow=5)

Base = declarative_base()
metadata = Base.metadata
metadata.bind = db

# create a configured "Session" class
Session = scoped_session(sessionmaker(bind=db, autoflush=False))


class SessionManager(object):
    currentSession = Session()

    @staticmethod
    def new_session():
        # if a session is already opened by the client, close it,
        # then create a new session
        try:
            SessionManager.currentSession.rollback()
            SessionManager.currentSession.close()
        except Exception as e:
            print(e)
        SessionManager.currentSession = Session()
        return SessionManager.currentSession

    @staticmethod
    def flush():
        try:
            SessionManager.currentSession.flush()
            return True
        except Exception as e:
            print(e)
            SessionManager.currentSession.rollback()
            return False

    @staticmethod
    def commit():
        # commit and close the session, then create a new session in case the
        # client makes a single request without calling beginTransaction/
        try:
            SessionManager.currentSession.commit()
            SessionManager.currentSession.close()
            SessionManager.currentSession = Session()
            return True
        except Exception as e:
            print(e)
            SessionManager.currentSession.rollback()
            SessionManager.currentSession.close()
            SessionManager.currentSession = Session()
            return False
But now the API doesn't work when several clients make requests at the same time; it seems like every client shares the same session.
How should I implement the sessions so that each client gets its own session and can make requests concurrently?
Thank you.
You seem to want several HTTP requests to share one transaction. That's impossible - it's incompatible with the stateless nature of HTTP.
Consider, for example, a client that opens a transaction and then fails to close it because it has lost connectivity. The server has no way of knowing this and would leave the transaction open forever, possibly blocking other clients.
Using a transaction to bundle database requests is reasonable, for example, for performance when there's more than one write operation, or for keeping the database consistent. But it always has to be committed or rolled back within the same HTTP request that opened it.
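A rough sketch of that per-request pattern, where each API call does all of its writes and then commits or rolls back before returning (the endpoint name and the commented-out model usage are illustrative; the engine URL is the one from the question):

from flask import Flask
from sqlalchemy import create_engine
from sqlalchemy.orm import scoped_session, sessionmaker

app = Flask(__name__)
engine = create_engine('postgresql+psycopg2://admin:pwd@localhost/postgresqlddb')
Session = scoped_session(sessionmaker(bind=engine))

@app.route('/addObjects', methods=['POST'])
def add_objects():
    session = Session()
    try:
        # session.add(objectA); session.add(objectB)  # all writes for this request
        session.commit()      # both objects land, or neither does
        return 'ok'
    except Exception:
        session.rollback()    # a crash or lost connection mid-request leaves nothing behind
        raise
    finally:
        Session.remove()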
I know this is an old thread, but you can achieve this with djondb (a NoSQL database).
With djondb you can create transactions, and if something goes wrong - say you lose the connection - it doesn't matter: the transaction can stay open indefinitely without affecting performance or creating locks. djondb was built to support long-running transactions, so you can open a transaction, use it, commit it, roll it back, or simply discard it (close the connection and forget it was there), and it won't leave the database in an inconsistent state.
I know this may sound strange to relational folks, but that's the beauty of NoSQL: it creates new paradigms that support what SQL people say is impossible.
Hope this helps,
I've read many posts about this problem. My understanding is that the application has a setting saying how long to keep idle database connections before dropping them and creating new ones, and MySQL has a setting saying how long to keep idle connections. After a period of no site activity, MySQL times out the application's connections, but the application doesn't know this and still tries to use an existing connection, which fails. After the failure, the application drops the connection, makes a new one, and is then fine.
I have wait_timeout set to 10 seconds on my local MySQL server and pool_recycle set to 5 seconds in my locally running application. After 10 seconds of inactivity I make a request and still get this error. Making another request within the next 10 seconds is then fine; waiting longer than 10 seconds gives the error again.
Any thoughts?
mysql> SELECT @@global.wait_timeout\G
*************************** 1. row ***************************
@@global.wait_timeout: 10
1 row in set (0.00 sec)
.
sqlalchemy.twelvemt.pool_recycle = 5
.
engine = engine_from_config(settings, 'sqlalchemy.twelvemt.')
DBSession.configure(bind=engine)
.
OperationalError: (OperationalError) (2006, 'MySQL server has gone away') 'SELECT beaker_cache.data \nFROM beaker_cache \nWHERE beaker_cache.namespace = %s' ('7cd57e290c294c499e232f98354a1f70',)
It looks like the error you're getting is being thrown by your Beaker connection, not your DBSession connection - the pool_recycle option needs to be set for each connection.
Assuming you're configuring Beaker in your x.ini file, you can pass sqlalchemy options via session.sa.*, so session.sa.pool_recycle = 5
See http://docs.pylonsproject.org/projects/pylons-webframework/en/v0.9.7/sessions.html#sa
Try setting sqlalchemy.pool_recycle for your connection
I always add this to my config file when using MySQL:
sqlalchemy.pool_recycle = 3600
Without this I get "MySQL server has gone away" on the first request after any long pause in activity.
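If you build the engine in code rather than from the config file, the equivalent is just a create_engine argument; the URL below is illustrative, and pool_pre_ping is a later SQLAlchemy addition that also guards against stale connections (check that your version has it):

from sqlalchemy import create_engine

engine = create_engine(
    'mysql+mysqldb://user:pwd@localhost/mydb',   # illustrative URL
    pool_recycle=3600,      # recycle pooled connections before MySQL's wait_timeout
    pool_pre_ping=True,     # SQLAlchemy >= 1.2: test a connection before handing it out
)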
I fixed this by calling remove() on sessions after each request. You can do this by defining a global function:
def remove_session(request, response):
    request.dbsession.remove()
Afterwards, you set this function to be run by every class involving requests and a database session:
def __init__(self, request):
    request.dbsession = DBSession
    request.add_response_callback(remove_session)
This works because SQLAlchemy expects its users to handle the opening and closing of database sessions. More information can be found in the documentation.
I'm using SQLAlchemy in a WSGI Python web app to query the database. If I make two concurrent requests, the second request invariably throws an exception, with SQL Server stating:
[24000] [FreeTDS][SQL Server]Invalid cursor state (0) (SQLExecDirectW)
Unfortunately it looks like I can't use caching to prevent the additional requests to the database. Is there another way to resolve this issue, ideally using native Python libraries (i.e. not relying on another Python module)?
The only thing I can think of is using threads to put a lock on the function making the database queries, but I'm worried this will slow down the app.
Is there anything else that can be done? Is this a configuration issue?
I'm using FreeTDS v0.91 on a CentOS 5.9 server, connecting to MS SQL Server 2008.
The webapp is based on Paste.
Are your two concurrent requests using different database connections? DBAPI connections are not generally thread safe. At the ORM level, you'd make sure you're using a session per request, so that each request has its own Session and therefore its own dedicated DBAPI connection.
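A minimal sketch of that session-per-request idea in a bare WSGI app (the connection URL and query are placeholders; the Paste wiring is omitted):

from sqlalchemy import create_engine
from sqlalchemy.orm import scoped_session, sessionmaker

engine = create_engine('mssql+pyodbc://user:pwd@my_dsn')   # placeholder DSN
Session = scoped_session(sessionmaker(bind=engine))

def application(environ, start_response):
    session = Session()             # each thread/request gets its own Session + connection
    try:
        # string SQL works on SQLAlchemy 1.x; wrap in text() on 2.0
        rows = session.execute("SELECT 1").fetchall()
        start_response('200 OK', [('Content-Type', 'text/plain')])
        return [str(rows).encode('utf-8')]
    finally:
        Session.remove()            # give the connection back; nothing leaks across requests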