I'm currently using mysql.connector in a Python Flask project and, after users enter their information, the following query is executed:
"SELECT first, last, email, {} FROM {} WHERE {} <= {} AND ispaired IS NULL".format(key, db, class_data[key], key)
It would pose a problem if this query was executed in 2 threads concurrently, and returned the same row in both threads. I was wondering if there was a way to prevent SELECT mysql queries from executing concurrently, or if this was already the default behavior of mysql.connector? For additional information, all mysql.connector queries are executed after being authenticated with the same account credentials.
It is hard to say from your description, but if you're using Flask, you're most probably using (or will use in production) multiple processes, and you probably have a connection pool (i.e. multiple connections) in each process. So while each connection executes its queries sequentially, this query can be run concurrently by multiple connections at the same time.
To prevent your application from obtaining the same row at the same time while handling different requests, you should use transactions and techniques like SELECT FOR UPDATE. The exact solution depends on your exact use case.
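For illustration, here is a minimal sketch of that pattern with mysql.connector, assuming an InnoDB table with the ispaired column from your query (the table name, other column names, and credentials are placeholders):

import mysql.connector

conn = mysql.connector.connect(user="...", password="...", database="...")
cur = conn.cursor()

# FOR UPDATE locks the selected row until COMMIT/ROLLBACK, so a second
# transaction running the same statement blocks instead of getting the same row.
conn.start_transaction()
cur.execute(
    "SELECT first, last, email FROM some_table "
    "WHERE ispaired IS NULL LIMIT 1 FOR UPDATE"
)
row = cur.fetchone()
if row:
    # Mark the row as taken while we still hold the lock.
    cur.execute("UPDATE some_table SET ispaired = 1 WHERE email = %s", (row[2],))
conn.commit()  # releases the row lock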
I want to execute multiple queries without each blocking the other. I created multiple cursors on the same connection and did the following, but got mysql.connector.errors.OperationalError: 2013 (HY000): Lost connection to MySQL server during query
import mysql.connector as mc
from threading import Thread
conn = mc.connect()  # ...username, password
cur1 = conn.cursor()
cur2 = conn.cursor()
e1 = Thread(target=cur1.execute, args=("do sleep(30)",)) # A 'time taking' task
e2 = Thread(target=cur2.execute, args=("show databases",)) # A simple task
e1.start()
e2.start()
But I got that OperationalError. Reading a few other questions, some answers suggest that using multiple connections is better than multiple cursors. So should I use multiple connections?
I don't have the full context of your situation to judge the performance considerations. Yes, starting a new connection could be considered heavy if your timing constraints are short relative to the time it takes to open a connection and you were forced to do that for every query...
But you can mitigate that with a shared connection pool that you create ahead of time, and then distribute your queries (in separate threads) over those connections as resources allow.
On the other hand, if all of your query times are fairly long relative to the time it takes to create a new connection, and you aren't looking to run more than a handful of queries in parallel, then it can be a reasonable option to create connections on demand. Just be aware that you will run into limits with the number of open connections if you try to go too far, as well as resource limitations on the database system itself. You probably don't want to do something like that against a shared database. Again, this is only a reasonable option within some very specific contexts.
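As a rough sketch of the shared-pool approach (the connection parameters below are placeholders), mysql.connector ships its own pooling module, so the two threads from the question can each run on their own connection:

import mysql.connector.pooling
from threading import Thread

pool = mysql.connector.pooling.MySQLConnectionPool(
    pool_name="app_pool",
    pool_size=4,
    user="...", password="...", host="localhost", database="test",
)

def run_query(sql):
    conn = pool.get_connection()   # raises PoolError if the pool is exhausted
    try:
        cur = conn.cursor()
        cur.execute(sql)
        if cur.with_rows:          # only fetch when the statement returned rows
            cur.fetchall()
    finally:
        conn.close()               # returns the connection to the pool

Thread(target=run_query, args=("do sleep(30)",)).start()
Thread(target=run_query, args=("show databases",)).start()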
I am using the Snowflake database-as-a-service to store and process our data. Because we handle huge amounts of data, I want to run a query, get the query ID, and let the query execute asynchronously. Another part of the system will monitor the status of the query by checking the query history table using that query ID.
I am using the Snowflake Python Connector.
Here is a sample of what I have so far:
from __future__ import print_function
import io, os, sys, time, datetime

modules_path = os.path.join(os.path.dirname(__file__), 'modules')
sys.path.append(modules_path)

import snowflake.connector

def async_query(data):
    connection = snowflake.connector.connect(
        user=data['user'],
        password=data['password'],
        account=data['account'],
        region=data['region'],
        database=data['database'],
        warehouse=data['warehouse'],
        schema=data['schema']
    )
    cursor = connection.cursor()
    cursor.execute(data['query'], _no_results=True)
    print(cursor.sfqid)
    return cursor.sfqid
This piece of code seems to be working, i.e. I am getting the query ID, but there is one problem: the SQL query fails with the error "SQL execution canceled." in Snowflake. If I remove the _no_results=True parameter, the query works fine, but then I have to wait for it to complete, which is not the desired behaviour.
Any ideas what is causing the "SQL execution canceled" failure?
A bit more info: the reason I don't want to wait for it is that I am running the code on AWS Lambda, and Lambdas have a maximum running time of 5 minutes.
If _no_results=True is not specified, the execution is synchronous, so the application has to wait for the query to finish. If it is specified, the query becomes asynchronous, so the application continues running, but the connection's destructor closes the session at the end, and all active queries are canceled. It seems that's the cause of "SQL execution canceled".
AWS Lambda limits the execution time to 5 minutes, so if the query takes longer than the limit, it won't work.
By the way, _no_results=True is an internal parameter used for SnowSQL, and its behavior is subject to change in the future.
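As a sketch of the monitoring side you describe, a separate invocation could look up the query by ID in the QUERY_HISTORY table function. The function and column names below come from Snowflake's INFORMATION_SCHEMA; treat the exact query as an assumption to verify against your account:

import snowflake.connector

def query_status(conn_params, query_id):
    # Open a fresh, short-lived session just to check on the earlier query.
    connection = snowflake.connector.connect(**conn_params)
    try:
        cursor = connection.cursor()
        cursor.execute(
            "SELECT execution_status "
            "FROM TABLE(information_schema.query_history()) "
            "WHERE query_id = %s",
            (query_id,),
        )
        row = cursor.fetchone()
        return row[0] if row else None   # e.g. RUNNING, SUCCESS, FAILED_WITH_ERROR
    finally:
        connection.close()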
I'm new to Flask/Gunicorn and have a very basic understanding of SQL.
I have a Flask app that connects to a remote Oracle database with cx_Oracle. Depending on the app route selected, it runs one of two queries. I run the app using gunicorn -w 4 flask:app. The first query is a simple query on a table with ~70,000 rows and is very responsive. The second one is more complex and queries several tables, one of which contains ~150 million rows. By sprinkling print statements around, I notice that the second query sometimes never even starts, especially if it is not the first app route selected and both need to run concurrently. Opening app.route('/') multiple times triggers its query multiple times quickly and runs it in parallel, but not app.route('/2'). I have multiple workers enabled and threaded=True for Oracle. Why is this happening? Is it doomed to be slow or downright unresponsive due to the size of the table?
import cx_Oracle
from flask import Flask
import pandas as pd

app = Flask(__name__)
connection = cx_Oracle.connect("name", "pwd", threaded=True)

@app.route('/')
def Q1():
    print("start q1")
    querystring = """select to_char(to_date(col1,'mm/dd/yy'),'Month'), sum(col2)
                     FROM tbl1"""
    df = pd.read_sql(querystring, con=connection)
    print("q1 complete")
    return df.to_html()  # a Flask view has to return a response

@app.route('/2')
def Q2():
    print("start q2")
    querystring = """select tbl2.col1,
                            tbl2.col2,
                            tbl3.col3
                     FROM tbl2 INNER JOIN
                          tbl3 ON tbl2.col1 = tbl3.col1
                     WHERE tbl2.col2 like 'X%' AND
                           tbl2.col4 >= 20180101"""
    df = pd.read_sql(querystring, con=connection)
    print("q2 complete")
    return df.to_html()
I have tried exporting the datasets for each query as CSVs and having pandas read the CSVs instead; in that scenario, both reads can run concurrently without missing a beat. Is this a SQL issue, a thread issue, or a worker issue?
Be aware that a connection can only process one thing at a time. If the connection is busy executing one of the queries, it can't execute the other one. Once execution is complete and fetching has begun, the two can operate together, but each one has to wait for the other to complete its fetch operation before it can begin its own. To get around this you should use a session pool (http://cx-oracle.readthedocs.io/en/latest/module.html#cx_Oracle.SessionPool) and then, in each of your routes, add this code:
connection = pool.acquire()
None of that will help the performance of the slow query, but at least it will stop it from interfering with the other one!
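A minimal sketch of what that might look like in the Flask app, assuming the same placeholder credentials plus a dsn argument (adjust the pool sizes to taste):

import cx_Oracle
import pandas as pd
from flask import Flask

app = Flask(__name__)

# One pool created at startup; each request borrows its own session from it.
pool = cx_Oracle.SessionPool("name", "pwd", "dsn",
                             min=2, max=4, increment=1, threaded=True)

@app.route('/')
def Q1():
    connection = pool.acquire()
    try:
        df = pd.read_sql("select col1, col2 from tbl1", con=connection)
        return df.to_html()
    finally:
        pool.release(connection)   # hand the session back to the pool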
I am running two Python programs in parallel on one CPU, both of which use the same SQLite database. I am handling the SQLite database with SQLAlchemy, and my understanding is that SQLAlchemy handles all the database threading issues within one app. My question is how to handle access from the two different apps?
One of my two programs is a flask application and the other is a cronjob which updates the database from time to time.
It seems that even read-only tasks on the sqlite database lock the database, meaning that if both apps want to read or write at the same time I get an error.
OperationalError: (sqlite3.OperationalError) database is locked
Let's assume that my cronjob app runs every 5 minutes. How can I make sure there are no collisions between my two apps? I could write some flag to a file and check it before accessing the database, but it seems to me there should be a standard way to do this?
Furthermore, I am running my app with gunicorn, and in principle it is possible to have multiple jobs running... so eventually I will want more than 2 parallel jobs for my Flask app...
It's true. SQLite isn't built for this kind of application. SQLite is really for lightweight, single-threaded, single-instance applications.
SQLite connections are one per instance, and while you could get into some kind of threaded multiplexer (see https://www.sqlite.org/threadsafe.html), it's more trouble than it's worth. There are other solutions that provide that functionality; take a look at PostgreSQL or MySQL. Those databases are open source, well documented, well supported, and support the kind of concurrency you need.
I'm not sure how SQLAlchemy handles connections, but if you were using the Peewee ORM the solution is quite simple.
When your Flask app receives a request, open a connection to the DB. Then, when Flask sends the response, close the connection.
Similarly, in your cron script, open a connection when you start to use the DB, then close it when the process is finished.
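Here is a rough sketch of that request-scoped pattern with plain sqlite3 and Flask's application context (app.db is a placeholder path); the same idea applies if you hand the connection to SQLAlchemy or Peewee:

import sqlite3
from flask import Flask, g

app = Flask(__name__)

def get_db():
    # Lazily open one connection per request and stash it on flask.g.
    if "db" not in g:
        g.db = sqlite3.connect("app.db")
    return g.db

@app.teardown_appcontext
def close_db(exc):
    # Close the connection as soon as the request is done, so the lock is released.
    db = g.pop("db", None)
    if db is not None:
        db.close()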
Another thing you might consider is using SQLite in WAL mode. This can improve concurrency. You set the journaling mode with a PRAGMA query when you open your connection.
For more info, see http://charlesleifer.com/blog/sqlite-small-fast-reliable-choose-any-three-/
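A minimal sketch of switching on WAL mode (the journal mode is persistent, so it only needs to be set once per database file, but running the PRAGMA on every connect is harmless):

import sqlite3

conn = sqlite3.connect("app.db")
# In WAL mode readers no longer block the writer (and vice versa),
# which helps when the Flask app and the cron job overlap.
conn.execute("PRAGMA journal_mode=WAL")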
I'm using the sqlite3 Python module to write the results from batch jobs to a common .db file. I chose SQLite because multiple processes may try to write at the same time, and as I understand it SQLite should handle this well. What I'm unsure of is what happens when multiple processes finish and try to write at the same time. So if several processes that look like this
from sqlite3 import connect

conn = connect('test.db')
with conn:
    for v in xrange(10):
        tup = (str(v), v)
        conn.execute("insert into sometable values (?,?)", tup)
execute at once, will they throw an exception? Wait politely for the other processes to write? Is there some better way to do this?
The sqlite library will lock the database per process when writing to the database, and each process will wait for the lock to be released to get its turn.
The database doesn't need to be written to until commit time however. You are using the connection as a context manager (good!) so the commit takes place after your loop has completed and all insert statements have been executed.
If your database has uniqueness constraints in place, it may be that the commit fails because one process has already added rows that another process conflicts with.
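A small sketch of what that looks like from the caller's side, using the table from the question (sometable is assumed to have a uniqueness constraint for the IntegrityError case to apply):

import sqlite3

conn = sqlite3.connect("test.db")
try:
    # As a context manager, the connection commits on success and rolls back on error.
    with conn:
        for v in range(10):
            conn.execute("insert into sometable values (?, ?)", (str(v), v))
except sqlite3.IntegrityError:
    # Another process already committed a conflicting row; this batch was rolled back.
    pass
except sqlite3.OperationalError:
    # "database is locked": another writer held the lock longer than the busy timeout.
    pass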
If each process holds its own connection then it should be fine. What will happen is that when writing, a process will lock the DB, so all other processes will block. They will throw an exception if the timeout to wait for the DB to become free is exceeded. The timeout can be configured through the connect call: http://docs.python.org/2/library/sqlite3.html#sqlite3.connect
It is not recommended to keep your DB file on a network share.
Update:
You may also want to check the isolation level: http://docs.python.org/2/library/sqlite3.html#sqlite3.Connection.isolation_level
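For example, a minimal sketch of raising the busy timeout (the default is 5 seconds):

import sqlite3

# Wait up to 30 seconds for another process to release its lock before
# raising "database is locked".
conn = sqlite3.connect("test.db", timeout=30.0)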
The good news is that the SQLite library implicitly uses a transaction that locks the database whenever executing a DML statement. This means that other concurrent accesses to the database will wait until the executing DML request completes by committing or rolling back its transaction. Note however that multiple processes can perform SELECT at the same time.
Also, please refer to the Python sqlite3 module documentation, section 11.13.6 - Controlling Transactions, which details how transactions can be controlled.
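As a sketch of the explicit control that section describes, you can disable the module's implicit transaction handling and issue BEGIN/COMMIT yourself (the table and values here are illustrative):

import sqlite3

# isolation_level=None puts the module in autocommit mode, so you decide
# exactly when the write transaction (and therefore the lock) starts and ends.
conn = sqlite3.connect("test.db", isolation_level=None)
conn.execute("BEGIN IMMEDIATE")   # take the write lock up front
conn.execute("insert into sometable values (?, ?)", ("a", 1))
conn.execute("COMMIT")            # release the lock for the other processes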