SQLAlchemy multithreading without Flask - Python

I want to access an SQLite database file from the main thread and a background thread. The problem is that no matter how I change my code, I always get the following error:
ProgrammingError: SQLite objects created in a thread can only be used in that same thread. The object was created in thread id -1250925472 and this is thread id -1225814016
My code looks something like this:
engine = create_engine('sqlite:///data/storage.sqlite', poolclass=NullPool)
Footprint.Base.metadata.create_all(engine)
session_factory = sessionmaker(bind=engine)
Session = scoped_session(session_factory)

def storeData(fp):
    s = Session()
    s.add(fp)
    s.commit()
Does anybody have an idea how to fix this annoying problem?

Related

What is the correct way of using Flask-Sqlalchemy with multiprocessing?

I have a route in my Flask app that spawns a process (using multiprocessing.Process) to do some background work. That process needs to be able to write to the database.
__init__.py:
from flask import Flask
from flask_sqlalchemy import SQLAlchemy
from project.config import Config

db = SQLAlchemy()

# app factory
def create_app(config_class=Config):
    app = Flask(__name__)
    app.config.from_object(Config)
    db.init_app(app)
    return app
And this is the relevant code that illustrates the way I'm spawning the process and using the db connection:
def worker(row_id):
    db_session = db.create_scoped_session()
    # Do stuff with db_session here
    db_session.close()

@app.route('/worker/<row_id>/start')
def start(row_id):
    p = Process(target=worker, args=(row_id,))
    p.start()
    return redirect('/')
The problem is that sometimes (not always) I get errors like this one:
sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) insufficient data in "D" message lost synchronization with server: got message type "a", length 1668573551
I assume that this is related to the fact that there is another process accessing the database (because if I don't use a separate process, everything works fine), but I honestly can't find a way of fixing it. As you can see in my code, I tried using the create_scoped_session() method as an attempt to fix this, but the problem is the same.
Any help?
OK so, I followed @antont's hint and created a new SQLAlchemy session inside the worker function this way, and it worked flawlessly:
import os

from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

def worker(row_id):
    # Engine and session are created inside the child process,
    # so no connection is shared with the parent.
    db_url = os.environ['DATABASE_URL']
    db_engine = create_engine(db_url)
    Session = sessionmaker(bind=db_engine)
    db_session = Session()
    # Do stuff with db_session here
    db_session.close()
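A related precaution worth adding (my note, not part of the original answer): dispose of the parent process's connection pool before forking, so the child cannot inherit and reuse the parent's open database connections. A minimal sketch using the question's route:

@app.route('/worker/<row_id>/start')
def start(row_id):
    # Drop the parent's pooled connections; the forked child will
    # open its own engine and connections instead of sharing sockets.
    db.engine.dispose()
    p = Process(target=worker, args=(row_id,))
    p.start()
    return redirect('/')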

MySQL query errors when connecting from Celery task running on Heroku

I'm seeing wrong query results when executing queries against an external MySQL database, but only when connecting from Celery tasks running on Heroku. The same tasks, when run on my own machine, do not show these errors, and the errors only appear about half of the time (although when they fail, all tasks are wrong).
The tasks are managed by Celery via Redis, and the MySQL database does not itself run on Heroku. Both my local machine and Heroku connect to the same MySQL database.
I connect to the MySQL database with the pymysql driver, using:
DB_URI = 'mysql+pymysql://USER:PW@SERVER/DB'
engine = create_engine(stats_config.DB_URI, convert_unicode=True, echo_pool=True)
db_session = scoped_session(sessionmaker(autocommit=False, autoflush=False, bind=engine))
Base = declarative_base()
Base.query = db_session.query_property()
The tasks are executed one by one.
Here is an example of a task with different results:
@shared_task(bind=True, name="get_gross_revenue_task")
def get_gross_revenue_task(self, g_start_date, g_end_date, START_TIME_FORM):
    db_session.close()
    start_date = datetime.strptime(g_start_date, '%d-%m-%Y')
    end_date = datetime.strptime(g_end_date, '%d-%m-%Y')
    gross_rev_trans_VK = db_session.query(func.sum(UsersTransactionsVK.amount)).filter(UsersTransactionsVK.date_added >= start_date, UsersTransactionsVK.date_added <= end_date, UsersTransactionsVK.payed == 'Yes').scalar()
    gross_rev_trans_Stripe = db_session.query(func.sum(UsersTransactionsStripe.amount)).filter(UsersTransactionsStripe.date_added >= start_date, UsersTransactionsStripe.date_added <= end_date, UsersTransactionsStripe.payed == 'Yes').scalar()
    gross_rev_trans = db_session.query(func.sum(UsersTransactions.amount)).filter(UsersTransactions.date_added >= start_date, UsersTransactions.date_added <= end_date, UsersTransactions.on_hold == 'No').scalar()
    if gross_rev_trans_VK is None:
        gross_rev_trans_VK = 0
    if gross_rev_trans_Stripe is None:
        gross_rev_trans_Stripe = 0
    if gross_rev_trans is None:
        gross_rev_trans = 0
    print('gross', gross_rev_trans_VK, gross_rev_trans_Stripe, gross_rev_trans)
    total_gross_rev = gross_rev_trans_VK + gross_rev_trans_Stripe + gross_rev_trans
    return {'total_rev': str(total_gross_rev / 100), 'current': 100, 'total': 100, 'statistic': 'get_gross_revenue', 'time_benchmark': (datetime.today() - START_TIME_FORM).total_seconds()}
# Selects gross revenue between selected dates
@app.route('/get-gross-revenue', methods=["POST"])
@basic_auth.required
@check_verified
def get_gross_revenue():
    if request.method == "POST":
        task = get_gross_revenue_task.apply_async([session['g_start_date'], session['g_end_date'], session['START_TIME_FORM']])
        return json.dumps({}), 202, {'Location': url_for('taskstatus_get_gross_revenue', task_id=task.id)}
These are simple and fast tasks, completing within a few seconds.
The tasks fail by producing small differences. For example, for a task where the correct result would be 30111, when things break the task would produce 29811 instead. It is always the code that uses `db_session` that is affected.
What I tried:
I am already using the same timezone by executing:
db_session.execute("SET SESSION time_zone = 'Europe/Berlin'")
I checked for errors in the worker logs. Although there are some entries like
2013 Lost connection to MySQL
sqlalchemy.exc.ResourceClosedError: This result object does not return rows. It has been closed automatically
2014 commands out of sync
I haven't found a correlation between SQL errors and wrong results. The wrong task results can appear without a lost connection.
A very dirty fix is to hard-code an expected result for one of the tasks, execute that first and then re-submit everything if the result produced is incorrect.
This is probably a cache or isolation level problem with the way I use the SQLAlchemy session. Because I only ever need to use SELECT (no inserts or updates), I also tried different settings for the isolation level, before running tasks, such as
#db_session.close()
#db_session.commit()
#db_session.execute('SET TRANSACTION READ ONLY')
These raise an error when I run them on Heroku, but they work when I run them on my Windows machine.
I also tried to alter the connection itself with isolation_level="READ UNCOMMITTED", without any result.
I am certain that the workers are not reusing the same db_session.
It seems that only tasks which use db_session in the query can return wrong results. Code using the query attribute on the Base class (a db_session.query_property() object, e.g. Users.query) does not appear to have issues. I thought this was basically the same thing?
You are re-using sessions between tasks in different workers. Create your session per Celery worker, or even per task.
Note that task instances persist per worker process. You can use this to cache a session for each task, so you don't have to recreate the session each time the task is run. This is easiest done with a custom task class; the Celery documentation uses database connection caching as an example there.
To do this with a SQLAlchemy session, use:
from celery import Task
from sqlalchemy import create_engine
from sqlalchemy.orm import scoped_session, sessionmaker

Session = scoped_session(sessionmaker(autocommit=True, autoflush=True))

class SQLASessionTask(Task):
    _session = None

    @property
    def session(self):
        if self._session is None:
            engine = create_engine(
                stats_config.DB_URI, convert_unicode=True, echo_pool=True)
            self._session = Session(bind=engine)
        return self._session
Use this as:
@shared_task(base=SQLASessionTask, bind=True, name="get_gross_revenue_task")
def get_gross_revenue_task(self, g_start_date, g_end_date, START_TIME_FORM):
    db_session = self.session
    # ... etc.
This creates a SQLAlchemy session for the current task only if it needs one, the moment you access self.session.
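If you also want each task run to end with a clean session, a task_postrun signal handler can take care of that. A minimal sketch, assuming the scoped Session defined above (my addition, not part of the original answer):

from celery.signals import task_postrun

@task_postrun.connect
def cleanup_session(*args, **kwargs):
    # Closes the current thread's session and returns its connection
    # to the pool, so the next task run starts with fresh session state.
    Session.remove()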

SQLite & Sql Alchemy SingletonThreadPool - can I share a connection object?

I'm getting errors of the form:
sqlite3.ProgrammingError: SQLite objects created in a thread can only be used in that same thread. The object was created in thread id 139661426296576 and this is thread id 139662493492992
in my multithreaded application.
I'm instantiating my engine with:
from sqlalchemy.pool import SingletonThreadPool
db_path = "sqlite:///" + cwd + "/data/data.db"
create_engine(db_path, poolclass=SingletonThreadPool, pool_size=50)
I had expected the SingletonThreadPool to solve this problem. What am I missing?
(bonus question: for the sake of reduced headache, should I move to MySQL?)
If you are using sqlite3, you just have to pass the check_same_thread parameter as below:
create_engine(db_path, connect_args={'check_same_thread': False})
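Putting that together with a session factory, a minimal multi-threaded sketch could look like the following (my combination of the snippets above, not a guaranteed fix for every workload; SQLite still serializes concurrent writes):

from sqlalchemy import create_engine
from sqlalchemy.orm import scoped_session, sessionmaker

engine = create_engine(
    "sqlite:///data/data.db",
    connect_args={"check_same_thread": False},  # allow connections to cross threads
)
Session = scoped_session(sessionmaker(bind=engine))

def worker():
    session = Session()   # each thread gets its own session
    try:
        # queries / inserts here
        session.commit()
    finally:
        Session.remove()  # close and discard this thread's session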

try-finally with SqlAlchemy: is this a good habit?

I'm used to doing this:
from sqlalchemy.orm import sessionmaker
from sqlalchemy.engine import create_engine
Session = sessionmaker()
engine = create_engine("some connection db string", echo=False)
Session.configure(bind=engine)
db_con = Session()
try:
    # DB MANIPULATION
finally:
    db_con.close()
Is this a good habit? If so, why does SQLAlchemy not simply let you do:
with Session() as db_con:
    # DB MANIPULATION
?
No, this isn't good practice. It's easy to forget the cleanup, and it makes the code more confusing.
Instead, you can use the contextlib.closing context manager, and make that the only way to get a session.
import contextlib

# Wrapped in a custom context manager for better readability
@contextlib.contextmanager
def get_session():
    with contextlib.closing(Session()) as session:
        yield session

with get_session() as session:
    session.add(...)
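As a side note (not part of either answer): on SQLAlchemy 1.4 and newer, the Session and sessionmaker are context managers themselves, so the pattern the question asks for works directly:

# SQLAlchemy 1.4+
with Session() as db_con:          # closes the session on exit
    db_con.add(some_object)        # some_object: any mapped instance (placeholder)
    db_con.commit()

# or, to commit on success and roll back on error automatically:
with Session.begin() as db_con:
    db_con.add(some_object)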
First, when you are done with a session object you should close it. session.close() returns the connection to the engine's pool, and if you are exiting the program you should dispose of the engine's pool with engine.dispose().
Now to your question. In most cases sessions are used in long-running applications like web servers, where it makes sense to centralize session management. For example, in Flask-SQLAlchemy a session is created at the start of each web request and closed when the request is over.

Why does SQLAlchemy/mysql keep timing out on me?

I have two functions that need to be executed, and the first takes about 4 hours to run. Both use SQLAlchemy:
def first():
    session = DBSession
    rows = session.query(Mytable).order_by(Mytable.col1.desc())[:150]
    for i, row in enumerate(rows):
        time.sleep(100)
        print i, row.accession

def second():
    print "going onto second function"
    session = DBSession
    new_row = session.query(Anothertable).order_by(Anothertable.col1.desc()).first()
    print 'New Row: ', new_row.accession

first()
second()
And here is how I define DBSession:
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import scoped_session, sessionmaker
from sqlalchemy import create_engine
engine = create_engine('mysql://blah:blah@blah/blahblah', echo=False, pool_recycle=3600*12)
DBSession = scoped_session(sessionmaker(autocommit=False, autoflush=False, bind=engine))
Base = declarative_base()
Base.metadata.bind = engine
first() finishes fine (it takes about 4 hrs) and I see "going onto second function" printed, then it immediately gives me an error:
sqlalchemy.exc.OperationalError: (OperationalError) (2006, 'MySQL server has gone away')
From reading the docs I thought assigning session = DBSession would get me two different session instances, so that second() wouldn't time out. I've also tried playing with pool_recycle, and that doesn't seem to have any effect here. In the real world, I can't split first() and second() into two scripts: second() has to execute immediately after first().
Your engine (not the session) keeps a pool of connections. When a MySQL connection has not been used for several hours, the MySQL server closes the socket; this causes a "MySQL server has gone away" error when you try to use that connection. If you have a simple single-threaded script, then calling create_engine with pool_size=1 will probably do the trick. If not, you can use events to ping the connection when it is checked out of the pool. This great answer has all the details:
SQLAlchemy error MySQL server has gone away
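As a more recent alternative (my note, not part of the original answer): SQLAlchemy 1.2+ has the pool_pre_ping option built in, which issues a lightweight ping when a connection is checked out and transparently replaces it if the server has closed it:

from sqlalchemy import create_engine

engine = create_engine(
    'mysql://blah:blah@blah/blahblah',
    pool_recycle=3600,    # recycle connections before MySQL's wait_timeout kicks in
    pool_pre_ping=True,   # test connections on checkout and reconnect if stale
)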
assigning session=DBSession would get two different session instances
That simply isn't true. session = DBSession is a local variable assignment, and you cannot override local variable assignment in Python (you can override instance member assignment, but that's unrelated).
Another thing to note is that scoped_session produces, by default, a thread-local scoped session (i.e. all code in the same thread shares the same session). Since you call first() and second() in the same thread, they use one and the same session.
One thing you can do is use a regular (unscoped) session: manage the session scope manually and create a new session in both functions, as sketched below. Alternatively, you can check the docs on how to define a custom session scope.
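A minimal sketch of that first suggestion, using a plain sessionmaker and opening/closing a session inside each function (based on the question's engine; second() would follow the same pattern):

from sqlalchemy.orm import sessionmaker

# plain (unscoped) factory; each call returns an independent session
Session = sessionmaker(bind=engine)

def first():
    session = Session()      # created when the function starts ...
    try:
        rows = session.query(Mytable).order_by(Mytable.col1.desc())[:150]
        for i, row in enumerate(rows):
            print(i, row.accession)
    finally:
        session.close()      # ... and closed when it is done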
It doesn't look like you're getting separate Session instances. If the first query is successfully committing, then your Session could be expiring after that commit.
Try setting auto-expire to false for your session:
DBSession = scoped_session(sessionmaker(expire_on_commit=False, autocommit=False, autoflush=False, bind=engine))
and then commit later.
