I want to make a database API written in Python using SQLAlchemy (or another database connector, if SQLAlchemy is not the right tool for this kind of task). The setup is a MySQL server running on Linux or BSD, and the Python software running on a Linux or BSD machine (either remote or local).
Basically, what I want to do is spawn a new thread for each connection; the protocol would be custom and quite simple. For each request I would like to open a new transaction (or session, as I have read) and then commit it. The problem I am facing right now is that there is a high probability that sessions from other connections will be running at the same time.
My question here is what should I do to handle this situation?
Should I use a lock so only a single session can run at a time?
Are sessions actually thread-safe, and am I wrong in thinking that they are not?
Is there a better way to handle this situation?
Is threading simply not the way to go?
Session objects are not thread-safe, but they can be made thread-local. From the docs:
"The Session object is entirely designed to be used in a non-concurrent fashion, which in terms of multithreading means "only in one thread at a time" .. some process needs to be in place such that multiple calls across many threads don't actually get a handle to the same session. We call this notion thread local storage."
If you don't want to do the work of managing threads and sessions yourself, SQLAlchemy has the ScopedSession object to take care of this for you:
The ScopedSession object by default uses threading.local() as storage, so that a single Session is maintained for all who call upon the ScopedSession registry, but only within the scope of a single thread. Callers who call upon the registry in a different thread get a Session instance that is local to that other thread.
Using this technique, the ScopedSession provides a quick and relatively simple way of providing a single, global object in an application that is safe to be called upon from multiple threads.
See the examples in Contextual/Thread-local Sessions for setting up your own thread-safe sessions:
# set up a scoped_session
from sqlalchemy.orm import scoped_session
from sqlalchemy.orm import sessionmaker
session_factory = sessionmaker(bind=some_engine)
Session = scoped_session(session_factory)
# now all calls to Session() will create a thread-local session
some_session = Session()
# you can now use some_session to run multiple queries, etc.
# remember to close it when you're finished!
Session.remove()
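For the thread-per-connection design described in the question, a minimal sketch reusing the Session registry set up above could look like this (handle_connection, do_work, and accept_connection are hypothetical stand-ins, not names from the original post):
import threading

def handle_connection(request):
    # Session() returns the session bound to the current thread
    session = Session()
    try:
        do_work(session, request)  # one transaction per request
        session.commit()
    except Exception:
        session.rollback()
        raise
    finally:
        # discard this thread's session; its connection
        # returns to the pool
        Session.remove()

# spawn one thread per incoming connection, as in the question
while True:
    request = accept_connection()  # hypothetical protocol accept
    threading.Thread(target=handle_connection, args=(request,)).start()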
Related
I have a file called db.py with the following code:
from sqlalchemy import create_engine
from sqlalchemy.orm import scoped_session, sessionmaker
engine = create_engine('sqlite:///my_db.sqlite')
session = scoped_session(sessionmaker(bind=engine, autoflush=True))
I am trying to import this file in various subprocesses started using a spawn context (potentially important, since various fixes that worked for fork don't seem to work for spawn).
The import statement is something like:
from db import session
and then I use this session ad libitum without worrying about concurrency, assuming SQLite's internal locking mechanism will order transactions so as to avoid concurrency errors; I don't really care about transaction order.
This seems to result in errors like the following:
sqlite3.ProgrammingError: SQLite objects created in a thread can only be used in that same thread. The object was created in thread id 139813508335360 and this is thread id 139818279995200.
Mind you, this doesn't seem to affect my program directly: every transaction goes through just fine, but I am still worried about what's causing it.
My understanding was that scoped_session is thread-local, so I could import it however I want without issues. Furthermore, my assumption was that SQLAlchemy would always handle the closing of connections and that SQLite would handle ordering (i.e. make a session wait for another session to end before it can run any transaction).
Obviously one of these assumptions is wrong, or I am misunderstanding something basic about the mechanism here, but I can't quite figure out what. Any suggestions would be useful.
The problem isn't about thread-local sessions; it's that the original connection object is in a different thread from those sessions. SQLite disallows using a connection across different threads by default.
The simplest answer to your question is to turn off sqlite's same-thread check. In SQLAlchemy you can achieve this by specifying it as part of your database URL:
engine = create_engine('sqlite:///my_db.sqlite?check_same_thread=False')
I'm guessing that will do away with the errors, at least.
Depending on what you're doing, this may still be dangerous. If you're ensuring your transactions are serialised (that is, one after the other, never overlapping or simultaneous) then you're probably fine. If you can't guarantee that, then you're risking data corruption, in which case you should consider a) using a database backend that can handle concurrent writes, or b) creating an intermediary app or service that solely manages sqlite reads and writes and that your other apps can communicate with. The latter option sounds fun, but be warned you may end up reinventing the wheel when you're better off just spinning up a Postgres container or something.
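Equivalently, you can pass the flag through connect_args instead of the URL; connect_args is a standard create_engine parameter whose contents are forwarded to the underlying sqlite3.connect() call:
from sqlalchemy import create_engine

# same effect as the URL query string above
engine = create_engine(
    'sqlite:///my_db.sqlite',
    connect_args={'check_same_thread': False},
)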
I am working on a system which uses SQLAlchemy to read/write to a MySQL database. I have a factory which makes multiple repositories, each one with its own session. I read the documentation of SQLAlchemy and it states that one session should not be used by different processes.
I cannot use the same session, as the code will run on different machines.
My question is: is it good practice to make different sessions? Will it have concurrency problems or races?
Example:
If I have 2 sessions writing multiple records to the db, and there is a collision on a record, will session.commit() abort everything?
SQLAlchemy sessions are lightweight objects, so it is no trouble to create many of them, and you have to use multiple sessions when you use multiple processes or threads, because a session cannot be shared between processes or threads.
If there is a collision between 2 sessions, the error will be raised on session.commit() (or on autocommit, depending on the session configuration).
It's better to use sessions with explicit transaction blocks to ensure atomicity:
session = get_session()
with session.begin():
    session.add(db_obj0)
    session.add(db_obj1)
Either both db_obj0 and db_obj1 will be created, or neither will.
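To make the collision case concrete, here is a hedged sketch (get_session() and handle_conflict() are hypothetical, and a unique constraint on the colliding record is assumed) of catching the error raised at commit time:
from sqlalchemy.exc import IntegrityError

session = get_session()
try:
    with session.begin():        # everything inside is one transaction
        session.add(db_obj0)
        session.add(db_obj1)     # suppose this collides with another session
except IntegrityError:
    # the with-block has already rolled the transaction back,
    # so neither db_obj0 nor db_obj1 was persisted
    handle_conflict()            # hypothetical conflict handler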
My Python program runs an endless loop that detects actions; each action starts a thread that runs the next function. Each of these threads uses a SQLAlchemy object to operate on the same MySQL DB. The threads start over periods of many hours or even days, or several at once, depending on user activity.
What is / should be the optimal way to use the DB? Should all of them use the same global db object, or should each of them create a new one?
Do note that SQLAlchemy sessions and connections are not thread-safe.
The Session object is entirely designed to be used in a non-concurrent fashion, which in terms of multithreading means “only in one thread at a time”.
In a nutshell, every session has a backing store to track all objects (model instances) that are added, removed, or modified within the session. Sharing a session and its tracked objects across threads will not play nicely with this backing store, as you cannot guarantee thread safety.
You can use scoped_session, which provides scoped management of session objects (and the underlying connection pool). This way your sessions are bound to a thread-local scope, and you can use a single session within each threaded operation without worrying about concurrency.
We call this notion thread local storage, which means, a special object is used that will maintain a distinct object per each application thread. Python provides this via the threading.local() construct.
A scoped_session uses threading.local() as storage, and a single session is maintained when called upon within the scope of a single thread. Callers from a different thread will get a different session object.
from sqlalchemy.orm import scoped_session
from sqlalchemy.orm import sessionmaker
# Note that session_factory, some_engine,
# and scoped_session are global objects
session_factory = sessionmaker(bind=some_engine)
Session = scoped_session(session_factory)
# Calls to `Session` will return a `session` object
# that is backed by a thread-local store
some_session = Session()
# Somewhere down the line you call `Session` again
# within the same thread will yield the same session object
some_other_session = Session()
some_session is some_other_session # True
Above, some_session is an instance of Session, which we can now use to talk to the database. This same Session is also present within the scoped_session registry we’ve created. If we call upon the registry a second time, we get back the same Session:
# All objects managed by `some_session` is stored in thread-local
some_session.add(..)
some_session.delete(..)
some_session.query(..)
some_session.commit() # or .rollback()
# Calling `Session.remove()` closes `some_session`
# and returns the `connection` back to the pool for reuse
Session.remove()
It is important to call Session.remove() (Session here being a scoped_session) instead of just some_session.close(). This has the effect of cleaning up the thread-local registry in addition to closing the session.
The scoped_session.remove() method first calls Session.close() on the current Session, which has the effect of releasing any connection/transactional resources owned by the Session first, then discarding the Session itself. “Releasing” here means that connections are returned to their connection pool and any transactional state is rolled back, ultimately using the rollback() method of the underlying DBAPI connection.
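Applied to the question's endless-loop setup, a minimal sketch might look like this (action_detected() and do_db_work() are hypothetical stand-ins; Session is the global scoped_session from above):
import threading

def action_thread():
    session = Session()      # thread-local session from the global registry
    try:
        do_db_work(session)  # hypothetical DB operations
        session.commit()
    finally:
        # per the docs quoted above, this rolls back any
        # uncommitted state and returns the connection to the pool
        Session.remove()

while True:                  # the endless detection loop from the question
    if action_detected():
        threading.Thread(target=action_thread).start()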
I have a real headache from trying to understand the cause of the following problem. We are using a combination of the following libraries:
pyTelegramBotAPI to process requests in a multi-threaded way
SQLAlchemy
sqlite
SQLAlchemy was first using NullPool and is now configured to use QueuePool. I am also using the following idiom so that a new DB session fires up for each thread (as per my understanding):
from contextlib import contextmanager

from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
from sqlalchemy.pool import QueuePool

Session = sessionmaker(bind=create_engine(classes.db_url, poolclass=QueuePool))

@contextmanager
def session_scope():
    session = Session()
    try:
        yield session
        session.commit()
    except:
        session.rollback()
        raise
    finally:
        session.close()

@bot.message_handler(content_types=['document'])
def method_handler(message):
    with session_scope() as session:
        do_database_stuff_here(session)
Nevertheless, I am still getting this annoying exception: (sqlite3.ProgrammingError) SQLite objects created in a thread can only be used in that same thread
Any ideas? ;) In particular, I don't get how it is possible for another thread to get somewhere in between the db operations... this is probably the reason for the pesky exception.
update 1: if I change the poolclass to SingletonThreadPool, there seem to be no more errors coming up. However, the SQLAlchemy documentation says that SingletonThreadPool is not recommended for production use.
As you can see in the source, sqlite will raise this exception inside pysqlite_check_thread if the connection object is reused across any threads.
By using a QueuePool, you are telling SQLAlchemy it is safe to reuse connections across multiple threads. It will therefore just pick a connection from the pool for any session, no matter which thread it is on. This is why you're hitting the error: the first time you create and use a connection you'll be fine, but the next use will probably be on a different thread and so fail the check.
This is why SQLAlchemy defaults to other pools for SQLite, such as SingletonThreadPool and NullPool.
Assuming you are using a file-based database, you should use the NullPool. This will give you good concurrency on reads. Write concurrency is always going to be an issue for sqlite; if you need that, you probably want a different database.
Something that may be worth trying: use scoped_session instead of your context manager; scoped_session implicitly creates a thread-local session when it is accessed from a different thread. Be sure also to use NullPool.
from sqlalchemy.orm import scoped_session, sessionmaker
from sqlalchemy.pool import NullPool

session = scoped_session(sessionmaker(bind=create_engine(classes.db_url, poolclass=NullPool)))
Note that you can use this scoped session directly as if it were just a regular session, even though it is actually creating thread-local sessions behind the scenes when it is being used.
With scoped_session, you should call session.remove() after you're done (i.e., after each method_handler) and explicitly call session.commit() as needed.
In theory, your context manager should work in giving each thread its own session, but, for lack of a better explanation, I wonder if there are multiple threads accessing that session within the context.
I'm running PostgreSQL 9.3 and SQLAlchemy 0.8.2 and am experiencing database connection leaks. After deploying, the app consumes around 240 connections. Over the next 30 hours this number gradually grows to 500, at which point PostgreSQL starts dropping connections.
I use SQLAlchemy thread-local sessions:
from sqlalchemy import orm, create_engine
engine = create_engine(os.environ['DATABASE_URL'], echo=False)
Session = orm.scoped_session(orm.sessionmaker(engine))
For the Flask web app, the .remove() call on the Session proxy object is sent during request teardown:
@app.teardown_request
def teardown_request(exception=None):
    if not app.testing:
        Session.remove()
This should be the same as what Flask-SQLAlchemy is doing.
I also have some periodic tasks that run in a loop, and I call .remove() for every iteration of the loop:
def run_forever():
    while True:
        do_stuff(Session)
        Session.remove()
What am I doing wrong which could lead to a connection leak?
If I remember correctly from my experiments with SQLAlchemy, the scoped_session() is used to create sessions that you can access from multiple places. That is, you create a session in one method and use it in another without explicitly passing the session object around.
It does that by keeping a registry of sessions and associating them with a "scope ID". By default, to obtain a scope ID, it uses the current thread ID, so you get one session per thread. You can supply a scopefunc to provide – for example – one ID per request:
# This is (approx.) what flask-sqlalchemy does:
from flask import _request_ctx_stack as context_stack

Session = orm.scoped_session(orm.sessionmaker(engine),
                             scopefunc=context_stack.__ident_func__)
Also, take note of the other answers and comments about doing background tasks.
First of all, a bare loop is a really, really bad way to run background tasks. Try an async task scheduler like Celery.
Not 100% sure, so this is a bit of a guess based on the information provided, but I wonder if each page load is starting a new db connection which then listens for notifications. If this is the case, the db connection is effectively removed from the pool, and a new one gets created on the next page load.
If so, my recommendation would be to have a separate database handle dedicated to listening for notifications, so that these connections are not tied up in the pool. This might be done outside your workflow.
Also
In particular, the leak happens when making more than one simultaneous request. At the same time, I could see that some requests were left with uncompleted query executions and timed out. You may need to write something to manage this yourself.
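If you want to watch the pool while reproducing the leak, one hedged option (assuming the default QueuePool) is to log the pool's status periodically; Pool.status() is part of SQLAlchemy's pool API:
import logging

# engine is the Engine created earlier with create_engine(...)
logging.info("pool state: %s", engine.pool.status())
# for a QueuePool this reports the pool size, checked-in and
# checked-out connections, and the current overflow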