Parallelize integration tests with pytest-xdist - python

I was wondering if it is possible to use an in-memory SQLite database to run integration tests in parallel using pytest and pytest-xdist on a FastAPI application?
Update
I have a good number of tests that I would like to run in CI (GitLab CI). However, when SQLite is backed by a file, the number of IOPS required for each test causes the job to time out, so I would like to use an in-memory database and parallelize the tests with pytest-xdist.
Every endpoint uses FastAPI's dependency injection for the db context, and what I have tried is to create a fixture for the app like so:
@pytest.fixture(scope="function")
def app():
    """
    Pytest fixture that creates an instance of the FastAPI application.
    """
    app = create_app()
    app.dependency_overrides[get_db] = override_get_db
    return app


def override_get_db():
    SQLALCHEMY_DATABASE_URL = "sqlite:///:memory:"
    engine = create_engine(
        SQLALCHEMY_DATABASE_URL, connect_args={"check_same_thread": False}
    )
    Base.metadata.drop_all(bind=engine)
    Base.metadata.create_all(bind=engine)
    TestLocalSession = sessionmaker(autocommit=False, autoflush=False, bind=engine)
    init_db(session=TestLocalSession)
    engine.execute("PRAGMA foreign_keys=ON;")
    try:
        db = TestLocalSession()
        yield db
    finally:
        db.close()
Because the endpoints are all async I also need to use httpx's AsyncClient instead of the built-in TestClient:
@pytest.fixture(scope='function')
async def client(app):
    """
    Pytest fixture that creates an HTTPX AsyncClient for the FastAPI application.
    """
    async with AsyncClient(
        app=app, base_url=f"{settings.BASE_URL}{settings.API_PREFIX}"
    ) as client:
        yield client
The issue I have when I run this test (without pytest-xdist) is that the database is created in a separate thread from the one that is injected into the endpoints, so I always get a SQL error: sqlite3.OperationalError: no such table: certification
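One direction I am considering, though I have not verified it yet (so treat this as an untested sketch rather than a known fix), is forcing SQLAlchemy to reuse a single in-memory connection via StaticPool, so that every session and every thread sees the same database:

from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
from sqlalchemy.pool import StaticPool

# Untested sketch: one shared connection for the in-memory database,
# so all threads and sessions see the same tables.
engine = create_engine(
    "sqlite://",                                # in-memory SQLite
    connect_args={"check_same_thread": False},  # allow use from other threads
    poolclass=StaticPool,                       # always hand back the same connection
)
TestLocalSession = sessionmaker(autocommit=False, autoflush=False, bind=engine)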
Any suggestion on how to solve this? Thanks.

Related

SQLAlchemy with asyncpg crashing with error: asyncpg.InterfaceError - cannot perform operation: another operation is in progress

I am developing a fastapi server using sqlalchemy and asyncpg to work with a postgres database. For each request, a new session is created (via fastapi dependency injection, as in the documentation). I used sqlite+aiosqlite before postgres+asyncpg and everything worked perfectly. After I switched from sqlite to postgres, every fastapi request crashed with the error:
sqlalchemy.dialects.postgresql.asyncpg.InterfaceError - cannot perform operation: another operation is in progress
This is how I create the engine and sessions:
from typing import AsyncGenerator
import os

from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker, Session
from sqlalchemy.ext.asyncio import AsyncSession, create_async_engine

user = os.getenv('PG_USER')
password = os.getenv('PG_PASSWORD')
domain = os.getenv('PG_DOMAIN')
db = os.getenv('PG_DATABASE')

# db_async_url = f'sqlite+aiosqlite:///database.sqlite3'
db_async_url = f'postgresql+asyncpg://{user}:{password}@{domain}/{db}'

async_engine = create_async_engine(
    db_async_url, future=True, echo=True
)
create_async_session = sessionmaker(
    async_engine, class_=AsyncSession, expire_on_commit=False
)

async def get_async_session() -> AsyncGenerator[AsyncSession, None]:
    async with create_async_session() as session:
        yield session
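For context, the session dependency gets consumed in the endpoints roughly like this (a simplified sketch with a placeholder route and query, not the actual application code):

from fastapi import APIRouter, Depends
from sqlalchemy import text
from sqlalchemy.ext.asyncio import AsyncSession

router = APIRouter()

@router.get('/ping')
async def ping(session: AsyncSession = Depends(get_async_session)):
    # placeholder query, just to show the injected session being used
    result = await session.execute(text('SELECT 1'))
    return {'result': result.scalar()}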
The error disappeared after adding poolclass=NullPool to create_async_engine, so here's what engine creation looks like now:
from sqlalchemy.pool import NullPool

...

async_engine = create_async_engine(
    db_async_url, future=True, echo=True, poolclass=NullPool
)
I spent more than a day solving this problem, and I hope my answer saves other developers a lot of time. Perhaps there are other solutions, and if so, I will be glad to see them here.

'async_generator is not a callable object' FastAPI dependency issue app

I am trying to create a FastAPI app with async SQLAlchemy.
The get_db dependency causes a weird error: TypeError: <async_generator object get_db at 0x7ff6d9d9aa60> is not a callable object.
Here's my code:
db.py
from typing import Generator

from .db.session import SessionLocal

async def get_db() -> Generator:
    try:
        db = SessionLocal()
        yield db
    finally:
        await db.close()
session.py
from sqlalchemy.ext.asyncio import create_async_engine, AsyncSession

from .core.config import settings

engine = create_async_engine(
    settings.SQLALCHEMY_DATABASE_URI,
    pool_pre_ping=True
)
SessionLocal = AsyncSession(
    autocommit=False,
    autoflush=False,
    bind=engine
)
I followed almost all of the instructions posted here: https://6060ff4ffd0e7c1b62baa6c7--fastapi.netlify.app/advanced/sql-databases-sqlalchemy/#more-info
I have figured this out: when you pass the generator get_db as a dependency to a FastAPI endpoint, you reference it as get_db, without the parentheses.
For example:
from typing import List, Any

from fastapi import APIRouter, HTTPException, Depends, status
from sqlalchemy.ext.asyncio import AsyncSession

from . import models, crud, schemas
from .deps.db import get_db

router = APIRouter()

@router.post('/',
             response_model=schemas.StaffAccount,
             status_code=status.HTTP_201_CREATED)
async def create_staff_account(
    db: AsyncSession = Depends(get_db),
    staff_acct: schemas.StaffAccountCreate = Depends(schemas.StaffAccountCreate)
) -> Any:
    q = await crud.staff.create(db=db, obj_in=staff_acct)
    if not q:
        raise HTTPException(status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
                            detail='An error occurred while processing your request')
    return q
This is such a minor detail, but it can trip up beginners (like me), so look closely at your code.
The problem is
engine = create_async_engine(
    settings.SQLALCHEMY_DATABASE_URI,
    pool_pre_ping=True
)
You are filling engine with a promise that has yet to be fulfilled. Basically, the async functionality lets the code move on while some I/O or networking work is still pending.
So you are passing the engine as a parameter although the connection may not have been established yet.
You should await the engine before using it as a parameter in other functions.
Here is some more information about Python's async functionality:
https://www.educba.com/python-async/

WSGI app factory pattern and import time created module level objects

I've been thinking about the factory pattern for WSGI applications, as recommended by the Flask docs, for a while now. Specifically, the factory functions usually shown make use of objects created at module import time, like db in the example, rather than objects created inside the factory function.
Would the factory function ideally create _everything_ anew or wouldn't that make sense for objects like the db engine?
(I'm thinking cleaner separation and better testability here.)
Here is some code where I'm trying to create every object the WSGI app needs inside its factory function.
# factories.py

def create_app(config, engine=None):
    """Create WSGI application to be called by WSGI server. Full factory function
    that takes care to deliver an entirely new WSGI application instance with all
    new member objects like database engine etc.

    Args:
        config (dict): Dict to update the wsgi app. configuration.
        engine (SQLAlchemy engine): Database engine to use.
    """
    # flask app
    app = Flask(__name__)  # should be package name instead of __name__ acc. to docs
    app.config.update(config)

    # create blueprint
    blueprint = ViewRegistrationBlueprint('blueprint', __name__, )

    # register views for blueprint
    from myapp.views import hello_world
    # dynamically scrapes module and registers methods as views
    blueprint.register_routes(hello_world)

    # create engine and request scoped session for current configuration and store
    # on wsgi app
    if engine is not None:
        # delivers transactional scope when called
        RequestScopedSession = scoped_session(
            sessionmaker(bind=engine),
            scopefunc=flask_request_scope_func
        )

        # request teardown behaviour, always called, even on unhandled exceptions
        def request_scoped_session_teardown(*args, **kwargs):
            """Function to register and call by the framework when a request is
            finished and the session should be removed.
            """
            # wrapped in try/finally to make sure no error collapses call stack here
            try:
                # rollback all pending changes, close and return conn. to pool
                RequestScopedSession.remove()
            except Exception as exception_instance:
                msg = "Error removing session in request teardown.\n{}"
                msg = msg.format(exception_instance)
                logger.error(msg)
            finally:
                pass

        app.config["session"] = RequestScopedSession
        blueprint.teardown_request(request_scoped_session_teardown)

    # register blueprint
    app.register_blueprint(blueprint)
    return app


def create_engine(config):
    """Create database engine from configuration

    Args:
        config (dict): Dict used to assemble the connection string.
    """
    # connection_string
    connection_string = "{connector}://{user}:{password}@{host}/{schema}"
    connection_string = connection_string.format(**config)

    # database engine
    return sqlalchemy_create_engine(
        connection_string,
        pool_size=10,
        pool_recycle=7200,
        max_overflow=0,
        echo=True
    )


# wsgi.py (served by WSGI server)

from myapp.factories import create_app
from myapp.factories import create_engine
from myapp.configuration.config import Config

config = Config()
engine = create_engine(config.database_config)
app = create_app(config.application_config, engine=engine)


# conftest.py

from myapp.factories import create_app
from myapp.factories import create_engine
from myapp.configuration.config import Config

@pytest.fixture
def app():
    config = TestConfig()
    engine = create_engine(config.database_config)
    app = create_app(config.application_config, engine=engine)
    with app.app_context():
        yield app
As you also tagged this with sanic, I'll answer with that background. Sanic is async and thus relies on an event loop. An event loop is a resource and thus must not be shared between tests but created anew for each one. Hence, the database connection etc. also needs to be created for each test and cannot be reused, as it is async and depends on the event loop. Even without the async nature it would be cleanest to create db connections per test, because they have state (like temp tables).
So I ended up with a create_app() that creates everything, which allows me to create an arbitrary number of independent apps in a test run. (To be honest there are some global resources like registered event listeners, but tearing those down is easy with py.test factories; see the sketch below.) For testability I'd try to avoid global resources that are created on module import, although I've seen it done differently in big and successful projects.
That's not really a definite answer, I know...
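To illustrate, a factory-as-fixture along these lines might look roughly like this. It is only a sketch: it reuses the create_app / create_engine names and TestConfig from the question, and the engine disposal in the teardown is my assumption about what "tearing down" would involve, not code from the original project.

# conftest.py (sketch): the fixture hands the test a factory callable, so a test
# can build as many independent apps as it needs; engines created along the way
# are disposed of when the test finishes.
import pytest

from myapp.factories import create_app, create_engine
from myapp.configuration.config import TestConfig

@pytest.fixture
def make_app():
    engines = []

    def _make_app():
        config = TestConfig()
        engine = create_engine(config.database_config)
        engines.append(engine)
        return create_app(config.application_config, engine=engine)

    yield _make_app

    # teardown: release the pooled connections of every engine created in this test
    for engine in engines:
        engine.dispose()

A test would then call make_app() once, or several times if it needs fully independent application instances.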

Flask SQLAlchemy sessions out of sync

I have a Flask REST API, running with a gunicorn/nginx stack. There is a global SQLAlchemy session set up once for each thread that the API runs on. I set up an endpoint /test/ for running the unit tests for the API. One test makes a POST request to add something to the database, then has a finally: clause to clean up:
def test_something():
    try:
        url = "http://myposturl"
        data = {"content" : "test post"}
        headers = {'content-type': 'application/json'}
        result = requests.post(url, json=data, headers=headers).json()
        validate(result, myschema)
    finally:
        db.sqlsession.query(MyTable).filter(MyTable.content == "test post").delete()
        db.sqlsession.commit()
The problem is that the thread to which the POST request is made now has a "test post" object in its session, but the database has no such object because the thread on which the tests ran deleted that thing from the database. So when I make a GET request to the server, about 1 in 4 times (I have 4 gunicorn workers), I get the "test post" object, and 3 in 4 times I do not. This is because the threads each have their own session object, and they are getting out of sync, but I don't really know what to do about it....
Here is my setup for my SQLAlchemy session:
def connectSQLAlchemy():
    import sqlalchemy
    import sqlalchemy.orm
    engine = sqlalchemy.create_engine(
        connection_string(DBConfig.USER, DBConfig.PASSWORD, DBConfig.HOST, DBConfig.DB)
    )
    session_factory = sqlalchemy.orm.sessionmaker(bind=engine)
    Session = sqlalchemy.orm.scoped_session(session_factory)
    return Session()

# Create a global session for everyone
sqlsession = connectSQLAlchemy()
Please use flask-sqlalchemy if you're using Flask; it takes care of the lifecycle of the session for you.
If you insist on doing it yourself, the correct pattern is to create a session for each request instead of having a global session. You should be doing
Session = scoped_session(session_factory, scopefunc=flask._app_ctx_stack.__ident_func__)
return Session
instead of
Session = scoped_session(session_factory)
return Session()
And do
session = Session()
every time you need a session. By virtue of the scoped_session and the scopefunc, this will give you a different session in each request, but the same session within the same request.
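Put together, the suggested per-request pattern could look roughly like this. This is only a sketch: make_session_factory and some_view are illustrative names, connection_string and DBConfig come from the question's own code, and flask._app_ctx_stack.__ident_func__ is a Flask internal (as used above), so treat it accordingly.

import flask
import sqlalchemy
import sqlalchemy.orm

def make_session_factory():
    engine = sqlalchemy.create_engine(
        connection_string(DBConfig.USER, DBConfig.PASSWORD, DBConfig.HOST, DBConfig.DB)
    )
    session_factory = sqlalchemy.orm.sessionmaker(bind=engine)
    # scope sessions to the Flask app context instead of the worker thread
    return sqlalchemy.orm.scoped_session(
        session_factory, scopefunc=flask._app_ctx_stack.__ident_func__
    )

Session = make_session_factory()

def some_view():
    session = Session()   # same session for the whole request
    # ... use the session ...
    # Session.remove() is typically registered in a teardown handler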
Figured it out. What I did was to add a setup and teardown to the request in my app's __init__.py:
@app.before_request
def startup_session():
    db.session = db.connectSQLAlchemy()

@app.teardown_request
def shutdown_session(exception=None):
    db.session.close()
still using the global session object in my db module:
db.py:
....
session = None
....
The scoped_session handles the different threads, I think...
Please advise if this is a terrible way to do this for some reason. =c)

How to cache SQL Alchemy calls with Flask-Cache and Redis?

I have a Flask app that takes parameters from a web form, queries a DB with SQL Alchemy and returns Jinja-generated HTML showing a table with the results. I want to cache the calls to the DB. I looked into Redis (Using redis as an LRU cache for postgres), which led me to http://pythonhosted.org/Flask-Cache/.
Now I am trying to use Redis + Flask-Cache to cache the calls to the DB. Based on the Flask-Cache docs, it seems like I need to set up a custom Redis cache.
class RedisCache(BaseCache):
    def __init__(self, servers, default_timeout=500):
        pass

def redis(app, config, args, kwargs):
    args.append(app.config['REDIS_SERVERS'])
    return RedisCache(*args, **kwargs)
From there I would need to do something like:
# not sure what to put for args or kwargs
cache = redis(app, config={'CACHE_TYPE': 'redis'})
app = Flask(__name__)
cache.init_app(app)
I have two questions:
What do I put for args and kwargs? What do these mean? How do I set up a Redis cache with Flask-Cache?
Once the cache is set up, it seems like I would want to somehow "memoize" the calls to the DB so that if the method gets the same query, it returns the cached output. How do I do this? My best guess is to wrap the SQL Alchemy call in a method that is given the memoize decorator. That way, if two identical queries were passed to the method, Flask-Cache would recognize this and return the appropriate cached response. I'm guessing it would look like this:
@cache.memoize(timeout=50)
def queryDB(q):
    return q.all()
This seems like a fairly common use of Redis + Flask + Flask-Cache + SQL Alchemy, but I am unable to find a complete example to follow. If someone could post one, that would be super helpful, not just for me but for others down the line.
You don't need to create a custom RedisCache class. The docs are just showing how you would create new backends that are not available in flask-cache. RedisCache is already available in werkzeug >= 0.7, which you probably already have installed because it is one of Flask's core dependencies.
This is how I got flask-cache running with the Redis backend:
import time

from flask import Flask
from flask_cache import Cache

app = Flask(__name__)
cache = Cache(app, config={'CACHE_TYPE': 'redis'})

@cache.memoize(timeout=60)
def query_db():
    time.sleep(5)
    return "Results from DB"

@app.route('/')
def index():
    return query_db()

app.run(debug=True)
The reason you're getting "ImportError: redis is not a valid FlaskCache backend" is probably that you don't have the redis Python library installed, which you can fix with:
pip install redis
Your Redis args would look something like this:
cache = Cache(app, config={
    'CACHE_TYPE': 'redis',
    'CACHE_KEY_PREFIX': 'fcache',
    'CACHE_REDIS_HOST': 'localhost',
    'CACHE_REDIS_PORT': '6379',
    'CACHE_REDIS_URL': 'redis://localhost:6379'
})
Putting the @cache.memoize decorator over a method that grabs the info from the DB should work.
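To connect this back to the queryDB(q) guess in the question: cache.memoize includes the function's arguments in the cache key, so a parameterized query wrapper is cached per distinct argument. A small sketch (User and session are placeholders, not from the question):

@cache.memoize(timeout=50)
def query_users_by_name(name):
    # each distinct `name` gets its own cache entry, because memoize
    # builds the cache key from the function arguments
    return session.query(User).filter_by(name=name).all()

# first call hits the DB; a repeat call with the same name within 50s
# is served from the Redis cache
rows = query_users_by_name("alice")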
