Database testing in Python with PostgreSQL

How do you unit test your Python DAL (data access layer) that uses PostgreSQL?
With SQLite you can create an in-memory database for every test, but this cannot be done for PostgreSQL.
I want a library that can set up a database and clean it up once the test is done.
I am using SQLAlchemy as my ORM.

pg_tmp(1) is a utility intended to make this task easy. Here is how you might start up a new connection with SQLAlchemy:
from subprocess import check_output
from sqlalchemy import create_engine

# pg_tmp prints a connection URL; decode the bytes and strip the trailing newline
url = check_output(['pg_tmp', '-t']).decode().strip()
engine = create_engine(url)
This will spin up a new database that is automatically destroyed after 60 seconds. If a connection is still open, pg_tmp will wait until all active connections are closed before destroying it.

Have you tried testing.postgresql?
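For example, something along these lines should work (a minimal sketch assuming testing.postgresql is installed; it launches a throwaway PostgreSQL instance and tears it down when the context manager exits):
import testing.postgresql
from sqlalchemy import create_engine

with testing.postgresql.Postgresql() as postgresql:
    engine = create_engine(postgresql.url())
    # create your schema and run your tests against `engine` here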

You can use nose to write your tests, then just use SQLAlchemy to create and clean the test database in your setup/teardown methods.
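A rough sketch of what that can look like with module-level nose fixtures (the declarative Base import and the test database URL below are assumptions, not part of the original answer):
from sqlalchemy import create_engine
from myapp.models import Base  # hypothetical declarative Base holding your tables

engine = create_engine('postgresql://localhost/test_db')  # assumed test database

def setup():
    # nose runs this before the tests in this module
    Base.metadata.create_all(engine)

def teardown():
    # nose runs this after the tests in this module
    Base.metadata.drop_all(engine)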

There's QuickPiggy too, which is capable of cleaning up after itself.
From the docs:
A makeshift PostgreSQL instance can be obtained quite easily:
import quickpiggy
import psycopg2

pig = quickpiggy.Piggy(volatile=True, create_db='somedb')
conn = psycopg2.connect(pig.dsnstring())

Related

How to persist a connection pool for asyncpg and utilise it in the Databases wrapper?

As per the FastAPI documentation, I'm using the Databases wrapper and SQLAlchemy Core to do async operations on the PostgreSQL database.
I have come across an issue where the connection gets closed in the middle of an operation. As it turns out, it is an issue with asyncpg and can be resolved by using a pool.
However, I'm not using asyncpg directly, but the Databases wrapper as recommended by FastAPI. How can I create a pool like this:
await asyncpg.create_pool(database="dbname",
                          user="username",
                          password="dbpw",
                          max_inactive_connection_lifetime=3)
and utilise it within the databases wrapper?
import databases
from sqlalchemy import MetaData
db = databases.Database(settings.SQLALCHEMY_DATABASE_URI)
metadata = MetaData(schema='main')
Under the hood, Database.connect uses asyncpg.create_pool. Your SQLALCHEMY_DATABASE_URI already carries the connection parameters, and you can add further connection options as keyword arguments:
db = databases.Database(settings.SQLALCHEMY_DATABASE_URI, max_inactive_connection_lifetime=3)
They will be passed to asyncpg.create_pool on connection.
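For completeness, here is a minimal sketch of wiring this into FastAPI startup/shutdown events (the database URL and handler names are assumptions, not from the original answer); the extra keyword arguments reach asyncpg.create_pool when connect() builds the pool:
import databases
from fastapi import FastAPI

DATABASE_URL = "postgresql://user:password@localhost/dbname"  # assumed URL
db = databases.Database(DATABASE_URL, max_inactive_connection_lifetime=3)
app = FastAPI()

@app.on_event("startup")
async def startup():
    await db.connect()      # the underlying asyncpg pool is created here

@app.on_event("shutdown")
async def shutdown():
    await db.disconnect()   # the pool is closed cleanly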

Why do we need Airflow hooks?

The docs say:
Hooks are interfaces to external platforms and databases like Hive, S3, MySQL, Postgres, HDFS, and Pig. Hooks implement a common interface when possible, and act as a building block for operators.
But why do we need them?
I want to select data from one Postgres DB and store it in another one. Can I use, for example, the psycopg2 driver inside a Python script run by a PythonOperator, or does Airflow need to know what exactly I'm doing inside the script, so that I have to use PostgresHook instead of just the psycopg2 driver?
You should just use PostgresHook. Instead of using psycopg2 like so:
import psycopg2

# host, user, password, dbname and query are placeholders here
conn = psycopg2.connect(host=host, user=user, password=password, dbname=dbname)
cur = conn.cursor()
cur.execute(query)
data = cur.fetchall()
You can just type:
from airflow.providers.postgres.hooks.postgres import PostgresHook

postgres = PostgresHook(postgres_conn_id='connection_id')
data = postgres.get_pandas_df(query)
This also lets you take advantage of Airflow's encrypted connection storage.
So using hooks is cleaner, safer and easier.
While it is possible to just hardcode the connection details in your script and run it, the power of hooks is that connections can be managed and edited from within the Airflow UI.
Have a look at "Automate AWS Tasks Thanks to Airflow Hooks" to learn a bit more about how to use hooks.
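As a sketch of the asker's copy-between-databases use case (the connection ids, table names and query below are made up for illustration, and the import path assumes Airflow 2 with the Postgres provider installed):
from airflow.providers.postgres.hooks.postgres import PostgresHook

def copy_rows():
    # read from one Airflow connection and write to another
    src = PostgresHook(postgres_conn_id="source_postgres")
    dst = PostgresHook(postgres_conn_id="target_postgres")
    rows = src.get_records("SELECT id, name FROM source_table")
    dst.insert_rows(table="target_table", rows=rows)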

Is it possible to use SQLAlchemy with AWS RDS Data API?

AWS recently launched the Data API. It simplifies writing Lambda functions by allowing plain API calls instead of managing direct database connections.
I'm trying to use SQLAlchemy in an AWS Lambda Function, and I'd really like to take advantage of this new API.
Does anyone know if there is any support for this, or if support for this is coming?
Alternatively, how difficult would it be to create a new Engine to support this?
SQLAlchemy calls database drivers "dialects". So if you're using SQLAlchemy with PostgreSQL and using psycopg2 as the driver, then you're using the psycopg2 dialect of PostgreSQL.
I was looking for the same thing as you, and found no existing solution, so I wrote my own and published it. To use the AWS Aurora RDS Data API, I created a SQL dialect package for it, sqlalchemy-aurora-data-api. This in turn required me to write a DB-API compatible Python DB driver for Aurora Data API, aurora-data-api. After installing with pip install sqlalchemy-aurora-data-api, you can use it like this:
from sqlalchemy import create_engine

cluster_arn = "arn:aws:rds:us-east-1:123456789012:cluster:my-aurora-serverless-cluster"
secret_arn = "arn:aws:secretsmanager:us-east-1:123456789012:secret:MY_DB_CREDENTIALS"

engine = create_engine('postgresql+auroradataapi://:@/my_db_name',
                       echo=True,
                       connect_args=dict(aurora_cluster_arn=cluster_arn, secret_arn=secret_arn))

with engine.connect() as conn:
    for result in conn.execute("select * from pg_catalog.pg_tables"):
        print(result)
As an alternative, if you want something more like Records, you can try Camus: https://github.com/rizidoro/camus

sqlacodegen doesn't make any Tables

I'm running a PostgreSQL database on a server and I'm trying to connect to it using SQLAlchemy. I found that sqlacodegen was a good tool to generate the MetaData object automatically along with its Tables. But when I try to run sqlacodegen postgresql+psycopg2://username:password@host:5432/dbname, I only get this:
# coding: utf-8
from sqlalchemy import MetaData
metadata = MetaData()
The connection string is definitely correct, and the database is up - I ran a small Python script using that connection string to connect to the database and used execute to run a query on it, and it returns exactly what I expected.
I'm not sure where even to start with debugging this. What could I do to see what's wrong? Is there some sort of requirement that sqlacodegen has that I'm missing?
As it turns out, this problem was unrelated to sqlacodegen. All the tables in my database live in a schema named after the database, so passing in --schema dbname works.
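If you want to double-check from Python that the tables are visible under that schema, here is a quick reflection sketch (the schema name mirrors the asker's dbname and is an assumption):
from sqlalchemy import MetaData, create_engine

engine = create_engine('postgresql+psycopg2://username:password@host:5432/dbname')
metadata = MetaData(schema='dbname')
metadata.reflect(bind=engine)   # picks up the schema-qualified tables
print(metadata.tables.keys())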

Use mock MongoDB server for unit test

I have to implement nose tests for Python code using a MongoDB store. Is there any Python library that lets me initialize a mock in-memory MongoDB server?
I am using continuous integration, so I want my tests to be independent of any running MongoDB server.
Is there a way to mock a MongoDB server in memory to test the code without connecting to a real Mongo server?
Thanks in advance!
You could try: https://github.com/vmalloc/mongomock, which aims to be a small library for mocking pymongo collection objects for testing purposes.
However, I'm not sure that the cost of just running mongodb would be prohibitive compared to ensuring some mocking library is feature complete.
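A tiny example of how mongomock can be dropped in (the database and collection names here are made up):
import mongomock

client = mongomock.MongoClient()          # in-memory stand-in for pymongo.MongoClient
collection = client.test_db.test_collection
collection.insert_one({'name': 'alice'})
assert collection.find_one({'name': 'alice'})['name'] == 'alice'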
I don’t know about Python, but I had a similar concern with C#. I decided to just run a real instance of Mongo on my workstation pointed at an empty directory. It’s not great because the code isn’t isolated but it’s fast and easy.
Only the data access layer actually calls Mongo during the test. The rest can rely on the mocks of the data access layer. I didn’t feel like faking Mongo was worth the effort when really I want to verify the interaction with Mongo is correct anyway.
You can use Ming, which ships an in-memory MongoDB ("mim") that acts as a pymongo connection replacement.
import ming
mg = ming.create_datastore('mim://')
mg.conn # is the connection
mg.db # is a db with no name
mg.conn.somedb.somecol
# >> mim.Collection(mim.Database(somedb), somecol)
col = mg.conn.somedb.somecol
col.insert({'a': 1})
# >> ObjectId('5216ac3fe0323a1218f4e9aa')
col.find().count()
# >> 1
I am also using pymongo, and MockupDB works very well for my purpose (integration tests).
Using it is as simple as:
from mockupdb import MockupDB

server = MockupDB()
port = server.run()  # starts the mock server and returns the port it listens on

from pymongo import MongoClient
client = MongoClient(server.uri)

import module_i_want_to_patch
module_i_want_to_patch.client = client
You can check the official MockupDB tutorial for more details.
