I am writing tests for some functions, one of which uses database queries. I have gone through blogs and docs that say you should use an in-memory or test database for such functions. Below is my function:
def already_exists(story_data, c):
    # TODO(salmanhaseeb): Implement de-dupe functionality by checking if it already
    # exists in the DB.
    c.execute("""SELECT COUNT(*) from posts where post_id = ?""", (story_data.post_id,))
    (number_of_rows,) = c.fetchone()
    if number_of_rows > 0:
        return True
    return False
This function hits the production database. My question is: when testing, I create an in-memory database and populate my values there, so I will be querying that database (the test DB). But when I call already_exists() from a test, my production DB is hit. How do I make the test hit my test DB instead?
There are two routes you can take to address this problem:
Make an integration test instead of a unit test and just use a copy of the real database.
Provide a fake to the method instead of the actual connection object.
Which one you should do depends on what you're trying to achieve.
If you want to test that the query itself works, then you should use an integration test. Full stop. The only way to make sure the query works as intended is to run it against test data already loaded into a copy of the database. Running it against a different database technology (e.g., running against SQLite when your production database is PostgreSQL) will not ensure that it works in production.

Needing a copy of the database means you will need some automated deployment process for it that can easily be invoked against a separate database. You should have such an automated process anyway, as it helps ensure that your deployments across environments are consistent, allows you to test them prior to release, and "documents" the process of upgrading the database. Standard solutions to this are migration tools written in your programming language like Alembic, or tools that execute raw SQL like yoyo or Flyway. You would need to invoke the deployment and fill the database with test data prior to running the test, then run the test and assert the output you expect.
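For illustration only, such an integration test could look like the sketch below. It seeds a database, then asserts on the function's result. SQLite in memory keeps the sketch self-contained, but per the point above you would run it against a copy of your real database technology; Story is a guess at your object, and pytest is assumed as the runner:

import sqlite3
import pytest

from yourapp import already_exists, Story  # hypothetical import path

@pytest.fixture
def cursor():
    # Stand-in for a deployed copy of your real database.
    conn = sqlite3.connect(":memory:")
    c = conn.cursor()
    c.execute("CREATE TABLE posts (post_id INTEGER)")
    c.execute("INSERT INTO posts (post_id) VALUES (?)", (10,))
    conn.commit()
    yield c
    conn.close()

def test_already_exists_finds_seeded_row(cursor):
    story = Story(post_id=10)  # hypothetical object from the question
    assert already_exists(story, cursor)

def test_already_exists_returns_false_for_missing_row(cursor):
    story = Story(post_id=999)
    assert not already_exists(story, cursor)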
If you want to test the code around the query and not the query itself, then you can use a fake for the connection object. The most common solution to this is a mock. Mocks provide stand-ins that can be configured to accept function calls and inputs and to return some output in place of the real object. This allows you to test that the logic of the method works correctly, assuming that the query returns the results you expect. For your method, such a test might look something like this:
from unittest.mock import Mock

...

def test_already_exists_returns_true_for_positive_count():
    mockConn = Mock(
        execute=Mock(),
        fetchone=Mock(return_value=(5,)),
    )
    story = Story(post_id=10)  # Making some assumptions about what your object might look like.
    result = already_exists(story, mockConn)
    assert result
    # Possibly assert calls on the mock. Value of these asserts is debatable.
    mockConn.execute.assert_called_once_with(
        """SELECT COUNT(*) from posts where post_id = ?""",
        (story.post_id,),
    )
    mockConn.fetchone.assert_called_once()
The issue is ensuring that your code consistently uses the same database connection. Then you can set it once to whatever is appropriate for the current environment.
Rather than passing the database connection around from method to method, it might make more sense to make it a singleton.
def already_exists(story_data):
    # Here `connection` is a singleton which returns the database connection.
    connection.execute("""SELECT COUNT(*) from posts where post_id = ?""", (story_data.post_id,))
    (number_of_rows,) = connection.fetchone()
    if number_of_rows > 0:
        return True
    return False
Or make the connection an attribute on each class and turn already_exists into a method. It should probably be a method regardless.
def already_exists(self):
    # Here the connection is associated with the object.
    self.connection.execute("""SELECT COUNT(*) from posts where post_id = ?""", (self.post_id,))
    (number_of_rows,) = self.connection.fetchone()
    if number_of_rows > 0:
        return True
    return False
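With either variant, a test can then point the shared connection at the test database before exercising the code. A minimal sketch, assuming the singleton lives in a hypothetical db module:

import sqlite3
import db  # hypothetical module that owns the shared `connection`

def setup_module(module):
    # Replace the production connection with an in-memory database.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE posts (post_id INTEGER)")
    db.connection = conn.cursor()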
But really you shouldn't be rolling this code yourself. Instead, use an ORM such as SQLAlchemy, which takes care of basic queries and connection management like this for you. It gives you a single point of access to the database, the "session".
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
from sqlalchemy_declarative import Address, Base, Person
engine = create_engine('sqlite:///sqlalchemy_example.db')
Base.metadata.bind = engine
DBSession = sessionmaker(bind=engine)
session = DBSession()
Then you use that to make queries. For example, it has an exists() construct (Post here stands for your mapped model):

q = session.query(Post).filter(Post.post_id == story_data.post_id)
session.query(q.exists()).scalar()
Using an ORM will greatly simplify your code. Here's a short tutorial for the basics, and a longer and more complete tutorial.
Related
I am trying to understand code which does roughly the following:
# db.py module
from sqlalchemy import create_engine
from sqlalchemy.orm import scoped_session, sessionmaker
from sqlalchemy.ext.automap import automap_base

engine = create_engine(DB_URL, pool_timeout=20, pool_recycle=1)

def get_session():
    return scoped_session(sessionmaker(bind=engine, expire_on_commit=False))()

def get_base():
    base = automap_base()
    base.prepare(engine, reflect=True)
    return base

base = get_base()
User = base.classes.user
in some function:
# other.py module
from db import get_base, get_session, User

def some_func():
    sess = get_session()
    # do something with sess and User:
    user = sess.query(User).first()
    User2 = get_base().classes.user
    try:
        check = sess.query(User2).first()
    except:
        sess.rollback()
    # do more with sess
    sess.commit()
some_func can be called in, e.g., a Celery task, but no greenlets or other similar monkey-patch concurrency tricks are in use.
I am wondering what can be achieved by re-mapping the metadata. Is my understanding correct that, due to the scoped session, SQLAlchemy will have the same object anyway? In this case even the session seems to be the same.
What can be the point?
My assumption about getting the same object is wrong, though:
(Pdb) pp user
<sqlalchemy.ext.automap.user object at 0x7f62e1a57390>
(Pdb) pp check
<sqlalchemy.ext.automap.user object at 0x7f62e0e93750>
(Pdb) pp user == check
False
(Pdb) pp user.id
1L
(Pdb) pp check.id
1L
(id is a primary key, that is, unique)
So it seems that SQLAlchemy keeps objects from different bases separately.
My best guess so far is that this trick allows one to have, for example, a user-existence test outside of the currently running transaction.
Most of the time it is not necessary and just slows the application down. The database schema does not usually change while the application is running, and simple changes should not matter (see "Data independence"). Redoing reflection and the like is just something people seem to do, maybe out of fear of using globals. On the other hand, in your example it seems that in db.py reflection is done only once, to produce the global base and User classes.
The same applies to scoped session registries. The registry itself is meant to serve thread-local sessions, so there is no point in recreating it all the time; instead, it should be an application-wide singleton. Note that using scoped sessions requires your application to use threads in a compatible manner: a single thread should handle a single job, such as a request/response cycle, so that the lifetime of a session binds naturally to the lifetime of a thread.
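As an illustration, a minimal sketch of such an application-wide registry (DB_URL stands in for your configuration; the engine arguments are copied from the question):

from sqlalchemy import create_engine
from sqlalchemy.orm import scoped_session, sessionmaker

engine = create_engine(DB_URL, pool_timeout=20, pool_recycle=1)
# Built once at import time; calling Session() afterwards returns the
# session bound to the current thread, creating it on first use.
Session = scoped_session(sessionmaker(bind=engine, expire_on_commit=False))

def handle_job():
    sess = Session()   # this thread's session
    # ... do the work of one request/job ...
    Session.remove()   # discard the session when the job ends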
Your assumption about getting the same object breaks due to the recreation of base and model classes. Though they represent the same row in the database, they are different models and as such produce different objects in the session.
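For contrast, if the module-level base and User are reused, the session's identity map guarantees one instance per mapper and primary key. A minimal sketch using the question's db module:

from db import get_session, User  # reuse the one reflected model

sess = get_session()
user = sess.query(User).first()
check = sess.query(User).first()
assert user is check  # same mapper, same primary key -> same object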
I would like to modify some database data as part of an alembic upgrade.
I thought I could just add any code in the upgrade of my migration, but the following fails:
def upgrade():
    ### commands auto generated by Alembic - please adjust! ###
    op.add_column('smsdelivery', sa.Column('sms_message_part_id', sa.Integer(), sa.ForeignKey('smsmessagepart.id'), nullable=True))
    ### end Alembic commands ###
    from volunteer.models import DBSession, SmsDelivery, SmsMessagePart

    for sms_delivery in DBSession.query(SmsDelivery).all():
        message_part = DBSession.query(SmsMessagePart).filter(SmsMessagePart.message_id == sms_delivery.message_id).first()
        if message_part is not None:
            sms_delivery.sms_message_part = message_part
with the following error:
sqlalchemy.exc.UnboundExecutionError: Could not locate a bind configured on mapper Mapper|SmsDelivery|smsdelivery, SQL expression or this Session
I am not really understanding this error. How can I fix this or is doing operations like this not a possibility?
It is difficult to understand exactly what you are trying to achieve from the code excerpt you provided, but I'll try to guess. The following answer is based on that guess.
On the import line you bring DBSession, SmsDelivery, and SmsMessagePart in from your modules and then try to operate with these objects like you do in your application.
The error shows that SmsDelivery is a mapper object, so it points to some table, and mapper objects should be bound to a valid SQLAlchemy connection. This tells me that you skipped the initialization of the DB objects (creating the connection and binding it to the mapper objects) that you normally do in your application code. DBSession looks like a SQLAlchemy session object; it needs a connection bind too.
Alembic already has a connection ready and open, which it uses for the schema changes you request with the op.* methods. So there should be a way to get that connection.
According to the Alembic manual, op.get_bind() will return the current connection bind:
For full interaction with a connected database, use the “bind” available from the context:
from alembic import op
connection = op.get_bind()
So you can use this connection to run your queries against the db.
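For example, a data migration can run queries and updates over that bind directly. A sketch that mirrors the loop from the question (table and column names come from the question; the exact SQL is an assumption about your schema):

from alembic import op
import sqlalchemy as sa

def upgrade():
    connection = op.get_bind()
    deliveries = connection.execute(
        sa.text("SELECT id, message_id FROM smsdelivery")
    ).fetchall()
    for delivery_id, message_id in deliveries:
        part = connection.execute(
            sa.text("SELECT id FROM smsmessagepart WHERE message_id = :mid"),
            {"mid": message_id},
        ).first()
        if part is not None:
            connection.execute(
                sa.text(
                    "UPDATE smsdelivery"
                    " SET sms_message_part_id = :pid"
                    " WHERE id = :did"
                ),
                {"pid": part[0], "did": delivery_id},
            )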
P.S. I assume you wanted to perform some modification to the data in your table. You may try to formulate that modification as a single UPDATE query. Alembic has a special method for executing such changes, so you would not need to deal with the connection at all:
alembic.operations.Operations.execute

execute(sql, execution_options=None)

Execute the given SQL using the current migration context.

In a SQL script context, the statement is emitted directly to the output stream. There is no return result, however, as this function is oriented towards generating a change script that can run in “offline” mode.

Parameters: sql – Any legal SQLAlchemy expression, including:
- a string
- a sqlalchemy.sql.expression.text() construct
- a sqlalchemy.sql.expression.insert() construct
- a sqlalchemy.sql.expression.update(), sqlalchemy.sql.expression.insert(), or sqlalchemy.sql.expression.delete() construct
- pretty much anything that’s “executable” as described in the SQL Expression Language Tutorial
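For instance, the row-by-row loop from the question could probably be collapsed into one set-based statement executed through op.execute(). A sketch using the table and column names from the question (it assumes message_id values are unique in smsmessagepart; verify that your database accepts this correlated-subquery form):

from alembic import op

def upgrade():
    # schema change via op.add_column() as before, then the data fix as one UPDATE
    op.execute(
        """
        UPDATE smsdelivery
        SET sms_message_part_id = (
            SELECT id
            FROM smsmessagepart
            WHERE smsmessagepart.message_id = smsdelivery.message_id
        )
        """
    )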
It's worth noting that if you query through your ORM models inside migrations like this, you probably want to freeze a copy of your ORM model inside the migration, like this:
class MyType(Base):
    __tablename__ = 'existing_table'
    __table_args__ = {'extend_existing': True}

    id = Column(Integer, ...)
    ...

def upgrade():
    Base.metadata.bind = op.get_bind()
    for item in Session.query(MyType).all():
        ...
Otherwise you'll inevitably end up in a situation where your ORM model changes and previous migrations no longer work.
Particularly note that you want to extend Base, not the real type itself (app.models.MyType), because your type might go away at some point, and once again your migrations would fail.
You also need to import Base, and then

Base.metadata.bind = op.get_bind()

and after this you can use your models as always, without errors.
We know that update is a thread-safe operation.
That means that when you do:
SomeModel.objects.filter(id=1).update(some_field=100)
Instead of:
sm = SomeModel.objects.get(id=1)
sm.some_field=100
sm.save()
your application is relatively thread-safe, and the operation SomeModel.objects.filter(id=1).update(some_field=100) will not overwrite data in the model's other fields.
My question is: is there any way to do
SomeModel.objects.filter(id=1).update(some_field=100)
but with creation of the object if it does not exist?
from django.db import IntegrityError

def update_or_create(model, filter_kwargs, update_kwargs):
    if not model.objects.filter(**filter_kwargs).update(**update_kwargs):
        kwargs = filter_kwargs.copy()
        kwargs.update(update_kwargs)
        try:
            model.objects.create(**kwargs)
        except IntegrityError:
            if not model.objects.filter(**filter_kwargs).update(**update_kwargs):
                raise  # re-raise IntegrityError
I think the code provided in the question is not very demonstrative: who wants to set the id on a model? Let's assume we need this, and that we have simultaneous operations:
def thread1():
    update_or_create(SomeModel, {'some_unique_field': 1}, {'some_field': 1})

def thread2():
    update_or_create(SomeModel, {'some_unique_field': 1}, {'some_field': 2})
With the update_or_create function, depending on which thread comes first, the object will be created and updated with no exception. This is thread-safe, but obviously of little use: depending on the race, the value of SomeModel.objects.get(some_unique_field=1).some_field could be 1 or 2.
Django provides F objects, so we can upgrade our code:
from django.db.models import F

def thread1():
    update_or_create(SomeModel,
                     {'some_unique_field': 1},
                     {'some_field': F('some_field') + 1})

def thread2():
    update_or_create(SomeModel,
                     {'some_unique_field': 1},
                     {'some_field': F('some_field') + 2})
You want Django's select_for_update() method (and a backend that supports row-level locking, such as PostgreSQL) in combination with manual transaction management.
from django.db import IntegrityError, transaction

try:
    with transaction.commit_on_success():
        SomeModel.objects.create(pk=1, some_field=100)
except IntegrityError:  # unique id already exists, so update instead
    with transaction.commit_on_success():
        obj = SomeModel.objects.select_for_update().get(pk=1)
        obj.some_field = 100
        obj.save()
Note that if some other process deletes the object between the two queries, you'll get a SomeModel.DoesNotExist exception.
Django 1.7 and above also has atomic operation support and a built-in update_or_create() method.
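On those versions this whole pattern can reduce to something like the following (field names are taken from the earlier examples):

from django.db import transaction

with transaction.atomic():
    obj, created = SomeModel.objects.update_or_create(
        some_unique_field=1,
        defaults={'some_field': 100},
    )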
You can use Django's built-in get_or_create, but note that it operates on the model itself rather than a queryset, and that it returns an (object, created) tuple:

me, created = SomeModel.objects.get_or_create(id=1)
me.some_field = 100
me.save()
If you have multiple threads, your app will need to determine which instance of the model is current. Usually what I do is refresh the model from the database, make changes, and then save it, so the object doesn't spend a long time in a disconnected state.
It's impossible in Django to do such an upsert operation with update alone. But the queryset update method returns the number of rows it updated, so you can do:
from django.db import router, connections, transaction

class MySuperManager(models.Manager):
    def _lock_table(self, lock='ACCESS EXCLUSIVE'):
        cursor = connections[router.db_for_write(self.model)].cursor()
        cursor.execute(
            'LOCK TABLE %s IN %s MODE' % (self.model._meta.db_table, lock)
        )

    def create_or_update(self, id, **update_fields):
        with transaction.commit_on_success():
            self._lock_table()
            if not self.get_query_set().filter(id=id).update(**update_fields):
                self.model(id=id, **update_fields).save()
This example is for Postgres; you can use it without raw SQL for the update-or-insert itself, but without the lock the operation will not be atomic. If you take a lock on the table, you can be sure that two objects will not be created by two other threads.
I think if you have critical demands on atomic operations, you'd better design them at the database level instead of the Django ORM level.
The Django ORM focuses on convenience rather than performance and safety; you sometimes have to optimize the automatically generated SQL.
Transactions in most production-grade databases provide locking and rollback well.
In mashup (hybrid) systems, or when your system includes third-party components such as logging or statistics, applications in different frameworks or even languages may access the database at the same time, so thread safety inside Django alone is not enough in this case.
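To illustrate the database-level route, here is a sketch of an upsert issued through Django's raw cursor. It assumes PostgreSQL 9.5+ (ON CONFLICT); MySQL would use INSERT ... ON DUPLICATE KEY UPDATE instead, and app_somemodel is only a guess at the table name:

from django.db import connection

with connection.cursor() as cursor:
    cursor.execute(
        """
        INSERT INTO app_somemodel (some_unique_field, some_field)
        VALUES (%s, %s)
        ON CONFLICT (some_unique_field)
        DO UPDATE SET some_field = EXCLUDED.some_field
        """,
        [1, 100],
    )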
I am developing a CherryPy application and I want to write some automated tests for it. I chose nosetests for this. The application uses sqlalchemy as its DB backend, so I need the fixture package to provide fixed datasets, and I also want to do webtests. Here is how I put it all together:
I have a helper function init_model(test=False) in the file where all the models are created. It connects to the production or test database (if test == True or cherrypy.request.app.test == True) and calls create_all.
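For reference, a sketch of what such a helper might look like; TEST_DB_URL and PROD_DB_URL are assumed settings, and meta is the module holding the engine and metadata:

import cherrypy
from sqlalchemy import create_engine

def init_model(test=False):
    # Pick the test database when asked to, explicitly or via the app flag.
    if test or getattr(cherrypy.request.app, 'test', False):
        url = TEST_DB_URL
    else:
        url = PROD_DB_URL
    meta.engine = create_engine(url)
    meta.metadata.create_all(bind=meta.engine)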
Then I have created a base class for tests like this:
class BaseTest(DataTestCase):
    def __init__(self):
        init_model(True)
        application.test = True
        self.app = TestApp(application)
        self.fixture = SQLAlchemyFixture(env=models, engine=meta.engine, style=NamedDataStyle())
        self.datasets = (
            # all the datasets go here
        )
And now I do my tests by creating child classes of BaseTest and calling self.app.some_method()
This is my first time doing tests in python and all this seems very complicated. I want to know if I am using the mentioned packages as their authors intended and if it's not overcomplicated.
That looks mostly like normal testing glue for a system of any size. In other words, it's not overly-complicated.
In fact, I'd suggest slightly more complexity in one respect: I think you're going to find setting up a new database in each child test class to be really slow. It's more common to at least set up all your tables once per run instead of once per class. Then, you either have each test method create all the data it needs for its own sake, and/or you run each test case in a transaction and roll it all back in a finally: block.
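A sketch of the transaction-per-test variant, using unittest-style hooks and the meta.engine from your snippet (adapt the names to your base class):

from sqlalchemy.orm import sessionmaker

class TransactionalTest(BaseTest):
    def setUp(self):
        # Tables were created once for the whole run; each test then works
        # inside a transaction that is rolled back afterwards.
        self.connection = meta.engine.connect()
        self.transaction = self.connection.begin()
        self.session = sessionmaker(bind=self.connection)()

    def tearDown(self):
        self.session.close()
        self.transaction.rollback()
        self.connection.close()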
This example illustrates a mystery I encountered in an application I am building. The application needs to support an option allowing the user to exercise the code without actually committing changes to the DB. However, when I added this option, I discovered that changes were persisted to the DB even when I did not call the commit() method.
My specific question can be found in the code comments. The underlying goal is to have a clearer understanding of when and why SQLAlchemy will commit to the DB.
My broader question is whether my application should (a) use a global Session instance, or (b) use a global Session class, from which particular instances would be instantiated. Based on this example, I'm starting to think that the correct answer is (b). Is that right? Edit: this SQLAlchemy documentation suggests that (b) is recommended.
import sys
from sqlalchemy import create_engine, Column, Integer, String
from sqlalchemy.orm import sessionmaker
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class User(Base):
    __tablename__ = 'users'
    id = Column(Integer, primary_key=True)
    name = Column(String)
    age = Column(Integer)

    def __init__(self, name, age=0):
        self.name = name
        self.age = age

    def __repr__(self):
        return "<User(name='{0}', age={1})>".format(self.name, self.age)

engine = create_engine('sqlite://', echo=False)
Base.metadata.create_all(engine)
Session = sessionmaker()
Session.configure(bind=engine)

global_session = Session()  # A global Session instance.
commit_ages = False         # Whether to commit in modify_ages().
use_global = True           # If True, modify_ages() will commit, regardless
                            # of the value of commit_ages. Why?

def get_session():
    return global_session if use_global else Session()

def add_users(names):
    s = get_session()
    s.add_all(User(nm) for nm in names)
    s.commit()

def list_users():
    s = get_session()
    for u in s.query(User): print ' ', u

def modify_ages():
    s = get_session()
    n = 0
    for u in s.query(User):
        n += 10
        u.age = n
    if commit_ages: s.commit()

add_users(('A', 'B', 'C'))
print '\nBefore:'
list_users()
modify_ages()
print '\nAfter:'
list_users()
tl;dr: The updates are not actually committed to the database; they are part of an uncommitted transaction in progress.
I made two separate changes to your call to create_engine(). (Other than this one line, I'm using your code exactly as posted.)
The first was
engine = create_engine('sqlite://', echo = True)
This provides some useful information. I'm not going to post the entire output here, but notice that no SQL update commands are issued until after the second call to list_users() is made:
...
After:
xxxx-xx-xx xx:xx:xx,xxx INFO sqlalchemy.engine.base.Engine.0x...d3d0 UPDATE users SET age=? WHERE users.id = ?
xxxx-xx-xx xx:xx:xx,xxx INFO sqlalchemy.engine.base.Engine.0x...d3d0 (10, 1)
...
This is a clue that the data is not persisted, but kept around in the session object.
The second change I made was to persist the database to a file with
engine = create_engine('sqlite:///db.sqlite', echo = True)
Running the script again provides the same output as before for the second call to list_users():
<User(name='A', age=10)>
<User(name='B', age=20)>
<User(name='C', age=30)>
However, if you now open the db we just created and query its contents, you can see that the added users were persisted to the database, but the age modifications were not:
$ sqlite3 db.sqlite "select * from users"
1|A|0
2|B|0
3|C|0
So, the second call to list_users() is getting its values from the session object, not from the database, because there is a transaction in progress that hasn't been committed yet. To prove this, add the following lines to the end of your script:
s = get_session()
s.rollback()
print '\nAfter rollback:'
list_users()
Since you state you are actually seeing the problem with MySQL, check the engine type the table was created with. The default is MyISAM, which does not support ACID transactions. Make sure you are using the InnoDB engine, which does.
You can see which engine a table is using with
show create table users;
You can change the db engine for a table with alter table:
alter table users engine="InnoDB";
1. the example: To make sure that (or check whether) the session does not commit the changes, it is enough to call expunge_all on the session object. This will most probably show that the changes are not actually committed:
....
print '\nAfter:'
get_session().expunge_all()
list_users()
2. mysql: As you already mentioned, the sqlite example might not reflect what you actually see when using mysql. As documented in sqlalchemy - MySQL - Storage Engines, the most likely reason for your problem is the usage of non-transactional storage engines (like MyISAM), which results in an autocommit mode of execution.
3. session scope: Although having one global session sounds like asking for trouble, using a new session for every tiny request is not a great idea either. You should think of a session as a transaction/unit-of-work. I find the usage of contextual sessions the best of both worlds: you do not have to pass the session object around the hierarchy of method calls, and at the same time you get pretty good safety in a multi-threaded environment. I do use a local session once in a while, where I know I do not want to interact with the currently running transaction (session).
Note that the defaults of create_session() are the opposite of that of sessionmaker(): autoflush and expire_on_commit are False, autocommit is True.
global_session is already instantiated when you call modify_ages(), and you've already committed to the database. If you re-instantiate global_session after you commit, it should start a new transaction.
My guess is since you've already committed and are re-using the same object, each additional modification is automatically committed.
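Following option (b), a sketch of modify_ages() creating its own short-lived session from the global Session class (names come from the script above):

def modify_ages():
    s = Session()  # fresh instance, fresh transaction
    try:
        n = 0
        for u in s.query(User):
            n += 10
            u.age = n
        if commit_ages:
            s.commit()
        else:
            s.rollback()  # explicitly discard the uncommitted changes
    finally:
        s.close()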