Pytest where to store expected data - python

When testing a function, I need to pass parameters and check that the output matches the expected output.
It is easy when the function's response is just a small array or a one-line string which can be defined inside the test function, but suppose the function I test modifies a config file which can be huge. Or the resulting array is something 4 lines long if I define it explicitly. Where do I store that data so my tests remain clean and easy to maintain?
Right now, if the result is a string, I just put a file next to the test .py and open() it inside the test:
def test_if_it_works():
    with open('expected_answer_from_some_function.txt') as res_file:
        expected_data = res_file.read()
    input_data = ...  # Maybe loaded from a file as well
    assert expected_data == if_it_works(input_data)
I see many problems with such an approach, like the problem of keeping this file up to date. It looks bad as well.
I can probably make things better by moving this to a fixture:
import pytest

@pytest.fixture
def expected_data():
    with open('expected_answer_from_some_function.txt') as res_file:
        expected_data = res_file.read()
    return expected_data

@pytest.fixture
def input_data():
    return '1,2,3,4'

def test_if_it_works(input_data, expected_data):
    assert expected_data == if_it_works(input_data)
That just moves the problem to another place. Usually I also need to test whether the function works with empty input, a single item, or multiple items, so I would have to create either one big fixture covering all three cases or multiple fixtures. In the end the code gets quite messy.
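(For what it's worth, a single parametrized fixture can at least keep such cases together; a minimal sketch, with made-up example values:)

import pytest

@pytest.fixture(params=['empty', 'single', 'multiple'])
def input_data(request):
    # one parametrized fixture instead of three separate ones;
    # the test runs once per param
    return {'empty': '', 'single': '1', 'multiple': '1,2,3,4'}[request.param]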
If a function expects a complicated dictionary as input, or returns a dictionary of the same huge size, the test code becomes ugly:
@pytest.fixture
def input_data():
    # It's just an example
    return [{'one_value': 3, 'another_key': 3, 'some_data': 'somestring'},
            {'login': 3, 'ip_address': 32, 'value': 53, 'one_value': 3},
            {'one_value': 3, 'password': 13, 'value': 3}]
It's quite hard to read tests with such fixtures and keep them up to date.
Update
After searching for a while I found a library which solves part of the problem when, instead of big config files, I have large HTML responses: betamax.
For easier usage I created a fixture:
import os

import pytest
import requests
from betamax import Betamax

@pytest.fixture
def session(request):
    session = requests.Session()
    recorder = Betamax(session)
    recorder.use_cassette(os.path.join(os.path.dirname(__file__), 'fixtures', request.function.__name__))
    recorder.start()
    request.addfinalizer(recorder.stop)
    return session
So now in my tests I just use the session fixture, and every request I make is automatically serialized to the fixtures/test_name.json file, so the next time I execute the test the library loads the response from the filesystem instead of making a real HTTP request:
def test_if_response_is_ok(session):
    r = session.get("http://google.com")
It's quite handy, because to keep these fixtures up to date I just need to clear the fixtures folder and rerun my tests.

I had a similar problem once, where I had to test a configuration file against an expected file. This is how I fixed it:
Create a folder with the same name as your test module, at the same location, and put all your expected files inside that folder:
test_foo/
    expected_config_1.ini
    expected_config_2.ini
test_foo.py
Create a fixture responsible for copying the contents of this folder to a temporary directory. I made use of the tmpdir fixture for this:
from __future__ import unicode_literals
from distutils import dir_util
from pytest import fixture
import os

@fixture
def datadir(tmpdir, request):
    '''
    Fixture responsible for searching a folder with the same name of test
    module and, if available, moving all contents to a temporary directory so
    tests can use them freely.
    '''
    filename = request.module.__file__
    test_dir, _ = os.path.splitext(filename)
    if os.path.isdir(test_dir):
        dir_util.copy_tree(test_dir, bytes(tmpdir))
    return tmpdir
Important: If you are using Python 3, replace dir_util.copy_tree(test_dir, bytes(tmpdir)) with dir_util.copy_tree(test_dir, str(tmpdir)).
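Note also that distutils is deprecated on recent Python versions (and removed in 3.12), so here is a sketch of the same fixture using shutil instead (assuming Python 3.8+ for dirs_exist_ok):

import os
import shutil
from pytest import fixture

@fixture
def datadir(tmpdir, request):
    # copy <test_module_name>/* sitting next to the test module into the tmpdir
    test_dir, _ = os.path.splitext(request.module.__file__)
    if os.path.isdir(test_dir):
        shutil.copytree(test_dir, str(tmpdir), dirs_exist_ok=True)
    return tmpdir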
Use your new fixture:
def test_foo(datadir):
    expected_config_1 = datadir.join('expected_config_1.ini')
    expected_config_2 = datadir.join('expected_config_2.ini')
Remember: datadir is just the same as the tmpdir fixture, plus the ability to work with your expected files placed in a folder with the very same name as the test module.

I believe pytest-datafiles can be of great help. Unfortunately, it does not seem to be maintained much anymore. For the time being, though, it works nicely.
Here's a simple example taken from the docs:
import os
import pytest

@pytest.mark.datafiles('/opt/big_files/film1.mp4')
def test_fast_forward(datafiles):
    path = str(datafiles)  # Convert from py.path object to path (str)
    assert len(os.listdir(path)) == 1
    assert os.path.isfile(os.path.join(path, 'film1.mp4'))
    # assert some_operation(os.path.join(path, 'film1.mp4')) == expected_result

    # Using py.path syntax
    assert len(datafiles.listdir()) == 1
    assert (datafiles / 'film1.mp4').check(file=1)

If you only have a few tests, then why not include the data as a string literal:
expected_data = """
Your data here...
"""
If you have a handful, or the expected data is really long, I think your use of fixtures makes sense.
However, if you have many, then perhaps a different solution would be better. In fact, for one project I have over one hundred input and expected-output files, so I built my own testing framework (more or less). I used Nose, but pytest would work as well: I created a test generator which walked the directory of test files, and for each input file it yielded a test comparing the actual output with the expected output (pytest calls this parametrizing). Then I documented my framework so others could use it. To review and/or edit the tests, you only edit the input and/or expected-output files and never need to look at the Python test file.
To let different input files have different options defined, I also created a YAML config file for each directory (JSON would work as well and keeps the dependencies down). The YAML data consists of a dictionary where each key is the name of an input file and the value is a dictionary of keywords that will get passed, along with the input file, to the function being tested. If you're interested, here is the source code and documentation. I recently played with the idea of defining the options as Unittests here (requires only the built-in unittest lib) but I'm not sure if I like it.
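A minimal sketch of that pattern using pytest's parametrization (the cases/ layout, the .input/.expected extensions, and function_under_test are all made up for illustration):

import glob
import os

import pytest
import yaml  # PyYAML; JSON would work just as well

CASES_DIR = os.path.join(os.path.dirname(__file__), 'cases')  # hypothetical layout

def function_under_test(text, **kwargs):
    # placeholder standing in for the real function being tested
    return text

def collect_cases():
    # options.yaml maps input-file names to keyword arguments for the function
    with open(os.path.join(CASES_DIR, 'options.yaml')) as f:
        options = yaml.safe_load(f) or {}
    for input_path in sorted(glob.glob(os.path.join(CASES_DIR, '*.input'))):
        name = os.path.basename(input_path)
        expected_path = input_path.replace('.input', '.expected')
        yield pytest.param(input_path, expected_path, options.get(name, {}), id=name)

@pytest.mark.parametrize('input_path, expected_path, kwargs', collect_cases())
def test_against_expected(input_path, expected_path, kwargs):
    with open(input_path) as f_in, open(expected_path) as f_exp:
        assert function_under_test(f_in.read(), **kwargs) == f_exp.read()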

Consider whether the whole contents of the config file really need to be tested.
If only several values or substrings must be checked, prepare an expected template for that config, with the tested places marked as "variables" in some special syntax. Then prepare a separate expected list of the values for the variables in the template. This expected list can be stored as a separate file or directly in the source code.
Example for the template:
ALLOWED_HOSTS = ['{host}']
DEBUG = {debug}
DEFAULT_FROM_EMAIL = '{email}'
Here, the template variables are placed inside curly braces.
The expected values can look like:
host = www.example.com
debug = False
email = webmaster@example.com
or even as a simple comma-separated list:
www.example.com, False, webmaster@example.com
Then your testing code can produce the expected file from the template by replacing the variables with the expected values, and compare the expected file with the actual one.
Maintaining the template and expected values separately has the advantage that you can have many testing data sets using the same template.
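A minimal sketch of that comparison using str.format (the file names here are made up):

# render the template with the expected values and compare with the real output
with open('expected_config.template') as f:
    template = f.read()

expected = template.format(host='www.example.com', debug=False,
                           email='webmaster@example.com')

with open('generated_config.ini') as f:
    assert f.read() == expected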
Testing only variables
An even better approach is for the config generation method to produce only the needed values for the config file. These values can then be inserted into the template by another method. The advantage is that the testing code can directly compare all config variables separately and in a clear way.
Templates
While it is easy to replace the variables with the needed values in the template yourself, there are ready-made template libraries which let you do it in one line. Here are just a few examples: Django, Jinja, Mako.
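For instance, with Jinja2 (note that Jinja's default delimiters are {{ }} rather than the single braces used in the template above):

from jinja2 import Template

expected = Template("ALLOWED_HOSTS = ['{{ host }}']").render(host='www.example.com')
assert expected == "ALLOWED_HOSTS = ['www.example.com']"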

Related

How to test complicated functions which use requests?

I want to test my code that is based on an API created by someone else, but I'm not sure how I should do this.
I have created a function to save the JSON into a file so I don't need to send requests each time I run a test, but I don't know how to make it work when the original function (check) takes an input argument (problem_report) which is an instance of some class provided by the API, and which has the problem_report.get_correction(corr_link) method. I wonder if this is a sign of badly written code on my part, because I can't write a test for it, or whether I should rewrite this function in my test file as I show at the end of the code below.
import re

# I want to test this function
def check(problem_report):
    corrections = {}
    for corr_link, corr_id in problem_report.links.items():
        if re.findall(pattern='detailCorrection', string=corr_link):
            correction = problem_report.get_correction(corr_link)
            corrections.update({corr_id: correction})
    return corrections
# This function loads the JSON from a file; normally it is downloaded via the API from some page.
def load_pr(pr_id):
    print('loading')
    # saved_prs_path is defined elsewhere
    with open('{}{}_view_pr.json'.format(saved_prs_path, pr_id)) as view_pr:
        view_pr = json.load(view_pr)
    ...
    pr_info = {'view_pr': view_pr, ...}
    return pr_info
# create an instance of class MyPR which takes the json in __init__
@pytest.fixture
def setup_pr():
    print('setup')
    pr = load_pr('123')
    my_pr = MyPR(pr['view_pr'])
    return my_pr
# test function
def test_check(setup_pr):
    pr = setup_pr
    checked_pr = pr.check(setup_rft[1]['problem_report_pr'])
    assert checked_pr
# rewritten check function in the test file
@mock.patch('problem_report.get_correction', side_effect=get_corr)
def test_check(problem_report):
    corrections = {}
    for corr_link, corr_id in problem_report.links.items():
        if re.findall(pattern='detailCorrection', string=corr_link):
            correction = problem_report.get_correction(corr_link)
            corrections.update({corr_id: correction})
    return corrections
I'm not sure if I have provided enough code and explanation to understand the problem, but I hope so. I wish you could tell me whether it is normal that some functions are just hard to test, and whether it is good practice to rewrite them separately so I can mock the functions inside the function under test. I was also thinking that I could write a new class with similar functionality, but the API is very large and it would be a very long process.
I understand your question as follows: you have a function check that you consider hard to test because of its dependency on problem_report. To make it better testable you have copied the code into the test file; you will test the copied code, because you can modify the copy to be easier to test. And you want to know if this approach makes sense.
The answer is no, this does not make sense. You are not testing the real function, but completely different code. Well, the code may not start out completely different, but in a short time the copy and the original will deviate, and it will be a maintenance nightmare to ensure that the copy always resembles the original. Improving code for testability is a different story: you can make changes to the check function to improve its testability, but then exactly the same resulting function should be used both in the test and in the production code.
How to better test the function check then? First, are you sure that the original problem_report objects really cannot sensibly be used in your tests? (Here are some criteria that help you decide: What to mock for python test cases?) Now, let's assume that you come to the conclusion that you cannot sensibly use the original problem_report.
In that case the interface is simple enough to define a mocked problem_report by hand. Keep in mind that Python uses duck typing, so you only have to create a class that has a links member with an items() method; plus, your mocked problem_report class needs a get_correction() method. Beyond that, your mock does not have to produce types similar to those used by the real problem_report. The items() method can simply return a list of lists, like [["a", 2], ["xxxxdetailCorrectionxxxx", 4]]. The same argument holds for get_correction, which could, for example, simply return its argument or a value derived from it.
No real correction objects need to be simulated, and you can create your mocked versions of problem_report without reading data from files: the mocks can be set up completely from within the unit-testing code.
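A minimal sketch of such a hand-rolled mock (the class names are made up, it assumes the check function from the question is importable, and get_correction here tags its argument so the example stays self-consistent):

class FakeLinks:
    def items(self):
        # one link that matches the 'detailCorrection' pattern and one that doesn't
        return [["a", 2], ["xxxxdetailCorrectionxxxx", 4]]

class FakeProblemReport:
    links = FakeLinks()

    def get_correction(self, corr_link):
        return 'corrected:' + corr_link

def test_check():
    assert check(FakeProblemReport()) == {4: 'corrected:xxxxdetailCorrectionxxxx'}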
Try patching the problem_report symbol in the module. You should put your tests in a separate class.
from unittest import mock

@mock.patch('some.module.path.problem_report')
def test_check(problem_report):
    problem_report.side_effect = get_corr
    corrections = {}
    for corr_link, corr_id in problem_report.links.items():
        if re.findall(pattern='detailCorrection', string=corr_link):
            correction = problem_report.get_correction(corr_link)
            corrections.update({corr_id: correction})
    return corrections

How to construct an instance of _pytest.pytester.Testdir

I'm attempting to do some debugging (specifically on pytest/testing/test_doctest.py) and I want to step through some code in IPython. I have experience with pytest, but I never do anything too fancy with it, so I've never delved too deep into the more "magic" things it does.
In the test that I want to step through (potentially introspecting some of the objects), there is an argument called testdir, but nowhere in this file is there a reference to what testdir is or how I could possibly construct one.
After doing some digging it seems this is some magic fixture that automatically gets constructed and sent to your function as a parameter when you execute pytest with the pytester plugin. When I tracked down that class, it is constructed again via some magic request parameter, where the code is massively unhelpful in telling you what that magic request is or how to make one.
To make this concrete I simply want to take a test like this one:
def test_reportinfo(self, testdir):
    '''
    Test case to make sure that DoctestItem.reportinfo() returns lineno.
    '''
    p = testdir.makepyfile(test_reportinfo="""
        def foo(x):
            '''
            >>> foo('a')
            'b'
            '''
            return 'c'
    """)
    items, reprec = testdir.inline_genitems(p, '--doctest-modules')
    reportinfo = items[0].reportinfo()
    assert reportinfo[1] == 1
and run its logic in IPython. Looking at what the testdir object does, it seems pretty cool: it automatically makes a file for you and runs pytest programmatically instead of via the command line. How can I make one of these? Is there some documentation I missed that makes this clear and less obfuscated?
If I wanted to use something like this in my tests, is there a way I could make the magic testdir parameter slightly more explicit, so the next coder that looks at it isn't pulling his/her hair out like I am?
After much agonizing, I've figured out how to instantiate a fixture value.
import _pytest

config = _pytest.config._prepareconfig(['-s'], plugins=['pytester'])
session = _pytest.main.Session(config)
_pytest.tmpdir.pytest_configure(config)
_pytest.fixtures.pytest_sessionstart(session)
_pytest.runner.pytest_sessionstart(session)

def func(testdir):
    return testdir

parent = _pytest.python.Module('parent', config=config, session=session)
function = _pytest.python.Function(
    'func', parent, callobj=func, config=config, session=session)
_pytest.fixtures.fillfixtures(function)
testdir = function.funcargs['testdir']
The main idea is to create a dummy pytest session. This is a bit tricky. It's critical that ['-s'] is passed into _prepareconfig, otherwise this will not print stdout, or will crash when run in IPython.
Given a barebones config and session, the next step is to manually load whatever fixture functionality you are going to use. This amounts to manually calling the hooks that pluggy usually takes care of for you. I found these by looking at the attribute errors I got when trying to run the code without them; usually it's just that session or config lacks a required attribute. There may be a better way to go about this (i.e. automatically via pluggy).
Next, we create a function that requests the specific fixture we are interested in; it's up to you to know what these names are. Finally, we set up a dummy module/function tree structure and call fillfixtures, which does the magic. funcargs then contains a dictionary of these objects ready for use. Be careful if you expect some teardown functionality; I'm not sure if this covers that, but I don't really need it for what I'm doing.
Hope this helps someone else. Note: this talk helped me understand what was happening in pytest under the hood a bit better: https://www.youtube.com/watch?v=zZsNPDfOoHU

Pytest pass arbitrary information from test

I am writing a Python plugin for a custom HTML report of pytest test results. I want to store some arbitrary test information (i.e. some Python objects...) inside tests, and then reuse this information in the report. So far I have only come up with a somewhat hackish solution.
I pass the request object to my test and fill its request.node._report_sections part with my data.
This object is then passed to the TestReport.sections attribute, which is available via the pytest_runtest_logreport hook, from which I can finally generate the HTML; then I remove all my objects from the sections attribute.
In pseudo-Python code:
def test_answer(request):
    a = MyObject("Wooo")
    request.node._report_sections.append(("call", "myobj", a))
    assert False
and
def pytest_runtest_logreport(report):
    if report.when == "call":
        # generate html from report.sections content
        # clean report.sections list from MyObject objects
        # (which, by the way, contains 2-tuples, i.e. ("myobj", a))
        ...
Is there a better pytest way to do this?
This way seems OK.
Improvements I can suggest:
Think about using a fixture to create the MyObject object. Then you can place the request.node._report_sections.append(("call", "myobj", a_)) inside the fixture and make it invisible in the test, like this:
@pytest.fixture
def a(request):
    a_ = MyObject("Wooo")
    request.node._report_sections.append(("call", "myobj", a_))
    return a_

def test_answer(a):
    ...
Another idea, suitable in case you have this object in all of your tests, is to implement one of the hooks pytest_pycollect_makeitem or pytest_pyfunc_call and "plant" the object there in the first place.
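A rough, unverified sketch of the pytest_pyfunc_call variant in a conftest.py (it assumes MyObject is importable there and relies on the same private _report_sections attribute used above):

# conftest.py
def pytest_pyfunc_call(pyfuncitem):
    # runs for every Python test function; attach the object before the call
    a = MyObject("Wooo")  # MyObject must be imported/defined here
    pyfuncitem._report_sections.append(("call", "myobj", a))
    # returning None lets pytest's own implementation run the test as usual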

Sphinx revealing my (mailgun) password

I have a simple function
import config

def send_message(mailgunkey=config.MAILGUNKEY):
    """
    send an email
    """
It relies on a variable defined in my config.py file. I read the variables from local files on all my machines, as I don't want to have my keys etc. in any repository. However, I recently got into the habit of using Sphinx, and when generating the HTML docs the expression config.MAILGUNKEY gets evaluated and the actual key is revealed in the HTML file. Is there an option to stop this kind of undesired behaviour?
Consider using this approach:
import config

def send_message(mailgunkey=None):
    """
    send an email
    """
    if mailgunkey is None:
        mailgunkey = config.MAILGUNKEY
In general, this approach gives you some important advantages:
lets your users pass None as the default;
allows changes to config.MAILGUNKEY even if your module has already been imported;
solves the problem of mutable default arguments (not your case, but still it's something to be aware of).
The second point is, in my opinion, something very important, as I would be very surprised to see that changes to a configuration variable at runtime have no effects.
Another option would be to mock the module that has your secrets.
This avoids needing to change your code to generate documentation.
Assuming you are using autodoc, add the line below to your conf.py:
autodoc_mock_imports = ["config", "secrets"]
https://www.sphinx-doc.org/en/master/usage/extensions/autodoc.html?highlight=autodoc_mock_imports%20#confval-autodoc_mock_imports
The autodoc_preserve_defaults configuration option was added in Sphinx 4.0.
The problem is solved by setting this option to True in conf.py; default argument values of functions will then not be evaluated and shown in the generated output.
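In conf.py that is a single line (assuming Sphinx 4.0 or later):

autodoc_preserve_defaults = True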

How could you swap out a particular database implementation in python?

If I have a separate class for my db calls, and I create another implementation of the db layer, but say with a different data store:
Is there a way for me to completely swap out the implementation without having to change a lot of code?
I.e. I am starting a project, so I can design things properly to achieve this from the get-go.
Note: I will use this pattern for other parts of the site as well, not just the db layer, so it's not really specific to the db layer only.
As long as two modules implement exactly the same interface (classes with the same names, methods, and other attributes; functions with the same names and signatures; ...), you can pick one or the other at the time your application starts up, for example on the basis of some configuration file, and import the chosen one under a fixed name. All the rest of your application can then use that fixed name and, apart from the startup code, be blissfully unaware of any shenanigans that may have been done at the start.
For example, consider a simplified case:
# english.py
def greet(): return 'Hello!'

# italian.py
def greet(): return 'Ciao!'

# french.py
def greet(): return 'Salut!'

# config.py
langname = 'italian'

# startit.py
import config
import sys

lang = __import__(config.langname)
sys.modules['lang'] = lang
Now, all the rest of the application can just import lang, and it will be getting under that name the italian module, so, when calling lang.greet(), it will get the string 'Ciao!'.
Of course, in real life you'll have multiple modules, each with multiple functions, classes, and whatnot, but the general principles stay very similar. Just take special care with modules that have qualified names (such as foo.bar), i.e., modules which must reside in a package (in this case, foo). For those, you can't just use __import__'s return value, but must use a slightly more roundabout approach, such as:
import sys

def importanyasname(actualname, fakename):
    __import__(actualname)
    sys.modules[fakename] = sys.modules[actualname]
that is, ignore __import__'s return value and reach right for the value that's left (with the actual name as the key) in the sys.modules dictionary: that is the module object you seek, and the one you can set back into sys.modules under the "fake name" by which all the rest of the application will be able to blissfully import it at any time.
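For example, with a hypothetical package backends containing a postgres_layer module (all names here are made up):

# at startup, choose the backend
importanyasname('backends.postgres_layer', 'dblayer')

# everywhere else in the application:
import dblayer
conn = dblayer.connect()  # resolves to backends.postgres_layer.connect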
