Reading a file from a fixture only once - Python

Imagine we have the following fixture
@pytest.fixture()
def read_file(self):
    df = pd.read_excel(Path + file_name)
    return df
Basically it just reads a file and returns it as a dataframe. However, as I understand it, if we use the same fixture in multiple tests, it will read the file again and again, which is costly. Is there a way to read that file once in one of my tests and keep the resulting dataframe in memory so it can be reused by other tests as well?

You can create a singleton object which holds the file's data.
When you first create that object (in its __init__), it will read the file.
Since the singleton is created only once, the file will be read only once.
You can access it from any test you like via an import.
You can read about singletons here: Creating a singleton in Python

Set the fixture's scope to session. This way the fixture will be called only once and every test will receive the object that the fixture returns.
@pytest.fixture(scope="session")
def read_file(self):
    return pd.read_excel(Path + file_name)
However, beware that in case the fixture returns a mutable object, one test can alter it and affect the results of other tests.
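If that mutation risk matters, one option is to keep the expensive read at session scope and hand each test its own copy. A minimal sketch, assuming pandas is used and DATA_FILE is a hypothetical path:

import pandas as pd
import pytest

DATA_FILE = "data.xlsx"  # hypothetical path; adjust to your project

@pytest.fixture(scope="session")
def _raw_dataframe():
    # The expensive read happens only once per test session.
    return pd.read_excel(DATA_FILE)

@pytest.fixture()
def read_file(_raw_dataframe):
    # Each test gets its own copy, so mutations cannot leak between tests.
    return _raw_dataframe.copy()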

Related

How to test complicated functions which use requests?

I want to test my code that is based on an API created by someone else, but I'm not sure how I should do this.
I have created a function to save the JSON into a file so I don't need to send requests each time I run the tests, but I don't know how to make it work when the original check function takes an input argument (problem_report) which is an instance of a class provided by the API and has this
problem_report.get_correction(corr_link) method. I wonder if this is a sign of badly written code on my part, because I can't write a test for it, or whether I should rewrite this function in my test file as I show at the end of the code below.
# I want to test this function
def check(problem_report):
    corrections = {}
    for corr_link, corr_id in problem_report.links.items():
        if re.findall(pattern='detailCorrection', string=corr_link):
            correction = problem_report.get_correction(corr_link)
            corrections.update({corr_id: correction})
    return corrections

# This function loads the JSON from a file; normally it is downloaded via the API from some page.
def load_pr(pr_id):
    print('loading')
    with open('{}{}_view_pr.json'.format(saved_prs_path, pr_id)) as view_pr:
        view_pr = json.load(view_pr)
    ...
    pr_info = {'view_pr': view_pr, ...}
    return pr_info
# create an instance of class MyPR which takes the json in __init__
@pytest.fixture
def setup_pr():
    print('setup')
    pr = load_pr('123')
    my_pr = MyPR(pr['view_pr'])
    return my_pr

# test function
def test_check(setup_pr):
    pr = setup_pr
    checked_pr = pr.check(setup_rft[1]['problem_report_pr'])
    assert checked_pr
# rewritten check function in the test file
@mock.patch('problem_report.get_correction', side_effect=get_corr)
def test_check(problem_report):
    corrections = {}
    for corr_link, corr_id in problem_report.links.items():
        if re.findall(pattern='detailCorrection', string=corr_link):
            correction = problem_report.get_correction(corr_link)
            corrections.update({corr_id: correction})
    return corrections
I'm not sure if I provided enough code and explanation to understand the problem, but I hope so. I would like to know whether it is normal that some functions are just hard to test, and whether it is good practice to rewrite them separately so I can mock functions inside the function under test. I was also thinking that I could write a new class with similar functionality, but the API is very large and it would be a very long process.
I understand your question as follows: you have a function check that you consider hard to test because of its dependency on problem_report. To make it more testable, you have copied the code into the test file. You will test the copied code because you can modify it to be easier to test. And you want to know whether this approach makes sense.
The answer is no, this does not make sense. You are not testing the real function, but completely different code. Well, the code may not start out completely different, but in a short time the copy and the original will deviate, and it will be a maintenance nightmare to ensure that the copy always resembles the original. Improving code for testability is a different story: you can make changes to the check function to improve its testability. But then exactly the same resulting function should be used both in the test and in the production code.
How to better test the function check, then? First, are you sure that the original problem_report objects really cannot be sensibly used in your tests? (Here are some criteria that help you decide: What to mock for python test cases?). Now, let's assume that you come to the conclusion that you cannot sensibly use the original problem_report.
In that case, the interface here is simple enough to define a mocked problem_report. Keep in mind that Python uses duck typing, so you only have to create a class that has a links member which has an items() method. Plus, your mocked problem_report class needs a get_correction() method. Beyond that, your mock does not have to produce types that are similar to the types used by problem_report. The items() method can simply return a list of lists, like [["a",2],["xxxxdetailCorrectionxxxx",4]]. The same argument holds for get_correction, which could for example simply return a trivially derived value, like the negative of the corresponding id.
For the above example (items() returning [["a",2],["xxxxdetailCorrectionxxxx",4]] and get_correction returning the negative of the corresponding id) the expected result would be {4: -4}. There is no need to simulate real correction objects. And you can create your mocked versions of problem_report without needing to read data from files: the mocks can be set up completely from within the unit-testing code.
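A minimal sketch of such a hand-rolled fake, using the example values above (FakeProblemReport is an illustrative name, and check is assumed to be importable from whichever module it lives in):

from mymodule import check  # hypothetical: import check() from the production module

class FakeProblemReport:
    # A plain dict satisfies check(), which only calls .items() on links.
    links = {"a": 2, "xxxxdetailCorrectionxxxx": 4}

    def get_correction(self, corr_link):
        # Return a trivially derived value instead of a real correction object.
        return -self.links[corr_link]

def test_check_with_fake_report():
    # Only the link containing 'detailCorrection' produces an entry.
    assert check(FakeProblemReport()) == {4: -4}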
Try patching the problem_report symbol in the module. You should put your tests in a separate class.
@mock.patch('some.module.path.problem_report')
def test_check(problem_report):
    problem_report.side_effect = get_corr
    corrections = {}
    for corr_link, corr_id in problem_report.links.items():
        if re.findall(pattern='detailCorrection', string=corr_link):
            correction = problem_report.get_correction(corr_link)
            corrections.update({corr_id: correction})
    return corrections

Pytest pass arbitrary information from test

I am writing a Python plugin for custom HTML reports of pytest test results. I want to store some arbitrary test information (i.e. some Python objects...) inside tests, and then reuse this information in the report when generating it. So far I have only come up with a somewhat hackish solution.
I pass the request object to my test and fill the request.node._report_sections part of it with my data.
This object is then passed to the TestReport.sections attribute, which is available via the pytest_runtest_logreport hook, from which I can finally generate the HTML and then remove all my objects from the sections attribute.
In pseudo-Python code:
def test_answer(request):
    a = MyObject("Wooo")
    request.node._report_sections.append(("call", "myobj", a))
    assert False
and
def pytest_runtest_logreport(report):
    if report.when == "call":
        # generate HTML from the report.sections content
        # clean the report.sections list of MyObject objects
        # (which, by the way, contains 2-tuples, i.e. ("myobj", a))
Is there a better pytest way to do this?
This way seems OK.
Improvements I can suggest:
Think about using a fixture to create the MyObject object. Then you can place the request.node._report_sections.append(("call","myobj",a)) inside the fixture, and make it invisible in the test. Like this:
@pytest.fixture
def a(request):
    a_ = MyObject("Wooo")
    request.node._report_sections.append(("call", "myobj", a_))
    return a_

def test_answer(a):
    ...
Another idea, which is suitable in case you have this object in all of your tests, is to implement one of the hooks pytest_pycollect_makeitem or pytest_pyfunc_call, and "plant" the object there in the first place.
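For completeness, a rough sketch of what the pseudo-code hook above could look like in conftest.py; report_to_html here is a hypothetical helper, not a pytest API:

def pytest_runtest_logreport(report):
    if report.when == "call":
        # Pick out the ("myobj", a) entries planted via _report_sections...
        my_objects = [obj for name, obj in report.sections if name == "myobj"]
        # ...feed them to the HTML generation (hypothetical helper)...
        # report_to_html(report, my_objects)
        # ...and strip them so later reporters only see ordinary text sections.
        report.sections = [(name, content) for name, content in report.sections
                           if name != "myobj"]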

Pytest: running tests multiple times with different input data

I want to run through a collection of test functions with different fixtures for each run. Generally, the solutions suggested on Stack Overflow, in the documentation and in blog posts fall into two categories. One is parametrizing the fixture:
@pytest.fixture(params=list_of_cases)
def some_case(request):
    return request.param
The other is by calling metafunc.parametrize in order to generate multiple tests:
def pytest_generate_tests(metafunc):
    metafunc.parametrize('some_case', list_of_cases)
The problem with both approaches is the order in which the cases are run. Basically it runs each test function with each parameter, instead of going through all test functions for a given parameter and then continuing with the next parameter. This is a problem when some of my fixtures are comparatively expensive database calls.
To illustrate this, assume that dataframe_x is another fixture that belongs to case_x. Pytest does this
test_01(dataframe_1)
test_01(dataframe_2)
...
test_50(dataframe_1)
test_50(dataframe_2)
instead of
test_01(dataframe_1)
...
test_50(dataframe_1)
test_01(dataframe_2)
...
test_50(dataframe_2)
The result is that I will fetch each dataset from the DB 50 times instead of just once. Since I can only define the fixture scope as 'session', 'module' or 'function', I couldn't figure out how to group my tests so that they are run together in chunks.
Is there a way to structure my tests so that I can run through all my test functions in sequence for each dataset?
If you only want to load the dataframes once you could use the scope parameter with 'module' or 'session'.
@pytest.fixture(scope="module", params=[1, 2])
def dataframe(request):
    if request.param == 1:
        return  # load dataframe_1
    if request.param == 2:
        return  # load dataframe_2
With a module- or session-scoped parametrized fixture, pytest also groups the tests by fixture parameter, so each dataframe is loaded at most once per module or session and all tests for one parameter run before the next parameter is set up.
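A usage sketch under those assumptions; load_dataframe_1 and load_dataframe_2 are hypothetical stand-ins for the expensive loads:

import pytest

def load_dataframe_1():
    # Hypothetical stand-in for an expensive DB/file load.
    return {"case": 1}

def load_dataframe_2():
    # Hypothetical stand-in for an expensive DB/file load.
    return {"case": 2}

@pytest.fixture(scope="module", params=[1, 2])
def dataframe(request):
    # Created once per parameter for the whole module, then reused by every test.
    if request.param == 1:
        return load_dataframe_1()
    return load_dataframe_2()

def test_01(dataframe):
    assert "case" in dataframe

def test_50(dataframe):
    assert "case" in dataframe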

Python - How to do unit testing of functions deleting something?

I have two functions: one creates an object and stores it in a list, called create(); the other deletes an object from the list, called delete().
I have already written a unit test for create() using the unittest module.
But I have no idea how to write a unit test for delete(),
because delete() depends on create():
it is impossible to delete an object before creating it.
If I write the unit test for delete() by calling create() first
and the test fails, I don't know which function caused the failure.
def create(self, clusterName):
    import uuid
    newClusterUuid = str(uuid.uuid4())
    newCluster = Cluster(uuid=newClusterUuid, name=clusterName)
    self.clusterList[newClusterUuid] = newCluster
    return newClusterUuid

def delete(self, uuid):
    try:
        del self.clusterList[uuid]
        return True
    except KeyError:
        return False
You could try using the setUp and tearDown methods too. You would put the create() call in setUp, and if it fails, your delete() test is reported as a setup error rather than a test failure, so you can tell which function broke.
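A minimal unittest sketch of that idea, assuming a hypothetical ClusterManager class that owns clusterList, create() and delete():

import unittest

class TestDelete(unittest.TestCase):
    def setUp(self):
        # If create() breaks, the test errors out here, which is reported
        # separately from a genuine delete() failure.
        self.manager = ClusterManager()  # hypothetical owner of clusterList/create/delete
        self.uuid = self.manager.create("test-cluster")

    def test_delete_existing_cluster(self):
        self.assertTrue(self.manager.delete(self.uuid))
        self.assertNotIn(self.uuid, self.manager.clusterList)

    def test_delete_unknown_uuid(self):
        self.assertFalse(self.manager.delete("no-such-uuid"))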
In testing, you have "expected failures" that are actually proof that your function is working.
So for your delete function, you might test all of these scenarios:
It raises an exception when there is nothing to delete (i.e., nothing has been created).
If it does delete, the total is reduced.
It checks that the amount to be deleted is less than the total amount of things.
In case #1, you expect it to fail, and if it does fail (it raises the exception), it actually passes the test.
There are also ways to mark a test as an expected failure, which basically means that if this test fails to run, it's not a failure.
You can isolate delete and create by initializing a fake clusterList.
# setup
fakeClusterList = {1: 'a', 3: 'b', 5: 'c', 6: 'd', 0: 'e'}
fakeUUID = 3
# test delete with your delete method, e.g. as sketched below
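Completing that sketch (ClusterManager again stands in for whatever class owns clusterList and delete()):

def test_delete_with_fake_cluster_list():
    manager = ClusterManager()  # hypothetical owner of clusterList/delete
    # Inject fake state directly instead of calling create().
    manager.clusterList = {"fake-uuid-1": "cluster-1", "fake-uuid-2": "cluster-2"}
    assert manager.delete("fake-uuid-1") is True
    assert "fake-uuid-1" not in manager.clusterList
    assert manager.delete("missing-uuid") is False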
A better way would be to inject Cluster into your create method (i.e. pass it in as a parameter). In this way, you can pass in a mock Cluster object which would return a fake list for testing.
There would thus be a much lower likelihood of the fake create failing during unit testing of delete, since the actual create logic (which might be complicated) is avoided.
Do read up on Dependency Injection.
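A rough sketch of that dependency-injection idea; the cluster_factory parameter and the FakeCluster class are illustrative additions, not part of the original code:

def create(self, clusterName, cluster_factory=Cluster):
    # Defaults to the real Cluster class, but tests can pass in a fake.
    import uuid
    newClusterUuid = str(uuid.uuid4())
    self.clusterList[newClusterUuid] = cluster_factory(uuid=newClusterUuid, name=clusterName)
    return newClusterUuid

# In the test module:
class FakeCluster:
    def __init__(self, uuid, name):
        self.uuid = uuid
        self.name = name

def test_delete_after_injected_create():
    manager = ClusterManager()  # hypothetical owner, as above
    new_uuid = manager.create("test", cluster_factory=FakeCluster)
    assert manager.delete(new_uuid) is True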

Pytest where to store expected data

When testing a function I need to pass parameters and check that the output matches the expected output.
It is easy when the function's response is just a small array or a one-line string which can be defined inside the test function, but suppose the function I test modifies a config file which can be huge, or the resulting array is some 4 lines long if I define it explicitly. Where do I store that so my tests remain clean and easy to maintain?
Right now, if it is a string, I just put a file near the .py test and open() it inside the test:
def test_if_it_works():
    with open('expected_asnwer_from_some_function.txt') as res_file:
        expected_data = res_file.read()
    input_data = ...  # maybe loaded from a file as well
    assert expected_data == if_it_works(input_data)
I see many problems with such an approach, like the problem of keeping this file up to date. It looks bad as well.
I can probably make things better by moving this to a fixture:
@pytest.fixture
def expected_data():
    with open('expected_asnwer_from_some_function.txt') as res_file:
        expected_data = res_file.read()
    return expected_data

@pytest.fixture
def input_data():
    return '1,2,3,4'

def test_if_it_works(input_data, expected_data):
    assert expected_data == if_it_works(input_data)
That just moves the problem to another place, and usually I need to test whether the function works for empty input, input with a single item, and input with multiple items, so I should either create one big fixture including all three cases or multiple fixtures. In the end the code gets quite messy.
If a function expects a complicated dictionary as input, or gives back a dictionary of the same huge size, the test code becomes ugly:
@pytest.fixture
def input_data():
    # It's just an example
    return [{'one_value': 3, 'one_value': 3, 'one_value': 3,
             'anotherky': 3, 'somedata': 'somestring'},
            {'login': 3, 'ip_address': 32, 'value': 53,
             'one_value': 3},
            {'one_vae': 3, 'password': 13, 'lue': 3}]
It's quite hard to read tests with such fixtures and keep them up to date.
Update
After searching for a while I found a library which solved part of the problem for cases where, instead of big config files, I had large HTML responses: betamax.
For easier usage I created a fixture:
import os
import pytest
import requests
from betamax import Betamax

@pytest.fixture
def session(request):
    session = requests.Session()
    recorder = Betamax(session)
    recorder.use_cassette(os.path.join(os.path.dirname(__file__), 'fixtures', request.function.__name__))
    recorder.start()
    request.addfinalizer(recorder.stop)
    return session
So now in my tests I just use the session fixture, and every request I make is serialized automatically to the fixtures/test_name.json file, so the next time I execute the test the library loads it from the filesystem instead of doing a real HTTP request:
def test_if_response_is_ok(session):
    r = session.get("http://google.com")
It's quite handy because in order to keep these fixtures up to date I just need to clean the fixtures folder and rerun my tests.
I had a similar problem once, where I had to test a configuration file against an expected file. This is how I fixed it:
Create a folder with the same name as your test module and at the same location. Put all your expected files inside that folder.
test_foo/
    expected_config_1.ini
    expected_config_2.ini
test_foo.py
Create a fixture responsible for copying the contents of this folder to a temporary directory. I made use of the tmpdir fixture for this.
from __future__ import unicode_literals
from distutils import dir_util
from pytest import fixture
import os

@fixture
def datadir(tmpdir, request):
    '''
    Fixture responsible for searching a folder with the same name as the test
    module and, if available, moving all its contents to a temporary directory
    so tests can use them freely.
    '''
    filename = request.module.__file__
    test_dir, _ = os.path.splitext(filename)
    if os.path.isdir(test_dir):
        dir_util.copy_tree(test_dir, bytes(tmpdir))
    return tmpdir
Important: If you are using Python 3, replace dir_util.copy_tree(test_dir, bytes(tmpdir)) with dir_util.copy_tree(test_dir, str(tmpdir)).
Use your new fixture.
def test_foo(datadir):
    expected_config_1 = datadir.join('expected_config_1.ini')
    expected_config_2 = datadir.join('expected_config_2.ini')
Remember: datadir is just the same as the tmpdir fixture, plus the ability to work with your expected files placed in a folder with the very name of the test module.
I believe pytest-datafiles can be of great help. Unfortunately, it seems not to be maintained much anymore. For the time being, it's working nicely.
Here's a simple example taken from the docs:
import os
import pytest

@pytest.mark.datafiles('/opt/big_files/film1.mp4')
def test_fast_forward(datafiles):
    path = str(datafiles)  # Convert from py.path object to path (str)
    assert len(os.listdir(path)) == 1
    assert os.path.isfile(os.path.join(path, 'film1.mp4'))
    # assert some_operation(os.path.join(path, 'film1.mp4')) == expected_result

    # Using py.path syntax
    assert len(datafiles.listdir()) == 1
    assert (datafiles / 'film1.mp4').check(file=1)
If you only have a few tests, then why not include the data as a string literal:
expected_data = """
Your data here...
"""
If you have a handful, or the expected data is really long, I think your use of fixtures makes sense.
However, if you have many, then perhaps a different solution would be better. In fact, for one project I have over one hundred input and expected-output files, so I built my own testing framework (more or less). I used Nose, but pytest would work as well. I created a test generator which walked the directory of test files. For each input file, a test was yielded which compared the actual output with the expected output (pytest calls it parametrizing). Then I documented my framework so others could use it. To review and/or edit the tests, you only edit the input and/or expected output files and never need to look at the Python test file.
To enable different input files to have different options defined, I also created a YAML config file for each directory (JSON would work as well to keep the dependencies down). The YAML data consists of a dictionary where each key is the name of the input file and the value is a dictionary of keywords that will get passed to the function being tested along with the input file. If you're interested, here is the source code and documentation. I recently played with the idea of defining the options as Unittests here (requires only the built-in unittest lib), but I'm not sure if I like it.
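A condensed pytest sketch of that pattern; the cases/ directory layout and function_under_test are assumptions for illustration, not the author's actual framework:

import os
import pytest

CASES_DIR = os.path.join(os.path.dirname(__file__), "cases")  # assumed layout

def _collect_cases():
    # Pair every input file with the expected-output file sitting next to it.
    for name in sorted(os.listdir(CASES_DIR)):
        if name.endswith(".in"):
            yield name, name.replace(".in", ".expected")

@pytest.mark.parametrize("input_name,expected_name", list(_collect_cases()))
def test_against_files(input_name, expected_name):
    with open(os.path.join(CASES_DIR, input_name)) as f:
        input_data = f.read()
    with open(os.path.join(CASES_DIR, expected_name)) as f:
        expected = f.read()
    # function_under_test is a hypothetical stand-in for the real function.
    assert function_under_test(input_data) == expected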
Think about whether the whole contents of the config file really need to be tested.
If only several values or substrings must be checked, prepare an expected template for that config. The places to be tested are marked as "variables" with some special syntax. Then prepare a separate expected list of values for the variables in the template. This expected list can be stored as a separate file or directly in the source code.
Example for the template:
ALLOWED_HOSTS = ['{host}']
DEBUG = {debug}
DEFAULT_FROM_EMAIL = '{email}'
Here, the template variables are placed inside curly braces.
The expected values can look like:
host = www.example.com
debug = False
email = webmaster@example.com
or even as a simple comma-separated list:
www.example.com, False, webmaster@example.com
Then your testing code can produce the expected file from the template by replacing the variables with the expected values. And the expected file is compared with the actual one.
Maintaining the template and the expected values separately has the advantage that you can have many testing data sets using the same template.
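A minimal sketch of producing the expected file from such a template with str.format; the file names are placeholders for whatever your project uses:

EXPECTED_VALUES = {"host": "www.example.com", "debug": False, "email": "webmaster@example.com"}

def test_generated_config_matches_template():
    # Fill the template's {host}/{debug}/{email} placeholders with the expected values...
    with open("expected_config.template") as f:   # placeholder file name
        expected = f.read().format(**EXPECTED_VALUES)
    # ...and compare against whatever the code under test actually wrote.
    with open("generated_config.ini") as f:       # placeholder file name
        actual = f.read()
    assert actual == expected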
Testing only variables
An even better approach is for the config generation method to produce only the needed values for the config file. These values can then be inserted into the template by another method. The advantage is that the testing code can directly compare all config variables separately and in a clear way.
Templates
While it is easy to replace the variables with the needed values in the template yourself, there are ready-made template libraries which allow you to do it in one line. Here are just a few examples: Django, Jinja, Mako.
