Pytest schedule intervals between groups of tests - python

Is there any way of telling pytest to run a certain set of tests, then wait for a known amount of time, then run another set of tests? For example, if I have tests with the following requirements:
Each test has 3 parts (3 methods to execute)
Part 2 must not be run for each test until a specific, known amount of time has passed since running part 1.
Part 3 must not be run for each test until a specific, known amount of time has passed since running part 2.
If I stitched parts 1, 2 and 3 together for each test and just used time.sleep(), this would take far too long to execute all tests.
Instead I want to run all of the part 1s back to back, then wait a known amount of time, then run all of the part 2s back to back, then wait a known amount of time, then run all of the part 3s.
It appears that this should be possible using markers (https://docs.pytest.org/en/stable/example/markers.html), probably together with hooks (https://docs.pytest.org/en/latest/reference.html#hooks) to trigger certain behaviour based on the markers used, though I'm not very familiar with pytest hooks.
I also came across pytest-ordering https://pytest-ordering.readthedocs.io/en/develop/ which appears to provide behaviour close to what I'm looking for. I just need a way of waiting between certain groups of tests.

You could combine all part one tests in one class, all part two tests in another class, and use a class-scoped fixture for the delay, something like this:
import time

import pytest

@pytest.fixture(scope='class')
def delay():
    time.sleep(5)

class TestPart1:
    def test_one_part_1(self):
        assert 1 == 1

    def test_two_part_1(self):
        assert 2 == 2

@pytest.mark.usefixtures("delay")
class TestPart2:
    def test_one_part_2(self):
        assert 1 == 1

    def test_two_part_2(self):
        assert 2 == 2
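If you do want to go down the marker/hook route mentioned in the question, a minimal sketch could live in conftest.py. Note that the delay_before marker name and its seconds argument are made up for this example; they are not pytest built-ins:

# conftest.py
import time

def pytest_configure(config):
    config.addinivalue_line(
        "markers", "delay_before(seconds): sleep before running this test"
    )

def pytest_runtest_setup(item):
    # pytest calls this hook before each test; honour the marker if present
    marker = item.get_closest_marker("delay_before")
    if marker is not None:
        time.sleep(marker.kwargs.get("seconds", 0))

You would then mark a test with e.g. @pytest.mark.delay_before(seconds=300). Bear in mind this sleeps before every marked test, so for delaying whole groups the class-scoped fixture above is still the simpler fit.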

Related

Call method on many objects in parallel

I wanted to use concurrency in Python for the first time. So I started reading a lot about Python concurrency (GIL, threads vs processes, multiprocessing vs concurrent.futures vs ...) and saw a lot of convoluted examples, even in ones using the high-level concurrent.futures library.
So I decided to just start trying stuff and was surprised with the very, very simple code I ended up with:
from concurrent.futures import ThreadPoolExecutor

class WebHostChecker(object):
    def __init__(self, websites):
        self.webhosts = []
        for website in websites:
            self.webhosts.append(WebHost(website))

    def __iter__(self):
        return iter(self.webhosts)

    def check_all(self):
        # sequential:
        #for webhost in self:
        #    webhost.check()
        # threaded:
        with ThreadPoolExecutor(max_workers=10) as executor:
            executor.map(lambda webhost: webhost.check(), self.webhosts)

class WebHost(object):
    def __init__(self, hostname):
        self.hostname = hostname

    def check(self):
        print("Checking {}".format(self.hostname))
        self.check_dns()   # only modifies internal state, i.e.: sets self.dns
        self.check_http()  # only modifies internal state, i.e.: sets self.http
Using the classes looks like this:
webhostchecker = WebHostChecker(["urla.com", "urlb.com"])
webhostchecker.check_all() # -> this calls .check() on all WebHost instances in parallel
The relevant multiprocessing/threading code is only 3 lines. I barely had to modify my existing code (which I hoped to be able to do when first starting to write the code for sequential execution, but started to doubt after reading the many examples online).
And... it works! :)
It perfectly distributes the IO-waiting among multiple threads and runs in less than 1/3 of the time of the original program.
So, now, my question(s):
What am I missing here?
Could I implement this differently? (Should I?)
Why are other examples so convoluted? (Although I must say I couldn't find an exact example doing a method call on multiple objects)
Will this code get me in trouble when I expand my program with features/code I cannot predict right now?
I think I already know of one potential problem and it would be nice if someone can confirm my reasoning: if WebHost.check() also becomes CPU bound I won't be able to swap ThreadPoolExecutor for ProcessPoolExecutor. Because every process will get cloned versions of the WebHost instances? And I would have to code something to sync those cloned instances back to the original?
Any insights/comments/remarks/improvements/... that can bring me to greater understanding will be much appreciated! :)
Ok, so I'll add my own first gotcha:
If webhost.check() raises an Exception, then the thread just ends and self.dns and/or self.http might NOT have been set. However, with the current code, you won't see the Exception, UNLESS you also access the executor.map() results! Leaving me wondering why some objects raised AttributeErrors after running check_all() :)
This can easily be fixed by just evaluating every result (which is always None, because I'm not letting .check() return anything). You can do it after all threads have run or during. I chose to let Exceptions be raised during (i.e. within the with statement), so the program stops at the first unexpected error:
def check_all(self):
    with ThreadPoolExecutor(max_workers=10) as executor:
        # this alone works, but does not raise any exceptions from the threads:
        #executor.map(lambda webhost: webhost.check(), self.webhosts)
        for i in executor.map(lambda webhost: webhost.check(), self.webhosts):
            pass
I guess I could also use list(executor.map(lambda webhost: webhost.check(), self.webhosts)) but that would unnecessarily use up memory.
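Another common pattern for surfacing worker exceptions (not from the original post, just a sketch) is to submit each call individually and iterate as_completed; calling result() re-raises whatever the worker raised:

from concurrent.futures import ThreadPoolExecutor, as_completed

def check_all(self):
    with ThreadPoolExecutor(max_workers=10) as executor:
        futures = [executor.submit(webhost.check) for webhost in self.webhosts]
        for future in as_completed(futures):
            future.result()  # re-raises any exception raised inside check()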

How to queue 3 dependent functions using threads and queues

I have 3 functions which I need to run. Each function generates output that the next function depends on. So basically, only once the first one finishes can I proceed to the second one, but once the second one is running, I can start the first one again to generate the next batch of data - so I want to run them at the same time, but I can't run the second one before the first one has finished, or the third one before the second one has finished. I can, however, run the first and second while the third is running. How can I implement that using threading in Python? I understand the basics behind threading, but I don't know how to create the queue for that purpose.
This is an example of what I need to do:
# This is what will usually happen without threading. How can I implement the
# same thing but with threading? Keep in mind that foo() 1, 2 and 3
# take some amount of time. And foo2() may finish before foo() finished
# generating data, so I can't run foo2() until I have the data from foo()
# foo generates the data for foo2
data = foo()
# foo2 generates data for foo3
data2 = foo2(data)
# foo3 does something with data2 and the data is no longer used
foo3(data2)
After going through the comments on your question I did a little more searching. I understand that what you are looking for is some kind of pipeline pattern, and it seems Python does have something for it. The answer to the question linked below also talks about what @Peterwood said about making use of queues.
How to design an async pipeline pattern in python
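To make the queue idea concrete, here is a rough sketch (the stage helper, SENTINEL and number_of_batches are invented for this illustration; foo, foo2 and foo3 are from the question). foo2 and foo3 each run in their own thread and receive work through a queue.Queue, so foo() can already produce the next batch while the later stages are still busy:

import queue
import threading

SENTINEL = object()  # marks "no more work" for the downstream stage

def stage(func, in_q, out_q=None):
    # Pull items from in_q, apply func, and push results to out_q (if any).
    while True:
        item = in_q.get()
        if item is SENTINEL:
            if out_q is not None:
                out_q.put(SENTINEL)
            break
        result = func(item)
        if out_q is not None:
            out_q.put(result)

q1, q2 = queue.Queue(), queue.Queue()
t2 = threading.Thread(target=stage, args=(foo2, q1, q2))  # stage 2: foo2 consumes foo's output
t3 = threading.Thread(target=stage, args=(foo3, q2))      # stage 3: foo3 consumes foo2's output
t2.start()
t3.start()

for _ in range(number_of_batches):  # stage 1 runs in the main thread
    q1.put(foo())
q1.put(SENTINEL)  # tell stage 2 we're done; it forwards the signal to stage 3

t2.join()
t3.join()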

Multiprocessing, pooling and randomness

I am experiencing a strange thing: I wrote a program to simulate economies. Instead of running this simulation one by one on one CPU core, I want to use multiprocessing to make things faster. So I run my code (fine), and I want to get some stats from the simulations I am doing. Then arises one surprise: all the simulations done at the same time yield the very same result! Is there some strange relationship between Pool() and random.seed()?
To be much clearer, here is what the code can be summarized as:
from multiprocessing import Pool

# Statistics, NRUN and Economy.process() are defined elsewhere in the real program
class Economy(object):
    def __init__(self, i):
        self.run_number = i
        self.Statistics = Statistics()
        self.process()

def run_and_return(i):
    eco = Economy(i)
    return eco

collection = []

def get_result(x):
    collection.append(x)

if __name__ == '__main__':
    pool = Pool(processes=4)
    for i in range(NRUN):
        pool.apply_async(run_and_return, (i,), callback=get_result)
    pool.close()
    pool.join()
The process(i) is the function that goes through every step of the simulation, during i steps. Basically I simulate NRUN Economies, from which I get the Statistics that I put in the list collection.
Now the strange thing is that the output of this is exactly the same for the first 4 runs: during the same "wave" of simulations, I get the very same output. Once I get to the second wave, then I get a different output for the next 4 simulations!
All these simulations run well if I use the same program with processes=1: I get different results when I only work on one core, taking simulations one by one... I have tried a few things, but can't get my head around this, hence my post...
Thank you very much for taking the time to read this long post, do not hesitate to ask for more precisions!
All the best,
If you are on Linux then each pool process is made by forking the parent process. This means the process is literally duplicated - this includes the seed any random object may be using.
The random module selects the seed for its default functions on import, meaning the seed has already been selected before you create the Pool.
To get around this you must use an initialiser for each pool process that sets the random seed to something unique.
A decent way to seed random would be to use the process id and the current time. The process id is bound to be unique on a single run of your program. Whilst using the time will ensure uniqueness over multiple runs in case the same process id is produced. Passing process id and time through as a string will mean that the digest of the string is also used to seed the random number generator -- meaning two similar strings will produce substantially different seeds. Alternatively, you could use the uuid module to generate seeds.
import os
import random
import time
from multiprocessing import Pool

def proc_init():
    random.seed(str(os.getpid()) + str(time.time()))

pool = Pool(num_procs, initializer=proc_init)
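The uuid alternative mentioned above could be as simple as this sketch:

import random
import uuid

def proc_init():
    # uuid4() draws fresh randomness from the OS at call time, so each worker gets its own seed
    random.seed(uuid.uuid4().int)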

Timing a unit test, including the set up

How can you capture the time of an individual unit-test, including the set-up cost?
I've got a test base with a set-up procedure which takes a non-trivial amount of time to complete. I've got several tests which descend from that test base, and I've got a decorator which, in theory, should print out the time it takes to run each test:
import time
import unittest

class TestBase(unittest.TestCase):
    def setUp(self):
        # some setup procedure that takes a long time
        ...

def timed_test(decorated_test):
    def run_test(self, *args, **kwargs):
        start = time.time()
        decorated_test(self, *args, **kwargs)
        end = time.time()
        print("test_duration: %s (seconds)" % (end - start))
    return run_test

class TestSomething(TestBase):
    @timed_test
    def test_something_useful(self):
        # some test
        ...
Now, when I run these tests it turns out that I'm only printing the time it took for the test to run, not including the set-up time. Tangentially, a related question may be: is it best to deal with timing outside of your testing framework?
I would not reinvent the wheel; I'd use the nose test runner with the nose-timer plugin:
A timer plugin for nosetests that answers the question: how much time
does every test take?
See more about nose-timer here:
How to benchmark unit tests in Python without adding any code
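For reference, installing and enabling the plugin is, going by the nose-timer documentation, something like:

$ pip install nose-timer
$ nosetests --with-timer

The per-test timings it reports include the fixture/setUp cost that the decorator above misses.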

Repeated single or multiple tests with Nose

Similar to this question, I'd like to have Nose run a test (or all tests) n times -- but not in parallel.
I have a few hundred tests in a project; some are simple unit tests. Others are integration tests with some degree of concurrency. Frequently when debugging tests I want to "hit" a test harder; a bash loop works, but makes for a lot of cluttered output -- no more nice single "." for each passing test. Having the ability to beat on the selected tests for some number of trials seems like a natural thing to ask Nose to do, but I haven't found it anywhere in the docs.
What's the simplest way to get Nose to do this (other than a bash loop)?
You can write a nose test as a generator, and nose will then run each function yielded:
def check_something(arg):
    # some test ...
    ...

def test_something():
    for arg in some_sequence:
        yield (check_something, arg)
Using nose-testconfig, you could make the number of test runs a command line argument:
from testconfig import config

# ...

def test_something():
    for n in range(int(config.get("runs", 1))):
        yield (check_something, arg)
Which you'd call from the command line with e.g.
$ nosetests --tc=runs:5
... for more than one run.
Alternatively (but also using nose-testconfig), you could write a decorator:
from functools import wraps
from testconfig import config

def multi(fn):
    @wraps(fn)
    def wrapper():
        for n in range(int(config.get("runs", 1))):
            fn()
    return wrapper

@multi
def test_something():
    # some test ...
    ...
And then, if you want to divide your tests into different groups, each with their own command line argument for the number of runs:
from functools import wraps
from testconfig import config

def multi(cmd_line_arg):
    def wrap(fn):
        @wraps(fn)
        def wrapper():
            for n in range(int(config.get(cmd_line_arg, 1))):
                fn()
        return wrapper
    return wrap

@multi("foo")
def test_something():
    # some test ...
    ...

@multi("bar")
def test_something_else():
    # some test ...
    ...
Which you can call like this:
$ nosetests --tc=foo:3 --tc=bar:7
You'll have to write a script to do this, but you can repeat the test names on the commandline X times.
nosetests testname testname testname testname testname testname testname
etc.
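A quick shell one-liner in that spirit (purely illustrative; substitute your real test name and count):

$ nosetests $(for i in $(seq 1 10); do echo testname; done)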
The solution I ended up using is to create a shell script, run_test.sh:
var=0
while $1; do
    ((var++))
    echo "*** RETRY $var"
done
Usage:
./run_test.sh "nosetests TestName"
It runs the test indefinitely but stops on the first error.
One way is in the test itself:
Change this:
class MyTest(unittest.TestCase):
    def test_once(self):
        ...
To this:
class MyTest(unittest.TestCase):
    def assert_once(self):
        ...

    def test_many(self):
        for _ in range(5):
            self.assert_once()
There should never be a reason to run a test more than once. It's important that your tests are deterministic (i.e. given the same state of the codebase, they always produce the same result.) If this isn't the case, then instead of running tests more than once, you should redesign the tests and/or code so that they are.
For example, one reason why tests fail intermittently is a race condition between the test and the code-under-test (CUT). In this circumstance, a naive response is to add a big 'voodoo sleep' to the test, to 'make sure' that the CUT is finished before the test starts asserting.
This is error-prone though, because if your CUT is slow for any reason (underpowered hardware, loaded box, busy database, etc) then it will fail sporadically. A better solution in this instance is to have your test wait for an event, rather than sleeping.
The event could be anything of your choosing. Sometimes, events you can use are already being generated (e.g. Javascript DOM events, the 'pageRendered' kind of events that Selenium tests can make use of.) Other times, it might be appropriate for you to add code to your CUT which raises an event when it's done (perhaps your architecture involves other components that are interested in events like this.)
Often though, you'll need to re-write the test such that it tries to detect whether your CUT is finished executing (e.g. does the output file exist yet?), and if not, sleeps for 50ms and then tries again. Eventually it will time out and fail, but only do this after a very long time (e.g. 100 times the expected execution time of your CUT)
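A minimal sketch of that polling wait (the wait_until helper and its defaults are illustrative, not from any particular library):

import time

def wait_until(condition, timeout=10.0, interval=0.05):
    # Poll condition() until it returns True or the timeout expires.
    deadline = time.time() + timeout
    while time.time() < deadline:
        if condition():
            return True
        time.sleep(interval)
    raise AssertionError("condition not met within %s seconds" % timeout)

# e.g. in a test: wait_until(lambda: os.path.exists(output_path))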
Another approach is to design your CUT using 'onion/hexagonal/ports'n'adaptors' principles, which insist that your business logic should be free of all external dependencies. This means that your business logic can be tested using plain old sub-millisecond unit tests, which never touch the network or filesystem. Once this is done, you need far fewer end-to-end system tests, because they are now serving just as integration tests, and don't need to try to manipulate every detail and edge-case of your business logic going through the UI. This approach will also yield big benefits in other areas, such as improved CUT design (reducing dependencies between components), tests that are much easier to write, and a much reduced time to run the whole test suite.
Using approaches like the above can entirely eliminate the problem of unreliable tests, and I'd recommend doing so, to improve not just your tests, but also your codebase, and your design abilities.
