Timing a unit test, including the set up

How can you capture the time of an individual unit-test, including the set-up cost?
I've got a test base with a set-up procedure which takes a non-trivial amount of time to complete. I've got several tests which descend from that test base, and I've got a decorator which, in theory, should print out the time it takes to run each test:
import time
import unittest
class TestBase(unittest.TestCase):
    def setUp(self):
        # some setup procedure that takes a long time
        pass
def timed_test(decorated_test):
    def run_test(self, *args, **kwargs):
        start = time.time()
        decorated_test(self, *args, **kwargs)
        end = time.time()
        print("test_duration: %s (seconds)" % (end - start))
    return run_test
class TestSomething(TestBase):
    @timed_test
    def test_something_useful(self):
        # some test
        pass
Now, when I run these tests it turns out that I'm only printing the time it took for the test to run not including the set-up time. Tangentially, a related question may be: is it best to deal with timing outside of your testing framework?

I would not reinvent the wheel; I would use the nose test runner with the nose-timer plugin:
A timer plugin for nosetests that answers the question: how much time does every test take?
See more about nose-timer here:
How to benchmark unit tests in Python without adding any code
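If adding a plugin is not an option, one possible sketch is to override unittest.TestCase.run, which wraps setUp, the test method and tearDown, so a timer placed around it includes the set-up cost. The names TimedTestCase and last_duration are illustrative assumptions, not part of unittest or nose-timer.

```python
import time
import unittest


class TimedTestCase(unittest.TestCase):
    """Hypothetical base class: times setUp + test + tearDown together."""

    def run(self, result=None):
        start = time.perf_counter()
        try:
            # TestCase.run calls setUp, the test method and tearDown.
            return super().run(result)
        finally:
            self.last_duration = time.perf_counter() - start
            print("%s: %.3f seconds" % (self.id(), self.last_duration))


class TestSomething(TimedTestCase):
    def setUp(self):
        time.sleep(0.05)  # stand-in for a slow set-up procedure

    def test_something_useful(self):
        self.assertEqual(1 + 1, 2)
```

Because the timing lives in the base class, every test that descends from it is timed without a per-test decorator.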

Related

Pytest schedule intervals between groups of tests

Is there any way of telling pytest to run a certain set of tests, then wait for a known amount of time, then run another set of tests? For example, if I have tests with the following requirements:
Each test has 3 parts (3 methods to execute)
Part 2 must not be run for each test until a specific, known amount of time has passed since running part 1.
Part 3 must not be run for each test until a specific, known amount of time has passed since running part 2.
If I stitched parts 1, 2 and 3 together for each test and just used time.sleep(), this would take far too long to execute all tests.
Instead I want to run all of the part 1s back to back, then wait a known amount of time, then run all of the part 2s back to back, then wait a known amount of time, then run all of the part 3s.
It appears that this should be possible to implement using markers https://docs.pytest.org/en/stable/example/markers.html and probably implementing hooks https://docs.pytest.org/en/latest/reference.html#hooks to implement certain behaviour based on the markers used, though I'm not very familiar with pytest hooks.
I also came across pytest-ordering https://pytest-ordering.readthedocs.io/en/develop/ which appears to provide behaviour close to what I'm looking for. I just need a way of waiting between certain groups of tests.

You could combine all part one tests in one class, all part two tests in another class, and use class scope fixture for the delay, something like this:
import pytest
import time
@pytest.fixture(scope='class')
def delay():
    time.sleep(5)
class TestPart1:
    def test_one_part_1(self):
        assert 1 == 1
    def test_two_part_1(self):
        assert 2 == 2
@pytest.mark.usefixtures("delay")
class TestPart2:
    def test_one_part_2(self):
        assert 1 == 1
    def test_two_part_2(self):
        assert 2 == 2

Timing blocks of code - Python [duplicate]

I'm trying to measure the time it takes to run a block of instructions in Python, but I don't want to write things like:
start = time.clock()
...
<lines of code>
...
time_elapsed = time.clock() - start
Instead, I want to know if there is a way I can send the block of instructions as a parameter to a function that returns the elapsed time, like
time_elapsed = time_it_takes(<lines of code>)
The implementation of this method could be something like
def time_it_takes(<lines of code>):
    start = time.clock()
    result = <lines of code>
    return (result, time.clock() - start)
Does anybody know if there is some way I can do this? Thanks in advance.

This would be a good use of a decorator. You could write a decorator that does that like this
import time
def timer(func):
    def wrapper(*args, **kwargs):
        start = time.time()
        func(*args, **kwargs)
        print('The function ran for', time.time() - start)
    return wrapper
@timer
def just_sleep():
    time.sleep(5)
just_sleep()
Output
The function ran for 5.0050904750823975
You can then decorate any function you want to time with @timer. You can also do other things inside the decorator, such as taking one action if the function ran for more than 15 seconds and another otherwise.
Note: This is not the most accurate way to measure execution time of a function in python
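As a hedged variant of the decorator above, the sketch below uses time.perf_counter (available since Python 3.3) for better resolution, keeps the wrapped function's name with functools.wraps, and passes the return value through. The last_elapsed attribute is an assumption added for illustration.

```python
import functools
import time


def timer(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            # Return the wrapped function's result to the caller.
            return func(*args, **kwargs)
        finally:
            elapsed = time.perf_counter() - start
            wrapper.last_elapsed = elapsed  # hypothetical convenience attribute
            print("%s ran for %.4f seconds" % (func.__name__, elapsed))
    wrapper.last_elapsed = None
    return wrapper


@timer
def add(a, b):
    time.sleep(0.01)
    return a + b
```

The try/finally means the duration is still reported when the decorated function raises.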

You can build your own context manager to time relatively long bits of code.
import time
class MyTimer(object):
    def __enter__(self):
        self.start = time.clock()
        return self
    def __exit__(self, typ, value, traceback):
        self.duration = time.clock() - self.start
with MyTimer() as timer:
    time.sleep(3)
print(timer.duration)
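But be careful about what you are measuring. On Linux, time.clock is CPU run time, but on Windows (where CPU run time is not easily available) it is a wall clock. Note that time.clock was deprecated in Python 3.3 and removed in Python 3.8; time.perf_counter (wall clock) and time.process_time (CPU time) are the replacements.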
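To make the wall-clock versus CPU-time distinction concrete, here is a minimal sketch of the same context manager that records both, assuming Python 3.3 or later. The class name Timer and the attributes wall and cpu are illustrative choices.

```python
import time


class Timer:
    """Records wall-clock and CPU time spent inside a with block."""

    def __enter__(self):
        self._wall_start = time.perf_counter()
        self._cpu_start = time.process_time()
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.wall = time.perf_counter() - self._wall_start
        self.cpu = time.process_time() - self._cpu_start
        return False  # do not swallow exceptions raised in the block


with Timer() as t:
    time.sleep(0.2)  # sleeping costs wall time but almost no CPU time

print("wall: %.3f s, cpu: %.3f s" % (t.wall, t.cpu))
```

For a block that mostly sleeps or waits on I/O, wall should be close to the real duration while cpu stays near zero.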

If you use IPython, and it's a good thing to do, and you can construct your code to be a single line, i.e. a function call:
%timeit your-code
It's been handy for me. Hope it helps.

Use python -m cProfile myscript.py. It provides a full log of the time consumed by each function.
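The profiler can also be driven from code when only one block is of interest. This is a small sketch using the standard cProfile and pstats modules; the function work is a made-up stand-in for the code being measured.

```python
import cProfile
import io
import pstats


def work():
    # Hypothetical workload standing in for the code to profile.
    return sum(i * i for i in range(10000))


profiler = cProfile.Profile()
profiler.enable()
work()
profiler.disable()

# Render the five most expensive entries, sorted by cumulative time.
buffer = io.StringIO()
pstats.Stats(profiler, stream=buffer).sort_stats("cumulative").print_stats(5)
report = buffer.getvalue()
print(report)
```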

python running coverage on never ending process

I have a multi processed web server with processes that never end, I would like to check my code coverage on the whole project in a live environment (not only from tests).
The problem is, that since the processes never end, I don't have a good place to set the cov.start() cov.stop() cov.save() hooks.
Therefore, I thought about spawning a thread that in an infinite loop will save and combine the coverage data and then sleep some time, however this approach doesn't work, the coverage report seems to be empty, except from the sleep line.
I would be happy to receive any ideas about how to get the coverage of my code,
or any advice about why my idea doesn't work. Here is a snippet of my code:
import coverage
cov = coverage.Coverage()
import time
import threading
import os
class CoverageThread(threading.Thread):
    _kill_now = False
    _sleep_time = 2
    @classmethod
    def exit_gracefully(cls):
        cls._kill_now = True
    def sleep_some_time(self):
        time.sleep(CoverageThread._sleep_time)
    def run(self):
        while True:
            cov.start()
            self.sleep_some_time()
            cov.stop()
            if os.path.exists('.coverage'):
                cov.combine()
            cov.save()
            if self._kill_now:
                break
        cov.stop()
        if os.path.exists('.coverage'):
            cov.combine()
        cov.save()
        cov.html_report(directory="coverage_report_data.html")
        print("End of the program. I was killed gracefully :)")

Apparently, it is not possible to control coverage very well with multiple threads.
Once different threads are started, stopping the Coverage object stops all coverage, and start only restarts it in the "starting" thread.
So your code basically stops the coverage after 2 seconds for every thread other than the CoverageThread.
I played a bit with the API, and it is possible to access the measurements without stopping the Coverage object.
So you could launch a thread that saves the coverage data periodically, using the API.
A first implementation would look something like this:
# Note: this relies on coverage.py internals (cov.collector, cov._warn),
# which are not a stable public API and may differ between versions.
import os
import threading
from time import sleep
from coverage import Coverage
from coverage.data import CoverageData, CoverageDataFiles
from coverage.files import abs_file
cov = Coverage(config_file=True)
cov.start()
def get_data_dict(d):
    """Return a dict like d, but with keys modified by `abs_file` and
    remove the copied elements from d.
    """
    res = {}
    keys = list(d.keys())
    for k in keys:
        a = {}
        lines = list(d[k].keys())
        for l in lines:
            v = d[k].pop(l)
            a[l] = v
        res[abs_file(k)] = a
    return res
class CoverageLoggerThread(threading.Thread):
    _kill_now = False
    _delay = 2
    def __init__(self, main=True):
        self.main = main
        self._data = CoverageData()
        self._fname = cov.config.data_file
        self._suffix = None
        self._data_files = CoverageDataFiles(basename=self._fname,
                                             warn=cov._warn)
        self._pid = os.getpid()
        super(CoverageLoggerThread, self).__init__()
    def shutdown(self):
        self._kill_now = True
    def combine(self):
        aliases = None
        if cov.config.paths:
            from coverage.aliases import PathAliases
            aliases = PathAliases()
            # Use the module-level cov; the thread itself has no config.
            for paths in cov.config.paths.values():
                result = paths[0]
                for pattern in paths[1:]:
                    aliases.add(pattern, result)
        self._data_files.combine_parallel_data(self._data, aliases=aliases)
    def export(self, new=True):
        cov_report = cov
        if new:
            cov_report = Coverage(config_file=True)
            cov_report.load()
        self.combine()
        self._data_files.write(self._data)
        cov_report.data.update(self._data)
        cov_report.html_report(directory="coverage_report_data.html")
        cov_report.report(show_missing=True)
    def _collect_and_export(self):
        # Pop the measurements gathered so far without stopping the collector.
        new_data = get_data_dict(cov.collector.data)
        if cov.collector.branch:
            self._data.add_arcs(new_data)
        else:
            self._data.add_lines(new_data)
        self._data.add_file_tracers(get_data_dict(cov.collector.file_tracers))
        self._data_files.write(self._data, self._suffix)
        if self.main:
            self.export()
    def run(self):
        while True:
            sleep(CoverageLoggerThread._delay)
            if self._kill_now:
                break
            self._collect_and_export()
        cov.stop()
        if not self.main:
            self._collect_and_export()
            return
        self.export(new=False)
        print("End of the program. I was killed gracefully :)")
A more stable version can be found in this GIST.
This code basically grabs the info collected by the collector without stopping it.
The get_data_dict function takes the dictionary in the Coverage.collector and pops the available data. This should be safe enough that you don't lose any measurements.
The report files get updated every _delay seconds.
But if you have multiple processes running, you need extra effort to make sure every process runs the CoverageLoggerThread. This is the patch_multiprocessing function, monkey patched from the coverage monkey patch...
The code is in the GIST. It basically replaces the original Process with a custom process, which starts the CoverageLoggerThread just before running the run method and joins the thread at the end of the process.
The script main.py lets you launch different tests with threads and processes.
There are three drawbacks to this code that you need to be careful of:
It is a bad idea to use the combine function concurrently, as it performs concurrent read/write/delete access to the .coverage.* files. This means that the export function is not super safe. It should be all right as the data is replicated multiple times, but I would do some testing before using it in production.
Once the data has been exported, it stays in memory. So if the code base is huge, it could eat some resources. It is possible to dump all the data and reload it, but I assumed that if you want to log every 2 seconds, you do not want to reload all the data every time. If you go with a delay in minutes, I would create a new _data every time, using CoverageData.read_file to reload the previous state of the coverage for this process.
The custom process will wait for _delay before finishing, as we join the CoverageLoggerThread at the end of the process. So if you have a lot of quick processes, you want to increase the granularity of the sleep to detect the end of the process more quickly. It just needs a custom sleep loop that breaks on _kill_now.
Let me know if this helps you in some way or if it is possible to improve this gist.
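The "custom sleep loop" mentioned in the last drawback can be sketched with threading.Event, whose wait(timeout) returns as soon as the event is set. This is an illustration only, not the code from the gist; PeriodicWorker and its action callback are hypothetical names.

```python
import threading


class PeriodicWorker(threading.Thread):
    """Hypothetical thread that calls `action` every `delay` seconds
    and stops promptly when shutdown() is called."""

    def __init__(self, delay, action):
        super().__init__()
        self._delay = delay
        self._action = action
        self._stop_event = threading.Event()

    def shutdown(self):
        self._stop_event.set()

    def run(self):
        # Event.wait returns True once the event is set and False on
        # timeout, so the loop exits without waiting out the full delay.
        while not self._stop_event.wait(self._delay):
            self._action()


ticks = []
worker = PeriodicWorker(delay=30, action=lambda: ticks.append(1))
worker.start()
worker.shutdown()
worker.join(timeout=5)
```

With this pattern, joining the thread at the end of a short-lived process no longer costs up to a full delay period.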
EDIT:
It seems you do not need to monkey patch the multiprocessing module to start a logger automatically. Using a .pth file in your Python install, you can use an environment variable to start your logger automatically in new processes:
# Content of coverage.pth in your site-packages folder
import os
if "COVERAGE_LOGGER_START" in os.environ:
    import atexit
    from coverage_logger import CoverageLoggerThread
    thread_cov = CoverageLoggerThread(main=False)
    thread_cov.start()
    def close_cov():
        thread_cov.shutdown()
        thread_cov.join()
    atexit.register(close_cov)
You can then start your coverage logger with COVERAGE_LOGGER_START=1 python main.py

Since you are willing to run your code differently for the test, why not add a way to end the process for the test? That seems like it will be simpler than trying to hack coverage.

You can use pyrasite directly, with the following two programs.
# start.py
import sys
import coverage
sys.cov = cov = coverage.coverage()
cov.start()
And this one
# stop.py
import sys
sys.cov.stop()
sys.cov.save()
sys.cov.html_report()
Another way to go would be to trace the program using lptrace. Even though it only prints calls, it can be useful.

Timing Code Execution Time

So, I am interested in timing some of the code I am setting up. Borrowing a timer function from the 4th edition of Learning Python, I tried:
import time
reps = 100
repslist = range(reps)
def timer(func):
    start = time.clock()
    for i in repslist:
        ret = func()
    elasped = time.clock()-start
    return elapsed
Then, I paste in whatever I want to time, and put:
print(timer(func)) #replace func with the function you want to time
When I run it on my code, I do get an answer, but it's nonsense. Suspecting something was wrong, I put a time.sleep(0.1) call in my code, and got a result of 0.8231
Does anybody know why this might be the case or how to fix it? I suspect that the time.clock() call might be at fault.

According to the help docs for clock:
Return the CPU time or real time since the start of the process or since the first call to clock(). This has as much precision as the system records.
The second call to clock already returns the elapsed time between it and the first clock call. You don't need to manually subtract start.
Change
elasped = time.clock()-start
to
elasped = time.clock()

If you want to time a function, perhaps give decorators a try (documentation here):
import time
def timeit(f):
    def timed(*args, **kw):
        ts = time.time()
        result = f(*args, **kw)
        te = time.time()
        print('func:%r args:[%r, %r] took: %2.4f sec' %
              (f.__name__, args, kw, te - ts))
        return result
    return timed
Then when you write a function you just use the decorator, here:
@timeit
def my_example_function():
    for i in range(10000):
        print("x")
This will print out the time the function took to execute:
func:'my_example_function' args:[(), {}] took: 0.4220 sec

After fixing the typo in the first intended use of elapsed, your code works fine with either time.clock or time.time (or Py3's time.monotonic for that matter) on my Linux system.
The difference would be in the (OS specific) behavior for clock; on most UNIX-like OSes it will return the processor time used by the program since it launched (so time spent blocked, on I/O, locks, page faults, etc. wouldn't count), while on Windows it's a wall clock timer (so time spent blocked would count) that counts seconds since first call.
The UNIX-like version of time.clock is also fairly unreliable if used in a long running program when clock_t is only 32 bits; the value it returns will wrap roughly every 72 minutes of processor time.
Of course, time.time isn't perfect either; it follows the system clock, so an NTP time update (or any other change to the system clock) occurring between calls will give erroneous results (on Python 3.3+, you'd use time.monotonic to avoid this problem). It's also not guaranteed to have granularity finer than 1 second, so if your function doesn't take an awfully long time to run, on a system with low res time.time you won't get particularly useful results.
Really, you should be looking at the Python batteries designed for this (that also handle issues like garbage collection overhead and the like). The timeit module already has a function that does what you want, but handles all the edge cases and issues I mentioned. For example, to time some global function named foo for 100 reps, you'd just do:
import timeit
def foo():
    ...
print(timeit.timeit('foo()', 'from __main__ import foo', number=100))
It fixes most of the issues I mention by selecting the best timing function for the OS you're on (and also fixes other sources of jitter, e.g. cyclic garbage collection, which is disabled during the test and reenabled at the end).
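timeit also accepts a callable directly, which avoids the setup string. A small sketch, where the body of foo is a made-up workload:

```python
import timeit


def foo():
    # Hypothetical workload standing in for the function to time.
    return sum(range(1000))


# Total time for 100 calls of foo.
total = timeit.timeit(foo, number=100)

# Three independent batches of 100 calls; the minimum is usually the
# least noisy figure.
runs = timeit.repeat(foo, number=100, repeat=3)

print("total for 100 calls: %.6f s" % total)
print("best of 3 batches:   %.6f s" % min(runs))
```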
Even if you don't want to use that for some reason, if you're using Python 3.3 or higher, take a look at the replacements for time.clock, e.g. time.perf_counter (includes time spent sleeping) or time.process_time (includes only CPU time), both of which are portable, reliable, fast, and high resolution for better accuracy.
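To check what each clock offers on a given machine, time.get_clock_info (Python 3.3+) reports its resolution and whether it is monotonic or adjustable. A quick sketch; the printed values depend on the platform.

```python
import time

clocks = {}
for name in ("time", "monotonic", "perf_counter", "process_time"):
    info = time.get_clock_info(name)
    clocks[name] = info
    print("%-13s resolution=%g monotonic=%s adjustable=%s"
          % (name, info.resolution, info.monotonic, info.adjustable))
```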

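time.sleep() can be cut short by a signal. (Since Python 3.5 it resumes sleeping for the remaining time unless the signal handler raises an exception.) Read about it here: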
http://www.tutorialspoint.com/python/time_sleep.htm

Repeated single or multiple tests with Nose

Similar to this question, I'd like to have Nose run a test (or all tests) n times -- but not in parallel.
I have a few hundred tests in a project; some are some simple unit tests. Others are integration tests w/ some degree of concurrency. Frequently when debugging tests I want to "hit" a test harder; a bash loop works, but makes for a lot of cluttered output -- no more nice single "." for each passing test. Having the ability to beat on the selected tests for some number of trials seems like a natural thing to ask Nose to do, but I haven't found it anywhere in the docs.
What's the simplest way to get Nose to do this (other than a bash loop)?

You can write a nose test as a generator, and nose will then run each function yielded:
def check_something(arg):
    # some test ...
    pass
def test_something():
    for arg in some_sequence:
        yield (check_something, arg)
Using nose-testconfig, you could make the number of test runs a command line argument:
from testconfig import config
# ...
def test_something():
    for n in range(int(config.get("runs", 1))):
        yield (check_something, arg)
Which you'd call from the command line with e.g.
$ nosetests --tc=runs:5
... for more than one run.
Alternatively (but also using nose-testconfig), you could write a decorator:
from functools import wraps
from testconfig import config
def multi(fn):
    @wraps(fn)
    def wrapper():
        for n in range(int(config.get("runs", 1))):
            fn()
    return wrapper
@multi
def test_something():
    # some test ...
    pass
And then, if you want to divide your tests into different groups, each with their own command line argument for the number of runs:
from functools import wraps
from testconfig import config
def multi(cmd_line_arg):
    def wrap(fn):
        @wraps(fn)
        def wrapper():
            for n in range(int(config.get(cmd_line_arg, 1))):
                fn()
        return wrapper
    return wrap
@multi("foo")
def test_something():
    # some test ...
    pass
@multi("bar")
def test_something_else():
    # some test ...
    pass
Which you can call like this:
$ nosetests --tc=foo:3 --tc=bar:7

You'll have to write a script to do this, but you can repeat the test names on the commandline X times.
nosetests testname testname testname testname testname testname testname
etc.

The solution I ended up using was to create a shell script, run_test.sh:
var=0
while $1; do
    ((var++))
    echo "*** RETRY $var"
done
Usage:
./run_test.sh "nosetests TestName"
It runs test infinitely but stops on first error.

One way is in the test itself:
Change this:
class MyTest(unittest.TestCase):
    def test_once(self):
        ...
To this:
class MyTest(unittest.TestCase):
    def assert_once(self):
        ...
    def test_many(self):
        for _ in range(5):
            self.assert_once()

There should never be a reason to run a test more than once. It's important that your tests are deterministic (i.e. given the same state of the codebase, they always produce the same result.) If this isn't the case, then instead of running tests more than once, you should redesign the tests and/or code so that they are.
For example, one reason why tests fail intermittently is a race condition between the test and the code-under-test (CUT). In this circumstance, a naive response is to add a big 'voodoo sleep' to the test, to 'make sure' that the CUT is finished before the test starts asserting.
This is error-prone though, because if your CUT is slow for any reason (underpowered hardware, loaded box, busy database, etc) then it will fail sporadically. A better solution in this instance is to have your test wait for an event, rather than sleeping.
The event could be anything of your choosing. Sometimes, events you can use are already being generated (e.g. Javascript DOM events, the 'pageRendered' kind of events that Selenium tests can make use of.) Other times, it might be appropriate for you to add code to your CUT which raises an event when it's done (perhaps your architecture involves other components that are interested in events like this.)
Often though, you'll need to re-write the test such that it tries to detect whether your CUT is finished executing (e.g. does the output file exist yet?), and if not, sleeps for 50ms and then tries again. Eventually it will time out and fail, but only do this after a very long time (e.g. 100 times the expected execution time of your CUT)
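The poll-and-retry approach described above could be sketched like this. The helper wait_until and its parameters are hypothetical names; the 50 ms default matches the interval mentioned in the text.

```python
import time


def wait_until(predicate, timeout, interval=0.05):
    """Poll `predicate` until it returns True or `timeout` seconds pass.

    Returns True if the condition was met, False if the wait timed out.
    """
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Usage: a stand-in condition that becomes true on the third check.
state = {"checks": 0}


def output_ready():
    state["checks"] += 1
    return state["checks"] >= 3


finished = wait_until(output_ready, timeout=2, interval=0.01)
```

In a real test the predicate would check for the CUT's observable result, such as the output file existing, and the test would assert that wait_until returned True.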
Another approach is to design your CUT using 'onion/hexagonal/ports'n'adaptors' principles, which insists your business logic should be free of all external dependencies. This means that your business logic can be tested using plain old sub-millisecond unit tests, which never touch the network or filesystem. Once this is done, you need far fewer end-to-end system tests, because they are now serving just as integration tests, and don't need to try to manipulate every detail and edge-case of your business logic going through the UI. This approach will also yield big benefits in other areas, such as improved CUT design (reducing dependencies between components), tests are much easier to write, and the time taken to run the whole test suite is much reduced.
Using approaches like the above can entirely eliminate the problem of unreliable tests, and I'd recommend doing so, to improve not just your tests, but also your codebase, and your design abilities.
