How do I log multiple very similar events gracefully in python? - python

With Python's logging module, is there a way to collect multiple events into one log entry? An ideal solution would be an extension of Python's logging module, or a custom formatter/filter for it, so that collecting logging events of the same kind happens in the background and nothing needs to be added in the code body (e.g. at every call of a logging function).
Here is an example that generates a large number of the same or very similar logging events:
import logging
for i in range(99999):
    try:
        asdf[i] # not defined!
    except NameError:
        logging.exception('foo') # generates large number of logging events
    else: pass
# ... more code with more logging ...
for i in range(88888): logging.info('more of the same %d' % i)
# ... and so on ...
So we have the same exception 99999 times and log it. It would be nice if the log just said something like:
ERROR:root:foo (occurred 99999 times)
Traceback (most recent call last):
  File "./exceptionlogging.py", line 10, in <module>
    asdf[i] # not defined!
NameError: name 'asdf' is not defined
INFO:root:foo more of the same (occurred 88888 times with various values)

You should probably be writing a message aggregate/statistics class rather than trying to hook onto the logging system's singletons but I guess you may have an existing code base that uses logging.
I'd also suggest that you should instantiate your loggers rather than always using the default root. The Python Logging Cookbook has extensive explanation and examples.
The following class should do what you are asking.
import logging
import atexit
import pprint
class Aggregator(object):
    logs = {}
    @classmethod
    def _aggregate(cls, record):
        id = '{0[levelname]}:{0[name]}:{0[msg]}'.format(record.__dict__)
        if id not in cls.logs: # first occurrence
            cls.logs[id] = [1, record]
        else: # subsequent occurrence
            cls.logs[id][0] += 1
    @classmethod
    def _output(cls):
        for count, record in list(cls.logs.values()):
            record.__dict__['msg'] += ' (occurred {} times)'.format(count)
            logging.getLogger(record.__dict__['name']).handle(record)
    @staticmethod
    def filter(record):
        # pprint.pprint(record)
        Aggregator._aggregate(record)
        return False
    @staticmethod
    def exit():
        # Stop filtering first; otherwise the aggregated records would be swallowed by the filter too
        logging.getLogger().removeFilter(Aggregator)
        Aggregator._output()
logging.getLogger().addFilter(Aggregator)
atexit.register(Aggregator.exit)
for i in range(99999):
    try:
        asdf[i] # not defined!
    except NameError:
        logging.exception('foo') # generates large number of logging events
    else: pass
# ... more code with more logging ...
for i in range(88888): logging.error('more of the same')
# ... and so on ...
Note that you don't get any logs until the program exits.
The result of running this is:
ERROR:root:foo (occurred 99999 times)
Traceback (most recent call last):
  File "C:\work\VEMS\python\logcount.py", line 38, in <module>
    asdf[i] # not defined!
NameError: name 'asdf' is not defined
ERROR:root:more of the same (occurred 88888 times)

Your question hides a subliminal assumption of how "very similar" is defined.
Log records can either be const-only (whose instances are strictly identical), or a mix of consts and variables (no consts at all is also considered a mix).
An aggregator for const-only log records is a piece of cake. You just need to decide whether process/thread will fork your aggregation or not.
For log records which include both consts and variables you'll need to decide whether to split your aggregation based on the variables you have in your record.
A dictionary-style counter (from collections import Counter) can serve as a cache, which will count your instances in O(1), but you may need some higher-level structure in order to write the variables down if you wish. Additionally, you'll have to manually handle the writing of the cache into a file - every X seconds (binning) or once the program has exited (risky - you may lose all in-memory data if something gets stuck).
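As a rough sketch of the Counter idea (not the full framework), a handler can key its counts on the unformatted message template, so records that differ only in their variables collapse into one entry. The names `CountingHandler`, `report` and `counter_demo` are made up for this illustration, and it assumes callers pass arguments to the logger instead of pre-formatting the string.

```python
import logging
from collections import Counter


class CountingHandler(logging.Handler):
    """Counts records by (level, logger name, unformatted message template)."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def emit(self, record):
        # record.msg is the template *before* the arguments are applied, so
        # "more of the same %d" is a single key regardless of the value of i.
        self.counts[(record.levelname, record.name, record.msg)] += 1

    def report(self):
        """Return one summary line per distinct key, most frequent first."""
        return [
            "{}:{}:{} (occurred {} times)".format(level, name, msg, n)
            for (level, name, msg), n in self.counts.most_common()
        ]


log = logging.getLogger("counter_demo")  # hypothetical logger name
log.setLevel(logging.INFO)
log.propagate = False  # keep the demo quiet: nothing reaches the root handlers
counter = CountingHandler()
log.addHandler(counter)

for i in range(1000):
    log.info("more of the same %d", i)  # lazy formatting keeps the template intact
log.error("something else")

summary = counter.report()
print("\n".join(summary))
```

If the message is built with `'... %d' % i` before it reaches the logger, every record has a different `msg` and this key no longer groups them.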
A framework for aggregation would look something like this (tested on Python v3.4):
import logging
from logging import Handler
from threading import RLock, Timer
from collections import defaultdict
class LogAggregatorHandler(Handler):
    _default_flush_timer = 300 # Number of seconds between flushes
    _default_separator = "\t" # Separator char between metadata strings
    _default_metadata = ["filename", "name", "funcName", "lineno", "levelname"] # metadata defining unique log records
    class LogAggregatorCache(object):
        """ Keeps whatever is interesting in log records aggregation. """
        def __init__(self, record=None):
            self.message = None
            self.counter = 0
            self.timestamp = list()
            self.args = list()
            if record is not None:
                self.cache(record)
        def cache(self, record):
            if self.message is None: # Only the first message is kept
                self.message = record.msg
            assert self.message == record.msg, "Non-matching log record" # note: will not work with string formatting for log records; e.g. "blah {}".format(i)
            self.timestamp.append(record.created)
            self.args.append(record.args)
            self.counter += 1
        def __str__(self):
            """ The string of this object is used as the default output of log records aggregation. For example: record message with occurrences. """
            return self.message + "\t (occurred {} times)".format(self.counter)
    def __init__(self, flush_timer=None, separator=None, add_process_thread=False):
        """
        Log record metadata will be concatenated to a unique string, separated by self._separator.
        Process and thread IDs will be added to the metadata if set to True; otherwise log records across processes/threads will be aggregated together.
        :param separator: str
        :param add_process_thread: bool
        """
        super().__init__()
        self._flush_timer = flush_timer or self._default_flush_timer
        self._cache = self.cache_factory()
        self._separator = separator or self._default_separator
        self._metadata = list(self._default_metadata) # copy, so the class-level default is not mutated below
        if add_process_thread is True:
            self._metadata += ["process", "thread"]
        self._aggregation_lock = RLock()
        self._store_aggregation_timer = self.flush_timer_factory()
        self._store_aggregation_timer.start()
        # Demo logger which outputs aggregations through a StreamHandler:
        self.agg_log = logging.getLogger("aggregation_logger")
        self.agg_log.addHandler(logging.StreamHandler())
        self.agg_log.setLevel(logging.DEBUG)
        self.agg_log.propagate = False
    def cache_factory(self):
        """ Returns an instance of a new caching object. """
        return defaultdict(self.LogAggregatorCache)
    def flush_timer_factory(self):
        """ Returns a threading.Timer daemon object which flushes the Handler aggregations. """
        timer = Timer(self._flush_timer, self.flush)
        timer.daemon = True
        return timer
    def find_unique(self, record):
        """ Extracts a unique metadata string from log records. """
        metadata = ""
        for single_metadata in self._metadata:
            value = getattr(record, single_metadata, "missing " + str(single_metadata))
            metadata += str(value) + self._separator
        return metadata[:-len(self._separator)]
    def emit(self, record):
        try:
            with self._aggregation_lock:
                metadata = self.find_unique(record)
                self._cache[metadata].cache(record)
        except Exception:
            self.handleError(record)
    def flush(self):
        self.store_aggregation()
    def store_aggregation(self):
        """ Write the aggregation data to file. """
        self._store_aggregation_timer.cancel()
        del self._store_aggregation_timer
        with self._aggregation_lock:
            temp_aggregation = self._cache
            self._cache = self.cache_factory()
        # ---> handle temp_aggregation and write to file <--- #
        for key, value in sorted(temp_aggregation.items()):
            self.agg_log.info("{}\t{}".format(key, value))
        # ---> re-create the store_aggregation Timer object <--- #
        self._store_aggregation_timer = self.flush_timer_factory()
        self._store_aggregation_timer.start()
Testing this Handler class with random log severity in a for-loop:
if __name__ == "__main__":
    import random
    import logging
    logger = logging.getLogger()
    handler = LogAggregatorHandler()
    logger.addHandler(handler)
    logger.addHandler(logging.StreamHandler())
    logger.setLevel(logging.DEBUG)
    logger.info("entering logging loop")
    for i in range(25):
        # Randomly choose log severity:
        severity = random.choice([logging.DEBUG, logging.INFO, logging.WARN, logging.ERROR, logging.CRITICAL])
        logger.log(severity, "test message number %s", i)
    logger.info("end of test code")
If you want to add more stuff, this is what a Python log record looks like:
{'args': ['()'],
'created': ['1413747902.18'],
'exc_info': ['None'],
'exc_text': ['None'],
'filename': ['push_socket_log.py'],
'funcName': ['<module>'],
'levelname': ['DEBUG'],
'levelno': ['10'],
'lineno': ['17'],
'module': ['push_socket_log'],
'msecs': ['181.387901306'],
'msg': ['Test message.'],
'name': ['__main__'],
'pathname': ['./push_socket_log.py'],
'process': ['65486'],
'processName': ['MainProcess'],
'relativeCreated': ['12.6709938049'],
'thread': ['140735262810896'],
'threadName': ['MainThread']}
One more thing to think about:
Most features you run depend on a flow of several consecutive commands (which will ideally report log records accordingly); e.g. a client-server communication will typically depend on receiving a request, processing it, reading some data from the DB (which requires a connection and some read commands), some kind of parsing/processing, constructing the response packet and reporting the response code.
This highlights one of the main disadvantages of using an aggregation approach: by aggregating log records you lose track of the time and order of the actions that took place. It will be extremely difficult to figure out what request was incorrectly structured if you only have the aggregation at hand.
My advice in this case is that you keep both the raw data and the aggregation (using two file handlers or something similar), so that you can investigate a macro-level (aggregation) and a micro-level (normal logging).
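A minimal sketch of that two-handler setup, assuming a hypothetical `AggregatingHandler` and logger name `dual_demo`; an in-memory stream stands in for the raw log file so the example runs anywhere (a real setup would more likely use `logging.FileHandler`).

```python
import io
import logging
from collections import Counter


class AggregatingHandler(logging.Handler):
    """Macro view: counts records per (level, message template)."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def emit(self, record):
        self.counts[(record.levelname, record.msg)] += 1


raw_stream = io.StringIO()  # stand-in for a real log file
raw_handler = logging.StreamHandler(raw_stream)
raw_handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
agg_handler = AggregatingHandler()

log = logging.getLogger("dual_demo")  # hypothetical logger name
log.setLevel(logging.DEBUG)
log.propagate = False
log.addHandler(raw_handler)  # micro level: every record, in order
log.addHandler(agg_handler)  # macro level: counts only

for request_id in (1, 2, 3):
    log.info("request %s received", request_id)
log.error("request %s malformed", 2)

raw_lines = raw_stream.getvalue().splitlines()
print(raw_lines)
print(agg_handler.counts)
```

The raw lines keep the order and the request IDs, while the counter only says how often each kind of record occurred.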
However, you are still left with the responsibility of finding out that things have gone wrong, and then manually investigating what caused it. When developing on your PC this is an easy enough task; but deploying your code on several production servers makes these tasks cumbersome, wasting a lot of your time.
Accordingly, there are several companies developing products specifically for log management. Most aggregate similar log records together, but others incorporate machine learning algorithms for automatic aggregation and learning your software's behavior. Outsourcing your log handling can then enable you to focus on your product, instead of on your bugs.
Disclaimer: I work for Coralogix, one such solution.

You can subclass the logger class and override the exception method to put your error types in a cache until they reach a certain counter before they are emitted to the log.
import logging
from collections import defaultdict
MAX_COUNT = 99999
class MyLogger(logging.getLoggerClass()):
    def __init__(self, name):
        super(MyLogger, self).__init__(name)
        self.cache = defaultdict(int)
    def exception(self, msg, *args, **kwargs):
        err = msg.__class__.__name__
        self.cache[err] += 1
        if self.cache[err] >= MAX_COUNT:
            new_msg = "{err} occurred {count} times.\n{msg}"
            new_msg = new_msg.format(err=err, count=MAX_COUNT, msg=msg)
            self.log(logging.ERROR, new_msg, *args, **kwargs)
            self.cache[err] = 0 # reset the counter (None would break the next += 1)
log = MyLogger('main')
try:
    raise TypeError("Useful error message")
except TypeError as err:
    log.exception(err)
Please note this isn't copy paste code.
You need to add your handlers (I recommend formatter, too) yourself.
https://docs.python.org/2/howto/logging.html#handlers
Have fun.

Create a counter and only log it for count=1, then increment thereafter and write out in a finally block (to ensure it gets logged no matter how bad the application crashes and burns). This could of course pose an issue if you have the same exception for different reasons, but you could always search for the line number to verify it's the same issue or something similar. A minimal example:
import logging
name_error_exception_count = 0
try:
    for i in range(99999):
        try:
            asdf[i] # not defined!
        except NameError:
            name_error_exception_count += 1
            if name_error_exception_count == 1:
                logging.exception('foo')
        else: pass
except Exception:
    pass # this is just to get the finally block, handle exceptions here too, maybe
finally:
    if name_error_exception_count > 0:
        # no exception is being handled here, so use error() rather than exception()
        logging.error('NameError exception occurred {} times.'.format(name_error_exception_count))

Related

How to override Logger class with method to select log level and output line of the log

I need to record logs in many places of my Django application and I want to have it in a standardized way so that when people contribute they don't have to worry too much about it.
So instead of calling:
logger.info(<standardized message here>)
I would like to write a function or probably a class that inherits Logger and selects the correct .info, .warning, .error and so on and outputs the message the way we want.
So in the code I would only call
log(request, 200, "info", "Success message")
And the function I wrote looks like this:
logger = logging.getLogger(__name__)
def log(request=None, status_code=None, level=None, message=None):
    """
    Create a log stream in a standardized format
    <request_path> <request_method> <status_code>: user <user_id> <message>
    """
    method = request.method
    path = request.path
    user_id = None
    if not request.user.is_anonymous:
        user_id = request.user.id
    log = f"{path} {method} {status_code}: user {user_id} {message}"
    if level == 'info':
        logger_type = logger.info
    elif level == 'warning':
        logger_type = logger.warning
    elif level == 'error':
        logger_type = logger.error
    elif level == 'debug':
        logger_type = logger.debug
    return logger_type(log)
The problem is that our log formatter also records the line in the code where the log happened, and because the log is called in the log function, the row reflects the return logger_type(log) instead of the actual line in the code where log was called.
How can I write a class with a method to do the automatic selection of log level based on the input and still stream the log from the line where it happened?
I want to have it in a standardized way
If you really want to do things in a standardized way, then use what the logging module gives you and don't create your own helper functions.
The logging already lets you pass in "extra" arguments for the logging message (doc):
logger.info("success message", extra={"request": request, "status_code": 200})
You can then put your logic for extracting parts of the request in a formatter object.
As well as following the standard, this will give you additional flexibility, as your formatter can do different things based on what extra parameters it gets.
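A minimal sketch of such a formatter, assuming the `extra` keys `request` and `status_code` from the call above. `RequestFormatter`, the logger name and the `SimpleNamespace` stand-in for a Django request are illustrative assumptions, not part of the logging module.

```python
import io
import logging
from types import SimpleNamespace


class RequestFormatter(logging.Formatter):
    """Prefixes each message with request details passed via `extra`."""

    def format(self, record):
        # Keys from `extra` become attributes on the record; fall back to "-"
        # so records logged without them still format cleanly.
        request = getattr(record, "request", None)
        status = getattr(record, "status_code", "-")
        path = getattr(request, "path", "-")
        method = getattr(request, "method", "-")
        return "{} {} {}: {}".format(path, method, status, super().format(record))


stream = io.StringIO()  # stand-in for the real log destination
handler = logging.StreamHandler(stream)
handler.setFormatter(RequestFormatter("%(levelname)s %(message)s"))

logger = logging.getLogger("extra_demo")  # hypothetical logger name
logger.setLevel(logging.INFO)
logger.propagate = False
logger.addHandler(handler)

fake_request = SimpleNamespace(path="/items", method="GET")  # not a real Django request
logger.info("success message", extra={"request": fake_request, "status_code": 200})
logger.info("no request here")

lines = stream.getvalue().splitlines()
print(lines)
```

Because the call site invokes `logger.info` directly, `%(lineno)d` in the format string would point at the caller, which is the original concern.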
One thing that the Python logging module is missing that other loggers have is the idea of a thread-local "context" object. You could use such an object to preserve the request at the time it's processed, so that it's available in all log messages without being passed in explicitly. You could implement this by adding a request_context dictionary to the current thread, and looking for it in your formatter.
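One way this thread-local context might be sketched, assuming a module-level `threading.local()` that a request handler fills in and a formatter that reads it. The names `_context`, `ContextFormatter` and `handle_request` are hypothetical.

```python
import io
import logging
import threading

_context = threading.local()  # per-thread storage for the current request


class ContextFormatter(logging.Formatter):
    """Looks up the current request path in thread-local storage."""

    def format(self, record):
        # format() runs in the thread that made the logging call, so the
        # thread-local lookup sees that thread's request.
        path = getattr(_context, "request_path", "-")
        return "{} {}".format(path, super().format(record))


stream = io.StringIO()  # stand-in for the real log destination
handler = logging.StreamHandler(stream)
handler.setFormatter(ContextFormatter("%(levelname)s %(message)s"))

logger = logging.getLogger("context_demo")  # hypothetical logger name
logger.setLevel(logging.INFO)
logger.propagate = False
logger.addHandler(handler)


def handle_request(path):
    """Hypothetical request handler: sets the context, logs, cleans up."""
    _context.request_path = path
    try:
        logger.info("processing")  # no request passed explicitly
    finally:
        del _context.request_path


handle_request("/a")
logger.info("outside any request")

lines = stream.getvalue().splitlines()
print(lines)
```

In a Django project the set-and-clear step would more naturally live in a middleware, which is outside the scope of this sketch.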
If you would like to stick with your helper function approach, the logging methods accept a stacklevel keyword argument (available since Python 3.8) that specifies the number of stack frames to skip when computing the line number and function name. The default is 1, so setting it to 2 makes the logger look at the frame above your helper function. For example:
import logging
logging.basicConfig(format="%(lineno)d")
logger = logging.getLogger(__name__)
def log():
    logger.error("message", stacklevel=2)
log()

Testing a function based on third party service

I'm trying to figure out how to create unit tests for a function whose behavior is based on a third-party service.
Suppose a function like this:
def sync_check():
    delta_secs = 90
    now = datetime.datetime.now().utcnow()
    res = requests.get('<url>')
    alert = SlackAlert()
    last_value = res[-1]['date'] # Last element of the array is the most recent
    secs = (now - last_value).seconds
    if secs >= delta_secs:
        alert.notify("out of sync. Delay: {} seconds".format(secs))
    else:
        alert.notify('in sync')
What's best practice to write unit test for this function? I need to test both if and else branches, but this depends on the third party service.
The first thing that comes to my mind is to create a fake web server and point to that one (changing the URL), but this way the codebase would include testing logic, like:
if test:
    url = <mock_web_server_url>
else:
    url = <third_party_service_url>
Moreover, unit testing would trigger Slack alerts, which must not happen.
So I would have to change the codebase again, like:
if secs >= delta_secs:
    if test:
        logging.debug("out of sync alert sent - testing mode")
    else:
        alert.notify("out of sync. Delay: {} seconds".format(secs))
else:
    if test:
        logging.debug("in sync alert sent - testing mode")
    else:
        alert.notify('in sync')
Which I don't really like.
Am I missing any design to solve this problem?
Check out Dependency Injection to test code that depends on third party services, without having to check whether you're running in test mode, like in your example. The basic idea is to have the slack alert service be an argument of your function, so for unit testing you can use a fake service that acts the way you want it to for each test.
Your code would end up looking something like this:
def sync_check(alert):
    delta_secs = 90
    now = datetime.datetime.now().utcnow()
    res = requests.get('<url>')
    last_value = res[-1]['date'] # Last element of the array is the most recent
    secs = (now - last_value).seconds
    if secs >= delta_secs:
        alert.notify("out of sync. Delay: {} seconds".format(secs))
    else:
        alert.notify('in sync')
and in a test case, you could have your alert object be something as simple as:
class TestAlert:
    def __init__(self):
        self.message = None
    def notify(self, message):
        self.message = message
You could then test your function by passing on an instance of your TestAlert class, and check the logged output if you want to, by accessing the message attribute. This code would not access any third party services.
def test_sync_check():
    alert = TestAlert()
    sync_check(alert)
    assert alert.message == 'in sync'
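Since `sync_check(alert)` as written still calls `requests.get`, the same idea can be taken one step further by injecting the data source and the clock as well, which lets both branches be tested deterministically. This is only a sketch under assumptions: the `fetch` and `now` parameters and the `FakeAlert` name are not part of the original code, and it uses `total_seconds()` instead of `.seconds` so that delays longer than a day are not truncated.

```python
import datetime


class FakeAlert:
    """Test double that records the last message instead of calling Slack."""

    def __init__(self):
        self.message = None

    def notify(self, message):
        self.message = message


def sync_check(alert, fetch, now, delta_secs=90):
    """`fetch` returns a list of dicts with a 'date' datetime; `now` is the current time."""
    last_value = fetch()[-1]["date"]  # last element is the most recent
    secs = int((now - last_value).total_seconds())
    if secs >= delta_secs:
        alert.notify("out of sync. Delay: {} seconds".format(secs))
    else:
        alert.notify("in sync")


now = datetime.datetime(2020, 1, 1, 12, 0, 0)  # fixed clock for the tests

fresh = FakeAlert()
sync_check(fresh, lambda: [{"date": now - datetime.timedelta(seconds=10)}], now)

stale = FakeAlert()
sync_check(stale, lambda: [{"date": now - datetime.timedelta(seconds=120)}], now)

print(fresh.message)
print(stale.message)
```

In production the caller would pass a function that wraps the real HTTP request and the real Slack client; the tests never touch either.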

Understanding flow of execution of Python code

I'm trying to do a home assignment in Python from Data Manipulation at Scale: Systems and Algorithms at Coursera. Generally I have problems understanding the base code, which was presented as an example of the MapReduce algorithm. I would be grateful for help understanding it in 2 places, details below.
I tried to go step by step through the code flow of the two files below after running the command:
python wordcount.py 'data/books.json'
File wordcount.py is opened
mr = MapReduce.MapReduce() - the mr object is created
def __init__(self): part from MapReduce.py is executed
We come back to wordcount.py
Functions def mapper(record): and def reducer(key,list_of_values): are created but for the time being without execution
Python goes to if __name__ == '__main__':
inputdata = open(sys.argv[1]) - the json file is assigned to a variable
mr.execute(inputdata, mapper, reducer) - A call to the function from MapReduce.py.
And here is my first question: we haven't defined a mapper or reducer variable/object so far. Is it just null/no value passed to this function, or did we somehow define this variable before and I missed it?
Later we move to def execute(self, data, mapper, reducer): in MapReduce.py
And there we have mapper(record).
So this is a reference to a function in wordcount.py, am I right? But if we have a reference to a function in a different file, shouldn't we use import at the beginning of the file and define which file this function came from?
(...) further code execution
wordcount.py file:
import MapReduce
import sys
"""
Word Count Example in the Simple Python MapReduce Framework
"""
mr = MapReduce.MapReduce()
# =============================
# Do not modify above this line
def mapper(record):
    # key: document identifier
    # value: document contents
    key = record[0]
    value = record[1]
    words = value.split()
    for w in words:
        mr.emit_intermediate(w, 1)
def reducer(key, list_of_values):
    # key: word
    # value: list of occurrence counts
    total = 0
    for v in list_of_values:
        total += v
    mr.emit((key, total))
# Do not modify below this line
# =============================
if __name__ == '__main__':
    inputdata = open(sys.argv[1])
    mr.execute(inputdata, mapper, reducer)
MapReduce.py file:
import json
class MapReduce:
    def __init__(self):
        self.intermediate = {}
        self.result = []
    def emit_intermediate(self, key, value):
        self.intermediate.setdefault(key, [])
        self.intermediate[key].append(value)
    def emit(self, value):
        self.result.append(value)
    def execute(self, data, mapper, reducer):
        for line in data:
            record = json.loads(line)
            mapper(record)
        for key in self.intermediate:
            reducer(key, self.intermediate[key])
        #jenc = json.JSONEncoder(encoding='latin-1')
        jenc = json.JSONEncoder()
        for item in self.result:
            print(jenc.encode(item))
Thank you in advance for help with that.
In Python everything is an object, and that includes functions, so you can pass a functionA as an argument to another functionB (or a class, or whatever). If functionB expects this, it will assume that you gave it a function with the right signature and proceed as normal.
In your case
mr.execute(inputdata, mapper, reducer)
here mapper and reducer are the previously defined functions, passed as arguments to the method execute of the instance mr of the class MapReduce. As you can see, that method uses them as the functions it expects.
Thanks to this you can, as that code shows, write generic code that performs some calculation and can be reused in a similar way by many applications, by giving the user the option of supplying his/her own functions.
A much more generic example of this is the function map. It receives a function that does something; map doesn't care what it does or where it comes from, only that it accepts as many arguments as map itself receives (other than said function) and returns a value used to build a new list with the results.
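A small illustration of passing functions as arguments; `apply_to_each` and `double` are made-up names for this sketch and play the roles of `execute` and `mapper` respectively.

```python
def apply_to_each(items, func):
    """Calls whatever function it was given; it does not care where func was defined."""
    return [func(item) for item in items]


def double(x):
    return 2 * x


# Pass the function object itself (no parentheses), just like mapper and reducer above.
doubled = apply_to_each([1, 2, 3], double)

# The built-in map works the same way: it only needs something callable.
lengths = list(map(len, ["a", "bb", "ccc"]))

print(doubled, lengths)
```

No import is needed inside `apply_to_each`, because the function arrives as a value through the parameter `func` rather than being looked up by name.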

Simultaneously modify different keys in ZODB

I'm using ZODB as a persistent storage for objects that are going to be modified through a webservice.
Below is an example to which I reduced the issue.
The increment-function is what is called from multiple threads.
My problem is that when increment is called simultaneously from two threads, for different keys, I'm getting a ConflictError.
I imagine it should be possible to resolve this in a proper way, at least as long as different keys are modified?
If so, I didn't manage to find an example on how to... (the zodb-documentation seems to be somewhat spread across different sites :/ )
Glad about any ideas...
import time
import transaction
from ZODB.FileStorage import FileStorage
from ZODB.DB import DB
from ZODB.POSException import ConflictError
def test_db():
    store = FileStorage('zodb_storage.fs')
    return DB(store)
db_test = test_db()
# app here is a flask-app
@app.route('/increment/<string:key>')
def increment(key):
    '''increment the value of a certain key'''
    # open connection
    conn = db_test.open()
    # get the current value:
    root = conn.root()
    val = root.get(key,0)
    # calculate new value
    # in the real application this might take some seconds
    time.sleep(0.1)
    root[key] = val + 1
    try:
        transaction.commit()
        return '%s = %g' % (key, val)
    except ConflictError:
        transaction.abort()
        return 'ConflictError :-('
You have two options here: implement conflict resolution, or retry the commit with fresh data.
Conflict resolution only applies to custom types you store in the ZODB, and can only be applied if you know how to merge your change into the newly-changed state.
The ZODB looks for a _p_resolveConflict() method on custom types and calls that method with the old state, the saved state you are in conflict with, and the new state you tried to commit; you are supposed to return the merged state. For a simple counter, like in your example, that'd be as simple as updating the saved state with the change between the old and new states:
from persistent import Persistent
class Counter(Persistent):
    def __init__(self, start=0):
        self._count = start
    def increment(self):
        self._count += 1
        return self._count
    def _p_resolveConflict(self, old, saved, new):
        # default __getstate__ returns a dictionary of instance attributes
        saved['_count'] += new['_count'] - old['_count']
        return saved
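To make the merge rule concrete, here is the same arithmetic on plain state dictionaries, without ZODB involved. The function name `resolve_counter_conflict` and the sample numbers are only illustrative.

```python
def resolve_counter_conflict(old, saved, new):
    """Same merge rule as _p_resolveConflict above, applied to plain state dicts."""
    merged = dict(saved)
    # Apply this transaction's delta (new - old) on top of what was already saved.
    merged["_count"] += new["_count"] - old["_count"]
    return merged


old = {"_count": 5}    # state both transactions started from
saved = {"_count": 7}  # another transaction already committed +2
new = {"_count": 6}    # this transaction tried to commit +1

merged = resolve_counter_conflict(old, saved, new)
print(merged)
```

Both increments survive: the merged count is the starting value plus the two deltas.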
The other option is to retry the commit; you want to limit the number of retries, and you probably want to encapsulate this in a decorator on your method, but the basic principle is that you loop up to a limit, make your calculations based on ZODB data (which, after a conflict error, will auto-read fresh data where needed), then attempt to commit. If the commit is successful you are done:
max_retries = 10
def increment(key):
    retry = 0
    conn = db_test.open()
    root = conn.root()
    while retry < max_retries:
        val = root.get(key,0)
        time.sleep(0.1)
        root[key] = val + 1
        try:
            transaction.commit()
            return '%s = %g' % (key, val)
        except ConflictError:
            retry += 1
    raise CustomExceptionIndicatingTooManyRetries
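The decorator variant mentioned above might look roughly like this. It is a sketch under assumptions: `ConflictError` here is a local stand-in so the example runs without ZODB installed (real code would import `ZODB.POSException.ConflictError`), and `retry_on_conflict`, `TooManyRetries` and `flaky_increment` are hypothetical names.

```python
import functools


class ConflictError(Exception):
    """Stand-in for ZODB.POSException.ConflictError so the sketch runs without ZODB."""


class TooManyRetries(Exception):
    """Hypothetical exception raised when every attempt ended in a conflict."""


def retry_on_conflict(max_retries=10, conflict_exc=ConflictError):
    """Re-run the decorated function until it succeeds or the limit is reached."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for _ in range(max_retries):
                try:
                    return func(*args, **kwargs)
                except conflict_exc:
                    continue  # real code would clean up the failed transaction here
            raise TooManyRetries(func.__name__)
        return wrapper
    return decorator


attempts = []


@retry_on_conflict(max_retries=5)
def flaky_increment(key):
    """Simulates a commit that conflicts twice before succeeding."""
    attempts.append(key)
    if len(attempts) < 3:
        raise ConflictError()
    return "{} = {}".format(key, len(attempts))


result = flaky_increment("a")
print(result)
```

The whole read-compute-commit sequence has to sit inside the decorated function, so that each retry recomputes from fresh data.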

Python unittest.TestCase execution order

Is there a way in Python unittest to set the order in which test cases are run?
In my current TestCase class, some testcases have side effects that set conditions for the others to run properly. Now I realize the proper way to do this is to use setUp() to do all setup related things, but I would like to implement a design where each successive test builds slightly more state that the next can use. I find this much more elegant.
class MyTest(TestCase):
    def test_setup(self):
        # Do something
        pass
    def test_thing(self):
        # Do something that depends on test_setup()
        pass
Ideally, I would like the tests to be run in the order they appear in the class. It appears that they run in alphabetical order.
Don't make them independent tests - if you want a monolithic test, write a monolithic test.
class Monolithic(TestCase):
    def step1(self):
        ...
    def step2(self):
        ...
    def _steps(self):
        for name in dir(self): # dir() result is implicitly sorted
            if name.startswith("step"):
                yield name, getattr(self, name)
    def test_steps(self):
        for name, step in self._steps():
            try:
                step()
            except Exception as e:
                self.fail("{} failed ({}: {})".format(step, type(e), e))
If the test later starts failing and you want information on all failing steps instead of halting the test case at the first failed step, you can use the subtests feature: https://docs.python.org/3/library/unittest.html#distinguishing-test-iterations-using-subtests
(The subtest feature is available via unittest2 for versions prior to Python 3.4: https://pypi.python.org/pypi/unittest2 )
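A sketch of the monolithic test using `subTest` (Python 3.4+). The step bodies are placeholders invented for the illustration; only the `subTest` context manager and the loader/result calls are standard unittest API.

```python
import unittest


class Monolithic(unittest.TestCase):
    def step1(self):
        self.state = 1  # placeholder: build some state

    def step2(self):
        self.assertEqual(self.state, 1)  # relies on step1 having run
        self.state += 1

    def step3(self):
        self.assertEqual(self.state, 2)  # relies on step2 having run

    def _steps(self):
        for name in sorted(dir(self)):  # lexical order: step1, step2, step3
            if name.startswith("step"):
                yield name, getattr(self, name)

    def test_steps(self):
        for name, step in self._steps():
            # Each step is reported separately; a failure in one step does
            # not stop the remaining steps from running.
            with self.subTest(step=name):
                step()


suite = unittest.defaultTestLoader.loadTestsFromTestCase(Monolithic)
result = unittest.TestResult()
suite.run(result)
print(result.testsRun, result.wasSuccessful())
```

Note that the sort is lexical, so with ten or more steps `step10` would run before `step2` unless the names are zero-padded.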
It's a good practice to always write a monolithic test for such expectations. However, if you are a goofy dude like me, then you could simply write ugly looking methods in alphabetical order so that they are sorted from a to b as mentioned in the Python documentation - unittest — Unit testing framework
Note that the order in which the various test cases will be run is
determined by sorting the test function names with respect to the
built-in ordering for strings
Example
def test_a_first(self):
    print("1")
def test_b_next(self):
    print("2")
def test_c_last(self):
    print("3")
From unittest — Unit testing framework, section Organizing test code:
Note: The order in which the various tests will be run is determined by sorting the test method names with respect to the built-in ordering for strings.
So just make sure test_setup's name has the smallest string value.
Note that you should not rely on this behavior: different test functions are supposed to be independent of the order of execution. See ncoghlan's answer above for a solution if you explicitly need an order.
Another way that I didn't see listed in any related questions: Use a TestSuite.
Another way to accomplish ordering is to add the tests to a unittest.TestSuite. This seems to respect the order in which the tests are added to the suite using suite.addTest(...). To do this:
Create one or more TestCase subclasses,
class FooTestCase(unittest.TestCase):
    def test_ten(self):
        print('Testing ten (10)...')
    def test_eleven(self):
        print('Testing eleven (11)...')
class BarTestCase(unittest.TestCase):
    def test_twelve(self):
        print('Testing twelve (12)...')
    def test_nine(self):
        print('Testing nine (09)...')
Create a callable that builds the test suite, adding the tests in your desired order (adapted from the documentation and this question):
def suite():
    suite = unittest.TestSuite()
    suite.addTest(BarTestCase('test_nine'))
    suite.addTest(FooTestCase('test_ten'))
    suite.addTest(FooTestCase('test_eleven'))
    suite.addTest(BarTestCase('test_twelve'))
    return suite
Execute the test-suite, e.g.,
if __name__ == '__main__':
    runner = unittest.TextTestRunner(failfast=True)
    runner.run(suite())
For context, I had a need for this and wasn't satisfied with the other options. I settled on the above way of doing test ordering.
I didn't see this TestSuite method listed in any of the several "unit-test ordering questions" (e.g., this question and others including execution order, or changing order, or tests order).
I ended up with a simple solution that worked for me:
class SequentialTestLoader(unittest.TestLoader):
    def getTestCaseNames(self, testCaseClass):
        test_names = super().getTestCaseNames(testCaseClass)
        testcase_methods = list(testCaseClass.__dict__.keys())
        test_names.sort(key=testcase_methods.index)
        return test_names
And then
unittest.main(testLoader=utils.SequentialTestLoader())
A simple and flexible way is to assign a comparator function to unittest.TestLoader.sortTestMethodsUsing:
Function to be used to compare method names when sorting them in getTestCaseNames() and all the loadTestsFrom*() methods.
Minimal usage:
import unittest
class Test(unittest.TestCase):
    def test_foo(self):
        """ test foo """
        self.assertEqual(1, 1)
    def test_bar(self):
        """ test bar """
        self.assertEqual(1, 1)
if __name__ == "__main__":
    test_order = ["test_foo", "test_bar"] # could be sys.argv
    loader = unittest.TestLoader()
    loader.sortTestMethodsUsing = lambda x, y: test_order.index(x) - test_order.index(y)
    unittest.main(testLoader=loader, verbosity=2)
Output:
test_foo (__main__.Test)
test foo ... ok
test_bar (__main__.Test)
test bar ... ok
Here's a proof of concept for running tests in source code order instead of the default lexical order (output is as above).
import inspect
import unittest
class Test(unittest.TestCase):
    def test_foo(self):
        """ test foo """
        self.assertEqual(1, 1)
    def test_bar(self):
        """ test bar """
        self.assertEqual(1, 1)
if __name__ == "__main__":
    test_src = inspect.getsource(Test)
    unittest.TestLoader.sortTestMethodsUsing = lambda _, x, y: (
        test_src.index(f"def {x}") - test_src.index(f"def {y}")
    )
    unittest.main(verbosity=2)
I used Python 3.8.0 in this post.
Tests which really depend on each other should be explicitly chained into one test.
Tests which require different levels of setup could also have their corresponding setUp() run just enough setup; there are various conceivable ways to do this.
Otherwise unittest handles the test classes and test methods inside the test classes in alphabetical order by default (even when loader.sortTestMethodsUsing is None). dir() is used internally which sorts by guarantee.
The latter behavior can be exploited for practicability - e.g. for having the latest-work-tests run first to speed up the edit-testrun-cycle.
But that behavior should not be used to establish real dependencies. Consider that tests can be run individually via command-line options etc.
One approach can be to keep those sub-tests from being treated as tests by the unittest module by prepending _ to their names, and then building a test case that executes these sub-operations in the right order.
This is better than relying on the sorting order of the unittest module, as that might change tomorrow, and achieving a topological sort through naming would not be very straightforward.
An example of this approach, taken from here (Disclaimer: my own module), is as below.
Here, the test case still runs independent tests, such as checking for the table parameter not being set (test_table_not_set) or the primary key test (test_primary_key), independently, but a CRUD test makes sense only if done in the right order, with state set by previous operations. Hence those steps have been made separate units, but not tests. Another test (test_CRUD) then builds the right order of those operations and tests them.
import os
import sqlite3
import unittest
from sql30 import db
DB_NAME = 'review.db'
class Reviews(db.Model):
    TABLE = 'reviews'
    PKEY = 'rid'
    DB_SCHEMA = {
        'db_name': DB_NAME,
        'tables': [
            {
                'name': TABLE,
                'fields': {
                    'rid': 'uuid',
                    'header': 'text',
                    'rating': 'int',
                    'desc': 'text'
                },
                'primary_key': PKEY
            }]
    }
    VALIDATE_BEFORE_WRITE = True
class ReviewTest(unittest.TestCase):
    def setUp(self):
        if os.path.exists(DB_NAME):
            os.remove(DB_NAME)
    def test_table_not_set(self):
        """
        Tests for raise of assertion when table is not set.
        """
        db = Reviews()
        try:
            db.read()
        except Exception as err:
            self.assertIn('No table set for operation', str(err))
    def test_primary_key(self):
        """
        Ensures, primary key is honored.
        """
        db = Reviews()
        db.table = 'reviews'
        db.write(rid=10, rating=5)
        try:
            db.write(rid=10, rating=4)
        except sqlite3.IntegrityError as err:
            self.assertIn('UNIQUE constraint failed', str(err))
    def _test_CREATE(self):
        db = Reviews()
        db.table = 'reviews'
        # backward compatibility for 'write' API
        db.write(tbl='reviews', rid=1, header='good thing', rating=5)
        # New API with 'create'
        db.create(tbl='reviews', rid=2, header='good thing', rating=5)
        # Backward compatibility for 'write' API, without tbl,
        # explicitly passed
        db.write(tbl='reviews', rid=3, header='good thing', rating=5)
        # New API with 'create', without table name explicitly passed.
        db.create(tbl='reviews', rid=4, header='good thing', rating=5)
        db.commit() # Save the work.
    def _test_READ(self):
        db = Reviews()
        db.table = 'reviews'
        rec1 = db.read(tbl='reviews', rid=1, header='good thing', rating=5)
        rec2 = db.read(rid=1, header='good thing')
        rec3 = db.read(rid=1)
        self.assertEqual(rec1, rec2)
        self.assertEqual(rec2, rec3)
        recs = db.read() # Read all
        self.assertEqual(len(recs), 4)
    def _test_UPDATE(self):
        db = Reviews()
        db.table = 'reviews'
        where = {'rid': 2}
        db.update(condition=where, header='average item', rating=2)
        db.commit()
        rec = db.read(rid=2)[0]
        self.assertIn('average item', rec)
    def _test_DELETE(self):
        db = Reviews()
        db.table = 'reviews'
        db.delete(rid=2)
        db.commit()
        self.assertFalse(db.read(rid=2))
    def test_CRUD(self):
        self._test_CREATE()
        self._test_READ()
        self._test_UPDATE()
        self._test_DELETE()
    def tearDown(self):
        os.remove(DB_NAME)
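You can start with a list of test names in the desired order and a helper that looks up a name's position:
test_order = ['base']
def index_of(item, items):
    try:
        return items.index(item)
    except ValueError:
        # names that are not listed sort after all listed names
        return len(items) + 1
Second, define the ordering function (x[5:100] strips the leading test_ prefix from the method name):
def order_methods(x, y):
    x_rank = index_of(x[5:100], test_order)
    y_rank = index_of(y[5:100], test_order)
    return (x_rank > y_rank) - (x_rank < y_rank)
Third, set it in the class:
class ClassTests(unittest.TestCase):
    unittest.TestLoader.sortTestMethodsUsing = staticmethod(order_methods)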
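Put together, the three steps can be sketched as a self-contained script. The entries in test_order and the test method names below are hypothetical, and the comparison function is set on a loader instance rather than on the TestLoader class, which is my own variation to avoid changing the ordering for every other loader in the process.

```python
import unittest

# Desired order, without the "test_" prefix (hypothetical names).
test_order = ['setup', 'base', 'teardown']

def index_of(item, items):
    try:
        return items.index(item)
    except ValueError:
        # Unlisted names sort after all listed names.
        return len(items) + 1

def order_methods(x, y):
    # Strip the "test_" prefix before looking up the rank.
    x_rank = index_of(x[len('test_'):], test_order)
    y_rank = index_of(y[len('test_'):], test_order)
    return (x_rank > y_rank) - (x_rank < y_rank)

class ClassTests(unittest.TestCase):
    def test_teardown(self):
        pass
    def test_other(self):
        pass
    def test_base(self):
        pass
    def test_setup(self):
        pass

loader = unittest.TestLoader()
loader.sortTestMethodsUsing = order_methods  # cmp-style function
ordered = loader.getTestCaseNames(ClassTests)
print(ordered)
# ['test_setup', 'test_base', 'test_teardown', 'test_other']
```

Passing the same loader to unittest.main(testLoader=loader) should then run the tests in that order.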
ncoghlan's answer was exactly what I was looking for when I came to this question. I ended up modifying it to allow each step-test to run, even if a previous step had already thrown an error; this helps me (and maybe you!) to discover and plan for the propagation of error in multi-threaded database-centric software.
class Monolithic(TestCase):
    def step1_testName1(self):
        ...
    def step2_testName2(self):
        ...
    def _steps(self):
        '''
        Generates the step methods from their parent object
        '''
        # The leading underscore keeps this helper itself from
        # matching the 'step' prefix below
        for name in sorted(dir(self)):
            if name.startswith('step'):
                yield name, getattr(self, name)
    def test_steps(self):
        '''
        Run the individual steps associated with this test
        '''
        # Create a flag that determines whether to raise an error at
        # the end of the test
        failed = False
        # An empty string that will accumulate error messages for
        # each failing step
        fail_message = ''
        for name, step in self._steps():
            try:
                step()
            except Exception as e:
                # A step has failed, the test should continue through
                # the remaining steps, but eventually fail
                failed = True
                # Get the name of the method -- so the fail message is
                # nicer to read :)
                name = name.split('_')[1]
                # Append this step's exception to the fail message
                fail_message += "\n\nFAIL: {}\n {} failed ({}: {})".format(name,
                                                                         step,
                                                                         type(e),
                                                                         e)
        # Check if any of the steps failed
        if failed:
            # Fail the test with the accumulated exception message
            self.fail(fail_message)
I also wanted to specify a particular order of execution for my tests. The main differences from the other answers here are:
I wanted to preserve a more descriptive test method name, without replacing the whole name with step1, step2, etc.
I also wanted the method execution printed in the console to have some granularity, as opposed to the monolithic solution in some of the other answers.
So with a monolithic test method, the output looked like this:
test_booking (__main__.TestBooking) ... ok
I wanted:
test_create_booking__step1 (__main__.TestBooking) ... ok
test_process_booking__step2 (__main__.TestBooking) ... ok
test_delete_booking__step3 (__main__.TestBooking) ... ok
How to achieve this
I added a suffix of the form __step<order> to each method name, for example (the order of definition is not important):
def test_create_booking__step1(self):
    [...]
def test_delete_booking__step3(self):
    [...]
def test_process_booking__step2(self):
    [...]
For the test suite, override the __iter__ method, which builds an iterator over the test methods.
class BookingTestSuite(unittest.TestSuite):
    """ Extends the functionality of the standard test suites """
    def __iter__(self):
        for suite in self._tests:
            suite._tests = sorted(
                [x for x in suite._tests if hasattr(x, '_testMethodName')],
                key = lambda x: int(x._testMethodName.split("step")[1])
            )
        return iter(self._tests)
This will sort test methods into order and execute them accordingly.
