I have a certain amount of time to test a system. Can I write a Python property-based test that keeps running test cases until one hour is up? I looked for a solution in Hypothesis but I couldn't find one.
I imagine that property-testing libraries have some kind of test-case generator from which I could just pull and execute cases until the timeout is up. This would be an acceptable lazy solution.
Hypothesis does not have a generator you can pull from - the internals are implemented quite differently from the usual lazy-infinite-list construction used by QuickCheck (because... Haskell).
We have an open issue to add a fuzzing mode, which will (hopefully) be the subject of an undergrad group project at Imperial College in early 2019.
Until that's ready, you can add @settings(timeout=60*60, suppress_health_check=HealthCheck.all(), max_examples=10**9) to a test and it should run for an hour (or until it finds a bug).
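For example, applied to a @given test it might look roughly like this (the strategy, test name, and check_system function are placeholders, not part of Hypothesis):
from hypothesis import HealthCheck, given, settings, strategies as st

@settings(timeout=60 * 60,                          # keep going for up to an hour
          suppress_health_check=HealthCheck.all(),  # don't bail out on slow/large runs
          max_examples=10**9)                       # effectively unlimited examples
@given(st.integers())                               # placeholder strategy
def test_system_property(x):
    assert check_system(x)                          # placeholder check of the system under test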
I'm administering quite a large number of Python unit tests, and since the stack traces are sometimes weirdly long, I had an idea for optimizing the output so that I can spot the file I'm interested in faster.
Currently I am running tests with ack:
python3 unittests.py | ack --passthru 'GraphTests.py|Versioning/Testing/Tests.py'
This does work as I intended. But as the number of tests keeps growing and I want to keep this dynamic, I would like to read the classes from whatever I've added to the testing suite:
suite.addTest(loader.loadTestsFromModule(UC3))
What would be the best way to accomplish this?
My first thought was to split the unit tests into two files, a loader and a caller. The loader adds all the unit tests as before and then executes caller.py through ack, including the list of files.
Is that a reasonable approach? Are there better ways? I don't think it is possible to fill in the ack patterns after I have executed the line.
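For the "read the classes from the suite" part, a minimal sketch (assuming a suite built with loader.loadTestsFromModule as above) could walk the suite and collect the source files, which could then be joined into the ack pattern:
import inspect
import unittest

def collect_source_files(suite):
    # Recursively walk a TestSuite and collect the source files of its test classes.
    files = set()
    for item in suite:
        if isinstance(item, unittest.TestSuite):
            files |= collect_source_files(item)
        else:
            files.add(inspect.getsourcefile(type(item)))
    return files

# e.g. pattern = '|'.join(sorted(collect_source_files(suite)))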
There's also the idea of just piping the results into a file that I read afterwards, but from my experience so far I am not sure whether that will work as planned rather than just causing extra trouble. (I use an experimental unittest framework that adds colouring during execution using format characters like '\033[91m' - see: https://stackoverflow.com/questions/15580303/python-output-complex-line-with-floats-colored-by-value)
Is there an option I don't see?
Edit (1): I also forgot to add: getting into the debugger is a different experience. With ack it doesn't seem to work properly any more.
I am trying to reproduce some functionality from Python in C++. It is an involved numerical method with a bunch of subfunctions.
Is there a nice way to compare the values of the Python functions with the C++ functions that I am writing and that should mirror them?
Could someone paste some code or give a mini tutorial?
Or some pointers or references?
Many thanks!
Artabalt
Testing.
Write tests for the functionality in Python (unless they already exist; Python code is usually tested).
Port the tests to C++. Then make your C++ functions pass the tests. Ideally, make the tests a target in your makefile and run them whenever you can.
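As a minimal sketch of what this can look like in practice (everything here is hypothetical: a Python reference function compute() in mymodule and a C++ executable ./my_cpp_tool that reads a number on stdin and prints its result), you can also drive both implementations from a single Python test:
import math
import subprocess
import unittest

from mymodule import compute  # hypothetical Python reference implementation

class CompareWithCpp(unittest.TestCase):
    def run_cpp(self, x):
        # Run the (hypothetical) C++ build of the same routine and parse its output.
        result = subprocess.run(["./my_cpp_tool"], input=str(x),
                                capture_output=True, text=True, check=True)
        return float(result.stdout)

    def test_matches_reference(self):
        for x in [0.0, 1.0, -1.0, 1e-9, 1e9]:  # a few meaningful border cases
            self.assertTrue(math.isclose(compute(x), self.run_cpp(x), rel_tol=1e-12))

if __name__ == "__main__":
    unittest.main()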
Edit:
You can test randomly; whether that's a good idea or not might depend on your particular case.
The rule of thumb is to use a couple of examples of each border case (you can use code coverage to see if there's a case you're missing).
As an example, at my work I use an integration test that requires a lot of components. We simulate a betting process (all the process excluding user interface), at a given point in time, with a hardcoded server response, and we simulate the hardware printer with a redirection to a bmp.
Now, you can't test everything. I can't test every valid combination of numbers (6 numbers from 0 to 36?), in every possible order. Neither can I test every invalid combination (infinite?). You test a couple. The meaningful ones.
So we always give it the same numbers. The test is that it always has to produce the same bmp (it has to render the fonts the same way, has to have the same texts, the same betting date, etc.).
Now, the level of automation you can get when generating these tests depends on your code and your application.
I don't know your particular case, but I would start by creating a little program (the simpler, the better) that uses your library: one implementation in Python, one implementation in C++. If you were testing a string library, for example, you would write a small program that deletes a letter inside a string, appends some text, erases every other letter, etc.
Then you automate a couple of test cases:
cat largetext.txt | ./my_python_program > output.python_program.txt
cat largetext.txt | ./my_cpp_program > output.cpp_program.txt
diff output.python_program.txt output.cpp_program.txt || echo "ERROR"
The ideal case is that the tests get run every time you compile (this can be done if the code is simple; if not, you can run the tests at the end of the day, for example).
This doesn't guarantee the programs are the same (you can't test every possible input), but it gives you some confidence. If you see something is not right, you first add it to your tests, then make it fail, then fix the code until the test passes.
Regards.
I would like to know what the Python interpreter is doing in my production environments.
Some time ago I wrote a simple tool called live-trace which runs a daemon thread that collects stack traces every N milliseconds.
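The idea is roughly this (a simplified sketch, not the actual live-trace code):
import sys
import threading
import time
import traceback

def start_sampler(interval=0.01, out=sys.stderr):
    # Daemon thread that dumps the stack of every running thread each `interval` seconds.
    def _run():
        while True:
            for thread_id, frame in sys._current_frames().items():
                out.write("Thread %s:\n%s\n" % (thread_id, "".join(traceback.format_stack(frame))))
            time.sleep(interval)
    threading.Thread(target=_run, daemon=True).start()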
But signal handling in the interpreter itself has one disadvantage:
Although Python signal handlers are called asynchronously as far as the Python user is concerned, they can only occur between the “atomic” instructions of the Python interpreter. This means that signals arriving during long calculations implemented purely in C (such as regular expression matches on large bodies of text) may be delayed for an arbitrary amount of time.
Source: https://docs.python.org/2/library/signal.html
How could I work around the above constraint and get a stack trace, even if the interpreter is stuck in some C code for several seconds?
Related: https://github.com/23andMe/djdt-flamegraph/issues/5
I use py-spy with speedscope now. It is a very cool combination.
py-spy works on Windows/Linux/macOS, can output flame graphs on its own and is actively developed, e.g. subprocess profiling support was added in October 2019.
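For reference, a typical invocation could look something like this (the PID is a placeholder; check py-spy record --help for the exact flags in your version):
py-spy record --pid 12345 --format speedscope -o profile.speedscope.json
py-spy record --pid 12345 -o flamegraph.svg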
Have you tried Pyflame? It's based on ptrace, so it shouldn't be affected by CPython's signal handling subtleties.
Maybe the perf tool from Brendan Gregg can help.
I have tests with a huge variance in their runtime. Most take much less than a second, some maybe a few seconds, and some could take up to minutes.
Can I somehow specify that in my Nosetests?
In the end, I want to be able to run only a subset of my tests which take e.g. less than 1 second (via my specified expected runtime estimate).
Have a look at this write-up about the attribute plugin for nose, where you can manually tag tests as @attr('slow') and @attr('fast'). You can run nosetests -a '!slow' afterwards to run your tests quickly.
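For example (the test names here are just placeholders):
from nose.plugins.attrib import attr

@attr('slow')
def test_full_reindex():
    ...  # something known to take a while

@attr('fast')
def test_parse_header():
    ...

Running nosetests -a '!slow' then skips everything tagged as slow.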
It would be great if you could do this automatically, but I'm afraid you would have to write additional code to do it on the fly. If you are into rapid development, I would run nose with xunit XML output enabled (which records the runtime of each test). Your test module can then dynamically read in the XML output from previous runs and set attributes on the tests accordingly to filter for the quick ones. That way you don't have to do it manually, albeit with more work (and you have to run all tests at least once).
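A rough sketch of that idea, assuming nose was run with --with-xunit so that a nosetests.xml file from the previous run is available (names and threshold are illustrative):
import xml.etree.ElementTree as ET

def slow_test_ids(xml_path="nosetests.xml", threshold=1.0):
    # Collect "classname.name" ids of tests whose recorded runtime exceeded `threshold` seconds.
    tree = ET.parse(xml_path)
    return {"%s.%s" % (case.get("classname"), case.get("name"))
            for case in tree.iter("testcase")
            if float(case.get("time", 0)) > threshold}

# The returned ids can then be used to apply an @attr('slow') tag to the matching tests.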
I want to my tests to fail if they take longer than a certain time to run (say 500ms) because it sucks when a load of slightly slow tests mount up and suddenly you have this big delay every time you run the test suite. Are there any plugins or anything for Nose that do this already?
For cases where timing is important (e.g. realtime requirements):
http://nose.readthedocs.org/en/latest/testing_tools.html
nose.tools.timed(limit)
Test must finish within specified time limit to pass.
Example use:
import time

from nose.tools import timed

@timed(.1)
def test_that_fails():
    time.sleep(.2)
I respectfully suggest that changing the meaning of "broken" is a bad idea.
The meaning of a failed/"red" test should never be anything other than "this functionality is broken". To do anything else risks diluting the value of the tests.
If you implement this and then next week a handful of tests fail, would it be an indicator that
Your tests are running slowly?
The code is broken?
Both of the above at the same time?
I suggest it would be better to gather MI from your build process and monitor it in order to spot slow tests building up, but let red mean "broken functionality" rather than "broken functionality and/or slow test."