We have a test which passes if run standalone, but if we run the whole suite, py.test fails because it runs out of memory.
My question: How to display the memory usage of the py.test process before and after each test?
That way we could find the tests which leak memory.
Other solutions are welcome, too.
We run Python 2.7 on linux.
The root cause of the memory problem was found: Django changed QuerySet iteration to load all instances at once; in my case, millions of them :-) See: https://docs.djangoproject.com/en/1.6/releases/1.6/#queryset-iteration
But I am still interested in the general question.
The pytest-xdist plugin gives you a --boxed option, which runs each test in its own subprocess. You can use that to work around your problem, and also to track resource usage (though I'm not sure how at the moment; a rough psutil sketch follows the links below).
Finally, it is quite possible that it is the interaction of your tests, and not a single test alone, that piles up memory. You can use the -k selector or the pytest-random plugin's --random flag to verify that conjecture.
https://pypi.python.org/pypi/pytest-xdist
https://pypi.python.org/pypi/pytest-random
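For the general question of showing memory usage before and after each test, here is a rough sketch (not an established plugin): a conftest.py that uses psutil, an extra dependency assumed to be installed, to report the RSS of the pytest process around every test. Run pytest with -s so the output is not captured.

# conftest.py -- rough sketch; assumes a recent psutil is installed
import os

import psutil

_process = psutil.Process(os.getpid())

def _rss_mb():
    # resident set size of the pytest process, in megabytes
    return _process.memory_info().rss / 1e6

def pytest_runtest_setup(item):
    print("\n%s: %.1f MB before" % (item.nodeid, _rss_mb()))

def pytest_runtest_teardown(item):
    print("\n%s: %.1f MB after" % (item.nodeid, _rss_mb()))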
Related
I am working on the following project: I want to create pictures showing the CPU and memory usage during the execution of a set of performance test suites. I would appreciate any suggestion.
Currently the only approach I am considering is to run the top command or the Python module psutil in parallel with the tests. However, I was wondering whether a better approach already exists, maybe a py.test plugin.
A nice-to-have would be being able to compare those parameters from one execution to another.
The tests are executed under Linux (Ubuntu).
There are two useful tools for line-by-line timing and memory consumption for functions:
line profiler
memory profiler
Installation is easy:
$ pip install line_profiler memory_profiler
To do the profiling, decorate your function with @profile, and then run:
$ python -m memory_profiler example.py
for memory profiling, or
$ kernprof -l -v example.py
for line-by-line timing with line_profiler.
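For illustration, a minimal example.py might look like this (my_func and the allocations are just placeholders in the usual memory_profiler style of example):

# example.py -- minimal sketch; my_func is a made-up placeholder
from memory_profiler import profile

@profile
def my_func():
    a = [1] * (10 ** 6)       # roughly 8 MB
    b = [2] * (2 * 10 ** 7)   # roughly 160 MB
    del b
    return a

if __name__ == '__main__':
    my_func()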
There are a whole bunch of ways of doing this, ranging from getting broad system statistics and then averaging them (top), to using processor hardware counters (e.g. using Intel VTune).
psutil seems perfectly fine. My only comment is to make sure you take many measurements and then average them to get rid of spurious spikes and such.
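As a rough sketch of that sampling-and-averaging idea (not a py.test plugin, just psutil in a background thread; the interval and the choice of metrics are my own):

# sampler sketch: take many psutil measurements in a background thread
# and average them afterwards
import threading
import time

import psutil

class ResourceSampler(threading.Thread):
    def __init__(self, interval=0.5):
        threading.Thread.__init__(self)
        self.daemon = True
        self.interval = interval
        self.cpu = []
        self.mem = []
        self._stop_event = threading.Event()

    def run(self):
        while not self._stop_event.is_set():
            self.cpu.append(psutil.cpu_percent(interval=None))
            self.mem.append(psutil.virtual_memory().percent)
            time.sleep(self.interval)

    def stop(self):
        self._stop_event.set()

sampler = ResourceSampler()
sampler.start()
# ... run the test suite here, e.g. via subprocess ...
time.sleep(2)  # stand-in for the test run
sampler.stop()
sampler.join()
print("avg cpu%%: %.1f  avg mem%%: %.1f" % (
    sum(sampler.cpu) / max(len(sampler.cpu), 1),
    sum(sampler.mem) / max(len(sampler.mem), 1)))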
Some other possible ways of taking these measurements are /proc/[pid]/stat (see the man page), time, or, if you get really obsessive, some programmatic techniques, e.g. for Windows.
Here's a good discussion about programmatically getting benchmarking values. It also discusses some of the traps you can get into, which you should be familiar with even if you are not using a programmatic method.
Intel has a lot of good information about processor benchmarking; it's their bread and butter.
The only other comment I can make is that you need to select your benchmark carefully. Intel emphasizes CPU because that is what they are best at, and the same is true for other companies. In truth, there is a whole host of other important factors that come into play, depending on the application domain.
Look at the different media-based benchmarks; they may be more appropriate than one based simply on processor time. I can't readily find them at the moment, but Bing is a wonder.
I am trying to prepare a pull request for my changes to matplotlib here: https://github.com/shmuller/matplotlib.git. After merging with upstream/master (https://github.com/matplotlib/matplotlib.git), I wanted to find out if I had broken anything, so I ran the test suite (python tests.py -v -a) on upstream/master. I get:
Ran 4688 tests in 555.109s
FAILED(KNOWNFAIL=330, SKIP=9, errors=197, failures=16)
Now on my merged branch:
Ran 4682 tests in 555.070s
FAILED(KNOWNFAIL=330, SKIP=9, errors=200, failures=18)
Darn! Quite close, but not the same! So I did break something that wasn't broken before. Since there are thousands of tests, and lots of errors and failures to begin with, it is not obvious how to find out what I broke.
So my question is: What's a good way to find out which tests broke that weren't broken before?
tests.py essentially does:
import nose
nose.main()
so I am hoping for a feature in nose that helps me figure out what I broke, but I couldn't find anything in the help (nosetests --help). I could obviously log and diff the whole output, but I'm hoping for a more elegant solution.
Save the logs to two files, A and B. Then use a diff tool like Meld or Emacs' M-x ediff to see the differences.
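If you would rather stay in Python, a small sketch using the standard library's difflib gives the same kind of diff (assuming the logs were saved as A and B):

# compare_logs.py -- small standard-library sketch
import difflib

with open('A') as f:
    a = f.readlines()
with open('B') as f:
    b = f.readlines()

# print a unified diff of the two test logs
for line in difflib.unified_diff(a, b, fromfile='A', tofile='B'):
    print(line.rstrip())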
If you have a guess about what test(s) are relevant to the code you changed, then you could run
nosetests /path/to/test_file.py
Fix the errors relevant to the code you changed, and then see if outputs are identical (by running diff).
If you run
nosetests --with-id
then on subsequent runs, adding the --failed flag will cause nosetests to re-run only the failed tests. That may also help you zero in on the differences.
nosetests --with-id --failed
In the previous project I was working on, our fabfile got out of control. While the rest of our project was well-tested, we didn't write a single test for our fabfile. Refactoring was scary, and we weren't confident a fabric command would work how we expected until we ran the command.
I'm starting a new project, and I'd like to make sure our fabfile is well-tested from the beginning. Obey the Testing Goat has a great article discussing some possible strategies, yet it has more questions than answers. Using fabtest is a possibility, although it seems to be dead.
Has anyone successfully unit tested their fabfile? If so, how?
Run your fabfile task in a Docker container, then use docker diff to verify that the right files were changed by the fabfile.
This is still quite a bit of work, but it allows testing without excessive fabfile modifications.
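To make the docker diff step concrete, a rough sketch (the container name 'myapp-test' and the expected path are made up, and it assumes the fabfile task has already been run against that container):

# rough sketch; container name and expected path are hypothetical
import subprocess

def test_fabfile_changed_expected_files():
    # 'docker diff' lists filesystem changes as lines like "A /path" or "C /path"
    output = subprocess.check_output(['docker', 'diff', 'myapp-test'])
    changed = set(line.split()[-1] for line in output.decode().splitlines())
    assert '/etc/myapp/settings.conf' in changed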
Have you tried python-vagrant? It seems to do the same thing that fabtest does, but it includes some Fabric demos and is still used and maintained.
The slides - mentioned by Henrik Andersson - from back then are available here
Robin Kåveland Hansen replied to me:
There are some examples of the types of refactoring that we did in order to keep our fabric code well-tested there.
In general, I would say the best advice is to avoid low-level code such as shell commands inside higher-level code that decides what to run, i.e. isolate effectful code from the code that makes decisions.
Branching increases the number of test cases you need, and it's a lot more effort to write good test cases for code that changes state on some server.
At the time, we used mock to mock out fabric and write test cases for branch-less code that has side effects on the server, so the code + tests would look a lot like this:
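The original snippet is not reproduced here; as a hypothetical stand-in for the pattern (deploy.py, restart_app and the shell command are all made up), the effect-full code and its mock-based test might look like:

# deploy.py -- hypothetical, effect-full code with no branching
from fabric.api import sudo

def restart_app():
    sudo('systemctl restart myapp')

# test_deploy.py -- fabric's sudo() is mocked out, so no server is needed
import mock
import deploy

def test_restart_app_issues_expected_command():
    with mock.patch('deploy.sudo') as fake_sudo:
        deploy.restart_app()
        fake_sudo.assert_called_once_with('systemctl restart myapp')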
Obviously this has the weakness that it won't pick up bugs in the shell commands themselves. My experience is that this is rarely the cause of serious problems, though.
Other options using mock would be to use the following idea to run the tests locally on your machine instead of remotely:
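For instance, a hypothetical sketch along those lines (Fabric 1.x API assumed, the task and command are made up) replaces run() with local() so the shell command executes on the developer's machine:

# hypothetical sketch: redirect fabric's run() to local()
import mock
from fabric.api import run, local

def list_tmp():
    # trivial "remote" task, used only for illustration
    return run('ls /tmp')

def run_locally(command, *args, **kwargs):
    # local() needs capture=True to return the output the way run() does
    return local(command, capture=True)

def test_list_tmp_runs_locally():
    with mock.patch('%s.run' % __name__, side_effect=run_locally):
        list_tmp()  # the command now executes on this machine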
Maybe the most robust approach is to run the tests in vagrant, but that has the disadvantage of requiring lots of setup and has a tendency to make the tests slower.
I think it's important to have fast tests, because then you can run them all the time and they give you a really nice feedback-loop.
The deploy-script I've written for my current employer has ~150 test cases and runs in less than 0.5 seconds, so the deploy-script will actually do a self-test before deploying.
This ensures that it is tested on each developer machine all the time, which has caught a good few bugs, for example in cases where Linux and Mac OS X behave differently.
What I want
I would like to create a set of benchmarks for my Python project. I would like to see the performance of these benchmarks change as I introduce new code. I would like to do this the same way I test Python: by running a utility command like nosetests and getting a nicely formatted readout.
What I like about nosetests
The nosetests tool works by searching through my directory structure for any files named test_foo.py and running every function test_bar() contained within. It prints out whether or not each one raised an exception.
I'd like something similar that searched for all files bench_foo.py and ran all contained functions bench_bar() and reported their runtimes.
Questions
Does such a tool exist?
If not what are some good starting points? Is some of the nose source appropriate for this?
nosetests can run any type of test, so you can decide if they test functionality, input/output validity etc., or performance or profiling (or anything else you'd like). The Python Profiler is a great tool, and it comes with your Python installation.
import unittest
import cProfile

class ProfileTest(unittest.TestCase):
    def test_run_profiler(self):
        # foo, bar and baz stand in for the code you want to profile
        cProfile.run('foo(bar)')
        cProfile.run('baz(bar)')
You just add a line to the test, or add a test to the test case for all the calls you want to profile, and your main source is not polluted with test code.
If you only want to time execution and not all the profiling information, timeit is another useful tool.
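For example, a minimal timeit sketch (bench_bar here is a stand-in for your own function):

import timeit

def bench_bar():
    # stand-in for the real work you want to time
    return sorted(range(1000))

# report the total time for 10,000 calls
print(timeit.timeit(bench_bar, number=10000))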
The wheezy documentation has a good example of how to do this with nose. The important part, if you just want the timings, is to use the options -q for a quiet run, -s to not capture the output (so you will see the report) and -m benchmark to run only the 'timing' tests.
I recommend using py.test over nose for testing. To run the example from wheezy with it, change the name of the runTest method to test_bench_run and run only this benchmark with:
py.test -qs -k test_bench benchmark_hello.py
(-q and -s have the same effect as with nose, and -k selects tests whose names match the pattern).
If you put your benchmark tests in a separate file or directory from the normal tests, they are of course easier to select and don't need special names.
Is there some way to speed up the repeated execution of pytest? It seems to spend a lot of time collecting tests, even if I specify which files to execute on the command line. I know it isn't a disk speed issue either since running pyflakes across all the .py files is very fast.
The various answers describe different ways in which pytest can be slow. They helped in some cases but not in others. I'm adding one more answer that explains a common speed problem, but it isn't possible to select "the" answer here.
Using the norecursedirs option in pytest.ini or tox.ini can save a lot of collection time, depending on what other files you have in your working directory. My collection time is roughly halved for a suite of 300 tests when I have that in place (0.34s vs 0.64s).
If you're already using tox like I am, you just need to add the following in your tox.ini:
[pytest]
norecursedirs = docs *.egg-info .git appdir .tox
You can also add it in a free-standing pytest.ini file.
The pytest documentation has more details on pytest configuration files.
I was having the same problem where I was calling pytest at the root of my project and my tests were three subdirectories down. The collection was taking 6-7 seconds before 0.4 seconds of actual test execution.
My solution initially was to call pytest with the relative path to the tests:
pytest src/www/tests/
If doing that speeds up your collection also, you can add the relative path to the tests to the end of the addopts setting in your pytest.ini - eg:
[pytest]
addopts = --doctest-glob='test_*.md' -x src/www/tests/
This dropped the collection + execution time down to about a second and I could still just call pytest as I was before.
With xdist you can parallelize pytest runs. It even allows shipping tests to remote machines. Depending on your setup, it can speed things up quite a bit :)
In bash, try { find -name '*_test.py'; find -name 'test_*.py'; } | xargs pytest.
For me, this brings total test time down to a fraction of a second.
For me, adding PYTHONDONTWRITEBYTECODE=1 to my environment variables achieved a massive speedup! Note that I am using network drives which might be a factor.
Windows Batch: set PYTHONDONTWRITEBYTECODE=1
Unix: export PYTHONDONTWRITEBYTECODE=1
subprocess.run: Add the keyword env={'PYTHONDONTWRITEBYTECODE': '1'} (see the sketch after these notes for preserving the rest of the environment)
PyCharm already set this variable automatically for me.
Note that the first two options only remain active for your current terminal session.
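For the subprocess.run option, a small sketch (the tests/ path is a placeholder); note that passing env on its own replaces the child's entire environment, so it is safer to copy os.environ and add the variable:

# keep the existing environment and only add the one variable
import os
import subprocess

env = dict(os.environ, PYTHONDONTWRITEBYTECODE='1')
subprocess.run(['pytest', 'tests/'], env=env)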
In the special case where you are running under cygwin's python, its unix-style file handling is slow. See pytest.py test very slow startup in cygwin for how to speed things up in that special situation.
If you have some antivirus software running, try turning it off. I had this exact same problem: collecting tests ran incredibly slowly. It turned out that my antivirus software (Avast) was causing it. When I disabled the antivirus, test collection ran about five times faster. I tested it several times, turning the antivirus on and off, so I have no doubt that was the cause in my case.
Edit: To be clear, I don't think antivirus should be turned off and left off. I just recommend turning it off temporarily to see if it is the source of the slow down. In my case, it was, so I looked for other antivirus solutions that didn't have the same issue.
Pytest imports all modules in the testpaths directories to look for tests. The import itself can be slow. This is the same startup time you'd experience if you ran those tests directly; however, since it imports all of the files, it takes a lot longer. It's kind of a worst-case scenario.
This doesn't add time to the whole test run though, as it would need to import those files anyway to execute the tests.
If you narrow down the search on the command line, to specific files or directories, it will only import those ones. This can be a significant speedup while running specific tests.
Speeding up those imports means modifying those modules: the size of the module and its transitive imports slow down startup. Also look for any code that runs at import time (code outside of any function), since that too has to execute during the collection phase.
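As a tiny illustration of that last point (load_huge_dataset is a made-up stand-in): work done at module level runs as soon as pytest imports the file during collection, whereas the same work inside a fixture is deferred until a test actually uses it.

# test_slow_import.py -- hypothetical illustration
import time

import pytest

def load_huge_dataset():
    time.sleep(2)          # stand-in for expensive work
    return list(range(10))

# DATA = load_huge_dataset()   # module level: would run during collection

@pytest.fixture(scope='session')
def data():
    # fixture: runs only when a test that uses it actually executes
    return load_huge_dataset()

def test_data_is_loaded(data):
    assert len(data) == 10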