I am working on the following project: I want to create plots showing CPU and memory usage during the execution of a set of performance test suites. I would appreciate any suggestions.
Currently the only approach I am considering is to use the command top or the Python module psutil and run them in parallel with the tests. However, I was wondering whether a better approach already exists, maybe a py.test plugin.
A nice-to-have would be being able to compare those measurements from one execution to another.
The tests are executed under Linux (Ubuntu).
There are two useful tools for line-by-line timing and memory consumption for functions:
line profiler
memory profiler
Installation is easy:
$ pip install line_profiler memory_profiler
To do the profiling, decorate your function with @profile, and then run
$ python -m memory_profiler example.py for memory usage, or $ kernprof -l -v example.py for line-by-line timings
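For instance, a minimal script (essentially the example from the memory_profiler documentation; the list sizes are arbitrary) could look like this:

# example.py -- toy function just to have something to profile
from memory_profiler import profile  # the runners can also inject @profile for you

@profile
def build_lists():
    a = [1] * (10 ** 6)        # a few MB of integers
    b = [2] * (2 * 10 ** 7)    # considerably larger
    del b
    return a

if __name__ == "__main__":
    build_lists()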
There are a whole bunch of ways of doing this, ranging from getting broad system statistics and then averaging them (top), to using processor hardware counters (e.g. using Intel VTune).
psutil seems perfectly fine. My only comment is to make sure you take many measurements and then average them to get rid of spurious spikes and such.
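For example, a small sampler along these lines (the sampling interval and duration are placeholders) can run alongside the test suite and average the readings afterwards:

import time
import psutil

def sample(proc, interval=0.5, duration=60.0):
    # Collect CPU and resident-memory samples for one process and average them.
    cpu, rss = [], []
    end = time.time() + duration
    while time.time() < end:
        cpu.append(proc.cpu_percent(interval=interval))  # blocks for `interval` seconds
        rss.append(proc.memory_info().rss)
    return sum(cpu) / len(cpu), sum(rss) / len(rss)

# In practice, pass psutil.Process(<pid of the test run>); the current process is used here.
avg_cpu, avg_rss = sample(psutil.Process(), duration=5)
print("average CPU %:", avg_cpu, "average RSS (MB):", avg_rss / 1024 / 1024)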
Some other possible ways of taking these measurements are /proc/[pid]/stat (see the man page), time, or, if you get really obsessive, programmatic techniques, e.g. for Windows.
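As an illustration of the /proc route (Linux only; the field positions come from the proc(5) man page):

import os

def cpu_times(pid):
    # Return (user, system) CPU seconds for a process, read from /proc/[pid]/stat.
    with open("/proc/%d/stat" % pid) as f:
        data = f.read()
    # The command name sits in parentheses and may contain spaces,
    # so split on the last closing parenthesis first.
    fields = data.rpartition(")")[2].split()
    ticks = os.sysconf("SC_CLK_TCK")
    return int(fields[11]) / ticks, int(fields[12]) / ticks  # utime, stime

print(cpu_times(os.getpid()))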
Here's a good discussion about programmatically getting benchmarking values. It also discusses some of the traps you can get into, which you should be familiar with even if you are not using a programmatic method.
Intel has a lot of good information about processor benchmarking; it's their bread and butter.
The only other comment I can make is that you need to select your benchmark carefully. Intel emphasizes CPU because it is what they are best at. The same is true for other companies. In truth, there are a whole host of other important factors that come into play depending upon the application domain.
Look at the different media-based benchmarks. They may be more appropriate than one based simply on processor time. I can't readily find the benchmarks, but Bing is a wonder.
I have a set of scripts and utility modules that were written for a recent version of Python 3. Now suddenly, I have a need to make sure that all this code works properly under an older version of Python 3. I can't get the user to update to a more recent Python version -- that's not an option. So I need to identify all the instances where I've used some functionality that was introduced since the old version they have installed, so I can remove it or develop workarounds.
Approach #1: eyeball all the code and compare against documentation. Not ideal when there's this much code to look at.
Approach #2: create a virtual environment locally based on the old version in question using pyenv, run everything, see where it fails, and make fixes. I'm doing this anyway, because backporting to the older Python will also mean going backwards in a number of needed third-party modules from PyPi, and I'll need to make sure that the suite still functions properly. But I don't think it's a good way to identify all my version incompatibilities, because much of the code is only exercised based on particular characteristics of input data, and it'd be hard to make sure I exercise all the code (I don't yet have good unit tests that ensure every line will get executed).
Approach #3: in my virtual environment based on the older version, I installed the pylint module and used it to check my code. It ran, but it didn't identify issues with standard library calls. For example, I know that several of my functions call subprocess.run() with the capture_output= Boolean argument, which didn't become available until version 3.7. I expected the 3.6 pylint run to spot this and yell at me, but it didn't. Does pylint not check standard library calls against definitions?
Anyway, this is all I've thought of so far. Any ideas gratefully appreciated. Thanks.
If you want to use pylint to check 3.6 code, the most effective way is to use a 3.6 interpreter and environment and run pylint inside it. If you want to use the latest pylint version instead, you can use the py-version option with 3.6, but this will probably catch fewer issues: pylint will not check against what you would actually have in Python 3.6, only some known hard-coded issues (for example, f-strings when targeting Python 3.5, but not missing arguments in subprocess.run).
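For example (assuming a package directory called mypackage; the option exists in recent pylint releases):
$ pylint --py-version=3.6 mypackage/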
As noted in the comments, the real issue is that you do not have a proper test suite, so the question is how you can get one cheaply.
Adding unit tests can be time-consuming. Before doing that, you can add actual end-to-end tests (which will take more computation and give slower feedback, but which are easier to implement) by simply running the program with the Python version it currently works with, storing the results, and then adding a test that shows you reproduce the same results.
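A minimal sketch of that idea, assuming a hypothetical entry point main.py that writes its result to stdout and a reference file expected_output.txt produced with the currently working interpreter:

# test_end_to_end.py -- golden-file regression test (the file names are made up)
import subprocess
import sys

def test_reproduces_reference_output():
    # Deliberately avoids capture_output/text so the test itself stays 3.6-compatible.
    result = subprocess.run(
        [sys.executable, "main.py", "sample_input.csv"],
        stdout=subprocess.PIPE, universal_newlines=True, check=True,
    )
    with open("expected_output.txt") as f:
        assert result.stdout == f.read()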
This kind of test is usually expensive to maintain (each time you change the behavior, you have to update the stored results). However, they are a safeguard against regressions, and they allow you to perform heavy refactoring on legacy code in order to move to a more testable structure.
In your case, these end-to-end tests will allow you to test the actual application (not only parts of it) against several versions of Python.
Once you have a better test suite, you can then decide whether these heavy end-to-end tests are worth keeping, based on the maintenance burden of the test suite (let's not forget that the test suite should not slow down your development; if it becomes the bottleneck, you should rethink your testing).
What will take time is generating good input data for your end-to-end tests. To help with that, you should use a coverage tool (you might even spot unreachable code thanks to it). If there are parts of your code that you don't manage to reach, I would not worry about them at first, as they are unlikely to be reached by your client either (and if they are reached and something fails at your client's side, make sure proper logging is in place so you can add the case to your test suite).
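For example, with the coverage.py tool (the script name is only a placeholder):
$ coverage run main.py sample_input.csv
$ coverage report -m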
I am working on translating a rather complex script from Matlab to Python and the results are fine.
However, Matlab takes around 5 seconds to complete, whereas Python takes over 2 minutes for the same starting conditions.
Surprised by Python's poor performance, I took a look at CPU usage and noticed that Python does not use more than 1% while executing. Basically, CPU usage is around 0.2% and barely changes whether I'm running the script or not. My system has 8 logical cores, so this does not appear to be a multi-core issue. Also, no other hardware is seeing any high usage (at least that is what Task Manager is telling me).
I am running my program with IPython 3.9.10 through Spyder 5.3.0. The Python installation is rather fresh and I did not change much except for installing a few standard modules.
I did not post any code because it would be a bit much. But essentially it is only a big chunk of "basic" operations on scalars and vectors. Also, there is a function that gets minimized with scipy.optimize.minimize, and nothing in the program relies on other software or inputs.
So, all in all, my question is: where does this limitation come from, and are there ways to "tell" Python to use more resources?
I have a script a.py that runs some task with multiple threads; please note that I have no control over a.py.
I'm looking for a way to limit the number of threads it can use, as I found that using more threads than my CPU has cores slows the script down.
It could be something like:
python --nthread=2 a.py
Or modifying something in my OS is also acceptable.
I am using Ubuntu 16.04.
As requested:
a.py essentially just uses the MLPRegressor from scikit-learn.
I also asked this question here.
A more general way, not specific to python:
taskset -c 1-3 python yourProgram.py
In this case CPU cores 1-3 (3 in total) will be used. Any parallelization invoked by your program will share those resources.
For a solution that fits your exact problem, you should first identify which part of the code parallelizes. For instance, if it is due to numpy routines, you could limit it with:
OMP_NUM_THREADS=4 python yourProgram.py
Again, the first solution is general and handled by the OS, whereas the second is Python (numpy) specific.
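If you cannot change how the script is launched, a Python-level wrapper is another option. A minimal sketch, assuming the parallelism comes from an OpenMP/BLAS thread pool as described above:

# run_limited.py -- illustrative wrapper around the unmodifiable a.py
import os

# These must be set before numpy (or anything built on it) gets imported.
os.environ["OMP_NUM_THREADS"] = "2"
os.environ["OPENBLAS_NUM_THREADS"] = "2"
os.environ["MKL_NUM_THREADS"] = "2"

import runpy
runpy.run_path("a.py", run_name="__main__")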
Read the threading docs; they say:
CPython implementation detail: In CPython, due to the Global Interpreter Lock, only one thread can execute Python code at once (even though certain performance-oriented libraries might overcome this limitation). If you want your application to make better use of the computational resources of multi-core machines, you are advised to use multiprocessing. However, threading is still an appropriate model if you want to run multiple I/O-bound tasks simultaneously.
If you would like to take better advantage of multi-core processing, update the code to use the multiprocessing module.
If you prefer to continue using threading anyway, one option is to pass the number of threads to the application as an argument, like:
python a.py --nthread=2
Then you can update the script code to limit the threads.
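A sketch of what that could look like inside the script (the work function below is just a placeholder for the real task):

import argparse
from concurrent.futures import ThreadPoolExecutor

parser = argparse.ArgumentParser()
parser.add_argument("--nthread", type=int, default=2)
args = parser.parse_args()

def work(item):
    return item * item  # stand-in for the real computation

# Cap the worker pool at the requested number of threads.
with ThreadPoolExecutor(max_workers=args.nthread) as pool:
    results = list(pool.map(work, range(100)))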
We run a corporate-level forum developed in Python (with the Django framework). We are intermittently observing memory usage spikes on our production setup and we wish to track down the cause.
These incidents occur at random and are, as far as we have seen so far, not directly related to load.
I have browsed a lot on the internet, and especially Stack Overflow, for suggestions, but was not able to find a similar situation.
Yes, I was able to locate a lot of profiling utilities like the Python memory profiler, but these require code-level inclusion of the modules, and since this happens in production, such profilers are not a great help (we plan to review our implementation in the next release).
We wish to investigate each incident as it occurs.
Thus I wish to check whether there is any tool we can use to create a dump for offline analysis (just like heap dumps in Java).
Any pointers?
Is gdb the only option?
OS: Linux
Python: 2.7 (currently we do not plan to upgrade unless doing so would help fix this issue)
Cheers!
AJ
Maybe you can try using Valgrind. It's a bit tricky, but you can follow up here if you are interested in it:
How to use valgrind with python?
In the previous project I was working on, our fabfile got out of control. While the rest of our project was well-tested, we didn't write a single test for our fabfile. Refactoring was scary, and we weren't confident a fabric command would work how we expected until we ran the command.
I'm starting a new project, and I'd like to make sure our fabfile is well-tested from the beginning. Obey the Testing Goat has a great article discussing some possible strategies, yet it has more questions than answers. Using fabtest is a possibility, although it seems to be dead.
Has anyone successfully unit tested their fabfile? If so, how?
Run your Fabfile task in a Docker instance.
Use docker diff to verify that the right files were changed by the Fabfile.
This is still quite a bit of work, but it allows testing without excessive Fabfile modifications.
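A rough sketch of the verification step, assuming the task has already been run against a container named fabric-test and is expected to have written /etc/myapp/config.ini (both names are made up):

import subprocess

def changed_paths(container):
    # `docker diff` prints one "A/C/D <path>" line per changed filesystem entry.
    out = subprocess.run(
        ["docker", "diff", container],
        stdout=subprocess.PIPE, universal_newlines=True, check=True,
    ).stdout
    return {line.split(None, 1)[1] for line in out.splitlines() if line.strip()}

def test_config_was_written():
    assert "/etc/myapp/config.ini" in changed_paths("fabric-test")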
Have you tried python-vagrant? It seems to do the same thing that fabtest does, but it includes some Fabric demos and is still used and maintained.
The slides from back then, mentioned by Henrik Andersson, are available here.
Robin Kåveland Hansen replied to me:
There are some examples of the types of refactoring that we did in order to keep our fabric code well-tested there.
In general, I would say the best advice is to avoid low-level code such as shell commands inside higher-level code that makes decisions about what to run, e.g. isolate effect-full code from code that makes decisions.
Branching increases the number of test cases you need, and it is a lot more effort to write good test cases for code that changes state on some server.
At the time, we used mock to mock out fabric and write test cases for branch-less code that has side effects on the server, so the code and tests would look a lot like the sketch below.
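A rough illustration of the pattern (the deploy module and the shell command are made up; Fabric 1.x's fabric.api is assumed):

# deploy.py -- effect-full, branch-less code (hypothetical example)
from fabric.api import run

def restart_app():
    run("systemctl restart myapp")

# test_deploy.py
from unittest import mock
import deploy

def test_restart_app_issues_expected_command():
    with mock.patch("deploy.run") as fake_run:
        deploy.restart_app()
    fake_run.assert_called_once_with("systemctl restart myapp")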
Obviously this has the weakness that it won't pick up bugs in the shell commands themselves. My experience is that this is rarely the cause of serious problems, though.
Another option using mock would be to use the following idea to run the tests locally on your machine instead of remotely.
Maybe the most robust approach is to run the tests in vagrant, but that has the disadvantage of requiring lots of setup and has a tendency to make the tests slower.
I think it's important to have fast tests, because then you can run them all the time and they give you a really nice feedback-loop.
The deploy-script I've written for my current employer has ~150 test cases and runs in less than 0.5 seconds, so the deploy-script will actually do a self-test before deploying.
This ensures that it is tested on each developer's machine all the time, which has picked up a good few bugs, for example in cases where Linux and macOS behave differently.