Is there a tool that can measure the frequency of function calls in a project and count other aspects of Python code (for statistics purposes)?
Thank you.
The Python profiler should provide you with quite a bit of information:
python -m cProfile -o fooprof myscript.py
You load the results using a simple script:
#!/usr/bin/env python
import pstats

# load the saved profile, strip leading path info from filenames,
# sort by the standard file/line/name key (-1), and print every entry
p = pstats.Stats('fooprof')
p.strip_dirs().sort_stats(-1).print_stats()
Alternatively, if you do not specify -o fooprof, the results are printed to stdout.
See the documentation at http://docs.python.org/library/profile.html
I'm not sure what "other aspects" you want to count, but this will determine how many times each function is called. If you are interested in the "frequency" of calls, you can compute the average frequency from the total execution time and the number of times the function was called. For example:
Suppose foo() was called 100 times in 10 seconds. The average call frequency is then 10 calls per second.
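For reference, a minimal sketch of that calculation straight from the saved stats file (it leans on pstats internals, the stats dict and total_tt, which are undocumented but long-stable):

import pstats

p = pstats.Stats('fooprof')
total_time = p.total_tt  # total run time recorded by the profiler, in seconds

# p.stats maps (filename, line, funcname) to
# (primitive calls, total calls, own time, cumulative time, callers)
for (filename, line, funcname), (cc, nc, tt, ct, callers) in p.stats.items():
    print('%s: %d calls, %.1f calls/second' % (funcname, nc, nc / total_time))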
I guess you want static code analysis: how many locations in your code call a given function.
This is very hard in a dynamic language like Python, because there are so many ways a function can be called other than by its literal name. Even the Python bytecode compiler does not always know which function will be called at a given site, and the target may change during execution. On top of that, there is standard OO polymorphism.
Consider:
import math
import sys

def doublefx(f, x):
    return f(x) * 2

print(doublefx(math.sqrt, 9))  # 6.0
f = sys.stdin.readline().strip()  # function name typed by the user
print(doublefx(getattr(math, f), 9))  # depends on the input
No static analysis tool can find out which functions in math.* are going to be called by this code. Even the first example would be very hard to reason about; the second is impossible.
The following tool does some static analysis regarding complexity.
http://sourceforge.net/projects/pymetrics/
Other analysis tools such as PyLint and PyChecker focus more on style and probable errors.
I've never used it, but it looks like cProfile might be a good place to start. (Of the three profilers mentioned on that page, hotshot is experimental and not well-supported, profile "adds significant overhead to profiled programs", and the last is cProfile, so that's probably the one to go with.) Links to the source code for profile.py and pstats.py are provided at the top of the page.
So you've got some legacy code lying around in a fairly hefty project. How can you find and delete dead functions?
I've seen these two references: Find unused code and Tool to find unused functions in php project, but they seem specific to C# and PHP, respectively.
Is there a Python tool that'll help you find functions that aren't referenced anywhere else in the source code (notwithstanding reflection/etc.)?
In Python you can find unused code by using dynamic or static code analyzers. Two examples for dynamic analyzers are coverage and figleaf. They have the drawback that you have to run all possible branches of your code in order to find unused parts, but they also have the advantage that you get very reliable results.
Alternatively, you can use static code analyzers that just look at your code, but don't actually run it. They run much faster, but due to Python's dynamic nature the results may contain false positives.
Two tools in this category are pyflakes and vulture. Pyflakes finds unused imports and unused local variables. Vulture finds all kinds of unused and unreachable code. (Full disclosure: I'm the maintainer of Vulture.)
The tools are available on the Python Package Index: https://pypi.org/.
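For reference, both routes are quick to try from the command line (assuming the tools are pip-installed; the file names are illustrative):

coverage run myscript.py
coverage report -m

vulture myscript.py mypackage/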
I'm not sure if this is helpful, but you might try the coverage, figleaf, or other similar modules, which record which parts of your source code are used as you actually run your scripts/application.
Given the fairly strict way Python code is laid out, would it be that hard to build a list of functions with a regex looking for def function_name(..)?
Then search for each name and tally how many times it appears in the code. It naturally wouldn't take comments into account, but as long as you're only looking at functions with fewer than two or three occurrences...
It's a bit Spartan, but it sounds like a nice sleepy-weekend task =)
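A rough sketch of that weekend task (hedged accordingly: it scans every .py file under the current directory and, as noted, makes no attempt to skip comments or strings):

import re
from pathlib import Path

source = ''.join(p.read_text() for p in Path('.').rglob('*.py'))

for name in set(re.findall(r'^\s*def\s+(\w+)\s*\(', source, re.MULTILINE)):
    # occurrences of the name anywhere in the code, minus the definition itself
    uses = len(re.findall(r'\b%s\b' % re.escape(name), source)) - 1
    if uses < 2:
        print('%s: %d other occurrence(s)' % (name, uses))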
Unless you know that your code uses reflection, as you said, I would go for a trivial grep. Do not underestimate the power of the asterisk in vim either (it searches for the word under your cursor), although it is limited to the file you are currently editing.
Another solution you could implement is to have a very good test suite (which seldom happens, unfortunately) and then wrap the routine with a deprecation warning. If you get the deprecation output, it means the routine was called, so it's still used somewhere. This works even for calls made through reflection, but of course you can never be sure you have triggered every situation in which the routine might be called.
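A minimal sketch of such a wrapper (the decorator and the routine under suspicion are made up for illustration; DeprecationWarning may be filtered out by default, so run with python -W always to be sure the output shows up):

import functools
import warnings

def deprecated(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        warnings.warn('%s is still being called' % func.__name__,
                      DeprecationWarning, stacklevel=2)
        return func(*args, **kwargs)
    return wrapper

@deprecated
def possibly_dead():  # the routine you suspect is unused
    pass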
It's not only about searching for function names; imported packages that are not in use matter too.
You need to search the code for all the imported packages (including aliases) and for the functions actually used, then build a list of the specific imports each package needs (for example, instead of import os, use from os import listdir, getcwd, ...).
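A hedged sketch of that search using the standard ast module (single file only, and it misses dotted imports like os.path; pyflakes, mentioned above, does this properly):

import ast

tree = ast.parse(open('mymodule.py').read())  # hypothetical file

imported = set()
for node in ast.walk(tree):
    if isinstance(node, (ast.Import, ast.ImportFrom)):
        imported.update(alias.asname or alias.name for alias in node.names)

# every bare name that appears anywhere in the module
used = {node.id for node in ast.walk(tree) if isinstance(node, ast.Name)}

print('possibly unused imports:', imported - used)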
I am trying to profile our Django unit tests (if the tests are faster, we'll run 'em more often). I've run them through Python's built-in cProfile profiler, producing a pstats file.
However, the signal-to-noise ratio is bad: there are too many functions listed, and lots of Django-internal functions are called when I make one database query. This makes it hard to see what's going on.
Is there any way I can "roll up" all function calls that happen outside a certain directory?
E.g. if I call a Python function outside my directory, and it then calls 5 other functions (all outside my directory), it should roll all of those up so it looks like a single function call, showing the cumulative time for the whole thing.
This would obviously be unhelpful if you wanted to profile (say) Django itself, but I don't want to do that.
I looked at the pstats.Stats object, but can't see an obvious way to modify this data.
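The closest I can see is that pstats will restrict the printed rows to entries matching a regular expression, which at least keeps the cumulative times for my own functions (the file names below are placeholders):

import pstats

p = pstats.Stats('tests.prof')
p.sort_stats('cumulative')
# print only functions defined under my directory; cumtime for each
# still includes the time spent in the Django internals it calls
p.print_stats('/path/to/myproject')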
I have little experience with Python, but a lot with performance tuning, so here's a possibility:
Rather than run profiling as part of the unit tests, just do overall execution time measurements. If those don't change much, you're OK.
When you do detect a change, or if you just want to make things run faster, use this method. It has a much higher "signal-to-noise ratio", as you put it.
The difference is you're not hoping the profiler can figure out what you need to be looking at. It's more like a breakpoint in a debugger, where the break occurs not at a place of your choosing, but with good probability at a time of unnecessary slowness.
If on two or more occasions you see it doing something that might be replaced with something better, on average it will pay off, and fixing it will make other problems easier to find by the same method.
I have an application that consists of multiple python scripts. Some of these scripts are calling C code. The application is now running much slower than it was, so I would like to profile it to see where the problem lies. Is there a tool, software package or just a way to profile such an application? A tool that will follow the python code into the C code and profile these calls as well?
Note 1: I am well aware of the standard Python profiling tools. I'm specifically looking here for combined Python/C profiling.
Note 2: the Python modules are calling C code using ctypes (see http://docs.python.org/library/ctypes.html for details).
Thanks!
Stackshots work. Since you have combined Python and C you can handle them separately. For Python, you can hit Ctrl-C while it's being slow to examine the stack. Do this several times. That will expose anything you can fix in the python code. For the C code, run the whole thing under a debugger like GDB and hit Ctrl-C to get a stack trace in C. Several of those will expose anything you can fix in the C code. I'm told OProfile can also do this. (Another way is to use lsstack if it is available.)
This is a little-known method that works on this principle: Suppose you have an infinite loop or a nearly infinite loop. How would you find it? You would halt the program and see what it was doing, right? Suppose the program only took twice as long as necessary. Each time you halted it, the chance that you would catch it doing the unnecessary thing is 50%. So all you have to do is halt it a number of times. As soon as you see it doing something that could be improved, on as few as 2 samples, you know you can fix that for a healthy speedup. Then you can repeat it to get the next problem. Measuring is not the point. Catching things you can improve is the point.
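On the Python side there is also a non-interactive variant of the same idea, sketched here assuming Python 3 on Unix: the standard faulthandler module can dump every thread's stack whenever the process receives a chosen signal, so you can take stackshots of a slow run without stopping it.

import faulthandler
import signal

# dump all Python thread stacks to stderr on SIGUSR1 (Unix only);
# trigger it from another terminal with: kill -USR1 <pid>
faulthandler.register(signal.SIGUSR1)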
Combining the two would be pretty hard, but you can use some of the standard profilers like Valgrind, gprof, or even OProfile (although I never managed to get meaningful output out of OProfile).
My project targets a low-cost and low-resource embedded device. I am dependent on a relatively large and sprawling Python code base, of which my use of its APIs is quite specific.
I am keen to prune this library back to its bare minimum by executing my test suite under a coverage tool like Ned Batchelder's coverage or figleaf, then scripting the removal of unused code within the various modules/files. This will help not only with understanding the library's internals, but also make writing any patches easier. Ned actually refers to using coverage tools to "reverse engineer" complex code in one of his online talks.
My question to the SO community is whether people have experience using coverage tools in this way that they wouldn't mind sharing. What are the pitfalls, if any? Is coverage a good choice, or would I be better off investing my time in figleaf?
The end-game is to be able to automatically generate a new source tree for the library, based on the original tree, but only including the code actually used when I run nosetests.
If anyone has developed a tool that does a similar job for their Python applications and libraries, it would be terrific to get a baseline from which to start development.
Hopefully my description makes sense to readers...
What you want isn't "test coverage"; it is the transitive closure of "can call" from the root of the computation. (In threaded applications, you also have to include "can fork".)
You want to designate some small set (perhaps only 1) of functions that make up the entry points of your application, and want to trace through all possible callees (conditional or unconditional) of that small set. This is the set of functions you must have.
Python makes this very hard in general (IIRC, I'm not a deep Python expert) because of dynamic dispatch and especially because of "eval". Reasoning about which function can get called is pretty tricky for a static analyzer applied to a highly dynamic language.
One might use test coverage as a way to seed the "can call" relation with specific "did call" facts; that could catch a lot of dynamic dispatches (dependent on your test suite coverage). Then the result you want is the transitive closure of "can or did" call. This can still be erroneous, but is likely to be less so.
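As a toy illustration of the closure step, assuming you have already extracted a call graph as an adjacency dict (merging static "can call" edges with "did call" facts observed under coverage):

def reachable(call_graph, roots):
    # transitive closure of 'can/did call' from the designated roots
    keep = set(roots)
    stack = list(roots)
    while stack:
        caller = stack.pop()
        for callee in call_graph.get(caller, ()):
            if callee not in keep:
                keep.add(callee)
                stack.append(callee)
    return keep

call_graph = {'main': ['parse', 'run'], 'run': ['helper'], 'dead': ['run']}
print(reachable(call_graph, ['main']))  # {'main', 'parse', 'run', 'helper'}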
Once you get a set of "necessary" functions, the next problem will be removing the unnecessary functions from the source files you have. If the number of files you start with is large, the manual effort to remove the dead stuff may be pretty high. Worse, you're likely to revise your application, and then the answer as to what to keep changes. So for every change (release), you need to reliably recompute this answer.
My company builds a tool that does this analysis for Java packages (with appropriate caveats regarding dynamic loads and reflection). The input is a set of Java files and (as above) a designated set of root functions. The tool computes the call graph, finds all dead member variables, and produces two outputs: a) the list of purportedly dead methods and members, and b) a revised set of files with all the "dead" stuff removed. If you believe a), then you use b). If you think a) is wrong, you add the elements listed in a) to the set of roots and repeat the analysis until you think a) is right. To do this, you need a static analysis tool that parses Java, computes the call graph, and then revises the code modules to remove the dead entries. The basic idea applies to any language.
You'd need a similar tool for Python, I'd expect.
Maybe you can stick to just dropping files that are completely unused, although that may still be a lot of work.
As others have pointed out, coverage can tell you what code has been executed. The trick for you is to be sure that your test suite truly exercises the code fully. The failure case here is over-pruning because your tests skipped some code that will really be needed in production.
Be sure to get the latest version of coverage.py (v3.4): it adds a new feature to indicate files that are never executed at all.
BTW: for a first-cut prune, Python provides a neat trick: remove all the .pyc files in your source tree, then run your tests. Files that still have no .pyc file were clearly not executed!
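A sketch of that trick in code (note it assumes the Python 2-era layout where each .pyc sits next to its source file; Python 3 caches bytecode under __pycache__ instead, so the check would need adjusting there):

from pathlib import Path

# before the test run: clear out all compiled files
for pyc in Path('.').rglob('*.pyc'):
    pyc.unlink()

# after the test run: list source files that were never imported
for py in Path('.').rglob('*.py'):
    if not py.with_suffix('.pyc').exists():
        print(py)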
I haven't used coverage for pruning, but it seems like it should do the job well. I've used the combination of nosetests + coverage, and it worked better for me than figleaf. In particular, I found the HTML report from nosetests + coverage helpful; it should help you understand where the unused portions of the library are.