I'm having trouble interpreting cProfile data. To illustrate the problem, I created this simple script.
The function D calls B and C, which both call A.
Function A clearly takes up 1 sec (+overhead).
If we look at the snakeviz results, you can see that the reporting is a bit odd. I understand that 2 seconds in total have been spent in function A, but inside function C, function A accounts for only 1 second, and that is the number I am interested in. Does anybody know of a setting (or a different viewer) that avoids this issue?
import time
import cProfile

def A():
    time.sleep(1)

def B():
    A()

def C():
    A()

def D():
    B()
    C()

cProfile.run('D()', 'profileResults.prf')
(Screenshot: snakeviz results.)
Unfortunately, the Python profiler does not store the entire call tree. (This would be too expensive.) I've documented the problem here and as a snakeviz issue.
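What cProfile does store are pairwise caller/callee statistics, so for the flat example above you can at least recover how A's time splits across its callers with pstats (profileResults.prf being the file the question's script writes; the 'A' argument is a regex matched against function names):

import pstats

stats = pstats.Stats('profileResults.prf')
# list the callers of A, with ncalls/tottime/cumtime attributed per caller edge
stats.print_callers('A')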
I recently created tuna for visualizing Python profiles to work around some of these problems. tuna cannot show the entire call tree either, but at least it doesn't show wrong info.
Install with
pip3 install tuna
Create a runtime profile
python -m cProfile -o program.prof yourfile.py
and just run tuna on the file
tuna program.prof
I'm trying to access some Fortran subroutines using F2PY, but I've run into the following problem during consecutive calls from IPython. Take this minimal Fortran code (I hope I didn't code anything stupid; my Fortran is a bit rusty...):
! test.f90
module mod
  integer i
contains
  subroutine foo
    i = i + 1
    print*, i
  end subroutine foo
end module mod
If I compile this using F2PY (f2py3.5 -c -m test test.f90), import it in Python and call it twice:
# run.py
import test
test.mod.foo()
test.mod.foo()
The resulting output is:
$ python run.py
1
2
So i is incremented on every call of foo(), which is supposed to happen. But between separate runs of run.py (whether from the command line or from the IPython interpreter), everything should be "reset", i.e. the printed counter should start from 1 on every run. This is indeed what happens when calling run.py from the command line, but if I run the script multiple times from IPython, i keeps increasing:
In [1]: run run.py
1
2
In [2]: run run.py
3
4
I know that there are lots of posts showing how to reload imports (using autoreload in IPython, importlib.reload(), ...), but none of them seem to work for this example. Is there a way to force a clean reload/import?
Some side notes:
(1) The Fortran code that I'm trying to access is quite large, old, and messy, so I'd prefer not to change anything in there.
(2) I could easily do test.mod.i = something in between calls, but the real Fortran code is too complex for such solutions.
(3) I'd really prefer a solution that I can put in the Python code over settings (autoreload, ...) that I have to apply manually in the IPython interpreter (forget it once and ...).
If you can slightly change your Fortran code, you may be able to reset without re-importing (probably faster, too).
The change introduces i as a common-block variable so it can be reset from outside. Your changed Fortran code will look like this:
! test.f90
module mod
  common /set1/ i
contains
  subroutine foo
    common /set1/ i
    i = i + 1
    print*, i
  end subroutine foo
end module mod
Reset the variable i from Python as below:
import test
test.mod.foo()
test.mod.foo()
test.set1.i = 0  # reset here
test.mod.foo()
This should produce the following result:
$ python run.py
1
2
1
I have a program in Python that takes several command-line arguments and uses them in several functions. How can I use cProfile (within my code) to obtain the running time of each function? (I still want the program to run normally when it's done.) I can't figure out how; for example, I cannot use
cProfile.run('loadBMPImage(sys.argv[1])')
to test the run time of the function loadBMPImage, since I cannot use sys.argv[1] as an argument there. Any idea how I can use cProfile to test the running time of each function and print it to stdout, when each function depends on command-line arguments? Also, the cProfile calls must be integrated into the code itself. Thanks
There are several ways.
import cProfile
import pstats
import sys

def function(n):
    a = 1
    for i in range(n):
        a += 1
    return a
The first one is to use the simple wrapper runctx(), which allows you to specify globals and locals for the executed string. In the example below I use globals() to pass the function object and locals to pass the parameter, but it can be arranged differently, of course.
def profile1():
    cProfile.runctx("function(n)", globals(), dict(n=int(sys.argv[1])), filename='test')
    return pstats.Stats('test')
A better way, where you don't need to mess with exec, is to use the Profile class. This way you can just profile a piece of regular code:
def profile2():
    pr = cProfile.Profile()
    pr.enable()
    function(int(sys.argv[1]))
    pr.disable()
    return pstats.Stats(pr)
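As a side note, on newer Pythons (3.8+, if I remember correctly) Profile also works as a context manager, which makes the enable/disable pair implicit; a sketch of the same logic as profile2:

def profile3():
    # __enter__ calls enable() and __exit__ calls disable() (Python 3.8+)
    with cProfile.Profile() as pr:
        function(int(sys.argv[1]))
    return pstats.Stats(pr)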
Just for completeness' sake, to make the example runnable:
if __name__ == '__main__':
    profile1().print_stats()
    profile2().print_stats()
I run the Python program with -m cProfile. Example:
python -m cProfile myprogram.py
This requires zero changes to myprogram.py.
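Since the question involves command-line arguments: they simply go after the script name, and you can request a sort order with -s (input.bmp is just a made-up example argument here):
python -m cProfile -s cumtime myprogram.py input.bmp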
My 2 pence:
When running python3 -m cProfile yourprogram.py, the cwd and sys.argv[0] seem to get changed (I didn't check thoroughly), which hurts the implicit context of yourprogram, especially if it is usually run as an executable.
Thus, I'd rather recommend wrapping your original code in a function and running it with cProfile.run(), even though your code changes a little:
def yourfunction():
    import sys
    print(sys.argv)

import cProfile
cProfile.run("yourfunction()")
good luck!
What is the best (most precise) way to measure the execution time of a function? For example:
def some_function():
    # ...
I would prefer to call this function 1000 times and then compute the average time, like this:
import time

start = time.time()
for i in range(1000):
    some_function()
elapsed = (time.time() - start) / 1000
but maybe there is a better way?
You should use the timeit module, I think:
import timeit

t = timeit.Timer('some_function(*args)',  # code to run
                 'from __main__ import some_function, args')  # setup code, run once before the measurement
t.timeit(100)  # number of times to run
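Note that timeit can also take a callable directly instead of a source string, which avoids the __main__ import dance; a minimal sketch with a stand-in function body:

import timeit

def some_function():
    sum(range(1000))  # stand-in workload for illustration

# pass the callable directly; no setup string needed
total = timeit.timeit(some_function, number=1000)
print('average per call:', total / 1000, 'seconds')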
I agree that timeit is the de facto module for running native timings on Python source. However, if you are interested in doing some heavy lifting in terms of profiling, you might find something like runsnakerun useful (a visualization tool for the profiler native to Python).
Run Snake Run Website
Python profiler
A snippet from runsnakerun's feature list (which really just uses data from Python's profiler):
sortable data-grid views for raw profile information
identity: function name, file name, directory name
time spent: cumulative, cumulative-per-call, local, and local-per-call time
overall data-grid view
(all) callers-of-this-function, (all) callees-of-this-function views
Just for added yes-I-know-isms... you asked for something simple, and this is WAY over the top. But I thought I'd share another possible solution in case you need additional information down the road. And if you don't find it useful, maybe someone else will!
To get an output profile file that will run in runsnakerun, run something like:
$ python -m cProfile -o <outputfilename> <script-name> <options>
Alternatively, if you're developing on *nix, you can use time, but then you have added overhead and potentially lose some of the precision that the Python module timeit offers.
Different needs require different solutions - just adding to your bag-o-tricks.
HTH
I used to use a nice Apple profiler that is built into the System Monitor application. As long as your C++ code was compiled with debug information, you could sample your running application and it would print out an indented tree telling you what percent of the parent function's time was spent in this function (and the body vs. other function calls).
For instance, if main calls function_1 and function_2, function_2 calls function_3, and then main calls function_3:
main (100%, 1% in function body):
    function_1 (9%, 9% in function body)
    function_2 (90%, 85% in function body):
        function_3 (100%, 100% in function body)
    function_3 (1%, 1% in function body)
I would see this and think, "Something is taking a long time in the code in the body of function_2. If I want my program to be faster, that's where I should start."
How can I most easily get this exact profiling output for a Python program?
I've seen people say to do this:
import cProfile, pstats
prof = cProfile.Profile()
prof = prof.runctx("real_main(argv)", globals(), locals())
stats = pstats.Stats(prof)
stats.sort_stats("time") # Or cumulative
stats.print_stats(80) # 80 = how many to print
But it's quite messy compared to that elegant call tree. Please let me know if you can easily do this; it would help quite a bit.
I just stumbled on this as well and spent some time learning how to generate a call graph (the normal results of cProfile are not terribly informative). For future reference, here's another way to generate a beautiful call-tree graphic with cProfile + gprof2dot + GraphViz.
1. Install GraphViz: http://www.graphviz.org/Download_macos.php
2. Install gprof2dot: easy_install gprof2dot
3. Run the profiler on your code:
python -m cProfile -o myLog.profile <myScript.py> arg1 arg2 ...
4. Run gprof2dot to convert the call profile into a dot file:
gprof2dot -f pstats myLog.profile -o callingGraph.dot
5. Open the dot file with GraphViz to visualize the graph.
Here's what the end result would look like!
The graph is color-coded: red means a higher concentration of time.
I recently wanted the same thing, so I took a stab at implementing one myself.
The project's on GitHub, https://github.com/joerick/pyinstrument
Here's how you would use it:
from pyinstrument import Profiler
profiler = Profiler()
profiler.start()
# Code you want to profile
profiler.stop()
print(profiler.output_text())
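If I remember the project's README correctly, pyinstrument can also be invoked without touching your code at all:
python -m pyinstrument yourfile.py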
The gprof2dot approach extracts all the information nicely, so I'm a fan. However, sometimes I want to look at timing data in a call tree, so I created tuna.
Install with
pip install tuna
and display your profile with
tuna program.prof
Check out this library http://pycallgraph.slowchop.com/ for call graphs. It works really well. If you want to profile specific functions, check out http://mg.pov.lt/blog/profiling.html
(Screenshot: a result from the profilehooks module.)
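For context, profilehooks is decorator-based; a minimal sketch, with my_function being a made-up name for whatever you want to watch:

from profilehooks import profile

@profile  # profiling stats for this function are printed when the program exits
def my_function():
    return sum(range(100000))

my_function()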
I'm profiling in Python using cProfile. I found a function that takes a lot of CPU time. How do I find out which function is calling this heavy function the most?
EDIT:
I'll settle for a workaround: Can I write a Python line inside that heavy function that will print the name of the function that called it?
I almost always view the output of the cProfile module using gprof2dot; basically, it converts the output into a graphviz graph (a .dot file), for example:
It makes it very easy to determine which function is slowest and which function(s) called it.
Usage is:
python -m cProfile -o output.pstats path/to/your/script arg1 arg2
gprof2dot.py -f pstats output.pstats | dot -Tpng -o output.png
That may not answer your question directly, but it will definitely help. If you use the profiler with the option --sort cumulative, it will sort the functions by cumulative time, which is helpful for detecting not only the heavy functions but also the functions that call them.
python -m cProfile --sort cumulative myScript.py
There is a workaround to get the caller function:
import inspect
print(inspect.getframeinfo(inspect.currentframe().f_back)[2])
You can chain as many f_back attributes as you want, in case you want the caller's caller and so on; a small helper for that is sketched below.
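For instance, a hypothetical helper that returns the function name n levels up the stack:

import inspect

def caller_name(depth=1):
    # depth=1 is the caller of the function that invoked caller_name(),
    # depth=2 is its caller, and so on (AttributeError if the stack is shallower)
    frame = inspect.currentframe().f_back  # the frame that called caller_name()
    for _ in range(depth):
        frame = frame.f_back
    return frame.f_code.co_name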
If you want to count frequent callers, you can do this:
record = {}
caller = inspect.getframeinfo(inspect.currentframe().f_back)[2]
record[caller] = record.get(caller, 0) + 1
Then print them in order of frequency:
print(sorted(record.items(), key=lambda a: a[1]))
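Putting the pieces together: a minimal runnable sketch, where heavy, from_a, and from_b are made-up names, that dumps the tally (most frequent caller first) when the program exits:

import atexit
import inspect

record = {}

def heavy():
    # tally the immediate caller's name on every invocation
    caller = inspect.getframeinfo(inspect.currentframe().f_back)[2]
    record[caller] = record.get(caller, 0) + 1
    # ... the actual expensive work would go here ...

def from_a():
    heavy()

def from_b():
    heavy()
    heavy()

# print the most frequent callers first when the program exits
atexit.register(
    lambda: print(sorted(record.items(), key=lambda a: a[1], reverse=True)))

from_a()
from_b()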
inspect.stack() will give you the current caller stack.
You might want to take a look at pycallgraph.
It is possible to do this with the cProfile profiler from the standard library.
In pstats.Stats (the profiler result) there is a method print_callees() (or, alternatively, print_callers()).
Example code:
import cProfile, pstats
pr = cProfile.Profile()
pr.enable()
# ... do something ...
pr.disable()
ps = pstats.Stats(pr).strip_dirs().sort_stats('cumulative')
ps.print_callees()
Result will be something like:

   Function                          called...
                                         ncalls  tottime  cumtime
ElementTree.py:1517(_start_list)  ->      24093    0.048    0.124  ElementTree.py:1399(start)
                                          46429    0.015    0.041  ElementTree.py:1490(_fixtext)
                                          70522    0.015    0.015  ElementTree.py:1497(_fixname)
ElementTree.py:1527(_data)        ->      47827    0.017    0.026  ElementTree.py:1388(data)
                                          47827    0.018    0.053  ElementTree.py:1490(_fixtext)
On the left you have the caller, on the right you have the callee.
(for example, _fixtext was called from _data 47827 times and from _start_list 46429 times)
See also:
docs.python.org/..#print_callees - show call hierarchy. Group by the caller. (used above)
docs.python.org/..#print_callers - show call hierarchy. Group by the callee.
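Both methods also accept an optional restriction, so if you already know the heavy function's name you can limit the output to it (heavy_function being a placeholder; the argument is treated as a regex against function names):
ps.print_callers('heavy_function')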
A couple of notes:
Your code needs to be edited for this (i.e., you must insert the profiling statements), so it is not possible to use it from the command line like python -m cProfile myscript.py. It is, however, possible to write a separate script for that.
A bit unrelated, but strip_dirs() must go before sort_stats(); otherwise sorting does not work.
I have not used cProfile myself, but most profilers give you a call hierarchy.
While googling, I found these slides about cProfile. Maybe that helps. Page 6 suggests that cProfile does provide a hierarchy.
Sorry, I'm not familiar with Python, but there's a general method that works, assuming you can manually interrupt execution at a random time.
Just do so, and display the call stack. It will tell you, with high probability, what you want to know. If you want to be more certain, just do it several times.
It works because the guilty caller has to be on the call stack for the fraction of time that's being wasted, which exposes it to your interrupts for that much of the time, whether it is spread over many short calls or a few lengthy ones.
NOTE: This process is more like diagnosis than measurement. Suppose that bad call is wasting 90% of the time. Then each time you halt it, the probability is 90% that the bad call statement is right there on the call stack for you to see, and you will be able to see that it's bad. However, if you want to exactly measure the wastage, that's a different problem. For that, you will need a lot more samples, to see what % of them contain that call. Or alternatively, just fix the guilty call, clock the speedup, and that will tell you exactly what the wastage was.
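In Python you can approximate the manual interrupt with a timer signal that dumps the current stack; a rough sketch (Unix-only, since it relies on SIGALRM):

import signal
import traceback

def dump_stack(signum, frame):
    # print where the program is right now, then re-arm the timer
    traceback.print_stack(frame)
    signal.setitimer(signal.ITIMER_REAL, 1.0)

signal.signal(signal.SIGALRM, dump_stack)
signal.setitimer(signal.ITIMER_REAL, 1.0)  # take the first sample after one second

# ... run the code you want to sample ...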
Pycscope does this. I just found it today, so I can't speak to how good it is, but the few examples I've tried have been pretty good (though not perfect).
https://pypi.python.org/pypi/pycscope/
You would use this to generate a cscope file and then use a cscope plugin from an editor, VIM specifically. I tried using it with vanilla cscope; it seems that plain cscope gets confused.