I'm profiling in Python using cProfile. I found a function that takes a lot of CPU time. How do I find out which function is calling this heavy function the most?
EDIT:
I'll settle for a workaround: Can I write a Python line inside that heavy function that will print the name of the function that called it?
I almost always view the output of the cProfile module using gprof2dot, which basically converts the output into a Graphviz graph (a .dot file).
It makes it very easy to determine which functions are slowest and which function(s) called them.
Usage is:
python -m cProfile -o output.pstats path/to/your/script arg1 arg2
gprof2dot.py -f pstats output.pstats | dot -Tpng -o output.png
That may not answer your question directly, but it will definitely help. If you use the profiler with the option --sort cumulative, it sorts the functions by cumulative time, which is helpful for spotting not only the heavy functions but also the functions that call them.
python -m cProfile --sort cumulative myScript.py
There is a workaround to get the caller function:
import inspect
print(inspect.getframeinfo(inspect.currentframe().f_back)[2])
You can chain as many f_back attributes as you want, in case you want the caller's caller, and so on.
If you want to count how often each caller calls you, you can do this:
record = {}
caller = inspect.getframeinfo(inspect.currentframe().f_back)[2]
record[caller] = record.get(caller, 0) + 1
Then print them in order of frequency:
print(sorted(record.items(), key=lambda a: a[1]))
inspect.stack() will give you the current caller stack.
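Putting the pieces above together, here's a minimal, self-contained sketch of the caller-counting workaround (the function and caller names are made up for illustration):

```python
import inspect
from collections import Counter

call_counts = Counter()

def heavy_function():
    # Name of the immediate caller, one frame up the stack.
    caller = inspect.getframeinfo(inspect.currentframe().f_back)[2]
    call_counts[caller] += 1

def caller_a():
    heavy_function()

def caller_b():
    heavy_function()

for _ in range(3):
    caller_a()
caller_b()

# Callers sorted by frequency, most frequent first.
print(call_counts.most_common())  # [('caller_a', 3), ('caller_b', 1)]
```

Note this adds real overhead to every call of the heavy function, so it is a diagnostic hack rather than something to leave in production code.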
You might want to take a look at pycallgraph.
It is possible to do this with cProfile from the standard library.
pstats.Stats (the profiler result) has a print_callees method (and, alternatively, print_callers).
Example code:
import cProfile, pstats
pr = cProfile.Profile()
pr.enable()
# ... do something ...
pr.disable()
ps = pstats.Stats(pr).strip_dirs().sort_stats('cumulative')
ps.print_callees()
The result will be something like:
Function                          called...
                                      ncalls  tottime  cumtime
ElementTree.py:1517(_start_list) ->    24093    0.048    0.124  ElementTree.py:1399(start)
                                       46429    0.015    0.041  ElementTree.py:1490(_fixtext)
                                       70522    0.015    0.015  ElementTree.py:1497(_fixname)
ElementTree.py:1527(_data)       ->    47827    0.017    0.026  ElementTree.py:1388(data)
                                       47827    0.018    0.053  ElementTree.py:1490(_fixtext)
On the left you have the caller; on the right, the callee.
(For example, _fixtext was called from _data 47827 times and from _start_list 46429 times.)
See also:
docs.python.org/..#print_callees - show the call hierarchy, grouped by caller (used above)
docs.python.org/..#print_callers - show the call hierarchy, grouped by callee
Couple of notes:
Your code needs to be edited to insert those profiling statements
(i.e. you cannot use it from the command line like python -m cProfile myscript.py, though you could write a separate driver script for that).
A bit unrelated, but strip_dirs() must come before sort_stats(), otherwise the sorting does not work.
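To answer the original question ("which function calls this heavy function the most?") directly, print_callers can also be restricted to the function of interest. A small self-contained sketch (the function names here are invented for the demo):

```python
import cProfile
import io
import pstats

def leaf():            # stand-in for the heavy function
    return sum(range(1000))

def caller_a():
    for _ in range(5):
        leaf()

def caller_b():
    leaf()

pr = cProfile.Profile()
pr.enable()
caller_a()
caller_b()
pr.disable()

buf = io.StringIO()
ps = pstats.Stats(pr, stream=buf).strip_dirs().sort_stats('cumulative')
ps.print_callers('leaf')   # restrict the report to functions matching 'leaf'
print(buf.getvalue())      # lists caller_a (5 calls) and caller_b (1 call)
```

The argument to print_callers is a regular-expression filter, so only entries for leaf appear, with each caller and its call count on the right.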
I have not used cProfile myself, but most profilers give you a call hierarchy.
Googling around, I found these slides about cProfile; maybe they help. Page 6 suggests that cProfile does provide a hierarchy.
Sorry I'm not familiar with Python, but there's a general method that works, assuming you can manually interrupt execution at a random time.
Just do so, and display the call stack. It will tell you, with high probability, what you want to know. If you want to be more certain, just do it several times.
It works because the guilty caller has to be on the call stack for the fraction of time that's being wasted, which exposes it to your interrupts for that much of the time, whether it is spread over many short calls or a few lengthy ones.
NOTE: This process is more like diagnosis than measurement. Suppose that bad call is wasting 90% of the time. Then each time you halt it, the probability is 90% that the bad call statement is right there on the call stack for you to see, and you will be able to see that it's bad. However, if you want to exactly measure the wastage, that's a different problem. For that, you will need a lot more samples, to see what % of them contain that call. Or alternatively, just fix the guilty call, clock the speedup, and that will tell you exactly what the wastage was.
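On a Unix-like system this random-pause idea can even be automated from inside the process with a timer signal. A rough sketch, not from the original answer (assumes CPython on a platform with SIGALRM; the slow/caller functions are invented for the demo):

```python
import collections
import signal
import time
import traceback

samples = collections.Counter()

def sample_stack(signum, frame):
    # Record the current call stack as a tuple of function names.
    stack = tuple(f.name for f in traceback.extract_stack(frame))
    samples[stack] += 1

signal.signal(signal.SIGALRM, sample_stack)
signal.setitimer(signal.ITIMER_REAL, 0.005, 0.005)  # sample every 5 ms

def slow():
    end = time.time() + 0.3
    while time.time() < end:   # busy-wait so the time shows up on the stack
        pass

def caller():
    slow()

caller()
signal.setitimer(signal.ITIMER_REAL, 0, 0)  # stop sampling

# The guilty caller shows up in most of the sampled stacks.
for stack, count in samples.most_common(3):
    print(count, ' -> '.join(stack))
```

This is exactly the principle behind statistical profilers such as pyinstrument (mentioned below): the frames that dominate the samples are the ones wasting the time.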
Pycscope does this. I only found it today, so I can't speak to how good it is, but the few examples I've tried have worked pretty well (though not perfectly).
https://pypi.python.org/pypi/pycscope/
You use it to generate a cscope database and then query it from an editor's cscope plugin, Vim specifically. I tried using it with vanilla cscope, and plain cscope seems to get confused.
Related
While using a profiler to find where most of the execution time is spent in my Python code, I found that it is inside a package used by the code: a function in that package is called hundreds of times with different input arguments, and in total this function takes the most time to execute.
So I want to implement some caching, so that if the same parameters are passed again I can reuse the already-computed output from the cache. But first I want to check whether the same parameters are being passed multiple times at all.
Is there any way to enable some Python-level configuration so that I can log the arguments passed to the function on each call?
I am not allowed to make any changes to the package (Package1), so only something enabled outside the package (like a debug mode) will help.
Package1
  module1
    def function1():
        for i in range(10000):
            # Want to log the arguments passed to the function
            # below for each iteration
            retvalue = function2(arg1, arg2, arg3)
My Code
package1.module1.function1()
You can use the cache decorator from functools to cache the values:
from functools import cache

@cache
def cached_function2(*args):
    return function2(*args)  # function2 imported from the module you can't change
Instead of logging the input args, you can rerun the profiler to see whether the runtime has improved. If it has, you can be sure that some of the calls were duplicates.
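If you do want to see the arguments without editing the package, you can also wrap (monkey-patch) the function from the outside before the package code runs, and record which argument tuples repeat. A sketch (the package1.module1.function2 name comes from the question; the stand-in function and wrapper are my own additions):

```python
import functools

# Stand-in for package1.module1.function2 from the question,
# which we pretend we cannot edit.
def function2(arg1, arg2, arg3):
    return arg1 + arg2 + arg3

seen = {}

def logging_wrapper(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        key = (args, tuple(sorted(kwargs.items())))
        seen[key] = seen.get(key, 0) + 1   # count repeated argument tuples
        return func(*args, **kwargs)
    return wrapper

# In the real case you would patch the package before calling it, e.g.:
#   package1.module1.function2 = logging_wrapper(package1.module1.function2)
function2 = logging_wrapper(function2)

for args in [(1, 2, 3), (1, 2, 3), (4, 5, 6)]:
    function2(*args)

# Argument tuples sorted by how often they repeated.
print(sorted(seen.items(), key=lambda kv: -kv[1]))
```

This only works when the arguments are hashable; if the counts show duplicates, switching the wrapper to functools.cache (as above) should give the speedup.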
What is the best (most precise) way to measure the execution time of a function? For example:
def some_function():
# ...
I would prefer to call this function 1000 times and then compute the average time, like this:
start = time.time()
for i in range(1000):
some_function()
elapsed = (time.time() - start)/1000
but maybe there is a better way?
You should use the timeit module, I think:
import timeit

t = timeit.Timer('some_function(*args)',                      # code to run
                 'from __main__ import some_function, args')  # setup code, run once
                                                              # before the measurement
print(t.timeit(100))  # number of times to run
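For reference, a newer-style sketch: timeit.repeat also accepts a callable directly, and taking the minimum of several runs is the usual way to reduce noise from other load on the machine (the function here is a made-up example):

```python
import timeit

def some_function():
    return sum(range(100))

# Run the 1000-iteration timing loop 5 times; the minimum is the
# measurement least disturbed by other processes.
times = timeit.repeat(some_function, number=1000, repeat=5)
per_call = min(times) / 1000
print(f"{per_call:.3e} s per call")
```

Dividing by number gives the per-call time, which is the figure the loop in the question was trying to compute.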
I agree that timeit is the de facto module for running native timings on Python source. However, if you are interested in doing some heavy lifting in terms of profiling, you might find something like RunSnakeRun useful (a visualization tool for Python's native profiler).
Run Snake Run Website
Python profiler
A snippet from the RunSnakeRun site (which really just uses data from Python's profiler):
sortable data-grid views for raw profile information
identity: function name, file name, directory name
time spent: cumulative, cumulative-per, local, and local-per time
overall data-grid view
(all) callers-of-this-function and (all) callees-of-this-function views
Just for added "yes, I know"-isms... you asked for something simple, and this is way over the top. But I thought I'd share another possible solution in case you need additional information down the road. And if you don't find it useful, maybe someone else will!
To get a profile output file that will open in RunSnakeRun, run something like:
$ python -m cProfile -o <outputfilename> <script-name> <options>
Alternatively, if you're developing on *nix you can use time, but then you add overhead and potentially lose some of the precision that the Python timeit module offers.
Different needs require different solutions - just adding to your bag-o-tricks.
HTH
I'm looking at several cases where it would be far, far, far easier to accept nearly-raw code. So:
What's the worst you can do with an expression if you can't use lambda, and how?
What's the worst you can do with executed code if you can't use import, and how?
("Can't use X" means the string is scanned for X.)
Also, B is unnecessary if someone can think of an expression such that, given d = {key: value, ...}:
expr.format(key) == d[key]
without changing the way the format looks.
The worst you can do with an expression is on the order of
__import__('os').system('rm -rf /')
if the server process is running as root. Otherwise, you can fill up memory and crash the process with
2**2**1024
or bring the server to a grinding halt by executing a shell fork bomb:
__import__('os').system(':(){ :|:& };:')
or execute a temporary (but destructive enough) fork bomb in Python itself:
[__import__('os').fork() for i in range(2**64) for x in range(i)]
Scanning for __import__ won't help, since there's an infinite number of ways to get to it, including
eval(''.join(['__', 'im', 'po', 'rt', '__']))
getattr(__builtins__, '__imp' + 'ort__')
getattr(globals()['__built' 'ins__'], '__imp' + 'ort__')
Note that the eval and exec functions can also be used to create any of the above in an indirect way. If you want safe expression evaluation on a server, use ast.literal_eval.
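For completeness, a tiny demonstration of the ast.literal_eval suggestion: it accepts only Python literals and rejects anything involving a call or attribute access:

```python
import ast

# Literals (numbers, strings, tuples, lists, dicts, sets, booleans, None) are fine.
safe = ast.literal_eval("{'a': [1, 2], 'b': (3, 4)}")
print(safe)  # {'a': [1, 2], 'b': (3, 4)}

# Anything involving a call -- including the __import__ tricks above --
# raises ValueError instead of executing.
try:
    ast.literal_eval("__import__('os').system('rm -rf /')")
except ValueError as exc:
    print('rejected:', exc)
```

Because literal_eval never runs the string as code, none of the blacklist-evasion tricks above apply to it.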
Arbitrary Python code?
Opening, reading, writing, creating files on the partition. Including filling up all the disk space.
Infinite loops that put load on the CPU.
Allocating all the memory.
Doing things that are in pure Python modules without importing them by copy/pasting their code into the expression (messing with built in Python internals and probably finding a way to access files, execute them or import modules).
...
No amount of whitelisting or blacklisting is going to keep people from getting to dangerous parts of Python. You mention running in a sandbox where "open" is not defined, for example. But I can do this to get it:
real_open = getattr(os, "open")
and if you say I won't have os, then I can do:
real_open = getattr(sys.modules['os'], "open")
or
real_open = random.__builtins__['open']
etc, etc, etc. Everything is connected, and the real power is in there somewhere. Bad guys will find it.
I used to use a nice Apple profiler that is built into the System Monitor application. As long as your C++ code was compiled with debug information, you could sample your running application and it would print out an indented tree telling you what percent of the parent function's time was spent in this function (and the body vs. other function calls).
For instance, if main calls function_1 and function_2, function_2 calls function_3, and then main calls function_3:
main (100%, 1% in function body):
    function_1 (9%, 9% in function body)
    function_2 (90%, 85% in function body):
        function_3 (100%, 100% in function body)
    function_3 (1%, 1% in function body)
I would see this and think, "Something is taking a long time in the code in the body of function_2. If I want my program to be faster, that's where I should start."
How can I most easily get this exact profiling output for a Python program?
I've seen people say to do this:
import cProfile, pstats
prof = cProfile.Profile()
prof = prof.runctx("real_main(argv)", globals(), locals())
stats = pstats.Stats(prof)
stats.sort_stats("time") # Or cumulative
stats.print_stats(80) # 80 = how many to print
But it's quite messy compared to that elegant call tree. Please let me know if you can easily do this; it would help quite a bit.
I just stumbled on this as well, and spent some time learning how to generate a call graph (cProfile's normal output is not terribly informative). For future reference, here's another way to generate a beautiful call-tree graphic with cProfile + gprof2dot + Graphviz:
Install GraphViz: http://www.graphviz.org/Download_macos.php
easy_install gprof2dot
Run the profiler on your code:
python -m cProfile -o myLog.profile <myScript.py> arg1 arg2 ...
Run gprof2dot to convert the call profile into a .dot file:
gprof2dot -f pstats myLog.profile -o callingGraph.dot
Open the .dot file with Graphviz to visualize the graph.
The end result is a graph that is color-coded: red means a higher concentration of time.
I recently wanted the same thing, so I took a stab at implementing one myself.
The project's on GitHub, https://github.com/joerick/pyinstrument
Here's how you would use it:
from pyinstrument import Profiler
profiler = Profiler()
profiler.start()
# Code you want to profile
profiler.stop()
print(profiler.output_text())
The gprof2dot approach extracts all information nicely, so I'm a fan. However, sometimes I want to look at timing data in a call tree, so I created tuna.
Install with
pip install tuna
and display your profile with
tuna program.prof
Check out this library http://pycallgraph.slowchop.com/ for call graphs. It works really well. If you want to profile specific functions, check out http://mg.pov.lt/blog/profiling.html
This is a result from the profilehooks module.
I find myself adding debugging "print" statements quite often -- stuff like this:
print("a_variable_name: %s" % a_variable_name)
How do you all do that? Am I being neurotic in trying to find a way to optimize this? I may be working on a function, put in a half-dozen or so of those lines, figure out why it's not working, and then cut them out again.
Have you developed an efficient way of doing that?
I'm coding Python in Emacs.
Sometimes a debugger is great, but sometimes using print statements is quicker and easier to set up and use repeatedly.
This may only be suitable for debugging with CPython (since not all Python implementations provide inspect.currentframe and inspect.getouterframes), but I find it useful for cutting down on typing:
In utils_debug.py:
import inspect

def pv(name):
    record = inspect.getouterframes(inspect.currentframe())[1]
    frame = record[0]
    val = eval(name, frame.f_globals, frame.f_locals)
    print('{0}: {1}'.format(name, val))
Then in your script.py:
from utils_debug import pv
With this setup, you can replace
print("a_variable_name: %s" % a_variable_name)
with
pv('a_variable_name')
Note that the argument to pv should be the string (variable name, or expression), not the value itself.
To remove these lines using Emacs, you could
C-x ( # start keyboard macro
C-s pv('
C-a
C-k # change this to M-; if you just want to comment out the pv call
C-x ) # end keyboard macro
Then you can call the macro once with C-x e
or a thousand times with C-u 1000 C-x e
Of course, you have to be careful that you do indeed want to remove all lines containing pv(' .
Don't do that. Use a decent debugger instead. The easiest way is to use IPython and either wait for an exception (the debugger will kick in automatically), or provoke one by running an illegal statement (e.g. 1/0) at the part of the code you wish to inspect.
I came up with this:
Python string interpolation implementation
I'm just testing it, and it's proving handy for me while debugging.