Measure time execution of function [duplicate] - python

What is the best (most precise) way to measure the execution time of a function, for example:
def some_function():
    # ...
I would prefer to call this function 1000 times and then compute the average time, like this:
import time

start = time.time()
for i in range(1000):
    some_function()
elapsed = (time.time() - start) / 1000
but maybe there is a better way?

You should use the timeit module, I think:
import timeit

# first argument: the code to time
# second argument: setup code, run once before the time measurement
t = timeit.Timer('some_function(*args)',
                 'from __main__ import some_function, args')
t.timeit(100)  # number of times to run; returns the total time in seconds
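On Python 3.5+ you can skip the from __main__ setup string and pass globals() directly; a minimal sketch of your 1000-call average using the timeit.timeit convenience function (assuming some_function is defined in the same module):

import timeit

# total seconds for 1000 calls; divide by the count for the per-call average
total = timeit.timeit('some_function()', globals=globals(), number=1000)
print(total / 1000)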

I agree that timeit is the de facto module for running native timings on Python source. However, if you are interested in doing some heavy lifting in terms of profiling, you might find something like runsnakerun useful (a visualization tool for Python's built-in profiler).
Run Snake Run Website
Python profiler
A snippet from the runsnakerun feature list (it really just visualizes data from Python's profiler):
- sortable data-grid views for raw profile information
- identity: function name, file name, directory name
- time spent: cumulative, cumulative-per, local and local-per time
- overall data-grid view
- (all) callers-of-this-function, (all) callees-of-this-function views
Just for added yes-I-know'isms: you asked for something simple, and this is WAY over the top. But I thought I'd share another possible solution in case you require additional information down the road. And if you don't find it useful, maybe someone else will!
To get an output profile file that will run in runsnakerun, run something like:
$ python -m cProfile -o <outputfilename> <script-name> <options>
Alternatively, if you are developing on *nix you can use time, but then you add overhead and potentially lose some of the precision that the Python timeit module offers.
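For example (the timings shown are illustrative, not measured):

$ time python your_script.py
real    0m1.234s
user    0m1.200s
sys     0m0.030s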
Different needs require different solutions - just adding to your bag-o-tricks.
HTH

Related

What is the correct way (if any) to use Python 2 and 3 libraries in the same program?

I wish to write a python script that needs to do task 'A' and task 'B'. Luckily there are existing Python modules for both tasks, but unfortunately the library that can do task 'A' is Python 2 only, and the library that can do task 'B' is Python 3 only.
In my case the libraries are small and permissively-licensed enough that I could probably convert them both to Python 3 without much difficulty. But I'm wondering what is the "right" thing to do in this situation - is there some special way in which a module written in Python 2 can be imported directly into a Python 3 program, for example?
The "right" way is to translate the Py2-only module to Py3 and offer the translation upstream with a pull request (or equivalent approach for non-git upstream repos). Seriously. Horrible hacks to make py2 and py3 packages work together are not worth the effort.
I presume you know of tools such as 2to3 that aim to make the job of porting code to py3k easier; I'm just repeating it here for others' reference.
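For reference, the usual 2to3 invocation rewrites the file in place and keeps a .bak backup (the file name here is illustrative):

$ 2to3 -w py2_module.py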
In situations where I have to use libraries from python3 and python2, I've been able to work around it using the subprocess module. Alternatively, I've gotten around this issue with shell scripts that pipe output from the python2 script to the python3 script and vice versa. This of course covers only a tiny fraction of use cases, but if you're transferring text (or maybe even picklable objects) between 2 & 3, it (or a more thought-out variant) should work.
To the best of my knowledge, there isn't a best practice when it comes to mixing versions of python.
I present to you an ugly hack
Consider the following simple toy example, involving three files:
# py2.py
# file uses python2, here illustrated by the print statement
def hello_world():
    print 'hello world'

if __name__ == '__main__':
    hello_world()
# py3.py
# there's nothing py3 about this, but let's assume that there is,
# and that this is a library that will work only on python3
def count_words(phrase):
    return len(phrase.split())
# controller.py
# main script that coordinates the work, written in python3
# calls the python2 library through the subprocess module
# the limitation here is that every function needed has to have a script
# associated with it that accepts command line arguments
import subprocess
import py3

if __name__ == '__main__':
    # note: check_output returns bytes in Python 3; bytes.split()
    # happens to work in count_words, but use .decode() for real text
    phrase = subprocess.check_output('python py2.py', shell=True)
    num_words = py3.count_words(phrase)
    print(num_words)
If I run this in bash, it outputs 2:

hals-halbook: toy hal$ python3 controller.py
2

How to Consume an mpi4py application from a serial python script

I tried to make a library based on mpi4py, but I want to use it in serial python code.
$ python serial_source.py
but inside serial_source.py there is a call to some function parallel_bar:

from foo import parallel_bar

# Can I make this work with mpi4py as if it were common serial python code?
result = parallel_bar(num_proc=5)
The motivation for this question is finding the right way to use mpi4py to optimize python programs which were not necessarily designed to be run completely in parallel.
This is indeed possible and is covered in the mpi4py documentation, in the section Dynamic Process Management. What you need is the so-called Spawn functionality, which is not available with MSMPI (in case you are working on Windows); see also Spawn not implemented in MSMPI.
Example
The first file provides a kind of wrapper to your function to hide all the MPI stuff, which I guess is your intention. Internally it calls the "actual" script containing your parallel code in 4 newly spawned processes.
Finally, you can open a python terminal and call:
from my_prog import parallel_fun
parallel_fun()
# Hi from 0/4
# Hi from 3/4
# Hi from 1/4
# Hi from 2/4
# We got the magic number 6
my_prog.py
import sys
import numpy as np
from mpi4py import MPI

def parallel_fun():
    # spawn 4 child processes running the parallel script
    comm = MPI.COMM_SELF.Spawn(
        sys.executable,
        args=['child.py'],
        maxprocs=4)

    # receive the sum reduced from the children
    N = np.array(0, dtype='i')
    comm.Reduce(None, [N, MPI.INT], op=MPI.SUM, root=MPI.ROOT)
    print(f'We got the magic number {N}')
Here is the child file with the parallel code:
child.py
from mpi4py import MPI
import numpy as np

comm = MPI.Comm.Get_parent()
print(f'Hi from {comm.Get_rank()}/{comm.Get_size()}')

# each child contributes its rank; the parent receives the sum (0+1+2+3 = 6)
N = np.array(comm.Get_rank(), dtype='i')
comm.Reduce([N, MPI.INT], None, op=MPI.SUM, root=0)
Unfortunately, I don't think this is possible, as you have to run MPI code specifically with mpirun.
The best you can do is the opposite, where you write generic chunks of code which can be called either by an MPI process or a normal python process, as in the sketch below.
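A minimal sketch of that idea (the module layout and function are my own illustration, not from the original answer):

# dual_mode.py: runs serially, or in parallel under mpirun
try:
    from mpi4py import MPI
    rank = MPI.COMM_WORLD.Get_rank()
    size = MPI.COMM_WORLD.Get_size()
except ImportError:
    rank, size = 0, 1  # no mpi4py available: behave like a single process

def work(items):
    # each process takes a strided slice; with size == 1 this is every item
    return [x * x for x in items[rank::size]]

if __name__ == '__main__':
    print(rank, work(range(10)))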
The only other solution is to wrap the whole MPI part of your code into an external call and invoke it with subprocess from your non-MPI code; however, this will be tied to your system configuration quite heavily, and is not really that portable.
Subprocess is detailed in this thread: Using python with subprocess Popen, and is worth a look. The complexity here is making the correct call in the first place, e.g.
command = "/your/instance/of/mpirun /your/instance/of/python your_script.py -arguments"
And then getting the result back into your single-threaded code, for which, depending on the size of the data, there are many ways; something like parallel HDF5 would be a good place to look if you have to pass back big array data.
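A rough sketch of such a wrapper (the mpirun/python paths stay placeholders, and the process count and output file are illustrative):

import subprocess

# run the MPI program as an external process
command = ["/your/instance/of/mpirun", "-n", "4",
           "/your/instance/of/python", "your_script.py", "-arguments"]
subprocess.check_call(command)
# then load whatever the script wrote to disk, e.g. an HDF5 or .npy file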
Sorry I can't give you an easy solution.

How can you get the call tree with Python profilers?

I used to use a nice Apple profiler that is built into the System Monitor application. As long as your C++ code was compiled with debug information, you could sample your running application and it would print out an indented tree telling you what percent of the parent function's time was spent in this function (and the body vs. other function calls).
For instance, if main called function_1 and function_2, function_2 calls function_3, and then main calls function_3:
main (100%, 1% in function body):
    function_1 (9%, 9% in function body):
    function_2 (90%, 85% in function body):
        function_3 (100%, 100% in function body)
    function_3 (1%, 1% in function body)
I would see this and think, "Something is taking a long time in the code in the body of function_2. If I want my program to be faster, that's where I should start."
How can I most easily get this exact profiling output for a Python program?
I've seen people say to do this:
import cProfile, pstats
prof = cProfile.Profile()
prof = prof.runctx("real_main(argv)", globals(), locals())
stats = pstats.Stats(prof)
stats.sort_stats("time") # Or cumulative
stats.print_stats(80) # 80 = how many to print
But it's quite messy compared to that elegant call tree. Please let me know if you can easily do this; it would help quite a bit.
I just stumbled on this as well, and spent some time learning how to generate a call graph (the default output of cProfile is not terribly informative). For future reference, here's another way to generate a beautiful call-tree graphic with cProfile + gprof2dot + GraphViz.
1. Install GraphViz: http://www.graphviz.org/Download_macos.php
2. easy_install gprof2dot
3. Run the profiler on your code:
   python -m cProfile -o myLog.profile <myScript.py> arg1 arg2 ...
4. Run gprof2dot to convert the profile into a dot file:
   gprof2dot -f pstats myLog.profile -o callingGraph.dot
5. Open the dot file with GraphViz to visualize the graph.
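Concretely, that last step can be done from the command line (the output file name is just an example):

$ dot -Tpng callingGraph.dot -o callingGraph.png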
Here's what the end result would look like! The graph is color-coded: red means a higher concentration of time.
I recently wanted the same thing, so I took a stab at implementing one myself.
The project's on GitHub, https://github.com/joerick/pyinstrument
Here's how you would use it:
from pyinstrument import Profiler
profiler = Profiler()
profiler.start()
# Code you want to profile
profiler.stop()
print(profiler.output_text())
The gprof2dot approach extracts all information nicely, so I'm a fan. However, sometimes I want to look at timing data in a call tree, so I created tuna.
Install with
pip install tuna
and display your profile with
tuna program.prof
Check out this library http://pycallgraph.slowchop.com/ for call graphs. It works really well. If you want to profile specific functions, check out http://mg.pov.lt/blog/profiling.html
This is a result from the profilehooks module.
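For reference, a minimal sketch of how profilehooks is typically used (the decorated function is illustrative):

from profilehooks import profile

@profile  # prints a profile report for this function when the program exits
def heavy_function():
    return sum(i * i for i in range(10 ** 6))

heavy_function()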

Printing Variable names and contents as debugging tool; looking for emacs/Python shortcut

I find myself adding debugging "print" statements quite often -- stuff like this:
print("a_variable_name: %s" % a_variable_name)
How do you all do that? Am I being neurotic in trying to find a way to optimize this? I may be working on a function and put in a half-dozen or so of those lines, figure out why it's not working, and then cut them out again.
Have you developed an efficient way of doing that?
I'm coding Python in Emacs.
Sometimes a debugger is great, but sometimes using print statements is quicker, and easier to set up and use repeatedly.
This may only be suitable for debugging with CPython (since not all Pythons implement inspect.currentframe and inspect.getouterframes), but I find this useful for cutting down on typing:
In utils_debug.py:
import inspect

def pv(name):
    # grab the caller's frame and evaluate the given name there
    record = inspect.getouterframes(inspect.currentframe())[1]
    frame = record[0]
    val = eval(name, frame.f_globals, frame.f_locals)
    print('{0}: {1}'.format(name, val))
Then in your script.py:
from utils_debug import pv
With this setup, you can replace
print("a_variable_name: %s' % a_variable_name)
with
pv('a_variable_name')
Note that the argument to pv should be the string (variable name, or expression), not the value itself.
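For example (assuming utils_debug.py is importable; the variable is illustrative):

from utils_debug import pv

a_variable_name = 'some value'
pv('a_variable_name')   # prints: a_variable_name: some value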
To remove these lines using Emacs, you could
C-x (      # start keyboard macro
C-s pv('   # search for the next pv( call
C-a        # move to the beginning of that line
C-k        # kill the line (change this to M-; if you just want to comment out the pv call)
C-x )      # end keyboard macro
Then you can call the macro once with C-x e, or a thousand times with C-u 1000 C-x e.
Of course, you have to be careful that you do indeed want to remove all lines containing pv(' .
Don't do that. Use a decent debugger instead. The easiest way is to use IPython and either wait for an exception (the debugger will start automatically), or provoke one by running an illegal statement (e.g. 1/0) at the part of the code that you wish to inspect.
I came up with this: Python string interpolation implementation.
I'm just testing it, and it's proving handy for me while debugging.

Profiling in Python: Who called the function?

I'm profiling in Python using cProfile. I found a function that takes a lot of CPU time. How do I find out which function is calling this heavy function the most?
EDIT:
I'll settle for a workaround: Can I write a Python line inside that heavy function that will print the name of the function that called it?
I almost always view the output of the cProfile module using gprof2dot; basically, it converts the output into a graphviz graph (a .dot file).
It makes it very easy to determine which function is slowest, and which function[s] called it.
Usage is:
python -m cProfile -o output.pstats path/to/your/script arg1 arg2
gprof2dot.py -f pstats output.pstats | dot -Tpng -o output.png
That may not answer your question directly, but it will definitely help. If you use the profiler with the option --sort cumulative, it will sort the functions by cumulative time, which is helpful for detecting not only heavy functions but also the functions that call them.
python -m cProfile --sort cumulative myScript.py
There is a workaround to get the caller function:
import inspect
print inspect.getframeinfo(inspect.currentframe().f_back)[2]
You can add as many f_back as you want in case you want the caller's caller, etc., as in the sketch below.
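A minimal sketch of chaining f_back (the function names are illustrative):

import inspect

def whose_fault():
    caller = inspect.currentframe().f_back   # the direct caller
    grandcaller = caller.f_back              # the caller's caller
    print(inspect.getframeinfo(caller)[2],
          inspect.getframeinfo(grandcaller)[2])

def inner():
    whose_fault()

def outer():
    inner()

outer()   # prints: inner outer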
If you want to calculate frequent calls you can do this:
record = {}   # at module level

# inside the heavy function:
caller = inspect.getframeinfo(inspect.currentframe().f_back)[2]
record[caller] = record.get(caller, 0) + 1
Then print them by order of frequency:
print sorted(record.items(), key=lambda a: a[1])
inspect.stack() will give you the current caller stack.
You might want to take a look at pycallgraph.
It is possible to do this with the cProfile profiler from the standard library.
In pstats.Stats (the profiler result) there is a method print_callees (or, alternatively, print_callers).
Example code:
import cProfile, pstats
pr = cProfile.Profile()
pr.enable()
# ... do something ...
pr.disable()
ps = pstats.Stats(pr).strip_dirs().sort_stats('cumulative')
ps.print_callees()
Result will be something like:
Function                           called...
                                       ncalls  tottime  cumtime
ElementTree.py:1517(_start_list) ->     24093    0.048    0.124  ElementTree.py:1399(start)
                                        46429    0.015    0.041  ElementTree.py:1490(_fixtext)
                                        70522    0.015    0.015  ElementTree.py:1497(_fixname)
ElementTree.py:1527(_data)       ->     47827    0.017    0.026  ElementTree.py:1388(data)
                                        47827    0.018    0.053  ElementTree.py:1490(_fixtext)
On the left you have the caller, on the right you have the callee.
(for example _fixtext was called from _data 47827 times and from _start_list 46429 times)
See also:
docs.python.org/..#print_callees - show call hierarchy. Group by the caller. (used above)
docs.python.org/..#print_callers - show call hierarchy. Group by the callee.
Couple of notes:
Your code needs to be edited for this (insert those profile statements).
(i.e. it is not possible to use it from the command line like python -m cProfile myscript.py, though it is possible to write a separate script for that)
A bit unrelated, but strip_dirs() must go before sort_stats() (otherwise sorting does not work)
I have not used cProfile myself, but most profilers give you a call hierarchy.
Googling, I found these slides about cProfile; maybe that helps. Page 6 suggests that cProfile does provide a hierarchy.
Sorry I'm not familiar with Python, but there's a general method that works, assuming you can manually interrupt execution at a random time.
Just do so, and display the call stack. It will tell you, with high probability, what you want to know. If you want to be more certain, just do it several times.
It works because the guilty caller has to be on the call stack for the fraction of time that's being wasted, which exposes it to your interrupts for that much of the time, whether it is spread over many short calls or a few lengthy ones.
NOTE: This process is more like diagnosis than measurement. Suppose that bad call is wasting 90% of the time. Then each time you halt it, the probability is 90% that the bad call statement is right there on the call stack for you to see, and you will be able to see that it's bad. However, if you want to exactly measure the wastage, that's a different problem. For that, you will need a lot more samples, to see what % of them contain that call. Or alternatively, just fix the guilty call, clock the speedup, and that will tell you exactly what the wastage was.
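In Python, a minimal sketch of this random-pause idea is to dump the stack whenever a signal arrives (my own illustration of the technique, not from the original answer):

import signal
import traceback

def dump_stack(signum, frame):
    # print the current call stack, then let the program keep running
    traceback.print_stack(frame)

signal.signal(signal.SIGINT, dump_stack)  # Ctrl-C now samples the stack
# (the program no longer stops on Ctrl-C; use Ctrl-\ or kill to terminate)

# ... run your long computation, press Ctrl-C a few times, and look for
# the call that keeps showing up near the top of the printed stacks ...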
Pycscope does this. I just found it today, so I can't speak to how good it is, but the few examples I've tried have been pretty good (though not perfect).
https://pypi.python.org/pypi/pycscope/
You would use this to generate a cscope file, and then use a cscope plugin from an editor, VIM specifically. I tried using it with vanilla cscope, and it seems that plain cscope gets confused.
