When writing code in Python I usually use cProfile, which prints the profile results to the console:
import cProfile, pstats, StringIO
pr = cProfile.Profile()
pr.enable()
#do stuff
pr.disable()
s = StringIO.StringIO()
ps = pstats.Stats(pr, stream=s).sort_stats('cumulative')
ps.print_stats()
print s.getvalue()
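(For reference, the same pattern on Python 3 uses io.StringIO and print() as a function; everything else is unchanged:

import cProfile, pstats, io

pr = cProfile.Profile()
pr.enable()
# do stuff
pr.disable()
s = io.StringIO()
ps = pstats.Stats(pr, stream=s).sort_stats('cumulative')
ps.print_stats()
print(s.getvalue())
)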
Are there any alternatives in C++?
Edit: I'm using VS 2008 Express on 64-bit Windows.
Just go to Analyze -> Profiler -> Attach/Detach.
Valgrind, specifically callgrind, is pretty much the standard way of doing this in C++. It's a little more involved than in Python, though, since Python can basically monkey-patch every call to every method to generate call graphs and the like.
http://valgrind.org/docs/manual/cl-manual.html
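A typical session looks roughly like this (the binary name is a placeholder); callgrind writes a callgrind.out.<pid> file that callgrind_annotate or KCachegrind can read:

valgrind --tool=callgrind ./your_program
callgrind_annotate callgrind.out.<pid>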
Related
I am new to Cython. I've written a super simple test program to assess the benefits of Cython. Yet my pure Python version is a lot faster. Am I doing something wrong?
test.py:
import timeit
imp = '''
import pyximport; pyximport.install()
from hello_cy import hello_c
from hello_py import hello_p
'''
code_py = '''
hello_p()
'''
code_cy = '''
hello_c()
'''
print(timeit.timeit(stmt=code_py, setup=imp))
print(timeit.timeit(stmt=code_cy, setup=imp))
hello_py.py:
def hello_p():
    print('Hello World')
hello_cy.pyx:
from libc.stdio cimport printf

cpdef void hello_c():
    cdef char * hello_world = 'hello from C world'
    printf(hello_world)
hello_py timeit takes 14.697s
hello_cy timeit takes 98s
Am I missing something? How can I make my calls to cpdef functions run faster?
Thank you very much!
I strongly suspect a problem in your configuration. I have (partially) reproduced your tests on Windows 10, Python 3.10.0, Cython 0.29.26, MSVC 2022, and got quite different results: in my tests the Cython code is slightly faster. I made two changes:
In hello_cy.pyx, to make the two versions closer, I added the newline:
...
printf("%s\n", hello_world)
In the main script I split the calls to the functions from the display of the times:
...
pyp = timeit.timeit(stmt=code_py, setup=imp)
pyc = timeit.timeit(stmt=code_cy, setup=imp)
print(pyp)
print(pyc)
When I run the script I get (after pages of hello...):
...
hello from C world
hello from C world
19.135732599999756
14.712803700007498
This looks more like what one could expect...
Anyway, we do not really know what is being tested here, because as much as possible, I/O should not be benchmarked: it depends on a lot of things outside the programs themselves.
It just isn't a meaningful test - the two functions aren't the same:
Python's print writes a string followed by a newline. I think it then flushes the output.
printf scans the string for formatting characters (e.g. %s) and then prints it without an extra newline. By default printf is line buffered (i.e. it flushes after each newline). Since you never print a newline or flush, it may be slowed down by managing an increasingly huge buffer.
In summary, don't be misled by fairly meaningless microbenchmarks, especially for terminal I/O, which is rarely an actual limiting factor.
It takes too long because pyximport compiles the Cython code on the fly (so you are also measuring the compilation from Cython to C and the compilation of the C code to a native library). You should measure calls to already compiled code - see https://cython.readthedocs.io/en/latest/src/quickstart/build.html#building-a-cython-module-using-setuptools
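For reference, a minimal sketch of the setuptools approach from the linked docs, using the file names from the question:

# setup.py
from setuptools import setup
from Cython.Build import cythonize

setup(ext_modules=cythonize("hello_cy.pyx"))

Build it once with python setup.py build_ext --inplace, then import hello_cy directly, without pyximport.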
I have a program in Python that takes several command line arguments and uses them in several functions. How can I use cProfile (within my code) to obtain the running time of each function? (I still want the program to run normally when it's done.) I can't figure out how; for example, I cannot use
cProfile.run('loadBMPImage(sys.argv[1])')
to test the run time of the function loadBMPImage, because I cannot use sys.argv[1] as an argument there. Any idea how I can use cProfile to test the running time of each function and print it to stdout, when each function depends on command line arguments? Also, the cProfile calls must be integrated into the code itself. Thanks.
There are several ways.
import cProfile
import pstats
import sys
def function(n):
    a = 1
    for i in range(n):
        a += 1
    return a
The first is to use the simple wrapper runctx(), which allows you to specify globals and locals for the executed string. In the example below I use globals() to pass the function object, and locals to pass the parameter, but it can be arranged differently, of course.
def profile1():
    cProfile.runctx("function(n)", globals(), dict(n=int(sys.argv[1])), filename='test')
    return pstats.Stats('test')
A better way, where you don't need to mess with exec, is to use the Profile class. This way you can just profile a piece of regular code:
def profile2():
    pr = cProfile.Profile()
    pr.enable()
    function(int(sys.argv[1]))
    pr.disable()
    return pstats.Stats(pr)
Just for completeness' sake, to make the example runnable:
if __name__ == '__main__':
    profile1().print_stats()
    profile2().print_stats()
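You would then run it with the loop count as the first command line argument, e.g. python yourscript.py 1000000 (the script name here is just a placeholder).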
I run the Python program with -m cProfile. Example:
python -m cProfile myprogram.py
This requires zero changes to myprogram.py.
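You can also have cProfile sort the output for you, e.g.:
python -m cProfile -s cumulative myprogram.py
where -s accepts the same sort keys as pstats (time, cumulative, etc.).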
My 2 pence: when running python3 -m cProfile yourprogram.py, the cwd and sys.argv[0] seem to get changed (I didn't check thoroughly), which hurts the implicit context of your program, especially if it is usually run as an executable.
Thus, I'd rather recommend wrapping your original code in a function and running it with cProfile.run(), even though your code changes a little.
def yourfunction():
    import sys
    print(sys.argv)

import cProfile
cProfile.run("yourfunction()")
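If the output is too noisy, cProfile.run() also accepts a sort argument:
cProfile.run("yourfunction()", sort='cumulative')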
good luck!
I wish to write a Python script that needs to do task 'A' and task 'B'. Luckily there are existing Python modules for both tasks, but unfortunately the library that can do task 'A' is Python 2 only, and the library that can do task 'B' is Python 3 only.
In my case the libraries are small and permissively-licensed enough that I could probably convert them both to Python 3 without much difficulty. But I'm wondering what is the "right" thing to do in this situation - is there some special way in which a module written in Python 2 can be imported directly into a Python 3 program, for example?
The "right" way is to translate the Py2-only module to Py3 and offer the translation upstream with a pull request (or equivalent approach for non-git upstream repos). Seriously. Horrible hacks to make py2 and py3 packages work together are not worth the effort.
I presume you know of tools such as 2to3, that aim to make the job of porting code to py3k easier, just repeating it here for others' reference.
In situations where I have to use libraries from Python 3 and Python 2, I've been able to work around it using the subprocess module. Alternatively, I've gotten around this issue with shell scripts that pipe output from the python2 script to the python3 script and vice versa. This of course covers only a tiny fraction of use cases, but if you're transferring text (or maybe even picklable objects) between 2 & 3, it (or a more thought-out variant) should work.
To the best of my knowledge, there isn't a best practice when it comes to mixing versions of python.
I present to you an ugly hack
Consider the following simple toy example, involving three files:
# py2.py
# file uses python2, here illustrated by the print statement
def hello_world():
    print 'hello world'

if __name__ == '__main__':
    hello_world()
# py3.py
# there's nothing py3 about this, but let's assume that there is,
# and that this is a library that will work only on python3
def count_words(phrase):
    return len(phrase.split())
# controller.py
# main script that coordinates the work, written in python3;
# calls the python2 library through the subprocess module.
# The limitation here is that every function needed has to have a
# script associated with it that accepts command line arguments.
import subprocess
import py3

if __name__ == '__main__':
    phrase = subprocess.check_output('python py2.py', shell=True)
    num_words = py3.count_words(phrase)
    print(num_words)
# If I run the following in bash, it outputs `2`
hals-halbook: toy hal$ python3 controller.py
2
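One caveat: on Python 3, subprocess.check_output() returns bytes. In this toy example count_words() happens to work on bytes too, but for anything else you may want to pass universal_newlines=True (or call .decode()) to get a str back.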
I tried to make a library based on mpi4py, but I want to use it from serial Python code:
$ python serial_source.py
but inside serial_source.py there is some function called parallel_bar:
from foo import parallel_bar
# Can I do this with mpi4py as if it were ordinary Python source code?
result = parallel_bar(num_proc = 5)
The motivation for this question is finding the right way to use mpi4py to optimize programs in Python that were not necessarily designed to be run completely in parallel.
This is indeed possible and is documented for mpi4py in the section Dynamic Process Management. What you need is the so-called Spawn functionality, which is not available with MSMPI (in case you are working on Windows); see also Spawn not implemented in MSMPI.
Example
The first file provides a kind of wrapper to your function to hide all the MPI stuff, which I guess is your intention. Internally it calls the "actual" script containing your parallel code in 4 newly spawned processes.
Finally, you can open a python terminal and call:
from my_prog import parallel_fun
parallel_fun()
# Hi from 0/4
# Hi from 3/4
# Hi from 1/4
# Hi from 2/4
# We got the magic number 6
my_prog.py
import sys
import numpy as np
from mpi4py import MPI

def parallel_fun():
    # spawn 4 child processes running child.py
    comm = MPI.COMM_SELF.Spawn(
        sys.executable,
        args=['child.py'],
        maxprocs=4)

    # collect the sum of the children's ranks
    N = np.array(0, dtype='i')
    comm.Reduce(None, [N, MPI.INT], op=MPI.SUM, root=MPI.ROOT)
    print(f'We got the magic number {N}')
Here the child file with the parallel code:
child.py
from mpi4py import MPI
import numpy as np

# connect to the parent communicator that spawned this process
comm = MPI.Comm.Get_parent()
print(f'Hi from {comm.Get_rank()}/{comm.Get_size()}')

# contribute this rank's number to a sum reduction on the parent
N = np.array(comm.Get_rank(), dtype='i')
comm.Reduce([N, MPI.INT], None, op=MPI.SUM, root=0)
Unfortunately I don't think this is possible as you have to run the MPI code specifically with mpirun.
The best you can do is the opposite where you write generic chunks of code which can be called either by an MPI process or a normal python process.
The only other solution is to wrap the whole MPI part of your code in an external call and invoke it with subprocess from your non-MPI code; however, this will be tied to your system configuration quite heavily and is not really that portable.
Subprocess is detailed in this thread, Using python with subprocess Popen, and is worth a look. The complexity here is making the correct call in the first place, i.e.
command = "/your/instance/of/mpirun /your/instance/of/python your_script.py -arguments"
And then you need to get the result back into your single-threaded code; depending on the size of the data there are many ways, but something like parallel HDF5 would be a good place to look if you have to pass back big array data.
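As a minimal sketch of that wrapper idea (the paths, script name and arguments are placeholders, not a tested recipe):

import subprocess

# run the MPI part of the program as an external job
command = ["mpirun", "-n", "4", "python", "your_script.py", "--arg", "value"]
result = subprocess.run(command, capture_output=True, text=True, check=True)

# whatever your_script.py printed is now available to the serial code
print(result.stdout)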
Sorry I can't give you an easy solution.
I used to use a nice Apple profiler that is built into the System Monitor application. As long as your C++ code was compiled with debug information, you could sample your running application and it would print out an indented tree telling you what percent of the parent function's time was spent in this function (and the body vs. other function calls).
For instance, if main calls function_1 and function_2, function_2 calls function_3, and then main calls function_3 directly:
main (100%, 1% in function body):
    function_1 (9%, 9% in function body):
    function_2 (90%, 85% in function body):
        function_3 (100%, 100% in function body)
    function_3 (1%, 1% in function body)
I would see this and think, "Something is taking a long time in the code in the body of function_2. If I want my program to be faster, that's where I should start."
How can I most easily get this exact profiling output for a Python program?
I've seen people say to do this:
import cProfile, pstats
prof = cProfile.Profile()
prof = prof.runctx("real_main(argv)", globals(), locals())
stats = pstats.Stats(prof)
stats.sort_stats("time") # Or cumulative
stats.print_stats(80) # 80 = how many to print
But it's quite messy compared to that elegant call tree. Please let me know if you can easily do this, it would help quite a bit.
I just stumbled on this as well and spent some time learning how to generate a call graph (the normal results of cProfile are not terribly informative). For future reference, here's another way to generate a beautiful call-tree graphic with cProfile + gprof2dot + GraphViz.
1. Install GraphViz: http://www.graphviz.org/Download_macos.php
2. easy_install gprof2dot
3. Run the profiler on your code:
python -m cProfile -o myLog.profile myScript.py arg1 arg2 ...
4. Run gprof2dot to convert the call profile into a dot file:
gprof2dot -f pstats myLog.profile -o callingGraph.dot
5. Open the result with GraphViz to visualize the graph.
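If you prefer a static image over an interactive viewer, GraphViz's dot tool can render the file directly:
dot -Tpng callingGraph.dot -o callingGraph.png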
Here's what the end result looks like: a call graph color-coded by time, where red means a higher concentration of time.
I recently wanted the same thing, so I took a stab at implementing one myself.
The project's on GitHub, https://github.com/joerick/pyinstrument
Here's how you would use it:
from pyinstrument import Profiler
profiler = Profiler()
profiler.start()
# Code you want to profile
profiler.stop()
print(profiler.output_text())
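pyinstrument can also be run without modifying your code at all, e.g. python -m pyinstrument myscript.py (the script name is a placeholder; check the project README for the options your version supports).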
The gprof2dot approach extracts all information nicely, so I'm a fan. However, sometimes I want to look at timing data in a call tree, so I created tuna.
Install with
pip install tuna
and display your profile with
tuna program.prof
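Here program.prof is a profile file created beforehand, e.g. with:
python -m cProfile -o program.prof yourprogram.py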
Check out this library http://pycallgraph.slowchop.com/ for call graphs. It works really well. If you want to profile specific functions, check out http://mg.pov.lt/blog/profiling.html
This is a result from the profilehooks module.