So far, when I want to inspect what might make one piece of code run faster than a very similar alternative, I use the dis module. Beyond that, though, my comparison process is basically adding/removing lines.
Is there a more sophisticated way of actually listing what the high-offenders are?
What kind of code do you want to analyze? If it is pure Python code, you can use cProfile. For example:
import cProfile
cProfile.run("x=1")
Or you can run a function: cProfile.run("function()")
Then it will show you something like the following:
4 function calls in 0.013 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.013 0.013 0.013 0.013 <ipython-input-7-8201fb940887>:1(fun)
1 0.000 0.000 0.013 0.013 <string>:1(<module>)
1 0.000 0.000 0.013 0.013 {built-in method builtins.exec}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
I ran this code with Python 3.7 to see what happens when I call import numpy.
import cProfile, pstats
profiler = cProfile.Profile()
profiler.enable()
import numpy
profiler.disable()
# Get and print table of stats
stats = pstats.Stats(profiler).sort_stats('time')
stats.print_stats()
The first few lines of output look like this:
79557 function calls (76496 primitive calls) in 0.120 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
32/30 0.015 0.000 0.017 0.001 {built-in method _imp.create_dynamic}
318 0.015 0.000 0.015 0.000 {built-in method builtins.compile}
115 0.011 0.000 0.011 0.000 {built-in method marshal.loads}
648 0.006 0.000 0.006 0.000 {built-in method posix.stat}
119 0.004 0.000 0.005 0.000 <frozen importlib._bootstrap_external>:914(get_data)
246/244 0.004 0.000 0.007 0.000 {built-in method builtins.__build_class__}
329 0.002 0.000 0.012 0.000 <frozen importlib._bootstrap_external>:1356(find_spec)
59 0.002 0.000 0.002 0.000 {built-in method posix.getcwd}
It spends a lot of time on builtins.compile. Is it creating the bytecode for NumPy for __pycache__? Why would that happen on every import?
I'm on macOS. What I really want is to speed up the import, and it seems to me that compile should not be necessary.
User L3viathan pointed out in a comment that the code for numpy contains explicit calls to compile. This explains why builtins.compile is getting called. Thanks!
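To confirm where calls like builtins.compile come from, pstats can restrict its report to entries matching a regex and list each entry's callers. A sketch using an explicit compile() call as stand-in work, since re-importing numpy in-process would just hit the module cache:

```python
import cProfile, pstats

profiler = cProfile.Profile()
profiler.enable()
# stand-in work: an explicit compile() call, like the ones inside numpy
code = compile("1 + 1", "<sketch>", "eval")
eval(code)
profiler.disable()

stats = pstats.Stats(profiler).sort_stats('time')
stats.print_stats('compile')    # only entries matching the regex 'compile'
stats.print_callers('compile')  # which functions called them
```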
I am developing a program that uses pandas dataframes and large dictionaries. The dataframe is read from a CSV that is approx. 700MB.
I am using Python 3.7.3 on Windows
I noticed that the program I am running is extremely slow, and slows down after each loop of the algorithm.
The program reads every line of the dataframe, checks some conditions on every item of every line of the df, and if those conditions are met, it stores the item and its state in a dictionary. This dictionary can get pretty big.
I have tried profiling my code with cProfile and found that the garbage collector is the function that uses up about 90% of the execution time.
I have seen similar problems resolved by calling gc.disable() but this did nothing for me.
Weirdly (I have no idea whether this is normal), if I print(len(gc.get_objects())) as the first line of the code I get 51053, which seems like a lot considering no function has been called yet.
My cProfile attempt (on a small chunk of the CSV, as it would take hours to complete on the full CSV):
cProfile.run('get_pfs_errors("Logs/L5/L5_2000.csv")', 'restats.txt')
import pstats
from pstats import SortKey
p = pstats.Stats('restats.txt')
p.sort_stats(SortKey.CUMULATIVE).print_stats(10)
p.sort_stats(SortKey.TIME).print_stats(10)
Here are the stats from cProfile:
Tue Jun 18 15:40:19 2019 restats.txt
1719320 function calls (1459451 primitive calls) in 7.569 seconds
Ordered by: cumulative time
List reduced from 819 to 10 due to restriction <10>
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 7.569 7.569 {built-in method builtins.exec}
1 0.001 0.001 7.569 7.569 <string>:1(<module>)
1 0.000 0.000 7.568 7.568 C:/Users/BC744818/Documents/OPTISS_L1_5/test_profile.py:6(get_pfs_errors)
1 0.006 0.006 7.503 7.503 C:\Users\BC744818\Documents\OPTISS_L1_5\utils\compute_pfs_rules.py:416(compute_pfs_rules)
1 0.197 0.197 7.498 7.498 C:\Users\BC744818\Documents\OPTISS_L1_5\utils\compute_pfs_rules.py:323(test_logs)
264 0.001 0.000 6.532 0.025 C:\Users\BC744818\Documents\OPTISS_L1_5\venv\lib\site-packages\pandas\core\series.py:982(__setitem__)
529 0.010 0.000 6.158 0.012 C:\Users\BC744818\Documents\OPTISS_L1_5\venv\lib\site-packages\pandas\core\generic.py:3205(_check_setitem_copy)
528 6.125 0.012 6.125 0.012 {built-in method gc.collect}
264 0.004 0.000 3.430 0.013 C:\Users\BC744818\Documents\OPTISS_L1_5\venv\lib\site-packages\pandas\core\series.py:985(setitem)
264 0.004 0.000 3.413 0.013 C:\Users\BC744818\Documents\OPTISS_L1_5\venv\lib\site-packages\pandas\core\indexing.py:183(__setitem__)
Tue Jun 18 15:40:19 2019 restats.txt
1719320 function calls (1459451 primitive calls) in 7.569 seconds
Ordered by: internal time
List reduced from 819 to 10 due to restriction <10>
ncalls tottime percall cumtime percall filename:lineno(function)
528 6.125 0.012 6.125 0.012 {built-in method gc.collect}
264 0.405 0.002 0.405 0.002 {built-in method gc.get_objects}
1 0.197 0.197 7.498 7.498 C:\Users\BC744818\Documents\OPTISS_L1_5\utils\compute_pfs_rules.py:323(test_logs)
71280/33 0.048 0.000 0.091 0.003 C:\Users\BC744818\AppData\Local\Programs\Python\Python37\lib\copy.py:132(deepcopy)
159671 0.033 0.000 0.056 0.000 {built-in method builtins.isinstance}
289 0.026 0.000 0.026 0.000 {built-in method nt.stat}
167191/83791 0.024 0.000 0.040 0.000 C:\Users\BC744818\AppData\Local\Programs\Python\Python37\lib\json\encoder.py:333(_iterencode_dict)
8118/33 0.019 0.000 0.090 0.003 C:\Users\BC744818\AppData\Local\Programs\Python\Python37\lib\copy.py:236(_deepcopy_dict)
167263/83794 0.017 0.000 0.048 0.000 C:\Users\BC744818\AppData\Local\Programs\Python\Python37\lib\json\encoder.py:277(_iterencode_list)
1067/800 0.017 0.000 0.111 0.000 C:\Users\BC744818\Documents\OPTISS_L1_5\venv\lib\site-packages\pandas\core\indexes\base.py:253(__new__)
Thank you @user9993950, I solved it thanks to you.
When I tested this program, I had a SettingWithCopyWarning, but I wanted to fix the speed of the program before fixing the warning.
Yet it so happens that by fixing the warning I also greatly increased the speed of the program, and gc is no longer taking up all of the running time.
I don't know what caused this, though; if someone knows and wants to share the knowledge, please do.
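Judging by the profile above, the gc.collect calls appear to come from pandas' _check_setitem_copy, which runs on chained assignment, the pattern behind SettingWithCopyWarning. A minimal sketch of the usual fix, with a made-up tiny frame:

```python
import pandas as pd

df = pd.DataFrame({"item": ["a", "b"], "state": [0, 0]})

# chained assignment: triggers SettingWithCopyWarning and the
# internal copy check (which is where gc.collect gets called)
# df["state"][0] = 1

# single .loc assignment: one indexing call, no chained-copy check
df.loc[0, "state"] = 1

# if the frame was sliced from another frame, take an explicit copy first
sub = df[df["state"] == 1].copy()
sub["state"] = 2
```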
I can't understand what the {built-in method load} output in cProfile means.
I know that question has already been asked, but I saw no real answer, and I couldn't find out by myself either.
The executed script (Go_IA_vs_IA.py) imports functions from Go_settings.py and uses them. You can find them here if it is of any use to you: https://github.com/Ashargin/Go
Here is what I obtain :
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
25 3.793 0.152 3.793 0.152 {built-in method load}
1 0.071 0.071 3.938 3.938 Go_IA_vs_IA.py:1(<module>)
28481/205 0.027 0.000 0.061 0.000 copy.py:137(deepcopy)
5930/918 0.010 0.000 0.046 0.000 copy.py:215(_deepcopy_list)
... more lines
Obviously I want to optimize that line, since it accounts for 96% of the time spent.
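An opaque entry like {built-in method load} only gives the C function's name; one way to narrow it down is to ask pstats which Python functions call it. A sketch using marshal.loads as a stand-in, since I can't run the linked script here and the names are illustrative:

```python
import cProfile, pstats, marshal

def work():
    # stand-in for whatever ends up calling the opaque built-in
    data = marshal.dumps([1, 2, 3])
    return marshal.loads(data)

profiler = cProfile.Profile()
profiler.runcall(work)

stats = pstats.Stats(profiler)
# list the callers of every entry whose name matches 'loads'
stats.print_callers('loads')
```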
So, I have a script with a lot of debugging output that I can toggle on/off with a -v flag. My current code looks like this:
def vprint(obj):
    if args.verbose:
        print obj
However, I'm thinking this is inefficient, since every time I call vprint() it has to jump into that function and check the value of args.verbose. I came up with this, which should be slightly more efficient:
if args.verbose:
    def vprint(obj):
        print obj
else:
    def vprint(obj):
        pass
While the if is now removed, it still has to jump into that function. So I was wondering whether there is a way to define vprint as something like a function pointer that goes nowhere, so it could skip the call altogether. Or is Python smart enough to know not to waste time on a function that's just pass?
Unless your performance analysis has led you here, it's probably not worth optimizing. A quick set of tests yields a minor improvement (0.040 s over 1,000,000 iterations):
1000004 function calls in 0.424 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 0.424 0.424 <string>:1(<module>)
1 0.242 0.242 0.424 0.424 test.py:14(testit)
1 0.000 0.000 0.424 0.424 test.py:21(testit1)
1000000 0.182 0.000 0.182 0.000 test.py:6(vprint)
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
1000004 function calls in 0.408 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 0.408 0.408 <string>:1(<module>)
1000000 0.142 0.000 0.142 0.000 test.py:10(vprint2)
1 0.266 0.266 0.408 0.408 test.py:14(testit)
1 0.000 0.000 0.408 0.408 test.py:18(testit2)
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
Test code follows:
#!/usr/bin/python
import cProfile

verbose = False

def vprint(msg):
    if verbose:
        print msg

def vprint2(msg):
    pass

def testit(fcn):
    for i in xrange(1000000):
        fcn(i)

def testit2():
    testit(vprint2)

def testit1():
    testit(vprint)

if __name__ == '__main__':
    cProfile.run('testit1()')
    cProfile.run('testit2()')
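Regarding the "function pointer that goes nowhere" idea: Python lets you rebind the name once at startup, so the verbosity check is paid a single time rather than on every call. A Python 3 sketch, with a plain verbose flag standing in for args.verbose:

```python
verbose = False

# bind vprint once; calls still pay the function-call cost,
# but the quiet branch contains no if check at all
vprint = print if verbose else (lambda *args, **kwargs: None)

vprint("this only appears when verbose is True")
```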
I am trying to optimize some code with Cython, but cProfile is not providing enough information.
To do a good job at profiling, should I create many sub-routines func2, func3, ..., func40?
Note below that I have a function func1 in mycython.pyx, but it has many for loops and internal manipulations, and cProfile does not report stats for those loops.
2009 function calls in 81.254 CPU seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 81.254 81.254 <string>:1(<module>)
2 0.000 0.000 0.021 0.010 blah.py:1495(len)
2000 0.000 0.000 0.000 0.000 blah.py:1498(__getitem__)
1 0.214 0.214 0.214 0.214 mycython.pyx:718(func2)
1 80.981 80.981 81.216 81.216 mycython.pyx:743(func1)
1 0.038 0.038 81.254 81.254 {mycython.func1}
2 0.021 0.010 0.021 0.010 {len}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
Yes, you should. The finest granularity available to cProfile is a function call, so you must split func1 into multiple functions. (Note that you can define them inside func1, so they are only available to func1.)
If you want finer-grained profiling (line-level), then you need a different profiler. Take a look at this line-level profiler, but I don't think it works for Cython.
You need to enable profiling support for your Cython code by putting this directive at the top of your .pyx file:
# cython: profile=True
http://docs.cython.org/src/tutorial/profiling_tutorial.html
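Concretely, the directive must be the very first line of the .pyx file, and the module has to be rebuilt afterwards; a sketch, assuming the setup from the question:

```
# cython: profile=True
# ^ first line of mycython.pyx, before any code; then rebuild,
#   e.g. with:  cythonize -i mycython.pyx
# After rebuilding, func1, func2, ... get real timing lines in
# cProfile output, and splitting func1 into helpers gives each
# loop its own line.
```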