I have tried using the line_profiler module to get a line-by-line profile of a Python file. This is what I've done so far:
1) Installed line_profiler from PyPI using the .exe installer (I am on WinXP and Win7). Just clicked through the installation wizard.
2) Written a small piece of code (similar to what has been asked in another answered question here).
from line_profiler import LineProfiler

def do_stuff(numbers):
    print numbers

numbers = 2
profile = LineProfiler(do_stuff(numbers))
profile.print_stats()
3) Run the code from IDLE/PyScripter. All I got was the timer unit line:
Timer unit: 4.17188e-10 s
How do I get a full line-by-line profile of the code I execute? I have never used any advanced Python features like decorators, so it is hard for me to understand how I should apply the guidelines provided by several posts like here and here.
This answer is a copy of my answer here for how to get line_profiler statistics from within a Python script (without using kernprof from the command line or having to add #profile decorators to functions and class methods). All answers (that I've seen) to similar line_profiler questions only describe using kernprof.
The line_profiler test cases (found on GitHub) have an example of how to generate profile data from within a Python script. You have to wrap the function that you want to profile and then call the wrapper passing any desired function arguments.
from line_profiler import LineProfiler
import random

def do_stuff(numbers):
    s = sum(numbers)
    l = [numbers[i]/43 for i in range(len(numbers))]
    m = ['hello'+str(numbers[i]) for i in range(len(numbers))]

numbers = [random.randint(1,100) for i in range(1000)]
lp = LineProfiler()
lp_wrapper = lp(do_stuff)
lp_wrapper(numbers)
lp.print_stats()
Output:
Timer unit: 1e-06 s
Total time: 0.000649 s
File: <ipython-input-2-2e060b054fea>
Function: do_stuff at line 4
Line # Hits Time Per Hit % Time Line Contents
==============================================================
4 def do_stuff(numbers):
5 1 10 10.0 1.5 s = sum(numbers)
6 1 186 186.0 28.7 l = [numbers[i]/43 for i in range(len(numbers))]
7 1 453 453.0 69.8 m = ['hello'+str(numbers[i]) for i in range(len(numbers))]
Adding Additional Functions to Profile
You can also add more functions to be profiled. For example, if a second function is called from the one you wrap and you only wrap the calling function, you'll only see profile results for the calling function.
from line_profiler import LineProfiler
import random

def do_other_stuff(numbers):
    s = sum(numbers)

def do_stuff(numbers):
    do_other_stuff(numbers)
    l = [numbers[i]/43 for i in range(len(numbers))]
    m = ['hello'+str(numbers[i]) for i in range(len(numbers))]

numbers = [random.randint(1,100) for i in range(1000)]
lp = LineProfiler()
lp_wrapper = lp(do_stuff)
lp_wrapper(numbers)
lp.print_stats()
The above would only produce the following profile output for the calling function:
Timer unit: 1e-06 s
Total time: 0.000773 s
File: <ipython-input-3-ec0394d0a501>
Function: do_stuff at line 7
Line # Hits Time Per Hit % Time Line Contents
==============================================================
7 def do_stuff(numbers):
8 1 11 11.0 1.4 do_other_stuff(numbers)
9 1 236 236.0 30.5 l = [numbers[i]/43 for i in range(len(numbers))]
10 1 526 526.0 68.0 m = ['hello'+str(numbers[i]) for i in range(len(numbers))]
In this case, you can add the additional called function to profile like this:
from line_profiler import LineProfiler
import random

def do_other_stuff(numbers):
    s = sum(numbers)

def do_stuff(numbers):
    do_other_stuff(numbers)
    l = [numbers[i]/43 for i in range(len(numbers))]
    m = ['hello'+str(numbers[i]) for i in range(len(numbers))]

numbers = [random.randint(1,100) for i in range(1000)]
lp = LineProfiler()
lp.add_function(do_other_stuff)  # add additional function to profile
lp_wrapper = lp(do_stuff)
lp_wrapper(numbers)
lp.print_stats()
Output:
Timer unit: 1e-06 s
Total time: 9e-06 s
File: <ipython-input-4-dae73707787c>
Function: do_other_stuff at line 4
Line # Hits Time Per Hit % Time Line Contents
==============================================================
4 def do_other_stuff(numbers):
5 1 9 9.0 100.0 s = sum(numbers)
Total time: 0.000694 s
File: <ipython-input-4-dae73707787c>
Function: do_stuff at line 7
Line # Hits Time Per Hit % Time Line Contents
==============================================================
7 def do_stuff(numbers):
8 1 12 12.0 1.7 do_other_stuff(numbers)
9 1 208 208.0 30.0 l = [numbers[i]/43 for i in range(len(numbers))]
10 1 474 474.0 68.3 m = ['hello'+str(numbers[i]) for i in range(len(numbers))]
NOTE: Adding functions to profile in this way does not require changes to the profiled code (i.e., no need to add @profile decorators).
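As a hedged aside (the class and method names here are made up for illustration), the same add_function approach works for class methods, since in Python 3 a method looked up on the class is a plain function:

from line_profiler import LineProfiler

class Cruncher:
    def helper(self, numbers):
        return sum(numbers)

    def run(self, numbers):
        return self.helper(numbers) + max(numbers)

c = Cruncher()
lp = LineProfiler()
lp.add_function(Cruncher.helper)  # profile the called method too
lp_wrapper = lp(c.run)            # wrap the entry point, as above
lp_wrapper(list(range(1, 1001)))
lp.print_stats()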
Just follow Dan Riti's example from the first link, but use your code. All you have to do after installing the line_profiler module is add a @profile decorator right before each function you wish to profile line-by-line, and make sure each one is called at least once somewhere else in the code. For your trivial example code, that would be something like this:
example.py file:
@profile
def do_stuff(numbers):
    print numbers

numbers = 2
do_stuff(numbers)
Having done that, run your script via the kernprof.py✶ that was installed in your C:\Python27\Scripts directory. Here's the (not very interesting) actual output from doing this in a Windows 7 command-line session:
> python "C:\Python27\Scripts\kernprof.py" -l -v example.py
2
Wrote profile results to example.py.lprof
Timer unit: 3.2079e-07 s
File: example.py
Function: do_stuff at line 2
Total time: 0.00185256 s
Line # Hits Time Per Hit % Time Line Contents
==============================================================
1 @profile
2 def do_stuff(numbers):
3 1 5775 5775.0 100.0 print numbers
You likely need to adapt this last step—the running of your test script with kernprof.py instead of directly by the Python interpreter—in order to do the equivalent from within IDLE or PyScripter.
✶Update
It appears that in line_profiler v1.0, the kernprof utility is distributed as an executable, not as a .py script file as it was when I wrote the above. This means the following now needs to be used to invoke it from the command line:
> "C:\Python27\Scripts\kernprof.exe" -l -v example.py
Load the line_profiler extension and numpy:
%load_ext line_profiler
import numpy as np
Define a function, for example:
def take_sqr(array):
    sqr_ar = [np.sqrt(x) for x in array]
    return sqr_ar
Use line_profiler to time it as follows:
%lprun -f take_sqr take_sqr([1,2,3])
The output looks like this:
Timer unit: 1e-06 s
Total time: 6e-05 s
File: <ipython-input-5-e50c1b05a473>
Function: take_sqr at line 1
Line # Hits Time Per Hit % Time Line Contents
==============================================================
1 def take_sqr(array):
2 4 59.0 14.8 98.3 sqr_ar = [np.sqrt(x) for x in array]
3 1 1.0 1.0 1.7 return sqr_ar
Found a good use of line_profiler via a decorator, i.e. @profile, that worked for me:
def profile(func):
    from functools import wraps

    @wraps(func)
    def wrapper(*args, **kwargs):
        from line_profiler import LineProfiler
        prof = LineProfiler()
        try:
            return prof(func)(*args, **kwargs)
        finally:
            prof.print_stats()
    return wrapper
Credits to: pavelpatrin
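A minimal usage sketch (the function name is made up): decorate any function and call it; the line-by-line stats print when the call returns, even if it raises, thanks to the try/finally:

@profile
def crunch(n):
    return sum(i * i for i in range(n))

crunch(100000)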
If you're using PyCharm, you can also take a look at
https://plugins.jetbrains.com/plugin/16536-line-profiler
It's a plugin I created that allows you to load and visualize line profiler results into the PyCharm editor.
Just an addition to @Lhenkel's answer.
This is a decorator for async functions:
def async_profile(func):
    """Line profiler for an async function."""
    from functools import wraps

    @wraps(func)
    async def wrapper(*args, **kwargs):
        from line_profiler import LineProfiler
        prof = LineProfiler()
        try:
            return await prof(func)(*args, **kwargs)
        finally:
            prof.print_stats()
    return wrapper
To use these decorators with methods read this answer
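A minimal usage sketch under the same assumptions (the coroutine name is made up, and this assumes a line_profiler version that can wrap coroutine functions); the stats print when the awaited call finishes:

import asyncio

@async_profile
async def crunch(n):
    total = 0
    for i in range(n):
        total += i
        await asyncio.sleep(0)  # yield to the event loop each iteration
    return total

asyncio.run(crunch(1000))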
If I import numpy in a single process, it takes approximately 0.0749 seconds:
python -c "import time; s=time.time(); import numpy; print(time.time() - s)"
Now if I run the same code in multiple Processes, each import is significantly slower:
import subprocess

cmd = 'python -c "import time; s=time.time(); import numpy; print(time.time() - s)"'
for n in range(5):
    m = 2**n
    print(f"Importing numpy on {m} Process(es):")
    processes = []
    for i in range(m):
        processes.append(subprocess.Popen(cmd, shell=True))
    for p in processes:
        p.wait()
    print()
gives the output:
Importing numpy on 1 Process(es):
0.07726049423217773
Importing numpy on 2 Process(es):
0.110260009765625
0.11645245552062988
Importing numpy on 4 Process(es):
0.13133740425109863
0.1264667510986328
0.13683867454528809
0.153900146484375
Importing numpy on 8 Process(es):
0.13650751113891602
0.15682148933410645
0.17088770866394043
0.1705784797668457
0.1690073013305664
0.18076491355895996
0.18901371955871582
0.18936467170715332
Importing numpy on 16 Process(es):
0.24082279205322266
0.24885773658752441
0.25356197357177734
0.27071142196655273
0.29327893257141113
0.2999141216278076
0.297823429107666
0.31664466857910156
0.20108580589294434
0.33217334747314453
0.24672770500183105
0.34597229957580566
0.24964046478271484
0.3546409606933594
0.26511287689208984
0.2684178352355957
The import time per Process seems to grow almost linearly with the number of Processes (especially as the number of Processes grows large), so it seems we spend a total of about O(n^2) time on importing. I know there is an import lock, but I'm not sure why it is there. Are there any workarounds? And if I work on a server with many users running many tasks, could I be slowed down by someone spawning tons of workers that just import common packages?
The pattern is clearer for larger n. Here's a script that shows it more plainly by reporting just the average import time for n workers:
import multiprocessing
import time

def f(x):
    s = time.time()
    import numpy as np
    return time.time() - s

for n in range(10):
    m = 2**n
    with multiprocessing.Pool(m) as p:
        print(f"importing with {m} worker(s): {sum(p.map(f, range(m)))/m}")
output:
importing with 1 worker(s): 0.06654548645019531
importing with 2 worker(s): 0.11186492443084717
importing with 4 worker(s): 0.11750376224517822
importing with 8 worker(s): 0.14901494979858398
importing with 16 worker(s): 0.20824094116687775
importing with 32 worker(s): 0.32718323171138763
importing with 64 worker(s): 0.5660803504288197
importing with 128 worker(s): 1.034045523032546
importing with 256 worker(s): 1.8989756992086768
importing with 512 worker(s): 3.558808562345803
Extra details about the environment in which I ran this:
python version: 3.8.6
pip list:
Package Version
---------- -------
numpy 1.20.1
pip 21.0.1
setuptools 53.0.0
wheel 0.36.2
os:
NAME="Pop!_OS"
VERSION="20.10"
Is it just reading from the filesystem that is the problem?
I've added this simple test where instead of importing, I now just read the numpy files and do some sanity check calculations:
import subprocess

cmd = 'python read_numpy.py'
for n in range(5):
    m = 2**n
    print(f"Running on {m} Process(es):")
    processes = []
    for i in range(m):
        processes.append(subprocess.Popen(cmd, shell=True))
    for p in processes:
        p.wait()
    print()
with read_numpy.py:
import os
import time

file_path = "/home/.virtualenvs/multiprocessing-import/lib/python3.8/site-packages/numpy"
t1 = time.time()
parity = 0
for root, dirs, filenames in os.walk(file_path):
    for name in filenames:
        contents = open(os.path.join(root, name), "rb").read()
        parity = (parity + sum([x%2 for x in contents]))%2
print(parity, time.time() - t1)
Running this gives me the following output:
Running on 1 Process(es):
1 0.8050086498260498
Running on 2 Process(es):
1 0.8164374828338623
1 0.8973987102508545
Running on 4 Process(es):
1 0.8233649730682373
1 0.81931471824646
1 0.8731539249420166
1 0.8883578777313232
Running on 8 Process(es):
1 0.9382946491241455
1 0.9511561393737793
1 0.9752676486968994
1 1.0584545135498047
1 1.1573944091796875
1 1.163221836090088
1 1.1602907180786133
1 1.219961166381836
Running on 16 Process(es):
1 1.337137222290039
1 1.3456192016601562
1 1.3102262020111084
1 1.527071475982666
1 1.5436983108520508
1 1.651414394378662
1 1.656200647354126
1 1.6047494411468506
1 1.6851506233215332
1 1.6949374675750732
1 1.744239330291748
1 1.798882246017456
1 1.8150532245635986
1 1.8266475200653076
1 1.769331455230713
1 1.8609044551849365
There is some slowdown: 0.805 seconds for 1 worker, and between 0.819 and 0.888 seconds for 4 workers. Compare that to import: 0.07 seconds for 1 worker, and between 0.126 and 0.153 seconds for 4 workers. It seems like something other than filesystem reads is slowing down import.
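One workaround I can think of (a sketch, assuming the fork start method, the default on Linux): import numpy in the parent before creating the workers, so each child inherits the already-initialized module and its own import becomes a cheap sys.modules lookup:

import multiprocessing
import time

import numpy  # imported once in the parent, before forking

def f(x):
    s = time.time()
    import numpy as np  # already in sys.modules in a forked child: near-instant
    return time.time() - s

if __name__ == "__main__":
    multiprocessing.set_start_method("fork")
    with multiprocessing.Pool(16) as p:
        print(max(p.map(f, range(16))))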
I know that pdb is an interactive system and it is very helpful.
My ultimate goal is to gather all memory states after executing each command in a certain function, command by command. For example, with a code snippet
0: def foo():
1:     if True:
2:         x = 1
3:     else:
4:         x = 2
5:     x
then the memory state of each command is
0: empty
1: empty
2: x = 1
3: x = 1
4: (not taken)
5: x = 1
To do this, what I'd like to do with pdb is write a script that interacts with the pdb class. I know that s is the command to step forward through statements and print var (in the above case, var is x) prints the value of a certain variable, so I can gather the variables at each command. Then, I want to run a script like the one below:
import pdb

pdb.run('foo()')
while not pdb.end():
    pdb.s()
    pdb.print('x')
But I cannot find any way to implement this functionality. Can anybody help me?
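The closest I've come is a sketch using sys.settrace directly (the hook pdb itself is built on), which prints each line number and the local variables before that line runs:

import sys

def show_locals(frame, event, arg):
    if event == 'line':
        # report the state *before* this line executes
        print(frame.f_lineno, dict(frame.f_locals))
    return show_locals

def foo():
    if True:
        x = 1
    else:
        x = 2
    x

sys.settrace(show_locals)
foo()
sys.settrace(None)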
Try memory_profiler:
The line-by-line memory usage mode is used much in the same way of the
line_profiler: first decorate the function you would like to profile
with @profile and then run the script with a special script (in this
case with specific arguments to the Python interpreter).
Line # Mem usage Increment Line Contents
==============================================
3 @profile
4 5.97 MB 0.00 MB def my_func():
5 13.61 MB 7.64 MB a = [1] * (10 ** 6)
6 166.20 MB 152.59 MB b = [2] * (2 * 10 ** 7)
7 13.61 MB -152.59 MB del b
8 13.61 MB 0.00 MB return a
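For concreteness, a minimal sketch of that workflow (the script name is made up): put the decorated function in a file and run it through the memory_profiler module:

# my_func.py
from memory_profiler import profile

@profile
def my_func():
    a = [1] * (10 ** 6)
    b = [2] * (2 * 10 ** 7)
    del b
    return a

if __name__ == '__main__':
    my_func()

Then run it with python -m memory_profiler my_func.py to get a table like the one above.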
Or Heapy:
The aim of Heapy is to support debugging and optimization regarding
memory related issues in Python programs.
Partition of a set of 132527 objects. Total size = 8301532 bytes.
Index Count % Size % Cumulative % Kind (class / dict of class)
0 35144 27 2140412 26 2140412 26 str
1 38397 29 1309020 16 3449432 42 tuple
2 530 0 739856 9 4189288 50 dict (no owner)
I am new to the line_profiler package in Python. Am I reading the result incorrectly, or shouldn't the components in the output below add up to 1.67554 seconds? Instead, they add up to 3.918 seconds (2426873 microseconds + 1491105 microseconds). Thanks!
# test.py
import numpy as np

def tf():
    arr = np.random.randn(3000, 6000)
    np.where(arr > 1, arr, np.nan)
import test
%lprun -f test.tf test.tf()
Timer unit: 4.27654e-07 s
File: test.py
Function: tf at line 9
Total time: 1.67554 s
Line # Hits Time Per Hit % Time Line Contents
==============================================================
9 def tf():
10 1 2426873 2426873.0 61.9 arr = np.random.randn(3000,6000)
11 1 1491105 1491105.0 38.1 np.where(arr>1,arr,np.nan)
You misread the time there; those are not microseconds.
From the documentation:
Time: The total amount of time spent executing the line in the timer's units. In the header information before the tables, you will see a line "Timer unit:" giving the conversion factor to seconds. It may be different on different systems.
Emphasis mine. Your output shows each timer unit is about 0.428 microseconds. The totals match if you multiply the reported times by the timer unit:
>>> unit = 4.27654e-07
>>> 2426873 * unit + 1491105 * unit
1.675538963612
Please excuse this naive question of mine. I am trying to monitor memory usage of my python code, and have come across the promising memory_profiler package. I have a question about interpreting the output generated by #profile decorator.
Here is a sample output that I get by running my dummy code below:
dummy.py
from memory_profiler import profile

@profile
def my_func():
    a = [1] * (10 ** 6)
    b = [2] * (2 * 10 ** 7)
    del b
    return a

if __name__ == '__main__':
    my_func()
Running dummy.py with "python dummy.py" returns the table below.
Line #    Mem usage    Increment   Line Contents
================================================
     3      8.2 MiB      0.0 MiB   @profile
     4                             def my_func():
     5     15.8 MiB      7.6 MiB       a = [1] * (10 ** 6)
     6    168.4 MiB    152.6 MiB       b = [2] * (2 * 10 ** 7)
     7     15.8 MiB   -152.6 MiB       del b
     8     15.8 MiB      0.0 MiB       return a
My question is: what does the 8.2 MiB in the first line of the table correspond to? My guess is that it is the initial memory usage by the Python interpreter itself, but I am not sure. If that is the case, is there a way to have this baseline usage automatically subtracted from the memory usage of the script?
Many thanks for your time and consideration!
Noushin
According to the docs:
The first column represents the line number of the code that has been profiled, the second column (Mem usage) the memory usage of the Python interpreter after that line has been executed. The third column (Increment) represents the difference in memory of the current line with respect to the last one.
So, that 8.2 MiB is the memory usage after the first line has been executed. That includes the memory needed to start up Python, load your script and all of its imports (including memory_profiler itself), and so on.
There don't appear to be any documented options for removing that from each entry. But it wouldn't be too hard to post-process the results.
Alternatively, do you really need to do that? The third column shows how much additional memory has been used after each line, and either that, or the sum of that across a range of lines, seems more interesting than the difference between each line's second column and the start.
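If you do want the baseline removed, here is a hedged post-processing sketch (it assumes the column layout shown above) that reads the profiler's table on stdin and subtracts the first row's Mem usage from every row:

import re
import sys

row = re.compile(r"\s*(\d+)\s+([\d.]+) MiB\s+(-?[\d.]+) MiB(.*)")
baseline = None
for line in sys.stdin:
    m = row.match(line)
    if not m:
        print(line, end='')  # headers and code-only rows pass through
        continue
    lineno, mem, inc, rest = m.groups()
    mem = float(mem)
    if baseline is None:
        baseline = mem  # first profiled row becomes the zero point
    print("%6s %10.1f MiB %10.1f MiB%s" % (lineno, mem - baseline, float(inc), rest))

Usage would be something like python dummy.py | python subtract_baseline.py (the filter's filename is made up).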
The difference in memory between lines is given in the second column or you could write a small script to process the output.
Can somebody help me figure out how much time and how much memory a piece of Python code takes?
Use this for calculating time:
import time

time_start = time.clock()
# run your code
time_elapsed = (time.clock() - time_start)
As referenced by the Python documentation:
time.clock()
On Unix, return the current processor time as a floating
point number expressed in seconds. The precision, and in fact the very
definition of the meaning of “processor time”, depends on that of the
C function of the same name, but in any case, this is the function to
use for benchmarking Python or timing algorithms.
On Windows, this function returns wall-clock seconds elapsed since the
first call to this function, as a floating point number, based on the
Win32 function QueryPerformanceCounter(). The resolution is typically
better than one microsecond.
Reference: http://docs.python.org/library/time.html
Use this for calculating memory:
import resource
resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
Reference: http://docs.python.org/library/resource.html
Based on @Daniel Li's answer, for cut & paste convenience and Python 3.x compatibility:
import time
import resource

time_start = time.perf_counter()
# insert code here ...
time_elapsed = (time.perf_counter() - time_start)
memMb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024.0 / 1024.0
print("%5.1f secs %5.1f MByte" % (time_elapsed, memMb))
Example:
2.3 secs 140.8 MByte
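One caveat worth hedging: ru_maxrss is reported in kilobytes on Linux but in bytes on macOS, so the double division above yields megabytes only on macOS. A platform-aware sketch:

import sys
import resource

rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
# kilobytes on Linux, bytes on macOS (darwin)
memMb = rss / (1024.0 * 1024.0) if sys.platform == 'darwin' else rss / 1024.0
print("%5.1f MByte" % memMb)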
There is a really good library called jackedCodeTimerPy for timing your code. You should then use the resource package that Daniel Li suggested.
jackedCodeTimerPy gives really good reports like
label min max mean total run count
------- ----------- ----------- ----------- ----------- -----------
imports 0.00283813 0.00283813 0.00283813 0.00283813 1
loop 5.96046e-06 1.50204e-05 6.71864e-06 0.000335932 50
I like how it gives you statistics on it and the number of times the timer is run.
It's simple to use. If I want to measure the time code takes in a for loop, I just do the following:
from jackedCodeTimerPY import JackedTiming

JTimer = JackedTiming()
for i in range(50):
    JTimer.start('loop')  # 'loop' is the name of the timer
    doSomethingHere = 'This is really useful!'
    JTimer.stop('loop')

print(JTimer.report())  # prints the timing report
You can also have multiple timers running at the same time.
JTimer.start('first timer')
JTimer.start('second timer')
do_something = 'amazing'
JTimer.stop('first timer')
do_something = 'else'
JTimer.stop('second timer')
print(JTimer.report()) # prints the timing report
There are more usage examples in the repo. Hope this helps.
https://github.com/BebeSparkelSparkel/jackedCodeTimerPY
Use a memory profiler like guppy
>>> from guppy import hpy; h=hpy()
>>> h.heap()
Partition of a set of 48477 objects. Total size = 3265516 bytes.
Index Count % Size % Cumulative % Kind (class / dict of class)
0 25773 53 1612820 49 1612820 49 str
1 11699 24 483960 15 2096780 64 tuple
2 174 0 241584 7 2338364 72 dict of module
3 3478 7 222592 7 2560956 78 types.CodeType
4 3296 7 184576 6 2745532 84 function
5 401 1 175112 5 2920644 89 dict of class
6 108 0 81888 3 3002532 92 dict (no owner)
7 114 0 79632 2 3082164 94 dict of type
8 117 0 51336 2 3133500 96 type
9 667 1 24012 1 3157512 97 __builtin__.wrapper_descriptor
<76 more rows. Type e.g. '_.more' to view.>
>>> h.iso(1,[],{})
Partition of a set of 3 objects. Total size = 176 bytes.
Index Count % Size % Cumulative % Kind (class / dict of class)
0 1 33 136 77 136 77 dict (no owner)
1 1 33 28 16 164 93 list
2 1 33 12 7 176 100 int
>>> x=[]
>>> h.iso(x).sp
0: h.Root.i0_modules['__main__'].__dict__['x']