Calculating computational time and memory for Python code

Can somebody help me figure out how much time and how much memory a piece of Python code takes to run?

Use this for calculating time (note: time.clock() was deprecated in Python 3.3 and removed in Python 3.8; on Python 3, use time.perf_counter() or time.process_time() instead):
import time
time_start = time.clock()
# run your code
time_elapsed = time.clock() - time_start
From the Python documentation:
time.clock()
On Unix, return the current processor time as a floating
point number expressed in seconds. The precision, and in fact the very
definition of the meaning of “processor time”, depends on that of the
C function of the same name, but in any case, this is the function to
use for benchmarking Python or timing algorithms.
On Windows, this function returns wall-clock seconds elapsed since the
first call to this function, as a floating point number, based on the
Win32 function QueryPerformanceCounter(). The resolution is typically
better than one microsecond.
Reference: http://docs.python.org/library/time.html
Use this for calculating memory:
import resource
resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
Reference: http://docs.python.org/library/resource.html
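One caveat worth adding: ru_maxrss is reported in kilobytes on Linux but in bytes on macOS, so a portable helper should normalize the units. A minimal sketch (the platform check is the only assumption here):
import resource
import sys

def peak_memory_mb():
    """Peak resident set size in MiB, normalized across platforms."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # Linux reports ru_maxrss in kilobytes, macOS ('darwin') in bytes.
    if sys.platform == 'darwin':
        return peak / (1024.0 * 1024.0)
    return peak / 1024.0

print('%5.1f MiB' % peak_memory_mb())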

Based on @Daniel Li's answer, for cut-and-paste convenience and Python 3.x compatibility:
import time
import resource

time_start = time.perf_counter()
# insert code here ...
time_elapsed = time.perf_counter() - time_start
# ru_maxrss is in bytes on macOS but in kilobytes on Linux,
# where a single /1024.0 already yields megabytes.
memMb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024.0 / 1024.0
print("%5.1f secs %5.1f MByte" % (time_elapsed, memMb))
Example:
2.3 secs 140.8 MByte

There is a really good library called jackedCodeTimerPy for timing your code. You can then use the resource package that Daniel Li suggested for measuring memory.
jackedCodeTimerPy gives really good reports like
label    min          max          mean         total        run count
-------  -----------  -----------  -----------  -----------  ---------
imports  0.00283813   0.00283813   0.00283813   0.00283813   1
loop     5.96046e-06  1.50204e-05  6.71864e-06  0.000335932  50
I like how it gives you statistics and the number of times each timer was run.
It's simple to use. If I want to measure the time the code in a for loop takes, I just do the following:
from jackedCodeTimerPY import JackedTiming
JTimer = JackedTiming()
for i in range(50):
    JTimer.start('loop')  # 'loop' is the name of the timer
    doSomethingHere = 'This is really useful!'
    JTimer.stop('loop')
print(JTimer.report())  # prints the timing report
You can also have multiple timers running at the same time.
JTimer.start('first timer')
JTimer.start('second timer')
do_something = 'amazing'
JTimer.stop('first timer')
do_something = 'else'
JTimer.stop('second timer')
print(JTimer.report()) # prints the timing report
There are more usage examples in the repo. Hope this helps.
https://github.com/BebeSparkelSparkel/jackedCodeTimerPY
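If you prefer to stay inside the standard library, a similar per-label min/max/mean report can be assembled from time.perf_counter() alone. A rough sketch (not part of jackedCodeTimerPy):
import time
from collections import defaultdict

class SimpleTiming:
    """Accumulates named timings and reports min/max/mean/total/count."""
    def __init__(self):
        self.samples = defaultdict(list)
        self._starts = {}
    def start(self, label):
        self._starts[label] = time.perf_counter()
    def stop(self, label):
        self.samples[label].append(time.perf_counter() - self._starts.pop(label))
    def report(self):
        rows = []
        for label, s in sorted(self.samples.items()):
            rows.append('%-8s min=%g max=%g mean=%g total=%g count=%d'
                        % (label, min(s), max(s), sum(s) / len(s), sum(s), len(s)))
        return '\n'.join(rows)

timer = SimpleTiming()
for i in range(50):
    timer.start('loop')
    do_something_here = 'This is really useful!'
    timer.stop('loop')
print(timer.report())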

Use a memory profiler like guppy
>>> from guppy import hpy; h=hpy()
>>> h.heap()
Partition of a set of 48477 objects. Total size = 3265516 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0  25773  53  1612820  49   1612820  49 str
     1  11699  24   483960  15   2096780  64 tuple
     2    174   0   241584   7   2338364  72 dict of module
     3   3478   7   222592   7   2560956  78 types.CodeType
     4   3296   7   184576   6   2745532  84 function
     5    401   1   175112   5   2920644  89 dict of class
     6    108   0    81888   3   3002532  92 dict (no owner)
     7    114   0    79632   2   3082164  94 dict of type
     8    117   0    51336   2   3133500  96 type
     9    667   1    24012   1   3157512  97 __builtin__.wrapper_descriptor
<76 more rows. Type e.g. '_.more' to view.>
>>> h.iso(1,[],{})
Partition of a set of 3 objects. Total size = 176 bytes.
 Index  Count   %  Size   % Cumulative   % Kind (class / dict of class)
     0      1  33   136  77        136  77 dict (no owner)
     1      1  33    28  16        164  93 list
     2      1  33    12   7        176 100 int
>>> x=[]
>>> h.iso(x).sp
0: h.Root.i0_modules['__main__'].__dict__['x']
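On Python 3, the standard library's tracemalloc module gives a similar per-allocation breakdown without any third-party install. A minimal sketch (the list comprehension is just a stand-in workload):
import tracemalloc

tracemalloc.start()

data = [str(i) for i in range(100000)]  # stand-in for your own code

current, peak = tracemalloc.get_traced_memory()
print('current: %.1f MiB, peak: %.1f MiB' % (current / 2**20, peak / 2**20))

# Top allocation sites, grouped by source line
snapshot = tracemalloc.take_snapshot()
for stat in snapshot.statistics('lineno')[:5]:
    print(stat)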

Related

Memory leak in python-igraph with induced_subgraph function?

I'm working with igraph in Python and ran into a problem when calling the induced_subgraph function a million times: memory consumption keeps growing.
import gc
import igraph
import resource

g = igraph.Graph(n=4, edges=[[0, 1], [0, 2], [0, 3]])
mem = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
for i in range(1, 1000001):
    igraph_list_v = [0, 1, 2, 3]
    s = g.induced_subgraph(igraph_list_v)
    del s
    if i % 100000 == 0:
        print(gc.collect())  # prints the zeros seen in the output below
        print(resource.getrusage(resource.RUSAGE_SELF).ru_maxrss - mem)
        mem = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
Output:
0
4752
0
4752
0
4752
0
4752
0
4488
0
4752
0
4732
0
4752
0
4752
0
4752
The zeros are the values returned by gc.collect(). Based on the output, induced_subgraph consumes about 4752 kilobytes per 100,000 calls.
The process's memory usage keeps growing with each checkpoint. This is a dummy case, but in practice I call induced_subgraph many millions of times and my script dies with a memory error. What is a possible workaround, and what is the actual reason for this behavior? Please keep in mind that I am at a beginner level, so keep the answer as simple as possible. Thank you for your help!
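This related question appears here without an accepted answer. Speaking generally (an assumption, not verified against igraph's internals): memory held by a C extension is often not returned to the OS even after gc.collect(), so one workaround is to run batches of calls in a short-lived worker process whose memory the OS reclaims when it exits. A sketch with multiprocessing:
from multiprocessing import Pool

import igraph

def run_batch(batch_size):
    # Build the graph inside the worker so everything it allocates
    # dies with the worker process.
    g = igraph.Graph(n=4, edges=[[0, 1], [0, 2], [0, 3]])
    for _ in range(batch_size):
        s = g.induced_subgraph([0, 1, 2, 3])
    return batch_size

if __name__ == '__main__':
    # maxtasksperchild=1 replaces the worker after every batch,
    # handing its memory back to the OS.
    with Pool(processes=1, maxtasksperchild=1) as pool:
        for done in pool.imap(run_batch, [100000] * 10):
            print('finished batch of', done)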

Guarantee that a C program starting at exactly the same time as a Python program will have the same integer time value

I am currently writing code involving piping data from a C program to a Python program. This requires that they both have exactly the same time value as an integer. My method of getting the time is:
time(0) for C
int(time.time()) for Python
However, I am getting inconsistencies in output leading me to believe that this is not resulting in the same value. The C program takes < 0.001s to run, while:
time ./cprog | python pythonprog.py
gives times typically looking like this:
real 0m0.043s
user 0m0.052s
sys 0m0.149s
Approximately one in every 5 runs results in the expected output. Can I make this more consistent?
Not a solution - but an explanation.
When starting Python (or another interpreted/VM language), there is usually a startup cost associated with reading and parsing the many modules it needs. Even a tiny Python program like print 5 performs a large number of I/O operations.
This startup cost delays the initial lookup of the current time.
strace output shows that invoking a Python script results in >200 open calls, around 100 (f)stat calls, >20 mmap calls, and so on:
strace -c python prog.py
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 31.34    0.000293           1       244       178 openat
  9.84    0.000092           1       100           fstat
  9.20    0.000086           1        90        60 stat
  8.98    0.000084           1        68           rt_sigaction
  7.81    0.000073           1        66           close
  2.14    0.000020           2         9           brk
  1.82    0.000017           9         2           munmap
  1.39    0.000013           1        26           mmap
  1.18    0.000011           2         5           lstat
...
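Beyond that explanation, one way to sidestep the startup race entirely (a suggestion, not part of the original answer) is to compute the timestamp once in the C producer and write it down the pipe before the payload, e.g. printf("%ld\n", time(0)); so both processes share the same integer by construction. The Python consumer side might look like:
import sys

# The first line of stdin carries the timestamp the C program wrote;
# both programs now agree on the exact same integer time value.
shared_time = int(sys.stdin.readline())
print('shared timestamp:', shared_time)

# The actual piped payload follows on subsequent lines.
for line in sys.stdin:
    pass  # handle the data here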

Python threads memory leak

I have a class like this:
class Detector:
    ...
    def detect(self):
        sniff(iface='eth6', filter='vlan or not vlan and udp port 53',
              prn=self.spawnThread, store=0)

    def spawnThread(self, pkt):
        t = threading.Thread(target=self.predict, args=(pkt,))
        t.start()

    def predict(self, pkt):
        # do something
        # write log file with the logging module
Here sniff() is a function from scapy; for every packet it captures, it passes the packet to spawnThread. In spawnThread I create a new thread to run the predict method.
But there seems to be a memory leak. I checked with Heapy and got this output:
Partition of a set of 623561 objects. Total size = 87355208 bytes.
 Index   Count   %      Size   % Cumulative  % Kind (class / dict of class)
     0  236145  38  26871176  31  26871176  31 str
     1  139658  22  13805832  16  40677008  47 tuple
     2    6565   1   7366648   8  48043656  55 dict (no owner)
     3    1408   0   6592768   8  54636424  63 dict of module
     4   25764   4   3297792   4  57934216  66 types.CodeType
     5   17737   3   3223240   4  61157456  70 list
     6   24878   4   2985360   3  64142816  73 function
     7   14367   2   2577384   3  66720200  76 unicode
     8    2445   0   2206320   3  68926520  79 type
     9    2445   0   2173752   2  71100272  81 dict of type
The count and size of tuple objects keep growing; I think that's what causes the memory leak, but I don't know where or why. Thanks for any feedback!
Update: if I call predict directly from sniff without using threads, there is no memory leak. Also, there are no other tuple objects anywhere else in the class; in __init__ I just initialized some strings, like paths and names.
class Detector:
    ...
    def detect(self):
        sniff(iface='eth6', filter='vlan or not vlan and udp port 53',
              prn=self.predict, store=0)

    def predict(self, pkt):
        # do something with pkt
        # write log file with the logging module
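This related question is also reproduced without an answer. A common mitigation for unbounded per-packet thread creation (a sketch under that assumption, not verified against this exact scapy setup) is a fixed-size worker pool, so thread objects are reused rather than accumulated:
from concurrent.futures import ThreadPoolExecutor
from scapy.all import sniff

class Detector:
    def __init__(self):
        # Bounded pool: at most 8 worker threads ever exist,
        # instead of one new Thread per captured packet.
        self.pool = ThreadPoolExecutor(max_workers=8)

    def detect(self):
        sniff(iface='eth6', filter='vlan or not vlan and udp port 53',
              prn=self.spawnThread, store=0)

    def spawnThread(self, pkt):
        self.pool.submit(self.predict, pkt)

    def predict(self, pkt):
        pass  # do something with pkt, then log it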

How can I embed pdb in my program?

I know that pdb is an interactive system and it is very helpful.
My ultimate goal is to gather the memory state after executing each command in a certain function, command by command. For example, given the code snippet
0: def foo() :
1: if True:
2: x=1
3: else:
4: x=2
5: x
then the memory state of each command is
0: empty
1: empty
2: x = 1
3: x = 1
4: (not taken)
5: x = 1
To do this, what I'd like to do with pdb is write a script that interacts with the pdb class. I know that s is the command to step forward one statement, and that print var (in the above case, var is x) prints the value of a variable, so I can gather the variables at each command. Then I want to run a script like the one below:
import pdb
pdb.run('foo()')
while(not pdb.end()):
pdb.s()
pdb.print('x')
But I cannot find any way to implement this functionality. Can anybody help me?
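pdb itself does not expose a stepping API like the hypothetical pdb.s()/pdb.end() above, but the standard library's sys.settrace hook can capture the locals around every executed line. A minimal sketch (note that a 'line' event fires just before a line runs, so each snapshot shows the state left by the previous line):
import sys

states = []

def tracer(frame, event, arg):
    # Record (line number, locals) for each line about to execute in foo().
    if event == 'line' and frame.f_code.co_name == 'foo':
        states.append((frame.f_lineno, dict(frame.f_locals)))
    return tracer

def foo():
    if True:
        x = 1
    else:
        x = 2
    x

sys.settrace(tracer)
foo()
sys.settrace(None)

for lineno, local_vars in states:
    print(lineno, local_vars)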
Try memory_profiler:
The line-by-line memory usage mode is used much in the same way as
line_profiler: first decorate the function you would like to profile
with @profile and then run the script with a special script (in this
case with specific arguments to the Python interpreter).
Line #    Mem usage  Increment   Line Contents
==============================================
     3                           @profile
     4      5.97 MB    0.00 MB   def my_func():
     5     13.61 MB    7.64 MB       a = [1] * (10 ** 6)
     6    166.20 MB  152.59 MB       b = [2] * (2 * 10 ** 7)
     7     13.61 MB -152.59 MB       del b
     8     13.61 MB    0.00 MB       return a
Or Heapy:
The aim of Heapy is to support debugging and optimization regarding
memory related issues in Python programs.
Partition of a set of 132527 objects. Total size = 8301532 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0  35144  27  2140412  26   2140412  26 str
     1  38397  29  1309020  16   3449432  42 tuple
     2    530   0   739856   9   4189288  50 dict (no owner)

heapy reports memory usage << top

NB: This is my first foray into memory profiling with Python, so perhaps I'm asking the wrong question here. Advice on improving the question is appreciated.
I'm working on some code where I need to store a few million small strings in a set. This, according to top, uses ~3x the amount of memory reported by heapy. I'm not clear on what all the extra memory is used for, or how to figure out whether I can reduce the footprint, and if so, how.
memtest.py:
from guppy import hpy
import gc
hp = hpy()
# do setup here - open files & init the class that holds the data
print 'gc', gc.collect()
hp.setrelheap()
raw_input('relheap set - enter to continue') # top shows 14MB resident for python
# load data from files into the class
print 'gc', gc.collect()
h = hp.heap()
print h
raw_input('enter to quit') # top shows 743MB resident for python
The output is:
$ python memtest.py
gc 5
relheap set - enter to continue
gc 2
Partition of a set of 3197065 objects. Total size = 263570944 bytes.
 Index    Count   %       Size   % Cumulative   % Kind (class / dict of class)
     0  3197061 100  263570168 100  263570168 100 str
     1        1   0        448   0  263570616 100 types.FrameType
     2        1   0        280   0  263570896 100 dict (no owner)
     3        1   0         24   0  263570920 100 float
     4        1   0         24   0  263570944 100 int
So in summary, heapy shows 264MB while top shows 743MB. What's using the extra 500MB?
Update:
I'm running 64 bit python on Ubuntu 12.04 in VirtualBox in Windows 7.
I installed guppy as per the answer here:
sudo pip install https://guppy-pe.svn.sourceforge.net/svnroot/guppy-pe/trunk/guppy
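This question, too, ends without an accepted answer in this excerpt. The usual suspects for the gap are interpreter overhead, malloc fragmentation, and container overhead that heapy's relative heap does not attribute to the strings. As a rough cross-check (Python 3 syntax; illustrative numbers only), sys.getsizeof reports the set's own hash table separately from the string payload it references:
import sys

strings = {'s%07d' % i for i in range(3000000)}

# The set's own hash table (an array of pointers), excluding the strings.
print('set object:     %.1f MB' % (sys.getsizeof(strings) / 1e6))

# Total size of the string objects themselves.
payload = sum(sys.getsizeof(s) for s in strings)
print('string payload: %.1f MB' % (payload / 1e6))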
