Please excuse this naive question of mine. I am trying to monitor the memory usage of my Python code and have come across the promising memory_profiler package. I have a question about interpreting the output generated by the @profile decorator.
Here is a sample output that I get by running my dummy code below:
dummy.py
from memory_profiler import profile

@profile
def my_func():
    a = [1] * (10 ** 6)
    b = [2] * (2 * 10 ** 7)
    del b
    return a

if __name__ == '__main__':
    my_func()
Running dummy.py with "python dummy.py" prints the table below.
Line #    Mem usage    Increment   Line Contents
================================================
     3      8.2 MiB      0.0 MiB   @profile
     4                             def my_func():
     5     15.8 MiB      7.6 MiB       a = [1] * (10 ** 6)
     6    168.4 MiB    152.6 MiB       b = [2] * (2 * 10 ** 7)
     7     15.8 MiB   -152.6 MiB       del b
     8     15.8 MiB      0.0 MiB       return a
My question is: what does the 8.2 MiB in the first row of the table correspond to? My guess is that it is the initial memory usage of the Python interpreter itself, but I am not sure. If that is the case, is there a way to have this baseline usage automatically subtracted from the memory usage reported for the script?
Many thanks for your time and consideration!
Noushin
According to the docs:
The first column represents the line number of the code that has been profiled, the second column (Mem usage) the memory usage of the Python interpreter after that line has been executed. The third column (Increment) represents the difference in memory of the current line with respect to the last one.
So, that 8.2 MiB is the memory usage after the first line has been executed. That includes the memory needed to start up Python, load your script and all of its imports (including memory_profiler itself), and so on.
There don't appear to be any documented options for removing that from each entry. But it wouldn't be too hard to post-process the results.
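For instance, a minimal post-processing sketch along those lines could subtract the first reported Mem usage value from every row of a saved report (the file name profile.txt and the regular expression are just illustrative assumptions of mine, not part of memory_profiler):

import re

# Subtract the baseline (first reported "Mem usage" value) from each row of a
# saved memory_profiler report. The file name is a placeholder.
pattern = re.compile(r"^\s*(\d+)\s+([\d.]+) MiB\s+(-?[\d.]+) MiB\s+(.*)$")

baseline = None
with open("profile.txt") as fh:
    for line in fh:
        match = pattern.match(line)
        if not match:
            continue  # header, separator, or rows without measurements
        lineno, mem, inc, contents = match.groups()
        mem = float(mem)
        if baseline is None:
            baseline = mem  # e.g. the 8.2 MiB on the first measured line
        print(f"{lineno:>6} {mem - baseline:9.1f} MiB {inc:>9} MiB   {contents}")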
Alternatively, do you really need to do that? The Increment column already shows how much additional memory each line used, and either that, or its sum over a range of lines, is usually more interesting than the difference between each line's Mem usage and the starting value.
The per-line difference in memory is already given in the Increment column, or you could write a small script to post-process the output.
Related
I'm writing a Python script which makes many plots. These plots are generated from a main program that calls the plotting functions repeatedly (by this, I mean hundreds of times).

As the main function runs, I can watch my computer's RAM fill up during the execution. Furthermore, even after the main function finishes, RAM usage is still much higher than before the main program ran. Sometimes it can even fill the RAM completely.

I tried deleting the heaviest variables and calling the garbage collector, but the net RAM usage is always higher. Why is this happening?

I attached a simple (and exaggerated) example of one of my functions, and I used memory_profiler to see the line-by-line memory usage.
Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
    15  100.926 MiB  100.926 MiB            1   @profile
    16                                          def my_func():
    17  108.559 MiB    7.633 MiB            1       a = [1] * (10 ** 6)
    18  261.148 MiB  152.590 MiB            1       b = [2] * (2 * 10 ** 7)
    19  421.367 MiB  160.219 MiB            1       c = a + b
    20  428.609 MiB    7.242 MiB            1       plt.figure(dpi=10000)
    21  430.328 MiB    1.719 MiB            1       plt.plot(np.random.rand(1000),np.random.rand(1000))
    22  487.738 MiB   57.410 MiB            1       plt.show()
    23  487.738 MiB    0.000 MiB            1       plt.close('all')
    24  167.297 MiB -320.441 MiB            1       del a,b,c
    25  118.922 MiB  -48.375 MiB            1       print(gc.collect())
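For reference, a minimal script reproducing the pattern in that table might look like the sketch below; the imports and the __main__ guard are my reconstruction from the Line Contents column, not the asker's original file:

import gc

import matplotlib.pyplot as plt
import numpy as np
from memory_profiler import profile


@profile
def my_func():
    a = [1] * (10 ** 6)
    b = [2] * (2 * 10 ** 7)
    c = a + b
    plt.figure(dpi=10000)
    plt.plot(np.random.rand(1000), np.random.rand(1000))
    plt.show()
    plt.close('all')
    del a, b, c
    print(gc.collect())


if __name__ == '__main__':
    my_func()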
I was trying to replicate the memory usage test here.
Essentially, the post claims that given the following code snippet:
import copy
import memory_profiler

@profile
def function():
    x = list(range(1000000))  # allocate a big list
    y = copy.deepcopy(x)
    del x
    return y

if __name__ == "__main__":
    function()
Invoking
python -m memory_profiler memory-profile-me.py
prints, on a 64-bit computer
Filename: memory-profile-me.py

Line #    Mem usage    Increment   Line Contents
================================================
     4                             @profile
     5      9.11 MB      0.00 MB   def function():
     6     40.05 MB     30.94 MB       x = list(range(1000000)) # allocate a big list
     7     89.73 MB     49.68 MB       y = copy.deepcopy(x)
     8     82.10 MB     -7.63 MB       del x
     9     82.10 MB      0.00 MB       return y
I copied and pasted the same code but my profiler yields
Line #    Mem usage    Increment   Line Contents
================================================
     3   44.711 MiB   44.711 MiB   @profile
     4                             def function():
     5   83.309 MiB   38.598 MiB       x = list(range(1000000)) # allocate a big list
     6   90.793 MiB    7.484 MiB       y = copy.deepcopy(x)
     7   90.793 MiB    0.000 MiB       del x
     8   90.793 MiB    0.000 MiB       return y
This post could be outdated; either the profiler package or Python could have changed. In any case, my questions are, for Python 3.6.x:
(1) Should copy.deepcopy(x) (as defined in the code above) consume a nontrivial amount of memory?
(2) Why couldn't I replicate the post's results?
(3) If I repeat x = list(range(1000000)) after del x, would the memory increase by the same amount as I first assigned x = list(range(1000000)) (as in line 5 of my code)?
copy.deepcopy() recursively copies mutable objects only; immutable objects such as integers or strings are not copied. The list being copied consists of immutable integers, so the copy y ends up sharing references to the same integer objects:
>>> import copy
>>> x = list(range(1000000))
>>> y = copy.deepcopy(x)
>>> x[-1] is y[-1]
True
>>> all(xv is yv for xv, yv in zip(x, y))
True
So the copy only needs to create a new list object holding 1 million references, an object that takes a little over 8 MiB of memory on my Python 3.6 build on Mac OS X 10.13 (a 64-bit OS):
>>> import sys
>>> sys.getsizeof(y)
8697464
>>> sys.getsizeof(y) / 2 ** 20  # MiB
8.294548034667969
An empty list object takes 64 bytes, each reference takes 8 bytes:
>>> sys.getsizeof([])
64
>>> sys.getsizeof([None])
72
Python list objects over-allocate space so they can grow; converting a range() object to a list reserves a little more room for additional growth than deepcopy does, so x is slightly larger still, having room for roughly 125k more references before it has to resize again:
>>> sys.getsizeof(x)
9000112
>>> sys.getsizeof(x) / 2 ** 20
8.583175659179688
>>> ((sys.getsizeof(x) - 64) // 8) - 10**6
125006
while the copy only has spare room left for about 87k:
>>> ((sys.getsizeof(y) - 64) // 8) - 10**6
87175
On Python 3.6 I can't replicate the article's claims either, in part because Python has seen a lot of memory-management improvements, and in part because the article is wrong on several points.

The behaviour of copy.deepcopy() regarding lists and integers has never changed in the long history of the copy module (see the first revision of the module, added in 1995), and the article's interpretation of the memory figures is wrong, even on Python 2.7.

Specifically, I can reproduce the results using Python 2.7. This is what I see on my machine:
$ python -V
Python 2.7.15
$ python -m memory_profiler memtest.py
Filename: memtest.py
Line #    Mem usage    Increment   Line Contents
================================================
     4   28.406 MiB   28.406 MiB   @profile
     5                             def function():
     6   67.121 MiB   38.715 MiB       x = list(range(1000000)) # allocate a big list
     7  159.918 MiB   92.797 MiB       y = copy.deepcopy(x)
     8  159.918 MiB    0.000 MiB       del x
     9  159.918 MiB    0.000 MiB       return y
What is happening is that Python's memory management system is allocating a new chunk of memory for additional expansion. It's not that the new y list object takes nearly 93 MiB of memory; that's just the additional memory the OS allocated to the Python process when the process requested more memory for the object heap. The list object itself is a lot smaller.
The Python 3 tracemalloc module is a lot more accurate about what actually happens:
python3 -m memory_profiler --backend tracemalloc memtest.py
Filename: memtest.py
Line #    Mem usage    Increment   Line Contents
================================================
     4    0.001 MiB    0.001 MiB   @profile
     5                             def function():
     6   35.280 MiB   35.279 MiB       x = list(range(1000000)) # allocate a big list
     7   35.281 MiB    0.001 MiB       y = copy.deepcopy(x)
     8   26.698 MiB   -8.583 MiB       del x
     9   26.698 MiB    0.000 MiB       return y
The Python 3.x memory manager and list implementation are smarter than the ones in 2.7; evidently the new list object was able to fit into existing, already-available memory, pre-allocated when creating x.
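For readers who want to watch this without memory_profiler, here is a minimal sketch using the standard-library tracemalloc module directly; the variable names and sizes just mirror the example above:

import copy
import tracemalloc

tracemalloc.start()

x = list(range(1000000))
after_x = tracemalloc.get_traced_memory()[0]   # bytes currently traced

y = copy.deepcopy(x)
after_y = tracemalloc.get_traced_memory()[0]

print("after x:         %.2f MiB" % (after_x / 2 ** 20))
print("deepcopy added:  %.2f MiB" % ((after_y - after_x) / 2 ** 20))  # roughly just the new list object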
We can test Python 2.7's behaviour with a manually built Python 2.7.12 tracemalloc binary and a small patch to memory_profile.py. Now we get more reassuring results on Python 2.7 as well:
Filename: memtest.py
Line #    Mem usage    Increment   Line Contents
================================================
     4    0.099 MiB    0.099 MiB   @profile
     5                             def function():
     6   31.734 MiB   31.635 MiB       x = list(range(1000000)) # allocate a big list
     7   31.726 MiB   -0.008 MiB       y = copy.deepcopy(x)
     8   23.143 MiB   -8.583 MiB       del x
     9   23.141 MiB   -0.002 MiB       return y
I note that the author was confused as well:
copy.deepcopy copies both lists, which allocates again ~50 MB (I am not sure where the additional overhead of 50 MB - 31 MB = 19 MB comes from)
(Bold emphasis mine).
The error here is to assume that all changes in the Python process size can be attributed directly to specific objects, but the reality is far more complex: the memory manager can add (and remove!) memory 'arenas', blocks of memory reserved for the heap, as needed, and will do so in larger blocks if that makes sense. The process is complex because it depends on interactions between Python's memory manager and the OS malloc implementation details. The author found an older article on Python's memory model and misunderstood it to be current; the author of that article has themselves already tried to point this out. As of Python 2.5, the claim that Python doesn't free memory is no longer true.
What's troubling is that the same misunderstandings then lead the author to recommend against using pickle, when in reality the module, even on Python 2, never adds more than a little bookkeeping memory to track recursive structures. See this gist for my testing methodology: using cPickle on Python 2.7 adds a one-time 46 MiB increase (calling create_file() twice results in no further memory increase), and in Python 3 the memory differences disappear altogether.
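The gist itself is not reproduced here, but the general shape of such a test is something like the following sketch (the file path, the list size and the use of the Python 3 pickle module are placeholders of mine, not the gist's actual code):

import pickle

from memory_profiler import profile


@profile
def create_file():
    # Build a large structure and pickle it to disk; profiling this shows
    # whether pickling itself leaves any lasting memory overhead behind.
    data = list(range(10 ** 6))
    with open('/tmp/data.pkl', 'wb') as fh:  # placeholder path
        pickle.dump(data, fh)


if __name__ == '__main__':
    create_file()
    create_file()  # a second call should show no further net increase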
I'll open a dialogue with the Theano team about the post; the article is wrong and confusing, and Python 2.7 will soon be made entirely obsolete anyway, so they really should focus on Python 3's memory model. (*)
When you create a new list from range(), rather than a copy, you'll see a similar memory increase to creating x the first time, because you create a new set of integer objects in addition to the new list object. Aside from a specific range of small integers, Python doesn't cache and re-use integer objects for range() operations.
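A quick interactive check of that last point (the small-integer cache covers roughly -5 through 256 in CPython; the exact range is an implementation detail):

>>> x = list(range(1000000))
>>> z = list(range(1000000))   # a second, independently built list
>>> x[10] is z[10]             # 10 is in the small-integer cache, so it is shared
True
>>> x[-1] is z[-1]             # 999999 is not cached; each list has its own object
False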
(*) addendum: I opened issue #6619 with the Theano project. The project agreed with my assessment and removed the page from their documentation, although they haven't yet updated the published version.
I am trying to get an understanding of using memory_profiler on my Python app.

Referring to the Python memory profile guide, I copied the following code snippet:
from memory_profiler import profile

@profile
def my_func():
    a = [1] * (10 ** 6)
    b = [2] * (2 * 10 ** 7)
    del b
    return a
The expected result according to the link is:
Line #    Mem usage    Increment   Line Contents
================================================
     3                             @profile
     4      5.97 MB      0.00 MB   def my_func():
     5     13.61 MB      7.64 MB       a = [1] * (10 ** 6)
     6    166.20 MB    152.59 MB       b = [2] * (2 * 10 ** 7)
     7     13.61 MB   -152.59 MB       del b
     8     13.61 MB      0.00 MB       return a
But when I ran it on my VM running Ubuntu 16.04, I got the following results instead:
Line #    Mem usage    Increment   Line Contents
================================================
     3     35.4 MiB     35.4 MiB   @profile
     4                             def my_func():
     5     43.0 MiB      7.7 MiB       a = [1] * (10 ** 6)
     6    195.7 MiB    152.6 MiB       b = [2] * (2 * 10 ** 7)
     7     43.1 MiB   -152.5 MiB       del b
     8     43.1 MiB      0.0 MiB       return a
There seems to be a large overhead of around 30 MiB between the expected output and my run. I am trying to understand where this comes from and whether I am doing anything incorrectly. Should I be worried about it?

Please advise if anyone has any idea. Thanks.
EDIT:
O/S : Ubuntu 16.04.4 (Xenial) running inside a VM
Python : Python 3.6.4 :: Anaconda, Inc.
The memory taken by a list or an integer depends heavily on the Python version/build.

For instance, in Python 3 all integers are arbitrary-precision ("long") integers, whereas in Python 2 long is used only when the value doesn't fit in a machine-sized C long.

On my machine, Python 2:
>>> sys.getsizeof(2)
24
Python 3.6.2:
>>> sys.getsizeof(2)
28
Computing the ratio of your numbers against the 28/24 ratio, it's pretty close:
>>> 195.7/166.2
1.177496991576414
>>> 28/24
1.1666666666666667
(This is probably not the only difference, but it's the most obvious one I can think of.)

So no, as long as the results are proportional you shouldn't worry; but if you do have memory issues with Python integers (in Python 3, that is), you could use alternatives such as numpy arrays or other native integer types.
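As a rough illustration of the numpy suggestion, this sketch compares a million integers stored as Python objects in a list with the same values in an int64 array (the exact figures depend on your build):

import sys

import numpy as np

# A list of a million Python ints: the list stores pointers and each int is a
# separate object (28 bytes each on a typical 64-bit CPython 3 build).
xs = list(range(10 ** 6))
list_total = sys.getsizeof(xs) + sum(sys.getsizeof(v) for v in xs)

# The same values as a contiguous numpy int64 array: 8 bytes per value.
arr = np.arange(10 ** 6, dtype=np.int64)

print("list of ints : %.1f MiB (approx.)" % (list_total / 2 ** 20))
print("numpy int64  : %.1f MiB" % (arr.nbytes / 2 ** 20))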
I know that pdb is an interactive debugger and it is very helpful.

My ultimate goal is to gather the memory state after executing each command in a certain function, command by command. For example, with the code snippet
0: def foo():
1:     if True:
2:         x = 1
3:     else:
4:         x = 2
5:     x
then the memory state of each command is
0: empty
1: empty
2: x = 1
3: x = 1
4: (not taken)
5: x = 1
To do this, what I'd like to do is write a script that interacts with the pdb class. I know that s is the command to step forward one statement and that print var (in the above case, var is x) prints the value of a certain variable, so I can gather the variables at each command. I want to run a script like the one below:
import pdb

pdb.run('foo()')
while not pdb.end():
    pdb.s()
    pdb.print('x')
But I cannot find any way to implement this functionality. Can anybody help me?
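For what it's worth, pdb does not expose a stepping API quite like the pseudocode above, but something close to the described behaviour can be sketched with the standard sys.settrace hook instead of pdb (this is an illustrative sketch, not pdb's API; note that each snapshot is taken just before the line runs):

import sys

def foo():
    if True:
        x = 1
    else:
        x = 2
    x

states = []

def tracer(frame, event, arg):
    # 'line' events fire just before each line of foo() executes, so the
    # recorded locals reflect everything executed up to the previous line.
    if event == 'line' and frame.f_code.co_name == 'foo':
        states.append((frame.f_lineno, dict(frame.f_locals)))
    return tracer  # keep tracing new and nested frames line by line

sys.settrace(tracer)
foo()
sys.settrace(None)

for lineno, local_vars in states:
    print(lineno, local_vars)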
Try memory_profiler:
The line-by-line memory usage mode is used much in the same way as line_profiler: first decorate the function you would like to profile with @profile and then run the script with a special script (in this case with specific arguments to the Python interpreter).
Line #    Mem usage    Increment   Line Contents
================================================
     3                             @profile
     4      5.97 MB      0.00 MB   def my_func():
     5     13.61 MB      7.64 MB       a = [1] * (10 ** 6)
     6    166.20 MB    152.59 MB       b = [2] * (2 * 10 ** 7)
     7     13.61 MB   -152.59 MB       del b
     8     13.61 MB      0.00 MB       return a
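Concretely, if the decorated function lives in a script (the name example.py here is just a placeholder), the report above is produced by running:

python -m memory_profiler example.py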
Or Heapy:
The aim of Heapy is to support debugging and optimization regarding
memory related issues in Python programs.
Partition of a set of 132527 objects. Total size = 8301532 bytes.
 Index  Count   %     Size   % Cumulative  %  Kind (class / dict of class)
     0  35144  27  2140412  26    2140412  26  str
     1  38397  29  1309020  16    3449432  42  tuple
     2    530   0   739856   9    4189288  50  dict (no owner)
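A table like that is produced with a few lines using the guppy package that provides Heapy (published as guppy3 for Python 3); a minimal sketch:

from guppy import hpy

heap_snapshot = hpy().heap()  # partitions all live objects by type
print(heap_snapshot)          # prints a table like the one above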
I am having trouble with high memory usage when performing ffts with scipy's fftpack. Example obtained with the module memory_profiler:
Line #    Mem usage    Increment   Line Contents
================================================
     4   50.555 MiB    0.000 MiB   @profile
     5                             def test():
     6  127.012 MiB   76.457 MiB       a = np.random.random(int(1e7))
     7  432.840 MiB  305.828 MiB       b = fftpack.fft(a)
     8  891.512 MiB  458.672 MiB       c = fftpack.ifft(b)
     9  585.742 MiB -305.770 MiB       del b, c
    10  738.629 MiB  152.887 MiB       b = fftpack.fft(a)
    11  891.512 MiB  152.883 MiB       c = fftpack.ifft(b)
    12  509.293 MiB -382.219 MiB       del a, b, c
    13  547.520 MiB   38.227 MiB       a = np.random.random(int(5e6))
    14  700.410 MiB  152.891 MiB       b = fftpack.fft(a)
    15  929.738 MiB  229.328 MiB       c = fftpack.ifft(b)
    16  738.625 MiB -191.113 MiB       del a, b, c
    17  784.492 MiB   45.867 MiB       a = np.random.random(int(6e6))
    18  967.961 MiB  183.469 MiB       b = fftpack.fft(a)
    19 1243.160 MiB  275.199 MiB       c = fftpack.ifft(b)
My attempt at understanding what is going on here:
The amount of memory allocated by both fft and ifft on lines 7 and 8 is more than what they need in order to return their result. For the call b = fftpack.fft(a), 305 MiB is allocated. The memory needed for the b array itself is 16 bytes/value * 1e7 values = 160 MB, i.e. about 153 MiB (16 bytes per value because the result is complex128). It seems that fftpack is allocating some kind of workspace, and that the workspace is roughly equal in size to the output array (?).
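A quick way to sanity-check that figure is to build the output array once and ask for its size; numpy's fft is used here only because it returns the same complex128 result shape:

import numpy as np

a = np.random.random(int(1e7))
b = np.fft.fft(a)                   # complex128 output, same length as the input
print(b.dtype, b.nbytes / 2 ** 20)  # complex128, ~152.6 MiB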
On lines 10 and 11 the same procedure is run again, but the memory usage is less this time, and more in line with what I expect. It therefore seems that fftpack is able to reuse the workspace.
On lines 13-15 and 17-19 ffts with different, smaller input sizes are performed. In both of these cases more memory than what is needed is allocated, and memory does not seem to be reused.
The memory usage reported above agrees with what the Windows Task Manager reports (to the accuracy with which I can read those graphs). If I write such a script with larger input sizes, I can make my (Windows) computer very slow, indicating that it is swapping.
A second example to illustrate the problem of the memory allocated for workspace:
import numpy as np
from scipy import fftpack
from time import time

factor = 4.5

a = np.random.random(int(factor * 3e7))
start = time()
b = fftpack.fft(a)
c = fftpack.ifft(b)
end = time()
print("Elapsed: {:.4g}".format(end - start))
del a, b, c
print("Finished first fft")

a = np.random.random(int(factor * 2e7))
start = time()
b = fftpack.fft(a)
c = fftpack.ifft(b)
end = time()
print("Elapsed: {:.4g}".format(end - start))
del a, b, c
print("Finished first fft")
The code prints the following:
Elapsed: 17.62
Finished first fft
Elapsed: 38.41
Finished first fft
Notice how the second fft, which has the smaller input size, takes more than twice as long to compute. I noticed that my computer was very slow (likely swapping) during the execution of this script.
Questions:
Is it correct that the fft can be calculated in-place, without the need for extra workspace? If so, why does fftpack not do that?
Is there a problem with fftpack here? Even if it needs extra workspace, why does it not reuse its workspace when the fft is rerun with different input sizes?
EDIT:
Old, but possibly related: https://mail.scipy.org/pipermail/scipy-dev/2012-March/017286.html
Is this the answer? https://github.com/scipy/scipy/issues/5986
This is a known issue, and it is caused by fftpack caching its strategy for computing the fft for a given size. That cache is about as large as the output of the computation, so if one does large ffts with many different input sizes, the memory consumption can become significant.
The problem is described in detail here:
https://github.com/scipy/scipy/issues/5986
Numpy has a similar problem, which is being worked on:
https://github.com/numpy/numpy/pull/7686
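As a possible workaround, newer SciPy releases (1.4 and later) ship the scipy.fft module, which replaced the old fftpack implementation and, to my knowledge, does not keep the same unbounded per-size caches. This is a suggestion under that assumption, not something taken from the linked issues; the computation itself is a drop-in change:

import numpy as np
from scipy import fft  # requires SciPy >= 1.4

a = np.random.random(int(1e7))
b = fft.fft(a)
c = fft.ifft(b)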