I'm writing a Python script that makes many plots. The plotting functions are called from a main program that calls them repeatedly (by this, I mean hundreds of times).
As the main function runs, I can watch my computer's RAM usage climb. Even after the main function finishes, RAM usage stays much higher than it was before the run; sometimes it fills the RAM completely.
I tried deleting the heaviest variables and calling the garbage collector, but the net RAM usage always ends up higher. Why is this happening?
I attached a simple (and exaggerated) example of one of my functions, profiled line by line with memory_profiler.
Line # Mem usage Increment Occurrences Line Contents
=============================================================
15 100.926 MiB 100.926 MiB 1 @profile
16 def my_func():
17 108.559 MiB 7.633 MiB 1 a = [1] * (10 ** 6)
18 261.148 MiB 152.590 MiB 1 b = [2] * (2 * 10 ** 7)
19 421.367 MiB 160.219 MiB 1 c = a + b
20 428.609 MiB 7.242 MiB 1 plt.figure(dpi=10000)
21 430.328 MiB 1.719 MiB 1 plt.plot(np.random.rand(1000),np.random.rand(1000))
22 487.738 MiB 57.410 MiB 1 plt.show()
23 487.738 MiB 0.000 MiB 1 plt.close('all')
24 167.297 MiB -320.441 MiB 1 del a,b,c
25 118.922 MiB -48.375 MiB 1 print(gc.collect())
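For illustration, a minimal sketch of a plotting helper that typically avoids this kind of accumulation: it uses a non-interactive backend, works with an explicit Figure object, and closes that figure before returning. The helper name, file names, and data here are hypothetical.
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; no GUI keeps figures alive
import matplotlib.pyplot as plt

def make_plot(path):
    # Hypothetical helper: create one figure, save it, and close it explicitly
    # so pyplot's internal registry does not keep a reference to it.
    fig, ax = plt.subplots(dpi=100)
    ax.plot(np.random.rand(1000), np.random.rand(1000))
    fig.savefig(path)
    plt.close(fig)

for i in range(200):
    make_plot("plot_{}.png".format(i))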
Related
I have a Python list of texts of very different sizes.
When I try to convert that list to a NumPy array, I get very big spikes in memory usage.
I am using np.array(list_of_texts) for the conversion.
Below is the line-by-line memory usage returned by memory_profiler for an example list-of-texts conversion.
from memory_profiler import profile
import numpy as np
Line # Mem usage Increment Line Contents
================================================
4 58.9 MiB 58.9 MiB @profile
5 def f():
23 59.6 MiB 0.3 MiB small_texts = ['a' for i in range(100000)]
24 60.4 MiB 0.8 MiB big_texts = small_texts + [''.join(small_texts)]
26 61.4 MiB 0.0 MiB a = np.array(small_texts)
27 38208.9 MiB 38147.5 MiB b = np.array(big_texts)
I suspect the problem comes from the different sizes of the texts in the list.
Any idea why this is happening?
How can I keep memory usage reasonable while converting a list of texts to a NumPy array?
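For illustration, a small sketch of the arithmetic behind such a spike: np.array() of Python strings builds a fixed-width unicode array sized by the longest element, whereas dtype=object only stores references. The toy sizes below are made up.
import numpy as np

texts = ['a'] * 4 + ['a' * 1000]           # one long string sets the element width

fixed = np.array(texts)                    # dtype '<U1000': every slot is 1000 characters wide
print(fixed.dtype, fixed.nbytes)           # 5 elements * 1000 chars * 4 bytes/char = 20000 bytes

as_refs = np.array(texts, dtype=object)    # stores only references to the existing str objects
print(as_refs.dtype, as_refs.nbytes)       # 5 pointers * 8 bytes = 40 bytes
At the sizes in the profile above (100001 elements, longest element 100000 characters), the fixed-width layout works out to roughly 100001 * 100000 * 4 bytes, about 37 GiB, which matches the roughly 38,000 MiB jump the profiler reports.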
I was trying to replicate the memory usage test here.
Essentially, the post claims that given the following code snippet:
import copy
import memory_profiler

@profile
def function():
    x = list(range(1000000))  # allocate a big list
    y = copy.deepcopy(x)
    del x
    return y

if __name__ == "__main__":
    function()
Invoking
python -m memory_profiler memory-profile-me.py
prints, on a 64-bit computer
Filename: memory-profile-me.py
Line # Mem usage Increment Line Contents
================================================
4 @profile
5 9.11 MB 0.00 MB def function():
6 40.05 MB 30.94 MB x = list(range(1000000)) # allocate a big list
7 89.73 MB 49.68 MB y = copy.deepcopy(x)
8 82.10 MB -7.63 MB del x
9 82.10 MB 0.00 MB return y
I copied and pasted the same code but my profiler yields
Line # Mem usage Increment Line Contents
================================================
3 44.711 MiB 44.711 MiB @profile
4 def function():
5 83.309 MiB 38.598 MiB x = list(range(1000000)) # allocate a big list
6 90.793 MiB 7.484 MiB y = copy.deepcopy(x)
7 90.793 MiB 0.000 MiB del x
8 90.793 MiB 0.000 MiB return y
This post could be outdated; either the profiler package or Python could have changed. In any case, my questions are, for Python 3.6.x:
(1) Should copy.deepcopy(x) (as defined in the code above) consume a nontrivial amount of memory?
(2) Why couldn't I replicate the result?
(3) If I repeat x = list(range(1000000)) after del x, would the memory increase by the same amount as I first assigned x = list(range(1000000)) (as in line 5 of my code)?
copy.deepcopy() recursively copies mutable objects only; immutable objects such as integers or strings are not copied. Since the list being copied consists of immutable integers, the y copy ends up sharing references to the same integer objects:
>>> import copy
>>> x = list(range(1000000))
>>> y = copy.deepcopy(x)
>>> x[-1] is y[-1]
True
>>> all(xv is yv for xv, yv in zip(x, y))
True
So the copy only needs to create a new list object with 1 million references, an object that takes a little over 8MB of memory on my Python 3.6 build on Mac OS X 10.13 (a 64-bit OS):
>>> import sys
>>> sys.getsizeof(y)
8697464
>>> sys.getsizeof(y) / 2 ** 20 # MiB
8.294548034667969
An empty list object takes 64 bytes; each reference takes 8 bytes:
>>> sys.getsizeof([])
64
>>> sys.getsizeof([None])
72
Python list objects overallocate space so they can grow. Converting a range() object to a list leaves a little more room for additional growth than deepcopy does, so x is slightly larger still, with space for roughly another 125k references before it has to resize again:
>>> sys.getsizeof(x)
9000112
>>> sys.getsizeof(x) / 2 ** 20
8.583175659179688
>>> ((sys.getsizeof(x) - 64) // 8) - 10**6
125006
while the copy only has room left for about 87k:
>>> ((sys.getsizeof(y) - 64) // 8) - 10**6
87175
On Python 3.6 I can't replicate the article's claims either, in part because Python has seen a lot of memory management improvements, and in part because the article is wrong on several points.
The behaviour of copy.deepcopy() regarding lists and integers has never changed in the long history of copy.deepcopy() (see the first revision of the module, added in 1995), and the interpretation of the memory figures is wrong, even on Python 2.7.
Specifically, I can reproduce the results using Python 2.7. This is what I see on my machine:
$ python -V
Python 2.7.15
$ python -m memory_profiler memtest.py
Filename: memtest.py
Line # Mem usage Increment Line Contents
================================================
4 28.406 MiB 28.406 MiB @profile
5 def function():
6 67.121 MiB 38.715 MiB x = list(range(1000000)) # allocate a big list
7 159.918 MiB 92.797 MiB y = copy.deepcopy(x)
8 159.918 MiB 0.000 MiB del x
9 159.918 MiB 0.000 MiB return y
What is happening is that Python's memory management system is allocating a new chunk of memory for additional expansion. It's not that the new y list object takes nearly 93 MiB of memory; that's just the additional memory the OS allocated to the Python process when the process requested more memory for the object heap. The list object itself is a lot smaller.
The Python 3 tracemalloc module is a lot more accurate about what actually happens:
python3 -m memory_profiler --backend tracemalloc memtest.py
Filename: memtest.py
Line # Mem usage Increment Line Contents
================================================
4 0.001 MiB 0.001 MiB @profile
5 def function():
6 35.280 MiB 35.279 MiB x = list(range(1000000)) # allocate a big list
7 35.281 MiB 0.001 MiB y = copy.deepcopy(x)
8 26.698 MiB -8.583 MiB del x
9 26.698 MiB 0.000 MiB return y
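For comparison, roughly the same measurement can be made with the tracemalloc module directly, without memory_profiler; a minimal sketch:
import copy
import tracemalloc

tracemalloc.start()

x = list(range(1000000))
snapshot_before = tracemalloc.take_snapshot()

y = copy.deepcopy(x)
snapshot_after = tracemalloc.take_snapshot()

# The diff attributes allocations to source lines; the deepcopy line should
# only account for the new list object (several MiB), not a second set of
# integer objects.
for stat in snapshot_after.compare_to(snapshot_before, 'lineno')[:3]:
    print(stat)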
The Python 3.x memory manager and list implementation are smarter than those in 2.7; evidently the new list object was able to fit into existing, already-available memory that was pre-allocated when creating x.
We can test Python 2.7's behaviour with a manually built Python 2.7.12 tracemalloc binary and a small patch to memory_profiler.py. Now we get more reassuring results on Python 2.7 as well:
Filename: memtest.py
Line # Mem usage Increment Line Contents
================================================
4 0.099 MiB 0.099 MiB @profile
5 def function():
6 31.734 MiB 31.635 MiB x = list(range(1000000)) # allocate a big list
7 31.726 MiB -0.008 MiB y = copy.deepcopy(x)
8 23.143 MiB -8.583 MiB del x
9 23.141 MiB -0.002 MiB return y
I note that the author was confused as well:
copy.deepcopy copies both lists, which allocates again ~50 MB (I am not sure where the additional overhead of 50 MB - 31 MB = 19 MB comes from)
(Bold emphasis mine).
The error here is to assume that all memory changes in the Python process size can be directly attributed to specific objects, but the reality is far more complex, as the memory manager can add (and remove!) memory 'arenas', blocks of memory reserved for the heap, as needed, and will do so in larger blocks if that makes sense. The process is complex because it depends on interactions between Python's manager and the OS malloc implementation details. The author found an older article on Python's memory model and misunderstood it to be current; the author of that article has already tried to point this out: as of Python 2.5, the claim that Python doesn't free memory is no longer true.
What's troubling is that the same misunderstandings then lead the author to recommend against using pickle, when in reality the module, even on Python 2, never adds more than a little bookkeeping memory to track recursive structures. See this gist for my testing methodology: using cPickle on Python 2.7 adds a one-time 46 MiB increase (calling create_file() twice results in no further memory increase), and in Python 3 the memory increase disappears altogether.
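The gist itself is not reproduced here, but the kind of test described might look roughly like this (the file name, data sizes, and body of create_file() are hypothetical):
from memory_profiler import profile
import pickle

@profile
def create_file():
    # Hypothetical reconstruction of the test described above: pickle a large
    # structure to disk and watch the process size with memory_profiler.
    data = [list(range(100)) for _ in range(100000)]
    with open('test.pickle', 'wb') as f:   # hypothetical file name
        pickle.dump(data, f, protocol=-1)

if __name__ == '__main__':
    create_file()
    create_file()   # a second call should show no further net increase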
I'll open a dialog with the Theano team about the post; the article is wrong and confusing, and with Python 2.7 soon to be made entirely obsolete anyway, they really should focus on Python 3's memory model. (*)
When you create a new list from range(), rather than copying an existing one, you'll see a similar memory increase to creating x the first time, because you'd create a new set of integer objects in addition to the new list object. Aside from a specific set of small integers, Python doesn't cache and re-use integer values for range() operations.
(*) addendum: I opened issue #6619 with the Theano project. The project agreed with my assessment and removed the page from their documentation, although they haven't yet updated the published version.
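A quick interactive check of the point about integer objects: two separate list(range(...)) calls share only the small integers CPython caches, so rebuilding the list really does re-create the integer objects:
>>> x = list(range(1000000))
>>> x2 = list(range(1000000))
>>> x[5] is x2[5]        # small integers (-5..256) are interned by CPython
True
>>> x[1000] is x2[1000]  # larger values are fresh objects on each call
False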
I've noticed a weird thing recently: TensorFlow seems to use too much memory when initializing variables with constants. Can someone help me understand the example below?
$ python -m memory_profiler test.py
[0 1 2 3 4 5 6 7 8 9]
Filename: test.py
Line # Mem usage Increment Line Contents
================================================
4 144.531 MiB 0.000 MiB @profile
5 def go():
6 907.312 MiB 762.781 MiB a = np.arange(100000000)
7 910.980 MiB 3.668 MiB s = tf.Session()
8 1674.133 MiB 763.152 MiB b = tf.Variable(a)
9 3963.000 MiB 2288.867 MiB s.run(tf.variables_initializer([b]))
10 3963.145 MiB 0.145 MiB print(s.run(b)[:10])
You have 900 MB stored in a NumPy array.
tf.Variable(a) is equivalent to tf.Variable(tf.constant(a)). To create this constant, the Python client appends a 900 MB constant to the Graph object in the Python runtime.
Session.run triggers TF_ExtendGraph, which transfers the graph to the TensorFlow C runtime: another 900 MB.
The session then allocates 900 MB for the b tf.Variable object in the TensorFlow runtime.
That makes 3600 MB of memory allocations. To save memory you could do something like this instead:
import numpy as np
import tensorflow as tf
sess = tf.Session()
a_holder = tf.placeholder(np.float32)
b = tf.Variable(a_holder)
sess.run(b.initializer, feed_dict={a_holder: np.arange(100000000)})
TLDR; avoid creating large constants.
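As a side check on the first point, the fact that tf.Variable(a) bakes the array into the Python-side graph can be seen by serializing the graph definition. A small sketch, assuming the TF1-style API used above and a smaller array for illustration:
import numpy as np
import tensorflow as tf

a = np.arange(10 ** 6)                 # smaller array, same idea
b = tf.Variable(a)                     # embeds a as a Const node in the default graph

# The serialized GraphDef now carries the whole array as the Const node's value,
# which is one of the copies counted above (~8 MB here).
print(tf.get_default_graph().as_graph_def().ByteSize())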
I am having trouble with high memory usage when performing ffts with scipy's fftpack. Example obtained with the module memory_profiler:
Line # Mem usage Increment Line Contents
================================================
4 50.555 MiB 0.000 MiB @profile
5 def test():
6 127.012 MiB 76.457 MiB a = np.random.random(int(1e7))
7 432.840 MiB 305.828 MiB b = fftpack.fft(a)
8 891.512 MiB 458.672 MiB c = fftpack.ifft(b)
9 585.742 MiB -305.770 MiB del b, c
10 738.629 MiB 152.887 MiB b = fftpack.fft(a)
11 891.512 MiB 152.883 MiB c = fftpack.ifft(b)
12 509.293 MiB -382.219 MiB del a, b, c
13 547.520 MiB 38.227 MiB a = np.random.random(int(5e6))
14 700.410 MiB 152.891 MiB b = fftpack.fft(a)
15 929.738 MiB 229.328 MiB c = fftpack.ifft(b)
16 738.625 MiB -191.113 MiB del a, b, c
17 784.492 MiB 45.867 MiB a = np.random.random(int(6e6))
18 967.961 MiB 183.469 MiB b = fftpack.fft(a)
19 1243.160 MiB 275.199 MiB c = fftpack.ifft(b)
My attempt at understanding what is going on here:
The amount of memory allocated by both fft and ifft on lines 7 and 8 is more than what they need in order to return a result. For the call b = fftpack.fft(a), 305 MiB is allocated. The memory needed for the b array itself is 16 B/value * 1e7 values = 160 MB (about 153 MiB), since the result is complex128. It seems that fftpack is allocating some kind of workspace, and that the workspace is roughly equal in size to the output array (?).
On lines 10 and 11 the same procedure is run again, but the memory usage is less this time, and more in line with what I expect. It therefore seems that fftpack is able to reuse the workspace.
On lines 13-15 and 17-19 ffts with different, smaller input sizes are performed. In both of these cases more memory than what is needed is allocated, and memory does not seem to be reused.
The memory usage reported above agrees with what the Windows task manager reports (to the accuracy I am able to read those graphs). If I write such a script with larger input sizes, I can make my (Windows) computer very slow, indicating that it is swapping.
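As a sanity check on those numbers, the size of the output array itself can be read off directly; anything the profiler reports beyond this would be workspace or cache:
import numpy as np
from scipy import fftpack

a = np.random.random(int(1e7))
b = fftpack.fft(a)

print(b.dtype)               # complex128, i.e. 16 bytes per value
print(b.nbytes / 2 ** 20)    # about 152.6 MiB for 1e7 values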
A second example to illustrate the problem of the memory allocated for workspace:
import numpy as np
from scipy import fftpack
from time import time

factor = 4.5
a = np.random.random(int(factor * 3e7))
start = time()
b = fftpack.fft(a)
c = fftpack.ifft(b)
end = time()
print("Elapsed: {:.4g}".format(end - start))
del a, b, c
print("Finished first fft")
a = np.random.random(int(factor * 2e7))
start = time()
b = fftpack.fft(a)
c = fftpack.ifft(b)
end = time()
print("Elapsed: {:.4g}".format(end - start))
del a, b, c
print("Finished first fft")
The code prints the following:
Elapsed: 17.62
Finished first fft
Elapsed: 38.41
Finished first fft
Notice how the second fft, which has the smaller input size, takes more than twice as long to compute. I noticed that my computer was very slow (likely swapping) during the execution of this script.
Questions:
Is it correct that the FFT can be calculated in place, without the need for extra workspace? If so, why does fftpack not do that?
Is there a problem with fftpack here? Even if it needs extra workspace, why does it not reuse its workspace when the fft is rerun with different input sizes?
EDIT:
Old, but possibly related: https://mail.scipy.org/pipermail/scipy-dev/2012-March/017286.html
Is this the answer? https://github.com/scipy/scipy/issues/5986
This is a known issue, caused by fftpack caching its strategy for computing the FFT for a given input size. That cache is about as large as the output of the computation, so doing large FFTs with several different input sizes can make the memory consumption significant.
The problem is described in detail here:
https://github.com/scipy/scipy/issues/5986
Numpy has a similar problem, which is being worked on:
https://github.com/numpy/numpy/pull/7686
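One general workaround, not taken from the linked issues, is to run each large FFT in a short-lived worker process, so that fftpack's internal cache is thrown away when the worker exits. A sketch, assuming only a small result needs to come back to the parent (the helper name is hypothetical):
import numpy as np
from scipy import fftpack
from multiprocessing import Pool

def roundtrip_error(n):
    # Worker: build the input, do the fft/ifft round trip, and return only a
    # scalar so the large arrays and fftpack's cache die with the process.
    a = np.random.random(n)
    c = fftpack.ifft(fftpack.fft(a))
    return float(np.abs(a - c).max())

if __name__ == "__main__":
    # maxtasksperchild=1 forces a fresh worker (and thus an empty cache) per task.
    with Pool(processes=1, maxtasksperchild=1) as pool:
        for n in (int(3e7), int(2e7)):
            print(pool.apply(roundtrip_error, (n,)))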
Our game program initializes the data of all players into memory. My goal is to reduce unnecessary memory usage. I traced the program and found that a "for" loop was taking a lot of memory.
For example:
Line # Mem usage Increment Line Contents
================================================
52 @profile
53 11.691 MB 0.000 MB def test():
54 19.336 MB 7.645 MB a = ["1"] * (10 ** 6)
55 19.359 MB 0.023 MB print recipe.total_size(a, verbose=False)
56 82.016 MB 62.656 MB for i in a:
57 pass
The print of recipe.total_size(a, verbose=False) shows: 8000098 bytes.
The question is: how can I release that 62.656 MB of memory?
P.S.
Sorry, I know my English is not very good. I appreciate everyone who reads this. :-)
If you are absolutely desperate to reduce the memory usage of the loop, you can do it this way:
i = 0
while 1:
    try:
        a[i]  # accessing an element here
        i += 1
    except IndexError:
        break
Memory stats (if they are accurate):
12 9.215 MB 0.000 MB i = 0
13 9.215 MB 0.000 MB while 1:
14 60.484 MB 51.270 MB try:
15 60.484 MB 0.000 MB a[i]
16 60.484 MB 0.000 MB i += 1
17 60.484 MB 0.000 MB except IndexError:
18 60.484 MB 0.000 MB break
However, this code looks ugly and dangerous, and the reduction in memory usage is only tiny.
1) Instead of a list, you should use a generator. Based on your sample code:
@profile
def test():
    a = ("1" for i in range(10**6))  # this returns a generator instead of a list
    for i in a:
        pass
Now if you use the generator 'a' in the for loop, it won't take that much memory.
2) If you are given a list, first convert it into a generator:
@profile
def test():
    a = ["1"] * (10**6)    # the list you are given
    g = (i for i in a)     # convert the list into a generator object
    for i in g:            # use the generator object for iteration
        pass
Try this and see if it helps.