I have a Python script which is processing a large amount of data from compressed ASCII. After a short period, it runs out of memory. I am not constructing large lists or dicts. The following code illustrates the issue:
import struct
import zlib
import binascii
import numpy as np
import psutil
import os
import gc
process = psutil.Process(os.getpid())
n = 1000000
compressed_data = binascii.b2a_base64(bytearray(zlib.compress(struct.pack('%dB' % n, *np.random.random(n))))).rstrip()
print 'Memory before entering the loop is %d MB' % (process.get_memory_info()[0] / float(2 ** 20))
for i in xrange(2):
    print 'Memory before iteration %d is %d MB' % (i, process.get_memory_info()[0] / float(2 ** 20))
    byte_array = zlib.decompress(binascii.a2b_base64(compressed_data))
    a = np.array(struct.unpack('%dB' % (len(byte_array)), byte_array))
    gc.collect()
gc.collect()
print 'Memory after last iteration is %d MB' % (process.get_memory_info()[0] / float(2 ** 20))
It prints:
Memory before entering the loop is 45 MB
Memory before iteration 0 is 45 MB
Memory before iteration 1 is 51 MB
Memory after last iteration is 51 MB
Between the first and second iteration, memory usage grows by 6 MB. If I run the loop more than two times, the memory usage stays at 51 MB. If I put the code to decompress into its own function and feed it the actual compressed data, the memory usage will continue to grow. I am using Python 2.7. Why is the memory increasing and how can it be corrected? Thank you.
Through comments, we figured out what was going on:
The main issue is that names bound inside a for loop are not released once the loop ends (Python has no block scope). They remain accessible, pointing to the value they received in the last iteration:
>>> for i in range(5):
... a=i
...
>>> print a
4
So here's what's happening:
First iteration: The print shows 45MB, which is the memory before byte_array and a are created.
The code then creates those two large objects, raising memory usage to 51MB.
Second iteration: The two objects created in the first run of the loop still exist, so the print shows 51MB.
In the middle of the second iteration, byte_array and a are rebound to new objects. The old ones are freed, but replaced by equally large ones.
The for loop ends, but byte_array and a still refer to the objects from the last iteration, so those objects are not freed by the second gc.collect() call.
Changing the code to:
for i in xrange(2):
    [ . . . ]
byte_array = None
a = None
gc.collect()
made the memory reserved by byte_array and a unreachable, and therefore freed.
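The following is a minimal sketch that shows this behaviour with weak references instead of psutil readings. It assumes CPython, where reference counting frees an object as soon as its last strong reference disappears. `Payload` is a hypothetical stand-in for the large byte_array and a objects:

```python
import weakref

class Payload(object):
    """Hypothetical stand-in for a large object such as byte_array or a."""
    def __init__(self, i):
        self.i = i

refs = []
for i in range(2):
    a = Payload(i)               # rebinding 'a' drops the previous Payload
    refs.append(weakref.ref(a))  # weak refs do not keep the object alive

# The loop has ended, but 'a' still refers to the last Payload.
first_alive = refs[0]() is not None   # False: freed when 'a' was rebound
last_alive = refs[1]() is not None    # True: still bound to 'a'

a = None                              # release the last strong reference
last_alive_after = refs[1]() is not None  # False under CPython
```

`del a` would have the same effect as `a = None` here.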
There's more on Python's garbage collection in this SO answer: https://stackoverflow.com/a/4484312/289011
Also, it may be worth looking at the question "How do I determine the size of an object in Python?". This is tricky, though... if your object is a list pointing to other objects, what is the size? The sum of the pointers in the list? The sum of the sizes of the objects those pointers point to?
Is it possible to share gmpy2 multiprecision integers (https://pypi.python.org/pypi/gmpy2) between processes (created by multiprocessing) without creating copies in memory?
Each integer has about 750,000 bits. The integers are not modified by the processes.
Thank you.
Update: Tested code is below.
I would try the following untested approach:
Create a memory mapped file using Python's mmap library.
Use gmpy2.to_binary() to convert a gmpy2.mpz instance into binary string.
Write both the length of the binary string and the binary string itself into the memory mapped file. To allow for random access, you should begin every write at a multiple of a fixed value, say 94000 in your case.
Populate the memory mapped file with all your values.
Then in each process, use gmpy2.from_binary() to read the data from the memory mapped file.
You need to read both the length of the binary string and the binary string itself. You should be able to pass a slice from the memory mapped file directly to gmpy2.from_binary().
It may be simpler to create a list of (start, end) values for the position of each byte string in the memory mapped file and then pass that list to each process.
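The (start, end) variant can be sketched with the standard library alone. To keep the sketch self-contained, it assumes plain non-negative Python 3 ints serialized with int.to_bytes / int.from_bytes in place of gmpy2.to_binary / gmpy2.from_binary. `pack_ints` and `read_int` are hypothetical helper names.

An anonymous mmap is shared (MAP_SHARED) on Unix. If it is created before the worker processes are forked, the workers read the same pages instead of private copies. Each read_int call still makes a temporary bytes copy of the single value it decodes.

```python
import mmap

def pack_ints(values):
    """Pack non-negative ints back to back into one anonymous shared mmap.

    Returns the mmap and a list of (start, end) byte offsets, one per value.
    int.to_bytes stands in for gmpy2.to_binary (an assumption of this sketch).
    """
    blobs = [v.to_bytes(max(1, (v.bit_length() + 7) // 8), 'little') for v in values]
    mm = mmap.mmap(-1, sum(len(b) for b in blobs))
    offsets = []
    pos = 0
    for b in blobs:
        mm[pos:pos + len(b)] = b          # slice assignment of equal length
        offsets.append((pos, pos + len(b)))
        pos += len(b)
    return mm, offsets

def read_int(mm, span):
    """Decode one value; the slice copies only the bytes of that value."""
    start, end = span
    return int.from_bytes(mm[start:end], 'little')

values = [2 ** 750000 - 1, 12345, 0]
mm, offsets = pack_ints(values)
```

The offsets list is small, so passing it to each process, as suggested above, is cheap compared with the integers themselves.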
Update: Here is some sample code that has been tested on Linux with Python 3.4.
import mmap
import struct
import multiprocessing as mp
import gmpy2
# Number of mpz integers to place in the memory buffer.
z_count = 40000
# Maximum number of bits in each integer.
z_bits = 750000
# Total number of bytes used to store each integer.
# Size is rounded up to a multiple of 4.
z_size = 4 + (((z_bits + 31) // 32) * 4)
def f(instance):
    global mm
    s = 0
    for i in range(z_count):
        mm.seek(i * z_size)
        t = struct.unpack('i', mm.read(4))[0]
        z = gmpy2.from_binary(mm.read(t))
        s += z
    print(instance, z % 123456789)
def main():
    global mm
    mm = mmap.mmap(-1, z_count * z_size)
    rs = gmpy2.random_state(42)
    for i in range(z_count):
        z = gmpy2.mpz_urandomb(rs, z_bits)
        b = gmpy2.to_binary(z)
        mm.seek(i * z_size)
        mm.write(struct.pack('i', len(b)))
        mm.write(b)
    ctx = mp.get_context('fork')
    pool = ctx.Pool(4)
    pool.map_async(f, range(4))
    pool.close()
    pool.join()
if __name__ == '__main__':
    main()
I have a python3 script that operates on numpy.memmap arrays. It writes an array to a newly generated temporary file located in /tmp:
import numpy, tempfile
size = 2 ** 37 * 10
tmp = tempfile.NamedTemporaryFile('w+')
array = numpy.memmap(tmp.name, dtype = 'i8', mode = 'w+', shape = size)
array[0] = 666
array[size-1] = 777
del array
array2 = numpy.memmap(tmp.name, dtype = 'i8', mode = 'r+', shape = size)
print('File: {}. Array size: {}. First cell value: {}. Last cell value: {}'.\
format(tmp.name, len(array2), array2[0], array2[size-1]))
while True:
    pass
The HDD is only 250G in size. Nevertheless, the script can somehow generate 10T files in /tmp, and the corresponding array still seems to be accessible. The output of the script is as follows:
File: /tmp/tmptjfwy8nr. Array size: 1374389534720. First cell value: 666. Last cell value: 777
The file really exists and is listed as 10T in size:
$ ls -l /tmp/tmptjfwy8nr
-rw------- 1 user user 10995116277760 Dec 1 15:50 /tmp/tmptjfwy8nr
However, the whole size of /tmp is much smaller:
$ df -h /tmp
Filesystem Size Used Avail Use% Mounted on
/dev/sda1 235G 5.3G 218G 3% /
The process also claims to use 10T of virtual memory, which should not be possible either. The output of the top command:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
31622 user 20 0 10.000t 16592 4600 R 100.0 0.0 0:45.63 python3
As far as I understand, this means that numpy.memmap does not actually allocate the space for the whole array, and therefore the displayed file size is bogus. This in turn means that when I start to gradually fill the whole array with my data, at some point my program will crash or my data will be corrupted.
Indeed, if I introduce the following in my code:
for i in range(size):
    array2[i] = i
I get the error after a while:
Bus error (core dumped)
Therefore, my question: how can I check at the beginning whether there really is enough space for the data, and then actually reserve the space for the whole array?
There's nothing 'bogus' about the fact that you are generating 10 TB files
You are asking for arrays of size
2 ** 37 * 10 = 1374389534720 elements
A dtype of 'i8' means an 8 byte (64 bit) integer, therefore your final array will have a size of
1374389534720 * 8 = 10995116277760 bytes
or
10995116277760 / 1E12 = 10.99511627776 TB
which is exactly 10 TiB (10 * 2**40 bytes), matching the 10.000t shown by top.
If you only have 250 GB of free disk space then how are you able to create a "10 TB" file?
Assuming that you are using a reasonably modern filesystem, your OS will be capable of generating almost arbitrarily large sparse files, regardless of whether or not you actually have enough physical disk space to back them.
For example, on my Linux machine I'm allowed to do something like this:
# I only have about 50GB of free space...
~$ df -h /
Filesystem Type Size Used Avail Use% Mounted on
/dev/sdb1 ext4 459G 383G 53G 88% /
~$ dd if=/dev/zero of=sparsefile bs=1 count=0 seek=10T
0+0 records in
0+0 records out
0 bytes (0 B) copied, 0.000236933 s, 0.0 kB/s
# ...but I can still generate a sparse file that reports its size as 10 TB
~$ ls -lah sparsefile
-rw-rw-r-- 1 alistair alistair 10T Dec 1 21:17 sparsefile
# however, this file uses zero bytes of "actual" disk space
~$ du -h sparsefile
0 sparsefile
Try calling du -h on your np.memmap file after it has been initialized to see how much actual disk space it uses.
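You can also check this from Python instead of with du. os.stat gives both numbers: st_size is the apparent size that ls -l shows, and st_blocks (counted in 512-byte units on Linux and macOS) reflects what is actually allocated.

The following is a minimal sketch. It assumes a POSIX system and a filesystem that supports sparse files; `allocation_info` is a hypothetical helper name:

```python
import os
import tempfile

def allocation_info(path):
    """Return (apparent_bytes, allocated_bytes), roughly ls -l vs. du."""
    st = os.stat(path)
    # st_blocks is in 512-byte units on Linux/macOS, independent of st_blksize.
    return st.st_size, st.st_blocks * 512

# Create a 64 MiB sparse file by extending it without writing any data.
with tempfile.NamedTemporaryFile() as f:
    f.truncate(64 * 2 ** 20)
    f.flush()
    apparent, allocated = allocation_info(f.name)
```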
As you start actually writing data to your np.memmap file, everything will be OK until you exceed the physical capacity of your storage, at which point the process will terminate with a Bus error. This means that if you needed to write < 250GB of data to your np.memmap array then there might be no problem (in practice this would probably also depend on where you are writing within the array, and on whether it is row or column major).
How is it possible for a process to use 10 TB of virtual memory?
When you create a memory map, the kernel allocates a new block of addresses within the virtual address space of the calling process and maps them to a file on your disk. The amount of virtual memory that your Python process is using will therefore increase by the size of the file that has just been created. Since the file can also be sparse, not only can the virtual memory exceed the total amount of RAM available, but it can also exceed the total physical disk space on your machine.
How can you check whether you have enough disk space to store the full np.memmap array?
I'm assuming that you want to do this programmatically in Python.
Get the amount of free disk space available. There are various methods given in the answers to this previous SO question. One option is os.statvfs:
import os
def get_free_bytes(path='/'):
    st = os.statvfs(path)
    return st.f_bavail * st.f_bsize
print(get_free_bytes())
# 56224485376
Work out the size of your array in bytes:
import numpy as np
def check_asize_bytes(shape, dtype):
    return np.prod(shape) * np.dtype(dtype).itemsize
print(check_asize_bytes((2 ** 37 * 10,), 'i8'))
# 10995116277760
Finally, check whether the array size is larger than the free disk space; if it is, the full array will not fit.
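The three steps above can be combined into a single check. This sketch uses shutil.disk_usage (Python 3.3+, also available on Windows) rather than os.statvfs, and computes the byte count with Python ints so it cannot overflow a fixed-width NumPy integer. `array_nbytes` and `memmap_fits` are hypothetical names.

This check is only a snapshot. Another process can consume the space between the check and the writes, so it does not reserve anything.

```python
import shutil
import numpy as np

def array_nbytes(shape, dtype):
    """Size of the full array in bytes, computed with Python ints (no overflow)."""
    n = 1
    for dim in shape:
        n *= int(dim)
    return n * np.dtype(dtype).itemsize

def memmap_fits(path, shape, dtype):
    """True if an array of this shape and dtype fits in the free space at path."""
    return array_nbytes(shape, dtype) <= shutil.disk_usage(path).free
```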
Update: Is there a 'safe' way to allocate an np.memmap file, which guarantees that sufficient disk space is reserved to store the full array?
One possibility might be to use fallocate to pre-allocate the disk space, e.g.:
~$ fallocate -l 1G bigfile
~$ du -h bigfile
1.1G bigfile
You could call this from Python, for example using subprocess.check_call:
import subprocess
import numpy as np
def fallocate(fname, length):
    return subprocess.check_call(['fallocate', '-l', str(length), fname])
def safe_memmap_alloc(fname, dtype, shape, *args, **kwargs):
    nbytes = np.prod(shape) * np.dtype(dtype).itemsize
    fallocate(fname, nbytes)
    return np.memmap(fname, dtype, *args, shape=shape, **kwargs)
mmap = safe_memmap_alloc('test.mmap', np.int64, (1024, 1024))
print(mmap.nbytes / 1E6)
# 8.388608
print(subprocess.check_output(['du', '-h', 'test.mmap']))
# 8.0M test.mmap
I'm not aware of a platform-independent way to do this using the standard library, but there is a fallocate Python module on PyPI that should work for any Posix-based OS.
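On Python 3.3+ under Linux, the standard library also offers os.posix_fallocate, which avoids spawning the fallocate command. os.posix_fallocate is not available on Windows, and as far as I know not on macOS either. `safe_memmap_alloc_posix` is a hypothetical variant of the helper above:

```python
import os
import tempfile
import numpy as np

def safe_memmap_alloc_posix(fname, dtype, shape, mode='r+'):
    """Reserve the disk blocks first, then memory-map the file."""
    nbytes = np.dtype(dtype).itemsize
    for dim in shape:
        nbytes *= int(dim)
    fd = os.open(fname, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        # Raises OSError (errno ENOSPC) if the space cannot be reserved.
        os.posix_fallocate(fd, 0, nbytes)
    finally:
        os.close(fd)
    return np.memmap(fname, dtype=dtype, mode=mode, shape=shape)

path = os.path.join(tempfile.mkdtemp(), 'test.mmap')
arr = safe_memmap_alloc_posix(path, np.int64, (1024, 1024))
print(arr.nbytes / 1E6)
```

Failing early with OSError is the point of the design. If the disk is too small, you find out at allocation time rather than with a Bus error halfway through filling the array.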
Based on the answer of @ali_m, I finally came to this solution:
# must be called with an argument giving the array size in GB
import sys, numpy, tempfile, subprocess
size = (2 ** 27) * int(sys.argv[1])
tmp_primary = tempfile.NamedTemporaryFile('w+')
array = numpy.memmap(tmp_primary.name, dtype = 'i8', mode = 'w+', shape = size)
tmp = tempfile.NamedTemporaryFile('w+')
# stderr must be piped, otherwise communicate() always returns None for it
check = subprocess.Popen(['cp', '--sparse=never', tmp_primary.name, tmp.name],
                         stderr=subprocess.PIPE)
stdout, stderr = check.communicate()
if check.returncode != 0:
    sys.stderr.write(stderr.decode('utf-8'))
    sys.exit(1)
del array
tmp_primary.close()
array = numpy.memmap(tmp.name, dtype = 'i8', mode = 'r+', shape = size)
array[0] = 666
array[size-1] = 777
print('File: {}. Array size: {}. First cell value: {}. Last cell value: {}'.\
      format(tmp.name, len(array), array[0], array[size-1]))
while True:
    pass
The idea is to copy the initially generated sparse file to a new, non-sparse one. For this, cp with the option --sparse=never is used.
When the script is called with a manageable size parameter (say, 1 GB), the array is mapped to a non-sparse file. This is confirmed by the output of the du -h command, which now shows ~1 GB. If there is not enough disk space, the script exits with the error:
cp: ‘/tmp/tmps_thxud2’: write failed: No space left on device
Please excuse this naive question of mine. I am trying to monitor the memory usage of my Python code, and have come across the promising memory_profiler package. I have a question about interpreting the output generated by the @profile decorator.
Here is a sample output that I get by running my dummy code below:
dummy.py
from memory_profiler import profile
@profile
def my_func():
    a = [1] * (10 ** 6)
    b = [2] * (2 * 10 ** 7)
    del b
    return a
if __name__ == '__main__':
    my_func()
Running dummy.py with "python dummy.py" prints the table below.
Line #    Mem usage    Increment   Line Contents
     3      8.2 MiB      0.0 MiB   @profile
     4                             def my_func():
     5     15.8 MiB      7.6 MiB       a = [1] * (10 ** 6)
     6    168.4 MiB    152.6 MiB       b = [2] * (2 * 10 ** 7)
     7     15.8 MiB   -152.6 MiB       del b
     8     15.8 MiB      0.0 MiB       return a
My question is what does the 8.2 MiB in the first line of the table correspond to. My guess is that it is the initial memory usage by the python interpreter itself; but I am not sure. If that is the case, is there a way to have this baseline usage automatically subtracted from the memory usage of the script?
Many thanks for your time and consideration!
Noushin
According to the docs:
The first column represents the line number of the code that has been profiled, the second column (Mem usage) the memory usage of the Python interpreter after that line has been executed. The third column (Increment) represents the difference in memory of the current line with respect to the last one.
So, that 8.2 MiB is the memory usage after the first line has been executed. That includes the memory needed to start up Python, load your script and all of its imports (including memory_profiler itself), and so on.
There don't appear to be any documented options for removing that from each entry. But it wouldn't be too hard to post-process the results.
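One way to post-process is to parse the text report. In this sketch, `relative_usage` is a hypothetical helper, and the regular expression assumes that the line number and the Mem usage value (in MiB or MB) are the first two columns of each data row, as in the report above.

```python
import re

# A data row starts with the line number and the "Mem usage" value and unit,
# e.g. "     5     15.8 MiB ..."; rows without numbers (def lines) are skipped.
_ROW = re.compile(r'^\s*(\d+)\s+([\d.]+) Mi?B\s')

def relative_usage(report):
    """Return [(line_no, mem_minus_baseline), ...] from a memory_profiler report.

    The baseline is the "Mem usage" of the first profiled row (the decorator line).
    """
    rows = []
    for line in report.splitlines():
        m = _ROW.match(line)
        if m:
            rows.append((int(m.group(1)), float(m.group(2))))
    if not rows:
        return []
    baseline = rows[0][1]
    return [(line_no, mem - baseline) for line_no, mem in rows]

report = """\
Line #    Mem usage    Increment   Line Contents
     3      8.2 MiB      0.0 MiB   @profile
     4                             def my_func():
     5     15.8 MiB      7.6 MiB       a = [1] * (10 ** 6)
     6    168.4 MiB    152.6 MiB       b = [2] * (2 * 10 ** 7)
     7     15.8 MiB   -152.6 MiB       del b
     8     15.8 MiB      0.0 MiB       return a
"""
for line_no, rel in relative_usage(report):
    print('%3d %8.1f MiB' % (line_no, rel))
```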
Alternatively, do you really need to do that? The third column shows how much additional memory has been used after each line, and either that, or the sum of that across a range of lines, seems more interesting than the difference between each line's second column and the start.
The difference in memory between lines is given in the third column (Increment), or you could write a small script to process the output.
I'm loading large h5 files into memory using numpy ndarrays. I read that my system (Win 7 prof., 6 GB RAM) is supposed to allow python.exe to use about 2 GB of physical memory.
However, I'm getting a MemoryError already just shy of 1 GB. Even stranger, this lower limit seems to apply only to numpy arrays but not to a list.
I've tested my memory consumption using the following function found here:
import psutil
import gc
import os
import numpy as np
from matplotlib.pyplot import pause
def memory_usage_psutil():
    # return the memory usage in MB
    process = psutil.Process(os.getpid())
    mem = process.get_memory_info()[0]/float(2**20)
    return mem
Test 1: Testing memory limits for an ordinary list
print 'Memory - %d MB' %memory_usage_psutil() # prints memory usage after imports
a = []
while 1:
    try:
        a.append([x*2000 for x in xrange(10000)])
    except MemoryError:
        print 'Memory - %d MB' %memory_usage_psutil()
        a = []
        print 'Memory - %d MB' %memory_usage_psutil()
        print 'run garbage collector: collected %d objects.' %gc.collect()
        print 'Memory - %d MB\n\n' %memory_usage_psutil()
        break
Test 1 prints:
Memory - 39 MB
Memory - 1947 MB
Memory - 1516 MB
run garbage collector: collected 0 objects.
Memory - 49 MB
Test 2: Creating a number of large np.array's
shape = (5500,5500)
names = ['b', 'c', 'd', 'g', 'h']
try:
    for n in names:
        globals()[n] = np.ones(shape, dtype='float64')
        print 'created variable %s with %0.2f MB'\
            %(n,(globals()[n].nbytes/2.**20))
except MemoryError:
    print 'MemoryError, Memory - %d MB. Deleting files..'\
        %memory_usage_psutil()
    pause(2)
    # Just added the pause here to be able to observe
    # the spike of memory in the Windows task manager.
    for n in names:
        globals()[n] = []
    print 'Memory - %d MB' %memory_usage_psutil()
    print 'run garbage collector: collected %d objects.' %gc.collect()
    print 'Memory - %d MB' %memory_usage_psutil()
Test 2 prints:
Memory - 39 MB
created variable b with 230.79 MB
created variable c with 230.79 MB
created variable d with 230.79 MB
created variable g with 230.79 MB
MemoryError, Memory - 964 MB. Deleting files..
Memory - 39 MB
run garbage collector: collected 0 objects.
Memory - 39 MB
My question: Why do I get a MemoryError before I'm even close to the 2 GB limit, and why is there a difference in memory limits for a list and an np.array, respectively? Or what am I missing?
I'm using python 2.7 and numpy 1.7.1
This is probably happening because each numpy array is backed by a single C buffer (for speed), allocated somewhere with a call to malloc. That call fails because it cannot find one contiguous block large enough for the next array (about 230 MB each here). Python lists are not linked lists; CPython implements them as arrays of pointers. In your list test, however, the outer list only holds pointers to many small inner lists, and each of those small objects can be placed in whatever free gap is available. Hence, if you have enough memory available but it is fragmented, your array malloc would fail, but your list of small lists could still use all of the noncontiguous pieces.
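If fragmentation is indeed the cause, one workaround in the same spirit is to avoid a single huge allocation and store the data as a list of smaller row blocks. Each allocation then needs only a modest contiguous region.

This is a sketch only. `allocate_in_blocks` and `get_row` are hypothetical helpers, and code that needs the whole array as one ndarray will not work with this layout.

```python
import numpy as np

def allocate_in_blocks(shape, dtype='float64', rows_per_block=500):
    """Allocate a 2-D array as a list of row blocks instead of one big block.

    Each block needs only rows_per_block * n_cols * itemsize contiguous bytes.
    For (5500, 5500) float64 that is 22,000,000 bytes per block instead of
    one 242,000,000-byte (230.79 MiB) block.
    """
    n_rows, n_cols = shape
    blocks = []
    for start in range(0, n_rows, rows_per_block):
        stop = min(start + rows_per_block, n_rows)
        blocks.append(np.ones((stop - start, n_cols), dtype=dtype))
    return blocks

def get_row(blocks, i, rows_per_block=500):
    """Return row i; rows_per_block must match the value used to allocate."""
    return blocks[i // rows_per_block][i % rows_per_block]

blocks = allocate_in_blocks((1234, 10))
```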
I was playing around with the memory_profiler package (downloaded from pip), more specifically, looking at the memory efficiency of looping through a list by creating a temporary list first vs. looping through an "iterator list".
This was a problem that I encountered a while back and I wanted to benchmark my solution. The problem was that I needed to compare each element in a list with the next element in the same list, until all elements had been "dealt with". So I guess this would be an O(n^2) solution (if the most naive solution is picked, for each element in list, loop through list).
Anyways, the three functions below are all doing the same thing (more or less); looping over a list that is zipped with itself-offset-by-one.
import cProfile
@profile
def zips():
    li = range(1,20000000)
    for tup in zip(li,li[1:]):
        pass
    del li
@profile
def izips():
    from itertools import izip
    li = range(1,20000000)
    for tup in izip(li,li[1:]):
        pass
    del li
@profile
def izips2():
    from itertools import izip
    li = range(1,20000000)
    for tup in izip(li,li[1:]):
        del tup
    del li
if __name__ == '__main__':
    zips()
    # izips()
    # izips2()
The surprising part (to me) was in the memory usage, first I run the zips() function, and although I thought I did clean up, I still ended up with ~1.5 GB in memory:
ipython -m memory_profiler python_profiling.py
Filename: python_profiling.py
Line # Mem usage Increment Line Contents
================================================
10 @profile
11 27.730 MB 0.000 MB def zips():
12 649.301 MB 621.570 MB li = range(1,20000000)
13 3257.605 MB 2608.305 MB for tup in zip(li,li[1:]):
14 1702.504 MB -1555.102 MB pass
15 1549.914 MB -152.590 MB del li
Then I close the interpreter instance and reopen it for running the next test, which is the izips() function:
ipython -m memory_profiler python_profiling.py
Filename: python_profiling.py
Line # Mem usage Increment Line Contents
================================================
17 @profile
18 27.449 MB 0.000 MB def izips():
19 27.449 MB 0.000 MB from itertools import izip
20 649.051 MB 621.602 MB li = range(1,20000000)
21 1899.512 MB 1250.461 MB for tup in izip(li,li[1:]):
22 1746.922 MB -152.590 MB pass
23 1594.332 MB -152.590 MB del li
And then finally I ran a test (again after restarting the interpreter in between) where I tried to explicitly delete the tuple in the for-loop to try to make sure that its memory would be freed (maybe I'm not thinking that correctly?). Turns out that didn't make a difference so I'm guessing that either I'm not prompting GC or that is not the source of my memory overhead.
ipython -m memory_profiler python_profiling.py
Filename: python_profiling.py
Line # Mem usage Increment Line Contents
================================================
25 @profile
26 20.109 MB 0.000 MB def izips2():
27 20.109 MB 0.000 MB from itertools import izip
28 641.676 MB 621.566 MB li = range(1,20000000)
29 1816.953 MB 1175.277 MB for tup in izip(li,li[1:]):
30 1664.387 MB -152.566 MB del tup
31 1511.797 MB -152.590 MB del li
Bottom line:
I thought that the overhead of the for loop itself was minimal, and therefore I was expecting just a little more than ~620 MB (the memory it takes to store the list). Instead, it looks like I have ~2 lists of size 20,000,000 in memory plus even more overhead. Can anyone help me explain what all this memory is being used for? (And what is taking up that ~1.5 GB at the end of each run?)
Note that the OS assigns memory in chunks, and doesn't necessarily reclaim it all in one go. I've found the memory profiling package to be wildly inaccurate because it appears it fails to take that into account.
Your li[1:] slice creates a new list with (2*10**7) - 2 elements, nearly a whole new copy, easily doubling the memory required for the lists. The zip() call also returns a full new list object, the output of the zipping action, again requiring memory for the intermediary result, plus about 20 million 2-element tuples.
You could use a new iterator instead of slicing:
def zips():
    from itertools import izip
    li = range(1,20000000)
    next_li = iter(li)
    next(next_li) # advance one step
    for tup in izip(li, next_li):
        pass
    del li
The list iterator returned from the iter() call is much more light-weight; it only keeps a reference to the original list and a pointer. Combining this with izip() avoids creating the output list as well.
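In Python 3, zip is already lazy (like izip) and range no longer builds a list. The same idea therefore reduces to the pairwise recipe from the itertools documentation, which is built on itertools.tee. On Python 3.10+, itertools.pairwise does this directly. A sketch:

```python
from itertools import tee

def pairwise(iterable):
    """Yield (x0, x1), (x1, x2), ... without building a copy of the input."""
    a, b = tee(iterable)
    next(b, None)        # advance the second iterator by one element
    return zip(a, b)     # lazy in Python 3, like izip in Python 2

for left, right in pairwise(range(1, 5)):
    print(left, right)
```

Because b runs only one element ahead of a, tee buffers at most one element, so memory stays constant regardless of the input length.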