I am currently working on a Jupyter notebook on Kaggle. After performing the desired transformations on my NumPy array, I pickled it so that it could be stored on disk. I did that so I could free up the memory consumed by the large array.
The memory consumed after pickling the array was about 8.7 GB.
I decided to run this code snippet, provided by jan-glx here, to find out which variables were consuming my memory:
import sys
def sizeof_fmt(num, suffix='B'):
    ''' by Fred Cirera, https://stackoverflow.com/a/1094933/1870254, modified'''
    for unit in ['', 'Ki', 'Mi', 'Gi', 'Ti', 'Pi', 'Ei', 'Zi']:
        if abs(num) < 1024.0:
            return "%3.1f %s%s" % (num, unit, suffix)
        num /= 1024.0
    return "%.1f %s%s" % (num, 'Yi', suffix)
for name, size in sorted(((name, sys.getsizeof(value)) for name, value in locals().items()),
                         key=lambda x: -x[1])[:10]:
    print("{:>30}: {:>8}".format(name, sizeof_fmt(size)))
After performing this step, I noticed that my array was 3.3 GB, and all the other variables together came to about 0.1 GB.
I decided to delete the array and see if that would fix the problem, by performing the following:
import gc
del my_array
gc.collect()
After doing this, memory consumption decreased from 8.7 GB to 5.4 GB. That makes sense in theory, but it still doesn't explain what the rest of the memory was being consumed by.
I decided to continue anyway and reset all my variables with the following, to see whether this would free up the memory:
%reset
As expected, it freed up the memory of the variables printed by the function above, but I was still left with 5.3 GB of memory in use.
One thing to note is that I noticed a memory spike while pickling the file itself, so a summary of the process would be something like this:
performed operations on array -> memory consumption increased from about 1.9 GB to 5.6 GB
pickled file -> memory consumption increased from 5.6 GB to about 8.7 GB (while the file was being pickled, memory spiked to 15.2 GB, then dropped back to 8.7 GB)
deleted array -> memory consumption decreased from 8.7 GB to 5.4 GB
performed reset -> memory consumption decreased from 5.4 GB to 5.3 GB
Please note that the above is loosely based on Kaggle's memory monitor and may be inaccurate.
I have also checked this question but it was not helpful for my case.
Would this be considered a memory leak? If so, what do I do in this case?
EDIT 1:
After some further digging, I noticed that others are facing this problem too. It stems from the pickling process: pickling creates a copy in memory but, for some reason, does not release it. Is there a way to release that memory after the pickling process is complete?
EDIT 2:
When deleting the pickled file from disk, using:
!rm my_array
It ended up freeing both the disk space and some memory. I don't know whether this tidbit is useful, but I decided to include it anyway, as every bit of info might help.
There is one basic drawback that you should be aware of: the CPython interpreter can barely free memory and return it to the OS. For most workloads, you can assume that memory is not freed during the lifetime of the interpreter's process. However, the interpreter can reuse the memory internally. So looking at the memory consumption of the CPython process from the operating system's perspective really does not help at all. A rather common workaround is to run memory-intensive jobs in a subprocess / worker process (via multiprocessing, for instance) and "only" return the result to the main process. Once the worker dies, the memory is actually freed.
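A minimal sketch of that workaround, assuming a hypothetical heavy_job function that builds a large temporary structure but returns only a small result. It uses multiprocessing.Pool with maxtasksperchild=1, so the single worker process exits after its task and takes its memory with it; on platforms that use the "spawn" start method (Windows, macOS), the calling code must sit under an `if __name__ == "__main__":` guard.

```python
import multiprocessing as mp

def heavy_job(n):
    # Hypothetical memory-hungry task: builds a large temporary list,
    # but only a small summary (an int) travels back to the parent.
    data = list(range(n))
    return sum(data)

def run_in_worker(func, *args):
    # One worker, one task: after the task the worker process exits,
    # so everything it allocated is released back to the OS.
    with mp.Pool(processes=1, maxtasksperchild=1) as pool:
        return pool.apply(func, args)

if __name__ == "__main__":
    print(run_in_worker(heavy_job, 10**6))
```

The parent process never touches the large list, so its own memory footprint stays flat regardless of how much the worker allocates.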
Second, using sys.getsizeof on ndarrays can be impressively misleading. Use the ndarray.nbytes property instead and be aware that this may also be misleading when dealing with views.
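A small sketch showing how the two measures can disagree, using a view of an array. Exact header sizes vary across NumPy versions, so only the relationships between the numbers matter here.

```python
import sys
import numpy as np

a = np.zeros((1000, 1000))    # owns an 8,000,000-byte data buffer
v = a[::2]                    # a view: shares a's buffer, allocates no new data

# On recent NumPy versions, getsizeof counts the buffer only when the array owns it.
print(a.nbytes, sys.getsizeof(a))
# nbytes reports 4,000,000 bytes that were never separately allocated,
# while getsizeof reports only the small array header of the view.
print(v.nbytes, sys.getsizeof(v))
print(v.base is a, np.shares_memory(a, v))
```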
Also, I am not entirely sure why you "pickle" NumPy arrays; there are better tools for this job. Just to name two: h5py (a classic, based on HDF5) and zarr. Both libraries let you work with ndarray-like objects directly on disk (with optional compression), essentially eliminating the pickling step. zarr also lets you create compressed ndarray-compatible data structures in memory. Most ufuncs from NumPy, SciPy & friends will happily accept them as input parameters.
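The same "keep it on disk instead of pickling" idea can be sketched without any extra dependency using NumPy's own .npy format and a memory map. This is a swap-in for illustration, not one of the two libraries named above; the file name my_array.npy is hypothetical.

```python
import os
import tempfile
import numpy as np

def save_and_map(arr, path):
    # np.save writes the raw buffer plus a small .npy header; no pickling involved.
    np.save(path, arr)
    # mmap_mode="r" memory-maps the file read-only; data pages are read lazily from disk.
    return np.load(path, mmap_mode="r")

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        arr = np.arange(1_000_000, dtype=np.float64)
        mapped = save_and_map(arr, os.path.join(d, "my_array.npy"))
        del arr                      # the in-memory copy can now be released
        print(mapped[:3], float(mapped.sum()))
        del mapped                   # drop the mapping before the directory is removed
```

h5py and zarr add chunking and compression on top of this basic on-disk access pattern.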
Without going into algorithmic details, let's just say that my code sequentially processes a list of inputs:
inputs = [2,5,6,7,8,10,12,13,14,15,16,17,18,19,20,21]
for i in inputs:
    process_input(i)
For simplicity, let's consider process_input to be a stateless black box.
I know that this site is full of questions about finding memory leaks in Python code, but this is not what this question is about. Instead, I'm trying to understand the memory consumption of my code over time and whether it might suffer from leaking memory.
In particular, I'm trying to understand a discrepancy between two distinct indicators of memory usage:
The number of allocated objects (reported by gc.get_objects) and
the actually used amount of physical memory (read from VmRSS on a Linux system).
To study these two indicators, I expanded the original code from above as follows:
import time, gc
def get_current_memory_usage():
    with open('/proc/self/status') as f:
        memusage = f.read().split('VmRSS:')[1].split('\n')[0][:-3]
    return int(memusage.strip()) / (1024 ** 2)
inputs = [2,5,6,7,8,10,12,13,14,15,16,17,18,19,20,21]
gc.collect()
last_object_count = len(gc.get_objects())
for i in inputs:
    print(f'\nProcessing input {i}...')
    process_input(i)
    gc.collect()
    time.sleep(1)
    memory_usage = get_current_memory_usage()
    object_count = len(gc.get_objects())
    print(f'Memory usage: {memory_usage:.2f} GiB')
    print(f'Object count: {object_count - last_object_count:+}')
    last_object_count = object_count
Note that process_input is stateless, i.e. the order of the inputs does not matter. Thus, we would expect both indicators to be about the same before and after running process_input, right? Indeed, this is what I observe for the number of allocated objects. However, memory consumption grows steadily.
Now my core question: Do these observations indicate a memory leak? To my understanding, memory leaking in Python would be indicated by a growth of allocated objects, which we do not observe here. On the other hand, why does the memory consumption grow steadily?
For further investigation, I also ran a second test, in which I repeatedly invoked process_input(i) with a fixed input i (five times each) and recorded the memory consumption between iterations:
For i=12, the memory consumption remained constant at 10.91 GiB.
For i=14, the memory consumption remained constant at 7.00 GiB.
I think these observations make a memory leak even more unlikely, right? But then, what could explain why memory consumption does not fall between iterations, given that process_input is stateless?
The system has 32 GiB RAM in total and is running Ubuntu 20.04. Python version is 3.6.10. The process_input function uses several third-party libraries.
In general, RSS is not a particularly good indicator, because it is the "resident" set size: even a rather piggish process, in terms of committed memory, can have a modest RSS because memory can be swapped out. You can look at /proc/self/smaps and add up the sizes of the writable regions to get a much better benchmark.
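A minimal, Linux-only sketch of that smaps summation. It relies on the documented smaps layout: each mapping starts with a header line "start-end perms offset dev inode [pathname]", followed by "Key: value kB" lines such as "Size:". The function name writable_kib is hypothetical.

```python
import os

def writable_kib(path="/proc/self/smaps"):
    """Sum the Size of every writable mapping in an smaps file, in kB (Linux only)."""
    if not os.path.exists(path):
        return None  # not on Linux, or /proc is unavailable
    total = 0
    writable = False
    with open(path) as f:
        for line in f:
            fields = line.split()
            if not fields:
                continue
            if not fields[0].endswith(":"):
                # Mapping header: the second field holds permissions such as "rw-p".
                writable = len(fields) > 1 and "w" in fields[1]
            elif fields[0] == "Size:" and writable:
                total += int(fields[1])  # smaps reports sizes in kB
    return total

print(writable_kib())
```

Calling it at the same point where get_current_memory_usage() is called in the loop above gives a second series to compare against VmRSS.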
On the other hand, if there is actual growth and you want to understand why, you need to look at the actual dynamically allocated memory. For this, I'd suggest using chap (https://github.com/vmware/chap).
To do this, just make that 1 second sleep a bit longer, put a print just before the call to sleep, and use gcore from another session to gather a live core during a few of those sleeps.
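A sketch of the modified loop, assuming a hypothetical stand-in for process_input. It prints the process id right before a longer sleep, so that during the pause you can run gcore against that pid from another shell (gcore's -o option sets the output file prefix).

```python
import gc
import os
import time

def process_input(i):
    # Hypothetical stand-in for the real, stateless workload.
    return sum(range(i * 1000))

def run(inputs, pause_seconds=60):
    for i in inputs:
        process_input(i)
        gc.collect()
        # Print right before sleeping; while this process sleeps, capture a core
        # from another shell, e.g.:  gcore -o core_input_<i> <pid>
        print(f"input {i} done, pid={os.getpid()}, sleeping {pause_seconds}s", flush=True)
        time.sleep(pause_seconds)
```

For the scenario described next, you would call run() with the original inputs list and capture cores while it reports inputs 14 and 21.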
So let's say you have cores gathered from when the input was 14 and when it was 21. Look at each of the cores using chap, for example with the following commands:
count used
That will give you a good view of allocations that have been requested but not released. If the numbers are much larger for the later core, you probably have some kind of growth issue. If those numbers do differ by quite a lot, use
summarize used
If you have growth, it is possible that there is a leak (as opposed to some container simply expanding). To check this, you can try commands like
count leaked
show leaked
From there you should probably look at the documentation, depending on what you find.
OTOH if used allocations are not the issue, maybe try the following, to see memory for allocations that have been released but are part of larger regions of memory that cannot be given back to the operating system because parts of those regions are still in use:
count free
summarize free
If neither "used" allocations nor "free" allocations are the issue, you might try:
summarize writable
That is a very high level view of all writable memory. For example, you can see things like stack usage...
I am reading in a 15 GB .csv file using the pandas read_csv() function with its iterator/chunk functionality, because I only need a subset of about 20% of the file.
I am doing this in PyCharm, where I set the max heap size to 18 GB (although I have 16 GB of RAM) and the minimum allocated memory to half of the max heap size, 9 GB. Throughout this process PyCharm indicates I am using around 100 MB to 200 MB of RAM, while the Windows Task Manager indicates I am using approximately 2.5 GB of RAM, which includes both the PyCharm and Python processes. Task Manager shows about 45% of my memory still free.
As far as I can see, nothing indicates that I am running out of memory. Still, while reading in this data I get a MemoryError which tells me:
MemoryError: Unable to allocate array with shape (4, 8193780) and data type float64
Can someone clarify this for me? I suspected that maybe the final dataframe is larger than my RAM can handle, but that would be:
(4 * 8193780 * 8 bytes per float64) / 1024**3 ≈ 0.24 GiB < 1 GB
So that does not seem to be the problem either, or am I missing something here?
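The arithmetic above can be checked directly: the block NumPy failed to allocate is about a quarter of a GiB. The helper name array_bytes is hypothetical.

```python
import math
import numpy as np

def array_bytes(shape, dtype):
    # number of elements * bytes per element
    return math.prod(shape) * np.dtype(dtype).itemsize

n = array_bytes((4, 8193780), np.float64)
print(n, f"{n / 1024**3:.3f} GiB")   # 262200960 0.244 GiB
```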
I think you are using 15 GB of memory just to read your file, since I guess the read_csv() function accesses the whole file even if you specified the chunk/iterator option to use only 20% of it. On top of that, you are running Windows and PyCharm, which need at least 1 GB of memory, so adding it all up, you are out of memory, I guess.
But here are some ways to tackle your problem:
Verify the dtype of your array, and try to find the best one for your purpose. For example, you are using float64; consider whether float32 or even float16 might be appropriate.
Consider whether your computation can be done on a subset of the data. This is called subsampling. Maybe with subsampling you get a good enough model (this may be the case for a clustering algorithm like k-means).
You may search for out-of-core solutions. This may mean either rethinking your algorithm (can you split the work?) or trying a solution that does it transparently.
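A minimal pandas sketch combining the three suggestions (smaller dtype, optional subsampling, chunk-by-chunk processing). It assumes every column is numeric, so a single dtype can be passed to read_csv, and uses a hypothetical wanted predicate to pick the rows you need; the tiny inline CSV stands in for the real file.

```python
import io
import pandas as pd

def read_subset(source, wanted, chunksize=100_000, dtype="float32", frac=None):
    """Read a CSV chunk by chunk, keeping only rows selected by `wanted`.

    `wanted` is a hypothetical predicate: DataFrame chunk -> boolean mask.
    Assumes all columns are numeric, so one `dtype` applies to every column.
    """
    parts = []
    for chunk in pd.read_csv(source, chunksize=chunksize, dtype=dtype):
        chunk = chunk[wanted(chunk)]          # filter before keeping anything
        if frac is not None:                  # optional subsampling per chunk
            chunk = chunk.sample(frac=frac, random_state=0)
        parts.append(chunk)
    return pd.concat(parts, ignore_index=True)

csv_text = "a,b\n1,2\n3,4\n5,6\n"
subset = read_subset(io.StringIO(csv_text), lambda c: c["a"] > 1, chunksize=2)
print(subset)
```

Only the filtered rows of each chunk are kept, so peak memory is roughly one chunk plus the retained subset rather than the whole file; float32 halves the size of every kept value compared with float64.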
While reading an article on memory management in Python, I came across a few doubts:
import copy
from memory_profiler import profile

@profile
def function():
    x = list(range(1000000))  # allocate a big list
    y = copy.deepcopy(x)
    del x
    return y

if __name__ == "__main__":
    function()
$ python -m memory_profiler memory_profiler_demo.py
Filename: memory_profiler_demo.py
Line #    Mem usage    Increment   Line Contents
================================================
     4   30.074 MiB   30.074 MiB   @profile
     5                             def function():
     6   61.441 MiB   31.367 MiB       x = list(range(1000000)) # allocate a big list
     7  111.664 MiB   50.223 MiB       y = copy.deepcopy(x)#doubt 1
     8  103.707 MiB   -7.957 MiB       del x #doubt 2
     9  103.707 MiB    0.000 MiB       return
So my doubts are: on line 7, why does copying the list take more memory than creating it? And on line 8, why does deleting the list free only 7 MiB?
First, let's start with why line 8 only frees 7MiB.
Once you allocate a bunch of memory, Python and your OS and/or malloc library both guess that you're likely to allocate a bunch of memory again. On modern platforms, it's a lot faster to reuse that memory in-process than to release it and reallocate it from scratch, while it costs very little to keep extra unused pages of memory in your process's space, so it's usually the right tradeoff. (But of course usually != always, and the blog you linked seems to be in large part about how to work out that you're building an application where it's not the right tradeoff and what to do about it.)
A default build of CPython on Linux virtually never releases any memory. On other POSIX (including Mac) it almost never releases any memory. On Windows, it does release memory more often—but there are still constraints. Basically, if a single allocation from Windows has any piece in use (or even in the middle of a freelist chain), that allocation can't be returned to Windows. So, if you're fragmenting memory (which you usually are), that memory can't be freed. The blog post you linked to explains this to some extent, and there are much better resources than an SO answer to explain further.
If you really do need to allocate a lot of memory for a short time, release it, and never use it again, without holding onto all those pages, there's a common Unix idiom for that—you fork, then do the short-term allocation in the child and exit after passing back the small results in some way. (In Python, that usually means using multiprocessing.Process instead of os.fork directly.)
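A minimal sketch of that idiom with multiprocessing.Process, assuming a hypothetical child function that makes a large short-lived allocation and sends back only a small result through a Queue. The parent reads from the queue before join(), which avoids blocking on a child that is still flushing data into the pipe.

```python
import multiprocessing as mp

def child(n, queue):
    big = [i * 2 for i in range(n)]   # hypothetical short-lived, large allocation
    queue.put(len(big))               # pass back only a small result

def compute_in_child(n):
    queue = mp.Queue()
    p = mp.Process(target=child, args=(n, queue))
    p.start()
    result = queue.get()              # read before join()
    p.join()                          # child has exited; its memory is gone
    return result

if __name__ == "__main__":
    print(compute_in_child(10**6))
```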
Now, why does your deepcopy take more memory than the initial construction?
I tested your code on my Mac laptop with python.org builds of 2.7, 3.5, and 3.6. What I found was that the list construction takes around 38MiB (similar to what you're seeing), while the copy takes 42MiB on 2.7, 31MiB on 3.5, and 7MiB on 3.6.
Slightly oversimplified, here's the 2.7 behavior: The functions in copy just call the type's constructor on an iterable of the elements (for copy) or recursive copies of them (for deepcopy). For list, this means creating a list with a small starting capacity and then expanding it as it appends. That means you're not just creating a 1M-length array, you're also creating and throwing away arrays of 500K, 250K, etc. all the way down. The sum of all those lengths is equivalent to a 2M-length array. Of course you don't really need the sum of all of them—only the most recent array and the new one are ever live at the same time—but there's no guarantee the old arrays will be freed in a useful way that lets them get reused. (That might explain why I'm seeing about 1.5x the original construction while you're seeing about 2x, but I'd need a lot more investigation to bet anything on that part…)
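The step-wise growth of a list built by repeated appends can be observed on whatever CPython runs the sketch below. This uses the modern 3.x growth policy, not the 2.7 copy path specifically, so the exact step sizes will differ from what the paragraph describes; the point is that capacity grows in jumps, with each jump resizing the underlying pointer array.

```python
import sys

def capacity_steps(n):
    """Record (length, size in bytes) each time a growing list's allocation changes."""
    lst = []
    steps = []
    last = sys.getsizeof(lst)
    for i in range(n):
        lst.append(i)
        size = sys.getsizeof(lst)     # includes the over-allocated pointer array
        if size != last:              # capacity grew: the pointer array was resized
            steps.append((len(lst), size))
            last = size
    return steps

print(capacity_steps(100))
```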
In 3.5, I believe the biggest difference is that a number of improvements over the 5 years since 2.7 mean that most of those expansions now get done by realloc, even if there is free memory in the pool that could be used instead. That changes a tradeoff that favored 32-bit over 64-bit on modern platforms into one that works the other way round—in 64-bit linux/Mac/Windows: there are often going to be free pages that can be tossed onto the end of an existing large alloc without remapping its address, so most of those reallocs mean no waste.
In 3.6, the huge change is probably #26167. Oversimplifying again, the list type knows how to copy itself by allocating all in one go, and the copy methods now take advantage of that for list and a few other builtin types. Sometimes there's no reallocation at all, and even when there is, it's usually with the special-purpose LIST_APPEND code (which can be used when you can assume nobody outside the current function has access to the list yet) instead of the general-purpose list.append code (which can't).
Consider the following minimal example:
# used memory: Python2=7421 MB, Python3=7440 MB
a = list(range(10**8))
# used memory: Python2=10553 MB, Python3=11317 MB
a = 1
# used memory: Python2=9785 MB, Python3=7454 MB
# ---> why does Python2 need >2GB of RAM here?
# after python process terminates: Python2=7433 MB, Python3=7458 MB
A large object is created which should be garbage collected after the second line. The memory usage has been monitored using free -m (this is not an exact measurement of course).
Python 3 needs more memory (3.7GB instead of 3.05GB) to store the large object, but it does what I expected: memory usage drops after the object is not needed any longer. Python2 seems to delete only 768 MB and keep 2.3GB of memory allocated. Why?
This is repeatable: if the list is created a second time, it again uses 3.05 GB, not more, and usage drops again to 2.3 GB. gc.collect() returns 0 and does not change the amount of used memory.
Please don't tell me to use Python 3 - I know... :)
Some links to documentation which did not answer my question:
https://docs.python.org/2/library/gc.html
https://docs.python.org/2/c-api/memory.html
In the specific case of reclaimed ints on Python 2, the memory is stuck on an unbounded free list and not returned to the OS. The memory reserved for ints is thus proportional to the largest number of ints that have existed simultaneously in the program, not the number of ints that currently exist.
For other cases of memory not returning to the OS, that's probably due to details of the underlying malloc allocator. Most other free lists I can think of in Python are bounded.
I am trying to debug a memory problem with my large Python application. Most of the memory is in NumPy arrays managed by Python classes, so Heapy etc. are useless, since they do not account for the memory in the NumPy arrays. So I tried to track the memory usage manually using the Mac OS X (10.7.5) Activity Monitor (or top, if you will). I noticed the following weird behavior in a normal Python interpreter shell (2.7.3):
import numpy as np # 1.7.1
# Activity Monitor: 12.8 MB
a = np.zeros((1000, 1000, 17)) # a "large" array
# 142.5 MB
del a
# 12.8 MB (so far so good, the array got freed)
a = np.zeros((1000, 1000, 16)) # a "small" array
# 134.9 MB
del a
# 134.9 MB (the system didn't get back the memory)
import gc
gc.collect()
# 134.9 MB
No matter what I do, the memory footprint of the Python session will never go below 134.9 MB again. So my question is:
Why are the resources of arrays larger than 1000x1000x17x8 bytes (found empirically on my system) properly given back to the system, while the memory of smaller arrays appears to be stuck with the Python interpreter forever?
This does appear to ratchet up, since in my real-world applications, I end up with over 2 GB of memory I can never get back from the Python interpreter. Is this intended behavior that Python reserves more and more memory depending on usage history? If yes, then Activity Monitor is just as useless as Heapy for my case. Is there anything out there that is not useless?
Reading NumPy's policy for releasing memory, it seems that NumPy does not have any special handling of memory allocation/deallocation. It simply calls free() when the reference count goes to zero. In fact, it's pretty easy to replicate the issue with any built-in Python object. The problem lies at the OS level.
Nathaniel Smith has written an explanation of what is happening in one of his replies in the linked thread:
In general, processes can request memory from the OS, but they cannot
give it back. At the C level, if you call free(), then what actually
happens is that the memory management library in your process makes a
note for itself that that memory is not used, and may return it from a
future malloc(), but from the OS's point of view it is still
"allocated". (And python uses another similar system on top for
malloc()/free(), but this doesn't really change anything.) So the OS
memory usage you see is generally a "high water mark", the maximum
amount of memory that your process ever needed.
The exception is that for large single allocations (e.g. if you create
a multi-megabyte array), a different mechanism is used. Such large
memory allocations can be released back to the OS. So it might
specifically be the non-numpy parts of your program that are producing
the issues you see.
So it seems there is no general solution to the problem. Allocating many small objects will lead to "high memory usage" as profiled by the tools, even though that memory will be reused when needed, while allocating big objects won't show high memory usage after deallocation, because that memory is reclaimed by the OS.
You can verify this allocating built-in python objects:
In [1]: a = [[0] * 100 for _ in range(1000000)]
In [2]: del a
After this code I can see that memory is not reclaimed, while doing:
In [1]: a = [[0] * 10000 for _ in range(10000)]
In [2]: del a
the memory is reclaimed.
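A Linux-only sketch for repeating this experiment with numbers instead of eyeballing a monitor. It reads VmRSS from /proc/self/status; the helper names are hypothetical, and the results depend on the platform's malloc, so run each variant in a fresh interpreter to keep one measurement from affecting the other.

```python
import gc

def rss_kib():
    """Current resident set size in kB, read from /proc/self/status (Linux only)."""
    with open("/proc/self/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])
    raise RuntimeError("VmRSS not found")

def measure(make):
    """Return RSS (kB) before allocating, after allocating, and after deleting."""
    before = rss_kib()
    obj = make()
    allocated = rss_kib()
    del obj
    gc.collect()
    after = rss_kib()
    return before, allocated, after
```

For the two cases above, call measure(lambda: [[0] * 100 for _ in range(1000000)]) in one interpreter and measure(lambda: [[0] * 10000 for _ in range(10000)]) in another, then compare how close "after" gets to "before".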
To avoid memory problems, you should either allocate big arrays and work with them (maybe use views to "simulate" small arrays?), or try to avoid having many small arrays at the same time. If you have a loop that creates small objects, you might explicitly deallocate objects that are no longer needed at every iteration instead of doing this only at the end.
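A short sketch of the "views to simulate small arrays" idea: one large allocation, exposed as many row arrays through basic indexing, which returns views rather than copies. The shapes are arbitrary.

```python
import numpy as np

# One large allocation...
big = np.zeros((10_000, 100))
# ...exposed as many small row arrays; basic indexing returns views, not copies.
rows = [big[i] for i in range(big.shape[0])]

rows[3][0] = 7.0          # writing through a view modifies big
print(big[3, 0])          # 7.0
```

Each view is still a small Python object with its own header, but none of them owns a separate data buffer, so freeing big (together with the views) releases one large block instead of thousands of small ones.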
I believe Python Memory Management gives good insights into how memory is managed in Python. Note that, on top of the "OS problem", Python adds another layer to manage memory arenas, which can contribute to high memory usage with small objects.