Both of these functions compute the same thing (the number of integers whose associated Collatz sequence has length no greater than n) in essentially the same way. The only difference is that the first one uses sets exclusively, whereas the second uses both sets and lists.
The second one leaks memory (in IDLE with Python 3.2, at least); the first one does not, and I have no idea why. I have tried a few "tricks" (such as adding del statements), but nothing seems to help (which is not surprising; those tricks should be useless).
I would be grateful to anybody who could help me understand what goes on.
If you want to test the code, you should probably use a value of n in the 55 to 65 range, anything above 75 will almost certainly result in a (totally expected) memory error.
def disk(n):
    """Uses sets for explored, current and to_explore. Does not leak."""
    explored = set()
    current = {1}
    for i in range(n):
        to_explore = set()
        for x in current:
            if not (x-1) % 3 and ((x-1)//3) % 2 and not ((x-1)//3) in explored:
                to_explore.add((x-1)//3)
            if not 2*x in explored:
                to_explore.add(2*x)
        explored.update(current)
        current = to_explore
    return len(explored)
def disk_2(n):
    """Does exactly the same thing, but uses a set for explored and lists for
    current and to_explore.
    Leaks (like a sieve :))
    """
    explored = set()
    current = [1]
    for i in range(n):
        to_explore = []
        for x in current:
            if not (x-1) % 3 and ((x-1)//3) % 2 and not ((x-1)//3) in explored:
                to_explore.append((x-1)//3)
            if not 2*x in explored:
                to_explore.append(2*x)
        explored.update(current)
        current = to_explore
    return len(explored)
EDIT: This also happens when using the interactive mode of the interpreter (without IDLE), but not when running the script directly from a terminal (in that case, memory usage goes back to normal some time after the function has returned, or as soon as there is an explicit call to gc.collect()).
CPython allocates small objects (obmalloc.c, 3.2.3) out of 4 KiB pools that it manages in 256 KiB blocks called arenas. Each active pool has a fixed block size ranging from 8 bytes up to 256 bytes, in steps of 8. For example, a 14-byte object is allocated from the first available pool that has a 16-byte block size.
There's a potential problem if arenas are allocated on the heap instead of using mmap (this is tunable via mallopt's M_MMAP_THRESHOLD), in that the heap cannot shrink below the highest allocated arena, which will not be released so long as 1 block in 1 pool is allocated to an object (CPython doesn't float objects around in memory).
Given the above, the following version of your function should probably solve the problem. Replace the line return len(explored) with the following 3 lines:
    result = len(explored)
    del i, x, to_explore, current, explored
    return result + 0
After deallocating the containers and all referenced objects (releasing arenas back to the system), this returns a new int with the expression result + 0. The heap cannot shrink as long as there's a reference to the first result object. In this case that gets automatically deallocated when the function returns.
If you're testing this interactively without the "plus 0" step, remember that the REPL (Read, Eval, Print, Loop) keeps a reference to the last result accessible via the pseudo-variable "_".
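A quick illustration in the REPL (outputs omitted):

>>> disk_2(60)   # the returned int is printed and also bound to _
>>> 0            # evaluating any other expression rebinds _, dropping that reference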
In Python 3.3 this shouldn't be an issue since the object allocator was modified to use anonymous mmap for arenas, where available. (The upper limit on the object allocator was also bumped to 512 bytes to accommodate 64-bit platforms, but that's inconsequential here.)
Regarding manual garbage collection, gc.collect() does a full collection of tracked container objects, but it also clears freelists of objects that are maintained by built-in types (e.g. frames, methods, floats). Python 3.3 added additional API functions to clear freelists used by lists (PyList_ClearFreeList), dicts (PyDict_ClearFreeList), and sets (PySet_ClearFreeList). If you'd prefer to keep the freelists intact, use gc.collect(1).
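In other words (a minimal sketch of the two collection modes described above):

import gc

gc.collect()    # full collection of tracked containers; also clears the type freelists
gc.collect(1)   # collects only generations 0 and 1, leaving the freelists intact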
I doubt it actually leaks; I suspect garbage collection just hasn't kicked in yet, so memory use keeps growing. On every round of the outer loop, the previous current list becomes eligible for garbage collection, but it will not actually be collected until some later point.
Furthermore, even if it is garbage collected, memory isn't normally released back to the OS, so you have to use whatever Python method is available to get the current heap usage.
If you add an explicit garbage collection at the end of every outer-loop iteration, it may reduce memory use a bit, or it may not, depending on exactly how Python handles its heap and garbage collection without it.
You do not have a memory leak. Processes on Linux generally do not release heap memory to the OS until they exit. Accordingly, the stats you will see in e.g. top will only ever go up.
You only have a memory leak if, after running a job of the same or smaller size, Python grabs more memory from the OS when it "should" have been able to reuse the memory it was using for objects that "should" have been garbage collected.
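One rough way to check this from the outside (a sketch assuming the third-party psutil package is installed, with disk_2 from the question above):

import psutil

proc = psutil.Process()
disk_2(60)
print(proc.memory_info().rss)   # resident set size after the first run
disk_2(60)
print(proc.memory_info().rss)   # a true leak would keep pushing this number up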
Related
I have a class which primarily contains the three dicts:
from collections import defaultdict

class KB(object):
    def __init__(self):
        # key: str, value: list of str
        self.linear_patterns = defaultdict(list)
        # key: str, value: list of str
        self.nonlinear_patterns = defaultdict(list)
        # key: str, value: dict
        self.pattern_meta_info = {}
        ...
        self.__initialize()

    def __initialize(self):
        # the 3 dicts are populated
        ...
The sizes of the 3 dicts are as follows:
linear_patterns: 100,000
nonlinear_patterns: 900,000
pattern_meta_info: 700,000
After the program is run and done, it takes about 15 seconds to release the memory. When I reduce the dict sizes above by loading less data during initialization, the memory is released faster, so I conclude that these dict sizes are what make the release slow. The whole program takes about 8 GB of memory. Also, after the dicts are built, all operations are lookups; there are no modifications.
Is there a way to use Cython to optimize the 3 data structures above, especially in terms of memory usage? Is there a similar Cython dictionary that can replace the Python dicts?
It seems unlikely that a different dictionary or object type would change much. Destructor performance is dominated by the memory allocator. That will be roughly the same unless you switch to a different malloc implementation.
If this is only about object destruction at the end of your program, most languages (but not Python) would allow you to call exit while keeping the KB object alive. The OS will release the memory much quicker when the process terminates, so why bother? Unfortunately that doesn't work with Python's sys.exit(), since this merely raises an exception.
Everything else relies on changing the data structure or algorithm. Are your strings highly redundant? Maybe you can reuse string objects by interning them. Keep them in a shared set to use the same string in multiple places. A simple call to string = sys.intern(string) is enough. Unlike in earlier versions of Python, this will not keep the string object alive beyond its use so you don't run the risk of leaking memory in a long-running process.
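A minimal sketch of interning (the string contents here are placeholders):

import sys

s1 = sys.intern("some frequently repeated pattern key")
s2 = sys.intern("some frequently repeated pattern key")
print(s1 is s2)   # True: both names now share a single string object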
You could also pool the strings in one large allocation. If access is relatively rare, you could change the class to use one large io.StringIO object for its contained strings and all dictionaries just deal with (offset, length) tuples into that buffer.
That still leaves many tuple and integer objects but those use specialized allocators that may be faster. Also, the length integer will come from the common pool of small integers and not allocate a new object.
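A minimal sketch of that pooling idea (StringPool is a hypothetical helper, not a library class):

import io

class StringPool:
    """Store many strings in one buffer; callers keep (offset, length) tuples."""
    def __init__(self):
        self._buf = io.StringIO()

    def add(self, s):
        offset = self._buf.seek(0, io.SEEK_END)   # append at the current end
        self._buf.write(s)
        return (offset, len(s))

    def get(self, ref):
        offset, length = ref
        self._buf.seek(offset)
        return self._buf.read(length)

pool = StringPool()
ref = pool.add("alpha")   # the dictionaries would store ref instead of the string
print(pool.get(ref))      # "alpha"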
A final thought: that is 8 GB of string data. Are you sure you don't want a small sqlite or dbm database? It could even be a temporary file.
I ran this:
import sys
diii = {'key1':1,'key2':2,'key3':1,'key4':2,'key5':1,'key6':2,'key7':1}
print sys.getsizeof(diii)
# output: 1048
diii = {'key1':1,'key2':2,'key3':1,'key4':2,'key5':1,'key6':2,'key7':1,'key8':2}
print sys.getsizeof(diii)
# output: 664
Before asking here, I restarted my Python shell and also tried it online, getting the same result.
I thought a dictionary with one more element would report either the same number of bytes or more, not fewer.
Any idea what I am doing wrong?
Previous answers have already mentioned that you needn't worry, so I will dive into some more technical details. It's long, but please bear with me.
TL;DR: this has to do with the arithmetic of resizing. Each resize allocates 2**i slots, where 2**i > requested_size and 2**i >= 8; each insert then resizes the underlying table further if 2/3 of the slots are filled, but this time new_size = old_size * 4. This way, your first dictionary ends up with 32 cells allocated, while the second one ends up with as few as 16 (since it got a bigger initial size upfront).
Answer: As @snakecharmerb noted in the comments, this depends on the way the dictionary is created. For the sake of brevity, let me refer you to this excellent blog post, which explains the differences between the dict() constructor and the dict literal {} at both the Python bytecode and CPython implementation levels.
Let's start with the magic number of 8 keys. It turns out to be a constant, PyDict_MINSIZE, predefined in Python 2.7's dictobject.h header file as the minimal size of a Python dictionary:
/* PyDict_MINSIZE is the minimum size of a dictionary. This many slots are
* allocated directly in the dict object (in the ma_smalltable member).
* It must be a power of 2, and at least 4. 8 allows dicts with no more
* than 5 active entries to live in ma_smalltable (and so avoid an
* additional malloc); instrumentation suggested this suffices for the
* majority of dicts (consisting mostly of usually-small instance dicts and
* usually-small dicts created to pass keyword arguments).
*/
#define PyDict_MINSIZE 8
As such, it may differ between the specific Python implementations, but let's assume that we all use the same CPython version. However, the dict of size 8 is expected to neatly contain only 5 elements; don't worry about this, as this specific optimization is not as important for us as it seems.
Now, when you create the dictionary using the dict literal {}, CPython takes a shortcut (as compared to the explicit creation via the dict constructor). Simplifying a bit: the bytecode operation BUILD_MAP gets resolved to a call to the _PyDict_NewPresized function, which constructs a dictionary whose size is known upfront:
/* Create a new dictionary pre-sized to hold an estimated number of elements.
   Underestimates are okay because the dictionary will resize as necessary.
   Overestimates just mean the dictionary will be more sparse than usual.
*/
PyObject *
_PyDict_NewPresized(Py_ssize_t minused)
{
    PyObject *op = PyDict_New();
    if (minused > 5 && op != NULL && dictresize((PyDictObject *)op, minused) == -1) {
        Py_DECREF(op);
        return NULL;
    }
    return op;
}
This function calls the normal dict constructor (PyDict_New) and requests a resize of the newly created dict - but only if it is expected to hold more than 5 elements. This is due to an optimization which allows Python to speed up some things by holding the data in the pre-allocated "smalltable", without invoking expensive memory allocation and de-allocation functions.
Then, dictresize will try to determine the minimal size of the new dictionary. It also uses the magic number 8 as the starting point, iteratively multiplying it by 2 until it finds the smallest size larger than the requested one. For the first dictionary this is simply 8; for the second one (and for all dict literals with between 8 and 15 keys) it is 16.
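In Python, that size search amounts to something like the following sketch (presized_table_size is a hypothetical name mimicking the 2.7 dictresize loop):

def presized_table_size(minused, minsize=8):
    # smallest power of two strictly greater than minused (PyDict_MINSIZE == 8)
    newsize = minsize
    while newsize <= minused:
        newsize <<= 1
    return newsize

print(presized_table_size(7))   # 8:  the 7-key literal starts at size 8
print(presized_table_size(8))   # 16: the 8-key literal starts at size 16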
Now, in the dictresize function there is a special case for the former, smaller new_size == 8, which is meant to bring forward the aforementioned optimization (using the "small table" to reduce memory manipulation operations). However, because there is no need to resize the newly created dict (e.g. no elements have been removed so far, so the table is "clean"), nothing really happens.
On the contrary, when new_size != 8, the usual procedure of reallocating the hash table follows, and a new table is allocated to store the "big" dictionary. While this is intuitive (the bigger dict got a bigger table), it does not yet explain the observed behavior, so please bear with me one more moment.
Once we have the pre-sized dict, STORE_MAP opcodes tell the interpreter to insert consecutive key-value pairs. This is implemented in the dict_set_item_by_hash_or_entry function, which, importantly, resizes the dictionary after each increase in size (i.e. each successful insertion) if more than 2/3 of the slots are already used up. The size then increases x4 (only x2 for very large dicts, which is not our case).
So here is what happens when you create the dict with 7 elements:
# note 2/3 = 0.(6)
BUILD_MAP # initial_size = 8, filled = 0
STORE_MAP # 'key_1' ratio_filled = 1/8 = 0.125, not resizing
STORE_MAP # 'key_2' ratio_filled = 2/8 = 0.250, not resizing
STORE_MAP # 'key_3' ratio_filled = 3/8 = 0.375, not resizing
STORE_MAP # 'key_4' ratio_filled = 4/8 = 0.500, not resizing
STORE_MAP # 'key_5' ratio_filled = 5/8 = 0.625, not resizing
STORE_MAP # 'key_6' ratio_filled = 6/8 = 0.750, RESIZING! new_size = 8*4 = 32
STORE_MAP # 'key_7' ratio_filled = 7/32 = 0.21875
And you end up with a dict whose hash table has 32 slots in total.
However, when adding eight elements, the initial size is twice as big (16), so we never resize, as the condition ratio_filled > 2/3 is never satisfied!
And that's why you end up with a smaller table in the second case.
sys.getsizeof returns the memory allocated to the underlying hash table implementation of those dictionaries, which has a somewhat non-obvious relationship with the actual size of the dictionary.
The CPython implementation of Python 2.7 quadruples the amount of memory allocated to a hash table each time it's filled up to 2/3 of its capacity, but shrinks it if it has over-allocated memory (i.e. a large contiguous block of memory was allocated but only a few addresses were actually used).
It just so happens that dictionaries that have between 8 and 11 elements allocate just enough memory for CPython to consider them 'over-allocated', and get shrunk.
You're not doing anything wrong. The size of a dictionary doesn't correspond exactly to the number of elements, as dictionaries are over-allocated and dynamically resized once a certain percentage of their memory space is used. I'm not sure what makes the dict smaller in 2.7 (it doesn't happen in 3) in your example, but you don't have to worry about it. Why are you using 2.7, and why do you want to know the exact memory usage of the dict (which, by the way, doesn't include the memory used by the objects contained in the dictionary, since the dictionary itself is filled with pointers)?
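If you're curious, you can watch the over-allocation happen as items are inserted (a Python 3 sketch; the exact sizes printed vary across versions):

import sys

d = {}
last = sys.getsizeof(d)
for i in range(20):
    d[i] = i
    size = sys.getsizeof(d)
    if size != last:   # a jump in the reported size means the table was resized
        print(len(d), size)
        last = size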
Allocation of dict literals is handled here: dictobject.c#L685-L695.
Due to quirks of the implementation, size vs number of elements does not end up being monotonically increasing.
import sys

def getsizeof_dict_literal(n):
    pairs = ["{0}:{0}".format(i) for i in range(n)]
    dict_literal = "{%s}" % ", ".join(pairs)
    source = "sys.getsizeof({})".format(dict_literal)
    size = eval(source)
    return size
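For example, printing the first dozen results makes the pattern visible (outputs omitted here, since they differ between versions):

for n in range(12):
    print(n, getsizeof_dict_literal(n))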
The odd growing-and-shrinking behaviour exhibited is not just a weird one-off accident; it's a regularly repeating occurrence. For the first few thousand results, the visualization shows the same growing-and-shrinking pattern over and over (plot not reproduced here).
In more recent versions of Python, the dict implementation is completely different and the allocation details are more sane. See bpo28731 - _PyDict_NewPresized() creates too small dict, for an example of some recent changes. In Python 3.7.3, the sizes are smaller in general and the allocation is monotonic (plot not reproduced here).
You are actually not doing anything wrong. getsizeof doesn't measure the size of the elements inside the dictionary, only a rough estimate of the dictionary object itself. An alternative for this problem is json.dumps() from the json library. Though it doesn't give you the actual in-memory size of the object, it grows consistently with the changes you make to the object.
Here's an example
import sys
import json
diii = {'key1':1,'key2':2,'key3':1,'key4':2,'key5':1,'key6':2,'key7':1}
print sys.getsizeof(json.dumps(diii)) # <----
diii = {'key1':1,'key2':2,'key3':1,'key4':2,'key5':1,'key6':2,'key7':1,'key8':2}
print sys.getsizeof(json.dumps(diii)) # <----
json.dumps() converts the dictionary into a JSON string, so the size of diii is then measured as the size of that string.
Read more about Python's json library in the official documentation.
Here is my code:
from memory_profiler import profile

@profile
def mess_with_memory():
    huge_list = range(20000000)
    del huge_list
    print "why this kolaveri di?"
This is the output when I ran it from the interpreter:
Line #    Mem usage    Increment   Line Contents
================================================
     3      7.0 MiB      0.0 MiB   @profile
     4                             def mess_with_memory():
     5
     6    628.5 MiB    621.5 MiB       huge_list = range(20000000)
     7    476.0 MiB   -152.6 MiB       del huge_list
     8    476.0 MiB      0.0 MiB       print "why this kolaveri di"
If you look at the output, creating the huge list consumed 621.5 MB, while deleting it freed up only 152.6 MB. When I checked the docs, I found the following statement:
the statement del x removes the binding of x from the namespace referenced by the local scope
So I guess it didn't delete the object itself, but just unbound it. But if it only unbound the name, why did that free up so much space (152.6 MB)? Can somebody please explain what is going on here?
Python is a garbage-collected language. If a value isn't "reachable" from your code anymore, it will eventually get deleted.
The del statement, as you saw, removes the binding of your variable. Variables aren't values, they're just names for values.
If that variable was the only reference to the value anywhere, the value will eventually get deleted. In CPython in particular, the garbage collector is built on top of reference counting. So, that "eventually" means "immediately".* In other implementations, it's usually "pretty soon".
If there were other references to the same value, however, just removing one of those references (whether by del x, x = None, exiting the scope where x existed, etc.) doesn't clean anything up.**
There's another issue here. I don't know what the memory_profiler module (presumably this one) actually measures, but the description (talking about use of psutil) sounds like it's measuring your memory usage from "outside".
When Python frees up storage, it doesn't always—or even usually—return it to the operating system. It keeps "free lists" around at multiple levels so it can re-use the memory more quickly than if it had to go all the way back to the OS to ask for more. On modern systems, this is rarely a problem—if you need the storage again, it's good that you had it; if you don't, it'll get paged out as soon as someone else needs it and never get paged back in, so there's little harm.
(On top of that, what I referred to as "the OS" above is really an abstraction made up of multiple levels, from the malloc library through the core C library to the kernel/pager, and at least one of those levels usually has its own free lists.)
If you want to trace memory use from the inside perspective… well, that's pretty hard. It gets a lot easier in Python 3.4 thanks to the new tracemalloc module. There are various third-party modules (e.g., heapy/guppy, Pympler, meliae) that try to get the same kind of information with earlier versions, but it's difficult, because getting information from the various allocators, and tying that information to the garbage collector, was very hard before PEP 445.
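For instance, a minimal tracemalloc sketch (Python 3.4+) that measures a deletion from the inside might look like this:

import tracemalloc

tracemalloc.start()
big = list(range(10**6))
before = tracemalloc.take_snapshot()
del big
after = tracemalloc.take_snapshot()
for stat in after.compare_to(before, 'lineno')[:3]:
    print(stat)   # negative size diffs show memory released, as seen from inside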
* In some cases, there are references to the value… but only from other references that are themselves unreachable, possibly in a cycle. That still counts as "unreachable" as far as the garbage collector is concerned, but not as far as reference counts are concerned. So, CPython also has a "cycle detector" that runs every so often and finds cycles of mutually-reachable but not-reachable-from-anyone-else values and cleans them up.
** If you're testing in the interactive console, there may be hidden references to your values that are hard to track, so you might think you've gotten rid of the last reference when you haven't. In a script, it should always be possible, if not easy, to figure things out. The gc module can help, as can the debugger. But of course both of them also give you new ways to add additional hidden references.
I am working with very large numpy/scipy arrays that take up a huge chunk of memory. Suppose my code looks something like the following:
def do_something(a):
    a = a / a.sum()  # new memory is allocated
    # I don't need the original a any longer; how do I delete it?
    # do a lot more stuff

# a = super large numpy array
do_something(a)
print a  # still the same as originally (as passed by value)
So I am calling a function with a huge numpy array. The function processes the array in one way or another, but the original object is still kept in memory. Is there any way to free that memory inside the function? Deleting the local reference does not work.
What you want cannot be done; Python will only free the memory when all references to the array object are gone, and you cannot delete the a reference in the calling namespace from the function.
Instead, break up your problem into smaller steps. Do your calculations on a in one function, then delete a, then call another function to do the rest of the work.
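A rough sketch of that restructuring (normalize stands in for the first processing step):

import numpy as np

def normalize(a):
    return a / a.sum()   # allocates and returns a new array

a = np.ones(10 ** 7)
a = normalize(a)   # rebinding a drops the caller's reference to the original,
                   # so its memory can be freed before the rest of the work
# ... call the next function with the new array ...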
Python uses a simple GC scheme: primarily reference counting (there is a generational GC as well, but that is not what matters here). Every new reference to an object increments a counter, and every reference that goes away decrements it.
The memory is deallocated only after the counter reaches 0.
So while you hold a reference to that object, it will stay in memory.
In your case, the caller of do_something still has a reference to the array; if you want it to go away, you need to reduce the scope of that variable.
If you suspect memory leaks, you can set the DEBUG_LEAK flag and inspect the output; more info here: https://docs.python.org/2/library/gc.html
>>> a = 6
>>> b = 5
>>> c = 4
>>> d = c
>>> print(d)
4
>>> del b
>>> # a and b "must be" garbage collected, or "maybe" garbage collected?
Are a and b "maybe" garbage collected, or "must be" garbage collected? How can I prove it?
CPython uses reference counting. Jython and IronPython use their underlying VM's GC. Having said that, CPython interns small integers, including those used in your code, and therefore those specifically will never be GCed.
Python variables are just names referring to objects. In your example you have three objects, the integers 4, 5 and 6.
The integer 6 is referenced by a, 5 is initially referenced by b, and 4 is referenced by both c and d. Then you execute del b, which removes the reference from the integer 5. So at this point, 6 and 4 are still referenced, while 5 is not.
Exactly how garbage collection is handled is an implementation detail.
Now look here:
The current implementation keeps an array of integer objects for all integers between -5 and 256, when you create an int in that range you actually just get back a reference to the existing object
So the numbers you used in this example will never be garbage collected.
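You can observe the interning directly in the interactive interpreter (is compares object identity; the 1000 case is typical CPython REPL behaviour, not a guarantee):

>>> a = 6
>>> b = 6
>>> a is b    # both names share the cached small-int object
True
>>> x = 1000
>>> y = 1000
>>> x is y    # outside the -5..256 cache
False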
When garbage is actually collected is described in the documentation of gc.set_threshold(threshold0, threshold1, threshold2):
In order to decide when to run, the collector keeps track of the number of object allocations and deallocations since the last collection. When the number of allocations minus the number of deallocations exceeds threshold0, collection starts. Initially only generation 0 is examined. If generation 0 has been examined more than threshold1 times since generation 1 has been examined, then generation 1 is examined as well. Similarly, threshold2 controls the number of collections of generation 1 before collecting generation 2.
The standard values for the thresholds are:
In [2]: gc.get_threshold()
Out[2]: (700, 10, 10)
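A small sketch of inspecting and adjusting those thresholds (the new values are arbitrary examples):

import gc

print(gc.get_threshold())        # (700, 10, 10) by default
print(gc.get_count())            # allocations minus deallocations, per generation
gc.set_threshold(1400, 10, 10)   # e.g. make generation-0 collections half as frequent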
The way garbage collection works is an implementation detail. See also the question “My class defines __del__ but it is not called when I delete the object” in the Python FAQ.
In CPython, reference counting is the default, which implies that objects are deleted the moment the last reference to them goes away. So if you have an object called a that is only referenced in the current scope, del a will delete it completely.
However, CPython also maintains a list of cyclic objects, to deal with the special cases where reference counting fails. You cannot tell when objects which end up in this list will get deleted, but eventually they will.
In other Python implementations, there might be a full garbage collector for all objects, so you should never rely on del a actually deleting the object. This is also why you should always close file descriptors manually using .close() to prevent resources from leaking until the program shuts down.
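For example, a context manager closes the file deterministically on every implementation, without relying on reference counting (data.txt is a placeholder filename):

with open("data.txt") as f:
    data = f.read()
# f.close() has been called here, even if read() raised an exception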