Consider these two snippets, which I run in the Python console:
l=[]
for i in range(0,1000): l.append("."*1000000)
# if you check your taskmanager now, python is using nearly 900MB
del l
# now python3 has immediately freed the memory
Now consider this:
l=[]
for i in range(0,1000): l.append("."*1000000)
l.append(l)
# if you check your taskmanager now, python is using nearly 900MB
del l
# now python3 won't free the memory
Since I am working with this kind of object and need to free it from memory, I need to know how to make Python recognize that it should release the corresponding memory.
PS: I am using Windows7.
Because you've created a circular reference, the memory won't be freed until the garbage collector runs, detects the cycle, and cleans it up. You can trigger that manually:
import gc
gc.collect() # Memory usage will drop once you run this.
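The effect can be observed directly with a weak reference. The following is a minimal sketch, assuming CPython; the `Node` class and the `cycle_survives_del` function are hypothetical names introduced only for illustration (a plain list cannot be weakly referenced, so a small class stands in for it).

```python
import gc
import weakref


class Node:
    """Hypothetical stand-in for the self-referencing list in the question."""

    def __init__(self):
        self.payload = "." * 1_000_000
        self.me = self  # the object refers to itself, forming a cycle


def cycle_survives_del():
    """Return (alive after del, alive after gc.collect())."""
    was_enabled = gc.isenabled()
    gc.disable()  # keep the automatic collector out of the experiment
    try:
        node = Node()
        probe = weakref.ref(node)  # lets us check whether the object still exists
        del node                   # the reference count stays at 1 because of node.me
        alive_after_del = probe() is not None
        gc.collect()               # the cycle detector finds and frees the object
        alive_after_collect = probe() is not None
        return alive_after_del, alive_after_collect
    finally:
        if was_enabled:
            gc.enable()
```

On CPython, the first value should be true (the object outlives `del`) and the second false (it is gone after the explicit collection).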
The collector will automatically run occasionally, but only if certain conditions related to the number of object allocations/deallocations are met:
gc.set_threshold(threshold0[, threshold1[, threshold2]])
Set the garbage collection thresholds (the collection frequency).
Setting threshold0 to zero disables collection.
The GC classifies objects into three generations depending on how many
collection sweeps they have survived. New objects are placed in the
youngest generation (generation 0). If an object survives a collection
it is moved into the next older generation. Since generation 2 is the
oldest generation, objects in that generation remain there after a
collection. In order to decide when to run, the collector keeps track
of the number of object allocations and deallocations since the last
collection. When the number of allocations minus the number of
deallocations exceeds threshold0, collection starts.
So if you continued creating more objects in the interpreter, the garbage collector would eventually kick in by itself. You can make that happen more often by lowering threshold0, or you can just call gc.collect() manually when you know you've deleted an object that contains a reference cycle.
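To see the threshold at work, one option is to register a callback in `gc.callbacks` and count how often the collector starts while many container objects are allocated. This is only a sketch under the assumption of a standard CPython build; `count_automatic_collections` and its parameters are hypothetical names, not part of the `gc` module.

```python
import gc


def count_automatic_collections(threshold0, n_objects=5000):
    """Count collector runs while allocating n_objects tracked containers."""
    runs = 0

    def on_gc(phase, info):
        # The collector calls this with phase "start" and again with "stop".
        nonlocal runs
        if phase == "start":
            runs += 1

    old_threshold = gc.get_threshold()
    was_enabled = gc.isenabled()
    gc.callbacks.append(on_gc)
    gc.enable()
    gc.set_threshold(threshold0, old_threshold[1], old_threshold[2])
    try:
        keep = []
        for _ in range(n_objects):
            keep.append([])  # every list is a container tracked by the collector
    finally:
        # Restore the previous collector configuration.
        gc.set_threshold(*old_threshold)
        gc.callbacks.remove(on_gc)
        if not was_enabled:
            gc.disable()
    return runs
```

Calling `count_automatic_collections(10)` should report many more runs than a call with a very large threshold0, since the collector is scheduled each time the allocation surplus exceeds the threshold.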
In Python 3.8 I have a few shared memory values like these:
from multiprocessing.sharedctypes import RawArray, RawValue
...
sm_best_score_gpu_id = RawValue(ctypes.c_double, -1)
sm_positions = RawArray(ctypes.c_int32, genome_positions)
This needs to be reallocated every once in a while. If I just repeat these operations in a loop, will this memory be automatically freed when the original variables are garbage collected?
I looked through the ctypes docs but didn't find anything about freeing memory.
How can I free up this memory?
It should be fine. The garbage collection may not occur immediately. On non-CPython interpreters like PyPy, Jython, etc., all collection is non-deterministic. On CPython (the "reference" interpreter, which is, rather appropriately, reference-counted), collection is deterministic when no reference cycles are involved, but if a reference cycle forms (which can happen accidentally in weird cases like raised exceptions containing frames in their tracebacks), and one of these objects is attached to something in the cycle, it won't be cleaned up until the cycle itself is collected, which occurs at an unspecified future time (and might not occur at all if the cycle collector is disabled with gc.disable(), or outstanding objects are frozen with gc.freeze()).
As long as "cleaned eventually" is fine, and you're not interfering with the cyclic garbage collector, cleanup will happen eventually (the underlying type uses a finalizer to ensure the memory is freed back to the shared heap it came from).
At the start of my code, I load in a huge (33GB) pickled object. This object is essentially a huge graph with many interconnected nodes.
Periodically, I run gc.collect(). When I have the huge object loaded in, this takes 100 seconds. When I change my code to not load in the huge object, gc.collect() takes .5 seconds. I assume that this is caused by python checking through every subobject of this object for reference cycles every time I call gc.collect().
I know that neither the huge object, nor any of the objects it references when it is loaded in at the start, will ever need to be garbage collected. How do I tell python this, so that I can avoid the 100s gc time?
In Python 3.7 and later you might be able to hack something together using gc.freeze(): https://docs.python.org/3/library/gc.html#gc.freeze
allocate_a_lot()
gc.freeze() # move all objects to a permanent generation. none will be collected
allocate_some_more()
gc.collect() # collect all non-frozen objects
gc.unfreeze() # return to sanity
That said, I don't think Python offers the tools for exactly what you want. In general, garbage-collected languages discourage manual memory management.
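The freeze/collect/unfreeze sequence can be wrapped in a small helper that also reports what happened. This is a sketch assuming Python 3.7 or later; the function name `collect_ignoring_existing_objects` is hypothetical, and the tiny self-referencing list merely stands in for garbage created after the big object was loaded.

```python
import gc


def collect_ignoring_existing_objects():
    """Freeze everything allocated so far, then collect only newer garbage."""
    gc.collect()  # clean up first so that no existing garbage gets frozen
    gc.freeze()   # move all currently tracked objects to the permanent generation
    frozen = gc.get_freeze_count()
    try:
        cycle = []
        cycle.append(cycle)   # new garbage created after the freeze
        del cycle
        found = gc.collect()  # frozen objects are skipped by this pass
    finally:
        gc.unfreeze()         # return the frozen objects to the oldest generation
    return frozen, found, gc.get_freeze_count()
```

The first value is the number of objects placed in the permanent generation, the second is the number of unreachable objects the collection found, and the third should be zero again after `gc.unfreeze()`.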
I am running some memory-heavy scripts which iterate over documents in a database, and due to memory constraints on the server I manually delete references to the large object at the conclusion of each iteration:
for document in database:
    initial_function_calls()
    big_object = memory_heavy_operation(document)
    save_to_file(big_object)
    del big_object
    additional_function_calls()
The initial_function_calls() and additional_function_calls() are each slightly memory-heavy. Do I see any benefit by explicitly deleting the reference to the large object for garbage collection? Alternatively, does leaving it and having it point to a new object in the next iteration suffice?
As is often the case, it depends. :-/
I'm assuming we're talking about CPython here.
Using del or re-assigning a name reduces the reference count of an object. Only if that reference count reaches 0 can the object be de-allocated. So if you inadvertently stashed a reference to big_object away somewhere, using del won't help.
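A stashed reference is easy to demonstrate with a weak reference. The sketch below assumes CPython's reference counting; `BigObject`, `stash`, and `stashed_reference_demo` are hypothetical names used only for this illustration.

```python
import weakref


class BigObject:
    """Hypothetical stand-in for the result of memory_heavy_operation()."""


def stashed_reference_demo():
    """Return (alive while stashed, alive after the stash is cleared)."""
    stash = []
    big_object = BigObject()
    probe = weakref.ref(big_object)  # observes the object without keeping it alive
    stash.append(big_object)         # an inadvertent second reference
    del big_object                   # count drops from 2 to 1, nothing is freed
    alive_while_stashed = probe() is not None
    stash.clear()                    # last reference gone, object is de-allocated
    alive_after_clear = probe() is not None
    return alive_while_stashed, alive_after_clear
```

On CPython this should show the object surviving `del` while the stash holds it, and disappearing as soon as the stash is cleared, without any call to `gc.collect()`.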
When garbage collection is triggered depends on the amount of allocations and de-allocations. See the documentation for gc.set_threshold().
If you're pretty sure that there are no further references, you could use gc.collect() to force a garbage collection run. That might help if your code doesn't do a lot of other allocations.
One thing to keep in mind is that if big_object is created by a C extension module (such as numpy), the module may manage its own memory, in which case the garbage collector won't affect it. Also, small integers and small strings are pre-allocated and won't be garbage collected. You can use gc.is_tracked() to check whether an object is tracked by the cyclic garbage collector.
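As a quick illustration of gc.is_tracked(), the sketch below checks a few built-in values. The helper name `tracking_report` is hypothetical; the behaviour shown for integers, strings, and lists matches the examples in the `gc` module documentation.

```python
import gc


def tracking_report():
    """Report whether a few sample objects are tracked by the cycle collector."""
    samples = [0, "a", []]
    # Atomic values such as int and str cannot form cycles and are not tracked;
    # containers such as lists are.
    return [(type(obj).__name__, gc.is_tracked(obj)) for obj in samples]
```

An untracked object is still freed by reference counting when its last reference disappears; it is simply never examined by the cycle detector.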
What I would suggest is that you run your program with and without del+gc.collect(), and monitor the amount of RAM used. On UNIX-like systems, look at the resident set size. You could also use sys._debugmallocstats().
Unless you see the resident set size grow and grow, I wouldn't worry about it.
I have this code:
import gc
def hacerciclo():
    l = [0]
    l[0] = l
recolector = gc.collect()
print("Garbage collector %d" % recolector)
for i in range(10):
    hacerciclo()
recolector = gc.collect()
print("Garbage collector %d" % recolector)
This is example code demonstrating the use of gc.collect(). The problem is that the same code produces different output on different computers.
Some computers show:
Garbage collector 1
Garbage collector 10
others show:
Garbage collector 0
Garbage collector 10
Why does this happen?
The current version of Python uses reference counting to keep track of allocated memory. Each object in Python has a reference count which indicates how many objects are pointing to it. When this reference count reaches zero the object is freed. This works well for most programs. However, there is one fundamental flaw with reference counting and it is due to something called reference cycles. The simplest example of a reference cycle is one object that refers to itself. For example:
>>> l = []
>>> l.append(l)
>>> del l
The reference count for the list created is now one. However, since it cannot be reached from inside Python and cannot possibly be used again, it should be considered garbage. In the current version of Python, this list will never be freed.
Creating reference cycles is usually not good programming practice and can almost always be avoided. However, sometimes it is difficult to avoid creating reference cycles and other times the programmer does not even realize it is happening. For long running programs such as servers this is especially troublesome. People do not want their servers to run out of memory because reference counting failed to free unreachable objects. For large programs it is difficult to find how reference cycles are being created.
Source: http://arctrix.com/nas/python/gc/
The link below contains the same example you are using, along with an explanation:
http://www.digi.com/wiki/developer/index.php/Python_Garbage_Collection
If my understanding is correct, in CPython objects will be deleted as soon as their reference count reaches zero. If you have reference cycles that become unreachable that logic will not work, but on occasion the interpreter will try to find them and delete them (and you can do this manually by calling gc.collect() ).
My question is, when do these interpreter-triggered cycle collection steps happen? What kind of events trigger them?
I am more interested in the CPython case, but would love to hear how this differs in PyPy or other python implementations.
The GC runs periodically based on the (delta between the) number of allocations and deallocations that have taken place since the last GC run.
See the gc.set_threshold() function:
In order to decide when to run, the collector keeps track of the number of object allocations and deallocations since the last collection. When the number of allocations minus the number of deallocations exceeds threshold0, collection starts.
You can access the current counts with gc.get_count(); this returns a tuple of the 3 counts GC tracks (the other 2 are to determine when to run deeper scans).
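The current counters and thresholds can be inspected side by side. This is a small sketch for CPython; the function name `gc_snapshot` and the dictionary keys are hypothetical choices made for this example.

```python
import gc


def gc_snapshot():
    """Collect the collector's current configuration and counters."""
    return {
        "enabled": gc.isenabled(),
        "threshold": gc.get_threshold(),  # (threshold0, threshold1, threshold2)
        "count": gc.get_count(),          # (count0, count1, count2)
    }
```

Calling `gc_snapshot()` before and after allocating a batch of container objects gives a rough picture of how close the youngest generation is to triggering a collection.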
The PyPy garbage collector operates entirely differently, as the GC process in PyPy is responsible for all deallocations, not just cyclic references. Moreover, the PyPy garbage collector is pluggable, meaning that how often it runs depends on what GC option you have picked. The default Minimark strategy doesn't even run at all when below a memory threshold, for example.
See the RPython toolchain Garbage Collector documentation for some details on their strategies, and the Minimark configuration options for more hints on what can be tweaked.
Ditto for Jython or IronPython; these implementations rely on the host runtime (Java and .NET) to handle garbage collection for them.