When does CPython garbage collect?

If my understanding is correct, in CPython objects will be deleted as soon as their reference count reaches zero. If you have reference cycles that become unreachable that logic will not work, but on occasion the interpreter will try to find them and delete them (and you can trigger this manually by calling gc.collect()).
My question is, when do these interpreter-triggered cycle collection steps happen? What kind of events trigger them?
I am more interested in the CPython case, but would love to hear how this differs in PyPy or other Python implementations.

The GC runs periodically based on the (delta between the) number of allocations and deallocations that have taken place since the last GC run.
See the gc.set_threshold() function:
In order to decide when to run, the collector keeps track of the number of object allocations and deallocations since the last collection. When the number of allocations minus the number of deallocations exceeds threshold0, collection starts.
You can access the current counts with gc.get_count(); this returns a tuple of the 3 counts the GC tracks (the other 2 determine when the deeper scans of the older generations run).
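For example, you can inspect the thresholds and the current counters from a running interpreter (the numbers shown in the comments are illustrative; yours will differ):
import gc

print(gc.get_threshold())  # e.g. (700, 10, 10): threshold0, threshold1, threshold2
print(gc.get_count())      # e.g. (488, 3, 0): pending counts per generation

gc.collect()               # force a full collection of all generations
print(gc.get_count())      # counters reset, e.g. (0, 0, 0)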
The PyPy garbage collector operates entirely differently, as the GC process in PyPy is responsible for all deallocations, not just cyclic references. Moreover, the PyPy garbage collector is pluggable, meaning that how often it runs depends on what GC option you have picked. The default Minimark strategy doesn't even run at all when below a memory threshold, for example.
See the RPython toolchain Garbage Collector documentation for some details on their strategies, and the Minimark configuration options for more hints on what can be tweaked.
Ditto for Jython or IronPython; these implementations rely on the host runtime (Java and .NET) to handle garbage collection for them.

Related

How to free up multiprocessing.sharedctypes.RawValue and multiprocessing.sharedctypes.RawArray?

In Python 3.8 I have a few shared memory values like these:
import ctypes
from multiprocessing.sharedctypes import RawArray, RawValue
...
sm_best_score_gpu_id = RawValue(ctypes.c_double, -1)
sm_positions = RawArray(ctypes.c_int32, genome_positions)
This needs to be reallocated every once in a while. If I just repeat these operations in a loop, will this memory be automatically freed when the original variables are garbage collected?
I looked through the ctypes docs but didn't find anything related to freeing up memory.
How can I free up this memory?
It should be fine, though the garbage collection may not occur immediately. On non-CPython interpreters like PyPy, Jython, etc., all collection is non-deterministic. On CPython (the "reference" interpreter, which is, rather appropriately, reference-counted), collection is deterministic when no reference cycles are involved. But if a reference cycle forms (which can happen accidentally in weird cases, such as raised exceptions holding frames in their tracebacks) and one of these objects is attached to something in the cycle, it won't be cleaned up until the cycle itself is collected. That happens at an unspecified future time, and might not happen at all if the cycle collector is disabled with gc.disable(), or outstanding objects are frozen with gc.freeze().
As long as "cleaned eventually" is fine, and you're not interfering with the cyclic garbage collector, cleanup will happen eventually (the underlying type uses a finalizer to ensure the memory is freed back to the shared heap it came from).
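As a minimal sketch of such a reallocation loop (the helper name and sizes are illustrative, not from the question; only the RawValue/RawArray calls come from the original code):
import ctypes
import gc
from multiprocessing.sharedctypes import RawArray, RawValue

def reallocate(n):
    # Rebinding the caller's names drops the previous blocks; their
    # finalizers return the memory to the shared heap it came from.
    score = RawValue(ctypes.c_double, -1)
    positions = RawArray(ctypes.c_int32, n)
    return score, positions

sm_best_score_gpu_id, sm_positions = reallocate(1000)
sm_best_score_gpu_id, sm_positions = reallocate(2000)  # old blocks freed on rebind
gc.collect()  # optional: sweeps any cycles that could delay finalization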

Does this code create a memory leak in python?

Consider the following code for illustration purposes:
import mod

f1s = ["A1", "B1", "C1"]
f2s = ["A2", "B2", "C2"]

for f1, f2 in zip(f1s, f2s):
    # Creating an object
    acumulator = mod.AcumulatorObject()
    # Using object
    acumulator.append(f1)
    acumulator.append(f2)
    # Output of object
    acumulator.print()
So, I use an instance of a class at the beginning of the for loop to perform an operation. For each tuple in the loop I need to perform the same action, but I cannot reuse the same object because it would carry over the effects of the previous iteration. Therefore, at the beginning of every iteration I create a new instance.
My question is whether doing this creates a memory leak. What action do I have to take for each object created? (Delete it, maybe? Or is the old one cleared by assigning the new object to the same name?)
tl;dr: no
The reference implementation of Python uses reference counting for garbage collection. There are other implementations that use different GC strategies, and this affects the precise time at which __del__ methods are called, which may or may not be reliable or timely in PyPy, Jython or IronPython. These differences are not important unless you are dealing with file pointers and other expensive system resources.
In CPython, the GC will wipe out objects as soon as their reference count drops to zero. For example, when you do acumulator = mod.AcumulatorObject() inside a for loop, a new object replaces the old one at the next iteration; since no other variables reference the old object, it is reclaimed immediately. The reference implementation, CPython, will spoil you with things like releasing resources automatically when they go out of scope, but YMMV regarding other implementations.
That is why many people have commented that memory leaks are not a concern in Python.
You have complete control over CPython's garbage collector using the gc module. The default settings are pretty conservative, and in 10 years of doing Python for a living I never had to fire a GC cycle manually - but I've seen a situation where delaying it helped performance:
Yes, I had previously played with sys.setcheckinterval. I changed it to 1000 (from its default of 100), but it didn't do any measurable difference. Disabling Garbage Collection has helped - thanks. This has been the biggest speedup so far - saving about 20% (171 minutes for the whole run, down to 135 minutes) - I'm not sure what the error bars are on that, but it must be a statistically significant increase.
Just follow best practices like wrapping system resources in with statements (or try/finally blocks) and you should have no problems.
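To see that deterministic reclamation on CPython, here is a small sketch; the Accumulator class below is a stand-in for mod.AcumulatorObject, which isn't shown in the question:
import weakref

class Accumulator:
    def __init__(self):
        self.items = []
    def append(self, item):
        self.items.append(item)

old_ref = None
for i in range(3):
    acc = Accumulator()           # rebinding drops the previous instance
    acc.append(i)
    if old_ref is not None:
        print(old_ref() is None)  # True on CPython: old object already reclaimed
    old_ref = weakref.ref(acc)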

Memory deallocation in Linked list Python

While deleting the end node from a linked list, we just set the link of the node pointing to the end node to None. Does that mean that the end node is destroyed and the memory occupied by it has been released?
You ask: "Does that mean that the end node is destroyed and memory occupied by it has been released?"
With the little information you have given, the answer to your question as you posed it is definitely not a plain, unqualified "yes".
The simplest reason why "yes" is wrong is that if there is any other reference to that end node, it can't be released immediately; if it could be, nothing much would work, would it? However, that doesn't mean the node won't ever be regarded as deletable.
Moreover, even once releasable, that doesn't mean the memory "has been released" - this is implementation-dependent and may well not be deterministic, i.e. you can't necessarily rely on the memory having been immediately released, or predict when (if ever) it is actually released.
The "garbage collector" metaphor is used to refer to recovering unused memory because IRL garbage collection happens every now and then but can't be relied on to happen (or have happened) at a particular time.
What happens to unreferenced data has nothing to do with the language specification, which is another reason why the answer is not a plain "yes". It is completely implementation-dependent. You don't say whether you are using CPython or Jython, or some other flavour. You need to refer to the documentation for the implementation you are using. CPython does expose its garbage collector, see e.g. https://docs.python.org/2/library/gc.html and https://docs.python.org/3/library/gc.html, while Jython uses the Java garbage collector. You may or may not be able to influence their behaviour; refer to the documentation for the interpreter you are using.
The reason for not necessarily recycling releasable memory immediately is usually performance: why do work that isn't needed? If your interpreter does postpone recycling, it will at some point, when resources become limited by some criterion, make an effort to tidy up (do the garbage collection). This means that 99.9...% of the time you don't need to concern yourself with recycling, because it is handled automatically (with a corresponding overhead cost) whenever the interpreter implementation considers it necessary.
Yes.
Python has a garbage collector so objects that cannot be reached in any way are automatically destroyed and their memory will be reused for other objects created in the future.
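To make that concrete, a small sketch using weakref.finalize shows when the end node actually becomes reclaimable (the Node class here is illustrative, not from the question):
import weakref

class Node:
    def __init__(self, value):
        self.value = value
        self.next = None

head = Node(1)
head.next = Node(2)  # the end node

# Register a callback that fires when the end node is reclaimed.
weakref.finalize(head.next, print, "end node reclaimed")
head.next = None     # drop the only reference to the end node
# On CPython the message prints immediately; other implementations
# may defer it until their garbage collector runs.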

Python3.4 memory usage

Consider these two snippets, which I run in the Python console:
l=[]
for i in range(0,1000): l.append("."*1000000)
# if you check your taskmanager now, python is using nearly 900MB
del l
# now python3 immediately freed the memory
Now consider this:
l=[]
for i in range(0,1000): l.append("."*1000000)
l.append(l)
# if you check your taskmanager now, python is using nearly 900MB
del l
# now python3 won't free the memory
Since I am working with these kinds of objects and need to free their memory, I need to know what to do so Python recognizes that the corresponding memory can be deleted.
PS: I am using Windows7.
Because you've created a circular reference, the memory won't be freed until the garbage collector runs, detects the cycle, and cleans it up. You can trigger that manually:
import gc
gc.collect() # Memory usage will drop once you run this.
The collector will automatically run occasionally, but only if certain conditions related to the number of object allocations/deallocations are met:
gc.set_threshold(threshold0[, threshold1[, threshold2]])
Set the garbage collection thresholds (the collection frequency). Setting threshold0 to zero disables collection.
The GC classifies objects into three generations depending on how many collection sweeps they have survived. New objects are placed in the youngest generation (generation 0). If an object survives a collection it is moved into the next older generation. Since generation 2 is the oldest generation, objects in that generation remain there after a collection. In order to decide when to run, the collector keeps track of the number of object allocations and deallocations since the last collection. When the number of allocations minus the number of deallocations exceeds threshold0, collection starts.
So if you continued creating more objects in the interpreter, eventually the garbage collector would kick in by itself. You can make that happen more often by lowering threshold0, or you can just manually call gc.collect() when you know you've deleted one of the objects containing a reference cycle.
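Applied to the self-referencing list from the question, that looks like this (gc.collect() returns the number of unreachable objects it found):
import gc

l = []
for i in range(0, 1000):
    l.append("." * 1000000)
l.append(l)          # the list now references itself: a reference cycle

del l                # the refcount never reaches zero, so nothing is freed yet
print(gc.collect())  # the collector finds the cycle; memory is released here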

What does cpython do to help detect object cycles(reference counting)?

From what I've read about CPython, it seems like it does reference counting plus something extra to detect/free objects pointing to each other. (Correct me if I'm wrong.) Could someone explain the something extra? Also, does this guarantee* no cycle leaking? If not, is there any research into an algorithm proven to augment reference counting so it never leaks*? Would this just be running a non-reference-counting tracing GC every so often?
*discounting bugs and problems with modules using foreign function interface
As explained in the documentation for gc.garbage, there is no guarantee that no leaks occur; specifically, before Python 3.4 (see PEP 442), cyclic objects with __del__ methods were not collected by default. For such objects, the cyclic links had to be broken manually to enable further GC.
From what I understand from browsing the CPython source code, the interpreter keeps references to all objects under its control. The "extra" garbage collector runs a mark-and-sweep-like algorithm over those tracked objects, records for each object whether it is reachable from the "outside", and deletes it if not. (The GC is generational, but it may be run explicitly from the gc module with a generation argument.)
The only efficient algorithm that I could think of that satisfies your criteria would indeed be a "full" GC algorithm to augment reference counting and this is what seems to be implemented in Python. I'm not an expert in these matters though.
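You can poke at that machinery directly with the gc module; a small sketch (the output shown is what CPython produces, where atomic objects are never tracked by the cycle detector):
import gc

# Atomic objects can't form cycles, so reference counting alone handles
# them; container objects are tracked by the cycle detector.
print(gc.is_tracked(42))  # False
print(gc.is_tracked([]))  # True

a, b = [], []
a.append(b)
b.append(a)               # two lists referencing each other: a cycle
del a, b                  # refcounts never reach zero on their own
print(gc.collect())       # the "extra" pass finds and frees the cycle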
