Reference cycles when using inspect module - python

The documentation says:
Note Keeping references to frame objects, as found in the first element of the frame records these functions return, can cause your program to create reference cycles. Once a reference cycle has been created, the lifespan of all objects which can be accessed from the objects which form the cycle can become much longer even if Python’s optional cycle detector is enabled. If such cycles must be created, it is important to ensure they are explicitly broken to avoid the delayed destruction of objects and increased memory consumption which occurs.
Though the cycle detector will catch these, destruction of the frames (and local variables) can be made deterministic by removing the cycle in a finally clause. This is also important if the cycle detector was disabled when Python was compiled or using gc.disable(). For example:
def handle_stackframe_without_leak():
    frame = inspect.currentframe()
    try:
        ...  # do something with the frame
    finally:
        del frame
If you want to keep the frame around (for example to print a traceback later), you can also break reference cycles by using the frame.clear() method.
Which supposedly means that there are two things that have references to each other. What are they exactly?
Can you explain more precisely under which conditions a reference cycle is created? Is one created whenever I call inspect.currentframe() without a subsequent del frame? Does the same go for inspect.stack()? Are there any other methods or circumstances?
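To make the cycle concrete, here is a minimal sketch: the frame returned by inspect.currentframe() lists the local name frame in its f_locals, and that local points right back at the frame itself. (The same applies to the frame objects in the records returned by inspect.stack(), if you keep references to them.)

```python
import inspect

def show_cycle():
    frame = inspect.currentframe()
    # The local name `frame` refers to the frame object, and the frame's
    # f_locals mapping refers back to that same local: a reference cycle.
    is_cycle = frame.f_locals["frame"] is frame
    del frame  # break the cycle, as the docs recommend
    return is_cycle

print(show_cycle())  # True
```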

Related

Accessing the memory heap in python

Is there a way to access the memory heap in Python? I'm interested in being able to access all of the objects allocated in memory of the running instance.
You can't get direct access, but the gc module should do most of what you want. A simple gc.get_objects() call will return all the objects tracked by the collector. This isn't everything, since the CPython garbage collector is only concerned with objects that can participate in reference cycles, so instances of built-in types that can't refer to other objects (e.g. int, float, str) won't appear in the resulting list; but they'll all be referenced by something in that list (if they weren't, their reference count would be zero and they'd already have been disposed of).
Aside from that, you might get some more targeted use out of the inspect module, especially stack frame inspection, using the traceback module for "easy formatting" or manually digging into the semi-documented frame objects themselves, either of which would allow you to narrow the scope down to a particular active frame on the call stack.
For the closest to the heap solution, you could use the tracemalloc module to trace and record allocations as they happen, or the pdb debugger to do live introspection from the outside (possibly adding breakpoint() calls to your code to make it stop automatically when you reach that point to let you look around).
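As a minimal sketch of the gc approach: gc.get_objects() returns the tracked container objects, which you can then filter by type to narrow down what you're interested in.

```python
import gc

# Every container object the collector currently tracks:
objs = gc.get_objects()
print(len(objs) > 0)  # True; thousands even in a fresh interpreter

# The list can be filtered by type to find, say, all live dicts:
live_dicts = [o for o in objs if type(o) is dict]
print(len(live_dicts) > 0)  # True
```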

Does this code create a memory leak in python?

Consider the following code for illustration purposes:
import mod

f1s = ["A1", "B1", "C1"]
f2s = ["A2", "B2", "C2"]

for f1, f2 in zip(f1s, f2s):
    # Creating an object
    acumulator = mod.AcumulatorObject()
    # Using object
    acumulator.append(f1)
    acumulator.append(f2)
    # Output of object
    acumulator.print()
So, I use an instance of a class at the beginning of the for loop to perform an operation. For each tuple in the loop I need to perform the same action, but I cannot reuse the same object because it would carry over the effect of the last iteration. Therefore, at the beginning of every iteration I create a new instance.
My question is: does doing this create a memory leak? What action do I have to take for each object created? (Delete it, maybe? Or is it cleared by assigning the new object to the same name?)
tl;dr: no.
The reference implementation of Python uses reference counting for garbage collection. Other implementations use different GC strategies, and this affects the precise time at which __del__ methods are called, which may or may not be reliable or timely in PyPy, Jython or IronPython. These differences do not matter except when you are dealing with resources like file pointers and other expensive system resources.
In CPython, objects are reclaimed as soon as their reference count drops to zero. For example, when you do acumulator = mod.AcumulatorObject() inside a for loop, the new object replaces the old one at the next iteration; since no other variable references the old object, its reference count drops to zero and it is reclaimed immediately, without waiting for a GC pass. The reference implementation, CPython, will spoil you with things like releasing resources automatically when they go out of scope, but YMMV with other implementations.
That is why many people commented memory leaks are not of concern in Python.
You have considerable control over CPython's garbage collector through the gc module. The default settings are pretty conservative, and in 10 years doing Python for a living I never had to trigger a GC cycle manually, but I've seen a situation where delaying it helped performance:
Yes, I had previously played with sys.setcheckinterval. I changed it to 1000 (from its default of 100), but it didn't do any measurable difference. Disabling Garbage Collection has helped - thanks. This has been the biggest speedup so far - saving about 20% (171 minutes for the whole run, down to 135 minutes) - I'm not sure what the error bars are on that, but it must be a statistically significant increase.
Just follow best practices, like wrapping system resources in with statements (or try/finally blocks), and you should have no problems.
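For reference, a sketch of what that control looks like with the gc module; the thresholds printed are whatever your interpreter defaults to.

```python
import gc

# Inspect the collector's current generational thresholds.
print(gc.get_threshold())

# Temporarily disable automatic cycle collection around a hot section;
# plain reference counting keeps running regardless.
gc.disable()
try:
    data = [list(range(10)) for _ in range(1000)]  # allocation-heavy work
finally:
    gc.enable()

# Force a full collection pass; returns the number of unreachable objects found.
print(gc.collect() >= 0)  # True
```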

How does Python's GC handle reference cycles?

I know that Python uses reference counting for garbage collection.
Every object allocated on the heap has a counter that counts the number of objects referring to it; when the counter hits zero, the object is deleted.
But how does Python handle circular references?
If two objects refer only to each other, each still has a count of 1 once the external references are gone, yet both need to be deleted.
The way this is handled depends on the Python implementation. The reference implementation, the one you're probably using, is usually called CPython, because it is written in C.
CPython uses reference counting to clean up objects which are obviously no longer used. In addition, every once in a while it pauses execution of the program and looks for objects that are no longer reachable: it starts from the objects directly referenced by live variables in the program, follows all references as far as it can, and marks every object it visits. Anything never visited is unreachable from the running program and gets deleted. This is tracing garbage collection, of which mark-and-sweep is one particular implementation. (Strictly speaking, CPython's cycle collector finds unreachable cycles by scanning the container objects it tracks and subtracting internal reference counts, but the effect is the same: unreachable cycles get collected.)
If you want, and you're sure your program has no circular references, you can turn this feature off to improve performance. If you have circular references, however, you'll accidentally cause memory leaks, so it's usually not worth doing unless you're really worried about performance.
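A minimal sketch of the situation the question describes, using a throwaway Node class: reference counting alone cannot reclaim the pair, but the cycle collector can.

```python
import gc

class Node:
    def __init__(self):
        self.other = None

gc.collect()  # start from a clean slate

# Two objects that refer to each other:
a, b = Node(), Node()
a.other, b.other = b, a
del a, b  # each object still has a refcount of 1, via the other

# Reference counting alone cannot free them; the tracing collector can.
freed = gc.collect()
print(freed >= 2)  # True: both Nodes (and their __dict__s) were unreachable
```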

Is it faster to create new instance of class/variable or set existing one?

In Python 2.7, (or in programming languages in general), is it faster to create a new instance of a class/variable or to set an existing one to something new?
For example, which is faster to create another_pic.png? This:
my_img = Image.open(cur_directory_path + '\\my_pic.png') # don't need this anymore
new_img = Image.open(cur_directory_path + '\\another_pic.png') # but need this new pic
or this:
my_img = Image.open(cur_directory_path + '\\my_pic.png') # don't need this anymore
my_img = Image.open(cur_directory_path + '\\another_pic.png') # but need this new pic
I ask because I have one Image variable which "gets around", so to speak, in my code by constantly being reset to various things, and I am wondering if this affects performance at all.
In both cases, you're creating two completely new objects at the exact same speed, so to that end I don't think either one is faster than the other. You're never really "resetting" an object; you're just reassigning a name. All that's happening is you're changing an existing pointer to a new memory location, which is a fraction of a fraction of a second.
The main difference is that with the bottom option the first image loses its last reference as soon as you reassign my_img, so it can be reclaimed, while with the top option you keep two objects alive and therefore use more memory. Deallocating memory is not a very speed-intensive task either way. So if you're constantly loading new images, to the degree that it may impact your memory, it's probably best to keep reassigning the same name. Or you could even invoke the garbage collector manually if you're concerned about running out of memory, but it doesn't sound like you are.
They're exactly the same. Both go through the process of importing the image. The variable assignment is only storing a reference to the object. The only difference is that the latter may begin garbage collecting the my_pic.png image sooner since there are no more references to the object.
Technically, in lower-level languages, reusing a variable of the same type can avoid allocating fresh memory, because the existing storage can be updated in place. That reasoning does not carry over to Python, though: a Python variable is just a name bound to an object, assignment only rebinds the name, and a new Image object is allocated by Image.open either way. Any difference between reusing a name and introducing a new one is so small that in any normal application you shouldn't notice it.
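One way to check the rebinding claim, using a hypothetical Picture class as a stand-in for an expensive object (PIL's Image is not assumed here):

```python
class Picture:
    """Hypothetical stand-in for an expensive object such as a loaded image."""

pic = Picture()
first_id = id(pic)

# Assignment only rebinds the name `pic` to a new object; it never updates
# the old object's memory in place. The new object is allocated first,
# then the old one loses its last reference and is reclaimed.
pic = Picture()
print(id(pic) != first_id)  # True: a genuinely new object
```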

frame type in python

The Python reference manual says:
A code block is executed in an execution frame. A frame contains some
administrative information (used for debugging) and determines where
and how execution continues after the code block’s execution has
completed.
and
Frame objects represent execution frames. They may occur
in traceback objects
But I don't understand how frames work. How can I get access to the current frame object? When is a frame object created? Is a frame object created every time a new code block starts to execute?
These frames are representations of the stack frames created by function calls. You should not need to access them in normal programming. A new frame is indeed created every time a function is called, and destroyed when it exits or raises an uncaught exception. Since function calls can go many levels deep your program ends up with a bunch of nested stack frames, but it's not good programming practice (unless you are writing a debugger or similar application) to mess about with the frames even though Python does make them available.
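A small sketch of that: each call creates a frame, and f_back links each frame to its caller's, so walking f_back from inspect.currentframe() reproduces the call stack.

```python
import inspect

def inner():
    frame = inspect.currentframe()
    try:
        # Walk the chain of calling frames via f_back.
        names = []
        f = frame
        while f is not None:
            names.append(f.f_code.co_name)
            f = f.f_back
        return names
    finally:
        del frame  # avoid keeping a local reference to our own frame

def outer():
    return inner()

print(outer()[:2])  # ['inner', 'outer']: the two innermost frames
```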
It is important to understand frames in Python 3, especially if you maintain production code that is sensitive to memory usage, such as deep learning models.
Frames are objects that represent areas of memory where local variables are stored when a function is called. They can be stored and manipulated, which is what debuggers do. Understanding how frames are handled by python can help you avoid memory leaks.
Each time a function is called, a new frame is created to hold the function's variables and parameters. These frames are normally destroyed when the function finishes executing normally. However, if an exception is raised, the associated frame and all parent frames are stored in a traceback object, which is an attribute of the exception object (__traceback__). This can cause memory leaks if the exception object lives for a long time, because it holds onto the frames, and all the associated local variables will not be garbage collected.
This matters quite a lot.
For example, it is one reason why you need to call the close method on file objects even if you don't create reference cycles: the file object may be referenced by a traceback stored on an exception, which prevents the file from being garbage collected even after it goes out of scope.
The issue is worse in interactive Python shells like Jupyter, where each unhandled exception ends up living forever in a few places. I'm working on a way to clean that up, which is how I found this issue.
In Python, the types module provides the FrameType and TracebackType types, which represent frames and tracebacks, respectively. However, you cannot instantiate these types directly. https://docs.python.org/3/library/types.html#types.FrameType
The __traceback__ attribute was introduced in Python 3 by PEP 3134, which goes into the ramifications of this change in some detail, so it is a good read for the curious.
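A sketch of the leak described above, with a hypothetical fail() function whose local survives inside the stored traceback until the frame is explicitly cleared:

```python
def fail():
    secret = "large local value"  # stands in for e.g. a big model object
    raise ValueError("boom")

try:
    fail()
except ValueError as exc:
    kept = exc  # keeping the exception also keeps exc.__traceback__

# The traceback keeps every frame alive, locals included:
frame = kept.__traceback__.tb_next.tb_frame  # the frame of fail()
print("secret" in frame.f_locals)  # True: the local is still alive

# frame.clear() drops the frame's locals without discarding the traceback:
frame.clear()
print(len(frame.f_locals) == 0)  # True
```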
