The Python reference manual says:
A code block is executed in an execution frame. A frame contains some
administrative information (used for debugging) and determines where
and how execution continues after the code block’s execution has
completed.
and
Frame objects represent execution frames. They may occur
in traceback objects
But I don't understand how frames work. How can I get access to the current frame object? When is a frame object created? Is a frame object created every time a new code block starts to execute?
These frames are representations of the stack frames created by function calls. You should not need to access them in normal programming. A new frame is indeed created every time a function is called, and destroyed when it exits or raises an uncaught exception. Since function calls can go many levels deep your program ends up with a bunch of nested stack frames, but it's not good programming practice (unless you are writing a debugger or similar application) to mess about with the frames even though Python does make them available.
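To make the answer above concrete, here is a minimal sketch of how the current frame can be reached with inspect.currentframe(). The function names inner and outer are made up for illustration; f_code.co_name and f_back are standard frame attributes.

```python
import inspect


def inner():
    # The frame that belongs to this particular call of inner().
    frame = inspect.currentframe()
    try:
        # f_code.co_name is the name of the running function;
        # f_back is the frame of whoever called us.
        return frame.f_code.co_name, frame.f_back.f_code.co_name
    finally:
        # Drop the reference so the frame does not keep itself alive.
        del frame


def outer():
    return inner()


names = outer()
print(names)  # ('inner', 'outer')
```

Each call gets its own frame, and following f_back walks outwards through the callers, which is essentially what a debugger does.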
It is important to understand frames in Python 3, especially if you maintain production code that is sensitive to memory usage, such as deep learning models.
Frames are objects that represent areas of memory where local variables are stored when a function is called. They can be stored and manipulated, which is what debuggers do. Understanding how frames are handled by Python can help you avoid memory leaks.
Each time a function is called, a new frame is created to hold the function's variables and parameters. These frames are normally destroyed when the function finishes executing. However, if an exception is raised, the associated frame and all parent frames are referenced by a traceback object, which is stored on the exception object as its __traceback__ attribute. This can cause memory leaks if the exception object lives for a long time, because it holds onto the frames, so none of their local variables can be garbage collected.
This matters quite a lot.
For example, it is one reason why you need to call the close method on file objects even if you don't create reference cycles: the file object may be referenced by a frame held in a traceback object stored on an exception. This prevents the file from being garbage collected even after it goes out of scope.
The issue is worse in interactive Python shells (like Jupyter), where each unhandled exception ends up living forever in a few places. I'm working on a way to clear that up, which is how I found this issue.
In Python, the types module provides the FrameType and TracebackType types, which represent frames and tracebacks, respectively. However, you cannot instantiate these types directly. https://docs.python.org/3/library/types.html#types.FrameType
The __traceback__ attribute was introduced in Python 3 with PEP 3134, which discusses the ramifications of this change in some detail, so it is a good read for the curious.
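The effect described above can be observed with a small experiment. This is only a sketch, assuming CPython's reference counting; Payload, fail and refs are hypothetical names used for illustration. A weak reference lets us check whether a local variable of the failed function is still alive.

```python
import gc
import traceback
import weakref


class Payload:
    """Stand-in for a large object held in a local variable."""


refs = []


def fail():
    big = Payload()
    refs.append(weakref.ref(big))  # lets us observe big's lifetime from outside
    raise ValueError("boom")


saved = None
try:
    fail()
except ValueError as exc:
    saved = exc  # keep the exception alive after the except block ends

gc.collect()
# saved.__traceback__ still references fail()'s frame, which references big.
alive_while_saved = refs[0]() is not None

# Clear the locals of every finished frame in the traceback.
traceback.clear_frames(saved.__traceback__)
alive_after_clear = refs[0]() is not None

print(alive_while_saved, alive_after_clear)
```

Dropping the exception itself (for example with del saved) would release the frames as well; traceback.clear_frames is useful when the exception has to be kept around for later reporting.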
Summary:
Python process is not managing memory as expected, resulting in the process getting killed.
Details:
I'm making an app in python that manages huge image data (hundreds of 32bit 3000x3000 px images). I'm trying to manage the data in the most storage-efficient and memory-efficient way by following the OOP principles, saving the data in optimized formats, loading the data in minimal batches and keeping almost all variables out of the "main" scope.
However, I'm facing a problem that I'm unable to diagnose. After running a method, the memory usage skyrockets from 40% to 80%. This method opens multiple stacks of images in napari, so it is expected to use that much memory (nevertheless, I should optimize it).
The issue arises when exiting this method, as the memory is not freed. This means that running the method twice or performing any other intense work afterwards fills up the memory and makes the program crash. The method runs out of the "main" scope. I've printed the local and global variables from the "main" scope before and after running this method:
Before the issue:
After the issue:
I already tried:
Running gc.collect from the main scope and making sure from the debugger that no napari-related object exists after the execution of the method.
Maybe there is some variable not shown by locals().items() or globals().items(), or maybe I simply don't understand how Python allocates memory at all. This is my first time dealing with memory management and garbage collection in Python, so any information will be highly appreciated.
Edit:
I've been playing with objgraph to track the memory leak, and I found that the garbage collector is not removing napari-related objects upon closing napari. This means that I should move this question to napari's Issues page on GitHub. However, it would be highly appreciated if someone knew of a way of cleaning up all module-related objects, so I could just dump all the leftover napari trash. The alternative for the moment is closing and rerunning the script, but this is far from desirable.
Is there a way to access the memory heap in Python? I'm interested in being able to access all of the objects allocated in memory of the running instance.
You can't get direct access, but the gc module should do most of what you want. A simple gc.get_objects() call will return all the objects tracked by the collector. This isn't everything, since the CPython garbage collector is only concerned with potential reference cycles, so instances of built-in types that can't refer to other objects (e.g. int, float, str) won't appear in the resulting list. They will all be referenced by something in that list, though (if they weren't, their reference count would be zero and they'd have been disposed of).
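As a rough illustration of the gc.get_objects() approach, the sketch below counts tracked objects by type and checks which objects the collector tracks at all. The Node class is a hypothetical example type, not part of any library.

```python
import gc
from collections import Counter


class Node:
    """Hypothetical container type; its instances can take part in cycles."""

    def __init__(self):
        self.children = []


nodes = [Node() for _ in range(3)]

tracked = gc.get_objects()

# Histogram of tracked objects by type name.
counts = Counter(type(obj).__name__ for obj in tracked)
print(counts.most_common(5))

# Find the instances of one specific class among the tracked objects.
node_count = sum(1 for obj in tracked if type(obj) is Node)
print(node_count)

# Atomic objects such as int and str are not tracked by the collector.
print(gc.is_tracked(nodes[0]), gc.is_tracked(42), gc.is_tracked("text"))
```

The exact histogram depends on the interpreter version and on which modules have been imported, so it is best treated as a starting point for narrowing down a leak.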
Aside from that, you might get some more targeted use out of the inspect module, especially stack frame inspection, using the traceback module for "easy formatting" or manually digging into the semi-documented frame objects themselves, either of which would allow you to narrow the scope down to a particular active scope on the stack frame.
For the closest to the heap solution, you could use the tracemalloc module to trace and record allocations as they happen, or the pdb debugger to do live introspection from the outside (possibly adding breakpoint() calls to your code to make it stop automatically when you reach that point to let you look around).
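For the tracemalloc route mentioned above, a minimal sketch could look like this. The data list is just a placeholder allocation of about 1 MB; real code would put the suspect workload between the two snapshots.

```python
import tracemalloc

tracemalloc.start()
before = tracemalloc.take_snapshot()

# Placeholder workload: allocate roughly 1 MB in small chunks.
data = [bytes(1000) for _ in range(1000)]

after = tracemalloc.take_snapshot()

# Group the difference between the snapshots by source line,
# with the largest changes first.
stats = after.compare_to(before, "lineno")
for stat in stats[:3]:
    print(stat)

# Bytes currently traced and the peak since tracing started.
current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()
```

Only allocations made after tracemalloc.start() are traced, so tracing should be started as early as possible in the program.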
At the start of my code, I load in a huge (33GB) pickled object. This object is essentially a huge graph with many interconnected nodes.
Periodically, I run gc.collect(). When I have the huge object loaded in, this takes 100 seconds. When I change my code to not load in the huge object, gc.collect() takes .5 seconds. I assume that this is caused by python checking through every subobject of this object for reference cycles every time I call gc.collect().
I know that neither the huge object, nor any of the objects it references when it is loaded in at the start, will ever need to be garbage collected. How do I tell python this, so that I can avoid the 100s gc time?
In Python 3.7 and later you might be able to hack something together using gc.freeze (https://docs.python.org/3/library/gc.html#gc.freeze):
    import gc

    allocate_a_lot()
    gc.freeze()  # move all currently tracked objects to a permanent generation; later collections ignore them
    allocate_some_more()
    gc.collect()  # collect only among the non-frozen objects
    gc.unfreeze()  # return to sanity
This said, I think that python does not offer the tools for what you want. In general all garbage collected languages do not want you to do manual memory management.
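A self-contained way to try the gc.freeze() idea is sketched below. The build_graph helper and the graph size are assumptions made for illustration, standing in for the large unpickled object; the measured times depend on the machine and the Python version, so no particular result is promised.

```python
import gc
import time


def build_graph(n):
    """Build n dict nodes linked in a ring (hypothetical stand-in for the big graph)."""
    nodes = [{"id": i, "edges": []} for i in range(n)]
    for i, node in enumerate(nodes):
        node["edges"].append(nodes[(i + 1) % n])
    return nodes


graph = build_graph(200_000)

start = time.perf_counter()
gc.collect()  # has to traverse the whole graph
unfrozen_seconds = time.perf_counter() - start

gc.freeze()  # everything tracked so far moves to the permanent generation
frozen_count = gc.get_freeze_count()

start = time.perf_counter()
gc.collect()  # frozen objects are skipped
frozen_seconds = time.perf_counter() - start

gc.unfreeze()  # make the objects collectable again

print(f"frozen objects: {frozen_count}")
print(f"collect before freeze: {unfrozen_seconds:.4f}s, after freeze: {frozen_seconds:.4f}s")
```

Note that gc.freeze() applies to every object tracked at the time of the call, not only to the big graph, so it is best called right after loading and before the rest of the program allocates much.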
The documentation says:
Note Keeping references to frame objects, as found in the first element of the frame records these functions return, can cause your program to create reference cycles. Once a reference cycle has been created, the lifespan of all objects which can be accessed from the objects which form the cycle can become much longer even if Python’s optional cycle detector is enabled. If such cycles must be created, it is important to ensure they are explicitly broken to avoid the delayed destruction of objects and increased memory consumption which occurs.
Though the cycle detector will catch these, destruction of the frames (and local variables) can be made deterministic by removing the cycle in a finally clause. This is also important if the cycle detector was disabled when Python was compiled or using gc.disable(). For example:
    def handle_stackframe_without_leak():
        frame = inspect.currentframe()
        try:
            # do something with the frame
        finally:
            del frame
If you want to keep the frame around (for example to print a traceback later), you can also break reference cycles by using the frame.clear() method.
Which supposedly means that there are two things that have references to each other. What are they exactly?
Can you explain more precisely under which conditions a reference cycle is created? When I do inspect.currentframe() without del frame? Does the same go for inspect.stack()? Any other methods/circumstances?
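One way to see the cycle in question is the sketch below, which assumes CPython. The frame stores its local variables, and the local variable named frame refers back to that same frame, so the two keep each other alive after the function returns. Marker, leaky and clean are hypothetical names used for illustration.

```python
import gc
import inspect
import weakref


class Marker:
    """Local object whose lifetime we observe through a weak reference."""


def leaky(out):
    marker = Marker()
    out.append(weakref.ref(marker))
    # The local variable 'frame' refers to the frame that stores it: a cycle.
    frame = inspect.currentframe()


def clean(out):
    marker = Marker()
    out.append(weakref.ref(marker))
    frame = inspect.currentframe()
    try:
        pass  # do something with the frame
    finally:
        del frame  # break the cycle before returning


was_enabled = gc.isenabled()
gc.disable()  # make sure only reference counting is at work
refs = []
leaky(refs)
clean(refs)
leaky_alive = refs[0]() is not None  # kept alive by the frame cycle
clean_alive = refs[1]() is not None  # freed as soon as clean() returned
gc.collect()  # the cycle detector finds and frees the leaked frame
leaky_alive_after_gc = refs[0]() is not None
if was_enabled:
    gc.enable()

print(leaky_alive, clean_alive, leaky_alive_after_gc)
```

The same reasoning applies to inspect.stack(): the records it returns contain frame objects, including the caller's own frame, so storing the result in a local variable creates the same kind of cycle.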
I'm having trouble understanding the differences between stack frames and execution frames, mostly with respect to the traceback and inspect modules (in Python 3).
I thought they were the same, but the docs imply they are not, as methods of the inspect module return frame objects whereas methods of the traceback module do not (i.e. inspect.stack() vs traceback.print_stack()).
From googling, I understand that a stack frame is a data structure containing subroutine state information (function call and argument data). However, as per the docs, an execution frame is something similar:
An execution frame contains some administrative information (used for debugging), determines where and how execution continues after the code block's execution has completed, and (perhaps most importantly) defines two namespaces, the local and the global namespace, that affect execution of the code block.
So what exactly is the difference between a stack frame and an execution frame?
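A quick way to compare what the two modules return is sketched below; where_am_i is a made-up function name. It suggests that both modules describe the same call stack: inspect.stack() hands out the live frame objects, while traceback.extract_stack() returns FrameSummary records that only carry extracted information such as the file name, line number and function name.

```python
import inspect
import traceback
import types


def where_am_i():
    frame_infos = inspect.stack()            # innermost call first
    summaries = traceback.extract_stack()    # innermost call last
    result = (
        isinstance(frame_infos[0].frame, types.FrameType),  # a real frame object
        frame_infos[0].function,
        summaries[-1].name,
        hasattr(summaries[-1], "frame"),     # a summary holds no frame object
    )
    del frame_infos  # avoid keeping a reference cycle through this frame
    return result


result = where_am_i()
print(result)  # (True, 'where_am_i', 'where_am_i', False)
```

Because the traceback functions keep no frame objects, their results are safe to store, whereas the records from inspect.stack() keep every frame on the stack alive for as long as they are referenced.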