I'm having trouble understanding the differences between stack frames and execution frames, mostly with respect to the traceback and inspect modules (in Python 3).
I thought they were the same, but the docs imply they are not: functions in the inspect module return frame objects, whereas functions in the traceback module do not (e.g. inspect.stack() vs. traceback.print_stack()).
From googling, I understand that a stack frame is a data structure containing subroutine state information (function call and argument data). However, as per the docs, an execution frame is something similar:
An execution frame contains some administrative information (used for debugging), determines where and how execution continues after the code block's execution has completed, and (perhaps most importantly) defines two namespaces, the local and the global namespace, that affect execution of the code block.
So what exactly is the difference between a stack frame and an execution frame?
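As far as I can tell, in CPython the two terms describe the same frame object, and the practical difference between the modules is only in what they hand back: inspect.stack() returns records that carry the live frame, while traceback.extract_stack() returns plain summaries (file, line, function name) with no frame attached. A minimal sketch of that comparison; the function name compare_modules is made up for illustration:

```python
import inspect
import traceback
import types


def compare_modules():
    """Compare what inspect and traceback return for the same call stack."""
    frame_infos = inspect.stack()          # innermost call first
    summaries = traceback.extract_stack()  # outermost call first
    try:
        return {
            # inspect records expose the live frame object
            "inspect_has_frame": isinstance(frame_infos[0].frame, types.FrameType),
            # traceback entries are FrameSummary snapshots, not frames
            "traceback_is_summary": isinstance(summaries[-1], traceback.FrameSummary),
            # both describe the same function for the innermost call
            "same_function": frame_infos[0].function == summaries[-1].name,
        }
    finally:
        # drop the frame records so no reference cycle lingers
        del frame_infos


result = compare_modules()
```

So the traceback functions look like a read-only, already-formatted view of the same stack that inspect exposes as objects.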
Related
In order to debug how a complicated library executes a function call, e.g. pd.Series(np.NaN)|pd.Series("True"), I would like to generate a list of all states of the stack during the execution of that function. So for every line of Python code executed (even inside functions and functions called by functions), there should be one entry in the list of stack traces.
This is a bit like a profile generated by a sampling profiler. But in contrast, I don't want a stack trace for every millisecond, but for every line of Python code executed.
I know I can manually call this function using pdb, and repeatedly enter step to step into any function calls and where to print stack traces of the current state - but I want to do this automatically.
How can I automate this stack trace collection? Is there a package for this? If not, how do I automate that with, for example, pdb or another tool that does the job? Is there an accepted word for such a list of stack traces?
That stack list could be used for at least two purposes: a) quickly finding all code that is reached, reducing the scope for finding relevant lines, b) creating a "flamegraph" of execution.
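One way this could be automated without pdb is sys.settrace, which calls a hook for every "line" event in frames entered after it is installed. The following is only a sketch under that assumption: record_stacks, add and outer are hypothetical names, each snapshot is reduced to (function name, line number) pairs, and tracing every line will slow the traced code down considerably.

```python
import sys


def record_stacks(func, *args, **kwargs):
    """Run func and record one stack snapshot per executed Python line."""
    snapshots = []

    def tracer(frame, event, arg):
        if event == "line":
            stack = []
            current = frame
            # Walk from the executing frame outwards via f_back.
            while current is not None:
                stack.append((current.f_code.co_name, current.f_lineno))
                current = current.f_back
            snapshots.append(stack[::-1])  # outermost call first
        return tracer  # keep tracing inside this frame and nested calls

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        result = func(*args, **kwargs)
    finally:
        sys.settrace(previous)  # always restore the earlier hook
    return result, snapshots


# Hypothetical functions standing in for the library call being debugged.
def add(a, b):
    total = a + b
    return total


def outer():
    return add(1, 2)


result, snapshots = record_stacks(outer)
```

Each entry of snapshots is then one stack state. Grouping identical stacks and counting them would give the input a flamegraph tool typically expects.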
Somewhat related questions:
What cool hacks can be done using sys.settrace?
sandboxing/running python code line by line
Without using a debugger you can only get the stack trace of the current point of execution, i.e. the position of your code.
Therefore you have three options:
If you can change the called function's code, you can inject a tiny function that produces the stack trace. Look at the traceback module to see how to do this.
Some libraries/frameworks let you tune their logging options. Look up the docs (or the source code) of the called functions to see whether they do logging; if so, you may be able to enable it using the logging module.
Use a debugger.
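For the first option, a minimal sketch of such an injected helper, using traceback.format_stack. The names caller_stack and library_function are hypothetical; library_function stands in for whatever function you are allowed to edit.

```python
import traceback


def caller_stack():
    """Return the formatted stack of whoever called this helper."""
    # format_stack() includes the frame of this helper as its last entry,
    # so drop it to end the trace at the caller.
    return traceback.format_stack()[:-1]


# Hypothetical stand-in for the function whose call path you want to see.
def library_function():
    stack_lines = caller_stack()
    return stack_lines


lines = library_function()
```

Printing "".join(lines) (or sending it to a logger) shows how execution reached library_function.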
Here's an esoteric pure-Python question.
I'm doing some statistical profiling using sys._current_frames(). i.e. I've got a background thread that runs sys._current_frames() once every second, dumps the results in a text file, and then later I've got some Python code that sorts the tracebacks from most common to least.
One curious phenomenon I've seen is tracebacks like these:
  File "/opt/foo/bar.py", line 1437, in __iter__
    yield key
This yield is in a generator that I wrote. The curious thing is that there's just one frame in this traceback. How could this be? The other tracebacks have lots of frames, either from the top level of the process or the top level of the frame. What is the meaning of this single-frame stack trace?
One theory that I had is that this is a generator's frozen state, after it's yielded a value and it's waiting to have next called on it again. But I think I disproved this theory with a separate experiment: I made a generator, ensured it was paused, called sys._current_frames() and I didn't see that kind of stacktrace.
As the sys._current_frames() documentation warns,
This is most useful for debugging deadlock: this function does not require the deadlocked threads’ cooperation, and such threads’ call stacks are frozen for as long as they remain deadlocked. The frame returned for a non-deadlocked thread may bear no relationship to that thread’s current activity by the time calling code examines the frame.
sys._current_frames() is naturally prone to race conditions in any situation where you cannot guarantee the threads of interest are paused.
As you suspected, you're seeing a stack trace for a suspended generator. When a generator suspends, its stack frame has no parent frame: its f_back is None.
sys._current_frames() retrieves stack frames for currently running threads, but by the time you look at those frames, they may not be running any more. If a generator suspends between the time you call sys._current_frames() and the time you inspect the frame, this is what it will look like. You might also see it on top of a call stack that looks completely different from when you actually called sys._current_frames(), if it gets resumed somewhere else.
Your test didn't show the generator frame because you suspended the generator before calling sys._current_frames() instead of afterward. The generator's stack frame was not the active frame of any thread at that point.
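The f_back behaviour is easy to check directly, at least on CPython. In this sketch (the names report_parent and driver are made up), the generator reports its parent frame while it is running, and the driver then looks at the same frame after it has suspended:

```python
import sys


def report_parent():
    # While running, the generator's frame is linked to whoever resumed it.
    yield sys._getframe().f_back.f_code.co_name
    yield


def driver():
    gen = report_parent()
    parent_while_running = next(gen)
    # The generator is now suspended; its frame is detached from any stack.
    parent_while_suspended = gen.gi_frame.f_back
    return parent_while_running, parent_while_suspended


observed = driver()
```

A sampler that grabs the frame while the generator runs but reads f_back only after it has yielded would therefore record a one-frame stack.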
The documentation says:
Note Keeping references to frame objects, as found in the first element of the frame records these functions return, can cause your program to create reference cycles. Once a reference cycle has been created, the lifespan of all objects which can be accessed from the objects which form the cycle can become much longer even if Python’s optional cycle detector is enabled. If such cycles must be created, it is important to ensure they are explicitly broken to avoid the delayed destruction of objects and increased memory consumption which occurs.
Though the cycle detector will catch these, destruction of the frames (and local variables) can be made deterministic by removing the cycle in a finally clause. This is also important if the cycle detector was disabled when Python was compiled or using gc.disable(). For example:
    def handle_stackframe_without_leak():
        frame = inspect.currentframe()
        try:
            # do something with the frame
        finally:
            del frame
If you want to keep the frame around (for example to print a traceback later), you can also break reference cycles by using the frame.clear() method.
Which supposedly means that there are two things that have references to each other. What are they exactly?
Can you explain more precisely under which conditions a reference cycle is created? When I do inspect.currentframe() without del frame? Does the same go for inspect.stack()? Any other methods/circumstances?
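As I understand it, the two things are the frame object and its own local variables: a local name such as frame refers to the frame, and the frame in turn holds its locals. The same should apply to inspect.stack(), since the returned records include the current frame. Below is a sketch that makes the effect visible on CPython by disabling the cycle collector; Payload, leaky, clean and demo are hypothetical names.

```python
import gc
import inspect
import weakref


class Payload:
    """Stand-in for a large local object."""


def leaky():
    payload = Payload()
    frame = inspect.currentframe()  # local 'frame' -> frame -> its locals
    return weakref.ref(payload)


def clean():
    payload = Payload()
    frame = inspect.currentframe()
    try:
        return weakref.ref(payload)
    finally:
        del frame  # break the cycle before the function returns


def demo():
    gc.disable()  # rely on reference counting only
    try:
        leaked = leaky()
        alive_before_collect = leaked() is not None
        gc.collect()  # the cycle collector is what finally frees it
        alive_after_collect = leaked() is not None
        cleaned = clean()
        alive_with_del = cleaned() is not None
        return alive_before_collect, alive_after_collect, alive_with_del
    finally:
        gc.enable()


outcome = demo()
```

In leaky, the payload survives the return because the frame keeps itself (and every other local) alive until a collection runs. In clean, it is freed as soon as the function returns.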
With normal functions calls, the program state is mostly described by a simple call stack. It is printed out as a traceback after an uncaught exception, it can be examined with inspect.stack, and it can be displayed in a debugger after a breakpoint.
In the presence of generators, generator-based coroutines, and async def-based coroutines, I don't think the call stack is enough. What's a good way to mentally visualize the program state? How do I inspect it at run time?
There are functions inspect.getgeneratorstate and inspect.getcoroutinestate, but they only provide information about whether the generator/coroutine is created, running, suspended, or closed. In the case the state is RUNNING, I want to be able to examine the actual line number the generator or coroutine is currently executing and the stack frames that correspond to the other functions it might have called. In the case it's SUSPENDED, I want to examine other generators / coroutines it sent data to or yielded to.
Edit: I found a related question on SO which pointed me to this excellent article that explains everything I asked about in this question.
You just have to find all instances of generators and coroutines in all "traditional" frames (either search for them recursively in all objects in all frames, or you might try to use the garbage collector (gc) module to get a reference to all these instances).
Generators and co-routines have, respectively, a gi_frame and a cr_frame attribute.
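A small sketch of how those attributes can be used for plain generators: gi_frame gives the paused frame (and so the current line), and gi_yieldfrom gives the sub-generator a `yield from` is waiting on. The helper name generator_chain and the two toy generators are made up for illustration.

```python
def generator_chain(gen):
    """List (function name, line number) for a suspended generator and
    every sub-generator it is delegating to via 'yield from'."""
    chain = []
    current = gen
    while current is not None and getattr(current, "gi_frame", None) is not None:
        frame = current.gi_frame
        chain.append((frame.f_code.co_name, frame.f_lineno))
        # gi_yieldfrom is the object being iterated by 'yield from', or None
        current = getattr(current, "gi_yieldfrom", None)
    return chain


def inner():
    yield 1


def outer():
    yield from inner()


gen = outer()
next(gen)  # run until inner() yields; both generators are now suspended
chain = generator_chain(gen)
names = [name for name, _ in chain]
```

For async def coroutines the analogous attributes are cr_frame and cr_await, so the same walk should work with those names substituted.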
The Python reference manual says:
A code block is executed in an execution frame. A frame contains some
administrative information (used for debugging) and determines where
and how execution continues after the code block’s execution has
completed.
and
Frame objects represent execution frames. They may occur
in traceback objects
But I don't understand how a frame works. How can I get access to the current frame object? When is a frame object created? Is a frame object created every time the code of a new block starts to execute?
These frames are representations of the stack frames created by function calls. You should not need to access them in normal programming. A new frame is indeed created every time a function is called, and destroyed when it exits or raises an uncaught exception. Since function calls can go many levels deep your program ends up with a bunch of nested stack frames, but it's not good programming practice (unless you are writing a debugger or similar application) to mess about with the frames even though Python does make them available.
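If you do want to look, inspect.currentframe() returns the frame of the function that calls it (it may return None on implementations without frame support), and each frame's f_back points at the frame of its caller. A minimal sketch with made-up function names:

```python
import inspect


def frame_names(limit=3):
    """Return the names of the innermost `limit` functions on the call stack."""
    frame = inspect.currentframe()
    names = []
    try:
        while frame is not None and len(names) < limit:
            names.append(frame.f_code.co_name)
            frame = frame.f_back  # step out to the calling frame
    finally:
        del frame  # avoid keeping the frame alive through a cycle
    return names


def inner():
    return frame_names()


def outer():
    return inner()


names = outer()
```

Each of the three calls got its own frame, which is why the walk sees frame_names, then inner, then outer.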
It is important to understand frames in Python 3, especially if you maintain production code that is sensitive to memory usage, such as deep learning models.
Frames are objects that represent areas of memory where local variables are stored when a function is called. They can be stored and manipulated, which is what debuggers do. Understanding how Python handles frames can help you avoid memory leaks.
Each time a function is called, a new frame is created to hold the function's variables and parameters. These frames are normally destroyed when the function finishes executing normally. However, if an exception is raised, the associated frame and all parent frames are stored in a traceback object, which is an attribute of the Exception object (__traceback__). This can cause memory leaks if the Exception object lives for a long time, because it holds onto the frames, and none of their local variables can be garbage collected.
This matters quite a lot.
For example, it is one reason why you need to call the close method on file objects even if you don't create reference cycles: the file object may be referenced by a traceback object stored on an Exception. This prevents the file from being garbage collected even after it goes out of scope.
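Here is a small sketch of that effect on CPython, using a weak reference to watch a local variable of the failing function. Big, fail and capture are hypothetical names; traceback.clear_frames is the standard-library helper that clears the locals of every frame in a traceback.

```python
import traceback
import weakref


class Big:
    """Stand-in for a large object such as a tensor or an open file."""


def fail(refs):
    big = Big()
    refs.append(weakref.ref(big))
    raise ValueError("boom")


def capture():
    refs = []
    try:
        fail(refs)
    except ValueError as exc:
        saved = exc  # keep the exception alive past the except block
    return saved, refs[0]


saved, big_ref = capture()
# The stored exception still reaches fail()'s frame, and thus its locals.
alive_while_stored = big_ref() is not None

traceback.clear_frames(saved.__traceback__)
# After clearing the frames, nothing references the local any more.
alive_after_clear = big_ref() is not None
```

Setting saved.__traceback__ = None or simply dropping the exception should release the frames as well. Clearing is the option to reach for if you still want to format the traceback later.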
The issue is worse in interactive Python shells (like Jupyter), where each unhandled exception ends up living forever in a few places. I'm working on a way to clear that up, which is how I found this issue.
In Python, the types module provides the FrameType and TracebackType types, which represent frames and tracebacks, respectively. However, you cannot instantiate these types directly. https://docs.python.org/3/library/types.html#types.FrameType
The __traceback__ attribute was introduced in Python 3 with PEP 3134, which discusses the ramifications of this change in some detail, so it is a good read for the curious.