how to prevent Python garbage collection for anonymous objects?

In this Python code
import gc
gc.disable()
<some code ...>
MyClass()
<more code...>
I am hoping that the anonymous object created by MyClass constructor would not be garbage-collected. MyClass actually links to a shared object library of C++ code, and there through raw memory pointers I am able to inspect the contents of the anonymous object.
I can then see that the object is immediately corrupted (garbage collected).
How to prevent Python garbage collection for everything?
I have to keep this call anonymous. I cannot change the part of the code MyClass() - it has to be kept as is.
MyClass() has to be kept as is, because it is an exact translation from C++ (by way of SWIG) and the two should be identical for the benefit of people who translate.
I have to prevent the garbage collection by some "initialization code", that is only called once at the beginning of the program. I cannot touch anything after that.

The "garbage collector" referred to in gc is only used for resolving circular references. In Python (at least in the main C implementation, CPython) the main method of memory management is reference counting. In your code, the result of MyClass() has no references, so will always be disposed immediately. There's no way of preventing that.
What is not clear, even with your edit, is why you can't simply assign it to something? If the target audience is "people who translate", those people can presumably read, so write a comment explaining why you're doing the assignment.
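A minimal sketch of the point above, assuming CPython. MyClass here is a hypothetical pure-Python stand-in for the SWIG-wrapped class; the label argument and the events list exist only for illustration. It suggests that gc.disable() does not stop reference counting from destroying an unreferenced object, while binding the result to a name keeps it alive:

```python
import gc

events = []  # records which objects have been destroyed (illustrative only)


class MyClass:
    """Hypothetical stand-in for the SWIG-wrapped class."""

    def __init__(self, label):
        self.label = label

    def __del__(self):
        events.append(self.label)


gc.disable()  # only switches off the cyclic collector, not reference counting

MyClass("anonymous")          # no reference is kept, so it is destroyed at once
after_anonymous = list(events)

kept = MyClass("kept")        # bound to a name, so the reference count stays at 1
after_kept = list(events)

gc.enable()
print(after_anonymous, after_kept)
```

Only the anonymous instance shows up in events; the one bound to kept survives until the name is rebound or deleted.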

Related

Accessing the memory heap in python

Is there a way to access the memory heap in Python? I'm interested in being able to access all of the objects allocated in memory of the running instance.

You can't get direct access, but the gc module should do most of what you want. A simple gc.get_objects() call will return all the objects tracked by the collector. This isn't everything, since the CPython garbage collector is only concerned with potential reference cycles, so instances of built-in types that can't refer to other objects (e.g. int, float, str) won't appear in the resulting list. They'll all be referenced by something in that list, though (if they weren't, their reference count would be zero and they'd have been disposed of).
Aside from that, you might get some more targeted use out of the inspect module, especially stack frame inspection, using the traceback module for "easy formatting" or manually digging into the semi-documented frame objects themselves, either of which would allow you to narrow the scope down to a particular active scope on the stack frame.
For the closest to the heap solution, you could use the tracemalloc module to trace and record allocations as they happen, or the pdb debugger to do live introspection from the outside (possibly adding breakpoint() calls to your code to make it stop automatically when you reach that point to let you look around).
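As a rough illustration of the gc-based approach, assuming CPython: the Node class below is hypothetical, and gc.get_objects() and gc.is_tracked() are the only library calls used. The sketch checks which kinds of objects the collector tracks:

```python
import gc


class Node:
    """Hypothetical container type whose instances can hold references."""

    def __init__(self):
        self.children = []


node = Node()

# Every object the cyclic collector currently tracks.
tracked = gc.get_objects()
node_is_listed = any(obj is node for obj in tracked)

# Atomic objects cannot take part in a reference cycle, so they are untracked.
int_tracked = gc.is_tracked(42)
str_tracked = gc.is_tracked("text")
list_tracked = gc.is_tracked([])

# Narrow the full list down to one type of interest.
node_count = sum(1 for obj in tracked if type(obj) is Node)
print(node_is_listed, int_tracked, str_tracked, list_tracked, node_count)
```

Filtering the list by type, as in the last step, is usually the practical way to find "all live instances of X".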

Does Python garbage collect variables that are no longer referenced while within the scope of a function?

While waiting for a long running function to finish executing, I began thinking about whether the garbage collector will clean up references to variables which will no longer be used.
Say for example, I have a function like:
def long_running_function():
    x = MemoryIntensiveObject()
    print(id(x))
    # lots of hard work done here which does not reference x
    return
I'm intrigued whether the interpreter is smart enough to realize that x is no longer used and can be dereferenced. It's somewhat hard to test, as I can write code to check its reference count, but that implicitly then references it which obviates the reason for doing it.
My thought is, perhaps when the function is parsed and the bytecode is generated, it may be generated in such a way that will allow it to clean up the object when it can no longer be referenced.
Or, is the answer just simpler that, as long as we're still within a scope where it "could" be used, it won't be cleaned up?

No, CPython will not garbage collect an object as long as a name that references that object is still defined in the current scope.
This is because, even if there are no references to the name x as literals in the code, calls to vars() or locals() could still grab a copy of the locals namespace dictionary (either before or after the last reference to x) and therefore the entire locals namespace effectively "roots" the values it references until execution leaves its scope.
I don't know for certain how other implementations do this. In particular, in a JIT-compiled implementation like PyPy, Jython, or IronPython, it is possible at least in theory for this optimization to be performed. The JVM and CLR JITs actually do perform this optimization in practice on other languages. Whether Python on those platforms would be able to take advantage or not depends entirely on the bytecode that the Python code gets compiled into.
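One way to test this without adding a strong reference is a weak reference. The following is a sketch assuming CPython; MemoryIntensiveObject is a hypothetical empty placeholder class. It suggests the object stays alive while the local name exists and is released as soon as the name is deleted:

```python
import weakref


class MemoryIntensiveObject:
    """Hypothetical placeholder for an expensive object."""


def long_running_function():
    x = MemoryIntensiveObject()
    probe = weakref.ref(x)  # a weak reference does not keep x alive

    # ... lots of hard work that never mentions x ...
    alive_while_named = probe() is not None

    del x  # drop the only strong reference explicitly
    alive_after_del = probe() is not None
    return alive_while_named, alive_after_del


result = long_running_function()
print(result)
```

If memory matters inside a long-running function, an explicit del (or moving the work into a helper function) is what actually releases the object early.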

When to Py_INCREF?

I'm working on a C extension and am at the point where I want to track down memory leaks. From reading Python's documentation it's hard to understand when to increment / decrement the reference count of Python objects. Also, after spending a couple of days trying to embed the Python interpreter (in order to compile the extension as a standalone program), I had to give up this endeavor. So, tools like Valgrind are of no help here.
So far, by trial and error I learned that, for example, Py_DECREF(Py_None) is a bad thing... but is this true of any constant? I don't know.
My major confusions so far can be listed like this:
Do I have to decrement refcount on anything created by PyWhatever_New() if it doesn't outlive the procedure that created it?
Does every Py_INCREF need to be matched by Py_DECREF, or should there be one more of one / the other?
If a call to Python procedure resulted in a PyObject*, do I need to increment it to ensure that I can still use it (forever), or decrement it to ensure that eventually it will be garbage-collected, or neither?
Are Python objects created through C API on the stack allocated on stack or on heap? (It is possible that Py_INCREF reallocates them on heap for example).
Do I need to do anything special to Python objects created in C code before passing them to Python code? What if Python code outlives C code that created Python objects?
Finally, I understand that Python has both reference counting and garbage collector: if that's the case, how critical is it if I mess up the reference count (i.e. not decrement enough), will GC eventually figure out what to do with those objects?

Most of this is covered in Reference Count Details, and the rest is covered in the docs on the specific questions you're asking about. But, to get it all in one place:
Py_DECREF(Py_None) is a bad thing... but is this true of any constant?
The more general rule is that calling Py_DECREF on anything you didn't get a new/stolen reference to, and didn't call Py_INCREF on, is a bad thing. Since you never call Py_INCREF on anything accessible as a constant, this means you never call Py_DECREF on them.
Do I have to decrement refcount on anything created by PyWhatever_New()
Yes. Anything that returns a "new reference" has to be decremented. By convention, anything that ends in _New should return a new reference, but it should be documented anyway (e.g., see PyList_New).
Does every Py_INCREF need to be matched by Py_DECREF, or should there be one more of one / the other?
The number in your own code may not necessarily balance. The total number has to balance, but there are increments and decrements happening inside Python itself. For example, anything that returns a "new reference" has already done an inc, while anything that "steals" a reference will do the dec on it.
Are Python objects created through C API on the stack allocated on stack or on heap? (It is possible that Py_INCREF reallocates them on heap for example).
There's no way to create objects through C API on the stack. The C API only has functions that return pointers to objects.
Most of these objects are allocated on the heap. Some are actually in static memory.
But your code should not care anyway. You never allocate or delete them; they get allocated in the PySpam_New and similar functions, and deallocate themselves when you Py_DECREF them to 0, so it doesn't matter to you where they are.
(The exception is constants that you can access via their global names, like Py_None. Those, you obviously know are in static storage.)
Do I need to do anything special to Python objects created in C code before passing them to Python code?
No.
What if Python code outlives C code that created Python objects?
I'm not sure what you mean by "outlives" here. Your extension module is not going to get unloaded while any objects depend on its code. (In fact, until at least 3.8, your module is probably never going to get unloaded until shutdown.)
If you just mean the function that _New'd up an object returning, that's not an issue. You have to go very far out of your way to allocate any Python objects on the stack. And there's no way to pass things like a C array of objects, or a C string, into Python code without converting them to a Python tuple of objects, or a Python bytes or str. There are a few cases where, e.g., you could stash a pointer to something on the stack in a PyCapsule and pass that—but that's the same as in any C program, and… just don't do it.
Finally, I understand that Python has both reference counting and garbage collector
The garbage collector is just a cycle breaker. If you have objects that are keeping each other alive with a reference cycle, you can rely on the GC. But if you've leaked references to an object, the GC will never clean it up.
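The last point can be imitated from Python itself. This is a CPython-only sketch that uses ctypes.pythonapi to call the exported C functions Py_IncRef and Py_DecRef; the Payload class and the variable names are hypothetical. It mimics an extension that forgets a Py_DECREF and suggests that gc.collect() does not reclaim the leaked object:

```python
import ctypes
import gc
import weakref


class Payload:
    """Hypothetical object standing in for something a C extension created."""


# Function forms of Py_INCREF / Py_DECREF exported by the CPython C API.
incref = ctypes.pythonapi.Py_IncRef
incref.argtypes = [ctypes.py_object]
incref.restype = None
decref = ctypes.pythonapi.Py_DecRef
decref.argtypes = [ctypes.py_object]
decref.restype = None

obj = Payload()
probe = weakref.ref(obj)

incref(obj)   # simulate C code that takes a reference and never releases it
del obj       # the Python-level reference is gone
gc.collect()  # the cycle collector cannot help: there is no cycle, only a leak
leaked_survives_gc = probe() is not None

decref(probe())  # supply the missing decrement
released_after_decref = probe() is None
print(leaked_survives_gc, released_after_decref)
```

Manipulating reference counts this way is only for demonstration; in real code the balance has to be fixed in the C source.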

Is relying on __del__() for cleanup in Python unreliable?

I was reading about different ways to clean up objects in Python, and I have stumbled upon these questions (1, 2) which basically say that cleaning up using __del__() is unreliable and that the following code should be avoided:
def __init__(self):
    rc.open()
def __del__(self):
    rc.close()
The problem is, I'm using exactly this code, and I can't reproduce any of the issues cited in the questions above. As far as I know, I can't go for the alternative with the with statement, since I provide a Python module for a closed-source software (testIDEA, anyone?). This software will create instances of particular classes and dispose of them, and these instances have to be ready to provide services in between. The only alternative to __del__() that I see is to manually call open() and close() as needed, which I assume will be quite bug-prone.
I understand that when I'll close the interpreter, there's no guarantee that my objects will be destroyed correctly (and it doesn't bother me much, heck, even Python authors decided it was OK). Apart from that, am I playing with fire by using __del__() for cleanup?

You observe the typical issue with finalizers in garbage collected languages. Java has it, C# has it, and they all provide a scope based cleanup method like the Python with keyword to deal with it.
The main issue is that the garbage collector is responsible for cleaning up and destroying objects. In C++ an object gets destroyed when it goes out of scope, so you can use RAII and have well-defined semantics. In Python the object goes out of scope and lives on as long as the GC likes. Depending on your Python implementation this may be different. CPython with its refcounting-based GC is rather benign (so you rarely see issues), while PyPy, IronPython and Jython might keep an object alive for a very long time.
For example:
def bad_code(filename):
    return open(filename, 'r').read()
for i in range(10000):
    bad_code('some_file.txt')
bad_code leaks a file handle. In CPython it doesn't matter. The refcount drops to zero and it is deleted right away. In PyPy or IronPython you might get IOErrors or similar issues, as you exhaust all available file descriptors (up to ulimit on Unix or 509 handles on Windows).
Scope-based cleanup with a context manager and with is preferable if you need to guarantee cleanup. You know exactly when your objects will be finalized. But sometimes you cannot enforce this kind of scoped cleanup easily. That's when you might use __del__, atexit or similar constructs to make a best effort at cleaning up. It is not reliable, but it is better than nothing.
You can either burden your users with explicit cleanup or explicit scopes, or you can take the gamble with __del__ and see some oddities now and then (especially at interpreter shutdown).
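For comparison, here is a sketch of the scope-based alternative. read_all is a hypothetical rewrite of bad_code, and ManagedResource is a hypothetical class standing in for anything with open/close semantics; the temporary file exists only to make the example self-contained:

```python
import os
import tempfile


def read_all(filename):
    # The with block closes the file when the block exits, on any
    # implementation, instead of waiting for the object to be reclaimed.
    with open(filename, "r") as handle:
        return handle.read()


class ManagedResource:
    """Hypothetical resource that is opened and closed by a with block."""

    def __init__(self):
        self.is_open = False

    def __enter__(self):
        self.is_open = True   # acquire the resource
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.is_open = False  # release it, even if the block raised
        return False          # do not swallow exceptions


# Create a small scratch file to read repeatedly.
fd, path = tempfile.mkstemp(text=True)
with os.fdopen(fd, "w") as out:
    out.write("hello")

contents = [read_all(path) for _ in range(1000)]
os.remove(path)

with ManagedResource() as resource:
    open_inside = resource.is_open
open_after = resource.is_open
print(contents[0], open_inside, open_after)
```

The same class can still define __del__ as a last-resort fallback for callers that cannot use with.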

There are a few problems with using __del__ to run code.
For one, it only works if you're actively keeping track of references, and even then, there's no guarantee that it will be run immediately unless you're manually kicking off garbage collections throughout your code. I don't know about you, but automatic garbage collection has pretty much spoiled me in terms of accurately keeping track of references. And even if you are super diligent in your code, you're also relying on other users that use your code being just as diligent when it comes to reference counts.
Two, there are lots of instances where __del__ is never going to run. Was there an exception while objects were being initialized and created? Did the interpreter exit? Is there a circular reference somewhere? Yep, lots that could go wrong here and very few ways to cleanly and consistently deal with it.
Three, even if it does run, exceptions raised inside __del__ are not propagated (CPython only prints a warning to stderr), so you can't handle them like you can with other code. It's also nearly impossible to guarantee that the __del__ methods from various objects will run in any particular order. So the most common use case for destructors - cleaning up and deleting a bunch of objects - is kind of pointless and unlikely to go as planned.
If you actually want code to run, there are much better mechanisms -- context managers, signals/slots, events, etc.
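The circular-reference case can be sketched as follows, assuming CPython 3.4 or later, where the cycle collector is able to finalize objects that define __del__. The Node class and the log list are hypothetical. Reference counting alone never runs __del__ for the pair; it only runs once the cycle collector is invoked:

```python
import gc

log = []  # names of objects whose __del__ has run (illustrative only)


class Node:
    """Hypothetical object that records its own finalization."""

    def __init__(self, name):
        self.name = name
        self.partner = None

    def __del__(self):
        log.append(self.name)


gc.disable()  # keep automatic collections from interfering with the demo

a, b = Node("a"), Node("b")
a.partner, b.partner = b, a  # the two objects now keep each other alive
del a, b                     # reference counts never reach zero

ran_before_collect = list(log)

gc.collect()                 # the cycle collector finds and finalizes the pair
ran_after_collect = sorted(log)

gc.enable()
print(ran_before_collect, ran_after_collect)
```

The timing of that collection is normally outside your control, which is why cleanup that must happen at a known point is better handled explicitly.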

If you're using CPython, then __del__ fires perfectly reliably and predictably as soon as an object's reference count hits zero. The docs at https://docs.python.org/3/c-api/intro.html state:
When an object’s reference count becomes zero, the object is deallocated. If it contains references to other objects, their reference count is decremented. Those other objects may be deallocated in turn, if this decrement makes their reference count become zero, and so on.
You can easily test and see this immediate cleanup happening yourself:
>>> class Foo:
...     def __del__(self):
...         print('Bye bye!')
...
>>> x = Foo()
>>> x = None
Bye bye!
>>> for i in range(5):
...     print(Foo())
...
<__main__.Foo object at 0x7f037e6a0550>
Bye bye!
<__main__.Foo object at 0x7f037e6a0550>
Bye bye!
<__main__.Foo object at 0x7f037e6a0550>
Bye bye!
<__main__.Foo object at 0x7f037e6a0550>
Bye bye!
<__main__.Foo object at 0x7f037e6a0550>
Bye bye!
>>>
(Though if you want to test stuff involving __del__ at the REPL, be aware that the last evaluated expression's result gets stored as _, which counts as a reference.)
In other words, if your code is strictly going to be run in CPython, relying on __del__ is safe.

Should I delete large objects when finished using them in Python?

Assume there is no particular memory-optimization problem in the script, so my question is about Python coding style. In other words: is it good and common Python practice to dereference an object as soon as possible? The scenario is as follows.
Class A instantiates an object as self.foo and asks a second class B to store and share it with other objects.
At a certain point A decides that self.foo should not be shared anymore and removes it from B.
Class A still has a reference to foo, but we know this object to be useless from now on.
As foo is a relatively big object, would you bother to delete the reference from A, and how (e.g. del vs setting self.foo = None)? How does this decision influence the garbage collector?

If, after deleting the attribute, the concept of accessing the attribute and seeing if it's set or not doesn't even make sense, use del. If, after deleting the attribute, something in your program may want to check that space and see if anything's there, use = None.
The garbage collector won't care either way.
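A small sketch of the difference, assuming CPython; the Big and A classes are hypothetical. Both forms release the object, but only del removes the attribute itself:

```python
import weakref


class Big:
    """Hypothetical large object."""


class A:
    def __init__(self):
        self.foo = Big()


# Variant 1: delete the attribute entirely.
first = A()
probe_del = weakref.ref(first.foo)
del first.foo
has_attr_after_del = hasattr(first, "foo")
freed_after_del = probe_del() is None

# Variant 2: keep the attribute, but point it at None.
second = A()
probe_none = weakref.ref(second.foo)
second.foo = None
has_attr_after_none = hasattr(second, "foo")
freed_after_none = probe_none() is None

print(has_attr_after_del, freed_after_del, has_attr_after_none, freed_after_none)
```

After del, later reads of first.foo raise AttributeError, whereas the None variant lets other code test `second.foo is None`.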

del Blah
will reduce the reference count of the object Blah refers to by one. Once there are no more references, Python will garbage collect it.
self.foo = None
will also reduce the reference count of the object by one. Once there are no more references, Python will garbage collect it.
Neither method actually forces the object to be destroyed; each removes only one reference to it.
As a general rule of thumb I would avoid using del, as it destroys the name and can cause other errors in your code if you try to reference it after that.
In CPython (the "normal" Python) the object is reclaimed as soon as its reference count reaches zero.

So far, in my experience with Python, I haven't had any problems with garbage collection. However, I do take precautions, not only because I don't want to bother with any unreferenced objects, but also for organization reasons as well.
To answer your questions specifically:
1) Yes, I would recommend deleting the object. This will keep your code from getting bulky and/or slow. This is an especially good decision if you have a long run-time for your code, even though Python is pretty good about garbage collection.
2) Either way works fine, although I would use del just for the sake of removing the actual reference itself.
3) I don't know how it "influences the garbage collector" but it's always better to be safe than sorry.
