Correct cyclic garbage collection in extension modules (Python)

Two sections of Python 2.7's documentation mention adding cyclic garbage collection (CGC) support for container objects defined in extension modules.
The Python/C API Reference Manual gives two rules:
The memory for the object must be allocated using PyObject_GC_New() or PyObject_GC_NewVar().
Once all the fields which may contain references to other containers are initialized, it must call PyObject_GC_Track().
In Extending and Embedding the Python Interpreter, by contrast, the Noddy example suggests that adding the Py_TPFLAGS_HAVE_GC flag and filling the tp_traverse and tp_clear slots is sufficient to enable CGC support; the two rules above are not followed at all.
When I modified the Noddy example to actually follow the rules, using PyObject_GC_New()/PyObject_GC_Del() and PyObject_GC_Track()/PyObject_GC_UnTrack(), it surprisingly raised an assertion error:
Modules/gcmodule.c:348: visit_decref: Assertion "gc->gc.gc_refs != 0" failed. refcount was too small
This leads to my confusion about the correct and safe way to implement CGC. Could anyone give advice or, preferably, a neat example of a container object with CGC support?

Under most normal circumstances you shouldn't need to do the tracking/untracking yourself. This is described in the documentation, although it isn't made especially clear. In the case of the Noddy example you definitely don't.
The short version is that a TypeObject contains two function pointers: tp_alloc and tp_free. By default tp_alloc calls all the right functions on creation of an instance (if Py_TPFLAGS_HAVE_GC is set) and tp_free untracks the instance on destruction.
The Noddy documentation says (at the end of the section):
That’s pretty much it. If we had written custom tp_alloc or tp_free slots, we’d need to modify them for cyclic-garbage collection. Most extensions will use the versions automatically provided.
Unfortunately, the one place that doesn't make it clear that you don't need to do this yourself is the Supporting Cyclic Garbage Collection documentation.
Detail:
Noddy objects are allocated using a function called Noddy_new placed in the tp_new slot of the TypeObject. According to the documentation, the main thing a "new" function should do is call the tp_alloc slot. You typically don't write tp_alloc yourself; it just defaults to PyType_GenericAlloc().
Looking at PyType_GenericAlloc() in the Python source shows a number of sections where its behaviour changes based on PyType_IS_GC(type). First it calls _PyObject_GC_Malloc instead of PyObject_Malloc, and second it calls _PyObject_GC_TRACK(obj). [Note that all PyObject_New really does is call PyObject_Malloc and then PyObject_Init.]
Similarly, on deallocation you call the tp_free slot, which is automatically set to PyObject_GC_Del for classes with Py_TPFLAGS_HAVE_GC. PyObject_GC_Del includes code that does the same as PyObject_GC_UnTrack, so a separate call to untrack is unnecessary.
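To make this concrete, here is a minimal sketch of a GC-supporting container type that leans on the defaults, as described above. It is untested, written against the Python 2.7 C API to match the Noddy example, and the Pair/pair names are made up; members and methods are elided (Noddy shows how to add them):

    #include <Python.h>

    typedef struct {
        PyObject_HEAD
        PyObject *first;   /* fields that may participate in reference cycles */
        PyObject *second;
    } Pair;

    static int
    Pair_traverse(Pair *self, visitproc visit, void *arg)
    {
        Py_VISIT(self->first);    /* report every PyObject* we hold */
        Py_VISIT(self->second);
        return 0;
    }

    static int
    Pair_clear(Pair *self)
    {
        Py_CLEAR(self->first);    /* drop references so the GC can break cycles */
        Py_CLEAR(self->second);
        return 0;
    }

    static void
    Pair_dealloc(Pair *self)
    {
        /* Untrack before tearing down fields. As noted above, PyObject_GC_Del
           (the default tp_free) would untrack too, so this is belt-and-braces. */
        PyObject_GC_UnTrack(self);
        Pair_clear(self);
        Py_TYPE(self)->tp_free((PyObject *)self);
    }

    static PyTypeObject PairType = {
        PyVarObject_HEAD_INIT(NULL, 0)
        /* remaining slots are filled in below, before PyType_Ready() */
    };

    PyMODINIT_FUNC
    initpair(void)
    {
        PairType.tp_name = "pair.Pair";
        PairType.tp_basicsize = sizeof(Pair);
        PairType.tp_dealloc = (destructor)Pair_dealloc;
        PairType.tp_flags = Py_TPFLAGS_DEFAULT | Py_TPFLAGS_HAVE_GC;
        PairType.tp_traverse = (traverseproc)Pair_traverse;
        PairType.tp_clear = (inquiry)Pair_clear;
        PairType.tp_new = PyType_GenericNew;  /* default alloc: GC malloc + track */

        if (PyType_Ready(&PairType) < 0)
            return;
        PyObject *m = Py_InitModule("pair", NULL);
        if (m == NULL)
            return;
        Py_INCREF(&PairType);
        PyModule_AddObject(m, "Pair", (PyObject *)&PairType);
    }

Note that there is not a single call to PyObject_GC_New() or PyObject_GC_Track(): PyType_GenericAlloc() does the GC allocation and tracking, and PyObject_GC_Del does the untracking, exactly as described above.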

I am not experienced enough in the C API myself to give you any advice, but there are plenty of examples in the Python container implementations themselves.
Personally, I'd start with the tuple implementation, since it's immutable: Objects/tupleobject.c. Then move on to the dict, list, and set implementations for further notes on mutable containers:
Objects/dictobject.c
Objects/listobject.c
Objects/setobject.c
I can't help but notice that there are calls to PyObject_GC_New(), PyObject_GC_NewVar() and PyObject_GC_Track() throughout, as well as Py_TPFLAGS_HAVE_GC being set.
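For instance, tuple's tp_traverse in Objects/tupleobject.c is only a few lines; quoting roughly from memory, so treat it as approximate:

    static int
    tupletraverse(PyTupleObject *o, visitproc visit, void *arg)
    {
        Py_ssize_t i;

        for (i = Py_SIZE(o); --i >= 0; )
            Py_VISIT(o->ob_item[i]);
        return 0;
    }

The pattern is always the same: visit every PyObject* the container holds and let the collector do the rest.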


Magic functions to C functions in CPython

I am looking into the CPython implementation and have been learning how Python handles operator overloading (for example, comparison operators) using fields like richcmpfunc tp_richcompare; in the _typeobject struct, where the type is defined as typedef PyObject *(*richcmpfunc) (PyObject *, PyObject *, int);. Whenever a PyObject is operated on by one of these operators, CPython calls its tp_richcompare function.
My first question: in Python we use magic methods like __gt__ to override these operators. So how does the Python code end up behind tp_richcompare, which is used everywhere a comparison operator is interpreted for a PyObject?
My second question is a more general version of the first: how does code in one language (here Python) that overrides things (operators, hash, etc.) interpreted in another language (C, in the case of CPython) get called from that other language? As far as I know, when bytecode is generated it's a low-level, instruction-based representation (essentially an array of uint8_t).
Another example is __hash__, which is defined in Python but is needed by the C implementation of the dictionary during lookdict. C code uses the function type typedef Py_hash_t (*hashfunc)(PyObject *); everywhere a hash of a PyObject is needed, but the translation of __hash__ into that C function is mysterious.
Python code is not transformed into C code. It is interpreted by C code (in CPython), but that's a completely different concept.
There are many ways to interpret a Python program, and the language reference does not specify any particular mechanism. CPython does it by transforming each Python function into a list of virtual machine instructions, which can then be interpreted with a virtual machine emulator. That's one approach. Another would be to just build the AST and then define a (recursive) evaluate method on each AST node.
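To illustrate that second approach, here is a toy C++ sketch (nothing to do with CPython's actual code) of an AST whose nodes each carry a recursive evaluate method:

    #include <cstdio>
    #include <memory>

    // Base class for AST nodes: each node knows how to evaluate itself.
    struct Node {
        virtual ~Node() = default;
        virtual int evaluate() const = 0;
    };

    struct Literal : Node {
        int value;
        explicit Literal(int v) : value(v) {}
        int evaluate() const override { return value; }
    };

    struct Add : Node {
        std::unique_ptr<Node> lhs, rhs;
        Add(std::unique_ptr<Node> l, std::unique_ptr<Node> r)
            : lhs(std::move(l)), rhs(std::move(r)) {}
        int evaluate() const override {
            return lhs->evaluate() + rhs->evaluate();  // recursive interpretation
        }
    };

    int main() {
        // The AST for the expression (1 + 2) + 3.
        auto ast = std::make_unique<Add>(
            std::make_unique<Add>(std::make_unique<Literal>(1),
                                  std::make_unique<Literal>(2)),
            std::make_unique<Literal>(3));
        std::printf("%d\n", ast->evaluate());  // prints 6
        return 0;
    }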
Of course, it would also be possible to transform the program into C code and compile the C code for future execution. (Here, "C" is not important; it could be any compiled language which seems convenient.) However, there's not much benefit to doing that, and lots of disadvantages. One problem, which I guess is the one behind your question, is that Python types don't correspond to any C primitive type. The only way to represent a Python object in C is to use a structure, such as CPython's PyObject, which is effectively a low-level mechanism for defining classes (a concept foreign to C): each object includes a pointer to a type object, which contains a virtual method table, which in turn contains pointers to the functions used to implement the various operations on objects of that type. In effect, the compiled code would end up calling the same functions as the interpreter would call to implement each operation; the only purpose of the compiled C code is to sequence the calls without having to walk through an interpretable structure (VM list or AST or whatever). That might be slightly faster, since it avoids a switch statement on each AST node or VM operation, but it's also a lot bulkier, because a function call occupies a lot more space in memory than a single opcode byte.
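As a toy illustration of that mechanism (again, not CPython's actual layout), every value carries a pointer to a type object, and the type object holds the function pointers used to implement operations on values of that type:

    #include <cstdio>

    struct Object;

    // The "type object": a table of function pointers, i.e. a virtual method table.
    struct Type {
        const char *name;
        void (*print)(const Object *self);
    };

    // Every object starts with a pointer to its type, like PyObject's ob_type.
    struct Object {
        const Type *type;
        int payload;
    };

    static void print_int(const Object *self) {
        std::printf("%d\n", self->payload);
    }

    static const Type IntType = { "int", print_int };

    int main() {
        Object x = { &IntType, 42 };
        x.type->print(&x);   // dynamic dispatch through the type's method table
        return 0;
    }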
An intermediate possibility, in common use these days, is to dynamically compile descriptions of programs (ASTs or VM lists or whatever) into actual machine code at runtime, taking into account what can be discovered about the actual dynamic types and values of the referenced variables and functions. That's called "just-in-time (JIT) compilation", and it can produce huge speedups at runtime, if it's implemented well. On the other hand, it's very hard to get it right, and discussing how to do it is well beyond the scope of a SO answer.
As a postscript, I understand from a different question that you are reading Robert Nystrom's book, Crafting Interpreters. That's probably a good way of learning these concepts, although I'm personally partial to a much older but still very current textbook, also freely available on the internet, The Structure and Interpretation of Computer Programs, by Gerald Sussman, Hal Abelson, and Julie Sussman. The books are not really comparable, but both attempt to explain what it means to "interpret a program", and that's an extremely important concept, which probably cannot be communicated in four paragraphs (the size of this answer).
Whichever textbook you use, it's important not to just read the words. You must do the exercises, which is the only way to actually understand the underlying concepts. That's a lot more time-consuming, but it's also a lot more rewarding. One of the weaknesses of Nystrom's book (although I would still recommend it) is that it lays out a complete implementation for you. That's great if you understand the concepts and are looking for something which you can tweak into a rapid prototype, but it leaves open the temptation of skipping over the didactic material, which is the most important part for someone interested in learning how computer languages work.

Difference between a destructor in Python vs C++

How do the contracts of a C++ destructor and a Python destructor differ, especially relating to object lifecycle and when resources are reclaimed? I haven't found a comprehensive side-by-side comparison.
What I think a C++ destructor does is entirely free the memory held by the object, whereas Python deregisters the object but keeps it around in memory (calling this garbage collection), and only frees the memory entirely after the program is complete.
The lifetime of a C++ object begins with its construction and ends with its destruction. C++ assumes a system with a limited amount of resources: Resource Acquisition Is Initialization (RAII). That just means every acquired resource is bound to an object and must be freed before the object's lifetime ends; the destructor is supposed to pay the debt of the constructor. Otherwise a resource leak happens. Thus C++ defines 4 storage classes:
External objects transcend all others: they are created at the beginning of the program (before main) and destroyed upon exit.
Static objects are much like external objects, but they can be a bit lazier in construction and generally get destroyed prior to external objects (things are more complicated in practice).
Never forget the static initialization order fiasco.
Automatic objects (function-local objects or non-static class data members) die when they go out of scope. The scope of a function-local variable ends at the closing curly brace } corresponding to the nearest opening curly brace { that encloses the object. Non-static data members are destroyed by their parents. The general rule for destruction is that, since newcomers build upon pioneers' work, each object may rely on the existence and validity of objects constructed before it: so automatic objects are destructed in the reverse order of construction (declaration).
Dynamic objects are the most dangerous ones. The compiler does not automatically bound their lifetime; the programmer must. The general rule is to delete every instance created by new, but that is easier said than done. So good practice is to avoid naked new/delete pairs and stick to smart pointers and generic container libraries that cautiously take care of the tricky parts.
But that is not all. In order to correctly implement RAII per class, one must be familiar with the famous idioms. A C++ programmer must be able to apply the rule of 0/3/5 and should be familiar with the copy/swap idiom for not-so-time-critical cases. Nothing is a must, but those two idioms are good starting points for general cases; specific cases need specific treatment (e.g., copy/swap does not do well with vector).
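As a minimal sketch of RAII in action (the file path is made up): the destructor pays the constructor's debt at a deterministic point, the closing brace of the enclosing scope, rather than whenever a garbage collector gets around to the object:

    #include <cstdio>

    class File {
        std::FILE *f_;
    public:
        explicit File(const char *path) : f_(std::fopen(path, "w")) {}  // acquire
        ~File() { if (f_) std::fclose(f_); }                            // release
        File(const File &) = delete;            // rule of 3/5: no accidental copies
        File &operator=(const File &) = delete;
        std::FILE *get() const { return f_; }
    };

    int main() {
        {
            File log("/tmp/example.log");
            if (log.get())
                std::fputs("hello\n", log.get());
        }   // ~File() runs exactly here, at end of scope
        return 0;
    }

In Python, by contrast, the moment __del__ runs depends on reference counts and the cycle collector, which is why deterministic cleanup there is usually done with context managers (with) rather than destructors.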

(Python) Monkeypatch __new__ for objects of type int, float, str, list, dict, set, and module

I want to implicitly extend the int, float, str, list, dict, set, and module classes with custom built substitutions (extensions).
When I say 'implicitly', what I mean is that when I declare a = 1, an object of type Custom_Int (as an example) is produced, as opposed to a normal integer object.
Now, I understand and respect the reasons not to do this. First, messing with built-ins is like messing with the laws of physics; no good can come from it. That said, I do understand the gravity of what I'm trying to do and what can happen if I do it wrong.
Second, I understand that modifying a base class could affect not just the current runtime but all running Python processes. I feel that by overriding the __new__ method of these base classes, such that it returns Custom_Object_Whatever if and ONLY IF certain environmental factors are true, other runtimes will remain largely unaffected.
So, getting back to the issue at hand- how can I override the __new__ method of these various types?
Python's forbiddenfruit package seems promising. I haven't had a chance to really investigate it though, and if someone who understands it could summarize what it does, that would save me a lot of time.
Beyond that, I've observed something strange.
Every answer to monkeypatching that doesn't eventually circle back to forbiddenfruit, or to how forbiddenfruit works, has to do with modifying what I will refer to as the 'absolute dictionary' of the class. Because everything in Python is essentially a mapping (or dictionary) of names to functions/values, if you change the name __new__ within the right mapping, you change the nature of the object.
The problem is that in every near-success I've had, calling str('a').__new__(*args) works fine (in some cases), but the assignment varOne = 'a' does not seem to actually call str.__new__().
My guess is that this has something to do with either Python's parsing of a program prior to launch, or else the caching of the various classes during/after launch. Or maybe I'm totally off the mark: either Python pre-reads and transforms its modules prior to launch, or else the machine code, when it attempts to implicitly create an object, reaches for something other than the class located in moduleObject.builtins[__classname__].
Any ideas?
If you want to do this, your best option is probably to modify the CPython source code and build your own custom Python build with your extensions baked into the actual built-in types. The result will integrate a lot better with all the low-level mechanisms you don't yet understand, and you'll learn a lot in the process.
Right now, you're getting blocked by a lot of factors. Here are the ones that have come to my mind.
The first is that most ways of creating built-in objects don't go through a __new__ method at all. They go through C-level calls like PyLong_FromLong or PyList_New. These calls are hardwired to use the actual built-in types, allocating memory sized for the real built-in types, fetching the type object by the address of its statically-allocated C struct, and stuff like that. It's basically impossible to change any of this without building your own custom Python.
The second factor is that messing with __new__ isn't even enough to correctly affect things that theoretically should go through __new__, like int("5"). Python has reasons for stopping you from setting attributes on built-in classes, and two of those reasons are slots and the type attribute cache.
Slots are a public part of the C API that you'll probably learn about if you try to modify the CPython source code. They're function pointers in the C structs that make up type objects at C level, and most of them correspond to Python-level magic methods. For example, the __new__ method has a corresponding tp_new slot. Most C code accesses slots instead of methods, and there's code to ensure the slots and methods are in sync, but if you bypass Python's protections, that breaks and everything goes to heck.
The type attribute cache isn't a public part of anything even at C level. It's a cache that saves the results of type object attribute lookups, to make Python go faster. Its memory safety relies on all type object attribute modification going through type.__setattr__ (and all built-in type object attribute modification getting rejected by type.__setattr__), but if you bypass the protection, memory safety goes out the window and arbitrarily weird results can occur.
The third factor is that there's a bunch of caching for immutable objects. The small int cache, the interned string dict, constants getting saved in bytecode objects, compile-time constant folding... there's a lot. Objects aren't going to be created when you expect. (There's also stuff like, say, zip saving the last output tuple and reusing it if it sees you didn't keep a reference, for even more ways object creation will mess with your assumptions.)
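You can watch the small int cache from the C level. This embedding sketch (assuming a CPython development setup with Python.h available) shows PyLong_FromLong handing back the very same object twice, so no __new__ you install could ever be consulted here:

    #include <Python.h>
    #include <cstdio>

    int main() {
        Py_Initialize();
        PyObject *a = PyLong_FromLong(5);   // hits the small int cache
        PyObject *b = PyLong_FromLong(5);   // same cached object again
        std::printf("same object: %s\n", (a == b) ? "yes" : "no");  // "yes" on CPython
        Py_DECREF(a);
        Py_DECREF(b);
        Py_Finalize();
        return 0;
    }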
There's more. Stuff like, what argument would int.__new__ even take if you tried to use int.__new__ to evaluate the expression 5? Stuff like all the low-level code that knows exactly how to work with the types it expects and will get very confused if it gets a MyCustomTuple with a completely different memory layout from a real tuple. Screwing with built-ins has a lot of issues.
Incidentally, one of the things you expected to be a problem is mostly not a problem. Screwing with one Python process's built-ins won't affect other Python processes' built-ins... unless those other processes are created by forking the first process, such as with multiprocessing in fork mode.

Is it conventional to say that functions are called and methods are invoked?

I’m reading Think Python: How to Think Like a Computer Scientist. The author uses “invoke” with methods and “call” with functions.
Is it a convention? And, if so, why is this distinction made? Why are functions said to be called, but methods are said to be invoked?
Not really; maybe it is easier for new readers if an explicit distinction is made, to help them understand that the invocation of a method is slightly different. At least, that's why I suspect the author might have chosen different wording for each.
There doesn't seem to be a convention that dictates this in the Reference Manual for the Python language. What I see the authors doing is choosing invoke when the call made to a function is implicit rather than explicit.
For example, in the Callables section of the Standard Type Hierarchy you see:
[..] When an instance method object is called, the underlying function (__func__) is called, inserting the class instance (__self__) in front of the argument list. [...]
(Emphasis mine) Explicit call
Further down in Basic Customization and specifically for __new__ you can see:
Called to create a new instance of class cls. __new__() is a static method [...]
(Emphasis mine) Explicit call
While just a couple of sentences later you'll see how invoked is used because __new__ implicitly calls __init__:
If __new__() does not return an instance of cls, then the new instance’s __init__() method will not be invoked.
(Emphasis mine) Implicitly called
So no, no convention seems to be used, at least by the creators of the language. Simple is better than complex, I guess :-).
One good source for this would be the Python documentation. A simple text search through the section on Classes reveals the word "call" being used many times in reference to "calling methods", and the word "invoke" being used only once.
In my experience, the same is true: I regularly hear "call" used in reference to methods and functions, while I rarely hear "invoke" for either. However, I assume this is mainly a matter of personal preference and/or context (is the setting informal? academic? etc.).
You will also see places in the documentation where the word "invoke" is used in reference to functions:
void Py_FatalError(const char *message)
Print a fatal error message and kill the process. No cleanup is performed. This function should only be invoked when a condition is detected that would make it dangerous to continue using the Python interpreter; e.g., when the object administration appears to be corrupted. On Unix, the standard C library function abort() is called which will attempt to produce a core file.
And from here:
void Py_DECREF(PyObject *o)
Decrement the reference count for object o. The object must not be NULL; if you aren't sure that it isn't NULL, use Py_XDECREF(). If the reference count reaches zero, the object's type's deallocation function (which must not be NULL) is invoked.
Although both these references are from the Python C API, so that may be significant.
To summarize:
I think it is safe to use either "invoke" or "call" in the context of functions or methods without sounding either like a noob or a showoff.
Note that I speak only of Python, and what I know from my own experience. I cannot speak to the difference between these terms in other languages.

C++ shared_ptr vs. Python object

AFAIK, the use of shared_ptr is often discouraged because of potential bugs caused by careless usage of them (unless you have a really good explanation for significant benefit and carefully checked design).
On the other hand, Python objects seem to be essentially shared_ptrs (ref_count and garbage collection).
I am wondering what makes them work nicely in Python but potentially dangerous in C++. In other words, what are the differences between Python and C++ in dealing with shared_ptr that makes their usage discouraged in C++ but not causing similar problems in Python?
I know, for example, that Python automatically detects cycles between objects, which prevents the memory leaks that cyclic shared_ptrs can cause in C++.
"I know e.g. Python automatically detects cycles" -- that's what makes them work nicely, at least so far as the "potential bugs" relate to memory leaks.
Besides which, C++ programs are more commonly written under tight performance constraints than Python programs (which IMO is a combination of different genuine requirements with some fairly bogus differences in rules-of-thumb, but that's another story). A fairly high proportion of the Python objects I use don't strictly need reference counting, they have exactly one owner and a unique_ptr would be fine (or for that matter a data member of class type). In C++ it's considered (by the people writing the advice you're reading) worth taking the performance advantage and the explicitly simplified design. In Python it's usually not considered a problem, you pay the performance and you keep the flexibility to decide later that it's shared after all without any code change required (other than to take additional references that outlive the original, I mean).
Btw in any language, shared mutable objects have "potential bugs" associated with them, if you lose track of what objects will or won't change when you're not looking at them. I don't just mean race conditions: even in a single-threaded program you need to be aware that C++ Predicates shouldn't change anything and that you (often) can't mutate a container while iterating over it. I don't see this as a difference between C++ and Python, though. Rather, to some extent you should be slightly wary of shared objects in Python too, and when you proliferate references to an object at least understand why you're doing it.
So, on to the list of issues in the question you link to:
cyclic references -- as mentioned, Python rolls its sleeves up, finds them and frees them. For reasons to do with the design and specific uses of the languages, cycle-breaking garbage collection is rather difficult to implement in C++, although not impossible. (A sketch of the cycle problem, and the usual weak_ptr workaround, follows this list.)
creating multiple unrelated shared_ptrs to the same object -- no analog is possible in Python, since the reference-counter isn't open to the user to mess up.
Constructing an anonymous temporary shared pointer -- doesn't arise in Python; there's no risk of a memory leak that way, since there's no "gap" in which the object exists but is not yet subject to collection if it becomes unreferenced.
Calling the get() function to get the raw pointer and use it after the pointed-to object goes out of scope -- well, you can mess this up if you're writing Python/C, but not in pure Python.
Passing a reference of or a raw pointer to a shared_ptr should be dangerous too, since it won't increment the internal count -- there's no means in Python to add a reference without the language taking care of the refcount.
we passed 'this' to some thread workers instead of 'shared_from_this' -- in other words, forgot to create a shared_ptr when needed. Can't do this in Python.
most of the predicates you know and love from <functional> don't play nicely with shared_ptr -- Python refcounting is so built in to the runtime (or I suppose to be precise I should say: garbage collection is so built in to the language design) that there are no libraries that fail to cope with it.
Using shared_ptr for really small objects (like char or short) could be an overhead -- the issue exists in Python, and Python programmers generally don't sweat it. If you need an array of "primitive type" then you can use numpy to reduce the overhead. Sometimes Python programs run out of memory and you need to do something about it; that's life ;-)
Giving out a shared_ptr<T> to this inside a class definition is also dangerous; use enable_shared_from_this instead -- it may not be obvious, but this is "don't create multiple unrelated shared_ptrs to the same object" again.
You need to be careful when you use shared_ptr in multithread code -- it's possible to create race conditions in Python too, this is part of "shared mutable objects are tricksy".
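Here is the sketch promised in the first bullet: two objects that own each other through shared_ptr leak, because C++ has no cycle collector; making one of the edges a weak_ptr breaks the cycle by hand:

    #include <cstdio>
    #include <memory>

    struct Node {
        std::shared_ptr<Node> next;  // owning edge: two of these can form a leaky cycle
        std::weak_ptr<Node> prev;    // non-owning edge: observes without keeping alive
        ~Node() { std::puts("Node destroyed"); }
    };

    int main() {
        auto a = std::make_shared<Node>();
        auto b = std::make_shared<Node>();
        a->next = b;   // a owns b
        b->prev = a;   // b merely observes a
        // Both destructors run when a and b go out of scope. Had we written
        // b->next = a instead, the refcounts could never reach zero -- the
        // very leak that Python's cycle detector would have cleaned up.
        return 0;
    }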
Most of this is to do with the fact that in C++ you have to explicitly do something to get refcounting, and you don't get it if you don't ask for it. This provides several opportunities for error that Python doesn't make available to the programmer because it just does it for you. If you use shared_ptr correctly then apart from the existence of libraries that don't co-operate with it, none of these problems comes up in C++ either. Those who are cautious of using it for these reasons are basically saying they're afraid they'll use it incorrectly, or at any rate more afraid than that they'll misuse some alternative. Much of C++ programming is trading different potential bugs off against each other until you come up with a design that you consider yourself competent to execute. Furthermore it has "don't pay for what you don't need" as a design philosophy. Between these two factors, you don't do anything without a really good explanation, a significant benefit, and a carefully checked design. shared_ptr is no different ;-)
AFAIK, the use of shared_ptr is often discouraged because of potential bugs caused by careless usage of them (unless you have a really good explanation for significant benefit and carefully checked design).
I wouldn't agree. The tendency goes towards generally using these smart pointers unless you have very good reasons not to do so.
shared_ptr that makes their usage discouraged in C++ but not causing similar problems in Python?
Well, I don't know about your favourite largish signal processing framework ecosystem, but GNU Radio uses shared_ptrs for all its blocks, which are the core elements of the GNU Radio architecture. In fact, blocks are classes with private constructors, which are only accessible to a friend make function that returns a shared_ptr. We haven't had problems with this -- and GNU Radio had good reason to adopt such a model. Now we don't have a single place where users try to use deallocated block objects, and not a single block is leaked. Nice!
Also, we use SWIG and a gateway class for a few C++ types that can't just be represented well as Python types. All this works very well on both sides, C++ and Python. In fact, it works so very well, that we can use Python classes as blocks in the C++ runtime, wrapped in shared_ptr.
Also, we never had performance problems. GNU Radio is a high rate, highly optimized, heavily multithreaded framework.
