I have a compatibility library that uses SWIG to access a C++ library. I would find it useful to be able to create a SWIG-wrapped Python object inside this layer (as opposed to accepting the C++ object as an argument or returning one). I.e. I want the PyObject* that points to the SWIG-wrapped C++ object.
I discovered that the SWIG_NewPointerObj function does exactly this. The SWIG-generated xx_wrap.cpp file uses this function, and it's also made available in the header emitted by swig -python -external-runtime swigpyrun.h.
HOWEVER, I cannot find any reference to what the last argument to this function is. It appears to specify the ownership of the object, but there is no documentation that says what each of the options means (or even what they all are).
It appears that the following are acceptable values:
0
SWIG_POINTER_OWN
SWIG_POINTER_NOSHADOW
SWIG_POINTER_NEW = OWN + NOSHADOW
SWIG_POINTER_DISOWN (I'm not sure if SWIG_NewPointerObj accepts this)
SWIG_POINTER_IMPLICIT_CONV (I'm not sure if SWIG_NewPointerObj accepts this)
I want to create an object that is used only in my wrapping layer. I want to create it out of my own pointer to the C++ object (so I can change the C++ object's value and have it be reflected in the Python object). I need it so it can be passed to a Python callback function. I want to keep this one instance throughout the life of the program so that I don't waste time creating/destroying identical objects for each callback. Which option is appropriate, and what do I Py_INCREF?
When you create new pointer objects with SWIG_NewPointerObj, you may pass the following flags:
SWIG_POINTER_OWN
SWIG_POINTER_NOSHADOW
If SWIG_POINTER_OWN is set, the destructor of the underlying C++ class will be called when the Python pointer is finalized. By default, the destructor will not be called; see the SWIG documentation on Memory Management.
For your use case, you don't need to set any flags at all.
From what I can see in the sources, if SWIG_POINTER_NOSHADOW is set, then a basic wrapped pointer is returned. You will not be able to access member variables in Python. All you'll have is an opaque pointer.
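To make that concrete, here is a minimal sketch of the pattern from the question, assuming a SWIG-wrapped C++ class MyThing whose wrapper module has already been imported; the names MyThing, set_value, init_wrapper and fire_callback are illustrative, not from any real library:

#include <Python.h>
#include <cassert>
#include "swigpyrun.h"   // emitted by: swig -python -external-runtime swigpyrun.h
#include "MyThing.h"     // hypothetical header for the wrapped C++ class

static MyThing myThing;             // the C++ object mutated between callbacks
static PyObject *pyMyThing = NULL;  // one long-lived wrapper around &myThing

void init_wrapper()
{
    // The type query only succeeds once the SWIG module wrapping MyThing
    // has been imported and has registered its types.
    swig_type_info *ty = SWIG_TypeQuery("MyThing *");
    assert(ty != NULL);

    // Flags = 0: the Python object wraps the pointer but does NOT own it,
    // so the C++ destructor is never run when the wrapper is finalized.
    pyMyThing = SWIG_NewPointerObj(&myThing, ty, 0);

    // SWIG_NewPointerObj returns a new reference; storing it in a global
    // already keeps the wrapper alive, so no extra Py_INCREF is needed here.
}

void fire_callback(PyObject *callback)
{
    myThing.set_value(42);  // hypothetical setter; the change is visible through pyMyThing

    // The call borrows our reference for its duration, so again no Py_INCREF.
    PyObject *result = PyObject_CallFunctionObjArgs(callback, pyMyThing, NULL);
    Py_XDECREF(result);
}

Since the flags are 0, Python never deletes the underlying object; just make sure nothing else wraps the same pointer with SWIG_POINTER_OWN, or you would end up with a double delete.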
Reference: /usr/share/swig/2.0.7/python/pyrun.swg
I want to implicitly extend the int, float, str, list, dict, set, and module classes with custom built substitutions (extensions).
When I say 'implicitly', what I mean is that when I declare 'a = 1', an object of the type Custom_Int (as an example) is produced, as opposed to a normal integer object.
Now, I understand and respect the reasons not to do this. Firstly- messing with built-ins is like messing with the laws of physics. No good can come from it. That said- I do understand the gravity of what I'm trying to do and what can happen if I do it wrong.
Second- I understand that modifying a base class will affect not just the current runtime but all running Python processes. I feel that by overriding the __new__ method of these base classes, such that it returns Custom_Object_Whatever if and ONLY IF certain environmental factors are true, other runtimes will remain largely unaffected.
So, getting back to the issue at hand- how can I override the __new__ method of these various types?
Python's forbiddenfruit package seems to be promising. I haven't had a chance to really investigate it though, and if someone who understands it could summarize what it does, that would save me a lot of time.
Beyond that, I've observed something strange.
Every answer to monkeypatching that doesn't eventually circle back to forbiddenfruit or how forbiddenfruit works has to do with modifying what I will refer to as the 'absolute_dictionary' of the class. Because everything in Python is essentially a mapping (or dictionary) of functions/values to names, if you change the name __new__ within the right mapping, you change the nature of the object.
Problem is- in every near-success I've had, calling str('a').__new__(*args) works fine (in some cases), but running varOne = 'a' does not seem to actually call str.__new__().
My guess- this has something to do with either Python's parsing of a program prior to launch, or else the caching of the various classes during/post launch. Or maybe I'm totally off the mark. Either Python pre-reads and applies some regex to its modules prior to launch, or else the machine code, when it attempts to implicitly create an object, reaches for something other than the class located in moduleObject.builtins[__classname__].
Any ideas?
If you want to do this, your best option is probably to modify the CPython source code and build your own custom Python build with your extensions baked into the actual built-in types. The result will integrate a lot better with all the low-level mechanisms you don't yet understand, and you'll learn a lot in the process.
Right now, you're getting blocked by a lot of factors. Here are the ones that have come to my mind.
The first is that most ways of creating built-in objects don't go through a __new__ method at all. They go through C-level calls like PyLong_FromLong or PyList_New. These calls are hardwired to use the actual built-in types, allocating memory sized for the real built-in types, fetching the type object by the address of its statically-allocated C struct, and stuff like that. It's basically impossible to change any of this without building your own custom Python.
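A tiny embedding sketch (my own illustration, not from the answer itself) shows this: the C-level constructor hands back an object of exactly the built-in type, and no __new__ lookup ever happens along the way.

#include <Python.h>
#include <cstdio>

int main()
{
    Py_Initialize();

    // PyLong_FromLong is hardwired: it fetches PyLong_Type by the address
    // of its static C struct and never consults any __new__ method.
    PyObject *n = PyLong_FromLong(5);
    std::printf("exactly PyLong_Type: %d\n", Py_TYPE(n) == &PyLong_Type);

    Py_DECREF(n);
    Py_Finalize();
    return 0;
}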
The second factor is that messing with __new__ isn't even enough to correctly affect things that theoretically should go through __new__, like int("5"). Python has reasons for stopping you from setting attributes on built-in classes, and two of those reasons are slots and the type attribute cache.
Slots are a public part of the C API that you'll probably learn about if you try to modify the CPython source code. They're function pointers in the C structs that make up type objects at C level, and most of them correspond to Python-level magic methods. For example, the __new__ method has a corresponding tp_new slot. Most C code accesses slots instead of methods, and there's code to ensure the slots and methods are in sync, but if you bypass Python's protections, that breaks and everything goes to heck.
The type attribute cache isn't a public part of anything even at C level. It's a cache that saves the results of type object attribute lookups, to make Python go faster. Its memory safety relies on all type object attribute modification going through type.__setattr__ (and all built-in type object attribute modification getting rejected by type.__setattr__), but if you bypass the protection, memory safety goes out the window and arbitrarily weird results can occur.
The third factor is that there's a bunch of caching for immutable objects. The small int cache, the interned string dict, constants getting saved in bytecode objects, compile-time constant folding... there's a lot. Objects aren't going to be created when you expect. (There's also stuff like, say, zip saving the last output tuple and reusing it if it sees you didn't keep a reference, for even more ways object creation will mess with your assumptions.)
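The small int cache in particular is easy to observe from C++ (a CPython implementation detail, sketched here as my own illustration): two independent requests for the same small value return the very same object, so there is no per-use creation step you could hook.

#include <Python.h>
#include <cstdio>

int main()
{
    Py_Initialize();

    // CPython preallocates small ints (roughly -5..256), so both calls
    // below return a pointer to the same cached object.
    PyObject *a = PyLong_FromLong(5);
    PyObject *b = PyLong_FromLong(5);
    std::printf("same object: %d\n", a == b);

    Py_DECREF(a);
    Py_DECREF(b);
    Py_Finalize();
    return 0;
}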
There's more. Stuff like, what argument would int.__new__ even take if you tried to use int.__new__ to evaluate the expression 5? Stuff like all the low-level code that knows exactly how to work with the types it expects and will get very confused if it gets a MyCustomTuple with a completely different memory layout from a real tuple. Screwing with built-ins has a lot of issues.
Incidentally, one of the things you expected to be a problem is mostly not a problem. Screwing with one Python process's built-ins won't affect other Python processes' built-ins... unless those other processes are created by forking the first process, such as with multiprocessing in fork mode.
I’m reading Think Python: How to Think Like a Computer Scientist. The author uses “invoke” with methods and “call” with functions.
Is it a convention? And, if so, why is this distinction made? Why are functions said to be called, but methods are said to be invoked?
Not really; maybe it is easier for new readers to make an explicit distinction in order to understand that their invocation is slightly different. At least that's why I suspect the author might have chosen different wording for each.
There doesn't seem to be a convention that dictates this in the Reference Manual for the Python language. What I see them doing is choosing invoke when the call made to a function is implicit rather than explicit.
For example, in the Callables section of the Standard Type Hierarchy you see:
[..] When an instance method object is called, the underlying function (__func__) is called, inserting the class instance (__self__) in front of the argument list. [...]
(Emphasis mine) Explicit call
Further down in Basic Customization and specifically for __new__ you can see:
Called to create a new instance of class cls. __new__() is a static method [...]
(Emphasis mine) Explicit call
While just a couple of sentences later you'll see how invoked is used because __new__ implicitly calls __init__:
If __new__() does not return an instance of cls, then the new instance’s __init__() method will not be invoked.
(Emphasis mine) Implicitly called
So no, no convention seems to be used, at least by the creators of the language. Simple is better than complex, I guess :-).
One good source for this would be the Python documentation. A simple text search through the section on Classes reveals the word "call" being used many times in reference to "calling methods", and the word "invoke" being used only once.
In my experience, the same is true: I regularly hear "call" used in reference to methods and functions, while I rarely hear "invoke" for either. However, I assume this is mainly a matter of personal preference and/or context (is the setting informal?, academic?, etc.).
You will also see places in the documentation where the word "invoke" is used in reference to functions:
void Py_FatalError(const char *message)
Print a fatal error message and kill the process. No cleanup is performed. This function should only be invoked when a condition is detected that would make it dangerous to continue using the Python interpreter; e.g., when the object administration appears to be corrupted. On Unix, the standard C library function abort() is called which will attempt to produce a core file.
And from the Py_DECREF documentation:
void Py_DECREF(PyObject *o)
Decrement the reference count for object o. The object must not be NULL; if you aren't sure that it isn't NULL, use Py_XDECREF(). If the reference count reaches zero, the object's type's deallocation function (which must not be NULL) is invoked.
Although both these references are from the Python C API, so that may be significant.
To summarize:
I think it is safe to use either "invoke" or "call" in the context of functions or methods without sounding either like a noob or a showoff.
Note that I speak only of Python, and what I know from my own experience. I cannot speak to the difference between these terms in other languages.
One of the great things about C++ is the use of const-reference arguments: with this type of argument, you're pretty much guaranteed objects won't be accidentally modified, and there won't be side effects.
Question is, what’d be the Python equivalent to such arguments?
For instance, let’s say you have this C++ method:
void Foo::setPosition(const QPoint &position) {
m_position = position;
}
And you want to “translate” it to Python like this:
def set_position(self, position):
self.position = position
Doing this will potentially yield a lot of trouble, and many subtle bugs could appear as well. Question is, what's the “Python equivalent” way of writing C++ methods that take const reference arguments (copy constructor)?
Last time I caught a bug because I had a bad “C++ -> Python translation”; I fixed it with something like:
my_python_instance.set_position(QPoint(pos))
… in other words, my other choice was to clone the object from the caller… which I'm pretty sure is not the right way to go.
I hope I understood correctly.
In short, there isn't one. You are after two things that you don't often come across in Python: consts and copy constructors.
It's a design decision; Python is a different language. Arguments are always passed by object reference.
const correctness
It's up to the user not to mess up the object, and I think that doesn't happen very often. Personally, I like const correctness in C++, but I've never caught myself missing it in Python. It's a dynamic, scripting language, so there is no point in looking at the micro-optimizations that could be done under an argument-constness assertion.
Copying objects
... you don't do it much in Python. In my opinion, it's a design decision to offload memory management onto the user, because it's hard to come up with a good standard way, e.g. shallow copies vs. deep copies. I guess if you assume you don't need it that much, there is no point in providing a way for every object (like C++ does), but only for those which do need it.
Therefore, there is no unified Pythonic way. There are at least a few ways to do it in the standard library:
Lists can be copied with slicing: copied = original[:].
Some objects provide a copy method, like dict.
Some objects explicitly provide a constructor (like in C++); dict and list do, so you can write copied = list(original).
There is the copy module, for which you can provide the custom methods __copy__ and __deepcopy__. It also has the advantage that another standard library module works with it: pickle, for serialization.
The most C++-like way is option 3: implement the constructor so that, when invoked with an object of the same type, it returns a copy of its argument. But it might need a somewhat crafty implementation, because you cannot overload on type.
Option 2. is the same, but refactored into a function.
So probably the best way is to provide explicit __copy__/copy methods and, if you're nice, to support constructor invocation that calls them.
Then you, as the developer of the object, can ensure const correctness and provide the user with an easy, explicit way to request a copy.
There is no direct equivalent. The closest is to use the copy module and define the __copy__() and/or __deepcopy__() methods on your classes.
Write a decorator for the function. Have it serialize the data on the way in, serialize it again on the way out, and confirm the two are equal. If they are not, raise an error.
Python type checking is mostly runtime checks that arguments satisfy some predicate. const in C++ is a type check on the behaviour of the function: so you have to do a runtime check on the behaviour of the function to be equivalent.
You could also do such checks only when unit testing or in a debug build, "prove" the code correct, then remove the checks in "release" mode.
Alternatively, you could write a static analyzer that checks for const violations using the inspect module, decorating as immutable the arguments whose source you lack, I suppose. That would probably be about as easy as writing your own language variant that supports const, though. As in: nigh impossible.
What kind of object is position? There is very likely a way to copy it, so that self has a private copy of the position. For instance, if position is a Point2D, then either self.position = Point2D(position) or self.position = Point2D( position.x, position.y ) is likely to work.
By the way, your C++ code might not be as safe as you think it is. If m_position is a QPoint&, then you are still vulnerable to somebody in the outside world modifying the QPoint that was passed in, after your function returns. (Passing a parameter as const does not guarantee that the referred-to object is const from the caller's point of view.)
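To illustrate that caveat, here is a small self-contained sketch, using a minimal Point struct as a stand-in for QPoint: a value member is insulated from the caller, while a reference member silently aliases the caller's object.

#include <iostream>

struct Point { int x = 0; int y = 0; };  // minimal stand-in for QPoint

class ByValue {
    Point m_position;  // value member: setPosition stores a copy
public:
    void setPosition(const Point &position) { m_position = position; }
    int x() const { return m_position.x; }
};

class ByReference {
    const Point &m_position;  // reference member: aliases the caller's object
public:
    explicit ByReference(const Point &position) : m_position(position) {}
    int x() const { return m_position.x; }
};

int main()
{
    Point p{1, 2};
    ByValue v;
    v.setPosition(p);
    ByReference r{p};

    p.x = 99;  // the caller mutates the point after the fact

    std::cout << v.x() << '\n';  // 1  (safe: we kept our own copy)
    std::cout << r.x() << '\n';  // 99 (aliased: the outside mutation leaks in)
}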
I am using swig with python to call a function myfunc() written in C++. Every time I call myfunc() I have to generate the same huge sparse matrix. What I would like to do instead is to create the matrix once, then pass a pointer to the matrix to Python, without reallocating space every time. What I fear is that this could cause some kind of memory leak.
What is the best way to do this?
The matrix is an Eigen::SparseMatrix.
Is it maybe safe to simply pass a pointer back and forth? Python would not know how to handle it, but as long as the space stays allocated, will I be able to reuse the pointer in C++?
This is precisely how swig handles an unknown object: it passes a pointer to the object around, together with some type information (a string). If a function takes a pointer of that type as an argument, swig will happily pass it that pointer. See the swig docs.
You just have to make sure the types match up, i.e., you cannot pass say a MatrixXd* to python and use it in a function taking a MatrixBase<MatrixXd>*, since swig will not know that the types are compatible.
Also, for unknown objects (at least pointers to such), swig will not do any memory management, so you will need to allocate and deallocate the object on the C++ side.
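As a sketch of what that can look like (my own illustration; the function names create_matrix, myfunc and destroy_matrix are hypothetical, not from the question), expose the matrix through an opaque pointer and manage its lifetime explicitly on the C++ side:

// matrix_tools.cpp -- wrapped by a plain %include in the SWIG interface file
#include <Eigen/Sparse>

typedef Eigen::SparseMatrix<double> SpMat;

// Build the huge matrix once; SWIG hands it to Python as an opaque wrapped
// pointer, since SparseMatrix is a type SWIG knows nothing about.
SpMat *create_matrix()
{
    SpMat *m = new SpMat(10000, 10000);
    m->insert(0, 0) = 1.0;  // ... fill in the real entries here
    return m;
}

// Called repeatedly from Python with the same opaque pointer.
double myfunc(SpMat *matrix)
{
    return matrix->sum();
}

// SWIG does no memory management for unknown pointer types, so free the
// matrix explicitly from Python when you are done with it.
void destroy_matrix(SpMat *matrix)
{
    delete matrix;
}

On the Python side you would then call m = create_matrix() once, pass the opaque m to myfunc(m) on every call, and call destroy_matrix(m) when you are finished; as long as you do that last step, nothing leaks.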
Two sections of Python 2.7's documentation mention adding cyclic garbage collection (CGC) support for container objects defined in extension modules.
The Python/C API Reference Manual gives two rules, i.e.,
The memory for the object must be allocated using PyObject_GC_New() or PyObject_GC_NewVar().
Once all the fields which may contain references to other containers are initialized, it must call PyObject_GC_Track().
Whereas in Extending and Embedding the Python Interpreter, for the Noddy example, it seems that adding the Py_TPFLAGS_HAVE_GC flag and filling the tp_traverse and tp_clear slots is sufficient to enable CGC support. The two rules above are NOT practiced at all.
When I modified the Noddy example to actually follow the rules, using PyObject_GC_New()/PyObject_GC_Del() and PyObject_GC_Track()/PyObject_GC_UnTrack(), it surprisingly raised an assertion error:
Modules/gcmodule.c:348: visit_decref: Assertion "gc->gc.gc_refs != 0" failed. refcount was too small
This leads to my confusion about the correct/safe way to implement CGC. Could anyone give advice or, preferably, a neat example of a container object with CGC support?
Under most normal circumstances you shouldn't need to do the tracking/untracking yourself. This is described in the documentation, although it isn't made specifically clear. In the case of the Noddy example you definitely don't.
The short version is that a PyTypeObject contains two function pointers: tp_alloc and tp_free. By default tp_alloc calls all the right functions when an instance is created (if Py_TPFLAGS_HAVE_GC is set) and tp_free untracks the instance on destruction.
The Noddy documentation says (at the end of the section):
That’s pretty much it. If we had written custom tp_alloc or tp_free slots, we’d need to modify them for cyclic-garbage collection. Most extensions will use the versions automatically provided.
Unfortunately, the one place that doesn't make it clear that you don't need to do this yourself is the Supporting Cyclic Garbage Collection documentation.
Detail:
Noddy objects are allocated using a function called Noddy_new placed in the tp_new slot of the PyTypeObject. According to the documentation, the main thing the "new" function should do is call the tp_alloc slot. You typically don't write tp_alloc yourself; it just defaults to PyType_GenericAlloc().
Looking at PyType_GenericAlloc() in the Python source shows a number of sections where its behaviour changes based on PyType_IS_GC(type). First it calls _PyObject_GC_Malloc instead of PyObject_Malloc, and second it calls _PyObject_GC_TRACK(obj). (Note that all PyObject_New really does is call PyObject_Malloc and then PyObject_Init.)
Similarly, on deallocation you call the tp_free slot, which is automatically set to PyObject_GC_Del for classes with Py_TPFLAGS_HAVE_GC. PyObject_GC_Del includes code that does the same as PyObject_GC_UnTrack so a call to untrack is unnecessary.
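Putting the pieces together, here is a minimal sketch of a container with CGC support, in the spirit of what the question asked for; it is written against the Python 2.7 C API, the names Box and payload are illustrative, and it deliberately relies on the default tp_alloc/tp_free behaviour described above. The explicit PyObject_GC_UnTrack in the deallocator is the belt-and-braces pattern from the official docs: PyObject_GC_Del would untrack too, but untracking first keeps the collector from traversing a half-destroyed object.

#include <Python.h>
#include "structmember.h"

typedef struct {
    PyObject_HEAD
    PyObject *payload;  /* may (indirectly) point back at us: a cycle */
} Box;

/* Report every PyObject* field we hold to the collector. */
static int Box_traverse(Box *self, visitproc visit, void *arg)
{
    Py_VISIT(self->payload);
    return 0;
}

/* Drop our references so the collector can break cycles. */
static int Box_clear(Box *self)
{
    Py_CLEAR(self->payload);
    return 0;
}

static void Box_dealloc(Box *self)
{
    PyObject_GC_UnTrack(self);  /* don't let the GC see a dying object */
    Box_clear(self);
    Py_TYPE(self)->tp_free((PyObject *)self);  /* PyObject_GC_Del, per the above */
}

static PyMemberDef Box_members[] = {
    {(char *)"payload", T_OBJECT_EX, offsetof(Box, payload), 0,
     (char *)"object held by the box"},
    {NULL}  /* sentinel */
};

static PyTypeObject BoxType = {
    PyVarObject_HEAD_INIT(NULL, 0)
    "example.Box",  /* tp_name */
    sizeof(Box),    /* tp_basicsize; remaining slots are filled in below */
};

PyMODINIT_FUNC initexample(void)  /* Python 2.x module init */
{
    /* No custom tp_alloc/tp_free: PyType_GenericAlloc tracks each new
     * instance for us, and PyType_Ready fills in PyObject_GC_Del. */
    BoxType.tp_flags    = Py_TPFLAGS_DEFAULT | Py_TPFLAGS_HAVE_GC;
    BoxType.tp_traverse = (traverseproc)Box_traverse;
    BoxType.tp_clear    = (inquiry)Box_clear;
    BoxType.tp_dealloc  = (destructor)Box_dealloc;
    BoxType.tp_members  = Box_members;
    BoxType.tp_new      = PyType_GenericNew;
    if (PyType_Ready(&BoxType) < 0)
        return;

    PyObject *m = Py_InitModule("example", NULL);
    if (m == NULL)
        return;
    Py_INCREF(&BoxType);
    PyModule_AddObject(m, "Box", (PyObject *)&BoxType);
}

With this in place, b = example.Box(); b.payload = b; del b leaves a cycle that the collector reclaims on its next pass, with no manual PyObject_GC_New or PyObject_GC_Track calls anywhere.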
I am not experienced enough in the C API myself to give you any advice, but there are plenty of examples in the Python container implementations themselves.
Personally, I'd start with the tuple implementation first, since it's immutable: Objects/tupleobject.c. Then move on to the dict, list and set implementations for further notes on mutable containers:
Objects/dictobject.c
Objects/listobject.c
Objects/setobject.c
I can't help but notice that there are calls to PyObject_GC_New(), PyObject_GC_NewVar() and PyObject_GC_Track() throughout, as well as having Py_TPFLAGS_HAVE_GC set.