Memory overhead of objects wrapped by pybind11?

I'm wondering if there is any memory overhead of using C++ classes/structs wrapped by pybind11.
Let's consider a simple example:
struct Person {
    std::string name;
    int age;
};
// With some basic bindings
pybind11::class_<Person>(m, "Person")
    .def_readwrite("name", &Person::name)
    .def_readwrite("age", &Person::age);
In addition, there is a C++ function that returns millions of persons via a std::vector<Person>.
Technically, it is easy to add a pybind11 binding for the function, but is it a good idea to do so?
Wrapping the function returns a Python list of person instances.
In general in Python it is inefficient to have a large number of tiny objects, because of memory and GC overheads. The typical solution in Python is to opt for columnar memory layouts, but do these worries apply for classes/structs wrapped by pybind11 as well?
Specifically: If the function returns 1 million elements, will pybind11 internally create another 1 million wrapper instances or do the bindings operate directly on the C++ objects without any overhead?
Does the type of the members matter?

The pybind11 documentation says that STL containers are copied every time they cross the binding boundary. That means the containers are independent in Python and C++, so changes to the data in the C++ container will not be reflected in Python (there are no references). It also means the data is duplicated: 1 million elements in the C++ container and 1 million elements in the Python list.
See here - https://pybind11.readthedocs.io/en/stable/advanced/cast/stl.html
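To get a feel for the per-object cost the question worries about, here is a rough sketch in pure Python. It does not use pybind11 at all: the Person class below is only a hypothetical stand-in for one wrapper instance, and the sizes reported by sys.getsizeof are CPython-specific. As far as I know, each element of the list returned by the binding is its own Python object, so the same kind of overhead applies. The comparison is against a single columnar array of ages.

```python
import sys
from array import array


class Person:
    """Pure-Python stand-in for one wrapped Person instance (not pybind11)."""
    __slots__ = ("name", "age")

    def __init__(self, name, age):
        self.name = name
        self.age = age


def row_layout_bytes(n):
    people = [Person("someone", i % 100) for i in range(n)]
    # The list's pointer array plus one object header per element.
    # The name strings and age ints themselves are not counted.
    return sys.getsizeof(people) + sum(sys.getsizeof(p) for p in people)


def column_layout_bytes(n):
    # One contiguous buffer of C ints, no per-element Python object.
    ages = array("i", (i % 100 for i in range(n)))
    return sys.getsizeof(ages)


n = 100_000
row_bytes = row_layout_bytes(n)
column_bytes = column_layout_bytes(n)
print(f"{n} objects: {row_bytes} bytes; one int column: {column_bytes} bytes")
```

If the data is mostly consumed in bulk, returning columns (for example one array per member) avoids creating a Python object per element.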

Related

Magic functions to C functions in CPython

I am looking into the CPython implementation and learned how Python handles operator overloading (for example, comparison operators) using the richcmpfunc tp_richcompare; field in the _typeobject struct, where the type is defined as typedef PyObject *(*richcmpfunc) (PyObject *, PyObject *, int);. Whenever a PyObject needs to be operated on by one of these operators, CPython tries to call the tp_richcompare function.
My doubt is this: in Python we use magic functions like __gt__ etc. to override these operators. So how does the Python code get converted into C code as a tp_richcompare function that is used everywhere a comparison operator is interpreted for a PyObject?
My second doubt is a more general version of this: how does code in one language (here Python) that overrides things (operators, hash, etc.) interpreted in another language (C in the case of CPython) get called from that second language? As far as I know, generated bytecode is a low-level, instruction-based representation (essentially an array of uint8_t).
Another example of this is __hash__, which is defined in Python but is needed in the C-based implementation of the dictionary during lookdict. Again, a C function of type typedef Py_hash_t (*hashfunc)(PyObject *); is used everywhere a hash is needed for a PyObject, but the translation of __hash__ to this C function is mysterious.

Python code is not transformed into C code. It is interpreted by C code (in CPython), but that's a completely different concept.
There are many ways to interpret a Python program, and the language reference does not specify any particular mechanism. CPython does it by transforming each Python function into a list of virtual machine instructions, which can then be interpreted with a virtual machine emulator. That's one approach. Another one would be to just build the AST and then define a (recursive) evaluate method on each AST node.
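The AST-walking approach can be sketched in a few lines. This is only an illustration, not how CPython works: it borrows Python's own ast module as the parser and handles nothing but numeric constants and the four basic binary operators. The function name evaluate is my own.

```python
import ast
import operator

# Map AST operator node types to the functions that implement them.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def evaluate(node):
    """Recursively evaluate a tiny arithmetic subset of Python's AST."""
    if isinstance(node, ast.Expression):
        return evaluate(node.body)
    if isinstance(node, ast.Constant):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
        left = evaluate(node.left)
        right = evaluate(node.right)
        return _BINARY_OPS[type(node.op)](left, right)
    raise ValueError(f"unsupported node: {type(node).__name__}")


tree = ast.parse("1 + 2 * 3", mode="eval")
result = evaluate(tree)
print(result)
```

No C (or any other) source is produced at any point: the evaluator just walks the tree and calls the function that implements each operation.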
Of course, it would also be possible to transform the program into C code and compile the C code for future execution. (Here, "C" is not important. It could be any compiled language which seems convenient.) However, there's not much benefit to doing that, and lots of disadvantages. One problem, which I guess is the one behind your question, is that Python types don't correspond to any C primitive type. The only way to represent a Python object in C is to use a structure, such as CPython's PyObject, which is effectively a low-level mechanism for defining classes (a concept foreign to C): it includes a pointer to a type object, which contains a virtual method table, which in turn contains pointers to the functions used to implement the various operations on objects of that type. In effect, the compiled code will end up calling the same functions as the interpreter would call to implement each operation; the only purpose of the compiled C code is to sequence the calls without having to walk through an interpretable structure (VM list or AST or whatever). That might be slightly faster, since it avoids a switch statement on each AST node or VM operation, but it's also a lot bulkier, because a function call occupies a lot more space in memory than a single opcode byte.
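The dispatch through the type object can be observed from Python itself. As far as I understand CPython, when a class written in Python defines __gt__ or __hash__, the corresponding entries in the type's method table (tp_richcompare, tp_hash) point at generic C functions that look the Python method up on the type and call it. The sketch below, with a made-up Version class, shows the visible consequence: operators and dict lookups consult the type, not the instance.

```python
class Version:
    def __init__(self, number):
        self.number = number

    def __gt__(self, other):
        return self.number > other.number

    def __eq__(self, other):
        return isinstance(other, Version) and self.number == other.number

    def __hash__(self):
        return hash(self.number)


newer, older = Version(2), Version(1)

# An attribute stored on the instance is ignored by the operator:
# the interpreter goes through type(newer), i.e. the type's method table.
newer.__gt__ = lambda other: False
operator_result = newer > older                       # uses Version.__gt__
explicit_result = type(newer).__gt__(newer, older)    # the same call, spelled out

# A dict lookup needs a hash and an equality test; both come from the type.
releases = {Version(3): "stable"}
found = releases[Version(3)]
print(operator_result, explicit_result, found)
```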
An intermediate possibility, in common use these days, is to dynamically compile descriptions of programs (ASTs or VM lists or whatever) into actual machine code at runtime, taking into account what can be discovered about the actual dynamic types and values of the referenced variables and functions. That's called "just-in-time (JIT) compilation", and it can produce huge speedups at runtime, if it's implemented well. On the other hand, it's very hard to get it right, and discussing how to do it is well beyond the scope of a SO answer.
As a postscript, I understand from a different question that you are reading Robert Nystrom's book, Crafting Interpreters. That's probably a good way of learning these concepts, although I'm personally partial to a much older but still very current textbook, also freely available on the internet, The Structure and Interpretation of Computer Programs, by Gerald Sussman, Hal Abelson, and Julie Sussman. The books are not really comparable, but both attempt to explain what it means to "interpret a program", and that's an extremely important concept, which probably cannot be communicated in four paragraphs (the size of this answer).
Whichever textbook you use, it's important to not just read the words. You must do the exercises, which is the only way to actually understand the underlying concepts. That's a lot more time-consuming, but it's also a lot more rewarding. One of the weaknesses of Nystrom's book (although I would still recommend it) is that it lays out a complete implementation for you. That's great if you understand the concepts and are looking for something which you can tweak into a rapid prototype, but it leaves open the temptation of skipping over the didactic material, which is the most important part for someone interested in learning how computer languages work.

Tracking memory usage allocated within C++ wrapped by Cython

I have a Python program which calls some Cython code which in turn wraps some raw C++ code. I would like to see how much memory the base C++ code is allocating. I've tried the memory_profiler module for Python, however, it can't seem to detect anything allocated by the C++ code. My evidence for this is that I have a Cython object that in turn stores an instance of a C++ object. This C++ object should definitely hold onto a bunch of memory. In python, when I create an instance of the Cython object (and it stores an instance of the C++ object), memory_profiler does not detect any extra memory stored (or at least detects only a negligible amount).
Is there any other way to detect how much memory Python is having allocated by the base C++ objects? Or is there something similar to memory_profiler, but for Cython?

If you can run your program on Linux, use https://github.com/vmware/chap (for example, start with "summarize used").
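A much coarser alternative that needs no extra tools is to compare the process-level peak resident set size before and after creating the object, since that number includes native allocations. This is only a sketch under several assumptions: it uses the Unix-only resource module, the peak never decreases (so it only shows growth), and it measures the whole process rather than one object. The helper names are my own, and the large bytes object is just a stand-in for the Cython object that owns C++ memory.

```python
import resource
import sys


def peak_rss_bytes():
    """Peak resident set size of this process, as reported by the OS."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    return peak if sys.platform == "darwin" else peak * 1024


def peak_growth(make_object):
    """Return the object and how much the peak RSS grew while creating it."""
    before = peak_rss_bytes()
    obj = make_object()
    after = peak_rss_bytes()
    return obj, after - before


# Stand-in for constructing the Cython object that wraps the C++ instance.
obj, grown = peak_growth(lambda: b"\x01" * (64 * 1024 * 1024))
print(f"peak RSS grew by about {grown / 2**20:.1f} MiB")
```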

array returned from shared library in python - is this a memory leak?

I have a problem with a project I am working on and am not sure about the best way to resolve it.
Basically I am pushing a slow python algorithm into a c++ shared library that I am using to do a lot of the numerically intense stuff. One of the c++ functions is of the form:
const int* some_function(inputs){
    //does some stuff
    int *return_array = new int[10];
    // fills return array with a few values
    return return_array;
}
I.e., it returns an array here. This array is interpreted within python using numpy ndpointer as per:
lib.some_function.restype = ndpointer(dtype=c_int, shape=(10,))
I have a couple of questions that I have been fretting over for a while:
1) I have dynamically allocated memory here. Given that I am calling this function through the shared library and into python, do I cause a memory leak? My program is long running and I will likely call this function millions of times, so this is important.
2) Is there a better data structure I can be using? If this was a pure c++ function I would return a vector, but from googling around, this seems to be a non-ideal solution in python with ctypes. I also have other functions in the c++ library that call this function. Given that I have just written the function and am about to write the others, I know to delete[] the returned pointer after use in these functions. However, I am unsatisfied with the current situation, as if someone other than myself (or indeed myself in a few months) uses this function, there is a relatively high chance of future memory leaks.
Thanks!

Yes, you are leaking memory. It is not possible for the Python code to automatically free the pointed-to memory (since it has no idea how it was allocated). You need to provide a corresponding de-allocation function (to call delete[]) and tell Python how to call it (possibly using a wrapper framework as recommended by @RichardHidges).
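The calling pattern on the Python side might look like the sketch below. Since the real library is not available here, libc's malloc and free stand in for some_function and for the de-allocation function you would export yourself (called free_array in the comments, a hypothetical name). In the real code the release must happen inside your own library with delete[], not with free. ctypes.CDLL(None) is POSIX-only.

```python
import ctypes

# POSIX only: look up symbols already loaded into this process (libc).
libc = ctypes.CDLL(None)
libc.malloc.restype = ctypes.c_void_p
libc.malloc.argtypes = [ctypes.c_size_t]
libc.free.restype = None
libc.free.argtypes = [ctypes.c_void_p]

N = 10


def some_function_standin():
    """Stand-in for the C++ some_function: returns a heap array the caller owns."""
    ptr = libc.malloc(N * ctypes.sizeof(ctypes.c_int))
    if not ptr:
        raise MemoryError("malloc failed")
    values = ctypes.cast(ptr, ctypes.POINTER(ctypes.c_int))
    for i in range(N):
        values[i] = i * i
    return ptr


def call_and_copy():
    """Call the native function, copy the result, then always release it."""
    ptr = some_function_standin()
    try:
        values = ctypes.cast(ptr, ctypes.POINTER(ctypes.c_int))
        return [values[i] for i in range(N)]  # Python-owned copy
    finally:
        # With the real library: lib.free_array(ptr), a hypothetical
        # exported function whose body is `delete[] ptr;`.
        libc.free(ptr)


print(call_and_copy())
```

Copying into a Python-owned object before releasing the pointer means no Python code ever holds a reference to memory that has already been freed.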

You probably want to consider using either SWIG or boost::python.
There's an example of converting a std::vector to a Python list using boost::python here:
std::vector to boost::python::list
Here is the link for SWIG:
http://www.swig.org

Is there a way to generate a c++ class from a python class and bind it at compile time?

Is there a way to generate a relatively clean c++ class from a python class and bind it at compile-time?
For instance, if I have this python class:
class CarDef:
    acceleration = 1000.0
    brake = 1500.0
    inertia = acceleration * 0.1 * brake
    def __init__(self):
        pass
I'd like to have the corresponding c++ class:
class CarDef
{
public:
    double acceleration;
    double brake;
    double inertia;
    CarDef()
        : acceleration( 1000.0 )
        , brake( 1500.0 )
        , inertia ( 150000.0 )
    {};
};
The resulting c++ class could be different, as well as the original python class: I could use a "getter methods" paradigm instead of class attributes.
What I'm trying to achieve is to create resource files in python that I'll be able to use in my c++ application. The goal is to reduce as much as possible the amount of code the end-user will have to write to add and use parameters; and it must avoid string comparison during the "running phase" (it's allowed during the "initialization phase").
I'd like the user to have to enter the resource name only twice: once in the python class, and once in the place where the resource will be used in the c++, assuming that the "magic" is going to bind the two items (either at run-time (which I doubt could be done without string comparison) or at compile time (an in-between step generates c++ class before the project is compiled)). This is why I'm going from python to c++; I believe that going from c++ to python would require at least 2 python files: one that is generated and one that inherits from the latter (to avoid overwriting already specified resources).
The end-user use would look like this:
// A singleton definition manager
class DefManager
{
    CarDef mCarDef;
public:
    static DefManager& GetReference()
    {
        static DefManager instance;
        return instance;
    }
    CarDef& getCarDef() { return mCarDef; }
};
// One would use the generated CarDef class like this:
mCar.setSpeed( mCar.getSpeed() + DefManager::GetReference().getCarDef().acceleration );
With this in mind, the python code is strictly outside of the c++ code.
One obvious problem I see is how to know what type a python attribute or method returns. I've seen a few examples of Cython, and it seems to be able to use types (which is great!), but I haven't seen any examples where it could do what I need. Also, the generated C code still seems to need Python.h, and thus the CPython API libraries, when compiling.
Is there any way I could achieve this? Is there a better way to do it?
I'm using python 3.2+.
I'm using MS Visual Studio 2010 (we plan to upgrade to 2013 soon).
I'm on a 32 bit system (but we plan to upgrade to 64 bit soon, OS and developed software).

There is a way to go from C++ to Python but I do not know of any way of going from Python to C++. If you don't mind writing your code in C++ first, you can use the tool SWIG to auto-generate Python classes for you.
Do note there are a few limitations around exception handling. You can set up to have your Python code throw C++ exceptions but the type of exception can be lost in translation. You also need to pay attention to handling of reference counted objects. SWIG will generate reference counting for Python which can sometimes delete objects unexpectedly.
If you don't like using a tool such as SWIG, there is also Boost.Python for C++. Again, this is C++ for Python bindings and does not auto generate C++ from Python.

You could embed python in your C++ code or vice versa. There are tons of helper functions which, though a little ugly, can be very powerful and might be able to accomplish what you want, though I'm not sure I entirely understand your question. This doesn't require the cython api, but does still require Python.h.

There is a logical problem with doing what you ask.
Python is dynamically typed.
In python one can even change the type of a certain data member at run time.
So say you have two objects of type
CarDef
Let's call them obj1 and obj2.
Let's say you have a setter:
def setIntX(self):
    self.x = 5
and let's say you also have a setter:
def setStringX(self):
    self.x = "5"
Then what type will member x have in your C++ class?
This can only be decided during run time, and more than one C++ class might be necessary to model one python class.
However a template class from python might be possible, and quite interesting actually.
Also, maybe a general solution is not possible, but if you assume no member has an ambiguous type, it is possible.
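Under that assumption (every class attribute has one unambiguous type, fixed when the class is defined), the in-between generation step from the question can be sketched as a small Python script. Everything here is illustrative: generate_cpp_class, the type mapping and the output layout are my own choices, only bool, int, float and str attributes are handled, and special float values such as inf or nan are not.

```python
class CarDef:
    acceleration = 1000.0
    brake = 1500.0
    inertia = acceleration * 0.1 * brake


# Assumed mapping from Python value types to C++ member types.
_CPP_TYPES = {bool: "bool", int: "int", float: "double", str: "std::string"}


def _cpp_literal(value):
    """Render a Python value as a C++ literal (simple cases only)."""
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, str):
        return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'
    return repr(value)


def generate_cpp_class(cls):
    """Emit C++ source for a class with one member per public attribute."""
    fields = [
        (name, value)
        for name, value in vars(cls).items()
        if not name.startswith("_") and type(value) in _CPP_TYPES
    ]
    lines = [f"class {cls.__name__}", "{", "public:"]
    lines += [f"    {_CPP_TYPES[type(value)]} {name};" for name, value in fields]
    lines.append(f"    {cls.__name__}()")
    inits = [f"{name}({_cpp_literal(value)})" for name, value in fields]
    if inits:
        lines.append("        : " + "\n        , ".join(inits))
    lines += ["    {}", "};"]
    return "\n".join(lines)


cpp_source = generate_cpp_class(CarDef)
print(cpp_source)
```

Run as a pre-build step, the output would be written to a header that the C++ project includes, so the resource name appears once in the Python class and once at the point of use, with no string comparison at run time.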

Python extensions - performance

I am using Boost.Python to extend python program functionality. Python scripts do a lot of calls to native modules so I am really concerned about the performance of python-to-cpp type conversion and data marshaling.
I decided to try exposing methods natively through the Python C API. Maybe somebody has already tried that before? Any success... at least in theory?
The problem I run into is how to convert a PyObject* back to a class instance. PyArg_ParseTuple provides the O& option, but what I am looking for is simply a pointer to the C++ object in memory. How can I get it in the function?
if ( PyArg_ParseTuple(args, "O", &pyTestClass ) )
{
    // how to get TestClass from pyTestClass ??
}
Thanks

I haven't tried Boost.Python, but I've extended Python using raw C as well as Cython. I recommend Cython; if you're careful enough you can get code with the same efficiency as raw C but with a lot less boilerplate code.
Regarding efficiency, it's relative. It depends on what you want to do and how you do it. For example, what I've done very often is write the inner loop of some image processing or matrix operation in C, and have this function be called by Python with pointers to matrices as arguments. The matrices themselves don't get copied, so the overhead is minimal.
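To illustrate the no-copy point without a compiled extension, here is a small sketch using only ctypes and the array module. The function scale_in_place is a pure-Python stand-in for the native inner loop; what matters is that it receives a raw pointer into the existing buffer rather than a copy of the data.

```python
import ctypes
from array import array

matrix = array("d", [1.0, 2.0, 3.0, 4.0])

# View the same buffer as a C array of doubles; no elements are copied.
c_view = (ctypes.c_double * len(matrix)).from_buffer(matrix)
# This pointer is what a native function would receive as its argument.
pointer = ctypes.cast(c_view, ctypes.POINTER(ctypes.c_double))


def scale_in_place(data, count, factor):
    """Stand-in for the native inner loop: writes through the raw pointer."""
    for i in range(count):
        data[i] *= factor


scale_in_place(pointer, len(matrix), 2.0)
print(list(matrix))
```

The original array sees the changes because both names refer to the same memory, which is why the per-call overhead stays small even for large matrices.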
