Interfacing C/C++ libraries with Python

Interfacing C/C++ libraries with Python - python

I have a C++ library that I need to be able to interface with python. I read this question to understand the choice I need to adapt.
I saw SWIG and Cython and wanted to go with SWIG, mainly because my python programming experience is very minimal. However, I realise that with Swig I have to write an interface (.i extensions) for every class. Now, my C++ project is huge and I feel it will take me a lot of time to get the wrappers in place (or maybe I am wrong).
So right now since my application is large I need to make a choice. In the quoted thread I came across Boost Python. Now I can no longer decide and want input from people who can tell me the pros and cons of one over the other. Note my preference is on easy of use and how quickly can it be done. I am willing to compromise system performance for this. I would appreciate immensely if someone could provide me a SWIG implemented project or Boost Python implemented project link (a complete module instead of a sample tutorial would be much better !)

Boost::python provides a nearly wrapper-less interface between C++ and Python. It also allows you to write custom converters and other neat things that make the Python interfaces much nicer. The interfaces are pure C++, but they rely on templates and clever design patterns to make it look all nice and declarative. You also get the benefit of your connector code being checked by the compiler directly.
With Swig, you write interface declarations in Swig's own DSL, which takes a few days to get a hang of. In addition, it always inserts a wrapper layer, so it could be a bit slower. However, it does have the nice feature of automatically converting many things for ya without having to declare anything extra. The wrappers it generates are quite hard to debug though.
IMHO boost::python is the better choice, because you're working pretty directly with CPython's native C interfaces. I use Swig for Java and C++ interaction, because JNI is a bear, Python's C interface is actually quite usable all by itself.
If you already have a bunch of Swig wrappers, I would keep those because you'd have to redo all that work. However, starting a new project, or if you require maximum performance, boost::python all the way!

Related

Wrapping C++ class to use in Python

I have a device which can be controlled through a C++ class (https://github.com/stanleyseow/RF24/tree/master/RPi/RF24).
I'd like to be able to use this class in Python, and thought I could wrap it.
I found many ways to do it, but not much detailed documentation with examples. In particular, I found Boost, Cython, SWIG and the native C Python API.
Which one is the best method in which case ? And do you have some links to detailed documentations / examples about this ?
Thanks !

There is no "best"; it depends entirely on your circumstances.
For a single class, the native C Python API isn't too difficult,
but you do have to create an entire module, then the class. It
would be simpler if you exposed a procedural interface, rather
than a class. If you only have one instance of the device, this
would be an appropriate solution.
SWIG is very good for taking C++ class definitions and
generating a Python module which contains them. The resulting
code is relatively complex, since SWIG tries to cover all
possible versions of Python; for anything 2.7 or later (and
perhaps a little earlier), you can do everything directly in
C++, without any intermediate Python.
Boost makes extensive use of templates. This isn't really an
appropriate solution for the problem; it adds a lot of
complexity for something that is relatively simple if done with
external tools, rather than metaprogramming. Still, if the
underlying complexity doesn't scare you, it might not be that
hard to use.
I'm not familiar with Cython.
Globally, if all you have is one instance of one simple class,
using the native C API is probably no more difficult than the
other solutions, and introduces a minimum of added internal
complexity.

Are there advantages to use the Python/C interface instead of Cython?

I want to extend python and numpy by writing some modules in C or C++, using BLAS and LAPACK. I also want to be able to distribute the code as standalone C/C++ libraries. I would like this libraries to use both single and double precision float. Some examples of functions I will write are conjugate gradient for solving linear systems or accelerated first order methods. Some functions will need to call a Python function from the C/C++ code.
After playing a little with the Python/C API and the Numpy/C API, I discovered that many people advocate the use of Cython instead (see for example this question or this one). I am not an expert about Cython, but it seems that for some cases, you still need to use the Numpy/C API and know how it works. Given the fact that I already have (some little) knowledge about the Python/C API and none about Cython, I was wondering if it makes sense to keep on using the Python/C API, and if using this API has some advantages over Cython. In the future, I will certainly develop some stuff not involving numerical computing, so this question is not only about numpy. One of the thing I like about the Python/C API is the fact that I learn some stuff about how the Python interpreter is working.
Thanks.

The current "top answer" sounds a bit too much like FUD in my ears. For one, it is not immediately obvious that the Average Developer would write faster code in C than what NumPy+Cython gives you anyway. Quite the contrary, the time it takes to even get the necessary C code to work correctly in a Python environment is usually much better invested in writing a quick prototype in Cython, benchmarking it, optimising it, rewriting it in a faster way, benchmarking it again, and then deciding if there is anything in it that truly requires the 5-10% more performance that you may or may not get from rewriting 2% of the code in hand-tuned C and calling it from your Cython code.
I'm writing a library in Cython that currently has about 18K lines of Cython code, which translate to almost 200K lines of C code. I once managed to get a speed-up of almost 25% for a couple of very important internal base level functions, by injecting some 20 lines of hand-tuned C code in the right places. It took me a couple of hours to rewrite and optimise this tiny part. That's truly nothing compared to the huge amount of time I saved by not writing (and having to maintain) the library in plain C in the first place.
Even if you know C a lot better than Cython, if you know Python and C, you will learn Cython so quickly that it's worth the investment in any case, especially when you are into numerics. 80-95% of the code you write will benefit so much from being written in a high-level language, that you can safely lay back and invest half of the time you saved into making your code just as fast as if you had written it in a low-level language right away.
That being said, your comment that you want "to be able to distribute the code as standalone C/C++ libraries" is a valid reason to stick to plain C/C++. Cython always depends on CPython, which is quite a dependency. However, using plain C/C++ (except for the Python interface) will not allow you to take advantage of NumPy either, as that also depends on CPython. So, as usual when writing something in C, you will have to do a lot of ground work before you get to the actual functionality. You should seriously think about this twice before you start this work.

First, there is one point in your question I don't get:
[...] also want to be able to distribute the code as standalone C/C++ libraries. [...] Some functions will need to call a Python function from the C/C++ code.
How is this supposed to work?
Next, as to your actual question, there are certainly advantages of using the Python/C API directly:
Most likely, you are more familar with writing C code than writing Cython code.
Writing your code in C gives you maximum control. To get the same performance from Cython code as from equivalent C code, you'll have to be very careful. You'll not only need to make sure to declare the types of all variables, you'll also have to set some flags adequately -- just one example is bounds checking. You will need intimate knowledge how Cython is working to get the best performance.
Cython code depends on Python. It does not seem to be a good idea to write code that should also be distributed as standalone C library in Cython

The main disadvantage of the Python/C API is that it can be very slow if it's used in an inner loop. I'm seeing that calling a Python function takes a 80-160x hit over calling an equivalent C++ function.
If that doesn't bother your code then you benefit from being able to write some chunks of code in Python, have access to Python libraries, support callbacks written directly in Python. That also means that you can make some changes without recompiling, making prototyping easier.

Python vs C : Line of Code Comparison vs Dev Time

Hi I'm currently learning Python since the syntax feels so succinct and the idioms match well with my mental model.
However I'm also interested in learning about OS internals and reverse engineering software, which ultimately means knowing C in a rather thorough capacity.
When originally picking a language I did lots of reading and comparisons, and it seems that a number thrown out a lot is that to write short idiomatic statements in Python would require the equivalent of a few hundred lines of C (I'd guess code for memory management, writing the code for dictionaries,lists etc) that we take for granted as built into the Python language.
1) With an average C programmer, is that 100-200 lines of code per Python idiom anywhere near accurate?
Because C doesn't come built-in with Python-like constructs such as dictionaries/lists(with all their nice methods etc):
2) Do C programmers tend to build these constructs from scratch and then re-use them between projects to greatly reduce the actual amount of hand coding for their projects?
I assume re-using libraries like boost:: stuff also again, reduces some of the boilerplate hand coding also...
3) But does using popular libraries and re-using common code one has written before in C for basic constructs/etc, how much does that revise the lines of code written in C compared to the code in Python of a enthusiast sized code base?
I know specific numbers aren't possible, but is it possible with libraries, code reuse etc, to have a development time in C close to that of Python without being a Linus Torvalds style coding machine?
Thanks!

but is it possible with libraries, code reuse etc, to have a development time in C close to that of Python
No.
You've missed the most important point.
Python's interactive. It's not edit-compile-link-execute-break-debug. It's edit-debug.

Boost is C++, not C (emphatically not C -- virtually all of it makes heavy use of templates and such that aren't part of C).
Yes, C programmers tend to build up personal libraries of code for all sorts of "stuff" -- data structures, algorithms, user interfaces, and so on. There are also a fair number of other libraries for everything from basic string manipulation to database connectivity, user interfaces, basic algorithms and data structures, etc.
Comparing productivity between the two can be difficult though -- even if something can be done in one line of code either way, there's a greater chance that the C programmer will end up doing extra work to find and learn to use that particular library. OTOH, if he has used it before, the two might be directly competitive of (in a few cases) C might be more productive.
I'd guess Python ends up more productive more often, but trying to guess how much so is difficult (and lines of code usually won't be a good indication either).

As I did serious c programming I read a book that claimed libraries are worth to write. (Especially in C which considered a low level language)
Libraries are build for reuse.
If you use libraries you write one line like detectFace( faceDesriptor ) or renderPDF( document) is doesn't matter whether an idiom in another language is more concise or not.
Lines of code isn't a proper metric if it is about what would more efficient.

It depends.
Try to write an interrupt handler in python. Someone could probably make it work but it's going to be a dancing bear, the dancing is not good but it is surprising that a bear can do it. Want to write an OS or do some embedded programming you're not going to be able to use python. It's telling that the main python implementation is written in C.
That being said I'm amazed at some of the low-level stuff that you can do with python. The high-level stuff is almost a given if you're measuring lines of code. Python is just a higher-level language.
They are both very useful tools, just for different types of projects. Knowing both would be very useful, particularly when you need to interface to some new functionality in python that doesn't yet have a python binding.
For the types of projects most developers work on python is going to be more consice and quicker to write and debug. You may be able to make a library of reusable C code, but a good python programmer will be doing the same thing with their python code, at a higher level.

I think Python is more productive for small projects (up to a few thousand lines of code).
On the other hand, C is better suited for large projects (even though IMHO there are better languages for that, such as Ada): static type-checking allows to find many errors at compile time that are much more difficult to detect at run-time, especially in a large program.
In a larger C project, the lack of lists and other powerful data structures that are found in Python can be compensated by implementing or using custom libraries. I agree with user stacker that by using well-designed libraries your C code can be pretty concise.

Depends greatly on the task and the size of the project. For many small interesting tasks, I would not be surprised by 100:1 smaller Python code simply because the standard libraries are extremely good. If you find, buy, or build C/C++ libraries that do what you want, I imagine the ratio would be much more like 3:1 on big projects.
However, finding, buying, and building C/C++ libraries does take time and effort, so I believe in the vast majority of cases, Python is going to be much faster to develop in.

Writing bindings and wrappers

I keep seeing people writing wrappers for, say a module written in X language to use it in Y language. I wanted to know the basics of writing such wrappers. Where does one start from? My question here is more specific for libgnokii, how do I begin to write python bindings for it.

You can start with reading this: extending python with c or c++ And then when you decide that it's too much hassle, you can check out swig or possibly Boost.Python.
ctypes may also be useful.
I've done manual wrapping of c++ classes and I've used swig. swig was much easier to use, but in the end I wanted to do stuff that wasn't easily done (or I was just too lazy to figure out how). So i ended up doing manual wrapping. It's a bit of work but if you know a bit of C, it's very doable.

You can start by looking here for information on extending Python with C. You'll probably want to think about how to translate libgnokii's API into something Pythonic while you're at it. If you don't want to do a lot of work, you can just write a thin wrapper that translates all the gnokii API calls into Python functions.

Python: SWIG vs ctypes

In python, under what circumstances is SWIG a better choice than ctypes for calling entry points in shared libraries? Let's assume you don't already have the SWIG interface file(s). What are the performance metrics of the two?

I have a rich experience of using swig. SWIG claims that it is a rapid solution for wrapping things. But in real life...
Cons:
SWIG is developed to be general, for everyone and for 20+ languages. Generally, it leads to drawbacks:
- needs configuration (SWIG .i templates), sometimes it is tricky,
- lack of treatment of some special cases (see python properties further),
- lack of performance for some languages.
Python cons:
1) Code style inconsistency. C++ and python have very different code styles (that is obvious, certainly), the possibilities of a swig of making target code more Pythonish is very limited. As an example, it is butt-heart to create properties from getters and setters. See this q&a
2) Lack of broad community. SWIG has some good documentation. But if one caught something that is not in the documentation, there is no information at all. No blogs nor googling helps. So one has to heavily dig SWIG generated code in such cases... That is terrible, I could say...
Pros:
In simple cases, it is really rapid, easy and straight forward
If you produced swig interface files once, you can wrap this C++ code to ANY of other 20+ languages (!!!).
One big concern about SWIG is a performance. Since version 2.04 SWIG includes '-builtin' flag which makes SWIG even faster than other automated ways of wrapping. At least some benchmarks shows this.
When to USE SWIG?
So I concluded for myself two cases when the swig is good to use:
2) If one needs to wrap C++ code for several languages. Or if potentially there could be a time when one needs to distribute the code for several languages. Using SWIG is reliable in this case.
1) If one needs to rapidly wrap just several functions from some C++ library for end use.
Live experience
Update :
It is a year and a half passed as we did a conversion of our library by using SWIG.
First, we made a python version. There were several moments when we experienced troubles with SWIG - it is true. But right now we expanded our library to Java and .NET. So we have 3 languages with 1 SWIG. And I could say that SWIG rocks in terms of saving a LOT of time.
Update 2:
It is two years as we use SWIG for this library. SWIG is integrated into our build system. Recently we had major API change of C++ library. SWIG worked perfectly. The only thing we needed to do is to add several %rename to .i files so our CppCamelStyleFunctions() now looks_more_pythonish in python. First I was concerned about some problems that could arise, but nothing went wrong. It was amazing. Just several edits and everything distributed in 3 languages. Now I am confident that it was a good solution to use SWIG in our case.
Update 3:
It is 3+ years we use SWIG for our library. Major change: python part was totally rewritten in pure python. The reason is that Python is used for the majority of applications of our library now. Even if the pure python version works slower than C++ wrapping, it is more convenient for users to work with pure python, not struggling with native libraries.
SWIG is still used for .NET and Java versions.
The Main question here "Would we use SWIG for python if we started the project from the beginning?". We would! SWIG allowed us to rapidly distribute our product to many languages. It worked for a period of time which gave us the opportunity for better understanding our users requirements.

SWIG generates (rather ugly) C or C++ code. It is straightforward to use for simple functions (things that can be translated directly) and reasonably easy to use for more complex functions (such as functions with output parameters that need an extra translation step to represent in Python.) For more powerful interfacing you often need to write bits of C as part of the interface file. For anything but simple use you will need to know about CPython and how it represents objects -- not hard, but something to keep in mind.
ctypes allows you to directly access C functions, structures and other data, and load arbitrary shared libraries. You do not need to write any C for this, but you do need to understand how C works. It is, you could argue, the flip side of SWIG: it doesn't generate code and it doesn't require a compiler at runtime, but for anything but simple use it does require that you understand how things like C datatypes, casting, memory management and alignment work. You also need to manually or automatically translate C structs, unions and arrays into the equivalent ctypes datastructure, including the right memory layout.
It is likely that in pure execution, SWIG is faster than ctypes -- because the management around the actual work is done in C at compiletime rather than in Python at runtime. However, unless you interface a lot of different C functions but each only a few times, it's unlikely the overhead will be really noticeable.
In development time, ctypes has a much lower startup cost: you don't have to learn about interface files, you don't have to generate .c files and compile them, you don't have to check out and silence warnings. You can just jump in and start using a single C function with minimal effort, then expand it to more. And you get to test and try things out directly in the Python interpreter. Wrapping lots of code is somewhat tedious, although there are attempts to make that simpler (like ctypes-configure.)
SWIG, on the other hand, can be used to generate wrappers for multiple languages (barring language-specific details that need filling in, like the custom C code I mentioned above.) When wrapping lots and lots of code that SWIG can handle with little help, the code generation can also be a lot simpler to set up than the ctypes equivalents.

CTypes is very cool and much easier than SWIG, but it has the drawback that poorly or malevolently-written python code can actually crash the python process. You should also consider boost python. IMHO it's actually easier than swig while giving you more control over the final python interface. If you are using C++ anyway, you also don't add any other languages to your mix.

In my experience, ctypes does have a big disadvantage: when something goes wrong (and it invariably will for any complex interfaces), it's a hell to debug.
The problem is that a big part of your stack is obscured by ctypes/ffi magic and there is no easy way to determine how did you get to a particular point and why parameter values are what they are..

You can also use Pyrex, which can act as glue between high-level Python code and low-level C code. lxml is written in Pyrex, for instance.

ctypes is great, but does not handle C++ classes. I've also found ctypes is about 10% slower than a direct C binding, but that will highly depend on what you are calling.
If you are going to go with ctypes, definitely check out the Pyglet and Pyopengl projects, that have massive examples of ctype bindings.

I'm going to be contrarian and suggest that, if you can, you should write your extension library using the standard Python API. It's really well-integrated from both a C and Python perspective... if you have any experience with the Perl API, you will find it a very pleasant surprise.
Ctypes is nice too, but as others have said, it doesn't do C++.
How big is the library you're trying to wrap? How quickly does the codebase change? Any other maintenance issues? These will all probably affect the choice of the best way to write the Python bindings.

Just wanted to add a few more considerations that I didn't see mentioned yet.
[EDIT: Ooops, didn't see Mike Steder's answer]
If you want to try using a non Cpython implementation (like PyPy, IronPython or Jython), then ctypes is about the only way to go. PyPy doesn't allow writing C-extensions, so that rules out pyrex/cython and Boost.python. For the same reason, ctypes is the only mechanism that will work for IronPython and (eventually, once they get it all working) jython.
As someone else mentioned, no compilation is required. This means that if a new version of the .dll or .so comes out, you can just drop it in, and load that new version. As long as the none of the interfaces changed, it's a drop in replacement.

Something to keep in mind is that SWIG targets only the CPython implementation. Since ctypes is also supported by the PyPy and IronPython implementations it may be worth writing your modules with ctypes for compatibility with the wider Python ecosystem.

I have found SWIG to be be a little bloated in its approach (in general, not just Python) and difficult to implement without having to cross the sore point of writing Python code with an explicit mindset to be SWIG friendly, rather than writing clean well-written Python code. It is, IMHO, a much more straightforward process to write C bindings to C++ (if using C++) and then use ctypes to interface to any C layer.
If the library you are interfacing to has a C interface as part of the library, another advantage of ctypes is that you don't have to compile a separate python-binding library to access third-party libraries. This is particularly nice in formulating a pure-python solution that avoids cross-platform compilation issues (for those third-party libs offered on disparate platforms). Having to embed compiled code into a package you wish to deploy on something like PyPi in a cross-platform friendly way is a pain; one of my most irritating points about Python packages using SWIG or underlying explicit C code is their general inavailability cross-platform. So consider this if you are working with cross-platform available third party libraries and developing a python solution around them.
As a real-world example, consider PyGTK. This (I believe) uses SWIG to generate C code to interface to the GTK C calls. I used this for the briefest time only to find it a real pain to set up and use, with quirky odd errors if you didn't do things in the correct order on setup and just in general. It was such a frustrating experience, and when I looked at the interace definitions provided by GTK on the web I realized what a simple excercise it would be to write a translator of those interface to python ctypes interface. A project called PyGGI was born, and in ONE day I was able to rewrite PyGTK to be a much more functiona and useful product that matches cleanly to the GTK C-object-oriented interfaces. And it required no compilation of C-code making it cross-platform friendly. (I was actually after interfacing to webkitgtk, which isn't so cross-platform). I can also easily deploy PyGGI to any platform supporting GTK.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.