I have a device which can be controlled through a C++ class (https://github.com/stanleyseow/RF24/tree/master/RPi/RF24).
I'd like to be able to use this class in Python, and thought I could wrap it.
I found many ways to do it, but not much detailed documentation with examples. In particular, I found Boost, Cython, SWIG and the native C Python API.
Which method is best in which case? And do you have links to detailed documentation or examples for them?
Thanks!
There is no "best"; it depends entirely on your circumstances.
For a single class, the native C Python API isn't too difficult,
but you do have to create an entire module, then the class. It
would be simpler if you exposed a procedural interface, rather
than a class. If you only have one instance of the device, this
would be an appropriate solution.
SWIG is very good for taking C++ class definitions and
generating a Python module which contains them. The resulting
code is relatively complex, since SWIG tries to cover all
possible versions of Python; for anything 2.7 or later (and
perhaps a little earlier), you can do everything directly in
C++, without any intermediate Python.
Boost makes extensive use of templates. This isn't really an
appropriate solution for the problem; it adds a lot of
complexity for something that is relatively simple if done with
external tools, rather than metaprogramming. Still, if the
underlying complexity doesn't scare you, it might not be that
hard to use.
I'm not familiar with Cython.
Globally, if all you have is one instance of one simple class,
using the native C API is probably no more difficult than the
other solutions, and introduces a minimum of added internal
complexity.
I have a C++ library that I need to be able to interface with from Python. I read this question to understand the choice I need to make.
I looked at SWIG and Cython and wanted to go with SWIG, mainly because my Python programming experience is very minimal. However, I realise that with SWIG I have to write an interface (.i file) for every class. Now, my C++ project is huge, and I feel it will take me a lot of time to get the wrappers in place (or maybe I am wrong).
So right now, since my application is large, I need to make a choice. In the quoted thread I came across Boost.Python. Now I can no longer decide and want input from people who can tell me the pros and cons of one over the other. Note that my preference is ease of use and how quickly it can be done; I am willing to compromise system performance for this. I would appreciate it immensely if someone could provide a link to a SWIG-based or Boost.Python-based project (a complete module instead of a sample tutorial would be much better!).
Boost::python provides a nearly wrapper-less interface between C++ and Python. It also allows you to write custom converters and other neat things that make the Python interfaces much nicer. The interfaces are pure C++, but they rely on templates and clever design patterns to make it all look nice and declarative. You also get the benefit of your connector code being checked directly by the compiler.
With SWIG, you write interface declarations in SWIG's own DSL, which takes a few days to get the hang of. In addition, it always inserts a wrapper layer, so it could be a bit slower. However, it does have the nice feature of automatically converting many things for you without your having to declare anything extra. The wrappers it generates are quite hard to debug, though.
IMHO boost::python is the better choice, because you're working pretty directly with CPython's native C interfaces. I use SWIG for Java and C++ interaction because JNI is a bear; Python's C interface is actually quite usable all by itself.
If you already have a bunch of SWIG wrappers, I would keep those, because otherwise you'd have to redo all that work. However, when starting a new project, or if you require maximum performance: boost::python all the way!
I want to extend Python and NumPy by writing some modules in C or C++, using BLAS and LAPACK. I also want to be able to distribute the code as standalone C/C++ libraries, and I would like these libraries to use both single- and double-precision floats. Some examples of functions I will write are conjugate gradient for solving linear systems, and accelerated first-order methods. Some functions will need to call a Python function from the C/C++ code.
After playing a little with the Python/C API and the NumPy/C API, I discovered that many people advocate the use of Cython instead (see for example this question or this one). I am not an expert on Cython, but it seems that for some cases you still need to use the NumPy/C API and know how it works. Given that I already have (some little) knowledge of the Python/C API and none of Cython, I was wondering whether it makes sense to keep on using the Python/C API, and whether using this API has some advantages over Cython. In the future, I will certainly develop some stuff not involving numerical computing, so this question is not only about NumPy. One of the things I like about the Python/C API is that I learn something about how the Python interpreter works.
Thanks.
The current "top answer" sounds a bit too much like FUD to my ears. For one thing, it is not immediately obvious that the average developer would write faster code in C than what NumPy + Cython gives you anyway. Quite the contrary: the time it takes to even get the necessary C code to work correctly in a Python environment is usually much better invested in writing a quick prototype in Cython, benchmarking it, optimising it, rewriting it in a faster way, benchmarking it again, and then deciding whether there is anything in it that truly requires the 5-10% more performance that you may or may not get from rewriting 2% of the code in hand-tuned C and calling it from your Cython code.
I'm writing a library in Cython that currently has about 18K lines of Cython code, which translate to almost 200K lines of C code. I once managed to get a speed-up of almost 25% for a couple of very important internal base level functions, by injecting some 20 lines of hand-tuned C code in the right places. It took me a couple of hours to rewrite and optimise this tiny part. That's truly nothing compared to the huge amount of time I saved by not writing (and having to maintain) the library in plain C in the first place.
Even if you know C a lot better than Cython, if you know Python and C, you will learn Cython so quickly that it's worth the investment in any case, especially when you are into numerics. 80-95% of the code you write will benefit so much from being written in a high-level language, that you can safely lay back and invest half of the time you saved into making your code just as fast as if you had written it in a low-level language right away.
That being said, your comment that you want "to be able to distribute the code as standalone C/C++ libraries" is a valid reason to stick to plain C/C++. Cython always depends on CPython, which is quite a dependency. However, using plain C/C++ (except for the Python interface) will not allow you to take advantage of NumPy either, as that also depends on CPython. So, as usual when writing something in C, you will have to do a lot of ground work before you get to the actual functionality. You should seriously think about this twice before you start this work.
First, there is one point in your question I don't get:
[...] also want to be able to distribute the code as standalone C/C++ libraries. [...] Some functions will need to call a Python function from the C/C++ code.
How is this supposed to work?
Next, as to your actual question, there are certainly advantages of using the Python/C API directly:
Most likely, you are more familiar with writing C code than with writing Cython code.
Writing your code in C gives you maximum control. To get the same performance from Cython code as from equivalent C code, you'll have to be very careful. You'll not only need to declare the types of all variables, you'll also have to set some flags adequately -- just one example is bounds checking (see the sketch after this list). You will need intimate knowledge of how Cython works to get the best performance.
Cython code depends on Python. It does not seem to be a good idea to write code in Cython that should also be distributed as a standalone C library.
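As a minimal sketch of the bounds-checking point above, here is what those declarations and flags can look like in Cython's "pure Python" mode (assuming a recent Cython that supports it; the dot() function is an invented example, and the file remains valid Python):

import cython

@cython.boundscheck(False)   # disable index bounds checking in this function
@cython.wraparound(False)    # disable negative-index handling
def dot(a: cython.double[:], b: cython.double[:]) -> cython.double:
    total: cython.double = 0.0
    for i in range(a.shape[0]):
        total += a[i] * b[i]   # compiles to a plain C loop once types are declared
    return total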
The main disadvantage of the Python/C API is that it can be very slow if it's used in an inner loop. I'm seeing an 80-160x hit for calling a Python function versus an equivalent C++ function.
If that doesn't bother your code, then you benefit from being able to write some chunks of code in Python, access Python libraries, and support callbacks written directly in Python. That also means you can make some changes without recompiling, which makes prototyping easier.
I would be interested to learn about large-scale development in Python, and especially how you maintain a large code base.
When you make incompatible changes to the signature of a method, how do you find all the places where that method is being called? In C++/Java the compiler will find them for you; how do you do it in Python?
When you make changes deep inside the code, how do you find out what operations an instance provides, since you don't have a static type to look up?
How do you handle/prevent typing errors (typos)?
Are unit tests used as a substitute for static type checking?
As you can guess, I have almost only worked with statically typed languages (C++/Java), but I would like to try my hand at Python for larger programs. However, I had a very bad experience, a long time ago, with the Clipper (dBase) language, which was also dynamically typed.
Don't use a screwdriver as a hammer
Python is not a statically typed language, so don't try to use it that way.
When you use a specific tool, you use it for what it was built for. For Python, that means:
Duck typing: no type checking, only behavior matters. Your code must therefore be designed to use this feature. A good design means generic signatures, no dependencies between components, and high abstraction levels, so if you change anything, you won't have to change the rest of the code. Python will not complain either; that's what it was built for. Types are not an issue (see the sketch after this list).
Huge standard library. You do not need to change all your calls in the program if you use standard features you haven't coded yourself, and Python comes with batteries included. I keep discovering them every day. I had no idea of the number of modules I could use when I started and, like everybody, tried to rewrite existing stuff. It's OK, you can't get it all right from the beginning.
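A quick sketch of what duck typing buys you (class names invented for the example): any object with the right behavior is acceptable, so swapping implementations requires no signature changes.

class FileSink:
    def write(self, data):
        print("to disk:", data)

class NetworkSink:
    def write(self, data):
        print("to socket:", data)

def log(sink, message):
    # no type declaration: anything with a write() method will do
    sink.write(message)

log(FileSink(), "hello")     # works
log(NetworkSink(), "hello")  # works too, nothing else changed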
You don't write Java, C++, Python, PHP, Erlang, or whatever the same way. There are good reasons why there is room for so many different languages: they do not do the same things.
Unit tests are not a substitute
Unit tests must be performed with any language. The most famous unit test library (JUnit) is from the Java world!
This has nothing to do with types; again, you check behaviors. You avoid trouble with regressions, and you assure your customers that you are on track.
Python for large scale projects
Languages, libraries and frameworks
don't scale. Architectures do.
If you design a solid architecture, and if you are able to make it evolve quickly, then it will scale. Unit tests help, as does automatic code checking. But they are just safety nets, and small ones.
Python is especially suitable for large projects because it enforces some good practices and has a lot of the usual design patterns built in. But again, do not use it for what it is not designed for; e.g., Python is not a technology for CPU-intensive tasks.
In a huge project, you will most likely use several different technologies anyway, such as a DBMS and a templating language. Python is no exception.
You will probably want to use C/C++ for the part of your code you need to be fast. Or Java to fit in a Tomcat environment. Don't know, don't care. Python can play well with these.
As a conclusion
My answer may feel a bit rude, but don't get me wrong: this is a very good question.
A lot of people come to Python with old habits. I burned myself trying to write Python as if it were Java. You can, but you will never get the best of it.
If you have played / want to play with Python, it's great! It's a wonderful tool. But just a tool, really.
I had some experience modifying "Frets On Fire", an open-source Python "Guitar Hero" clone.
As I see it, Python is not really suitable for a really large-scale project.
I found myself spending a large part of the development time debugging issues related to assignment of incompatible types -- things that statically typed languages reveal effortlessly at compile time.
Also, since types are determined at run time, trying to understand existing code becomes harder, because you have no idea what the type of the parameter you are currently looking at is.
In addition to that, calling functions by their name string with the getattr built-in is generally more common in Python than in other programming languages, which makes getting the call graph to a certain function somewhat hard (although you can call functions by name in some statically typed languages as well).
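For example (a made-up event-dispatch snippet, but a common pattern):

class Player:
    def on_strum(self):
        print("strum!")

event = "strum"                     # often comes from data, not from code
getattr(Player(), "on_" + event)()  # a textual search for "on_strum(" never finds this call site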
I think that Python really shines in small-scale software, rapid prototype development, and gluing existing programs together, but I would not use it for large-scale software projects, since in those maintainability becomes the real issue, and in my opinion Python is relatively weak there.
Since nobody has pointed out pychecker, pylint and similar tools, I will: pychecker and pylint are tools that can help you find incorrect assumptions (about function signatures, object attributes, etc.). They won't find everything that a compiler for a statically typed language might find -- but they can also find problems that compilers for such languages can't find.
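For instance, here is the kind of trivial, made-up mistake pylint flags before you ever run the code:

def greet(name):
    return "Hello, " + nmae   # pylint reports: undefined-variable 'nmae' (E0602)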
Python (and any dynamically typed language) is fundamentally different in terms of the errors you're likely to cause and how you would detect and fix them. It has definite downsides as well as upsides, but many (including me) would argue that in Python's case, the ease of writing code (and of making it structurally sound), and of modifying code without breaking API compatibility (adding new optional arguments, providing different objects that have the same set of methods and attributes), make it perfectly suitable for large codebases.
my 0.10 EUR:
I have several Python applications in "production" state. Our company uses Java, C++ and Python. We develop with the Eclipse IDE (PyDev for Python).
Unit tests are the key solution to the problem (for C++ and Java as well).
The less safe world of dynamic typing will make you less careless about your code quality.
BY THE WAY:
Large-scale development doesn't mean that you use one single language!
Large-scale development often uses a handful of languages specific to the problem.
So I agree with the hammer problem :-)
PS: static-typing & python
Here are some items that have helped me maintain a fairly large system in python.
Structure your code in layers, i.e. separate business logic, presentation logic and your persistence layers. Invest a bit of time in defining these layers, and make sure everyone on the project is bought in. For large systems, creating a framework that forces you into a certain way of development can be key as well.
Tests are key. Without unit tests, you will likely end up with an unmanageable code base several times quicker than with other languages. Keep in mind that unit tests are often not sufficient; make sure to have several integration/acceptance tests you can run quickly after any major change.
Use the fail-fast principle. Add assertions for cases where you feel your code may be vulnerable (see the sketch after this list).
Have standard logging/error handling that will help you quickly navigate to the issue.
Use an IDE (PyDev works for me) that provides type-ahead and pylint/pychecker integration, to help you detect common typos right away and promote some coding standards.
Be careful about your imports: never do "from x import *", and don't do relative imports without the explicit leading dot.
Do refactor; a search/replace tool with regular expressions is often all you need for method-move or class-rename refactorings.
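A minimal sketch of the fail-fast point above (the 32-byte payload rule is invented for the example):

def send_payload(payload):
    # fail immediately and loudly instead of corrupting state later
    assert isinstance(payload, bytes), "payload must be bytes, got %r" % type(payload)
    assert 0 < len(payload) <= 32, "payload must fit in one 32-byte frame"
    # ... hand off to the transport layer ...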
Incompatible changes to the signature of a method. This doesn't happen as much in Python as it does in Java and C++.
Python has optional arguments, default values, and far more flexibility in defining method signatures. Also, duck typing means that -- for example -- you don't have to switch from some class to an interface as part of a significant software change. Things just aren't as complex.
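Example (hypothetical, but typical -- adding an optional parameter later breaks no existing caller):

def send(payload, retries=3):   # 'retries' was added in a later version
    ...

send(b"data")              # old call sites keep working unchanged
send(b"data", retries=5)   # new call sites can opt in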
How do you find all the places where that method is being called? grep works for dynamic languages. If you need to know every place a method is used, grep (or equivalent IDE-supported search) works great.
How do you find out what operations an instance provides, since you don't have a static type to lookup?
a. Look at the source. You don't have the Java/C++ problem of object libraries and jar files to contend with. You don't need all the elaborate aids and tools that those languages require.
b. An IDE can provide signature information under many common circumstances. You can, easily, defeat your IDE's reasoning powers. When that happens, you should probably review what you're doing to be sure it makes sense. If your IDE can't reason out your type information, perhaps it's too dynamic.
c. In Python, you often work through the interactive interpreter. Unlike Java and C++, you can explore your instances directly and interactively. You don't need a sophisticated IDE.
Example:
>>> x = SomeClass()
>>> dir(x)
How do you handle/prevent typing errors? Same as static languages: you don't prevent them. You find and correct them. Java can only find a certain class of typos. If you have two similar class or variable names, you can wind up in deep trouble, even with static type checking.
Example:
class MyClass { }
class MyClassx extends MyClass { }
A typo with these two class names can cause havoc. ["But I wouldn't put myself in that position with Java," folks say. Agreed. I wouldn't put myself in that position with Python, either; you make classes that are profoundly different, and will fail early if they're misused.]
Are unit tests used as a substitute for static type checking? Here's the other point of view: static type checking is a substitute for clear, simple design.
I've worked with programmers who weren't sure why an application worked. They couldn't figure out why things didn't compile; they didn't know the difference between an abstract superclass and an interface, and they couldn't figure out why a change in one place made a bunch of other modules in a separate JAR file crash. The static type checking gave them false confidence in a flawed design.
Dynamic languages allow programs to be simple. Simplicity is a substitute for static type checking. Clarity is a substitute for static type checking.
My general rule of thumb is to use dynamic languages for small non-mission-critical projects and statically typed languages for big projects. I find that code written in a dynamic language such as Python gets "tangled" more quickly. Partly that is because it is much quicker to write code in a dynamic language, and that leads to shortcuts and worse design, at least in my case. Partly it's because I have IntelliJ for quick and easy refactoring when I use Java, which I don't have for Python.
The usual answer to that is testing testing testing. You're supposed to have an extensive unit test suite and run it often, particularly before a new version goes online.
Proponents of dynamically typed languages make the case that you have to test anyway because even in a statically typed language conformance to the crude rules of the type system covers only a small part of what can potentially go wrong.
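A minimal, made-up illustration of such a test; a signature or type mistake in any covered call site surfaces the moment the suite runs:

import unittest

def scale(point, factor=1):
    x, y = point
    return (x * factor, y * factor)

class TestScale(unittest.TestCase):
    def test_default(self):
        self.assertEqual(scale((1, 2)), (1, 2))

    def test_explicit_factor(self):
        self.assertEqual(scale((1, 2), factor=3), (3, 6))

if __name__ == "__main__":
    unittest.main()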
I'll have a couple of Python functions I must interface with from assembly code. The solution doesn't need to be complete, because I'm not going to interface with the Python code for long. Anyway, I've chewed on it a bit:
What does a Python object look like in memory?
How can I call a Python function?
How can I pass Python objects around as Python objects through a ctypes interface?
Could a ctypes interface make my work easier in some alternative way?
You will want to read and understand Extending and Embedding the Python Interpreter and the Python/C API Reference Manual. These describe how to interface with Python from C. Everything you can do in C you can equivalently do in assembly too, but there you're on your own, as the documentation does not cover assembly directly.
It's certainly doable, but you'd have a much easier time reading the C API docs and writing a go-between function in C.
Come to think of it, C is highly recommended, since it may be hard to tell which of the routines you're calling might be implemented as preprocessor macros.
In Python, under what circumstances is SWIG a better choice than ctypes for calling entry points in shared libraries? Let's assume you don't already have the SWIG interface file(s). What are the performance characteristics of the two?
I have rich experience using SWIG. SWIG claims that it is a rapid solution for wrapping things. But in real life...
Cons:
SWIG is developed to be general, for everyone and for 20+ languages. Generally, this leads to drawbacks:
- it needs configuration (SWIG .i templates), which is sometimes tricky;
- it lacks treatment of some special cases (see Python properties below);
- it lacks performance for some languages.
Python cons:
1) Code style inconsistency. C++ and Python have very different code styles (that is obvious, certainly), and SWIG's ability to make the target code more Pythonic is very limited. As an example, it is painful to create properties from getters and setters. See this Q&A.
2) Lack of a broad community. SWIG has some good documentation, but if one hits something that is not in the documentation, there is no information at all. Neither blogs nor googling help. So one has to dig heavily through SWIG-generated code in such cases... That is terrible, I must say...
Pros:
In simple cases, it is really rapid, easy and straightforward.
Once you have produced the SWIG interface files, you can wrap the same C++ code for ANY of the 20+ other languages (!!!).
One big concern about SWIG is performance. Since version 2.0.4, SWIG includes the '-builtin' flag, which makes SWIG even faster than other automated ways of wrapping. At least some benchmarks show this.
When to USE SWIG?
So I concluded for myself that there are two cases when SWIG is good to use:
1) If one needs to rapidly wrap just several functions from some C++ library for end use.
2) If one needs to wrap C++ code for several languages, or if there could come a time when one needs to distribute the code for several languages. Using SWIG is reliable in this case.
Live experience
Update:
A year and a half has passed since we converted our library using SWIG.
First, we made a Python version. There were several moments when we experienced trouble with SWIG -- that is true. But right now we have expanded our library to Java and .NET. So we have 3 languages with 1 SWIG, and I can say that SWIG rocks in terms of saving a LOT of time.
Update 2:
We have now been using SWIG for this library for two years. SWIG is integrated into our build system. Recently we had a major API change in the C++ library, and SWIG worked perfectly. The only thing we needed to do was add several %rename directives to the .i files so that our CppCamelStyleFunctions() now looks_more_pythonish in Python. At first I was concerned about problems that could arise, but nothing went wrong. It was amazing: just several edits and everything was distributed across 3 languages. Now I am confident that using SWIG was a good solution in our case.
Update 3:
We have been using SWIG for our library for 3+ years now. Major change: the Python part was totally rewritten in pure Python. The reason is that Python is now used for the majority of our library's applications. Even if the pure Python version works more slowly than the C++ wrapping, it is more convenient for users to work in pure Python without struggling with native libraries.
SWIG is still used for .NET and Java versions.
The main question here is: "Would we use SWIG for Python if we started the project from the beginning?" We would! SWIG allowed us to rapidly distribute our product across many languages. It worked for a period of time that gave us the opportunity to better understand our users' requirements.
SWIG generates (rather ugly) C or C++ code. It is straightforward to use for simple functions (things that can be translated directly) and reasonably easy to use for more complex functions (such as functions with output parameters that need an extra translation step to represent in Python.) For more powerful interfacing you often need to write bits of C as part of the interface file. For anything but simple use you will need to know about CPython and how it represents objects -- not hard, but something to keep in mind.
ctypes allows you to directly access C functions, structures and other data, and load arbitrary shared libraries. You do not need to write any C for this, but you do need to understand how C works. It is, you could argue, the flip side of SWIG: it doesn't generate code and it doesn't require a compiler at runtime, but for anything but simple use it does require that you understand how things like C datatypes, casting, memory management and alignment work. You also need to manually or automatically translate C structs, unions and arrays into the equivalent ctypes data structures, including the right memory layout.
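A sketch of that manual translation (the geometry library and its move() function are hypothetical):

import ctypes

# assumed C side:  struct point { int x; int y; };
#                  struct point move(struct point p, int dx, int dy);
class Point(ctypes.Structure):
    _fields_ = [("x", ctypes.c_int),   # field order and types must match
                ("y", ctypes.c_int)]   # the C declaration exactly

lib = ctypes.CDLL("./libgeom.so")      # hypothetical shared library
lib.move.argtypes = [Point, ctypes.c_int, ctypes.c_int]
lib.move.restype = Point

p = lib.move(Point(1, 2), 10, 0)
print(p.x, p.y)                        # -> 11 2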
It is likely that in pure execution speed, SWIG is faster than ctypes -- because the management around the actual work is done in C at compile time rather than in Python at runtime. However, unless you interface a lot of different C functions but call each only a few times, it's unlikely the overhead will be really noticeable.
In development time, ctypes has a much lower startup cost: you don't have to learn about interface files, you don't have to generate .c files and compile them, you don't have to chase down and silence warnings. You can just jump in and start using a single C function with minimal effort, then expand to more. And you get to test and try things out directly in the Python interpreter. Wrapping lots of code is somewhat tedious, although there are attempts to make that simpler (like ctypes-configure).
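For instance, calling into the system C math library takes a few lines and no compile step (Unix-style library lookup assumed):

import ctypes, ctypes.util

libm = ctypes.CDLL(ctypes.util.find_library("m"))  # libm.so / libm.dylib
libm.sqrt.argtypes = [ctypes.c_double]
libm.sqrt.restype = ctypes.c_double
print(libm.sqrt(2.0))   # 1.4142135623730951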
SWIG, on the other hand, can be used to generate wrappers for multiple languages (barring language-specific details that need filling in, like the custom C code I mentioned above.) When wrapping lots and lots of code that SWIG can handle with little help, the code generation can also be a lot simpler to set up than the ctypes equivalents.
ctypes is very cool and much easier than SWIG, but it has the drawback that poorly or malevolently written Python code can actually crash the Python process. You should also consider Boost.Python. IMHO it's actually easier than SWIG while giving you more control over the final Python interface. If you are using C++ anyway, you also don't add any other languages to your mix.
In my experience, ctypes does have a big disadvantage: when something goes wrong (and it invariably will for any complex interfaces), it's a hell to debug.
The problem is that a big part of your stack is obscured by ctypes/FFI magic, and there is no easy way to determine how you got to a particular point and why parameter values are what they are.
You can also use Pyrex, which can act as glue between high-level Python code and low-level C code. lxml is written in Pyrex, for instance.
ctypes is great, but it does not handle C++ classes. I've also found ctypes to be about 10% slower than a direct C binding, but that will depend highly on what you are calling.
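The usual workaround is a thin extern "C" shim around the class; driven from Python it might look like this (the library and function names are hypothetical):

import ctypes

# assumed C++ shim:  extern "C" { void* device_new(); bool device_begin(void*);
#                                 void device_delete(void*); }
lib = ctypes.CDLL("./libdevice.so")
lib.device_new.restype = ctypes.c_void_p       # opaque pointer to the C++ object
lib.device_begin.argtypes = [ctypes.c_void_p]
lib.device_begin.restype = ctypes.c_bool
lib.device_delete.argtypes = [ctypes.c_void_p]

handle = lib.device_new()
try:
    print(lib.device_begin(handle))
finally:
    lib.device_delete(handle)                  # never leak the C++ instance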
If you are going to go with ctypes, definitely check out the Pyglet and PyOpenGL projects, which have massive examples of ctypes bindings.
I'm going to be contrarian and suggest that, if you can, you should write your extension library using the standard Python API. It's really well-integrated from both a C and Python perspective... if you have any experience with the Perl API, you will find it a very pleasant surprise.
ctypes is nice too, but, as others have said, it doesn't do C++.
How big is the library you're trying to wrap? How quickly does the codebase change? Any other maintenance issues? These will all probably affect the choice of the best way to write the Python bindings.
Just wanted to add a few more considerations that I didn't see mentioned yet.
[EDIT: Ooops, didn't see Mike Steder's answer]
If you want to try using a non-CPython implementation (like PyPy, IronPython or Jython), then ctypes is about the only way to go. PyPy doesn't allow writing C extensions, so that rules out Pyrex/Cython and Boost.Python. For the same reason, ctypes is the only mechanism that will work for IronPython and (eventually, once they get it all working) Jython.
As someone else mentioned, no compilation is required. This means that if a new version of the .dll or .so comes out, you can just drop it in and load the new version. As long as none of the interfaces changed, it's a drop-in replacement.
Something to keep in mind is that SWIG targets only the CPython implementation. Since ctypes is also supported by the PyPy and IronPython implementations it may be worth writing your modules with ctypes for compatibility with the wider Python ecosystem.
I have found SWIG to be a little bloated in its approach (in general, not just for Python), and difficult to implement without having to cross the sore point of writing Python code with an explicit mindset of being SWIG-friendly, rather than writing clean, well-written Python code. It is, IMHO, a much more straightforward process to write C bindings to C++ (if using C++) and then use ctypes to interface to the C layer.
If the library you are interfacing to has a C interface as part of the library, another advantage of ctypes is that you don't have to compile a separate Python binding library to access third-party libraries. This is particularly nice in formulating a pure-Python solution that avoids cross-platform compilation issues (for those third-party libs offered on disparate platforms). Having to embed compiled code into a package you wish to deploy on something like PyPI in a cross-platform-friendly way is a pain; one of my most irritating points about Python packages using SWIG or explicit C code underneath is their general unavailability cross-platform. So consider this if you are working with cross-platform third-party libraries and developing a Python solution around them.
As a real-world example, consider PyGTK. This (I believe) uses SWIG to generate C code to interface to the GTK C calls. I used it for the briefest time, only to find it a real pain to set up and use, with quirky, odd errors if you didn't do things in the correct order on setup, and just in general. It was such a frustrating experience that when I looked at the interface definitions provided by GTK on the web, I realized what a simple exercise it would be to write a translator from those interfaces to Python ctypes interfaces. A project called PyGGI was born, and in ONE day I was able to rewrite PyGTK as a much more functional and useful product that matches cleanly to the GTK C object-oriented interfaces. It required no compilation of C code, making it cross-platform friendly. (I was actually after interfacing to webkitgtk, which isn't so cross-platform.) I can also easily deploy PyGGI to any platform supporting GTK.