Using an embedded C library in a Python emulation - python

Short Question
Which would be the easier way to emulate (in Python) a complex communication stack (SAE J1939) from an existing embedded C library:
1) Full port - meaning manually convert all of the C functions to Python modules
2) Wrap the stack in a Python wrapper - meaning call the real C code from Python
Background Information
I have already written small portions of this stack in Python, but they are very non-trivial to implement with 100% coverage. For this very reason, we recently purchased an off-the-shelf SAE J1939 stack for our embedded platforms. To clarify, I know that the portions touching the hardware layer will have to be re-created and mapped to the PC's CAN drivers.
I am hoping to find someone here on SO who has ported, or even looked into porting, a 5k LOC C library to Python. Pointers to any C-to-Python tools that work well would also be helpful.

My advice would be to wrap it.
Reasons for that:
if you convert function by function, you'll introduce new bugs (we're only human), and this kind of stuff is pretty hard to test
wrapping for Python is easily done using SWIG, or even ctypes to load a DLL on the fly (see the sketch below); you'll find tons of tutorials
if your lib gets updated, you have less impact in the long term.
However, you need to
check that the license you purchased allows you to do that
know that having the same implementation on the embedded and PC sides won't help you track down bugs
you might have a bit less portability than with a full Python implementation (not much of a point for you anyway, as your low layer needs to be rewritten per target)
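As a minimal, hedged sketch of the ctypes route mentioned above (the library name, function names and signatures here are hypothetical, not from any real J1939 stack; substitute the ones from your vendor's headers):

    # Hypothetical sketch: loading a J1939 stack shared library with ctypes.
    import ctypes

    j1939 = ctypes.CDLL("./libj1939.so")  # ctypes.WinDLL("j1939.dll") on Windows

    # Declare argument/return types so ctypes marshals values correctly.
    j1939.j1939_init.restype = ctypes.c_int
    j1939.j1939_send.argtypes = [ctypes.c_uint32,   # PGN
                                 ctypes.c_char_p,   # payload
                                 ctypes.c_size_t]   # payload length
    j1939.j1939_send.restype = ctypes.c_int

    if j1939.j1939_init() == 0:
        j1939.j1939_send(0xFECA, b"\x01\x02\x03", 3)

The hardware-facing parts you mentioned would be re-pointed at the PC's CAN drivers in the same way, one function at a time.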

Definitely wrap it. It might be as easy as running ctypesgen.py and then using the result. Check this blog article about using ctypesgen to create a wrapper for libreadline in order to get access to the full API: http://wavetossed.blogspot.com/2011/07/asynchronous-gnu-readline.html
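For orientation, here is roughly what that workflow can look like; the exact flags and module name are assumptions, so check ctypesgen's own --help before relying on them:

    # Hypothetical ctypesgen invocation (run from a shell; verify flags locally):
    #   ctypesgen.py -lj1939 vendor/j1939.h -o j1939_bindings.py
    # The output is a plain Python module wrapping the shared library, so the
    # whole C API becomes importable without hand-written glue:
    import j1939_bindings  # generated module (assumed name)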

Related

Porting an old fortran program to work with python+numpy [closed]

I am supposed to be doing research with this huge Fortran 77 program (which I recently, superficially, ported to Fortran 90). It is a very old piece of software used for modeling with finite element methods.
It is a monstrosity. It is roughly 240,000 lines.
Since it began its life in Fortran 77, it uses some really dirty hacks for dynamic memory allocation; basically it uses functions from the C standard library, mixing C and Fortran. I have yet to fully grasp how allocation works. The program is built to be easily extendable by the user, and the user generally needs to allocate some globally accessible arrays for later use. This is done with an array of memory addresses that point to the beginning addresses of dynamically allocatable arrays. Of course, which element of the address array points to which piece of information depends entirely on conventions that have to be learned by the user before one can start to really program. There are two address arrays, one for integers and the other for floating-point numbers.
By dirty hacks, I mean inconsistent ones. For example an update in the optimization algorithm of the GNU compilers caused the program to exit with random memory leaks.
The program is far from elegant. Global variable names are generally short (3-4 characters) and cryptic. Passing data across routines is of course accomplished by using common blocks, which include all program switches, and the aforementioned arrays.
The usage of the program is roughly like that of an interactive shell, albeit a stupid one. First, an input file is read by the program itself; then, per choice, the user is dropped into a pseudo-shell, in which the user has to type 4-character-wide commands followed by parameters. The parser then parses the command, and the corresponding subroutine is called with the parameters. You would guess that there is a loop structure in this pseudo-parser (a goto bonanza, rather) which wraps the subroutine behavior in a manner more complex than it should be in the 21st century.
The format of the input file is the same (commands, then parameters), since it is the same parser. But the syntax is not really consistent (by that, I mean it lacks control structures, and some commands cause the finite state machine to behave in ways that contradict other commands; it lacks a definite grammar), from time to time causing the end user to discover pitfalls. The user must learn these pitfalls by experience; I did not see them in any documentation of the program. This is a problem that can easily be avoided with Python, where it is not even necessary to implement a parser.
What I want to do:
Port parts of the program to Python, namely the parts that don't have anything to do with numerical computation. This includes
cleaning up and abstracting the API with an OOP approach in Python,
giving meaningful variable names,
migrating dynamic allocation to either numpy or Fortran 90 and losing the C part,
migrating non-numerical execution to Python, and wrapping the numerical objects using f2py, so there is no loss in performance. Have I mentioned that the program is damn fast in its current state? Hopefully porting the calls to numerical subroutines and the I/O to Python will not slow it down to an impractical level (or will it?).
Making use of Python's interactive shell as a replacement for the pseudo-shell. This way, there will not be any inconsistencies for the end user. The aforementioned commands will simply be replaced by functions defined in Python. This will allow the user to actually access the data. Plus, the user will be able to extend the program without going too deep.
What I wonder:
Is f2py suitable for, and up to, this task of wrapping numerous subroutines and common blocks without any confusion? I have only seen single-file examples on the net for f2py; I know that numpy has used it to wrap LAPACK and such, but I need reassurance that f2py is a consistent enough tool for this task.
Whether there are any suggestions on the general strategy that I should follow, or pitfalls I should avoid.
How can and should I implement a system in this Python-wrapped Fortran 90 environment so that I will be able to modify (allocate and assign) globally accessible arrays and variables inside Fortran routines? This should preferably bypass the address arrays, and I should preferably be able to inject meaningful names into the namespaces. These variables should preferably be accessible from both Python and Fortran.
Notes:
I may have been asking for too much, something beyond the boundaries of the possible realm. In this case, please forgive me for I am a beginner with this aspect of programming; and don't hesitate to correct me.
The "program" I have been talking about is open source but it is commercial and the license does not allow its distribution, so I decided not to mention its name. However, you could deduce it from the 2nd sentence and the description I gave throughout.
I'm doing something depressingly similar. Instead of dynamic memory allocation via C, we have a single global array with integer indices (also at global scope), but otherwise it's much the same. Weird, inconsistent input file and all.
I'd advise against trying to rewrite the majority of the program, whether in Python or anything else. It's time-consuming, unpleasant and largely unnecessary. As an alternative, get the F77 code base to the point where it compiles cleanly enough that you're willing to trust it, then write an interface routine.
I now have a big, ugly F77 code base which sits behind an interface. The program requires input as a text file so a large part of the interface's job is to produce that text file. Beyond that, the legacy code is reduced to a single gateway routine which takes a few arguments (including a means of identifying the text file) and returns the answer. If you use the iso_c_binding of Fortran 2003 you can expose the interface in a format C understands, at which point you can link it to whatever you wish.
As far as the modern code (mostly optimisation routines) is concerned, the legacy code base is the single subroutine behind the C interface. This is much nicer than trying to modify the old code further, and it is probably a valid strategy for your case as well.
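To make the shape of this concrete, here is a hedged sketch of the Python side, assuming the Fortran gateway was exposed via iso_c_binding with a C signature like void run_legacy(const char* input_file, double* answer); the names and signature are illustrative, not from the post above:

    # Driving an iso_c_binding Fortran gateway from Python with ctypes.
    import ctypes

    lib = ctypes.CDLL("./liblegacy.so")  # assumed library name
    lib.run_legacy.argtypes = [ctypes.c_char_p,
                               ctypes.POINTER(ctypes.c_double)]
    lib.run_legacy.restype = None

    answer = ctypes.c_double()
    lib.run_legacy(b"case01.inp", ctypes.byref(answer))  # identify the input file
    print(answer.value)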
For an example of how to generate the f2py interface library from multiple Fortran files, see this post.
f2py might be suitable for your task, but there are some pitfalls that could cause problems. Some pitfalls concerning f2py are listed here and summarized below:
Concerning your specific problem, you might run into trouble with your allocatable arrays, because f2py was written for Fortran 77 and does not support many Fortran 90+ features (such as allocatable arrays).
I also encountered a problem with an undocumented maximum array size (around 400 x 200 x 20 x 20). If I used arrays bigger than that, f2py was not able to generate the Python library. The large matrices being passed around in finite element codes in particular might be too big for interfacing; you would then not have access to those in the Python part of the program.
Beneficial for you is that f2py should have no problems with COMMON blocks, etc., because it was written especially for Fortran 77.
After passing the data through the interface to the Fortran routines, there should be no (or only minimal) slowdown if you do it right. The key is to minimize calculations in the Python part of the program per run. This includes manipulation of the data arrays (shift, rotate, copy, etc.) but not passing them (because the interface is pass-by-reference).
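A minimal sketch of that f2py workflow (the file, subroutine and COMMON block names here are illustrative assumptions):

    # Suppose legacy.f contains SUBROUTINE SOLVE(X, N) with COMMON /CTRL/ IFLAG.
    # Build the extension module from a shell:
    #   python -m numpy.f2py -c -m legacy legacy.f
    import numpy as np
    import legacy  # the f2py-generated module (assumed name)

    x = np.zeros(100, dtype=np.float64)
    legacy.solve(x)         # wrapped subroutine; f2py usually infers the length
    legacy.ctrl.iflag = 1   # COMMON blocks are exposed as module attributes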
As an alternative you should have a look at Cython (also see the Link above and the linked working example therein). I think this might serve you better in the long run.
Implementation Suggestion
This suggestion describes how I would do it, drawing on my experience of having done something similar (see Background below). It should be largely independent of how you interface the Python and Fortran code (f2py, Cython, ...).
Of course you should be very careful not to change the behaviour, and therefore possibly the results, of the program. Generating some tests, their corresponding reference input and output files, and test documentation (including all the steps, keystrokes, commands, etc. necessary to reproduce those results) should therefore be your first step.
In your case I would try to change the Fortran program as little as possible. I would try to wedge the "pseudo-shell" out of the Fortran code, e.g. by making it its own module, and build an interface to that module. That way you can use all of the original Fortran code and the modifications, bugfixes and updates from your peers, even in the future. The key is not to distance your code too far from the original/mainstream, because in scientific communities usually not everybody will agree with major changes to the source code and update their workflow or source code accordingly. Future work from your peers might therefore not be made in your version but in the original source code, and it would be your own responsibility to merge those changes into your version, which gets easier the less you change.
Using that interface you can work on your Python shell, and maybe even build a GUI for it, without having to worry about changing anything in the original program. This reduces the risk of introducing bugs or changing the results of the original. Your shell/GUI would therefore work as a wrapper around the original program to simplify the workflow and remove inconsistencies. All the "intelligence" and utilities, like error and cross-checking of the user input, help pages, tutorials/how-tos, etc., would be implemented in the Python wrapper, which would parse these inputs, translate them to the corresponding commands for your Fortran program, send them, and wait for the results.
After you have simplified the usage of the program, I would write some automation for the tests (setup + evaluation) to complete your utilities suite. That way even somebody new to the program would be able to make changes to the code without having to worry about unknowingly changing the results. This should enable your tools to benefit the community, which will attract new users and therefore encourage further development within the community.
Only as the last step would I replace the parts of the code that use C with Fortran 90+ methods to simplify the code. This is an extensive change of the codebase and needs a lot of tests to ensure EVERY possible combination of commands is checked and verified before and after the changes.
This method also has the benefit that you could possibly make your interface/GUI open source (you have to check the license of your program, of course), as long as it is separable from the source code of the Fortran program. The Fortran-Python interface would have to be provided, or installed/generated from source files when your interface is loaded, using some simple build script as seen in the first link of this post.
For the manipulation of internal data I would write a separate wrapper routine that only handles the data interface. This should be done in Cython, though, to enable you to use allocatable arrays, etc. Because this interface would work by pass-by-reference, you should be able to use the full collection of Python (numpy) tools to manipulate the arrays and data.
Background
I did something similar using our research code for helicopter rotor dynamics. This is also a very old and large program written in Fortran 77 (a goto bonanza, again). The newer additions and modifications to the code are usually done in Fortran 90/2003.
Using parts of this code (several subroutines and module files), I generated a Python library to connect our GUI (Python & Qt) to the Fortran program, mainly for post-processing of Fortran binary output files.

Is it possible to use re2 from Python?

I just discovered http://code.google.com/p/re2, a promising library that uses a long-neglected approach (Thompson NFA) to implement a regular expression engine that can be orders of magnitude faster than the available engines of awk, Perl, or Python.
So I downloaded the code and did the usual sudo make install thing. However, that action seemingly did little more than add /usr/local/include/re2/re2.h to my system. There seemed to be some *.a file in addition, but what is it with this *.a extension?
I would like to use re2 from Python (preferably Python 3.1) and was excited to see files like make_unicode_groups.py in the distro (maybe just used during the build process?). Those, however, were not deployed on my machine.
How can I use re2 from Python?
Update: two friendly people have pointed out that I could try to build DLLs / *.so files from the sources and then use Python's ctypes library to access those. Can anyone give useful pointers on how to do just that? I'm pretty much clueless here, especially with the first part (building the *.so files).
Update: I have also posted this question (earlier) to the re2 developers' group, without reply so far (it is a small group), and today to the (somewhat more populous) comp.lang.py group [—thread here—]. The hope is that people from various corners can contact each other. My guess is a skilled person could do this in a few hours during their 20% your-free-time-belongs-to-Google-too timeslice; it would tie me up for weeks. Is there a tool to automatically dumb down C++ to whatever flavor of C that Python needs to be able to connect? Then maybe getting a viable result could be reduced to clever tool chaining.
(rant) Why is this so difficult? To think that in 2010 we still cannot have our abundant pieces of software just talk to each other. This is such a roadblock: whenever you want to address some C code from Python, you must always cruft together these linking bits. This requires a lot of work, but only delivers an extension module that is specific to the version of the C code and the version of Python, so it ages fast. (/rant) Would it be possible to run such things in separate processes (say, if I had an re2 executable that can produce results for data that comes in on, say, subprocess/Popen/communicate())? (This should not be a pure command-line tool that necessitates opening a process each time it is needed, but a single process that runs continuously; maybe there exist wrappers that sort of 'daemonize' such C code.)
David Reiss has put together a Python wrapper for re2. It doesn't have all of the functionality of Python's re module, but it's a start. It's available here: http://github.com/facebook/pyre2.
Possible yes, easy no. Looking at re2.h, this is a C++ library exposed as a class. There are two ways you could use it from Python.
1.) As Tuomas says, compile it as a DLL/so and use ctypes. In order to use it from Python, though, you would need to wrap the object init and methods into C-style extern'd functions. I've done this in the past with ctypes by externing functions that pass a pointer to the object around. The "init" function returns a void pointer to the object, which then gets passed on each subsequent method call. Very messy indeed; a sketch of the pattern follows after option 2.
2.) Wrap it into a true Python module. Again, the functions exposed to Python would need to be extern "C". One option is to use Boost.Python, which would ease this work.
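A hedged sketch of the pattern from option 1, where all shim names and signatures are made up for illustration: the C++ object hides behind extern "C" functions such as void* re2_new(const char* pattern), int re2_full_match(void* obj, const char* text) and void re2_delete(void* obj), compiled into a shared library and then driven from Python:

    # Hypothetical ctypes driver for an extern "C" shim around the re2 class.
    import ctypes

    lib = ctypes.CDLL("./libre2wrap.so")  # assumed shim library
    lib.re2_new.restype = ctypes.c_void_p
    lib.re2_new.argtypes = [ctypes.c_char_p]
    lib.re2_full_match.restype = ctypes.c_int
    lib.re2_full_match.argtypes = [ctypes.c_void_p, ctypes.c_char_p]
    lib.re2_delete.argtypes = [ctypes.c_void_p]

    rx = lib.re2_new(b"h.*o")                      # "init" returns an opaque pointer
    print(bool(lib.re2_full_match(rx, b"hello")))  # pointer passed to each call
    lib.re2_delete(rx)                             # manual cleanup - messy indeed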
SWIG handles C++ (unlike ctypes), so it may be more straightforward to use it.
You could try to build re2 into its own DLL/so and use ctypes to call functions from that DLL/so. You will probably need to define your own entry points in the DLL/so.
You can use the Python package https://pypi.org/project/google-re2/. Note the requirements listed at the bottom of that page; there are a few things you must install yourself before installing the Python package.

Would Python be an appropriate choice for a video library for home use software

I am thinking of creating video library software which keeps track of all my videos, keeps track of which videos I haven't watched yet, and stats like that. The stats will be specific to each user using the software.
My question is: is Python appropriate for creating this software, or do I need something like C++?
Python is perfectly appropriate for such tasks - indeed the most popular video site, YouTube, is essentially programmed in Python (using, of course, lower-level components called from Python for such tasks as web serving, relational DB, and video transcoding -- there are plenty of reusable open-source components for all these kinds of tasks, but your application's logic flow and all application-level logic can perfectly well be in Python).
Just yesterday evening, at the local Python interest group meeting in Mountain View, we had new members who just moved to Silicon Valley exactly to take Python-based jobs in the video industry, and they were saying that professional-level video handling in the industry is also veering more and more towards Python -- stalwarts like Pixar and ILM have been using Python forever, but in the last year or two there has been a flood of Python adoption in the industry.
If you want your code to be REALLY FAST, use C++ (or parallel Fortran).
However, in your application, 99% of the runtime isn't going to be in YOUR code; it's going to be in GUI libraries, OS calls, waiting for user interaction, and calling libraries (written in C) to open video files and make thumbnails - that kind of stuff.
So using C++ will make your code 100 times faster, and your application will, as a result, be 1% faster, which is utterly useless. And if you write it in C++ you'll need months, whereas in Python you'll be finished much faster and have lots more fun.
Using C++ could even make it a lot slower, because in Python you can very easily build more scalable algorithms by using super-powerful primitives like hashes, sets, generators, etc., try several algorithms in 5 minutes to see which one is best, and import a library which already does 90% of the work.
Write it in Python.
Yes. Python is much easier to use than C++ for something like this. You may want to use it as a front-end to a DB such as sqlite3.
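A minimal sketch of that idea, using only the standard library (the table layout is just one plausible choice):

    # Track videos and per-file watched state in SQLite.
    import sqlite3

    conn = sqlite3.connect("videos.db")
    conn.execute("""CREATE TABLE IF NOT EXISTS videos (
                        path    TEXT PRIMARY KEY,
                        title   TEXT,
                        watched INTEGER DEFAULT 0)""")
    conn.execute("INSERT OR IGNORE INTO videos (path, title) VALUES (?, ?)",
                 ("/movies/example.mkv", "Example"))
    conn.execute("UPDATE videos SET watched = 1 WHERE path = ?",
                 ("/movies/example.mkv",))
    for (title,) in conn.execute("SELECT title FROM videos WHERE watched = 0"):
        print("unwatched:", title)
    conn.commit()
    conn.close()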
Maybe you should take a look at this project:
Moovida
It's a complete media center, open source, written in Python, that is easy to extend. I don't know if it will do exactly what you want out of the box, but you can probably easily add the features you want.
Of course you can use almost any programming language for almost any task. But after noting that, it's also obvious that different languages are also differently well adapted for different tasks.
C/C++ are languages that are very "hardware friendly". Basically, the languages are just one abstraction level above assembler, with C's use of pointers etc. C++ is almost like a (semi-)portable object-oriented assembler, if one wants to be funny. :) This makes C/C++ fast and good at talking to hardware.
But those same features become mis-features in other cases. The pointers make it possible to walk all over the memory, and unless you are careful you will leak memory all over the place. So I would say (and now the C people will get angry) that C/C++ is in fact directly inappropriate for what you want to do.
You want a language that is higher-level and does things like memory management automatically and invisibly. There are many to choose from, but without a doubt Python is eminently suited for this. Python has over the last couple of years emerged as The New Cool Language to write this kind of software in, and much multimedia software, such as Freevo and the already mentioned Moovida, is written in Python.

Anyone using Python for embedded projects? [closed]

My company is using Python for a relatively simple embedded project. Is anyone else out there using Python on embedded platforms? Overall it's working well for us, quick to develop apps, quick to debug. I like the overall "conciseness" of the language.
The only real problem I have in day to day work is that the lack of static checking vs a regular compiler can cause problems to be thrown at run-time, e.g. a simple accidental cat of a string and an int in a print statement can bring the whole application down.
We use Python on quite a lot of embedded boards with ARM processors and 16 MB of RAM (running Linux).
It works really well and makes it really easy to write custom code quickly - one of Python's strong points.
As for reliability of the code - we try to have 100% test coverage. Writing tests in Python is very quick, and it gives you a wonderful feeling of confidence. We use Twisted Trial to run the tests and report on coverage, but there are many other tools available.
In my experience, Python + tests is more reliable and much quicker to write than any of the alternatives.
The only downsides for embedded work are that sometimes Python can be slow and sometimes it uses a lot of memory (relatively speaking). This hasn't caused us a show-stopping problem yet, and Python is quite easy to profile for both speed and memory if it becomes a problem.
pychecker is also a very useful tool which will catch quite a lot of common errors.
BTW, see this blog post: "Type inference for Python" for an interesting discussion of type inference and static typing, including links to some Guido van Rossum blog posts describing adding optional static typing to Python.
I agree with Bruce Eckel that one is better off practicing "strong testing" than relying on strong typing. I think that applies equally well to embedded development.
Personally, I've worked on some of the software that runs in the device used by BusRadio. It's an example of an embedded project built on Twisted and Python. The device is an embedded XScale processor running a debian-derived distribution, so it might not meet certain definitions of "embedded", but it is pretty dang small: it fits into the dashboard of a school bus.
There were some interesting issues with using Python with large libraries - the interpreter can take quite a while to start up and load all the code for Twisted on a really slow chip, and some things needed special-case optimizations. However, at no point was the dynamic nature of Python a problem. The software in question certainly wasn't perfect, but at least when using Twisted, a simple programming error will not "bring the whole application down". A traceback will get logged, and processing continues.
So, if you're in an embedded environment sufficiently unconstrained that you can use Python in the first place, it's no different than developing "regular" programs (games, desktop applications, web apps). You don't need static typing there, and you don't need it here either.
At my previous employer I had wanted to spend some time playing with building embedded systems in tinypy, which is a "minimalist implementation of Python in 64k of code". (But I never got to it and I no longer have time.)
Telit makes GSM/GPRS modem modules that include an embedded Python interpreter.
I haven't tried them myself, so I don't know how the Python interpreter compares or differs from a PC implementation, such as which included modules, RAM and ROM memory limits, execution speed, etc.
However, as user foresightyj pointed out in a comment, it appears that they use Python 1.5.x, which is a truly ancient version, so I would have trouble taking them seriously. Python developers would not enjoy downgrading to such an ancient version, missing so many modern Python features, and I would be concerned about security issues in such an old release.
I've been working on microwave telecommunication equipment based on an old and slow PowerPC with 16 MB of RAM.
I was able to port the Python 2.6.1 interpreter to VxWorks in order to have the command-line interpreter available directly from the target shell, and to execute Python scripts uploaded to the target flash.
We used those scripts to perform autotests on the target or to execute diagnostic procedures.
Here some details on the whole procedure: HOW TO: Port Python to VxWorks
"The only real problem I have in day to day work is that the lack of static checking vs a regular compiler can cause problems to be thrown at run-time, e.g. a simple accidental cat of a string and an int in a print statement can bring the whole application down."
Unit tests are your only safety against these things.
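For instance, a tiny made-up example of the kind of test that catches the string/int mix-up at test time rather than in the field:

    import unittest

    def format_reading(sensor_id, value):
        # str() on both inputs avoids the accidental str + int TypeError.
        return "sensor " + str(sensor_id) + ": " + str(value)

    class TestFormatReading(unittest.TestCase):
        def test_accepts_int_id(self):
            # With an unguarded "sensor " + sensor_id, this call would raise
            # TypeError here, in the test run, not in production.
            self.assertIn("42", format_reading(42, 3.14))

    if __name__ == "__main__":
        unittest.main()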
Indeed, Python is often used as a 'support language' when you need to write some kind of tests - e.g., I was involved in a project whose (Python-based) test framework code base was (is?) almost as big as that of the main product.
Python 'agents' work on QNX and VxWorks - and most of the problems we had were in properly porting the threading- and network-related parts of our code.
It might be worth taking a look at the OpenMoko project; a lot of embedded development in Python is done there.
Things to watch out for:
- support for Python/C extension modules might behave quite strangely depending on the platform/OS
- most embedded platforms offer quite outdated versions of Python
- finally, you will find out that there is a difference between 'proper' embedded software, in which every bit counts, and 'modern' embedded software that runs on >412 MHz XScale CPUs with more than 128 MB of RAM, and then Python just doesn't match the hardware that you would like to target :(
We use Python here at the university for embedded applications based on the Gumstix hardware platform. Although more capable than traditional embedded systems, we find the mix of small form factor, low(ish) power consumption, and the ease of transferring code between development on desktop machines and the target hardware invaluable.
Python is also a great language to teach the students, and with the Gumstix it's great that they can get code working on a low-power system, rather than facing the headache and heartbreak that come with dedicated languages such as NesC.
My team wrote embedded software out of C++ and Python. We decided to write the basic classes and heavy computational routines in C++, and the logic in Python, with the Boost libraries as glue. Using Boost is never easy, but the results are excellent: fast and easy to modify. Using Python to represent the custom requirements, we are able to satisfy customers' needs in real time, changing the code using injection techniques. Something really exciting! (OK, I'm a geek ;)
We started prototyping in Python, but we soon realized that it was clearly too slow. So we decided to structure the program in different computational layers in order to meet the speed requirements. C++ was the best solution.
In order to use Python and C++ together, we had to keep strict control over typing.
I worked for a company which used Python on an embedded product based around an Atmel AVR32 and running embedded Linux. The firmware was initially developed on a PC (due to lack of a working hardware prototype), then later moved to the embedded hardware running on the cross-compiled Python interpreter.
The ability to debug and modify source code "live" on the device was a big plus during development, and saved a lot of time. The big disadvantages were speed and memory usage of the Python interpreter.
Following the first release of production firmware, we ported critical sections of the code over to C/C++. The porting effort was quite straightforward and resulted in an improvement of several orders of magnitude on speed-critical code (as you would expect).
Incidentally, most of the design and production test code was written in Python, mainly running inside a test harness on a PC.
In my experience, Python has traditionally been used in desktop environments more than in the embedded field. There are two reasons, both related to the fact that Python is interpreted:
C/C++ have higher performance than Python (and this matters in embedded systems with a slow microcontroller)
C/C++ have more deterministic response times (and this matters in real-time embedded systems controlling something).
Of course, as embedded systems become faster and time-to-market gets shorter, Python will be adopted more widely in the embedded sector.
I have a Python server (using Twisted) and some helper scripts running under XP Embedded, and it's been working great.
Recent developments
MicroPython is a lean and fast implementation of the Python 3 programming language that is optimised to run on a microcontroller.
The European Space Agency (ESA) is funding further development of MicroPython. It is doing so to assess the suitability of the language for space-based applications, in particular for payloads.
WiPy 1.0 & 2.0, LoPy & SiPy are wireless MicroPython platforms sold by Pycom.
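To give a flavour of it, a minimal MicroPython sketch; the pin number is board-specific, so treat it as an assumption:

    # Blink an LED on a MicroPython board.
    from machine import Pin
    import time

    led = Pin(2, Pin.OUT)  # adjust the pin for your board
    while True:
        led.value(not led.value())  # toggle the output
        time.sleep(0.5)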
Isn't the EVE Online client a showpiece of real-time, high-performance Python?
I'm using a Gatetel GT-HE910 series module, which embeds the Telit modem and includes 3G, GPS, AD, IO and Python 2.7. It is used for a remote data acquisition application. Python is fairly slow on these modules, but we only need an update every 15 minutes or in an alarm condition, so they work well.
http://www.gatetel.com/#!gt-series/cscb
"The only real problem I have in day to day work is that the lack of static checking vs a regular compiler can cause problems to be thrown at run-time, e.g. a simple accidental cat of a string and an int in a print statement can bring the whole application down."
To me this is a huge deal. Problems you could have found and fixed at compile time now only show up at run time. Not knowing the data type, and having to write an additional function just to check the data type, is a hassle. There is no need to do that in C. And how would you declare 'volatile' in Python?
"The only downsides for embedded work are that sometimes Python can be slow and sometimes it uses a lot of memory (relatively speaking). This hasn't caused us a show-stopping problem yet, and Python is quite easy to profile for both speed and memory if it becomes a problem."
This is also huge. For embedded systems or an RTOS, time constraints are very important.
Python is not necessarily quick to code in. It really depends on what language you are comfortable with. Honestly, it takes me a day to write a function plus the unnecessary object-oriented stuff, which I can do in 2 hours in C.
Testing is so inconvenient: I have to write the code, py_compile it, copy the .pyc to the target, then run the program; then Python quits, complaining that a variable is not defined, or about a type-cast error, or some petty thing like that.
My suggestion: a C toolchain is available for any target. C is fast, hardware-oriented, challenging and fun. Stick with C for embedded systems. There is no need to install and configure silly Python packages just to run it.

Python: SWIG vs ctypes

In python, under what circumstances is SWIG a better choice than ctypes for calling entry points in shared libraries? Let's assume you don't already have the SWIG interface file(s). What are the performance metrics of the two?
I have rich experience of using SWIG. SWIG claims that it is a rapid solution for wrapping things. But in real life...
Cons:
SWIG is developed to be general, for everyone and for 20+ languages. Generally, this leads to drawbacks:
- it needs configuration (SWIG .i templates), which is sometimes tricky,
- it lacks treatment of some special cases (see Python properties further down),
- it lacks performance for some languages.
Python cons:
1) Code style inconsistency. C++ and Python have very different code styles (that is obvious, certainly), and SWIG's ability to make the target code more Pythonic is very limited. As an example, it is painful to create properties from getters and setters. See this Q&A.
2) Lack of a broad community. SWIG has some good documentation, but if one hits something that is not in the documentation, there is no information at all. Neither blogs nor googling helps. So one has to dig heavily through the SWIG-generated code in such cases... That is terrible, I must say...
Pros:
In simple cases, it is really rapid, easy and straightforward.
If you produce the SWIG interface files once, you can wrap this C++ code for ANY of the other 20+ languages (!!!).
One big concern about SWIG is performance. Since version 2.04, SWIG includes the '-builtin' flag, which makes SWIG even faster than other automated ways of wrapping. At least some benchmarks show this.
When to USE SWIG?
So I concluded for myself that there are two cases when SWIG is good to use:
1) If one needs to rapidly wrap just several functions from some C++ library for end use.
2) If one needs to wrap C++ code for several languages, or if there could come a time when one needs to distribute the code for several languages. Using SWIG is reliable in this case.
Live experience
Update:
A year and a half has passed since we converted our library using SWIG.
First, we made a Python version. There were several moments when we experienced troubles with SWIG - that is true. But right now we have expanded our library to Java and .NET, so we have 3 languages with 1 SWIG. And I can say that SWIG rocks in terms of saving a LOT of time.
Update 2:
We have now been using SWIG for this library for two years. SWIG is integrated into our build system. Recently we had a major API change in the C++ library, and SWIG worked perfectly. The only thing we needed to do was add several %rename directives to the .i files so that our CppCamelStyleFunctions() now looks_more_pythonish in Python. At first I was concerned about problems that could arise, but nothing went wrong. It was amazing: just several edits and everything was distributed in 3 languages. Now I am confident that using SWIG was a good solution in our case.
Update 3:
We have now used SWIG for our library for 3+ years. Major change: the Python part was totally rewritten in pure Python. The reason is that Python is now used for the majority of applications of our library. Even though the pure Python version works slower than the C++ wrapping, it is more convenient for users to work with pure Python and not have to struggle with native libraries.
SWIG is still used for .NET and Java versions.
The main question here is: "Would we use SWIG for Python if we started the project from the beginning?" We would! SWIG allowed us to rapidly distribute our product in many languages. It worked for a period of time which gave us the opportunity to better understand our users' requirements.
SWIG generates (rather ugly) C or C++ code. It is straightforward to use for simple functions (things that can be translated directly) and reasonably easy to use for more complex functions (such as functions with output parameters that need an extra translation step to represent in Python.) For more powerful interfacing you often need to write bits of C as part of the interface file. For anything but simple use you will need to know about CPython and how it represents objects -- not hard, but something to keep in mind.
ctypes allows you to directly access C functions, structures and other data, and to load arbitrary shared libraries. You do not need to write any C for this, but you do need to understand how C works. It is, you could argue, the flip side of SWIG: it doesn't generate code and it doesn't require a compiler at build time, but for anything but simple use it does require that you understand how things like C datatypes, casting, memory management and alignment work. You also need to manually or automatically translate C structs, unions and arrays into the equivalent ctypes data structures, including the right memory layout, as in the sketch below.
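For example, a minimal sketch of mirroring a (hypothetical) C struct in ctypes:

    # C declaration being mirrored (assumed):
    #   struct point { int32_t x; int32_t y; double weight; };
    import ctypes

    class Point(ctypes.Structure):
        # _fields_ must match the C declaration order and types exactly,
        # or the memory layout will be wrong.
        _fields_ = [("x", ctypes.c_int32),
                    ("y", ctypes.c_int32),
                    ("weight", ctypes.c_double)]

    p = Point(1, 2, 0.5)
    print(p.x, p.y, p.weight)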
It is likely that in pure execution time SWIG is faster than ctypes -- because the management around the actual work is done in C at compile time rather than in Python at runtime. However, unless you interface a lot of different C functions, each only a few times, it's unlikely that the overhead will be really noticeable.
In development time, ctypes has a much lower startup cost: you don't have to learn about interface files, you don't have to generate .c files and compile them, and you don't have to check out and silence warnings. You can just jump in and start using a single C function with minimal effort, then expand to more. And you get to test and try things out directly in the Python interpreter. Wrapping lots of code is somewhat tedious, although there are attempts to make that simpler (like ctypes-configure).
SWIG, on the other hand, can be used to generate wrappers for multiple languages (barring language-specific details that need filling in, like the custom C code I mentioned above.) When wrapping lots and lots of code that SWIG can handle with little help, the code generation can also be a lot simpler to set up than the ctypes equivalents.
ctypes is very cool and much easier than SWIG, but it has the drawback that poorly or malevolently written Python code can actually crash the Python process. You should also consider Boost.Python. IMHO it's actually easier than SWIG while giving you more control over the final Python interface. If you are using C++ anyway, you also don't add any other languages to your mix.
In my experience, ctypes does have a big disadvantage: when something goes wrong (and it invariably will for any complex interface), it's hell to debug.
The problem is that a big part of your stack is obscured by ctypes/ffi magic, and there is no easy way to determine how you got to a particular point and why the parameter values are what they are.
You can also use Pyrex, which can act as glue between high-level Python code and low-level C code. lxml is written in Pyrex, for instance.
ctypes is great, but it does not handle C++ classes. I've also found ctypes to be about 10% slower than a direct C binding, but that will depend heavily on what you are calling.
If you are going to go with ctypes, definitely check out the Pyglet and PyOpenGL projects, which have massive examples of ctypes bindings.
I'm going to be contrarian and suggest that, if you can, you should write your extension library using the standard Python API. It's really well-integrated from both a C and Python perspective... if you have any experience with the Perl API, you will find it a very pleasant surprise.
ctypes is nice too, but as others have said, it doesn't do C++.
How big is the library you're trying to wrap? How quickly does the codebase change? Any other maintenance issues? These will all probably affect the choice of the best way to write the Python bindings.
Just wanted to add a few more considerations that I didn't see mentioned yet.
[EDIT: Ooops, didn't see Mike Steder's answer]
If you want to try using a non-CPython implementation (like PyPy, IronPython or Jython), then ctypes is about the only way to go. PyPy doesn't allow writing C extensions, so that rules out Pyrex/Cython and Boost.Python. For the same reason, ctypes is the only mechanism that will work for IronPython and (eventually, once they get it all working) Jython.
As someone else mentioned, no compilation is required. This means that if a new version of the .dll or .so comes out, you can just drop it in and load the new version. As long as none of the interfaces changed, it's a drop-in replacement.
Something to keep in mind is that SWIG targets only the CPython implementation. Since ctypes is also supported by the PyPy and IronPython implementations it may be worth writing your modules with ctypes for compatibility with the wider Python ecosystem.
I have found SWIG to be a little bloated in its approach (in general, not just for Python) and difficult to implement without having to cross the sore point of writing code with an explicit mindset of being SWIG-friendly, rather than writing clean, well-written code. It is, IMHO, a much more straightforward process to write C bindings to C++ (if using C++) and then use ctypes to interface to that C layer.
If the library you are interfacing to has a C interface as part of the library, another advantage of ctypes is that you don't have to compile a separate Python binding library to access third-party libraries. This is particularly nice in formulating a pure-Python solution that avoids cross-platform compilation issues (for those third-party libs offered on disparate platforms). Having to embed compiled code into a package you wish to deploy on something like PyPI in a cross-platform-friendly way is a pain; one of my most irritating points about Python packages using SWIG or underlying explicit C code is their general unavailability across platforms. So consider this if you are working with cross-platform-available third-party libraries and developing a Python solution around them.
As a real-world example, consider PyGTK. This (I believe) used SWIG to generate C code to interface to the GTK C calls. I used it for the briefest time, only to find it a real pain to set up and use, with quirky, odd errors if you didn't do things in the correct order on setup and just in general. It was such a frustrating experience that when I looked at the interface definitions provided by GTK on the web, I realized what a simple exercise it would be to write a translator from those interface definitions to a Python ctypes interface. A project called PyGGI was born, and in ONE day I was able to rewrite PyGTK as a much more functional and useful product that matches cleanly to the GTK C object-oriented interfaces. It required no compilation of C code, making it cross-platform friendly. (I was actually after interfacing to webkitgtk, which isn't so cross-platform.) I can also easily deploy PyGGI to any platform supporting GTK.
