Python extensions - performance


I am using Boost.Python to extend a Python program's functionality. The Python scripts make a lot of calls into native modules, so I am really concerned about the performance of Python-to-C++ type conversion and data marshaling.
I decided to try exposing methods natively through the Python C API. Maybe somebody has already tried that? Any success, at least in theory?
The problem I ran into is how to convert a PyObject* back to a class instance. PyArg_ParseTuple provides the O& option, but what I am looking for is simply a pointer to the C++ object in memory. How can I get it inside the function?
if (PyArg_ParseTuple(args, "O", &pyTestClass))
{
    // how to get TestClass from pyTestClass ??
}
Thanks

I haven't tried Boost.Python, but I've extended Python using raw C as well as Cython. I recommend Cython; if you're careful enough you can get code with the same efficiency as raw C but with a lot less boilerplate code.
Regarding efficiency, it's relative. It depends on what you want to do and how you do it. For example, what I've done very often is write the inner loop of some image processing or matrix operation in C, and have this function be called by Python with pointers to matrices as arguments. The matrices themselves don't get copied, so the overhead is minimal.
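A minimal sketch of the "pass a pointer, don't copy" idea, using only the standard library. The names matrix and scale_in_place are made up for illustration, and the Python loop merely stands in for a compiled C inner loop that would receive a (double*, size_t, double) argument list:

```python
import array
import ctypes

# A flat 2x3 "matrix" of doubles owned by Python.
matrix = array.array("d", [1.0, 2.0, 3.0, 4.0, 5.0, 6.0])

# View the same memory as a C double array; from_buffer shares it, no copy.
c_view = (ctypes.c_double * len(matrix)).from_buffer(matrix)

def scale_in_place(ptr, n, factor):
    # Hypothetical stand-in for a C inner loop working on a double*.
    for i in range(n):
        ptr[i] *= factor

scale_in_place(c_view, len(matrix), 2.0)
result = matrix.tolist()
```

Because c_view and matrix share one buffer, the change made through the ctypes view shows up in the original array, which is why the call overhead stays small however large the matrix is.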

Related

Return a python function from Python C-API

I am currently in the process of writing a small Python module using the Python C API that will speed up some of the slower Python code that is run repeatedly in a simulation of sorts. My issue is that this code currently takes a bunch of arguments that in many use cases won't change. For example, the function signature will be like func(x,a,b,c,d,e), but after initialisation only x will change. I will therefore have the Python code littered with lambda x : func(x,a,b,c,d,e) where I wrap these before use. I have observed that this actually introduces quite a bit of calling overhead.
My idea to fix this was to create a PyObject* that is essentially a C++ lambda instead of the Python one. The main issue with this is that I have not found a way to create PyObjects from C++ lambdas, or even from lower-level functions. Since functions/lambdas in Python can be passed as arguments I assume it is possible, but is there a clean way I'm missing?

I would seriously consider using swig instead of pybind11 for example. It's just peace of mind. If you don't want to use swig directly, you can at least see what swig does to wrap up features like proxy objects.
http://www.swig.org/Doc2.0/SWIGPlus.html#SWIGPlus_nn38
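For the lambda-wrapping overhead described in the question, one standard-library option that the thread does not mention is functools.partial, which is implemented in C in CPython. The sketch below is only an illustration: func is a pure-Python stand-in for the extension function, and binding by keyword assumes the real function accepts keyword arguments. Whether it is actually faster than the lambda would need to be measured.

```python
from functools import partial

def func(x, a, b, c, d, e):
    # Hypothetical stand-in for the C-API function in the question.
    return x + a + b + c + d + e

# Bind everything except x once, after initialisation.
bound = partial(func, a=1, b=2, c=3, d=4, e=5)

# Only x is supplied at each call site.
total = bound(10)
```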

How to use a Matlab function from a Python script on a machine without Matlab

The setting
In a shared project I have written a Python script that needs to pass some parameters to a function a colleague wrote, receive its output and work on the results. Said function has been created in Matlab which I don't own (and which will not be available on the machine on which the final product will be running), so I asked them to convert it to an independent format.
What I received is a *.mexw64 file and a folder called "codegen" containing a whole lot of auto-generated stuff, among which is the wanted function as raw C code.
The problem: C code does not compile
In order to use this new C function in Python I need to compile it, so I copied all the files into a freshly created project in VisualStudio 2015 and tried just that. It failed, as the function's parameter type "real_T" is not recognized. I have never worked with C and barely understand what I'm doing here, anyway, so right now, I am sort of stuck. My colleague has never exported anything from Matlab before and does not know what to do, either.
The original Matlab function looks like this:
function [a] = convert_vector(b)
...
end
while the resulting C code is the following:
/* Include files */
#include "rt_nonfinite.h"
#include "testc.h"
/* Function Definitions */
void convert_vector(real_T b[2])
{
...
}
The question: How to compile C code?
In order to call C code from Python I need it to be compiled, but I don't know how to handle the "real_T" type error. Over at MATLAB Answers they say everything necessary should be in the .h-file, but apparently VisualStudio doesn't know this, even though it is #included. This might all be me having never had anything to do with C and/or VisualStudio before, though.
However, I am wondering, whether this is the right way to go, anyway. It seems awfully tedious to convert to uncompiled C and perform a lot of operations on it before being able to use a simple function. Maybe Codegen is not the best approach here?
The higher-level question: How to export?
I've read that there are other possibilities to export Matlab functions, for example as a standalone *.exe file, which sounds way simpler to me both in terms of usage and module handling. So, I guess, the real question here would be: What is the best way to export the Matlab function in order to use it from my Python script, so that
Matlab itself is not needed on the machine where the script is running and
I can pass parameters and receive results via Python?
Sorry for this somewhat convoluted question. I am grateful for suggestions regarding every part of the problem, as well as hints on how to improve the question itself.

array returned from shared library in python - is this a memory leak?

I have a problem with a project I am working on and am not sure about the best way to resolve it.
Basically I am pushing a slow python algorithm into a c++ shared library that I am using to do a lot of the numerically intense stuff. One of the c++ functions is of the form:
const int* some_function(inputs){
    // does some stuff
    int *return_array = new int[10];
    // fills return array with a few values
    return return_array;
}
I.e. it returns an array. This array is interpreted within Python using numpy's ndpointer as per:
lib.some_function.restype = ndpointer(dtype=c_int, shape=(10,))
I have a couple of questions that I have been fretting over for a while:
1) I have dynamically allocated memory here. Given that I am calling this function through the shared library and into python, do I cause a memory leak? My program is long running and I will likely call this function millions of times, so this is important.
2) Is there a better data structure I can be using? If this were a pure C++ function I would return a vector, but from googling around, this seems to be a non-ideal solution in Python with ctypes. I also have other functions in the C++ library that call this function. Given that I have just written the function and am about to write the others, I know to delete[] the returned pointer after use in these functions. However, I am unsatisfied with the current situation: if someone other than myself (or indeed myself in a few months) uses this function, there is a relatively high chance of future memory leaks.
Thanks!

Yes, you are leaking memory. It is not possible for the Python code to automatically free the pointed-to memory (since it has no idea how it was allocated). You need to provide a corresponding de-allocation function (to call delete[]) and tell Python how to call it (possibly using a wrapper framework as recommended by @RichardHidges).
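Besides pairing the allocation with a free function, another common arrangement is to let Python own the buffer and have the native function only fill it, so nothing allocated with new ever crosses the boundary. The sketch below assumes a hypothetical variant of the library function with the shape some_function(int* out, int n); since that library is not available here, ctypes.memset stands in for it.

```python
import ctypes

def fill_from_library(out, n):
    # Stand-in for a hypothetical lib.some_function(out, n) that writes n ints.
    # ctypes.memset simply sets every byte of the buffer to 1.
    ctypes.memset(out, 1, n * ctypes.sizeof(ctypes.c_int))

N = 10
buf = (ctypes.c_int * N)()   # memory owned by Python, released with buf
fill_from_library(buf, N)
values = list(buf)           # plain Python ints copied out of the buffer
```

With this layout there is no delete[] for a caller to forget: the buffer is reclaimed when buf is garbage collected.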

You probably want to consider using either SWIG or boost::python
There's an example of converting a std::vector to a python list using boost::python here:
std::vector to boost::python::list
here is the link for swig:
http://www.swig.org

Wrapping a C library in Python: C, Cython or ctypes?

I want to call a C library from a Python application. I don't want to wrap the whole API, only the functions and datatypes that are relevant to my case. As I see it, I have three choices:
1) Create an actual extension module in C. Probably overkill, and I'd also like to avoid the overhead of learning extension writing.
2) Use Cython to expose the relevant parts from the C library to Python.
3) Do the whole thing in Python, using ctypes to communicate with the external library.
I'm not sure whether 2) or 3) is the better choice. The advantage of 3) is that ctypes is part of the standard library, and the resulting code would be pure Python – although I'm not sure how big that advantage actually is.
Are there more advantages / disadvantages with either choice? Which approach do you recommend?
Edit: Thanks for all your answers, they provide a good resource for anyone looking to do something similar. The decision, of course, is still to be made for the single case—there's no one "This is the right thing" sort of answer. For my own case, I'll probably go with ctypes, but I'm also looking forward to trying out Cython in some other project.
With there being no single true answer, accepting one is somewhat arbitrary; I chose FogleBird's answer as it provides some good insight into ctypes and it currently also is the highest-voted answer. However, I suggest to read all the answers to get a good overview.
Thanks again.

Warning: a Cython core developer's opinion ahead.
I almost always recommend Cython over ctypes. The reason is that it has a much smoother upgrade path. If you use ctypes, many things will be simple at first, and it's certainly cool to write your FFI code in plain Python, without compilation, build dependencies and all that. However, at some point, you will almost certainly find that you have to call into your C library a lot, either in a loop or in a longer series of interdependent calls, and you would like to speed that up. That's the point where you'll notice that you can't do that with ctypes. Or, when you need callback functions and you find that your Python callback code becomes a bottleneck, you'd like to speed it up and/or move it down into C as well. Again, you cannot do that with ctypes. So you have to switch languages at that point and start rewriting parts of your code, potentially reverse engineering your Python/ctypes code into plain C, thus spoiling the whole benefit of writing your code in plain Python in the first place.
With Cython, OTOH, you're completely free to make the wrapping and calling code as thin or thick as you want. You can start with simple calls into your C code from regular Python code, and Cython will translate them into native C calls, without any additional calling overhead, and with an extremely low conversion overhead for Python parameters. When you notice that you need even more performance at some point where you are making too many expensive calls into your C library, you can start annotating your surrounding Python code with static types and let Cython optimise it straight down into C for you. Or, you can start rewriting parts of your C code in Cython in order to avoid calls and to specialise and tighten your loops algorithmically. And if you need a fast callback, just write a function with the appropriate signature and pass it into the C callback registry directly. Again, no overhead, and it gives you plain C calling performance. And in the much less likely case that you really cannot get your code fast enough in Cython, you can still consider rewriting the truly critical parts of it in C (or C++ or Fortran) and call it from your Cython code naturally and natively. But then, this really becomes the last resort instead of the only option.
So, ctypes is nice for doing simple things and for quickly getting something running. However, as soon as things start to grow, you'll most likely come to the point where you notice that you would have been better off using Cython right from the start.

ctypes is your best bet for getting it done quickly, and it's a pleasure to work with as you're still writing Python!
I recently wrapped an FTDI driver for communicating with a USB chip using ctypes and it was great. I had it all done and working in less than one work day. (I only implemented the functions we needed, about 15 functions).
We were previously using a third-party module, PyUSB, for the same purpose. PyUSB is an actual C/Python extension module. But PyUSB wasn't releasing the GIL when doing blocking reads/writes, which was causing problems for us. So I wrote our own module using ctypes, which does release the GIL when calling the native functions.
One thing to note is that ctypes won't know about #define constants and stuff in the library you're using, only the functions, so you'll have to redefine those constants in your own code.
Here's an example of how the code ended up looking (lots snipped out, just trying to show you the gist of it):
from ctypes import *
d2xx = WinDLL('ftd2xx')
OK = 0
INVALID_HANDLE = 1
DEVICE_NOT_FOUND = 2
DEVICE_NOT_OPENED = 3
...
def openEx(serial):
    serial = create_string_buffer(serial)
    handle = c_int()
    if d2xx.FT_OpenEx(serial, OPEN_BY_SERIAL_NUMBER, byref(handle)) == OK:
        return Handle(handle.value)
    raise D2XXException
class Handle(object):
    def __init__(self, handle):
        self.handle = handle
    ...
    def read(self, bytes):
        buffer = create_string_buffer(bytes)
        count = c_int()
        if d2xx.FT_Read(self.handle, buffer, bytes, byref(count)) == OK:
            return buffer.raw[:count.value]
        raise D2XXException
    def write(self, data):
        buffer = create_string_buffer(data)
        count = c_int()
        bytes = len(data)
        if d2xx.FT_Write(self.handle, buffer, bytes, byref(count)) == OK:
            return count.value
        raise D2XXException
Someone did some benchmarks on the various options.
I might be more hesitant if I had to wrap a C++ library with lots of classes/templates/etc. But ctypes works well with structs and can even callback into Python.
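To illustrate the remark about structs and callbacks, here is a small self-contained sketch. Point, CALLBACK and sum_point are made-up names, and no external library is involved: the Python function is wrapped as a C function pointer and then invoked through that pointer, which is the same path a C library would take when calling back.

```python
import ctypes

class Point(ctypes.Structure):
    # Mirrors a hypothetical C declaration: struct Point { int x; int y; };
    _fields_ = [("x", ctypes.c_int), ("y", ctypes.c_int)]

# C signature: int (*)(struct Point *)
CALLBACK = ctypes.CFUNCTYPE(ctypes.c_int, ctypes.POINTER(Point))

def sum_point(p):
    # p arrives as a POINTER(Point); .contents dereferences it.
    return p.contents.x + p.contents.y

c_callback = CALLBACK(sum_point)   # keep a reference while C code may call it
pt = Point(3, 4)
result = c_callback(ctypes.pointer(pt))
```

In real use, c_callback would be handed to the library's registration function instead of being called directly.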

Cython is a pretty cool tool in itself, well worth learning, and is surprisingly close to the Python syntax. If you do any scientific computing with Numpy, then Cython is the way to go because it integrates with Numpy for fast matrix operations.
Cython is a superset of Python language. You can throw any valid Python file at it, and it will spit out a valid C program. In this case, Cython will just map the Python calls to the underlying CPython API. This results in perhaps a 50% speedup because your code is no longer interpreted.
To get some optimizations, you have to start telling Cython additional facts about your code, such as type declarations. If you tell it enough, it can boil the code down to pure C. That is, a for loop in Python becomes a for loop in C. Here you will see massive speed gains. You can also link to external C programs here.
Using Cython code is also incredibly easy. I thought the manual makes it sound difficult. You literally just do:
$ cython mymodule.pyx
$ gcc [some arguments here] mymodule.c -o mymodule.so
and then you can import mymodule in your Python code and forget entirely that it compiles down to C.
In any case, because Cython is so easy to setup and start using, I suggest trying it to see if it suits your needs. It won't be a waste if it turns out not to be the tool you're looking for.

For calling a C library from a Python application there is also cffi, which is a newer alternative to ctypes. It brings a fresh look to FFI:
it handles the problem in a fascinating, clean way (as opposed to ctypes)
it doesn't require writing non-Python code (as SWIG, Cython, ... do)

I'll throw another one out there: SWIG
It's easy to learn, does a lot of things right, and supports many more languages so the time spent learning it can be pretty useful.
If you use SWIG, you are creating a new python extension module, but with SWIG doing most of the heavy lifting for you.

Personally, I'd write an extension module in C. Don't be intimidated by Python C extensions -- they're not hard at all to write. The documentation is very clear and helpful. When I first wrote a C extension in Python, I think it took me about an hour to figure out how to write one -- not much time at all.

If you have already a library with a defined API, I think ctypes is the best option, as you only have to do a little initialization and then more or less call the library the way you're used to.
I think Cython or creating an extension module in C (which is not very difficult) are more useful when you need new code, e.g. calling that library and do some complex, time-consuming tasks, and then passing the result to Python.
Another approach, for simple programs, is to run a separate process (compiled externally) that writes its result to standard output, and to call it with the subprocess module. Sometimes it's the easiest approach.
For example, if you make a console C program that works more or less this way
$miCcode 10
Result: 12345678
You could call it from Python
>>> import subprocess
>>> p = subprocess.Popen(['miCcode', '10'], stdout=subprocess.PIPE)
>>> std_out, std_err = p.communicate()
>>> print(std_out.decode())
Result: 12345678
With a little string formatting, you can take the result in any way you want. You can also capture the standard error output (by passing stderr=subprocess.PIPE as well), so it's quite flexible.
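A runnable sketch of the parsing step, assuming the "Result: <number>" output format shown above. Because the compiled miCcode program is not available here, a child Python process that prints the same line stands in for it:

```python
import subprocess
import sys

# Stand-in for the compiled 'miCcode' program: a child process printing one line.
cmd = [sys.executable, "-c", "print('Result: 12345678')"]
completed = subprocess.run(cmd, capture_output=True, text=True, check=True)

# Split "Result: 12345678" at the colon and convert the payload to an int.
label, _, payload = completed.stdout.partition(":")
value = int(payload.strip())
```

With check=True a non-zero exit status raises CalledProcessError, and completed.stderr holds whatever the child wrote to standard error.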

ctypes is great when you've already got a compiled library blob to deal with (such as OS libraries). The calling overhead is severe, however, so if you'll be making a lot of calls into the library, and you're going to be writing the C code anyway (or at least compiling it), I'd say to go for cython. It's not much more work, and it'll be much faster and more pythonic to use the resulting pyd file.
I personally tend to use cython for quick speedups of python code (loops and integer comparisons are two areas where cython particularly shines), and when there is some more involved code/wrapping of other libraries involved, I'll turn to Boost.Python. Boost.Python can be finicky to set up, but once you've got it working, it makes wrapping C/C++ code straightforward.
cython is also great at wrapping numpy (which I learned from the SciPy 2009 proceedings), but I haven't used numpy, so I can't comment on that.

I know this is an old question but this thing comes up on google when you search stuff like ctypes vs cython, and most of the answers here are written by those who are proficient already in cython or c which might not reflect the actual time you needed to invest to learn those to implement your solution. I am a complete beginner in both. I have never touched cython before, and have very little experience on c/c++.
For the last two days, I was looking for a way to delegate a performance heavy part of my code to something more low level than python. I implemented my code both in ctypes and Cython, which consisted basically of two simple functions.
I had a huge list of strings that needed to be processed. Notice: list and string.
Neither type corresponds perfectly to a type in C, because Python strings are Unicode by default and C strings are not. Lists in Python are simply NOT C arrays.
Here is my verdict. Use Cython. It integrates more fluently with Python and is easier to work with in general. When something goes wrong, ctypes just throws a segfault at you; Cython will at least give you compile warnings with a stack trace whenever possible, and you can easily return a valid Python object with Cython.
Here is a detailed account on how much time I needed to invest in both them to implement the same function. I did very little C/C++ programming by the way:
Ctypes:
About 2h on researching how to transform my list of unicode strings to a c compatible type.
About an hour on how to return a string properly from a c function. Here I actually provided my own solution to SO once I have written the functions.
About half an hour to write the code in c, compile it to a dynamic library.
10 minutes to write a test code in python to check if c code works.
About an hour of doing some tests and rearranging the c code.
Then I plugged the C code into the actual code base, and saw that ctypes does not play well with the multiprocessing module, as its handle is not picklable by default.
About 20 minutes to rearrange my code so that it does not use the multiprocessing module, and retry.
Then the second function in my C code generated segfaults in my code base although it passed my testing code. Well, this is probably my fault for not checking edge cases well; I was looking for a quick solution.
For about 40 minutes I tried to determine possible causes of these segfaults.
I split my functions into two libraries and tried again. Still had segfaults for my second function.
I decided to let go of the second function and use only the first function of the C code, and at the second or third iteration of the Python loop that uses it, I got a UnicodeError about not being able to decode a byte at some position, even though I encoded and decoded everything explicitly.
At this point, I decided to search for an alternative and decided to look into cython:
Cython
10 min of reading cython hello world.
15 min of checking SO on how to use cython with setuptools instead of distutils.
10 min of reading on cython types and python types. I learnt I can use most of the builtin python types for static typing.
15 min of reannotating my python code with cython types.
10 min of modifying my setup.py to use compiled module in my codebase.
Plugged in the module directly to the multiprocessing version of codebase. It works.
For the record, I did not, of course, measure the exact timings of my investment. It may very well be that my perception of time was a little too attentive due to the mental effort required while I was dealing with ctypes. But it should convey the feel of dealing with Cython and ctypes.
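The conversion step that took the longest in the ctypes account above, turning a list of Unicode strings into something C can read, can be sketched as follows. This is one possible approach rather than the author's actual code; the UTF-8 encoding and the char** layout are assumptions about what the C side expects.

```python
import ctypes

words = ["alpha", "bêta", "gamma"]

# C sees bytes, not Unicode, so each str is encoded first.
encoded = [w.encode("utf-8") for w in words]

# Build a C array of char* (what a 'const char **' parameter would receive).
arr = (ctypes.c_char_p * len(encoded))(*encoded)

# Reading an element back yields bytes, which are decoded explicitly.
roundtrip = [arr[i].decode("utf-8") for i in range(len(arr))]
```

The encoded list has to stay alive for as long as the C code uses arr, since the array only holds pointers to those byte strings.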

There is one issue which made me use ctypes and not cython and which is not mentioned in other answers.
Using ctypes, the result does not depend on the compiler you are using at all. You may write a library in more or less any language that can be compiled to a native shared library. It does not matter much which system, which language, or which compiler. Cython, however, is limited by the infrastructure. E.g., if you want to use the Intel compiler on Windows, it is much trickier to make Cython work: you have to "explain" the compiler to Cython, recompile something with this exact compiler, etc. This significantly limits portability.

If you are targeting Windows and choose to wrap some proprietary C++ libraries, then you may soon discover that different versions of msvcrt***.dll (Visual C++ Runtime) are slightly incompatible.
This means that you may not be able to use Cython since resulting wrapper.pyd is linked against msvcr90.dll (Python 2.7) or msvcr100.dll (Python 3.x). If the library that you are wrapping is linked against different version of runtime, then you're out of luck.
Then to make things work you'll need to create C wrappers for C++ libraries, link that wrapper dll against the same version of msvcrt***.dll as your C++ library. And then use ctypes to load your hand-rolled wrapper dll dynamically at the runtime.
So there are lots of small details, which are described in great detail in following article:
"Beautiful Native Libraries (in Python)": http://lucumr.pocoo.org/2013/8/18/beautiful-native-libraries/

There's also one possibility to use GObject Introspection for libraries that are using GLib.

Creating a wrapper for a C library in Python

I'm trying to create a wrapper of my own for FLAC, so that I can use FLAC in my own Python code.
I tried using ctypes first, but it showed a really weird interface to the library, e.g. all the init functions for FLAC streams and files became one function with no real information on how to initialize it. Especially since it wants a reference to a stream decoder, but Python has no way to store pointers ( BZZZT! ), and thus I can't store the pointer to the stream decoder. It doesn't help that the different init functions have a different number of arguments and some argument types differ. It also has a lot of enumerations and structures, and I don't know how to get these into my code.
I've been looking into Pyrex, but I kinda ran into the same problem with pointers, but I think I solved it, sort-of. The file isn't small either, and it's not even complete.
So I'm looking for alternatives, or guides that would help me understand the aforementioned ways better. It would really help if I could get a recommendation and/or help.

"Python has no way to store pointers, and thus I can't store the pointer to the stream decoder"
ctypes has pointers, and ctypes can be used to wrap existing C libraries. Just a tip: you will need to wrap/rewrite all relevant C structures as ctypes.Structure subclasses.
Take a look at these examples: code.google.com/p/pyxlib-ctypes and code.google.com/p/pycairo-ctypes. More info on how to map a function/procedure and its argtypes and restype is at http://docs.python.org/library/ctypes.html
"I've been looking into Pyrex, but I kinda ran into the same problem with pointers, but I think I solved it, sort-of. The file isn't small either, and it's not even complete."
Cython may be what you need if you want clean syntax. www.cython.org
"So I'm looking for alternatives, or guides that would help me understand the aforementioned ways better. It would really help if I could get a recommendation and/or help."
SWIG, on the other hand, can always be used, but it is more complicated if you are not used to it. www.swig.org
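To make the argtypes/restype mapping concrete without needing FLAC installed, the sketch below declares the signature of Py_GetVersion, a function exported by the Python interpreter itself and reachable through ctypes.pythonapi. The same two attributes would be set on each function of a library loaded with ctypes.CDLL; the choice of Py_GetVersion is only for illustration.

```python
import ctypes

# C declaration being mapped: const char *Py_GetVersion(void);
get_version = ctypes.pythonapi.Py_GetVersion
get_version.argtypes = []               # takes no parameters
get_version.restype = ctypes.c_char_p   # returned char* is converted to bytes

version = get_version().decode()
```

Without the restype line ctypes would assume the function returns a C int, which is a frequent cause of truncated pointers on 64-bit systems.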

Did you have a look at http://www.swig.org/?
"SWIG is a software development tool that connects programs written in C and C++ with a variety of high-level programming languages."

but Python has no way to store pointers ( BZZZT! )
That is incorrect. You can create a pointer like this:
pInt = POINTER(c_int)(c_int(0))  # POINTER(c_int)() on its own creates a NULL pointer
and you access it like this:
pInt[0]  # or pInt.contents

This post is old, but there's an alternative to ctypes: CFFI. It's a lot easier, somewhat faster, and works better under PyPy. Plus, it has great support for pointers. Here's an example:
from cffi import FFI
ffi = FFI()
ffi.cdef('''
struct x { void *a; };
void* get_buffer();
struct x* make_x(void*);
void change_x(struct x*, void*);
''')
dll = ffi.dlopen('libmyawesomelibrary.so')
buf = dll.get_buffer()
tst = ffi.new('struct x*')
tst.a = buf
dll.change_x(tst, dll.get_buffer())
tst2 = dll.make_x(dll.get_buffer())

Some people use pyrex for this.
