I am writing some C extension code for a python module.
The function I want to write is (in python)
output = 1./(1. + input)
where input is a numpy array of any shape.
Originally I was using NpyIter_MultiNew:
static PyObject *
helper_calc1(PyObject *self, PyObject *args){
PyObject * input;
PyObject * output = NULL;
if (!PyArg_ParseTuple(args, "O", &input)){
return NULL;
}
// -- input -----------------------------------------------
PyArrayObject * in_arr;
in_arr = (PyArrayObject *) PyArray_FROM_OTF(input, NPY_DOUBLE, NPY_ARRAY_IN_ARRAY);
if (in_arr == NULL){
goto fail;
}
// -- set up iterator -------------------------------------
PyArrayObject * op[2];
npy_uint32 op_flags[2];
npy_uint32 flags;
op[0] = in_arr;
op_flags[0] = NPY_ITER_READONLY;
op[1] = NULL;
op_flags[1] = NPY_ITER_WRITEONLY | NPY_ITER_ALLOCATE;
flags = NPY_ITER_EXTERNAL_LOOP | NPY_ITER_BUFFERED | NPY_ITER_GROWINNER;
NpyIter * iter = NpyIter_MultiNew(2, op,
flags, NPY_KEEPORDER, NPY_NO_CASTING,
op_flags, NULL);
if (iter == NULL){
goto fail;
};
NpyIter_IterNextFunc * iternext = NpyIter_GetIterNext(iter, NULL);
if (iternext == NULL){
NpyIter_Deallocate(iter);
goto fail;
};
// -- iterate ---------------------------------------------
npy_intp count;
char ** dataptr = NpyIter_GetDataPtrArray(iter);
npy_intp * strideptr = NpyIter_GetInnerStrideArray(iter);
npy_intp * innersizeptr = NpyIter_GetInnerLoopSizePtr(iter);
do {
count = *innersizeptr;
while (count--){
*(double *) dataptr[1] = 1. / (1. + *(double *)dataptr[0]);
dataptr[0] += strideptr[0];
dataptr[1] += strideptr[1];
}
} while (iternext(iter));
output = NpyIter_GetOperandArray(iter)[1];
Py_INCREF(output); /* GetOperandArray returns a borrowed reference; keep ours before deallocating the iterator */
if (NpyIter_Deallocate(iter) != NPY_SUCCEED){
goto fail;
}
Py_DECREF(in_arr);
return output;
fail:
Py_XDECREF(in_arr);
Py_XDECREF(output);
return NULL;
}
However, since this is just a single array (i.e. I don't need to be concerned about broadcasting multiple arrays),
is there any reason I can't iterate over the data myself using PyArray_DATA, a for loop, and the array size?
static PyObject *
helper_calc2(PyObject *self, PyObject *args){
PyObject * input;
PyObject * output = NULL;
if (!PyArg_ParseTuple(args, "O", &input)){
return NULL;
}
// -- input -----------------------------------------------
PyArrayObject * in_arr;
in_arr = (PyArrayObject *) PyArray_FROM_OTF(input, NPY_DOUBLE, NPY_ARRAY_IN_ARRAY);
if (in_arr == NULL){
Py_XDECREF(in_arr);
return NULL;
}
int ndim = PyArray_NDIM(in_arr);
npy_intp * shape = PyArray_DIMS(in_arr);
npy_intp size = PyArray_SIZE(in_arr);
double * in_data = (double *) PyArray_DATA(in_arr);
output = PyArray_SimpleNew(ndim, shape, NPY_DOUBLE);
if (output == NULL){
goto fail;
}
double * out_data = (double *) PyArray_DATA((PyArrayObject *) output);
for (npy_intp i = 0; i < size; i++){
out_data[i] = 1. / (1. + in_data[i]);
}
Py_DECREF(in_arr);
return output;
fail:
Py_XDECREF(in_arr);
Py_XDECREF(output);
return NULL;
}
This second version runs more quickly and the code is shorter.
Are there any dangers I need to look out for when using PyArray_DATA with a for loop instead of NpyIter_MultiNew?
From the PyArray_DATA documentation:
If you have not guaranteed a contiguous and/or aligned array then be sure you understand how to access the data in the array to avoid memory and/or alignment problems.
But I believe that this is taken care of by PyArray_FROM_OTF with the NPY_ARRAY_IN_ARRAY flag.
You should be OK in this case. As you mentioned, NPY_ARRAY_IN_ARRAY with PyArray_FROM_OTF takes care of that issue, and since you are casting the data pointer to the right type, you should be fine. However, in general, as you seem to be aware, you can't use this approach on a NumPy array taken directly from Python code, since it may not be contiguous or aligned.
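To make that guarantee concrete, here is a small pure-NumPy sketch (no C involved) of the difference a contiguity requirement makes; np.ascontiguousarray plays roughly the same role here that PyArray_FROM_OTF with NPY_ARRAY_IN_ARRAY plays in your extension:
import numpy as np

a = np.arange(10.0)
view = a[::2]                                 # perfectly valid input, but NOT contiguous
print(view.flags['C_CONTIGUOUS'])             # False: its stride is 2 * itemsize
safe = np.ascontiguousarray(view, dtype=np.float64)
print(safe.flags['C_CONTIGUOUS'], safe.flags['ALIGNED'])   # True True
# A flat element-by-element walk over the raw buffer behind `view` would read
# a[0], a[1], a[2], ... (0., 1., 2., ...) instead of view's 0., 2., 4., ...,
# which is exactly the mistake the NPY_ARRAY_IN_ARRAY conversion protects you from.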
Happy C extension writing.
I am trying to implement a Python wrapper using the Python C API over a C++ library. I need to implement conversions so I can use objects in both Python and C++. I have already done that in the past, but I have an error I am really having a hard time with.
I have a very basic test function:
PyObject* convert_to_python() {
std::cout << "Convert to PyObject" << std::endl;
long int a = 20;
PyObject* py_a = PyInt_FromLong(a);
std::cout << "Convert to PyObject ok" << std::endl;
return py_a;
}
I call this function inside a GoogleTest macro:
TEST(Wrapper, ConvertTest) {
PyObject *py_m = convert_to_python();
}
And my output is:
Convert to PyObject
Segmentation fault (core dumped)
I also ran valgrind on it:
valgrind --tool=memcheck --track-origins=yes --leak-check=full ./my_convert
But it doesn't give me much information about it:
Invalid read of size 8
==19030== at 0x4F70A7B: PyInt_FromLong (in /usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0)
==19030== by 0x541E6BF: _object* pysmud_from<float>(smu::Matrix<float, 0, 0>&) (smu_type_conversions.cpp:308)
==19030== by 0x43A144: (anonymous namespace)::Wrapper_ConvertMatrix_Test::Body() (test_wrapper.cpp:12)
==19030== by 0x43A0C6: (anonymous namespace)::Wrapper_ConvertMatrix_Test::TestBody() (test_wrapper.cpp:10)
==19030== by 0x465B4D: void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (gtest.cc:2078)
==19030== by 0x460684: void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (gtest.cc:2114)
==19030== by 0x444C05: testing::Test::Run() (gtest.cc:2151)
==19030== by 0x4454C9: testing::TestInfo::Run() (gtest.cc:2326)
==19030== by 0x445BEA: testing::TestCase::Run() (gtest.cc:2444)
==19030== by 0x44CF41: testing::internal::UnitTestImpl::RunAllTests() (gtest.cc:4315)
==19030== by 0x46712C: bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) (gtest.cc:2078)
==19030== by 0x461532: bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) (gtest.cc:2114)
==19030== Address 0x0 is not stack'd, malloc'd or (recently) free'd
I think this code should work, but I can't figure out what is wrong with what I wrote. Did I wrongly include or link the Python headers and libraries?
EDIT: The following gives no errors
#include <Python.h>
PyObject* convert_long_int(long int a) {
PyObject *ret = PyInt_FromLong(a);
return ret;
}
int main(void) {
long int a = 65454984;
PyObject *pya = convert_long_int(a);
return 0;
}
if compiled with gcc -o wraptest -I/usr/include/python2.7 wraptest.c -L/usr/lib/x86_64-linux-gnu/ -lpython2.7.
What does the initialization do?
I can confirm the segmentation fault on Ubuntu 16.04 and Python 2.7, if I omit the initialization.
Looking at Embedding Python in Another Application, there's this example
#include <Python.h>
int
main(int argc, char *argv[])
{
Py_SetProgramName(argv[0]); /* optional but recommended */
Py_Initialize();
PyRun_SimpleString("from time import time,ctime\n"
"print 'Today is',ctime(time())\n");
Py_Finalize();
return 0;
}
So when I do an equivalent minimal main
int main()
{
Py_Initialize();
PyObject *p = convert_to_python();
Py_Finalize();
return 0;
}
it works without crashing.
The difference between the two examples is
long int a = 20;
and
long int a = 65454984;
I guess it has to do with PyInt_FromLong(long ival):
The current implementation keeps an array of integer objects for all integers between -5 and 256, when you create an int in that range you actually just get back a reference to the existing object.
Maybe Python tries to access an uninitialized pointer or memory range without the initialization.
When I change the example using a = 256, it crashes. Using a = 257, it doesn't.
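You can observe that cache from an ordinary, fully initialized CPython session, which matches the 256/257 behaviour above:
# identity check for the small-int cache (CPython implementation detail)
a = int('256'); b = int('256')       # parsed at runtime to avoid constant folding
print(a is b)                        # True: both come from the cached small_ints[] objects
a = int('257'); b = int('257')
print(a is b)                        # False: 257 lies outside the cache, so fresh objects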
Looking at cpython/Objects/intobject.c:79, you can see an array of pointers
static PyIntObject *small_ints[NSMALLNEGINTS + NSMALLPOSINTS];
which is accessed right below in PyInt_FromLong(long ival)
v = small_ints[ival + NSMALLNEGINTS];
Py_INCREF(v);
But without initialization from _PyInt_Init(void)
for (ival = -NSMALLNEGINTS; ival < NSMALLPOSINTS; ival++) {
if (!free_list && (free_list = fill_free_list()) == NULL)
return 0;
/* PyObject_New is inlined */
v = free_list;
free_list = (PyIntObject *)Py_TYPE(v);
(void)PyObject_INIT(v, &PyInt_Type);
v->ob_ival = ival;
small_ints[ival + NSMALLNEGINTS] = v;
}
these pointers are all NULL, causing the crash.
I am writing a Python application that uses OpenCV's Python bindings to do marker detection and other image processing. I would like to use OpenCV's CUDA modules to CUDA-accelerate certain parts of my application, and noticed in their .hpp files that they seem to be using the OpenCV export macros for Python and Java. However, I do not seem to be able to access those CUDA functions, even though I am building OpenCV WITH_CUDA=ON.
Is it necessary to use a wrapper such as PyCUDA in order to access the GPU functions, such as threshold in cudaarithm? Or, are these CUDA-accelerated functions already being used if I call cv2.threshold() in my Python code (rather than the regular, CPU-based implementation)?
CV_EXPORTS double threshold(InputArray src, OutputArray dst, double thresh, double maxval, int type, Stream& stream = Stream::Null());
The submodules I see for cv2 are the following:
Error
aruco
detail
fisheye
flann
instr
ml
ocl
ogl
videostab
cv2.cuda, cv2.gpu, and cv2.cudaarithm all return with an AttributeError.
The CMake instruction I am running to build OpenCV is as follows:
cmake -DOPENCV_EXTRA_MODULES_PATH=/usr/local/lib/opencv_contrib/modules/ \
-D WITH_CUDA=ON -D CUDA_FAST_MATH=1 \
-D ENABLE_PRECOMPILED_HEADERS=OFF \
-D BUILD_TESTS=OFF -D BUILD_PERF_TESTS=OFF -D BUILD_EXAMPLES=OFF \
-D BUILD_opencv_java=OFF \
-DBUILD_opencv_bgsegm=OFF -DBUILD_opencv_bioinspired=OFF -DBUILD_opencv_ccalib=OFF -DBUILD_opencv_cnn_3dobj=OFF \
-DBUILD_opencv_contrib_world=OFF -DBUILD_opencv_cvv=OFF -DBUILD_opencv_datasets=OFF -DBUILD_opencv_dnn=OFF \
-DBUILD_opencv_dnns_easily_fooled=OFF -DBUILD_opencv_dpm=OFF -DBUILD_opencv_face=OFF -DBUILD_opencv_fuzzy=OFF \
-DBUILD_opencv_hdf=OFF -DBUILD_opencv_line_descriptor=OFF -DBUILD_opencv_matlab=OFF -DBUILD_opencv_optflow=OFF \
-DBUILD_opencv_plot=OFF -DBUILD_opencv_README.md=OFF -DBUILD_opencv_reg=OFF -DBUILD_opencv_rgbd=OFF \
-DBUILD_opencv_saliency=OFF -DBUILD_opencv_sfm=OFF -DBUILD_opencv_stereo=OFF -DBUILD_opencv_structured_light=OFF \
-DBUILD_opencv_surface_matching=OFF -DBUILD_opencv_text=OFF -DBUILD_opencv_tracking=OFF -DBUILD_opencv_viz=OFF \
-DBUILD_opencv_xfeatures2d=OFF -DBUILD_opencv_ximgproc=OFF -DBUILD_opencv_xobjdetect=OFF -DBUILD_opencv_xphoto=OFF ..
So, as confirmed in the answer and comment thread with @NAmorim, there are no accessible Python bindings to OpenCV's various CUDA modules.
I was able to get around this restriction by using Cython to gain access to the CUDA functions I needed and implementing the necessary logic to convert my Python objects (mainly NumPy arrays) to OpenCV C/C++ objects and back.
Working Code
I first wrote a Cython definition file, GpuWrapper.pxd. The purpose of this file is to reference external C/C++ classes and methods, such as the CUDA methods I am interested in.
from libcpp cimport bool
from cpython.ref cimport PyObject
# References PyObject to OpenCV object conversion code borrowed from OpenCV's own conversion file, cv2.cpp
cdef extern from 'pyopencv_converter.cpp':
cdef PyObject* pyopencv_from(const Mat& m)
cdef bool pyopencv_to(PyObject* o, Mat& m)
cdef extern from 'opencv2/imgproc.hpp' namespace 'cv':
cdef enum InterpolationFlags:
INTER_NEAREST = 0
cdef enum ColorConversionCodes:
COLOR_BGR2GRAY
cdef extern from 'opencv2/core/core.hpp':
cdef int CV_8UC1
cdef int CV_32FC1
cdef extern from 'opencv2/core/core.hpp' namespace 'cv':
cdef cppclass Size_[T]:
Size_() except +
Size_(T width, T height) except +
T width
T height
ctypedef Size_[int] Size2i
ctypedef Size2i Size
cdef cppclass Scalar[T]:
Scalar() except +
Scalar(T v0) except +
cdef extern from 'opencv2/core/core.hpp' namespace 'cv':
cdef cppclass Mat:
Mat() except +
void create(int, int, int) except +
void* data
int rows
int cols
cdef extern from 'opencv2/core/cuda.hpp' namespace 'cv::cuda':
cdef cppclass GpuMat:
GpuMat() except +
void upload(Mat arr) except +
void download(Mat dst) const
cdef cppclass Stream:
Stream() except +
cdef extern from 'opencv2/cudawarping.hpp' namespace 'cv::cuda':
cdef void warpPerspective(GpuMat src, GpuMat dst, Mat M, Size dsize, int flags, int borderMode, Scalar borderValue, Stream& stream)
# Function using default values
cdef void warpPerspective(GpuMat src, GpuMat dst, Mat M, Size dsize, int flags)
We also need the ability to convert Python objects to OpenCV objects. I copied the first couple hundred lines from OpenCV's modules/python/src2/cv2.cpp. You can find that code below in the appendix.
We can finally write our Cython wrapper methods to call OpenCV's CUDA functions! This is part of the Cython implementation file, GpuWrapper.pyx.
import numpy as np # Import Python functions, attributes, submodules of numpy
cimport numpy as np # Import numpy C/C++ API
def cudaWarpPerspectiveWrapper(np.ndarray[np.uint8_t, ndim=2] _src,
np.ndarray[np.float32_t, ndim=2] _M,
_size_tuple,
int _flags=INTER_NEAREST):
# Create GPU/device InputArray for src
cdef Mat src_mat
cdef GpuMat src_gpu
pyopencv_to(<PyObject*> _src, src_mat)
src_gpu.upload(src_mat)
# Create CPU/host InputArray for M
cdef Mat M_mat = Mat()
pyopencv_to(<PyObject*> _M, M_mat)
# Create Size object from size tuple
# Note that size/shape in Python is handled in row-major-order -- therefore, width is [1] and height is [0]
cdef Size size = Size(<int> _size_tuple[1], <int> _size_tuple[0])
# Create empty GPU/device OutputArray for dst
cdef GpuMat dst_gpu = GpuMat()
warpPerspective(src_gpu, dst_gpu, M_mat, size, INTER_NEAREST)
# Get result of dst
cdef Mat dst_host
dst_gpu.download(dst_host)
cdef np.ndarray out = <np.ndarray> pyopencv_from(dst_host)
return out
After running a setup script to cythonize and compile this code (see appendix), we can import GpuWrapper as a Python module and run cudaWarpPerspectiveWrapper. This allowed me to run the code through CUDA with only a mismatch of 0.34722222222222854% -- quite exciting!
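For completeness, a minimal usage sketch (the random image and the translation matrix are just stand-ins for real data, and it assumes the module built successfully):
import numpy as np
import GpuWrapper   # the Cython module built by the setup script in the appendix

src = np.random.randint(0, 256, (480, 640), dtype=np.uint8)     # 2-D uint8, as the wrapper expects
M = np.array([[1, 0, 10],
              [0, 1, 20],
              [0, 0,  1]], dtype=np.float32)                    # simple translation homography
dst = GpuWrapper.cudaWarpPerspectiveWrapper(src, M, src.shape)  # size tuple is (height, width)
print(dst.shape, dst.dtype)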
References (can only post max of 2)
What is the easiest way to convert ndarray into cv::Mat?
Writing Python bindings for C++ code that use OpenCV
Appendix
pyopencv_converter.cpp
#include <Python.h>
#include "numpy/ndarrayobject.h"
#include "opencv2/core/core.hpp"
static PyObject* opencv_error = 0;
// === FAIL MESSAGE ====================================================================================================
static int failmsg(const char *fmt, ...)
{
char str[1000];
va_list ap;
va_start(ap, fmt);
vsnprintf(str, sizeof(str), fmt, ap);
va_end(ap);
PyErr_SetString(PyExc_TypeError, str);
return 0;
}
struct ArgInfo
{
const char * name;
bool outputarg;
// more fields may be added if necessary
ArgInfo(const char * name_, bool outputarg_)
: name(name_)
, outputarg(outputarg_) {}
// to match with older pyopencv_to function signature
operator const char *() const { return name; }
};
// === THREADING =======================================================================================================
class PyAllowThreads
{
public:
PyAllowThreads() : _state(PyEval_SaveThread()) {}
~PyAllowThreads()
{
PyEval_RestoreThread(_state);
}
private:
PyThreadState* _state;
};
class PyEnsureGIL
{
public:
PyEnsureGIL() : _state(PyGILState_Ensure()) {}
~PyEnsureGIL()
{
PyGILState_Release(_state);
}
private:
PyGILState_STATE _state;
};
// === ERROR HANDLING ==================================================================================================
#define ERRWRAP2(expr) \
try \
{ \
PyAllowThreads allowThreads; \
expr; \
} \
catch (const cv::Exception &e) \
{ \
PyErr_SetString(opencv_error, e.what()); \
return 0; \
}
// === USING NAMESPACE CV ==============================================================================================
using namespace cv;
// === NUMPY ALLOCATOR =================================================================================================
class NumpyAllocator : public MatAllocator
{
public:
NumpyAllocator() { stdAllocator = Mat::getStdAllocator(); }
~NumpyAllocator() {}
UMatData* allocate(PyObject* o, int dims, const int* sizes, int type, size_t* step) const
{
UMatData* u = new UMatData(this);
u->data = u->origdata = (uchar*)PyArray_DATA((PyArrayObject*) o);
npy_intp* _strides = PyArray_STRIDES((PyArrayObject*) o);
for( int i = 0; i < dims - 1; i++ )
step[i] = (size_t)_strides[i];
step[dims-1] = CV_ELEM_SIZE(type);
u->size = sizes[0]*step[0];
u->userdata = o;
return u;
}
UMatData* allocate(int dims0, const int* sizes, int type, void* data, size_t* step, int flags, UMatUsageFlags usageFlags) const
{
if( data != 0 )
{
CV_Error(Error::StsAssert, "The data should normally be NULL!");
// probably this is safe to do in such extreme case
return stdAllocator->allocate(dims0, sizes, type, data, step, flags, usageFlags);
}
PyEnsureGIL gil;
int depth = CV_MAT_DEPTH(type);
int cn = CV_MAT_CN(type);
const int f = (int)(sizeof(size_t)/8);
int typenum = depth == CV_8U ? NPY_UBYTE : depth == CV_8S ? NPY_BYTE :
depth == CV_16U ? NPY_USHORT : depth == CV_16S ? NPY_SHORT :
depth == CV_32S ? NPY_INT : depth == CV_32F ? NPY_FLOAT :
depth == CV_64F ? NPY_DOUBLE : f*NPY_ULONGLONG + (f^1)*NPY_UINT;
int i, dims = dims0;
cv::AutoBuffer<npy_intp> _sizes(dims + 1);
for( i = 0; i < dims; i++ )
_sizes[i] = sizes[i];
if( cn > 1 )
_sizes[dims++] = cn;
PyObject* o = PyArray_SimpleNew(dims, _sizes, typenum);
if(!o)
CV_Error_(Error::StsError, ("The numpy array of typenum=%d, ndims=%d can not be created", typenum, dims));
return allocate(o, dims0, sizes, type, step);
}
bool allocate(UMatData* u, int accessFlags, UMatUsageFlags usageFlags) const
{
return stdAllocator->allocate(u, accessFlags, usageFlags);
}
void deallocate(UMatData* u) const
{
if(!u)
return;
PyEnsureGIL gil;
CV_Assert(u->urefcount >= 0);
CV_Assert(u->refcount >= 0);
if(u->refcount == 0)
{
PyObject* o = (PyObject*)u->userdata;
Py_XDECREF(o);
delete u;
}
}
const MatAllocator* stdAllocator;
};
// === ALLOCATOR INITIALIZATION ========================================================================================
NumpyAllocator g_numpyAllocator;
// === CONVERTOR FUNCTIONS =============================================================================================
template<typename T> static
bool pyopencv_to(PyObject* obj, T& p, const char* name = "<unknown>");
template<typename T> static
PyObject* pyopencv_from(const T& src);
enum { ARG_NONE = 0, ARG_MAT = 1, ARG_SCALAR = 2 };
// special case, when the convertor needs full ArgInfo structure
static bool pyopencv_to(PyObject* o, Mat& m, const ArgInfo info)
{
bool allowND = true;
if(!o || o == Py_None)
{
if( !m.data )
m.allocator = &g_numpyAllocator;
return true;
}
if( PyInt_Check(o) )
{
double v[] = {static_cast<double>(PyInt_AsLong((PyObject*)o)), 0., 0., 0.};
m = Mat(4, 1, CV_64F, v).clone();
return true;
}
if( PyFloat_Check(o) )
{
double v[] = {PyFloat_AsDouble((PyObject*)o), 0., 0., 0.};
m = Mat(4, 1, CV_64F, v).clone();
return true;
}
if( PyTuple_Check(o) )
{
int i, sz = (int)PyTuple_Size((PyObject*)o);
m = Mat(sz, 1, CV_64F);
for( i = 0; i < sz; i++ )
{
PyObject* oi = PyTuple_GET_ITEM(o, i);
if( PyInt_Check(oi) )
m.at<double>(i) = (double)PyInt_AsLong(oi);
else if( PyFloat_Check(oi) )
m.at<double>(i) = (double)PyFloat_AsDouble(oi);
else
{
failmsg("%s is not a numerical tuple", info.name);
m.release();
return false;
}
}
return true;
}
if( !PyArray_Check(o) )
{
failmsg("%s is not a numpy array, neither a scalar", info.name);
return false;
}
PyArrayObject* oarr = (PyArrayObject*) o;
bool needcopy = false, needcast = false;
int typenum = PyArray_TYPE(oarr), new_typenum = typenum;
int type = typenum == NPY_UBYTE ? CV_8U :
typenum == NPY_BYTE ? CV_8S :
typenum == NPY_USHORT ? CV_16U :
typenum == NPY_SHORT ? CV_16S :
typenum == NPY_INT ? CV_32S :
typenum == NPY_INT32 ? CV_32S :
typenum == NPY_FLOAT ? CV_32F :
typenum == NPY_DOUBLE ? CV_64F : -1;
if( type < 0 )
{
if( typenum == NPY_INT64 || typenum == NPY_UINT64 || typenum == NPY_LONG )
{
needcopy = needcast = true;
new_typenum = NPY_INT;
type = CV_32S;
}
else
{
failmsg("%s data type = %d is not supported", info.name, typenum);
return false;
}
}
#ifndef CV_MAX_DIM
const int CV_MAX_DIM = 32;
#endif
int ndims = PyArray_NDIM(oarr);
if(ndims >= CV_MAX_DIM)
{
failmsg("%s dimensionality (=%d) is too high", info.name, ndims);
return false;
}
int size[CV_MAX_DIM+1];
size_t step[CV_MAX_DIM+1];
size_t elemsize = CV_ELEM_SIZE1(type);
const npy_intp* _sizes = PyArray_DIMS(oarr);
const npy_intp* _strides = PyArray_STRIDES(oarr);
bool ismultichannel = ndims == 3 && _sizes[2] <= CV_CN_MAX;
for( int i = ndims-1; i >= 0 && !needcopy; i-- )
{
// these checks handle cases of
// a) multi-dimensional (ndims > 2) arrays, as well as simpler 1- and 2-dimensional cases
// b) transposed arrays, where _strides[] elements go in non-descending order
// c) flipped arrays, where some of _strides[] elements are negative
// the _sizes[i] > 1 is needed to avoid spurious copies when NPY_RELAXED_STRIDES is set
if( (i == ndims-1 && _sizes[i] > 1 && (size_t)_strides[i] != elemsize) ||
(i < ndims-1 && _sizes[i] > 1 && _strides[i] < _strides[i+1]) )
needcopy = true;
}
if( ismultichannel && _strides[1] != (npy_intp)elemsize*_sizes[2] )
needcopy = true;
if (needcopy)
{
if (info.outputarg)
{
failmsg("Layout of the output array %s is incompatible with cv::Mat (step[ndims-1] != elemsize or step[1] != elemsize*nchannels)", info.name);
return false;
}
if( needcast ) {
o = PyArray_Cast(oarr, new_typenum);
oarr = (PyArrayObject*) o;
}
else {
oarr = PyArray_GETCONTIGUOUS(oarr);
o = (PyObject*) oarr;
}
_strides = PyArray_STRIDES(oarr);
}
// Normalize strides in case NPY_RELAXED_STRIDES is set
size_t default_step = elemsize;
for ( int i = ndims - 1; i >= 0; --i )
{
size[i] = (int)_sizes[i];
if ( size[i] > 1 )
{
step[i] = (size_t)_strides[i];
default_step = step[i] * size[i];
}
else
{
step[i] = default_step;
default_step *= size[i];
}
}
// handle degenerate case
if( ndims == 0) {
size[ndims] = 1;
step[ndims] = elemsize;
ndims++;
}
if( ismultichannel )
{
ndims--;
type |= CV_MAKETYPE(0, size[2]);
}
if( ndims > 2 && !allowND )
{
failmsg("%s has more than 2 dimensions", info.name);
return false;
}
m = Mat(ndims, size, type, PyArray_DATA(oarr), step);
m.u = g_numpyAllocator.allocate(o, ndims, size, type, step);
m.addref();
if( !needcopy )
{
Py_INCREF(o);
}
m.allocator = &g_numpyAllocator;
return true;
}
template<>
bool pyopencv_to(PyObject* o, Mat& m, const char* name)
{
return pyopencv_to(o, m, ArgInfo(name, 0));
}
template<>
PyObject* pyopencv_from(const Mat& m)
{
if( !m.data )
Py_RETURN_NONE;
Mat temp, *p = (Mat*)&m;
if(!p->u || p->allocator != &g_numpyAllocator)
{
temp.allocator = &g_numpyAllocator;
ERRWRAP2(m.copyTo(temp));
p = &temp;
}
PyObject* o = (PyObject*)p->u->userdata;
Py_INCREF(o);
return o;
}
setupGpuWrapper.py
import subprocess
import os
import numpy as np
from distutils.core import setup, Extension
from Cython.Build import cythonize
from Cython.Distutils import build_ext
"""
Run setup with the following command:
```
python setupGpuWrapper.py build_ext --inplace
```
"""
# Determine current directory of this setup file to find our module
CUR_DIR = os.path.dirname(__file__)
# Use pkg-config to determine library locations and include locations
opencv_libs_str = subprocess.check_output("pkg-config --libs opencv".split()).decode()
opencv_incs_str = subprocess.check_output("pkg-config --cflags opencv".split()).decode()
# Parse into usable format for Extension call
opencv_libs = [str(lib) for lib in opencv_libs_str.strip().split()]
opencv_incs = [str(inc) for inc in opencv_incs_str.strip().split()]
extensions = [
Extension('GpuWrapper',
sources=[os.path.join(CUR_DIR, 'GpuWrapper.pyx')],
language='c++',
include_dirs=[np.get_include()] + opencv_incs,
extra_link_args=opencv_libs)
]
setup(
cmdclass={'build_ext': build_ext},
name="GpuWrapper",
ext_modules=cythonize(extensions)
)
I did some testing on this with OpenCV 4.0.0. @nchaumont mentioned that starting with OpenCV 4, there were Python bindings for CUDA included.
As of at least OpenCV 4.1.0, possibly earlier, the default Python bindings include CUDA, provided that OpenCV was built with CUDA support.
Most functionality appears to be exposed as cv2.cuda.thing (for example, cv2.cuda.cvtColor()).
Currently, they lack any online documentation - for example, the GPU Canny edge detector makes no mention of Python. You can use the help function at Python's REPL to see the C++ docs though, which should be mostly equivalent.
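For example, a round trip through the CUDA colour conversion looks roughly like this (assuming your cv2 was built WITH_CUDA=ON so that cv2.cuda is populated; the random image is just a stand-in for a real frame):
import cv2
import numpy as np

img = np.random.randint(0, 256, (480, 640, 3), dtype=np.uint8)
gpu_img = cv2.cuda_GpuMat()        # device-side matrix
gpu_img.upload(img)                # host -> device copy
gpu_gray = cv2.cuda.cvtColor(gpu_img, cv2.COLOR_BGR2GRAY)
gray = gpu_gray.download()         # device -> host copy, returns a numpy array
print(gray.shape, gray.dtype)      # (480, 640) uint8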
I used the following way to access OpenCV's C++ CUDA methods in Python:
Create custom opencv_contrib module
Write C++ code to wrap the OpenCV CUDA method
Using OpenCV python bindings, expose your custom method
Build opencv with opencv_contrib
Run python code to test
I created a small GitHub repo to demonstrate this.
Or, are these CUDA-accelerated functions already being used if I call cv2.threshold() in my Python code (rather than the regular, CPU-based implementation)
No, you have to explicitly call them from the GPU accelerated module. Calling cv2.threshold() will simply run the CPU version.
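With a recent CUDA-enabled build that does expose the module (OpenCV 4.x, see the other answers), the two paths are spelled out separately; this is only a rough sketch:
import cv2
import numpy as np

gray = np.random.randint(0, 256, (480, 640), dtype=np.uint8)
# CPU path: the regular implementation
_, bw_cpu = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY)
# GPU path: has to be requested explicitly through the cuda module
g = cv2.cuda_GpuMat()
g.upload(gray)
_, bw_gpu = cv2.cuda.threshold(g, 127, 255, cv2.THRESH_BINARY)
bw = bw_gpu.download()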
Since the python API of OpenCV wraps around C++ functions, checking the C++ API usually offers useful hints on where the functions/modules are.
For instance, in this transition guide you can see the API changes that were made from OpenCV 2.X to 3.X. The GPU module is accessed as cv2.cuda in OpenCV 3.X and as cv2.gpu in previous versions. And the cuda module in 3.X is divided into several small pieces:
cuda - CUDA-accelerated Computer Vision
cudaarithm - Operations on Matrices
cudabgsegm - Background Segmentation
cudacodec - Video Encoding/Decoding
cudafeatures2d - Feature Detection and Description
cudafilters - Image Filtering
cudaimgproc - Image Processing
cudalegacy - Legacy support
cudaoptflow - Optical Flow
cudastereo - Stereo Correspondence
cudawarping - Image Warping
cudev - Device layer
You should search for these modules within cv2.
I have an if clause within a for loop in which I have defined state_out beforehand with:
state_out = (PyArrayObject *) PyArray_FromDims(1,dims_new,NPY_BOOL);
And the if conditions are like this:
if (conn_ctr<sum*2){
*(state_out->data + i*state_out->strides[0]) = true;
}
else {
*(state_out->data + i*state_out->strides[0]) = false;
}
When commenting these out, state_out returns as an all-False NumPy array. There is a problem with this assignment that I fail to see. As far as I know, everything within the struct PyArrayObject that is used here is a pointer, so after the pointer arithmetic it should be pointing to the address I intend to write to. (All the if conditions in the code are built by reaching values in this manner, and I know it works, since I managed to printf the input arrays' values.) Then if I want to assign a bool to one of these places in memory, I should assign it via *(pointer_intended) = true. What am I missing?
EDIT: I have spotted that I don't even reach those values; even if I just put some printf functions within:
if (conn_ctr<sum*2){
printf("True!\n");
}
else {
printf("False!\n");
}
I get a SegFault again.
Thanks a lot, and the rest of the code is here.
#include <Python.h>
#include "numpy/arrayobject.h"
#include <stdio.h>
#include <stdbool.h>
static PyObject* trace(PyObject *self, PyObject *args);
static char doc[] =
"This is the C extension for xor_masking routine. It interfaces with Python via C-Api, and calculates the"
"next state with C pointer arithmetic";
static PyMethodDef TraceMethods[] = {
{"trace", trace, METH_VARARGS, doc},
{NULL, NULL, 0, NULL}
};
PyMODINIT_FUNC
inittrace(void)
{
(void) Py_InitModule("trace", TraceMethods);
import_array();
}
static PyObject* trace(PyObject *self, PyObject *args){
PyObject *adjacency ,*mask, *state;
PyArrayObject *adjacency_arr, *mask_arr, *state_arr, *state_out;
if (!PyArg_ParseTuple(args,"OOO:trace", &adjacency, &mask, &state)) return NULL;
adjacency_arr = (PyArrayObject *)
PyArray_ContiguousFromObject(adjacency, NPY_BOOL,2,2);
if (adjacency_arr == NULL) return NULL;
mask_arr = (PyArrayObject *)
PyArray_ContiguousFromObject(mask, NPY_BOOL,2,2);
if (mask_arr == NULL) return NULL;
state_arr = (PyArrayObject *)
PyArray_ContiguousFromObject(state, NPY_BOOL,1,1);
if (state_arr == NULL) return NULL;
int dims[2], dims_new[1];
dims[0] = adjacency_arr -> dimensions[0];
dims[1] = adjacency_arr -> dimensions[1];
dims_new[0] = adjacency_arr -> dimensions[0];
if (!(dims[0]==dims[1] && mask_arr -> dimensions[0] == dims[0]
&& mask_arr -> dimensions[1] == dims[0]
&& state_arr -> dimensions[0] == dims[0]))
return NULL;
state_out = (PyArrayObject *) PyArray_FromDims(1,dims_new,NPY_BOOL);
int i,j;
for(i=0;i<dims[0];i++){
int sum = 0;
int conn_ctr = 0;
for(j=0;j<dims[1];j++){
bool adj_value = (adjacency_arr->data + i*adjacency_arr->strides[0]
+j*adjacency_arr->strides[1]);
if (*(bool *) adj_value == true){
bool mask_value = (mask_arr->data + i*mask_arr->strides[0]
+j*mask_arr->strides[1]);
bool state_value = (state_arr->data + j*state_arr->strides[0]);
if ( (*(bool *) mask_value ^ *(bool *)state_value) == true){
sum++;
}
conn_ctr++;
}
}
if (conn_ctr<sum*2){
}
else {
}
}
Py_DECREF(adjacency_arr);
Py_DECREF(mask_arr);
Py_DECREF(state_arr);
return PyArray_Return(state_out);
}
if (conn_ctr<sum*2){
*(state_out->data + i*state_out->strides[0]) = true;
}
else {
*(state_out->data + i*state_out->strides[0]) = false;
}
Here, I naively do pointer arithmetic: state_out->data is a pointer to the beginning of the data, and it is defined to be a pointer to char (SciPy Doc - Python Types and C-Structures):
typedef struct PyArrayObject {
PyObject_HEAD
char *data;
int nd;
npy_intp *dimensions;
npy_intp *strides;
...
} PyArrayObject;
A portion of it is copied here. state_out->strides is a pointer to an array whose length equals the number of dimensions of the array we have. This is a 1d array in this case. So when I do the pointer arithmetic (state_out->data + i*state_out->strides[0]) I certainly aim to compute the pointer that points to the ith value of the array, but I failed to give the type of the pointer.
I had tried:
NPY_BOOL *adj_value_ptr, *mask_value_ptr, *state_value_ptr, *state_out_ptr;
where these variables point to the values I am interested in within my for loop, and state_out_ptr is the one that I am writing to. I had thought that since I state that the
constituents of these arrays are of type NPY_BOOL, the pointers that point to the data within the array would be of type NPY_BOOL as well. This fails with a SegFault when one works with the data by directly manipulating memory. That is because NPY_BOOL is an enum for an integer (as pv kindly stated in the comments) for NumPy to use internally; there is a C typedef, npy_bool, to use within the code for boolean values (Scipy Docs). When I introduced my pointers with the type
npy_bool *adj_value_ptr, *mask_value_ptr, *state_value_ptr, *state_out_ptr;
the segmentation fault disappeared, and I succeeded in manipulating and returning a NumPy array.
I'm not an expert, but this solved my issue; point out if I'm wrong.
The part that has changed in the source code is:
state_out = (PyArrayObject *) PyArray_FromDims(1,dims_new,NPY_BOOL);
npy_bool *adj_value_ptr, *mask_value_ptr, *state_value_ptr, *state_out_ptr;
npy_intp i,j;
for(i=0;i<dims[0];i++){
npy_int sum = 0;
npy_int conn_ctr = 0;
for(j=0;j<dims[1];j++){
adj_value_ptr = (adjacency_arr->data + i*adjacency_arr->strides[0]
+j*adjacency_arr->strides[1]);
if (*adj_value_ptr == true){
mask_value_ptr = (mask_arr->data + i*mask_arr->strides[0]
+j*mask_arr->strides[1]);
state_value_ptr = (state_arr->data + j*state_arr->strides[0]);
if ( (*(bool *) mask_value_ptr ^ *(bool *)state_value_ptr) == true){
sum++;
}
conn_ctr++;
}
}
state_out_ptr = (state_out->data + i*state_out->strides[0]);
if (conn_ctr < sum*2){
*state_out_ptr = true;
}
else {
*state_out_ptr = false;
}
}
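For reference, the fixed extension can then be exercised from Python roughly like this (random boolean inputs stand in for real data; note that the module name trace shadows the standard-library trace module):
import numpy as np
import trace   # the extension module built from the C source above (Python 2 build)

n = 4
adjacency = np.random.rand(n, n) > 0.5   # 2-D boolean, matches NPY_BOOL
mask = np.random.rand(n, n) > 0.5
state = np.random.rand(n) > 0.5          # 1-D boolean
print(trace.trace(adjacency, mask, state))   # boolean next-state array of shape (n,)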
I found the bottleneck in my python code, played around with psyco etc. Then decided to write a C/C++ extension for performance.
With the help of swig you almost don't need to care about arguments etc. Everything works fine.
Now my question: swig creates a quite large py-file which does a lot of checking and 'PySwigObject' handling before calling the actual .pyd or .so code.
Does any one of you have experience of whether there is some more performance to gain if you hand-write this file rather than letting swig do it?
You should consider Boost.Python if you are not planning to generate bindings for other languages as well with swig.
If you have a lot of functions and classes to bind, Py++ is a great tool that automatically generates the needed code to make the bindings.
Pybindgen may also be an option, but it's a new project and less complete than Boost.Python.
Edit:
Maybe I need to be more explicit about the pros and cons.
Swig:
pro: you can generate bindings for many scripting languages.
cons: I don't like the way the parser works. I don't know if they have made some progress, but two years ago the C++ parser was quite limited. Most of the time I had to copy/paste my .h headers, add some % characters, and give extra hints to the swig parser.
I also needed to deal with the Python C-API from time to time for (not so) complicated type conversions.
I'm not using it anymore.
Boost.Python:
pro:
It's a very complete library. It allows you to do almost everything that is possible with the C-API, but in C++. I never had to write C-API code with this library. I have also never encountered a bug due to the library. Code for bindings either works like a charm or refuses to compile.
It's probably one of the best solutions currently available if you already have some C++ library to bind. But if you only have a small C function to rewrite, I would probably try with Cython.
cons: if you don't have a pre-compiled Boost.Python library you're going to use Bjam (sort of make replacement). I really hate Bjam and its syntax.
Python libraries created with B.P tend to become obese. It also takes a lot of time to compile them.
Py++ (discontinued): it's Boost.Python made easy. Py++ uses a C++ parser to read your code and then generates Boost.Python code automatically. You also get great support from its author (no, it's not me ;-) ).
cons: only the problems due to Boost.Python itself. Update: As of 2014 this project now looks discontinued.
Pybindgen:
It generates the code dealing with the C-API. You can either describe functions and classes in a Python file, or let Pybindgen read your headers and generate bindings automatically (for this it uses pygccxml, a python library written by the author of Py++).
cons: it's a young project, with a smaller team than Boost.Python. There are still some limitations: you cannot use multiple inheritance for your C++ classes; callbacks are not handled automatically (custom callback handling code can be written, though); and there is no translation of Python exceptions to C++.
It's definitely worth a good look.
A new one:
On 2009/01/20 the author of Py++ announced a new package for interfacing C/C++ code with Python. It is based on ctypes. I haven't tried it yet, but I will! Note: this project looks discontinued, like Py++.
CFFI: I did not know of the existence of this one until very recently, so for now I cannot give my opinion. It looks like you can define C functions in Python strings and call them directly from the same Python module (a minimal sketch follows at the end of this answer).
Cython: This is the method I'm currently using in my projects. Basically you write code in special .pyx files. Those files are compiled (translated) into C code, which in turn is compiled into CPython modules.
Cython code can look like regular Python (and in fact pure Python files are valid .pyx Cython files), but you can also add more information, like variable types. This optional typing allows Cython to generate faster C code. Code in Cython files can call both pure Python functions and C and C++ functions (and also C++ methods).
It took me some time to start thinking in Cython, calling C and C++ functions and mixing Python and C variables in the same code, and so on. But it's a very powerful language, with an active (in 2014) and friendly community.
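As promised under the CFFI item, here is a minimal ABI-mode sketch; libhelper.so and inv_one_plus are placeholder names for your own compiled code:
from cffi import FFI

ffi = FFI()
ffi.cdef("double inv_one_plus(double x);")   # the C declaration, given as a Python string
lib = ffi.dlopen("./libhelper.so")           # an already compiled shared library (placeholder)
print(lib.inv_one_plus(3.0))                 # calls straight into the C function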
SWIG 2.0.4 has introduced a new -builtin option that improves performance.
I did some benchmarking using an example program that does a lot of fast calls to a C++ extension.
I built the extension using boost.python, PyBindGen, SIP and SWIG with and without the -builtin option. Here are the results (average of 100 runs):
SWIG with -builtin 2.67s
SIP 2.70s
PyBindGen 2.74s
boost.python 3.07s
SWIG without -builtin 4.65s
SWIG used to be slowest. With the new -builtin option, SWIG seems to be fastest.
For sure you will always have a performance gain doing this by hand, but the gain will be very small compared to the effort required to do this. I don't have any figure to give you but I don't recommend this, because you will need to maintain the interface by hand, and this is not an option if your module is large!
You did the right thing to choose to use a scripting language because you wanted rapid development. This way you've avoided the early optimization syndrome, and now you want to optimize the bottleneck parts, great! But if you do the C/python interface by hand you will fall into the early optimization syndrome for sure.
If you want something with less interface code, you can think about creating a dll from your C code, and using that library directly from python with ctypes.
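For instance, the Python side of such a DLL reduces to a few lines with ctypes (the library and function names here are placeholders):
import ctypes

lib = ctypes.CDLL("./libfastpart.so")      # your compiled C bottleneck (placeholder name)
lib.compute.argtypes = [ctypes.c_double]   # declare the C signature once
lib.compute.restype = ctypes.c_double
print(lib.compute(3.0))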
Consider also Cython if you want to use only python code in your program.
Using Cython is pretty good. You can write your C extension with a Python-like syntax and have it generate C code, boilerplate included. Since you already have the code in Python, you only have to make a few changes to your bottleneck code and C code will be generated from it.
Example. hello.pyx:
cdef int hello(int a, int b):
return a + b
That generates 601 lines of boilerplate code:
/* Generated by Cython 0.10.3 on Mon Jan 19 08:24:44 2009 */
#define PY_SSIZE_T_CLEAN
#include "Python.h"
#include "structmember.h"
#ifndef PY_LONG_LONG
#define PY_LONG_LONG LONG_LONG
#endif
#ifndef DL_EXPORT
#define DL_EXPORT(t) t
#endif
#if PY_VERSION_HEX < 0x02040000
#define METH_COEXIST 0
#endif
#if PY_VERSION_HEX < 0x02050000
typedef int Py_ssize_t;
#define PY_SSIZE_T_MAX INT_MAX
#define PY_SSIZE_T_MIN INT_MIN
#define PyInt_FromSsize_t(z) PyInt_FromLong(z)
#define PyInt_AsSsize_t(o) PyInt_AsLong(o)
#define PyNumber_Index(o) PyNumber_Int(o)
#define PyIndex_Check(o) PyNumber_Check(o)
#endif
#if PY_VERSION_HEX < 0x02060000
#define Py_REFCNT(ob) (((PyObject*)(ob))->ob_refcnt)
#define Py_TYPE(ob) (((PyObject*)(ob))->ob_type)
#define Py_SIZE(ob) (((PyVarObject*)(ob))->ob_size)
#define PyVarObject_HEAD_INIT(type, size) \
PyObject_HEAD_INIT(type) size,
#define PyType_Modified(t)
typedef struct {
void *buf;
PyObject *obj;
Py_ssize_t len;
Py_ssize_t itemsize;
int readonly;
int ndim;
char *format;
Py_ssize_t *shape;
Py_ssize_t *strides;
Py_ssize_t *suboffsets;
void *internal;
} Py_buffer;
#define PyBUF_SIMPLE 0
#define PyBUF_WRITABLE 0x0001
#define PyBUF_LOCK 0x0002
#define PyBUF_FORMAT 0x0004
#define PyBUF_ND 0x0008
#define PyBUF_STRIDES (0x0010 | PyBUF_ND)
#define PyBUF_C_CONTIGUOUS (0x0020 | PyBUF_STRIDES)
#define PyBUF_F_CONTIGUOUS (0x0040 | PyBUF_STRIDES)
#define PyBUF_ANY_CONTIGUOUS (0x0080 | PyBUF_STRIDES)
#define PyBUF_INDIRECT (0x0100 | PyBUF_STRIDES)
#endif
#if PY_MAJOR_VERSION < 3
#define __Pyx_BUILTIN_MODULE_NAME "__builtin__"
#else
#define __Pyx_BUILTIN_MODULE_NAME "builtins"
#endif
#if PY_MAJOR_VERSION >= 3
#define Py_TPFLAGS_CHECKTYPES 0
#define Py_TPFLAGS_HAVE_INDEX 0
#endif
#if (PY_VERSION_HEX < 0x02060000) || (PY_MAJOR_VERSION >= 3)
#define Py_TPFLAGS_HAVE_NEWBUFFER 0
#endif
#if PY_MAJOR_VERSION >= 3
#define PyBaseString_Type PyUnicode_Type
#define PyString_Type PyBytes_Type
#define PyInt_Type PyLong_Type
#define PyInt_Check(op) PyLong_Check(op)
#define PyInt_CheckExact(op) PyLong_CheckExact(op)
#define PyInt_FromString PyLong_FromString
#define PyInt_FromUnicode PyLong_FromUnicode
#define PyInt_FromLong PyLong_FromLong
#define PyInt_FromSize_t PyLong_FromSize_t
#define PyInt_FromSsize_t PyLong_FromSsize_t
#define PyInt_AsLong PyLong_AsLong
#define PyInt_AS_LONG PyLong_AS_LONG
#define PyInt_AsSsize_t PyLong_AsSsize_t
#define PyInt_AsUnsignedLongMask PyLong_AsUnsignedLongMask
#define PyInt_AsUnsignedLongLongMask PyLong_AsUnsignedLongLongMask
#define __Pyx_PyNumber_Divide(x,y) PyNumber_TrueDivide(x,y)
#else
#define __Pyx_PyNumber_Divide(x,y) PyNumber_Divide(x,y)
#define PyBytes_Type PyString_Type
#endif
#if PY_MAJOR_VERSION >= 3
#define PyMethod_New(func, self, klass) PyInstanceMethod_New(func)
#endif
#if !defined(WIN32) && !defined(MS_WINDOWS)
#ifndef __stdcall
#define __stdcall
#endif
#ifndef __cdecl
#define __cdecl
#endif
#else
#define _USE_MATH_DEFINES
#endif
#ifdef __cplusplus
#define __PYX_EXTERN_C extern "C"
#else
#define __PYX_EXTERN_C extern
#endif
#include <math.h>
#define __PYX_HAVE_API__helloworld
#ifdef __GNUC__
#define INLINE __inline__
#elif _WIN32
#define INLINE __inline
#else
#define INLINE
#endif
typedef struct
{PyObject **p; char *s; long n;
char is_unicode; char intern; char is_identifier;}
__Pyx_StringTabEntry; /*proto*/
static int __pyx_skip_dispatch = 0;
/* Type Conversion Predeclarations */
#if PY_MAJOR_VERSION < 3
#define __Pyx_PyBytes_FromString PyString_FromString
#define __Pyx_PyBytes_AsString PyString_AsString
#else
#define __Pyx_PyBytes_FromString PyBytes_FromString
#define __Pyx_PyBytes_AsString PyBytes_AsString
#endif
#define __Pyx_PyBool_FromLong(b) ((b) ? (Py_INCREF(Py_True), Py_True) : (Py_INCREF(Py_False), Py_False))
static INLINE int __Pyx_PyObject_IsTrue(PyObject* x);
static INLINE PY_LONG_LONG __pyx_PyInt_AsLongLong(PyObject* x);
static INLINE unsigned PY_LONG_LONG __pyx_PyInt_AsUnsignedLongLong(PyObject* x);
static INLINE Py_ssize_t __pyx_PyIndex_AsSsize_t(PyObject* b);
#define __pyx_PyInt_AsLong(x) (PyInt_CheckExact(x) ? PyInt_AS_LONG(x) : PyInt_AsLong(x))
#define __pyx_PyFloat_AsDouble(x) (PyFloat_CheckExact(x) ? PyFloat_AS_DOUBLE(x) : PyFloat_AsDouble(x))
static INLINE unsigned char __pyx_PyInt_unsigned_char(PyObject* x);
static INLINE unsigned short __pyx_PyInt_unsigned_short(PyObject* x);
static INLINE char __pyx_PyInt_char(PyObject* x);
static INLINE short __pyx_PyInt_short(PyObject* x);
static INLINE int __pyx_PyInt_int(PyObject* x);
static INLINE long __pyx_PyInt_long(PyObject* x);
static INLINE signed char __pyx_PyInt_signed_char(PyObject* x);
static INLINE signed short __pyx_PyInt_signed_short(PyObject* x);
static INLINE signed int __pyx_PyInt_signed_int(PyObject* x);
static INLINE signed long __pyx_PyInt_signed_long(PyObject* x);
static INLINE long double __pyx_PyInt_long_double(PyObject* x);
#ifdef __GNUC__
/* Test for GCC > 2.95 */
#if __GNUC__ > 2 || (__GNUC__ == 2 && (__GNUC_MINOR__ > 95))
#define likely(x) __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)
#else /* __GNUC__ > 2 ... */
#define likely(x) (x)
#define unlikely(x) (x)
#endif /* __GNUC__ > 2 ... */
#else /* __GNUC__ */
#define likely(x) (x)
#define unlikely(x) (x)
#endif /* __GNUC__ */
static PyObject *__pyx_m;
static PyObject *__pyx_b;
static PyObject *__pyx_empty_tuple;
static int __pyx_lineno;
static int __pyx_clineno = 0;
static const char * __pyx_cfilenm= __FILE__;
static const char *__pyx_filename;
static const char **__pyx_f;
static void __Pyx_AddTraceback(const char *funcname); /*proto*/
/* Type declarations */
/* Module declarations from helloworld */
static int __pyx_f_10helloworld_hello(int, int); /*proto*/
/* Implementation of helloworld */
/* "/home/nosklo/devel/ctest/hello.pyx":1
* cdef int hello(int a, int b): # <<<<<<<<<<<<<<
* return a + b
*
*/
static int __pyx_f_10helloworld_hello(int __pyx_v_a, int __pyx_v_b) {
int __pyx_r;
/* "/home/nosklo/devel/ctest/hello.pyx":2
* cdef int hello(int a, int b):
* return a + b # <<<<<<<<<<<<<<
*
*/
__pyx_r = (__pyx_v_a + __pyx_v_b);
goto __pyx_L0;
__pyx_r = 0;
__pyx_L0:;
return __pyx_r;
}
static struct PyMethodDef __pyx_methods[] = {
{0, 0, 0, 0}
};
static void __pyx_init_filenames(void); /*proto*/
#if PY_MAJOR_VERSION >= 3
static struct PyModuleDef __pyx_moduledef = {
PyModuleDef_HEAD_INIT,
"helloworld",
0, /* m_doc */
-1, /* m_size */
__pyx_methods /* m_methods */,
NULL, /* m_reload */
NULL, /* m_traverse */
NULL, /* m_clear */
NULL /* m_free */
};
#endif
static int __Pyx_InitCachedBuiltins(void) {
return 0;
return -1;
}
static int __Pyx_InitGlobals(void) {
return 0;
return -1;
}
#if PY_MAJOR_VERSION < 3
PyMODINIT_FUNC inithelloworld(void); /*proto*/
PyMODINIT_FUNC inithelloworld(void)
#else
PyMODINIT_FUNC PyInit_helloworld(void); /*proto*/
PyMODINIT_FUNC PyInit_helloworld(void)
#endif
{
__pyx_empty_tuple = PyTuple_New(0);
if (unlikely(!__pyx_empty_tuple))
{__pyx_filename = __pyx_f[0]; __pyx_lineno = 1;
__pyx_clineno = __LINE__; goto __pyx_L1_error;}
/*--- Library function declarations ---*/
__pyx_init_filenames();
/*--- Initialize various global constants etc. ---*/
if (unlikely(__Pyx_InitGlobals() < 0))
{__pyx_filename = __pyx_f[0];
__pyx_lineno = 1;
__pyx_clineno = __LINE__;
goto __pyx_L1_error;}
/*--- Module creation code ---*/
#if PY_MAJOR_VERSION < 3
__pyx_m = Py_InitModule4("helloworld", __pyx_methods, 0, 0, PYTHON_API_VERSION);
#else
__pyx_m = PyModule_Create(&__pyx_moduledef);
#endif
if (!__pyx_m)
{__pyx_filename = __pyx_f[0];
__pyx_lineno = 1; __pyx_clineno = __LINE__;
goto __pyx_L1_error;};
#if PY_MAJOR_VERSION < 3
Py_INCREF(__pyx_m);
#endif
__pyx_b = PyImport_AddModule(__Pyx_BUILTIN_MODULE_NAME);
if (!__pyx_b)
{__pyx_filename = __pyx_f[0]; __pyx_lineno = 1;
__pyx_clineno = __LINE__; goto __pyx_L1_error;};
if (PyObject_SetAttrString(__pyx_m, "__builtins__", __pyx_b) < 0)
{__pyx_filename = __pyx_f[0]; __pyx_lineno = 1;
__pyx_clineno = __LINE__; goto __pyx_L1_error;};
/*--- Builtin init code ---*/
if (unlikely(__Pyx_InitCachedBuiltins() < 0))
{__pyx_filename = __pyx_f[0]; __pyx_lineno = 1;
__pyx_clineno = __LINE__; goto __pyx_L1_error;}
__pyx_skip_dispatch = 0;
/*--- Global init code ---*/
/*--- Function export code ---*/
/*--- Type init code ---*/
/*--- Type import code ---*/
/*--- Function import code ---*/
/*--- Execution code ---*/
/* "/home/nosklo/devel/ctest/hello.pyx":1
* cdef int hello(int a, int b): # <<<<<<<<<<<<<<
* return a + b
*
*/
#if PY_MAJOR_VERSION < 3
return;
#else
return __pyx_m;
#endif
__pyx_L1_error:;
__Pyx_AddTraceback("helloworld");
#if PY_MAJOR_VERSION >= 3
return NULL;
#endif
}
static const char *__pyx_filenames[] = {
"hello.pyx",
};
/* Runtime support code */
static void __pyx_init_filenames(void) {
__pyx_f = __pyx_filenames;
}
#include "compile.h"
#include "frameobject.h"
#include "traceback.h"
static void __Pyx_AddTraceback(const char *funcname) {
PyObject *py_srcfile = 0;
PyObject *py_funcname = 0;
PyObject *py_globals = 0;
PyObject *empty_string = 0;
PyCodeObject *py_code = 0;
PyFrameObject *py_frame = 0;
#if PY_MAJOR_VERSION < 3
py_srcfile = PyString_FromString(__pyx_filename);
#else
py_srcfile = PyUnicode_FromString(__pyx_filename);
#endif
if (!py_srcfile) goto bad;
if (__pyx_clineno) {
#if PY_MAJOR_VERSION < 3
py_funcname = PyString_FromFormat( "%s (%s:%d)", funcname,
__pyx_cfilenm, __pyx_clineno);
#else
py_funcname = PyUnicode_FromFormat( "%s (%s:%d)", funcname,
__pyx_cfilenm, __pyx_clineno);
#endif
}
else {
#if PY_MAJOR_VERSION < 3
py_funcname = PyString_FromString(funcname);
#else
py_funcname = PyUnicode_FromString(funcname);
#endif
}
if (!py_funcname) goto bad;
py_globals = PyModule_GetDict(__pyx_m);
if (!py_globals) goto bad;
#if PY_MAJOR_VERSION < 3
empty_string = PyString_FromStringAndSize("", 0);
#else
empty_string = PyBytes_FromStringAndSize("", 0);
#endif
if (!empty_string) goto bad;
py_code = PyCode_New(
0, /*int argcount,*/
#if PY_MAJOR_VERSION >= 3
0, /*int kwonlyargcount,*/
#endif
0, /*int nlocals,*/
0, /*int stacksize,*/
0, /*int flags,*/
empty_string, /*PyObject *code,*/
__pyx_empty_tuple, /*PyObject *consts,*/
__pyx_empty_tuple, /*PyObject *names,*/
__pyx_empty_tuple, /*PyObject *varnames,*/
__pyx_empty_tuple, /*PyObject *freevars,*/
__pyx_empty_tuple, /*PyObject *cellvars,*/
py_srcfile, /*PyObject *filename,*/
py_funcname, /*PyObject *name,*/
__pyx_lineno, /*int firstlineno,*/
empty_string /*PyObject *lnotab*/
);
if (!py_code) goto bad;
py_frame = PyFrame_New(
PyThreadState_GET(), /*PyThreadState *tstate,*/
py_code, /*PyCodeObject *code,*/
py_globals, /*PyObject *globals,*/
0 /*PyObject *locals*/
);
if (!py_frame) goto bad;
py_frame->f_lineno = __pyx_lineno;
PyTraceBack_Here(py_frame);
bad:
Py_XDECREF(py_srcfile);
Py_XDECREF(py_funcname);
Py_XDECREF(empty_string);
Py_XDECREF(py_code);
Py_XDECREF(py_frame);
}
/* Type Conversion Functions */
static INLINE Py_ssize_t __pyx_PyIndex_AsSsize_t(PyObject* b) {
Py_ssize_t ival;
PyObject* x = PyNumber_Index(b);
if (!x) return -1;
ival = PyInt_AsSsize_t(x);
Py_DECREF(x);
return ival;
}
static INLINE int __Pyx_PyObject_IsTrue(PyObject* x) {
if (x == Py_True) return 1;
else if (x == Py_False) return 0;
else return PyObject_IsTrue(x);
}
static INLINE PY_LONG_LONG __pyx_PyInt_AsLongLong(PyObject* x) {
if (PyInt_CheckExact(x)) {
return PyInt_AS_LONG(x);
}
else if (PyLong_CheckExact(x)) {
return PyLong_AsLongLong(x);
}
else {
PY_LONG_LONG val;
PyObject* tmp = PyNumber_Int(x); if (!tmp) return (PY_LONG_LONG)-1;
val = __pyx_PyInt_AsLongLong(tmp);
Py_DECREF(tmp);
return val;
}
}
static INLINE unsigned PY_LONG_LONG __pyx_PyInt_AsUnsignedLongLong(PyObject* x) {
if (PyInt_CheckExact(x)) {
long val = PyInt_AS_LONG(x);
if (unlikely(val < 0)) {
PyErr_SetString(PyExc_TypeError, "Negative assignment to unsigned type.");
return (unsigned PY_LONG_LONG)-1;
}
return val;
}
else if (PyLong_CheckExact(x)) {
return PyLong_AsUnsignedLongLong(x);
}
else {
PY_LONG_LONG val;
PyObject* tmp = PyNumber_Int(x); if (!tmp) return (PY_LONG_LONG)-1;
val = __pyx_PyInt_AsUnsignedLongLong(tmp);
Py_DECREF(tmp);
return val;
}
}
static INLINE unsigned char __pyx_PyInt_unsigned_char(PyObject* x) {
if (sizeof(unsigned char) < sizeof(long)) {
long long_val = __pyx_PyInt_AsLong(x);
unsigned char val = (unsigned char)long_val;
if (unlikely((val != long_val) || (long_val < 0))) {
PyErr_SetString(PyExc_OverflowError, "value too large to convert to unsigned char");
return (unsigned char)-1;
}
return val;
}
else {
return __pyx_PyInt_AsLong(x);
}
}
static INLINE unsigned short __pyx_PyInt_unsigned_short(PyObject* x) {
if (sizeof(unsigned short) < sizeof(long)) {
long long_val = __pyx_PyInt_AsLong(x);
unsigned short val = (unsigned short)long_val;
if (unlikely((val != long_val) || (long_val < 0))) {
PyErr_SetString(PyExc_OverflowError, "value too large to convert to unsigned short");
return (unsigned short)-1;
}
return val;
}
else {
return __pyx_PyInt_AsLong(x);
}
}
static INLINE char __pyx_PyInt_char(PyObject* x) {
if (sizeof(char) < sizeof(long)) {
long long_val = __pyx_PyInt_AsLong(x);
char val = (char)long_val;
if (unlikely((val != long_val) )) {
PyErr_SetString(PyExc_OverflowError, "value too large to convert to char");
return (char)-1;
}
return val;
}
else {
return __pyx_PyInt_AsLong(x);
}
}
static INLINE short __pyx_PyInt_short(PyObject* x) {
if (sizeof(short) < sizeof(long)) {
long long_val = __pyx_PyInt_AsLong(x);
short val = (short)long_val;
if (unlikely((val != long_val) )) {
PyErr_SetString(PyExc_OverflowError, "value too large to convert to short");
return (short)-1;
}
return val;
}
else {
return __pyx_PyInt_AsLong(x);
}
}
static INLINE int __pyx_PyInt_int(PyObject* x) {
if (sizeof(int) < sizeof(long)) {
long long_val = __pyx_PyInt_AsLong(x);
int val = (int)long_val;
if (unlikely((val != long_val) )) {
PyErr_SetString(PyExc_OverflowError, "value too large to convert to int");
return (int)-1;
}
return val;
}
else {
return __pyx_PyInt_AsLong(x);
}
}
static INLINE long __pyx_PyInt_long(PyObject* x) {
if (sizeof(long) < sizeof(long)) {
long long_val = __pyx_PyInt_AsLong(x);
long val = (long)long_val;
if (unlikely((val != long_val) )) {
PyErr_SetString(PyExc_OverflowError, "value too large to convert to long");
return (long)-1;
}
return val;
}
else {
return __pyx_PyInt_AsLong(x);
}
}
static INLINE signed char __pyx_PyInt_signed_char(PyObject* x) {
if (sizeof(signed char) < sizeof(long)) {
long long_val = __pyx_PyInt_AsLong(x);
signed char val = (signed char)long_val;
if (unlikely((val != long_val) )) {
PyErr_SetString(PyExc_OverflowError, "value too large to convert to signed char");
return (signed char)-1;
}
return val;
}
else {
return __pyx_PyInt_AsLong(x);
}
}
static INLINE signed short __pyx_PyInt_signed_short(PyObject* x) {
if (sizeof(signed short) < sizeof(long)) {
long long_val = __pyx_PyInt_AsLong(x);
signed short val = (signed short)long_val;
if (unlikely((val != long_val) )) {
PyErr_SetString(PyExc_OverflowError, "value too large to convert to signed short");
return (signed short)-1;
}
return val;
}
else {
return __pyx_PyInt_AsLong(x);
}
}
static INLINE signed int __pyx_PyInt_signed_int(PyObject* x) {
if (sizeof(signed int) < sizeof(long)) {
long long_val = __pyx_PyInt_AsLong(x);
signed int val = (signed int)long_val;
if (unlikely((val != long_val) )) {
PyErr_SetString(PyExc_OverflowError, "value too large to convert to signed int");
return (signed int)-1;
}
return val;
}
else {
return __pyx_PyInt_AsLong(x);
}
}
static INLINE signed long __pyx_PyInt_signed_long(PyObject* x) {
if (sizeof(signed long) < sizeof(long)) {
long long_val = __pyx_PyInt_AsLong(x);
signed long val = (signed long)long_val;
if (unlikely((val != long_val) )) {
PyErr_SetString(PyExc_OverflowError, "value too large to convert to signed long");
return (signed long)-1;
}
return val;
}
else {
return __pyx_PyInt_AsLong(x);
}
}
static INLINE long double __pyx_PyInt_long_double(PyObject* x) {
if (sizeof(long double) < sizeof(long)) {
long long_val = __pyx_PyInt_AsLong(x);
long double val = (long double)long_val;
if (unlikely((val != long_val) )) {
PyErr_SetString(PyExc_OverflowError, "value too large to convert to long double");
return (long double)-1;
}
return val;
}
else {
return __pyx_PyInt_AsLong(x);
}
}
An observation: Based on the benchmarking conducted by the pybindgen developers, there is no significant difference between boost.python and swig. I haven't done my own benchmarking to verify how much of this depends on the proper use of the boost.python functionality.
Note also that there may be a reason that pybindgen seems to be in general quite a bit faster than swig and boost.python: it may not produce as versatile a binding as the other two. For instance, exception propagation, call argument type checking, etc. I haven't had a chance to use pybindgen yet but I intend to.
Boost is in general quite a big package to install, and last I saw you can't just install Boost.Python; you pretty much need the whole Boost library. As others have mentioned, compilation will be slow due to heavy use of template programming, which also typically means rather cryptic error messages at compile time.
Summary: given how easy SWIG is to install and use, that it generates decent bindings that are robust and versatile, and that one interface file allows your C++ DLL to be available from several other languages like Lua, C#, and Java, I would favor it over boost.python. But unless you really need multi-language support I would take a close look at PyBindGen because of its purported speed, and pay close attention to the robustness and versatility of the bindings it generates.
There be dragons here. Don't swig, don't boost. For any complicated project the code you have to fill in yourself to make them work becomes unmanageable quickly. If it's a plain C API to your library (no classes), you can just use ctypes. It will be easy and painless, and you won't have to spend hours trawling through the documentation for these labyrinthine wrapper projects trying to find the one tiny note about the feature you need.
Since you are concerned with speed and overhead, I suggest considering PyBindGen.
I have experience using it to wrap a large internal C++ library. After trying SWIG, SIP, and Boost.Python I prefer PyBindGen for the following reasons:
A PyBindGen wrapper is pure-Python, no need to learn another file format
PyBindGen generates Python C API calls directly, there is no speed-robbing indirection layer like SWIG.
The generated C code is clean and simple to understand. I like Cython too, but trying to read its C output can be difficult at times.
STL sequence containers are supported (we use a lot of std::vector's)
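To illustrate the "pure Python, no extra file format" point, here is a rough sketch of a PyBindGen wrapper description, following its documented usage; fastpart.h and compute are placeholder names for your own code:
import pybindgen

mod = pybindgen.Module('fastpart')             # name of the extension module to generate
mod.add_include('"fastpart.h"')                # header declaring the C function below
mod.add_function('compute', pybindgen.retval('double'), [pybindgen.param('double', 'x')])
with open('fastpart_module.c', 'w') as f:
    mod.generate(pybindgen.FileCodeSink(f))    # emits the Python C API wrapper code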
If it's not a big extension, boost::python might also be an option; it executes faster than swig because you control what's happening, but it'll take longer to develop.
Anyway, swig's overhead is acceptable if the amount of work within a single call is large enough. For example, if your issue is that you have some medium-sized logic block you want to move to C/C++, but that block is called frequently within a tight loop, then you might have to avoid swig; but I can't really think of any real-world examples except for scripted graphics shaders.
Before giving up on your python code, have a look at ShedSkin. They claim better performance than Psyco on some code (and also state that it is still experimental).
Else, there are several choices for binding C/C++ code to python.
Boost is lengthy to compile but is really the most flexible and easy to use solution.
I have never used SWIG, but compared to boost it's not as flexible, since it's a generic binding framework rather than a framework dedicated to python.
The next choice is Pyrex. It allows you to write pseudo-python code that gets compiled as a C extension.
There is an article worth reading on the topic Cython, pybind11, cffi – which tool should you choose?
Quick recap for the impatient:
Cython compiles your python to C/C++ allowing you to embed your C/C++ into python code. Uses static binding. For python programmers.
pybind11 (and boost.python) is the opposite. Bind your stuff at compile time from the C++ side. For C++ programmers.
CFFI allows you to bind the native stuff dynamically at runtime. Simple to use, but higher performance penalty.