Wrapping std::array in Cython and Exposing it to memory views - python

It seems that there is currently a pull request in Cython's repo to wrap c++ std::array but until then, I could use some help. I am currently wrapping the std::array like so:
cdef extern from "<array>" namespace "std" nogil:
    cdef cppclass array2 "std::array<double, 2>":
        array2() except +
        double& operator[](size_t)
This works, but I have to loop over a Cython memoryview, say double[:] arr, and copy the values one by one. Is there an easier way to do this? Essentially I'd like to do the following:
cdef double arr[2]
arr[0] = 1.0
arr[1] = 2.0
cdef array2 array2_arr = arr
#and the reverse
cdef array2 reverse
reverse[0] = 1.0
reverse[1] = 2.0
cdef double reverse_arr[2] = reverse
Is this completely unreasonable? As it stands, working with std::array is extremely tedious because I have to write a for-loop to copy values from Cython to C++ and from C++ to Cython. Furthermore, since Cython doesn't support non-type template parameters, I have to define a wrapper for every variation of std::array in my code. Any suggestions on how to work efficiently with std::array would be greatly appreciated.
edit:
I'm now able to go from a memory view to an array2 type using the following:
def __cinit__(self, double[:] mem):
    cdef array2 *arr = <array2 *>(&mem[0])
But it seems that no matter what I do, I cannot get cython to convert an array2 type to a memoryview:
cdef array2 arr = self.thisptr.getArray()
# error: '__pyx_t_1' declared as a pointer to a reference of type 'double &'
cdef double[::1] mview = <double[:2]>(&arr[0])
#OR
# Stop must be provided to indicate shape
cdef double[::1] mview = <double[::2]>(&arr[0])
Please help me figure out how to cast a C++ pointer to a memory view. Every combination I have tried to date has resulted in some kind of casting error.
EDIT:
I found that the following syntax compiles without error on a newer version of Cython (I was using Cython 0.22 and upgraded to 0.23.5).
cdef double[::1] mview = <double[:4]>(&arr[0])
However, if I attempt to return mview from the function I am using it in, I get garbage memory. The memoryview points into my local array, which goes out of scope and is destroyed when the function returns. As soon as I figure out how to properly return my array, I'll attempt to update the official answer.

After much fiddling, I found the answer to my question.
Definition of array and class that uses array:
cdef extern from "<array>" namespace "std" nogil:
    cdef cppclass array4 "std::array<int, 4>":
        array4() except +
        int& operator[](size_t)
cdef extern from "Rectangle.h" namespace "shapes":
    cdef cppclass ArrayFun:
        ArrayFun(array4&)
        array4 getArray()
Python implementation
cdef class PyArrayFun:
    cdef ArrayFun *thisptr  # hold a C++ instance which we're wrapping
    def __cinit__(self, int[::1] mem):
        #
        # Conversion from memoryview to std::array<int, 4>.
        # The cast reinterprets the buffer in place, so mem must be
        # contiguous (hence int[::1]) and hold at least 4 ints.
        #
        cdef array4 *arr = <array4 *>(&mem[0])
        self.thisptr = new ArrayFun(arr[0])
    def __dealloc__(self):
        # free the C++ instance allocated in __cinit__
        del self.thisptr
    def getArray(self):
        cdef array4 arr = self.thisptr.getArray()
        #
        # Conversion from std::array<int, 4> to memoryview
        #
        cdef int[::1] mview = <int[:4]>(&arr[0])
        cdef int[::1] new_view = mview.copy()
        for i in range(0, 4):
            print("arr is ", arr[i])
            print("new_view is ", new_view[i])
        # A COPY MUST be returned because arr goes out of scope and is
        # destroyed when this function exits. Therefore we have to
        # copy again. This kind of sucks because we have to copy the
        # internal array out from C++, and then we have to copy the
        # array out from Python, therefore 2 copies for one array.
        return mview.copy()
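The shared-versus-copied distinction behind those comments can be tried out without compiling anything. What follows is a minimal plain-Python sketch, not Cython: it uses the standard ctypes and array modules as stand-ins, with Int4 playing the role of std::array<int, 4>. The names Int4, shared and owned are made up for illustration.

```python
import array
import ctypes

# Stand-in for std::array<int, 4>: a fixed-size block of four C ints.
Int4 = ctypes.c_int * 4

mem = array.array("i", [1, 2, 3, 4])

# Like the pointer cast: reinterpret the existing buffer, no copy is made.
shared = Int4.from_buffer(mem)
# Like mview.copy(): an independent block that owns its own memory.
owned = Int4.from_buffer_copy(mem)

mem[0] = 99  # change the original buffer

print(shared[0])  # follows the change, because it aliases mem
print(owned[0])   # keeps the old value, because it is a copy
```

Only the copy stays valid once the original storage goes away, which is why getArray has to return mview.copy() rather than mview.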

Related

Pass any-dimensional numpy array to Cython class

I'm trying to pass an np.ndarray from Python to instantiate a Cython class. However, I can't work out how to do it for an any-dimensional array. I'd like my .pyx interface to look like:
wrapper.pyx:
import numpy as np
cimport numpy as np
cdef extern from "myClass.h":
    cdef cppclass C_myClass "myClass":
        void myClass(np.float32_t*, int*, int)
cdef class array:
    cdef C_myClass* cython_class
    cdef int shape[8]  # max number of dimensions = 8
    cdef int ndim
    def __init__(self, np.ndarray[dtype=np.float32_t] numpy_array):
        self.ndim = numpy_array.ndim
        for dim in range(self.ndim):
            self.shape[dim] = numpy_array.shape[dim]
        self.cython_class = new C_myClass(&numpy_array[0], &self.shape[0], self.ndim)
    def __del__(self):
        del self.cython_class
Such that the class constructor can look like:
myClass.h:
myClass(float* array_, int* shape_, int ndim_);
Do any of you know how to handle an array of any dimensions within Cython, while still being able to get the array shape parameters (I don't want the user to have to flatten the array or pass in the array shapes themselves)?
There isn't a direct Cython type for "numpy array with any number of dimensions" (or indeed "typed memoryview with any number of dimensions").
Therefore, I suggest leaving the array argument to the constructor untyped and using the buffer protocol yourself.
The following code is untested, but should give you the right idea:
from cpython.buffer cimport Py_buffer, PyObject_GetBuffer, PyBUF_ND, PyBUF_FORMAT, PyBuffer_Release
...
    def __init__(self, array):
        cdef Py_buffer buf
        # shape and ndim can be passed as before - they don't
        # need the array to be typed
        self.ndim = array.ndim
        for dim in range(self.ndim):
            self.shape[dim] = array.shape[dim]
        # Note that PyBUF_ND requires a C contiguous array. This looks
        # to be an implicit requirement of your C++ class that you
        # probably don't realize you have
        PyObject_GetBuffer(array, &buf, PyBUF_ND | PyBUF_FORMAT)
        # buf.format is a char*, so compare it as bytes
        if <bytes>buf.format != b"f":
            PyBuffer_Release(&buf)
            raise TypeError("Numpy array must have dtype float32")
        self.cython_class = new C_myClass(<np.float32_t*>buf.buf, &self.shape[0], self.ndim)
You then need to make sure you release the buffer with PyBuffer_Release.
It isn't clear whether C_myClass copies the data or holds a pointer to it.
If C_myClass copies the data, call PyBuffer_Release straight after constructing self.cython_class.
If C_myClass holds a reference to the data, store the Py_buffer as a member of the class (rather than a local variable) and call PyBuffer_Release in the destructor, after self.cython_class is deleted.
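To get a feel for what the buffer protocol reports before writing the Cython version, here is a small plain-Python sketch. It relies on the built-in memoryview, which exposes the same fields (format, ndim, shape, contiguity) that Py_buffer carries. The function name describe_float32_buffer is hypothetical, and array.array stands in for a NumPy array so that nothing beyond the standard library is needed.

```python
import array


def describe_float32_buffer(obj):
    """Return (ndim, shape) of a C-contiguous float32 buffer."""
    # Leaving the with-block releases the buffer, much like PyBuffer_Release.
    with memoryview(obj) as view:
        if view.format != "f":
            raise TypeError("buffer must hold 32-bit floats")
        if not view.c_contiguous:
            raise ValueError("buffer must be C contiguous")
        return view.ndim, view.shape


flat = array.array("f", range(24))
# View the same memory as 2 x 3 x 4; memoryview.cast() needs a byte
# format as an intermediate step when the shape changes.
cube = memoryview(flat).cast("B").cast("f", shape=[2, 3, 4])

print(describe_float32_buffer(flat))
print(describe_float32_buffer(cube))
```

The same function handles one dimension or three without any change to its signature, which is the property the untyped constructor argument is after.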

How can I transfer a pointer variable from one cython module to another using a Python script

Assume we have a cython class A with a pointer to float like in
# A.pyx
cdef class A:
    cdef float * ptr
We also have a cython class B in another module which needs access to the data under ptr:
# B.pyx
cdef class B:
    cdef float * f_ptr
    cpdef submit(self, ptr_var):
        self.f_ptr = get_from( ptr_var ) # ???
The corresponding Python code using A and B might be something like
from A import A
from B import B
a = A()
b = B()
ptr = a.get_ptr()
b.submit(ptr)
How can we define get_ptr() and what would we use for get_from in B?
The solution is to wrap the pointer variable into a Python object. Module libc.stdint offers a type named uintptr_t which is an integer large enough for storing any kind of pointer. With this the solution might look as follows.
from libc.stdint cimport uintptr_t
cdef class A:
    cdef float * ptr
    def get_ptr(self):
        return <uintptr_t>self.ptr
The expression in angle brackets <uintptr_t> corresponds to a cast to uintptr_t. In class B we then have to cast the variable back to a pointer to float.
from libc.stdint cimport uintptr_t
cdef class B:
    cdef float * f_ptr
    cpdef submit(self, uintptr_t ptr_var):
        self.f_ptr = <float *>ptr_var
This works for any kind of pointer, not only for pointers to float. One has to make sure that both modules (A and B) deal with the same kind of pointer, since that information is lost once the pointer is wrapped in a uintptr_t. The integer also carries no ownership, so the memory behind it must stay alive for as long as B uses it.
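The same round trip (pointer to integer, integer back to pointer) can be tried in plain Python with ctypes. This is only an analogy to the Cython code above, not a replacement for it; the variable names are illustrative.

```python
import ctypes

# Side "A": owns a small float buffer and hands out its address.
data = (ctypes.c_float * 3)(1.0, 2.0, 3.0)
address = ctypes.addressof(data)  # an ordinary Python int

# Side "B": receives the int and casts it back to a float pointer.
f_ptr = ctypes.cast(address, ctypes.POINTER(ctypes.c_float))
values = [f_ptr[i] for i in range(3)]
print(values)

# Writes through the pointer land in the original buffer.
f_ptr[0] = 10.0
print(data[0])
```

Nothing in address records that it refers to three floats; side B has to know that by convention, exactly as with uintptr_t.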

What is this declaration in Cython? cdef PyObject **workers. Is it a pointer to a pointer?

I'm trying to utilize the concepts in this sample code to run some Cython code in parallel, but I can't seem to find any information in the Cython documentation about what this notation actually means.
cdef FLOAT_t[:] numbers
cdef unsigned int i
cdef INDEX_t n_workers
cdef PyObject **workers
cdef list ref_workers #Here to maintain references on Python side
def __init__(Parent self, INDEX_t n_workers, list numbers):
    cdef INDEX_t i
    self.n_workers = n_workers
    self.numbers = np.array(numbers,dtype=float)
    self.workers = <PyObject **>malloc(self.n_workers*cython.sizeof(cython.pointer(PyObject)))
    #Populate worker pool
    self.ref_workers = []
    for i in range(self.n_workers):
        self.ref_workers.append(Worker())
        self.workers[i] = <PyObject*>self.ref_workers[i]
def __dealloc__(Parent self):
    free(self.workers)
Does the ** notation mean that it is a pointer to a pointer of a PyObject? I understand that the <> notation is meant to dereference the pointer, so is this line:
self.workers = <PyObject **>malloc(self.n_workers*cython.sizeof(cython.pointer(PyObject)))
allocating an unknown amount of memory, since the size of the PyObject is unknown until self.workers is filled with dereferenced PyObjects?
Yes. It is a pointer to a PyObject* pointer, and here it also points to the first element of an array of PyObject* pointers.
The <PyObject **> in front of malloc is a cast, not a dereference. The amount of memory is known as well: it is room for self.n_workers pointers, each the size of a PyObject*, no matter how large the objects they point to are. Presumably the workers are implemented using a PyObject derivative, so in memory you will have:
self.workers -> self.workers[0]   (PyObject* for 1st worker)
                self.workers[1]   (PyObject* for 2nd worker)
                ....
                self.workers[N-1] (PyObject* for last worker)
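The layout above can be imitated in plain Python with ctypes, which makes the "table of pointers" idea easy to poke at. This sketch leans on a CPython implementation detail (id() returns the object's address) and is meant purely as an illustration; PointerTable and the other names are made up.

```python
import ctypes

n_workers = 3
ref_workers = [object() for _ in range(n_workers)]  # keeps the objects alive

# One pointer-sized slot per worker, like malloc(n_workers * sizeof(PyObject*)).
PointerTable = ctypes.c_void_p * n_workers
# CPython detail: id() gives the address of the object.
workers = PointerTable(*(id(w) for w in ref_workers))

table_bytes = ctypes.sizeof(workers)
print(table_bytes == n_workers * ctypes.sizeof(ctypes.c_void_p))

# Turn slot 1 back into an object reference (a cast, not a copy).
second = ctypes.cast(workers[1], ctypes.py_object).value
print(second is ref_workers[1])
```

The table's size depends only on the number of slots, and the separate ref_workers list is what keeps the objects from being garbage collected, just as in the Cython class.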

How to expose a function returning a C++ object to Python without copying the object?

In another question I learnt how to expose a function returning a C++ object to Python by copying the object. Having to perform a copy does not seem optimal. How can I return the object without copying it? i.e. how can I directly access the peaks returned by self.thisptr.getPeaks(data) in PyPeakDetection.getPeaks (defined in peak_detection_.pyx)?
peak_detection.hpp
#ifndef PEAKDETECTION_H
#define PEAKDETECTION_H
#include <string>
#include <map>
#include <vector>
#include "peak.hpp"
class PeakDetection
{
public:
    PeakDetection(std::map<std::string, std::string> config);
    std::vector<Peak> getPeaks(std::vector<float> &data);
private:
    float _threshold;
};
#endif
peak_detection.cpp
#include <iostream>
#include <string>
#include "peak.hpp"
#include "peak_detection.hpp"
using namespace std;
PeakDetection::PeakDetection(map<string, string> config)
{
    _threshold = stof(config["_threshold"]);
}
vector<Peak> PeakDetection::getPeaks(vector<float> &data){
    Peak peak1 = Peak(10,1);
    Peak peak2 = Peak(20,2);
    vector<Peak> test;
    test.push_back(peak1);
    test.push_back(peak2);
    return test;
}
peak.hpp
#ifndef PEAK_H
#define PEAK_H
class Peak {
public:
    float freq;
    float mag;
    Peak() : freq(), mag() {}
    Peak(float f, float m) : freq(f), mag(m) {}
};
#endif
peak_detection_.pyx
# distutils: language = c++
# distutils: sources = peak_detection.cpp
from libcpp.vector cimport vector
from libcpp.map cimport map
from libcpp.string cimport string
cdef extern from "peak.hpp":
    cdef cppclass Peak:
        Peak()
        Peak(Peak &)
        float freq, mag
cdef class PyPeak:
    cdef Peak *thisptr
    def __cinit__(self):
        self.thisptr = new Peak()
    def __dealloc__(self):
        del self.thisptr
    cdef copy(self, Peak &other):
        del self.thisptr
        self.thisptr = new Peak(other)
    def __repr__(self):
        return "<Peak: freq={0}, mag={1}>".format(self.freq, self.mag)
    property freq:
        def __get__(self): return self.thisptr.freq
        def __set__(self, freq): self.thisptr.freq = freq
    property mag:
        def __get__(self): return self.thisptr.mag
        def __set__(self, mag): self.thisptr.mag = mag
cdef extern from "peak_detection.hpp":
    cdef cppclass PeakDetection:
        PeakDetection(map[string,string])
        vector[Peak] getPeaks(vector[float])
cdef class PyPeakDetection:
    cdef PeakDetection *thisptr
    def __cinit__(self, map[string,string] config):
        self.thisptr = new PeakDetection(config)
    def __dealloc__(self):
        del self.thisptr
    def getPeaks(self, data):
        cdef Peak peak
        cdef PyPeak new_peak
        cdef vector[Peak] peaks = self.thisptr.getPeaks(data)
        retval = []
        for peak in peaks:
            new_peak = PyPeak()
            new_peak.copy(peak) # how can I avoid that copy?
            retval.append(new_peak)
        return retval
If you have a modern C++ compiler and can use rvalue references, move constructors and std::move it's pretty straight-forward. I think the easiest way is to create a Cython wrapper for the vector, and then use a move constructor to take hold of the contents of the vector.
All code shown goes in peak_detection_.pyx.
First wrap std::move. For simplicity I've just wrapped the one case we want (vector<Peak>) rather than messing about with templates.
cdef extern from "<utility>":
    vector[Peak]&& move(vector[Peak]&&) # just define for peak rather than anything else
Second, create a vector wrapper class. This defines the Python functions necessary to access it like a list. It also defines a function to call the move assignment operator.
cdef class PyPeakVector:
    cdef vector[Peak] vec
    cdef move_from(self, vector[Peak]&& move_this):
        self.vec = move(move_this)
    def __getitem__(self,idx):
        return PyPeak2(self,idx)
    def __len__(self):
        return self.vec.size()
Then define the class that wraps the Peak. This is slightly different from your other class in that it doesn't own the Peak it wraps (the vector does). Otherwise, most of the functions remain the same.
cdef class PyPeak2:
    cdef int idx
    cdef PyPeakVector vector # keep this alive, since it owns the peak rather than PyPeak2
    def __cinit__(self,PyPeakVector vec,idx):
        self.vector = vec
        self.idx = idx
    cdef Peak* getthisptr(self):
        # look up the pointer each time - it isn't generally safe
        # to store pointers in case the vector is resized
        return &self.vector.vec[self.idx]
    # rest of functions as is
    # don't define a destructor since we don't own the Peak
Finally, implement getPeaks()
cdef class PyPeakDetection:
    # ...
    def getPeaks(self, data):
        cdef vector[Peak] peaks = self.thisptr.getPeaks(data)
        # retval must be typed so that the cdef method move_from can be called
        cdef PyPeakVector retval = PyPeakVector()
        retval.move_from(move(peaks))
        return retval
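The ownership arrangement above (a container wrapper that owns the data, plus element wrappers that store the container and an index) can be sketched in plain Python to see how it behaves. This is only a model of the design, with no C++ involved; PeakStore and PeakRef are hypothetical stand-ins for PyPeakVector and PyPeak2.

```python
class PeakStore:
    """Owns the peak data, as PyPeakVector owns the vector."""

    def __init__(self, peaks):
        self._peaks = [list(p) for p in peaks]  # rows of [freq, mag]

    def __len__(self):
        return len(self._peaks)

    def __getitem__(self, idx):
        return PeakRef(self, idx)


class PeakRef:
    """Non-owning handle: stores the owner and an index, not the data."""

    def __init__(self, store, idx):
        self._store = store  # this reference keeps the owner alive
        self._idx = idx

    @property
    def freq(self):
        # Look the row up on every access instead of caching it.
        return self._store._peaks[self._idx][0]

    @freq.setter
    def freq(self, value):
        self._store._peaks[self._idx][0] = value


store = PeakStore([(10.0, 1.0), (20.0, 2.0)])
first = store[0]
first.freq = 15.0
print(store[0].freq)
```

Because the handle writes through to the owner, no per-element copy is ever made, which is the point of the PyPeak2 design.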
Alternative approaches:
If Peak were nontrivial you could go for an approach where you call move on Peak rather than on the vector, as you construct your PyPeaks. For the case you have here, move and copy will be equivalent for Peak.
If you can't use C++11 features you'll need to change the interface a little. Instead of having your C++ getPeaks function return a vector have it take an empty vector reference (owned by PyPeakVector) as an input argument and write into it. Much of the rest of the wrapping remains the same.
There are two projects for interfacing C++ code with Python that have withstood the test of time: Boost.Python and SWIG. Both work by adding additional markup to the pertinent C/C++ code and generating dynamically loaded Python extension libraries (.so files) and the related Python modules.
Depending on your use case, there may still be some additional markup that looks like "copying." However, the copying should not be as extensive, and it will all be exposed in the C++ code rather than being explicitly copied verbatim in Cython/Pyrex.

Convert Python List to Vector<int> in Cython

I need to convert a Python list of ints to vector[int] in a cdef function to call another C function. I tried this:
cdef pylist_to_handles(hs):
    cdef vector[int] o_vect
    for h in hs:
        o_vect.push_back(h)
    return o_vect
This should work because I only need to call this from other cdef functions, but I get this error:
Cannot convert 'vector<int>' to Python object
What am I doing wrong ?
In Cython 0.17 using libcpp.vector, you can do this:
cdef vector[int] vect = hs
What you really have is this:
cdef object pylist_to_handles(hs):
    ...
    return <object>o_vect
If you do not explicitly set a return type, it is assumed to be a Python object ("object" in code). As you can see in the code, you're trying to cast vector[int] to an object, but Cython does not know how to handle that.
Just add a return type in cdef:
cdef vector[int] pylist_to_handles(hs):
