Best choice for passing float arrays in Cython - python

Say there is a C++ class in which we would like to define a function to be called in python. On the python side the goal is being able to call this function with:
Input: of type 2D numpy-array(float32), or list of lists, or other suggestions
Output: of type 2D numpy-array(float32), or list of lists, or other suggestions
and if it helps latency/simplicity 1D array is also ok.
One would for example define a function in the header with:
bool func(const std::string& name);
which has string type as input and bool as output.
What can be a good choice with the above requirements to write in the header?
And finally, after the header file, what should be written in the pyx/pyd file for Cython?

Cython Input
The most natural Cython type to use for the input interface between Python and Cython would be a 2D typed memoryview. This will take a 2D numpy array, as well as any other 2D array type that exports the buffer interface (there aren't too many other types since Numpy is pretty ubiquitous, but some image-handling libraries have some alternatives).
I'd avoid using list-of-lists as an interface - the length of the second dimension is poorly defined. However, Numpy arrays are easily created from list-of-lists.
Cython Output
For output you return either a Cython memoryview, or a Numpy array (easily created from a memoryview with np.asarray(memview)). I'd probably return a Numpy array, but make a decision based on whether you want to make Numpy a hard dependency.
C++ interface
This is very difficult to answer without knowing about your code. If you have existing code you should just use the type that's most natural to that if at all possible.
You can get a pointer from your memoryview with &memview[0,0], and access its attributes .shape and .strides to get information about how the data is stored. (If you make the memoryview contiguous then you know strides from shape so it's simpler). You then need to decided whether to copy the data, or just use a pointer to the Python-owned data (if C++ only keeps the data for the duration of the function call then using a pointer is good).
Similar considerations apply to the output data, but it's hard to know without knowing what you're trying to do in C++.

Related

Best way to convert a std::vector< std::array<double, 3> > to Python object using Cython

I'm using Cython to wrap a C++ library. In the C++ code there is some data that represents a list of 3D vectors. It is stored in the object std::vector< std::array<double, 3> >. My current method to convert this into a python object is to loop over the vector and use the method arrayd3ToNumpy in the answer to my previous question on each element. This is quite slow, however, when the vector is very large. I'm not as worried about making copies of the data (since I believe the auto casting of vector to list creates a copy), but more about the speed of the casting.
This this case the data is a simple C type, continguous in memory. I'd therefore expose it to Python via a wrapper cdef class that has the buffer protocol.
There's a guide (with an example) in the Cython documentation. In your case you don't need to store ncols - it's just 3 defined by the array type.
The key advantage is that you can largely eliminate copying. You could use your type directly with Cython's typed memoryviews to quickly access the underlying data. You could also do np.asarray to create a Numpy array that just wraps the underlying data that already exists.

Conversion - pyBind11 - Passing a torch.Tensor() in Python into a C++ method expecting a std::vector

For a project I am working on, I require passing a PyTorch tensor (with dtype torch.int16) to a C++ method which expects an std::vector (basically, a list of integers).
I have the binding figured out, and I am already able to accomplish this task if I first convert my tensor into a numpy array (and use pybind11's numpy support), but for optimizing my code I want to be able to do this without this conversion to a numpy array.
Based on my research, I assume that I will have to write a custom type-caster. How should I proceed?

Proper way to cast numpy.matrix to C double pointer

What is the canonical way of getting a numpy matrix as an argument to a C function which takes a double pointer?
Context: I'm using numpy to validate some C code-to wit, I have a C function which takes a const double ** const, and I'm using ctypes to call the .so from Python.
I've tried:
func.argtypes = ctypeslib.ndpointer(dtype=double, ndim=2, flags="C_CONTIGUOUS")
and passed the numpy matrix directly (didn't work), as well as
func.argtypes = ctypes.POINTER(ctypes.POINTER(ctypes.c_double))
and then passed the numpy matrix via various casts. Casting led to the Python error
TypeError: _type_ must have storage info
Note: This question came up a few years ago here, but there was no completely successful resolution.
I think you're looking for the ctypes interface in numpy's ndarrays (or matrix for that matter). You may have a look here for more information on numpy's manual.
Please note that numpy C-API stores ndarrays (or matrices for that matter) with a single pointer (see http://docs.scipy.org/doc/numpy/reference/c-api.types-and-structures.html#c.PyArrayObject). You cannot convert this single pointer into a double pointer in C, just because these are different types. Furthermore, numpy not only stores the data on these objects, it also stores information about the shape of the matrix and the strides (how the data is organised on the provided data pointer). Without knowing all these information, your code will not work on all conditions. For example, if you transpose a matrix before interfacing it to your code, you'll get unexpected results!
There are a couple of solutions depending on your settings:
If you can modify your library's API, change it so that you can pass numpy information concerning not only the data pointer, but also the shape of the matrix and its strides. Then, use the above link to figure out how to interface numpy/ctypes support with your new API.
If you cannot modify your API, I suggest you create a Python function based on ctypes that converts the contents of the numpy array into a double-pointer matrix created with ctypes itself (as suggested on this discussion). Also add support for the numpy object shape and stride so that you can run the conversion properly. After converting, pass the newly created double pointer structure to your original function.

what is a reason to use ndarray instead of python array

I build a class with some iteration over coming data. The data are in an array form without use of numpy objects. On my code I often use .append to create another array. At some point I changed one of the big array 1000x2000 to numpy.array. Now I have an error after error. I started to convert all of the arrays into ndarray but comments like .append does not work any more. I start to have a problems with pointing to rows, columns or cells. and have to rebuild all code.
I try to google an answer to the question: "what is and advantage of using ndarray over normal array" I can't find a sensible answer. Can you write when should I start to use ndarrays and if in your practice do you use both of them or stick to one only.
Sorry if the question is a novice level, but I am new to python, just try to move from Matlab and want to understand what are pros and cons. Thanks
NumPy and Python arrays share the property of being efficiently stored in memory.
NumPy arrays can be added together, multiplied by a number, you can calculate, say, the sine of all their values in one function call, etc. As HYRY pointed out, they can also have more than one dimension. You cannot do this with Python arrays.
On the other hand, Python arrays can indeed be appended to. Note that NumPy arrays can however be concatenated together (hstack(), vstack(),…). That said, NumPy arrays are mostly meant to have a fixed number of elements.
It is common to first build a list (or a Python array) of values iteratively and then convert it to a NumPy array (with numpy.array(), or, more efficiently, with numpy.frombuffer(), as HYRY mentioned): this allows mathematical operations on arrays (or matrices) to be performed very conveniently (simple syntax for complex operations). Alternatively, numpy.fromiter() might be used to construct the array from an iterator. Or loadtxt() to construct it from a text file.
There are at least two main reasons for using NumPy arrays:
NumPy arrays require less space than Python lists. So you can deal with more data in a NumPy array (in-memory) than you can with Python lists.
NumPy arrays have a vast library of functions and methods unavailable
to Python lists or Python arrays.
Yes, you can not simply convert lists to NumPy arrays and expect your code to continue to work. The methods are different, the bool semantics are different. For the best performance, even the algorithm may need to change.
However, if you are looking for a Python replacement for Matlab, you will definitely find uses for NumPy. It is worth learning.
array.array can change size dynamically. If you are collecting data from some source, it's better to use array.array. But array.array is only one dimension, and there is no calculation functions to do with it. So, when you want to do some calculation with your data, convert it to numpy.ndarray, and use functions in numpy.
numpy.frombuffer can create a numpy.ndarray that shares the same data buffer with array.array objects, it's fast because it don't need to copy the data.
Here is a demo:
import numpy as np
import array
import random
a = array.array("d")
# a for loop that collects 10 channels data
for x in range(100):
a.extend([random.random() for _ in xrange(10)])
# create a ndarray that share the same data buffer with array a, and reshape it to 2D
na = np.frombuffer(a, dtype=float).reshape(-1, 10)
# call some numpy function to do the calculation
np.sum(na, axis=0)
Another great advantage of using NumPy arrays over built-in lists is the fact that NumPy has a C API that allows native C and C++ code to access NumPy arrays directly. Hence, many Python libraries written in low-level languages like C are expecting you to work with NumPy arrays instead of Python lists.
Reference: Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython

Passing a numpy array to Delphi dll

I am trying to implement an image classification algorithm in Python. The problem is that python takes very long with looping through the array. That's why I decided to write a Delphi dll which performs the array-processing. My problem is that I don't know how to pass the multidimensional python-array to my dll-function.
Delphi dll extract: (I use this function only for testing)
type
TImgArray = array of array of Integer;
function count(a: TImgArray): Integer; cdecl;
begin
result:= high(a);
end;
relevant Python code:
arraydll = cdll.LoadLibrary("C:\\ArrayFunctions.dll")
c_int_p = ctypes.POINTER(ctypes.c_int32)
data = valBD.ReadAsArray()
data = data.astype(np.int32)
data_p = data.ctypes.data_as(c_int_p)
print arraydll.count(data_p)
The value returned by the dll-function is not the right one (it is 2816 instead of
7339). That's why I guess that there's somethin wrong with my type-conversion :(
Thanks in advance,
Mario
What you're doing won't work, and is likely to corrupt memory too.
A Delphi dynamic array is implemented under the hood as a data structure that holds some metadata about the array, including the length. But what you're passing to it is a C-style pointer-as-array, which is a pointer, not a Delphi dynamic array data structure. That structure is specific to Delphi and you can't use it in other languages. (Even if you did manage to implement the exact same structure in another language, you still couldn't pass it to a Delphi DLL and have it work right, because Delphi's memory manager is involved under the hood. Doing it this way is just asking for heap corruption and/or exceptions being raised by the memory manager.)
If you want to pass a C-style array into a DLL, you have to do the C way, by passing a second parameter that includes the length of the array. Your Python code should already know the length, and it shouldn't take any time to calculate it. You can definitely use Delphi to speed up image processing. But the array dimensions have to come from the Python side. There's no shortcut you can take here.
Your Delphi function declaration should look something like this:
type
TImgArray = array[0..MAXINT] of Integer;
PImgArray = ^TImgArray;
function ProcessSomething(a: PImgArray; size: integer): Integer; cdecl;
Normal Python arrays are normally called "Lists". A numpy.array type in Python is a special type that is more memory efficient than a normal Python list of normal Python floating point objects. Numpy.array wraps a standard block of memory that is accessed as a native C array type. This in turn, does NOT map cleanly to Delphi array types.
As David says, if you're willing to use C, this will all be easier. If you want to use Delphi and access Numpy.array, I suspect that the easiest way to do it would be to find a way to export some simple C functions that access the Numpy.array type. In C I would import the numpy headers, and then write functions that I can call from Pascal. Then I would import these functions from a DLL:
function GetNumpyArrayValue( arrayObj:Pointer; index:Integer):Double;
I haven't written any CPython wrapper code in a while. This would be easier if you wanted to simply access CORE PYTHON types from Delphi. The existing Python-for-delphi wrappers will help you. Using numpy with Delphi is just a lot more work.
Since you're only writing a DLL and not a whole application, I would seriously advise you forget about Delphi and write this puppy in plain C, which is what Python extensions (which is what you're writing) really should be written in.
In short, since you're writing a DLL, in Pascal, you're going to need at least another small DLL in C, just to bridge the types between the Python extension types (numpy.array) and the Python floating point values. And even then, you're not going to easily (quickly) get an array value you could read in Delphi as a native delphi array type.
The very fastest access mechanism I can think of is this:
type
PDouble = ^Double;
function GetNumpyArrayValue( arrayObj:Pointer; var doubleVector:PDouble; var doubleVectorLen:Integer):Boolean;
You could then use doubleVector (pointer) type math to access the underlying C array memory type.

Categories

Resources