I am trying to implement an image classification algorithm in Python. The problem is that Python takes very long to loop through the array, so I decided to write a Delphi DLL that performs the array processing. My problem is that I don't know how to pass the multidimensional Python array to my DLL function.
Delphi dll extract: (I use this function only for testing)
type
  TImgArray = array of array of Integer;

function count(a: TImgArray): Integer; cdecl;
begin
  Result := High(a);
end;
relevant Python code:
import ctypes
from ctypes import cdll
import numpy as np

arraydll = cdll.LoadLibrary("C:\\ArrayFunctions.dll")
c_int_p = ctypes.POINTER(ctypes.c_int32)
data = valBD.ReadAsArray()
data = data.astype(np.int32)
data_p = data.ctypes.data_as(c_int_p)
print(arraydll.count(data_p))
The value returned by the DLL function is not the right one (2816 instead of 7339), so I guess there's something wrong with my type conversion :(
Thanks in advance,
Mario
What you're doing won't work, and is likely to corrupt memory too.
A Delphi dynamic array is implemented under the hood as a data structure that holds some metadata about the array, including the length. But what you're passing to it is a C-style pointer-as-array, which is a pointer, not a Delphi dynamic array data structure. That structure is specific to Delphi and you can't use it in other languages. (Even if you did manage to implement the exact same structure in another language, you still couldn't pass it to a Delphi DLL and have it work right, because Delphi's memory manager is involved under the hood. Doing it this way is just asking for heap corruption and/or exceptions being raised by the memory manager.)
If you want to pass a C-style array into a DLL, you have to do it the C way: pass a second parameter that gives the length of the array. Your Python code already knows the length, and it shouldn't take any time to calculate it. You can definitely use Delphi to speed up image processing, but the array dimensions have to come from the Python side. There's no shortcut you can take here.
Your Delphi function declaration should look something like this:
type
  TImgArray = array[0..MaxInt div SizeOf(Integer) - 1] of Integer;
  PImgArray = ^TImgArray;

function ProcessSomething(a: PImgArray; size: Integer): Integer; cdecl;
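On the Python side, the call then looks something like the sketch below. The actual DLL call is left commented out because it assumes the compiled library exists; `ArrayFunctions.dll` and `ProcessSomething` are just the hypothetical names from above. The key point is that the pointer and the dimensions travel separately:

```python
import ctypes
import numpy as np

# A 2D array on the Python side; the DLL sees only a flat buffer,
# so the dimensions must travel alongside the pointer.
data = np.ascontiguousarray(np.arange(12, dtype=np.int32).reshape(3, 4))
rows, cols = data.shape
ptr = data.ctypes.data_as(ctypes.POINTER(ctypes.c_int32))

# Hypothetical call matching the Delphi declaration above:
# arraydll = ctypes.cdll.LoadLibrary("C:\\ArrayFunctions.dll")
# arraydll.ProcessSomething.argtypes = [ctypes.POINTER(ctypes.c_int32), ctypes.c_int]
# arraydll.ProcessSomething.restype = ctypes.c_int
# result = arraydll.ProcessSomething(ptr, rows * cols)

# The pointer really does address the same memory as the array:
assert ptr[0] == 0 and ptr[11] == 11
```

Note the `np.ascontiguousarray` call: the flat pointer-plus-length convention only works if the data is actually one contiguous C-ordered block.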
Plain Python sequences are called lists. The numpy.array type is a special type that is more memory-efficient than a Python list of Python float objects: it wraps a single contiguous block of memory that is accessed as a native C array. That layout, in turn, does NOT map cleanly to Delphi dynamic array types.
As David says, if you're willing to use C, this will all be easier. If you want to use Delphi and access Numpy.array, I suspect that the easiest way to do it would be to find a way to export some simple C functions that access the Numpy.array type. In C I would import the numpy headers, and then write functions that I can call from Pascal. Then I would import these functions from a DLL:
function GetNumpyArrayValue(arrayObj: Pointer; index: Integer): Double;
I haven't written any CPython wrapper code in a while. This would be easier if you wanted to simply access CORE PYTHON types from Delphi. The existing Python-for-delphi wrappers will help you. Using numpy with Delphi is just a lot more work.
Since you're only writing a DLL and not a whole application, I would seriously advise you forget about Delphi and write this puppy in plain C, which is what Python extensions (which is what you're writing) really should be written in.
In short, since you're writing a DLL, in Pascal, you're going to need at least another small DLL in C, just to bridge the types between the Python extension types (numpy.array) and the Python floating point values. And even then, you're not going to easily (quickly) get an array value you could read in Delphi as a native delphi array type.
The very fastest access mechanism I can think of is this:
type
  PDouble = ^Double;

function GetNumpyArrayValue(arrayObj: Pointer; var doubleVector: PDouble; var doubleVectorLen: Integer): Boolean;
You could then use pointer arithmetic on doubleVector to access the underlying C array memory.
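For illustration, here is what that pointer math looks like from the Python side, using ctypes on a numpy buffer (a sketch with no Delphi involved; the pointer is indexed in Python exactly the way the Pascal callee would index it):

```python
import ctypes
import numpy as np

a = np.array([1.5, 2.5, 3.5], dtype=np.float64)

# doubleVector as the callee would receive it: a raw pointer into the
# numpy buffer -- no copy is made, this aliases the array's memory
double_vector = a.ctypes.data_as(ctypes.POINTER(ctypes.c_double))
double_vector_len = a.size

# "Pointer math": indexing the pointer walks the underlying C array
assert [double_vector[i] for i in range(double_vector_len)] == [1.5, 2.5, 3.5]
```

Because the pointer aliases the array's memory, it is only valid as long as the numpy array is alive; the callee must not hold on to it after the call returns.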
Related
I'm new to C extensions for NumPy and I'm wondering if the following workflow is possible.
Pre-allocate an array in NumPy
Pass this array to a C extension
Modify array data in-place in C
Use the updated array in Python with standard NumPy functions
In particular, I'd like to do this while ensuring I'm making zero new copies of the data at any step.
I'm familiar with boilerplate on the C side such as PyModuleDef, PyMethodDef, and the PyObject* arguments, but a lot of the examples I've seen involve coercion to C arrays, which to my understanding involves copying and/or casting. I'm also aware of Cython, though I don't know if it does similar coercions or copies under the hood. I'm specifically interested in simple indexed get and set operations on an ndarray with numeric (e.g. int32) values.
Could someone provide a minimal working example of creating a NumPy array, modifying it in-place in a C extension, and using the results in Python subsequently?
Cython doesn't create new copies of numpy arrays unless you specifically request it to using numpy functions, so it is as efficient as it can be when dealing with numpy arrays; see Working with NumPy.
Choosing between writing a raw C module and using Cython depends on the purpose of the module.

If you are writing a module that will only be used from Python to do one small, specific task with numpy arrays as fast as possible, then by all means use Cython: it registers the module correctly, handles the memory for you, prevents common mistakes people make when writing C code (like memory-management problems), automates the compiler includes, and gives easier access to complicated functionality (like numpy iterators).

However, if your module is going to be used in other languages, has to run independently of Python, has to be usable from Python without any overhead, implements complex C data structures, or requires a lot of C functionality, then by all means create your own C extension (or even a DLL). You can pass pointers to numpy arrays from Python (using numpy.ctypeslib.as_ctypes), pass the Python object itself and return it (but then you must build a .pyd/.so instead of a plain DLL), or even create the numpy array on the C side and have it managed by Python (but you will have to understand the numpy C API).
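The zero-copy, modify-in-place workflow from the question can be demonstrated without compiling anything, by letting a ctypes view stand in for the C extension's pointer (a minimal sketch; `c_view` is just an illustrative name):

```python
import numpy as np

# Step 1: pre-allocate in numpy
arr = np.zeros(5, dtype=np.int32)

# Step 2: obtain a ctypes object sharing the SAME buffer -- no copy
c_view = np.ctypeslib.as_ctypes(arr)

# Step 3: this write stands in for what a C extension would do
# through the pointer it received
c_view[2] = 42

# Step 4: the modification is immediately visible to ordinary numpy code
assert arr[2] == 42
assert arr.sum() == 42
```

A real C extension would receive the pointer (e.g. via `arr.ctypes.data`) and write through it the same way; as long as it writes into the buffer numpy allocated, no copy ever occurs.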
I am just now making some experiments with the Python C API. I have seen that numpy provides some C-related numerical fixed-size data types, e.g. np.int8 or np.float64, which should be the equivalents of C's int8_t and double. I was wondering if there is a way to obtain a similar result (without using numpy), i.e. to turn a C fixed-size data type into something usable by Python. Since I am already using Cython in my project, I thought about using C extension types, but I have not found any implementation example of this technique applied to already-existing C data types.
I have seen that numpy provides some C related numerical fixed size data types e.g. np.int8 or np.float64
Perhaps you misunderstand what you see. numpy.int8 etc. are Python extension types wrapping instances of native data types. Some of their characteristics are reflective of that, but instances are still ordinary Python objects.
I was wondering if there is a way to obtain a similar result (without using numpy), to turn a C fixed size datatype into something usable by Python.
Python code cannot work directly with native data types. Neither Numpy nor Cython enables such a thing, though they may give you that impression. You can create Python wrappers for native data types, as Numpy does, but that's more involved than you may suppose, and why reinvent the wheel? You can also implement Python methods in C via the C API or via Cython, but using them involves converting between Python objects and native values.
Since in my project I am already using cython I thought about using C extension types, but I have not found any implementation example of this technique applied over already-existing C data types.
That would be a viable way to create wrappers such as I mentioned above. I don't see what you think is special about Cython extension types as applied to existing C data types, however. Presumably, the extension type would have a single member with the wanted native type:
cdef class CDouble:
    cdef double value

    def __init__(self, d):
        self.value = d

    # other methods ...
Of course, it's all the "other methods" that will be the main trick. There is no particular magic here; you'll need to write implementations for all the methods you need, including the methods for supporting Python arithmetic operators such as + (__add__() in that case). Before you start down this road, you might want to run this to get an idea of what Numpy's wrapper types provide:
import numpy
print(dir(numpy.int8(0)))
I get 136 methods, and that's pretty much all per-wrapper. You might not need as much for your purposes, but you really ought to think carefully about how much work you're considering taking on, and whether the benefit is worth the effort.
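As a quick illustration of what those wrapper types provide: a numpy scalar is an ordinary Python object, yet its arithmetic follows the fixed-width native type, and that is precisely the behavior you would have to reimplement method by method. A small sketch:

```python
import numpy as np

x = np.int8(100)

# The wrapper is a Python object, but not a Python int
assert not isinstance(x, int)

# ...and its arithmetic follows the native 8-bit type:
# 100 + 100 = 200 overflows int8 and wraps around to -56
with np.errstate(over='ignore'):
    y = x + x
assert int(y) == -56
```

Reproducing that (operator overloads, overflow semantics, conversions to and from Python ints) for each wrapped type is where the bulk of the work lies.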
Say there is a C++ class in which we would like to define a function to be called from Python. On the Python side the goal is being able to call this function with:
Input: of type 2D numpy-array(float32), or list of lists, or other suggestions
Output: of type 2D numpy-array(float32), or list of lists, or other suggestions
and if it helps latency/simplicity 1D array is also ok.
One would for example define a function in the header with:
bool func(const std::string& name);
which has string type as input and bool as output.
What can be a good choice with the above requirements to write in the header?
And finally, after the header file, what should be written in the pyx/pyd file for Cython?
Cython Input
The most natural Cython type to use for the input interface between Python and Cython would be a 2D typed memoryview. This will take a 2D numpy array, as well as any other 2D array type that exports the buffer interface (there aren't too many other types since Numpy is pretty ubiquitous, but some image-handling libraries have some alternatives).
I'd avoid using list-of-lists as an interface - the length of the second dimension is poorly defined. However, Numpy arrays are easily created from list-of-lists.
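For example, a list of lists converts to a well-defined contiguous buffer in one step, and the resulting array exports the buffer interface that a typed memoryview consumes (a sketch in plain Python/numpy):

```python
import numpy as np

lol = [[1.0, 2.0], [3.0, 4.0]]
arr = np.asarray(lol, dtype=np.float32)  # one copy into a contiguous 2D buffer

# The array exports the buffer protocol, which is what a Cython
# typed memoryview (e.g. `float[:, :]`) consumes without copying
mv = memoryview(arr)
assert mv.ndim == 2
assert mv.shape == (2, 2)
assert mv.format == 'f'  # C float / np.float32
```

So accepting a `float[:, :]` memoryview on the Cython side lets callers pass either a numpy array directly (zero-copy) or convert a list of lists first.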
Cython Output
For output you return either a Cython memoryview, or a Numpy array (easily created from a memoryview with np.asarray(memview)). I'd probably return a Numpy array, but make a decision based on whether you want to make Numpy a hard dependency.
C++ interface
This is very difficult to answer without knowing about your code. If you have existing code you should just use the type that's most natural to that if at all possible.
You can get a pointer from your memoryview with &memview[0,0], and access its attributes .shape and .strides to get information about how the data is stored. (If you make the memoryview contiguous, then you know the strides from the shape, so it's simpler.) You then need to decide whether to copy the data, or just use a pointer to the Python-owned data (if C++ only keeps the data for the duration of the function call then using a pointer is good).
Similar considerations apply to the output data, but it's hard to know without knowing what you're trying to do in C++.
I'm working on a project which requires handling matrices of custom C structures, with some C functions implementing operations over these structures.
So far, we're proceeding as follows:
Build python wrapper classes around the C structures using ctypes
Override __and__ and __xor__ on those wrapper objects, calling the appropriate underlying C functions
Build numpy arrays containing instances of these classes
Now we are facing some performance issues, and I feel this is not the proper way to handle this, because we have a C library natively implementing expensive operations on the data type, and numpy natively implementing expensive operations on matrices, but (I guess) in this configuration every single operation is proxied through the Python wrapper class.
Is there a way to implement this with numpy that make the operations fully native? I've read there are utilities for wrapping ctypes types into numpy arrays (see here), but what about the operator overloading?
We're not bound to using ctypes, but we would love to be able to still use python (which I believe has big advantages over C in terms of code maintainability...)
Does anyone know if this is possible, and how? Would you suggest other different solutions?
Thanks a lot!
To elaborate somewhat on the above comment (arrays of structs versus struct of arrays):
import numpy as np

# create an array of structs
array_of_structs = np.zeros((3, 3), dtype=[('a', np.float64), ('b', np.int32)])

# we may as well think of it as a struct of arrays
struct_of_arrays = array_of_structs['a'], array_of_structs['b']

# this should print 8 (the byte offset of field 'b' within each struct)
print(array_of_structs['b'].ctypes.data - array_of_structs['a'].ctypes.data)
Note that working with a 'struct of arrays' approach requires a rather different coding style. That is, we largely have to let go of the OOP way of organizing our code. Naturally, you may still choose to place all operations relating to a particular object in a single namespace, but such a data layout encourages a more vectorized, or numpythonic coding style. Rather than passing single objects around between functions, we rather take the whole collection or matrix of objects to be the smallest atomic construct that we manipulate.
Of course, the same data may also be accessed in an OOP manner, where it is desirable. But that will not do you much good as far as numpy is concerned. Note that for performance reasons, it is often also preferable to maintain each attribute of your object as a separate contiguous array, since placing all attributes contiguously is cache-inefficient for operations which act on only a subset of attributes of the object.
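To make the vectorized style concrete, here is a small sketch reusing the field names 'a' and 'b' from the example above: instead of invoking a method per object, one operation updates a field across the whole collection at once:

```python
import numpy as np

# One structured array holding all "objects"; each field is a column
particles = np.zeros(4, dtype=[('a', np.float64), ('b', np.int32)])
particles['a'] = [1.0, 2.0, 3.0, 4.0]
particles['b'] = [10, 20, 30, 40]

# A single vectorized operation touches the 'a' field for every element,
# with no per-object Python-level method calls
particles['a'] *= 2.0

assert list(particles['a']) == [2.0, 4.0, 6.0, 8.0]
```

This is the shape of code that lets numpy (or a C library handed the field's pointer) do the heavy lifting, rather than looping over wrapper objects in Python.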
I'm fairly sure I'm not looking in the right places, because information on this topic has been remarkably hard to come by.
I have a small function in C++ that Python will invoke; it creates a small array of floats and returns them to Python.
First, it appears that you cannot use Py_BuildValue() to return an array of arbitrary size in C++ as a list in Python (it is unclear to me why this should be). An old but still-relevant post here suggests instantiating a PyList object, populating it with elements from the array, and then returning that instead.
Which is an acceptable solution. However, my numbers are C++ style floats. While the Python library provides ample conversion operations (C string -> Python string, C double -> Python float, etc), I can't find a means to simply convert a C++ float to a Python float. I know Python floats are equivalent to C/C++ doubles, so I suppose I could cast the floats to doubles and then to a PyObject via PyFloat_FromDouble() but I feel there must exist a more direct way of doing this.
Because this is an exceedingly short function, and largely a proof of concept, I did not feel it should be necessary to take the time to learn SWIG or Boost Python or somesuch; I'd like to do this with the built-in Python/C++ API. Any suggestions would be greatly appreciated!
Since a Python float is essentially a C double, the conversion has to happen at some point or other. Quoting the reference:
Python does not support single-precision floating point numbers; the savings in processor and memory usage that are usually the reason for using these is dwarfed by the overhead of using objects in Python, so there is no reason to complicate the language with two kinds of floating point numbers.
So convert to double (possibly simply by passing the value to a function which accepts double) and you are set.
As the text also indicates, floating point values in python are objects, which come with quite a bit of overhead. PyFloat_FromDouble will take care of getting that right. So going through that function is the right thing to do. The same holds for list creation.
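The float-to-double widening itself is exact, which can be checked from the Python side with ctypes (a sketch; the one caveat it illustrates is that the widened value is the float's rounding of the literal, not the double's):

```python
import ctypes
import sys

# A Python float is a C double: 53-bit mantissa, IEEE-754 binary64
assert sys.float_info.mant_dig == 53

# Values exactly representable in float survive the widening unchanged
assert ctypes.c_float(0.25).value == 0.25

# 0.1 is not exactly representable: the float rounds it differently
# than the double does, so the widened value differs slightly from 0.1
f = ctypes.c_float(0.1)
assert f.value != 0.1
assert abs(f.value - 0.1) < 1e-7
```

So in the C++ code, `PyFloat_FromDouble(x)` with a `float x` argument does the right thing: the implicit float-to-double conversion loses nothing that the float actually held.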