Operate on Numpy array from C extension without memory copy - python

I'm new to C extensions for NumPy and I'm wondering if the following workflow is possible.
Pre-allocate an array in NumPy
Pass this array to a C extension
Modify array data in-place in C
Use the updated array in Python with standard NumPy functions
In particular, I'd like to do this while ensuring I'm making zero new copies of the data at any step.
I'm familiar with the boilerplate on the C side (PyModuleDef, PyMethodDef, and the PyObject* arguments), but many examples I've seen involve coercion to C arrays, which to my understanding involves copying and/or casting. I'm also aware of Cython, though I don't know whether it does similar coercions or copies under the hood. I'm specifically interested in simple indexed get and set operations on an ndarray with numeric (e.g. int32) values.
Could someone provide a minimal working example of creating a NumPy array, modifying it in-place in a C extension, and using the results in Python subsequently?

Cython doesn't create new copies of numpy arrays unless you specifically request it to using numpy functions, so it is as efficient as it can be when dealing with numpy arrays; see Working with NumPy.
Choosing between writing a raw C module and using Cython depends on the purpose of the module.
If you are writing a module that will only be used from Python to do a small, specific task with numpy arrays as fast as possible, then by all means use Cython: it will register the module correctly, handle the memory, and prevent common mistakes people make when writing C code (like memory-management problems), as well as automate the compiler includes and give easier access to complicated functionality (like numpy iterators).
However, if your module is going to be used in other languages, has to run independently of Python, must be usable from Python without any overhead, implements some complex C data structures, and requires a lot of C functionality, then by all means create your own C extension (or even a dll). You can pass pointers to numpy arrays from Python (using numpy.ctypeslib), pass the Python object itself and return it (but then you must build a .pyd/.so instead of a dll), or even create the numpy array on the C side and have it managed by Python (but you will have to understand the numpy C API).
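To illustrate the zero-copy workflow the question asks about, here is a minimal pure-Python sketch that uses ctypes in place of a compiled extension: the loop writing through the raw pointer stands in for what the C code would do, and the changes are visible in the original array because no copy is made at any step.

```python
import ctypes
import numpy as np

# pre-allocate the array in NumPy
a = np.zeros(5, dtype=np.int32)

# raw pointer to the array's buffer -- this is exactly the pointer a
# C extension (or a function loaded via ctypes) would receive
ptr = a.ctypes.data_as(ctypes.POINTER(ctypes.c_int32))

# stand-in for the C code: modify the data in place through the pointer
for i in range(a.size):
    ptr[i] = i * 10

# the original array sees the changes -- no new copy was created
print(a.tolist())  # [0, 10, 20, 30, 40]
```

A real C extension would receive the same address via PyArray_DATA() on the ndarray argument; the memory layout it sees is identical.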

Related

Numpy fixed size data types behind the scenes

I am just now experimenting with the Python C API. I have seen that numpy provides some C-related fixed-size numerical data types, e.g. np.int8 or np.float64, which should be the equivalents of the C int8_t and double. I was wondering if there is a way to obtain a similar result (without using numpy), i.e. to turn a C fixed-size data type into something usable by Python. Since I am already using Cython in my project, I thought about using C extension types, but I have not found any implementation example of this technique applied to already-existing C data types.
I have seen that numpy provides some C-related fixed-size numerical data types, e.g. np.int8 or np.float64
Perhaps you misunderstand what you see. numpy.int8 etc. are Python extension types wrapping instances of native data types. Some of their characteristics are reflective of that, but instances are still ordinary Python objects.
I was wondering if there is a way to obtain a similar result (without using numpy), to turn a C fixed-size data type into something usable by Python.
Python code cannot work directly with native data types. Neither Numpy nor Cython enables such a thing, though they may give you that impression. You can create Python wrappers for native data types, as Numpy does, but that's more involved than you may suppose, and why reinvent the wheel? You can also implement Python methods in C via the C API or via Cython, but using them involves converting between Python objects and native values.
Since in my project I am already using cython I thought about using C extension types, but I have not found any implementation example of this technique applied over already-existing C data types.
That would be a viable way to create wrappers such as I mentioned above. I don't see what you think is special about Cython extension types as applied to existing C data types, however. Presumably, the extension type would have a single member with the wanted native type:
cdef class CDouble:
    cdef double value

    def __init__(self, d):
        self.value = d

    # other methods ...
Of course, it's all the "other methods" that will be the main trick. There is no particular magic here; you'll need to write implementations for all the methods you need, including the methods for supporting Python arithmetic operators such as + (__add__() in that case). Before you start down this road, you might want to run this to get an idea of what Numpy's wrapper types provide:
import numpy
print(dir(numpy.int8(0)))
I get 136 methods, and that's per wrapper type. You might not need as many for your purposes, but you really ought to think carefully about how much work you're considering taking on, and whether the benefit is worth the effort.
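To get a feel for what writing those "other methods" entails, here is a pure-Python sketch of such a wrapper (no Cython; the class name and round-trip trick are illustrative, not from any library). A Cython version would store the value in a cdef double instead, but the per-operator workload is the same.

```python
import struct

class CDouble:
    """Illustrative wrapper that pins values to C-double (64-bit) semantics."""
    __slots__ = ("value",)

    def __init__(self, d):
        # round-trip through 8 raw bytes to force C double precision
        self.value = struct.unpack("d", struct.pack("d", float(d)))[0]

    # every operator the type should support must be written out by hand...
    def __add__(self, other):
        return CDouble(self.value + other.value)

    def __repr__(self):
        return f"CDouble({self.value})"

print(CDouble(1.5) + CDouble(2.25))  # CDouble(3.75)
```

Multiply this by comparisons, in-place operators, conversions, hashing, and so on, and the 136-method figure above starts to look realistic.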

Sorting performance comparison between numpy array, python list, and Fortran

I have been using Fortran for my computational physics related work for a long time, and recently started learning and playing around with Python. I am aware of the fact that being an interpreted language Python is generally slower than Fortran for primarily CPU-intensive computational work. But then I thought using numpy would significantly improve the performance for a simple task like sorting.
So my test case was sorting an array/a list of size 10,000 containing random floats using bubble sort (just a test case with many array operations, so no need to comment on the performance of the algorithm itself). My timing results are as follows (all functions use identical algorithm):
Python3 (using numpy array, but my own function instead of numpy.sort): 33.115s
Python3 (using list): 9.927s
Fortran (gfortran) : 0.291s
Python3 (using numpy.sort): 0.269s (not a fair comparison, since it uses a different algorithm)
I was surprised that operating on a numpy array is ~3 times slower than on a python list, and ~100 times slower than Fortran. So at this point my questions are:
Why is operating on a numpy array significantly slower than on a python list for this test case?
In case an algorithm I need is not already implemented in scipy/numpy, and I need to write my own function within the Python framework with best performance in mind, which data type should I operate on: numpy array or list?
If my applications are performance-oriented, and I want to write functions with performance equivalent to built-in numpy functions (e.g. np.sort), what tools/frameworks should I learn/use?
You seem to be misunderstanding what NumPy does to speed up computations.
The speedup you get in NumPy does not come from NumPy using some smart way of storing data, or from compiling your Python code to C automatically.
Instead, NumPy implements many useful algorithms in C or Fortran, numpy.sort() being one of them. These functions understand np.ndarrays as input and loop over the data in a C/Fortran loop.
If you want to write fast NumPy code there are really three ways to do that:
Break down your code into NumPy operations (multiplications, dot-product, sort, broadcasting etc.)
Write the algorithm you want to implement in C/Fortran and also write bindings to Python that accept np.ndarrays (internally they're a contiguous array of the type you've chosen).
Use Numba to speed up your function by just-in-time compiling it to machine code (with some limitations)
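The first option above can be sketched as follows: a per-element Python loop pays object-creation overhead on every access, while the whole-array call runs its loop inside NumPy's compiled code.

```python
import numpy as np

x = np.arange(100_000, dtype=np.float64)

def loop_sum(arr):
    # per-element Python loop: each iteration boxes a value
    # into a Python-level object before adding it
    total = 0.0
    for v in arr:
        total += v
    return total

# whole-array operation: the loop runs in compiled code inside NumPy,
# typically orders of magnitude faster for large arrays
assert loop_sum(x) == x.sum()
```

Both give the same answer here (the integer-valued sums stay exact in float64); only the location of the loop differs, and that is where all the performance goes.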
Why is operating on a numpy array significantly slower than on a python list for this test case?
NumPy arrays are containers for numerical data. They contain metadata (the type and shape of the array) and a block of memory for the data itself. As such, any operation on individual elements of a NumPy array involves some overhead. Python lists are better optimized for "plain Python" code: reading or writing a list element is faster than it is for a NumPy array. The benefit of NumPy arrays comes from whole-array (vectorized) operations and from compiled extensions: C/C++/Fortran, Cython, or Numba can access the contents of NumPy arrays without that overhead.
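The per-element overhead is easy to observe directly: every scalar read from an ndarray creates a fresh NumPy scalar object, whereas a list simply hands back the object it already stores.

```python
import numpy as np

lst = [1, 2, 3]
arr = np.array(lst)

# a list returns the very object it stores
assert lst[0] is lst[0]

# an ndarray boxes the raw bytes into a new scalar object on every access
assert arr[0] is not arr[0]
assert isinstance(arr[0], np.generic)
```

That boxing step, repeated millions of times inside a bubble sort, accounts for the slowdown the question measured.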
In case an algorithm I need is not already implemented in scipy/numpy, and I need to write my own function within the Python framework with best performance in mind, which data type should I operate on: numpy array or list?
For numerical code, NumPy arrays are better: you can access their content "à la C" or "à la Fortran" in compiled extensions.
If my applications are performance-oriented, and I want to write functions with performance equivalent to built-in numpy functions (e.g. np.sort), what tools/frameworks should I learn/use?
There are many options. You can write in C using the NumPy C-API (complicated; I wouldn't advise it, but it is good to know that it exists). Cython is a mature "Python-like" and "C-like" language that enables easy and progressive performance improvements. Numba is a "just-in-time" compiler: provided that you restrict your code to direct numerical operations on NumPy arrays, Numba will convert it on the fly to a compiled equivalent.

Boost python: passing large data structure to python

I'm currently embedding Python in my C++ program using Boost.Python in order to use matplotlib. Now I'm stuck at a point where I have to construct a large data structure, let's say a dense 10000x10000 matrix of doubles. I want to plot columns of that matrix, and I figured that I have multiple options to do so:
Iterating and copying every value into a numpy array --> I don't want to do that for an obvious reason, which is doubled memory consumption
Iterating and exporting every value into a file, then importing it in Python --> I could do that completely without Boost.Python, and I don't think this is a nice way
Allocating and storing the matrix in Python and just updating the values from C++ --> But as stated here, it's not a good idea to switch back and forth between the Python interpreter and my C++ program
Somehow exposing the matrix to Python without having to copy it --> All I can find on that matter is about extending Python with C++ classes, not embedding
Which of these is the best option concerning performance and, of course, memory consumption, or is there an even better way of doing this kind of task?
To prevent copying in Boost.Python, one can either:
Use policies to return internal references
Allocate on the free store and use policies to have Python manage the object
Allocate the Python object then extract a reference to the array within C++
Use a smart pointer to share ownership between C++ and Python
If the matrix has a C-style contiguous memory layout, then consider using the Numpy C-API. The PyArray_SimpleNewFromData() function can be used to create an ndarray object that wraps memory that has been allocated elsewhere. This would allow one to expose the data to Python without requiring copying or transferring each element between the languages. The how-to-extend documentation is a great resource for dealing with the Numpy C-API:
Sometimes, you want to wrap memory allocated elsewhere into an ndarray object for downstream use. This routine makes it straightforward to do that. [...] A new reference to an ndarray is returned, but the ndarray will not own its data. When this ndarray is deallocated, the pointer will not be freed.
[...]
If you want the memory to be freed as soon as the ndarray is deallocated then simply set the OWNDATA flag on the returned ndarray.
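The Python-level analogue of PyArray_SimpleNewFromData() is np.frombuffer, which wraps existing memory without copying; in this sketch a ctypes array stands in for the buffer the C++ side would own.

```python
import ctypes
import numpy as np

# stand-in for memory allocated by the C++ program
buf = (ctypes.c_double * 6)(*range(6))

# wrap the existing memory as a 2x3 ndarray -- no element is copied
view = np.frombuffer(buf, dtype=np.float64).reshape(2, 3)

# writes through the ndarray are visible in the original buffer
view[0, 0] = 42.0
assert buf[0] == 42.0
```

As with PyArray_SimpleNewFromData(), the ndarray does not own the memory: the buffer must stay alive for as long as the view is in use.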
Also, while the plotting function may create copies of the array, it can do so within the C-API, allowing it to take advantage of the memory layout.
If performance is a concern, it may be worth considering the plotting itself:
taking a sample of the data and plotting it may be sufficient depending on the data distribution
using a raster-based backend, such as Agg, will often outperform vector-based backends on large datasets
benchmarking other tools that are designed for large data, such as Vispy
Although Tanner's answer brought me a big step forward, I ended up using Boost.NumPy, an unofficial extension to Boost.Python that can easily be added. It wraps the NumPy C API and makes it safer and easier to use.

Numpy matrix operations on custom C structures with overloaded operators

I'm working on a project which requires handling matrices of custom C structures, with some C functions implementing operations over these structures.
So far, we're proceeding as follows:
Build python wrapper classes around the C structures using ctypes
Override __and__ and __xor__ for those wrapper classes, calling the appropriate underlying C functions
Build numpy arrays containing instances of these classes
Now we are facing some performance issues, and I feel this is not the proper way to handle things, because we have a C library that natively implements expensive operations on the data type, and numpy natively implements expensive operations on matrices, but (I guess) in this configuration every single operation is proxied by the python wrapper class.
Is there a way to implement this with numpy that make the operations fully native? I've read there are utilities for wrapping ctypes types into numpy arrays (see here), but what about the operator overloading?
We're not bound to using ctypes, but we would love to be able to still use python (which I believe has big advantages over C in terms of code maintainability...)
Does anyone know if this is possible, and how? Would you suggest other different solutions?
Thanks a lot!
To elaborate somewhat on the above comment (arrays of structs versus struct of arrays):
import numpy as np

# create an array of structs
array_of_structs = np.zeros((3, 3), dtype=[('a', np.float64), ('b', np.int32)])
# we may as well think of it as a struct of arrays
struct_of_arrays = array_of_structs['a'], array_of_structs['b']
# this should print 8, the offset of field 'b' within each record
print(array_of_structs['b'].ctypes.data - array_of_structs['a'].ctypes.data)
Note that working with a 'struct of arrays' approach requires a rather different coding style; that is, we largely have to let go of the OOP way of organizing our code. Naturally, you may still choose to place all operations relating to a particular object in a single namespace, but such a data layout encourages a more vectorized, or "numpythonic", coding style. Rather than passing single objects around between functions, we instead take the whole collection or matrix of objects to be the smallest atomic construct that we manipulate.
Of course, the same data may also be accessed in an OOP manner, where it is desirable. But that will not do you much good as far as numpy is concerned. Note that for performance reasons, it is often also preferable to maintain each attribute of your object as a separate contiguous array, since placing all attributes contiguously is cache-inefficient for operations which act on only a subset of attributes of the object.
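As a concrete illustration of this whole-array style: each field of a structured array is itself an array view into the same memory, so an operation applies to a whole column at once in compiled code rather than being proxied per object.

```python
import numpy as np

records = np.zeros(4, dtype=[('a', np.float64), ('b', np.int32)])
records['a'] = [1.0, 2.0, 3.0, 4.0]
records['b'] = [10, 20, 30, 40]

# operate on an entire field (column) at once
records['a'] *= 2.0
assert records['a'].tolist() == [2.0, 4.0, 6.0, 8.0]

# field access is a view, not a copy: both share the same buffer
assert np.shares_memory(records['a'], records)
```

A C library could equally receive the pointer to one field's column and process it natively, which is what removes the per-element wrapper overhead the question describes.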

Passing a numpy array to Delphi dll

I am trying to implement an image classification algorithm in Python. The problem is that Python takes very long to loop through the array. That's why I decided to write a Delphi dll which performs the array processing. My problem is that I don't know how to pass the multidimensional Python array to my dll function.
Delphi dll extract: (I use this function only for testing)
type
  TImgArray = array of array of Integer;

function count(a: TImgArray): Integer; cdecl;
begin
  Result := High(a);
end;
relevant Python code:
import ctypes
from ctypes import cdll
import numpy as np

arraydll = cdll.LoadLibrary("C:\\ArrayFunctions.dll")
c_int_p = ctypes.POINTER(ctypes.c_int32)
data = valBD.ReadAsArray()
data = data.astype(np.int32)
data_p = data.ctypes.data_as(c_int_p)
print(arraydll.count(data_p))
The value returned by the dll function is not the right one (it is 2816 instead of 7339). That's why I guess there's something wrong with my type conversion :(
Thanks in advance,
Mario
What you're doing won't work, and is likely to corrupt memory too.
A Delphi dynamic array is implemented under the hood as a data structure that holds some metadata about the array, including the length. But what you're passing to it is a C-style pointer-as-array, which is a pointer, not a Delphi dynamic array data structure. That structure is specific to Delphi and you can't use it in other languages. (Even if you did manage to implement the exact same structure in another language, you still couldn't pass it to a Delphi DLL and have it work right, because Delphi's memory manager is involved under the hood. Doing it this way is just asking for heap corruption and/or exceptions being raised by the memory manager.)
If you want to pass a C-style array into a DLL, you have to do it the C way: pass a second parameter containing the length of the array. Your Python code already knows the length, and it shouldn't take any time to calculate it. You can definitely use Delphi to speed up image processing, but the array dimensions have to come from the Python side. There's no shortcut you can take here.
Your Delphi function declaration should look something like this:
type
  TImgArray = array[0..MAXINT] of Integer;
  PImgArray = ^TImgArray;

function ProcessSomething(a: PImgArray; size: Integer): Integer; cdecl;
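On the Python side the call would then pass the flat data pointer together with the element count. The sketch below shows how a 2-D array maps onto that flat pointer; the dll call itself is left commented out, since ProcessSomething is only a hypothetical export matching the declaration above.

```python
import ctypes
import numpy as np

data = np.arange(12, dtype=np.int32).reshape(3, 4)
assert data.flags['C_CONTIGUOUS']  # required for raw-pointer access

rows, cols = data.shape
ptr = data.ctypes.data_as(ctypes.POINTER(ctypes.c_int32))

# hypothetical call, assuming the Delphi export above:
# arraydll.ProcessSomething(ptr, rows * cols)

# element [i, j] lives at flat offset i*cols + j
assert ptr[1 * cols + 2] == data[1, 2]
```

This is also why the original count() call returned garbage: the raw pointer carries no length or shape, so both must travel as explicit arguments.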
Normal Python arrays are usually called "lists". A numpy.array in Python is a special type that is more memory-efficient than a normal Python list of Python floating-point objects. numpy.array wraps a standard block of memory that is accessed as a native C array. This, in turn, does not map cleanly to Delphi array types.
As David says, if you're willing to use C, this will all be easier. If you want to use Delphi and access Numpy.array, I suspect that the easiest way to do it would be to find a way to export some simple C functions that access the Numpy.array type. In C I would import the numpy headers, and then write functions that I can call from Pascal. Then I would import these functions from a DLL:
function GetNumpyArrayValue(arrayObj: Pointer; index: Integer): Double;
I haven't written any CPython wrapper code in a while. This would be easier if you simply wanted to access core Python types from Delphi; the existing Python-for-Delphi wrappers will help you there. Using numpy with Delphi is just a lot more work.
Since you're only writing a DLL and not a whole application, I would seriously advise you forget about Delphi and write this puppy in plain C, which is what Python extensions (which is what you're writing) really should be written in.
In short, since you're writing a DLL in Pascal, you're going to need at least another small DLL in C, just to bridge between the Python extension types (numpy.array) and the Python floating-point values. And even then, you're not going to easily (or quickly) get an array value you could read in Delphi as a native Delphi array type.
The very fastest access mechanism I can think of is this:
type
  PDouble = ^Double;

function GetNumpyArrayValue(arrayObj: Pointer; var doubleVector: PDouble; var doubleVectorLen: Integer): Boolean;
You could then use doubleVector (pointer) type math to access the underlying C array memory type.
