Very slow Numpy buffer pointer access

I'm trying to get the pointer to a Numpy array so that I can manipulate it quickly in my Cython code. I found two ways of getting the buffer's pointer, one using array.__array_interface__['data'][0] and the other with array.ctypes.data. They are both painfully slow.
I have created a small Cython class that simply creates a numpy array and stores the pointer to its buffer:
import numpy as np

cdef class ArrayHolder:
    cdef array
    cdef long *ptr

    def __init__(ArrayHolder self, allocate=True):
        self.array = np.zeros((4, 12,), dtype=np.int)
        cdef long ptr = self.array.__array_interface__['data'][0]
        self.ptr = <long *>ptr
Then, back in Python, I create multiple instances of this class, like so:
for i in range(1000000):
    holder = ArrayHolder()
This takes around 3.6 seconds. Using array.ctypes.data is half a second slower.
When I comment out the call to __array_interface__['data'] and run the code again, it completes in around 1 second.
Why is obtaining the address of the Numpy array buffer so slow?

This can be helped a lot by using Cython's static typing mechanisms. That way Cython knows that it is dealing with an array of a specific type and can generate optimised C code.
import numpy as np
cimport numpy as np  # compile-time declarations, just so it knows np.int_t

cdef class ArrayHolder:
    cdef np.int_t[:, :] array  # now specified as a specific array type
    cdef np.int_t *ptr  # note I've changed this to match the array type

    def __init__(ArrayHolder self, allocate=True):
        # np.int_ stands in for the np.int alias, which was removed in NumPy 1.24
        self.array = np.zeros((4, 12,), dtype=np.int_)
        self.ptr = &self.array[0, 0]  # location of the first element
In this version there's a small cost at the assignment of self.array to check that the object is in fact an array. However, element lookup and taking the address are now around as fast as using a C pointer.
In your old version the array was an arbitrary Python object, so there was a dictionary lookup for __array_interface__, then a lookup of __getitem__ so that the 'data' entry could be fetched from the resulting dictionary, and then a further lookup of __getitem__ to fetch index 0 of the resulting tuple.
One thing to note: if you've used cdef to tell Cython the array type, you can do all your indexing directly on the array and it will be pretty much the same speed as using the pointer, so you can probably skip creating the pointer entirely (unless you need it to pass to external C code). Turn off boundscheck and wraparound for the last little bit of speed.
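To get a feel for the attribute and item lookups described above, here is a minimal pure-Python sketch (not Cython) that times the two address lookups from the question with timeit. The variable names and the repeat count are illustrative assumptions, and the absolute timings will vary with the machine and the NumPy version.

```python
import timeit

import numpy as np

arr = np.zeros((4, 12), dtype=np.int64)

# Both routes go through Python-level attribute and item lookups
# and end up at the same buffer address.
addr_interface = arr.__array_interface__['data'][0]
addr_ctypes = arr.ctypes.data

n = 100_000  # arbitrary repeat count for the illustration
t_interface = timeit.timeit(lambda: arr.__array_interface__['data'][0], number=n)
t_ctypes = timeit.timeit(lambda: arr.ctypes.data, number=n)

print(addr_interface == addr_ctypes)
print(f"__array_interface__: {t_interface:.3f} s, ctypes.data: {t_ctypes:.3f} s")
```

With a typed memoryview, Cython resolves the address at the C level instead, so none of these lookups happen at run time.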

I'm guessing it's some sort of lazy loading: NumPy only does the memset() on the buffer when you first access it. I would try creating the array without filling it with zeros to save time.
Here's my test:
# cyth.pyx
import numpy as np

cdef class ArrayHolder:
    cdef array
    cdef long *ptr

    def __init__(ArrayHolder self, allocate=True):
        self.array = np.zeros((4, 12,), dtype=np.int)

    def ptr(ArrayHolder self):
        cdef long ptr = self.array.__array_interface__['data'][0]

# test.py
from timeit import timeit
from cyth import ArrayHolder

print(timeit("ArrayHolder()", number=1000000, setup="from cyth import ArrayHolder"))
print(timeit("ArrayHolder().ptr()", number=1000000, setup="from cyth import ArrayHolder"))

$ python test.py
1.0442328620702028
3.4246508290525526

Related

Cython: wrap a string vector with PyArray_SimpleNewFromData

[This question has been edited as the presented code contained issues not related to the actual problem]
I have implemented a generic function that shall convert c++ pointers to numpy arrays:
cdef np.ndarray pointer_to_array(void *ptr, np.npy_intp N, int np_type):
    cdef np.ndarray arr = np.PyArray_SimpleNewFromData(1, &N, np_type, ptr)
    return arr
Here, ptr is a pointer to the underlying data, N is the size of the array and np_type is the numpy code for the type. The function works well for types with a fixed size such as double.
However, I would like to apply the function to some kind of string array (e.g. dtype('<U10')). So what I try is
# mymodule.pyx
cdef get_v():
    # `v` is a std::vector[std::string] defined in an
    # external header file and is assumed to persist over the
    # lifetime of the module
    return pointer_to_array(v.data(), v.size(), np.NPY_STRING)
However, I obtain a ValueError: data type must provide an itemsize. This makes sense, since numpy needs to know the size of the string. How can I pass this information to PyArray_SimpleNewFromData? Or is there another way to wrap c++ arrays of strings with numpy arrays?
Further code needed for the example
# mymodule.pxd
from libcpp.vector cimport vector
from libcpp.string cimport string

cdef extern from "mymodule_cpp.h":
    vector[string] v

// mymodule_cpp.h
#include <vector>
#include <string>
std::vector<std::string> v(10, "Hello.");
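One reason the itemsize matters: NumPy's byte-string dtype stores every element in a fixed number of bytes, whereas each std::string manages its own character buffer, so a vector of them is not one block of fixed-width strings. The Python sketch below only illustrates the fixed-width layout on the NumPy side. Copying the strings into such an array is an assumed workaround, not something confirmed by the question, and the width of 10 is an arbitrary choice.

```python
import numpy as np

# The bare string dtype has no width yet, which is why NumPy asks for one.
print(np.dtype('S').itemsize)    # 0

# A fixed-width dtype stores each element in exactly 10 bytes.
print(np.dtype('S10').itemsize)  # 10

# Copying the strings into a new array lets NumPy be told a width.
words = [b"Hello."] * 10
arr = np.array(words, dtype='S10')
print(arr.dtype, arr.nbytes)
```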

How to return or save large malloc'd arrays in Cython as Python objects?

I want to create a large number of simulated samples from a model using Cython that I need to analyze later using Python. The result of one run of my simulation script should be a 10000 x 10000 array.
I have defined a function using def and tried to declare my array as cdef int my_array[10000][10000]. The my_script.pyx file compiles correctly, but when I run the script I get a "segmentation fault" error (I am on Linux).
Looking for a solution, I have learned that this issue is caused by allocating memory on the stack instead of the heap so I decided to use PyMem_Malloc to allocate the memory. Here's kind of a minimum version of what I'm trying to do:
import cython
from cpython.mem cimport PyMem_Malloc
from libc.stdlib cimport rand, srand, RAND_MAX
from libc.time cimport time

srand(time(NULL))

def my_array_func(int a_param):
    cdef int i
    cdef int **my_array = <int **>PyMem_Malloc(sizeof(int *) * 10000)
    for i in range(10000):
        my_array[i] = <int *>PyMem_Malloc(sizeof(int) * 10000)
    cdef int j
    cdef int k
    for j in range(10000):
        for k in range(10000):
            my_array[j][k] = <float>rand()/RAND_MAX * a_param
    return my_array  # this is the line the compiler rejects
When I try to compile this file, I got an error Cannot convert 'int **' to Python object which makes sense because my_array is not properly an array so I guess it cannot be returned as a Python object (sorry, my knowledge of C is really really rusty).
Is there a way to let the function return my 2D array such that it can be used as input to other Python functions? Another more than welcome solution might be to directly save the array in a file that can be imported later by a Python script.
Thanks.

In line with @DavidW's comment, when matrix computations are involved in Cython it is advisable to let numpy arrays own the memory and live in Python land.
In your case, it would look like this:
import cython
cimport numpy as np
import numpy as np
from libc.stdlib cimport rand, srand, RAND_MAX
from libc.time cimport time

srand(time(NULL))

def my_array_func(int a_param):
    cdef int n_rows=10000, ncols=10000
    # Mem alloc + Python object owning memory
    # (np.intc matches the C int element type declared for the buffer)
    cdef np.ndarray[dtype=int, ndim=2] my_array = np.empty((n_rows,ncols), dtype=np.intc)
    # Memoryview: iterate over my_array at C speed
    cdef int[:,::1] my_array_view = my_array
    # Fill array
    cdef int i, j
    for i in range(n_rows):
        for j in range(ncols):
            # Cast to double first so the division is not done on integers
            my_array_view[i,j] = <int> (<double>rand() / RAND_MAX * a_param)
    return my_array
Allocating an empty chunk of memory with defined size, making sure it is owned by a Python object and has all the nice array properties (like .shape) is what you get in a single line with the cdef np.ndarray[.... Looping over this array can be done with no Python interaction by using a memoryview.
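Because the result is an ordinary NumPy array, the second part of the question (saving it for later analysis) can be handled from Python with np.save and np.load. A small sketch, assuming a temporary directory and a much smaller array than the 10000 x 10000 one so that it runs quickly; the file name and the stand-in data are made up for illustration.

```python
import os
import tempfile

import numpy as np

# Stand-in for the array returned by my_array_func (smaller, for speed).
rng = np.random.default_rng(0)
samples = rng.integers(0, 5, size=(100, 100), dtype=np.intc)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "samples.npy")  # hypothetical file name
    np.save(path, samples)    # binary .npy file, keeps dtype and shape
    restored = np.load(path)  # read back fully into memory

print(restored.shape, restored.dtype == samples.dtype)
print(np.array_equal(samples, restored))
```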

How to convert a 2D numpy array into an array of pointers in Cython? [duplicate]

I have some C code that has the following declaration:
int myfunc(int m, int n, const double **a, double **b, double *c);
So a is a constant 2D array, b is a 2D array, and c is a 1D array, all dynamically allocated. b and c do not need to be anything specifically before they are passed to myfunc, and should be understood as output information. For the purposes of this question, I'm not allowed to change the declaration of myfunc.
Question 1: How do I convert a given numpy array a_np into an array a with the format required by this C function, so that I can call this C function in Cython with a?
Question 2: Are the declarations for b and c below correct, or do they need to be in some other format for the C function to understand them as a 2D and 1D array (respectively)?
My attempt:
myfile.pxd:
cdef extern from "myfile.h":
    int myfunc(int p, int q, const double **a, double **b, double *c)

mytest.pyx:
cimport cython
cimport myfile
import numpy as np
cimport numpy as np

p = 3
q = 4
cdef:
    double** a = np.random.random([p,q])
    double** b
    double* c
myfile.myfunc(p, q, a, b, c)
Then in IPython I run
import pyximport; pyximport.install()
import mytest
The line with the definition of a gives me the error message Cannot convert Python object to 'double **'. I don't get any error messages regarding b or c, but since I'm unable to run the C function at this time, I'm not sure the declarations of b and c are written correctly (that is, in a way that will enable the C function to output a 2D and a 1D array, respectively).
Other attempts: I've also tried following the solution here, but this doesn't work with the double-asterisk type of arrays I have in the myfunc declaration. The solution here does not apply to my task because I can't change the declaration of myfunc.

Create a helper array in cython
To get a double** from a numpy array, you can create a helper array of pointers in your *.pyx file. Furthermore, you have to make sure that the numpy array has the correct memory layout. (This might involve creating a copy.)
Fortran order
If your C function expects Fortran order (all x-coordinates in one list, all y-coordinates in another list, all z-coordinates in a third list, if your array a corresponds to a list of points in 3D space):
from libc.stdlib cimport malloc, free

N, M = a.shape
# Make sure the array a has the correct memory layout (here F-order)
cdef np.ndarray[double, ndim=2, mode="fortran"] a_cython = np.asarray(a, dtype=float, order="F")
# Create our helper array
cdef double** point_to_a = <double **>malloc(M * sizeof(double*))
if not point_to_a:
    raise MemoryError
try:
    # Fill the helper array with pointers to the columns
    for i in range(M):
        point_to_a[i] = &a_cython[0, i]
    # Call the C function that expects a double**
    myfunc(... &point_to_a[0], ...)
finally:
    free(point_to_a)
C-order
If your C function expects C order ([x1,y1,z1] is the first list, [x2,y2,z2] the second list for a list of 3D points):
from libc.stdlib cimport malloc, free

N, M = a.shape
# Make sure the array a has the correct memory layout (here C-order)
cdef np.ndarray[double, ndim=2, mode="c"] a_cython = np.asarray(a, dtype=float, order="C")
# Create our helper array
cdef double** point_to_a = <double **>malloc(N * sizeof(double*))
if not point_to_a:
    raise MemoryError
try:
    # Fill the helper array with pointers to the rows
    for i in range(N):
        point_to_a[i] = &a_cython[i, 0]
    # Call the C function that expects a double**
    myfunc(... &point_to_a[0], ...)
finally:
    free(point_to_a)
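The same row-pointer idea can be tried out from plain Python with ctypes, which may help when checking the layout before writing the Cython wrapper. This is only a sketch: row_pointers is a hypothetical helper, not part of NumPy or of the answer above, and it covers the C-order case only.

```python
import ctypes

import numpy as np

def row_pointers(a):
    """Return (array, pointers) where pointers[i] points at row i (hypothetical helper)."""
    # C order and float64, so that each row is a contiguous run of doubles.
    a = np.ascontiguousarray(a, dtype=np.float64)
    double_p = ctypes.POINTER(ctypes.c_double)
    pointers = (double_p * a.shape[0])()  # C array of double*, usable as double**
    for i in range(a.shape[0]):
        pointers[i] = a[i].ctypes.data_as(double_p)
    # The caller must keep `a` alive for as long as the pointers are used.
    return a, pointers

a, ptrs = row_pointers(np.arange(12.0).reshape(3, 4))
print(ptrs[1][2])   # reads element a[1, 2] through the row pointer
ptrs[0][0] = 42.0   # writes through to the NumPy buffer
print(a[0, 0])
```

As in the Cython version, the pointer table does not own the data: it is only valid while the underlying array exists.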

Reply 1: You can pass NumPy array via Cython to C using the location of the start of the array (see code below).
Reply 2: Your declarations seem correct but I don't use this approach of explicit memory management. You can use NumPy to declare cdef-ed arrays.
Use
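cdef double[:,::1] a = np.random.random([p, q])
cdef double[:,::1] b = np.empty([p, q])
cdef double[::1] c = np.empty(q)
Then pass &a[0, 0], the location of the start of the array, to your C function. The ::1 is to ensure contiguousness. Note that this gives a plain double* to one contiguous block, which suits C functions that take a flat pointer; a double** parameter still needs an array of row pointers.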
A good reference for this is Jake Vanderplas' blog: https://jakevdp.github.io/blog/2012/08/08/memoryview-benchmarks/
Finally, typically one creates functions in Cython and calls them in Python, so your Python code would be:
import pyximport; pyximport.install()
import mytest
mytest.mywrappedfunc()
where mywrappedfunc is a Python (def and not cdef) function defined in the module that can do the array declarations shown above.
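What the ::1 declaration checks can be previewed from Python through the array's flags. In this sketch (variable names are illustrative), a transposed view is not C-contiguous, so a double[:, ::1] argument would reject it until it is copied with np.ascontiguousarray.

```python
import numpy as np

a = np.random.random((3, 4))
print(a.flags['C_CONTIGUOUS'])       # True: freshly created arrays are C-ordered

view = a.T                           # shape (4, 3), same memory, different strides
print(view.flags['C_CONTIGUOUS'])    # False

fixed = np.ascontiguousarray(view)   # copies the data into C order
print(fixed.flags['C_CONTIGUOUS'])   # True
print(np.array_equal(view, fixed))   # True: same values, new layout
```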

Cython function with variable sized matrix input

I am trying to convert part of a native python function to cython to improve the compute time. I would like to write a cython function just for the loop component that is taking up the time (as ipython lprun kindly told me). However this function takes in variably sized matrices .. and I can't see how to bring that across easily to statically typed cython.
for index1 in range(0,num_products):
    for index2 in range(0,num_products):
        cond_prob = (data[index1] * data[index2]).sum() / max(col_sums[index1], col_sums[index2])
        prox[index1][index2] = cond_prob
The issue is that num_products changes year to year, so the matrix (data) size is variable.
What is the best strategy here?
Should I write two C functions: one to create a matrix of a certain dimension using malloc, and one to do the loops over the created matrix?
Is there some fancy cython/numpy wizardry to help in this scenario? Can I write a C function that takes in a variably sized Numpy array in memory and pass the size?

Cython code is (strategically) statically typed, but that doesn't mean that arrays must have a fixed size. In straight C passing a multidimensional array to a function can be a little awkward maybe, but in Cython you should be able to do something like the following:
Note I took the function and variable names from your follow-up question.
import numpy as np
cimport numpy as np
cimport cython

@cython.boundscheck(False)
@cython.cdivision(True)
def cooccurance_probability_cy(double[:,:] X):
    cdef int P, K, i, j, k
    P = X.shape[0]
    K = X.shape[1]  # number of columns, which need not equal P
    cdef double item
    cdef double [:] CS = np.sum(X, axis=1)
    cdef double [:,:] D = np.empty((P, P), dtype=np.float64)
    for i in range(P):
        for j in range(P):
            item = 0
            for k in range(K):
                item += X[i,k] * X[j,k]
            D[i,j] = item / max(CS[i], CS[j])
    return D
On the other hand, using just Numpy should also be quite fast for this problem, if you use the right functions and some broadcasting. In fact, as the calculation complexity is dominated by the matrix multiplication, I found the following is much faster than the Cython code above (np.inner uses a highly optimized BLAS routine):
def new(X):
    CS = np.sum(X, axis=1, keepdims=True)
    D = np.inner(X,X) / np.maximum(CS, CS.T)
    return D
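A quick way to convince yourself that the vectorised version computes the same thing as the original double loop is to run both on a small random matrix. This sketch is pure Python; loop_version is a hypothetical name for the loop from the question, col_sums is assumed to hold the per-row sums (as CS does in the answer), and the matrix shape is arbitrary.

```python
import numpy as np

def loop_version(data):
    # Straight transcription of the nested loop from the question.
    n = data.shape[0]
    col_sums = data.sum(axis=1)  # assumption: one sum per product (row)
    prox = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            prox[i, j] = (data[i] * data[j]).sum() / max(col_sums[i], col_sums[j])
    return prox

def new(X):
    CS = np.sum(X, axis=1, keepdims=True)
    return np.inner(X, X) / np.maximum(CS, CS.T)

# Non-square on purpose; strictly positive entries avoid division by zero.
X = np.random.default_rng(0).random((5, 8)) + 0.1
print(np.allclose(loop_version(X), new(X)))
```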

Have you tried getting rid of the for loops with numpy?
For the first part of your equation you could, for example, try:
(data[np.newaxis, :] * data[:, np.newaxis]).sum(2)
If memory is an issue, you can also use the np.einsum() function.
For the second part one could probably also cook up a numpy expression (a bit more difficult) if you've not already tried that.
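For the np.einsum suggestion, here is a minimal sketch (with an arbitrary small data matrix) showing that the subscripts 'ik,jk->ij' give the same pairwise sums as the broadcast expression, without building the intermediate 3D array:

```python
import numpy as np

data = np.random.default_rng(1).random((4, 6))

# Broadcast version: builds an intermediate array of shape (4, 4, 6).
broadcast = (data[np.newaxis, :] * data[:, np.newaxis]).sum(2)

# einsum version: sums over the shared index k directly.
summed = np.einsum('ik,jk->ij', data, data)

print(broadcast.shape, np.allclose(broadcast, summed))
```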
