Creating 2D/3D C arrays in cython - python

Can anyone enlighten me how to pass a 2D array created in cython to a cdef function? I can do that with 1D array, but
not with 2D (or higher), let me illustrate the situation:
This is the C code that I would like to reproduce in cython:
#include <stdio.h>

void print_my_1Darray(int x[]);
void print_my_2Darray(int x[3][4]);

int main(void){
    int arr1D[] = {1,2,3,4,5,6,7,8,9,10,11,12};
    int arr2D[3][4] = {{1,2,3,4},{5,6,7,8},{9,10,11,12}};
    print_my_1Darray(arr1D);
    printf("\n");
    print_my_2Darray(arr2D);
    printf("\n");
    return 0;
}

void print_my_1Darray( int x[] ){
    int i;
    for(i=0; i < 12; i++){
        printf("c[%d] = %d\n",i, x[i]);
    }
}

void print_my_2Darray( int x[3][4] ){
    int i, j;
    for(i=0; i < 3; i++){
        for(j=0; j < 4; j++){
            /* print the indices in the same order they are used */
            printf("c[%d][%d] = %d\n",i, j, x[i][j]);
        }
    }
}
And then if I try to reproduce this in Cython like this:
cimport cython
import numpy as np
cimport numpy as cnp

def testfunc():
    cdef int *arr1D = [1,2,3,4,5,6,7,8,9,10,11,12]
    print_my_1D_array(arr1D)
    cdef int *arr2D = [[1,2,3,4], [5,6,7,8], [9,10,11,12]] # <-- WRONG!
    print_my_2D_array(arr2D)

cdef void print_my_1D_array(int c_arr[12]):
    cdef int i
    for i in range(4):
        print c_arr[i]

cdef void print_my_2D_array(int c_arr[3][4]):
    cdef int i, j
    for i in range(3):
        for j in range(4):
            print c_arr[i][j]
and when I compile this pyx script I get the error:
cdef int *arr2D = [[1,2,3,4][5,6,7,8][9,10,11,12]]
print_my_2D_array(arr2D)
^
------------------------------------------------------------
test2.pyx:18:27: Cannot assign type 'int *' to 'int (*)[4]'
It seems that I can create something with the
"cdef int *arr2D = [[1,2,3,4][5,6,7,8][9,10,11,12]]"
line and it compiles OK until I try to pass it to a function or simply print its members...
Can anyone explain what's happening there and how to create pure-c 2D/3D arrays in cython and how to pass them to c-level functions? Also, I am trying to avoid numpy arrays there to avoid python overhead, as my code will require very fast calculations on arrays.

You need to declare your input array as a static (fixed-size) C array rather than as an int pointer, and then fill it row by row:
cdef int arr2D[3][4]
arr2D[0][:] = [1, 2, 3, 4]
arr2D[1][:] = [5, 6, 7, 8]
arr2D[2][:] = [9, 10, 11, 12]

Related

Python : Call by Reference

I'm new to python. Can anyone help me to understand the call by reference in python.
#include <stdio.h>
#include <stdlib.h>   /* malloc, free */
#include <conio.h>    /* getch (DOS/Windows only) */

void rd(float *a, int *n)
{
    int i;
    /* C arrays are 0-based: valid indices are 0 .. *n - 1 */
    for (i = 0; i < *n; i++) {
        printf("Enter element %d: ", i + 1);
        scanf("%f", &a[i]);
    }
}

float sum(float *a, int *n)
{
    int i; float s = 0;
    for (i = 0; i < *n; i++)
        s = s + a[i];
    return s;
}

int main(void)
{
    int size; float *x, g;
    printf("Give size of array: "); scanf("%d", &size);
    x = (float *)malloc(size*sizeof(float)); // dynamic memory allocation
    printf("\n");
    rd(x, &size); // passing the addresses
    g = sum(x, &size); // passing the addresses
    printf("\nSum of elements = %f\n", g);
    printf("\nDONE ! Hit any key ...");
    free(x);
    getch(); return 0;
}
This is a C example I am trying to reproduce in Python. Any help would be appreciated.
In python there is no way to pass "the address" of a "place" (a variable, an array element, a dictionary value or an instance member).
The only way to provide other code the ability to change a place is to provide a "path" to reach it (e.g. the variable name, the array and the index and so on). As a very strange alternative (not used often in Python) you can pass a "writer" function that will change the place... for example:
def func(a, b, placeWriter):
    placeWriter(a + b)

def caller():
    mylist = [1, 2, 3, 4]
    def writer(x):
        mylist[3] = x
    func(10, 20, writer)
Much more common instead is writing functions that simply return the needed values; note that in Python returning multiple values is trivial while in C this is not supported and passing addresses is used instead:
def func():                # void f(int *a, int *b, int *c) {
    return 1, 2, 3         #     *a=1; *b=2; *c=3;
                           # }
def caller():              # void caller() { int a, b, c;
    a, b, c = func()       #     func(&a, &b, &c);
    ...
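Applied to the C program in the question, a minimal Python sketch could look like the one below. The function names `read_values` and `total` are made up for this illustration, and the values are passed in as a list of strings instead of being read with `input()`, so that the snippet runs unattended. The list is handed to the function as an object reference, so appending to it inside the function is visible to the caller, and the sum is simply returned.

```python
def read_values(dest, raw_items):
    """Fill the caller's list in place (the role of rd(x, &size) in the C code)."""
    for item in raw_items:
        dest.append(float(item))  # mutates the same list object the caller holds


def total(values):
    """Return the sum instead of writing it through a pointer."""
    s = 0.0
    for v in values:
        s += v
    return s


x = []                               # no malloc needed: the list grows on demand
read_values(x, ["1.5", "2.5", "4"])  # stand-in for interactive input
g = total(x)
print("Sum of elements = %f" % g)
```

No size argument is needed either, because a list knows its own length (`len(x)`).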

Passing a numpy array to C++

I have some code written in Python for which the output is a numpy array, and now I want to send that output to C++ code, where the heavy part of the calculations will be performed.
I have tried using cython's public cdef, but I am running into some issues. I would appreciate your help! Here goes my code:
pymodule.pyx:
from pythonmodule import result # result is my numpy array
import numpy as np
cimport numpy as np
cimport cython
#cython.boundscheck(False)
#cython.wraparound(False)
cdef public void cfunc():
    print 'I am in here!!!'
    cdef np.ndarray[np.float64_t, ndim=2, mode='c'] res = result
    print res
Once this is cythonized, I call:
pymain.c:
#include <Python.h>
#include <numpy/arrayobject.h>
#include "pymodule.h"
int main() {
    Py_Initialize();
    initpymodule();
    test(2);
    Py_Finalize();
}

int test(int a)
{
    Py_Initialize();
    initpymodule();
    cfunc();
    return 0;
}
I am getting a NameError for the result variable at C++. I have tried defining it with pointers and calling it indirectly from other functions, but the array remains invisible. I am pretty sure the answer is quite simple, but I just do not get it. Thanks for your help!
Short Answer
The NameError was caused by Python not being able to find the module: the working directory isn't automatically added to your PYTHONPATH. Calling setenv("PYTHONPATH", ".", 1); in your C/C++ code fixes this.
Longer Answer
There's an easy way to do this, apparently. With a python module pythonmodule.py containing an already created array:
import numpy as np
result = np.arange(20, dtype=np.float64).reshape((2, 10))
You can structure your pymodule.pyx to export that array by using the public keyword. By adding some auxiliary functions, you generally won't need to touch either the Python or the Numpy C-API:
from pythonmodule import result
from libc.stdlib cimport malloc
import numpy as np
cimport numpy as np

cdef public np.ndarray getNPArray():
    """ Return array from pythonmodule. """
    return <np.ndarray>result

cdef public int getShape(np.ndarray arr, int shape):
    """ Return Shape of the Array based on shape par value. """
    return <int>arr.shape[1] if shape else <int>arr.shape[0]

cdef public void copyData(float *** dst, np.ndarray src):
    """ Copy data from src numpy array to dst. """
    cdef float **tmp
    cdef int i, j, m = src.shape[0], n = src.shape[1]
    # Allocate initial pointer
    tmp = <float **>malloc(m * sizeof(float *))
    if not tmp:
        raise MemoryError()
    # Allocate rows
    for j in range(m):
        tmp[j] = <float *>malloc(n * sizeof(float))
        if not tmp[j]:
            raise MemoryError()
    # Copy numpy Array
    for i in range(m):
        for j in range(n):
            tmp[i][j] = src[i, j]
    # Assign pointer to dst
    dst[0] = tmp
The functions getNPArray and getShape return the array and its shape, respectively. copyData was added to extract the ndarray data and copy it, so that you can then finalize Python and keep working without the interpreter initialized.
A sample program (in C, C++ should look identical) would look like this:
#include <Python.h>
#include "numpy/arrayobject.h"
#include "pyxmod.h"
#include <stdio.h>
#include <stdlib.h>  /* setenv */

void printArray(float **arr, int m, int n);
void getArray(float ***arr, int * m, int * n);

int main(int argc, char **argv){
    // Holds data and shapes.
    float **data = NULL;
    int m, n;
    // Gets array and then prints it.
    getArray(&data, &m, &n);
    printArray(data, m, n);
    return 0;
}

void getArray(float ***data, int * m, int * n){
    // setenv is important, makes python find
    // modules in working directory
    setenv("PYTHONPATH", ".", 1);
    // Initialize interpreter and module
    Py_Initialize();
    initpyxmod();
    // Use Cython functions.
    PyArrayObject *arr = getNPArray();
    *m = getShape(arr, 0);
    *n = getShape(arr, 1);
    copyData(data, arr);
    if (*data == NULL){ // still NULL if copyData could not allocate
        fprintf(stderr, "Data is NULL\n");
        return ;
    }
    Py_DECREF(arr);
    Py_Finalize();
}

void printArray(float **arr, int m, int n){
    int i, j;
    for(i=0; i < m; i++){
        for(j=0; j < n; j++)
            printf("%f ", arr[i][j]);
        printf("\n");
    }
}
Always remember to set:
setenv("PYTHONPATH", ".", 1);
before you call Py_Initialize so Python can find modules in the working directory.
The rest is pretty straightforward. It might need some additional error checking, and it definitely needs a function to free the allocated memory.
Alternate Way w/o Cython:
Doing it the way you are attempting is more hassle than it's worth. You would probably be better off using numpy.save to save your array in a npy binary file and then using some C++ library that reads that file for you.

How to create empty char arrays in Cython without loops

Well, this seems easy, but I can't find a single reference on the web. In C we can create a char array of n null-characters as follows:
char arr[n] = "";
But when I try to do the same in Cython with
cdef char arr[n] = ""
I get this compilation error:
Error compiling Cython file:
------------------------------------------------------------
...
cdef char a[n] = ""
^
------------------------------------------------------------
Syntax error in C variable declaration
Obviously Cython doesn't allow to declare arrays this way, but is there an alternative? I don't want to manually set each item in the array, that is I'm not looking for something like this
cdef char a[10]
for i in range(0, 10, 1):
    a[i] = b"\0"
You don't have to set each element to make a length-zero C string. It is sufficient to just zero the first element:
cdef char arr[n]
arr[0] = 0
Next, if you want to zero the whole char array, use memset
from libc.string cimport memset
cdef char arr[n]
memset(arr, 0, n)
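The difference between the two approaches (an empty C string versus a fully zeroed buffer) can also be tried from plain Python with the standard ctypes module. This is only an illustration with a different tool, not Cython code, and the buffer size of 8 is an arbitrary choice for the sketch.

```python
import ctypes

n = 8  # arbitrary size chosen for the illustration
buf = ctypes.create_string_buffer(n)   # like char buf[n], zero-filled on creation
all_zero_at_start = buf.raw == b"\x00" * n

buf.value = b"abc"                     # put some data in the buffer
buf[0] = b"\x00"                       # like arr[0] = 0: now an empty C string
empty_string = buf.value == b""
leftover = buf.raw[1:3]                # the old bytes after index 0 are untouched

ctypes.memset(buf, 0, n)               # like memset(arr, 0, n): clears every byte
fully_cleared = buf.raw == b"\x00" * n
```

Zeroing only the first element is enough when the buffer is read as a NUL-terminated string, while memset is what you need if stale bytes must not survive.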
And if C purists complain about the 0 instead of '\0', note that the '\0' is a Python string (unicode in Python 3) in Cython. '\0' is not a C char in Cython! memset expects an integer value for its second argument, not a Python string.
If you really want to know the int value of a C '\0' in Cython, you must write a helper function in C:
/* zerochar.h */
static int zerochar()
{
    return '\0';
}
And now:
cdef extern from "zerochar.h":
    int zerochar()
cdef char arr[n]
arr[0] = zerochar()
or
cdef extern from "zerochar.h":
    int zerochar()
from libc.string cimport memset
cdef char arr[n]
memset(arr, zerochar(), n)
In C '' is used for a char, and "" for a string. But any 'empty char' does not really make sense, probably what you want is '\0' or just 0
Maybe:
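import cython
from libc.stdlib cimport malloc, free

cdef char * test():
    cdef int i, n = 10
    cdef char *arr = <char *>malloc(n * sizeof(char))
    if arr == NULL:
        return NULL
    # use a separate loop variable so that n is not overwritten
    for i in range(n):
        arr[i] = 0
    return arr  # the caller is responsible for calling free(arr)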
Edit: calloc does that for you:
void *calloc(size_t count, size_t size);
How about:
cdef char *arr = ['\0']*n

Returning Cython array

How does one properly initialize and return a Cython array? For instance:
cdef public double* cyTest(double[] input):
    cdef double output[3]
    for i in xrange(3):
        output[i] = input[i]**2
        print 'loop: ' + str(output[i])
    return output

cdef double* test = [1,2,3]
cdef double* results = cyTest(test)
for i in xrange(3):
    print 'return: ' + str(results[i])
This returns:
loop: 1.0->1.0
loop: 2.0->4.0
loop: 3.0->9.0
return: 1.88706086937e-299
return: 9.7051011575e+236
return: 1.88706086795e-299
So obviously, results still points only to garbage instead of the values it should point to. Admittedly, I am slightly confused with mixing the pointer and array syntax and which one is preferable/possible in a Cython context.
In the end, I want to call cyTest from a pure C++ function:
#include <iostream>
#include <Python.h>
#include "cyTest.h"

int main() {
    Py_Initialize();
    initcyTest();
    double input[3] = {1,2,3};
    double* output = cyTest(input);
    for(int i = 0; i < 3; i++)
        std::cout << "cout: " << output[i] << std::endl;
    Py_Finalize();
    return 0;
}
This returns similar results:
loop: 1.0->1.0
loop: 2.0->4.0
loop: 3.0->9.0
cout: 1
cout: 6.30058e+077
cout: 6.39301e-308
Anyone care to explain what error I'm making? I'd like to keep it as simple as possible. It's just returning an array from Cython to C++ after all. I'll deal with dynamic memory allocation later, if not necessary.
You are returning a pointer to a local array (output). Its storage is released as soon as the function returns, so the caller ends up reading garbage.
Try changing your script to:
from cpython.mem cimport PyMem_Malloc

cdef public double * cyTest(double[] input):
    cdef double * output = < double * >PyMem_Malloc( sizeof(double) * 3 )
    for i in xrange(3):
        output[i] = input[i]**2
        print 'loop: ' + str(output[i])
    return output
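And in your C++ code, once you are done using double* output, release it with PyMem_Free( output ); before calling Py_Finalize. Memory obtained from PyMem_Malloc has to be returned with PyMem_Free, not with plain free().
If you want to use cdef double* results = cyTest(test) in your pyx script, then likewise don't forget to call PyMem_Free(results) (cimported from cpython.mem).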

Extending Numpy with C function

I am trying to speed up my Numpy code and decided that I wanted to implement one particular function where my code spent most of the time in C.
I'm actually a rookie in C, but I managed to write the function which normalizes every row in a matrix to sum to 1. I can compile it and I tested it with some data (in C) and it does what I want. At that point I was very proud of myself.
Now I'm trying to call my glorious function from Python where it should accept a 2d-Numpy array.
The various things I've tried are
SWIG
SWIG + numpy.i
ctypes
My function has the prototype
void normalize_logspace_matrix(size_t nrow, size_t ncol, double mat[nrow][ncol]);
So it takes a pointer to a variable-length array and modifies it in place.
I tried the following pure SWIG interface file:
%module c_utils
%{
extern void normalize_logspace_matrix(size_t, size_t, double mat[*][*]);
%}
extern void normalize_logspace_matrix(size_t, size_t, double** mat);
Then I would do (on Mac OS X 64bit):
> swig -python c-utils.i
> gcc -fPIC c-utils_wrap.c -o c-utils_wrap.o \
-I/Library/Frameworks/Python.framework/Versions/6.2/include/python2.6/ \
-L/Library/Frameworks/Python.framework/Versions/6.2/lib/python2.6/ -c
c-utils_wrap.c: In function ‘_wrap_normalize_logspace_matrix’:
c-utils_wrap.c:2867: warning: passing argument 3 of ‘normalize_logspace_matrix’ from incompatible pointer type
> g++ -dynamiclib c-utils.o -o _c_utils.so
In Python I then get the following error on importing my module:
>>> import c_utils
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ImportError: dynamic module does not define init function (initc_utils)
Next I tried this approach using SWIG + numpy.i:
%module c_utils
%{
#define SWIG_FILE_WITH_INIT
#include "c-utils.h"
%}
%include "numpy.i"
%init %{
import_array();
%}
%apply ( int DIM1, int DIM2, DATA_TYPE* INPLACE_ARRAY2 )
{(size_t nrow, size_t ncol, double* mat)};
%include "c-utils.h"
However, I don't get any further than this:
> swig -python c-utils.i
c-utils.i:13: Warning 453: Can't apply (int DIM1,int DIM2,DATA_TYPE *INPLACE_ARRAY2). No typemaps are defined.
SWIG doesn't seem to find the typemaps defined in numpy.i, but I don't understand why, because numpy.i is in the same directory and SWIG doesn't complain that it can't find it.
With ctypes I didn't get very far, but got lost in the docs pretty quickly since I couldn't figure out how to pass it a 2d-array and then get the result back.
So could somebody show me the magic trick how to make my function available in Python/Numpy?
Unless you have a really good reason not to, you should use cython to interface C and python. (We are starting to use cython instead of raw C inside numpy/scipy themselves).
You can see a simple example in my scikits talkbox (since cython has improved quite a bit since then, I think you could write it better today).
def cslfilter(c_np.ndarray b, c_np.ndarray a, c_np.ndarray x):
    """Fast version of slfilter for a set of frames and filter coefficients.

    More precisely, given rank 2 arrays for coefficients and input, this
    computes:

    for i in range(x.shape[0]):
        y[i] = lfilter(b[i], a[i], x[i])

    This is mostly useful for processing on a set of windows with variable
    filters, e.g. to compute LPC residual from a signal chopped into a set of
    windows.

    Parameters
    ----------
    b: array
        recursive coefficients
    a: array
        non-recursive coefficients
    x: array
        signal to filter

    Note
    ----
    This is a specialized function, and does not handle other types than
    double, nor initial conditions."""
    cdef int na, nb, nfr, i, nx
    cdef double *raw_x, *raw_a, *raw_b, *raw_y
    cdef c_np.ndarray[double, ndim=2] tb
    cdef c_np.ndarray[double, ndim=2] ta
    cdef c_np.ndarray[double, ndim=2] tx
    cdef c_np.ndarray[double, ndim=2] ty

    dt = np.common_type(a, b, x)
    if not dt == np.float64:
        raise ValueError("Only float64 supported for now")
    if not x.ndim == 2:
        raise ValueError("Only input of rank 2 support")
    if not b.ndim == 2:
        raise ValueError("Only b of rank 2 support")
    if not a.ndim == 2:
        raise ValueError("Only a of rank 2 support")

    nfr = a.shape[0]
    if not nfr == b.shape[0]:
        raise ValueError("Number of filters should be the same")
    if not nfr == x.shape[0]:
        raise ValueError(
            "Number of filters and number of frames should be the same")

    tx = np.ascontiguousarray(x, dtype=dt)
    ty = np.ones((x.shape[0], x.shape[1]), dt)
    na = a.shape[1]
    nb = b.shape[1]
    nx = x.shape[1]
    ta = np.ascontiguousarray(np.copy(a), dtype=dt)
    tb = np.ascontiguousarray(np.copy(b), dtype=dt)

    raw_x = <double*>tx.data
    raw_b = <double*>tb.data
    raw_a = <double*>ta.data
    raw_y = <double*>ty.data

    # Process one frame per iteration, then advance every pointer by one row.
    for i in range(nfr):
        filter_double(raw_b, nb, raw_a, na, raw_x, nx, raw_y)
        raw_b += nb
        raw_a += na
        raw_x += nx
        raw_y += nx
    return ty
As you can see, besides the usual argument checking you would do in python, it is almost the same thing (filter_double is a function which can be written in pure C in a separate library if you want to). Of course, since it is compiled code, failing to check your arguments will crash your interpreter instead of raising an exception (there are several levels of safety vs speed tradeoffs available with recent cython, though).
To answer the real question: SWIG doesn't tell you it can't find any typemaps. It tells you it can't apply the typemap (int DIM1,int DIM2,DATA_TYPE *INPLACE_ARRAY2), which is because there is no typemap defined for DATA_TYPE *. You need to tell it you want to apply it to a double*:
%apply ( int DIM1, int DIM2, double* INPLACE_ARRAY2 )
{(size_t nrow, size_t ncol, double* mat)};
First, are you sure that you were writing the fastest possible numpy code? If by normalise you mean divide each row by its sum, then you can write fast vectorised code which looks something like this (note that the sum has to run along axis 1 to give one total per row):
matrix /= matrix.sum(axis=1, keepdims=True)
If this is not what you had in mind and you are still sure that you need a fast C extension, I would strongly recommend you write it in cython instead of C. This will save you all the overhead and difficulties in wrapping code, and allow you to write something which looks like python code but which can be made to run as fast as C in most circumstances.
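As a quick check of the vectorised approach, here is a small sketch that assumes "normalise" means every row should sum to 1. The matrix values are invented for the example. Summing over axis=1 gives one total per row, and keepdims=True keeps that result as a column so it broadcasts across the row when dividing. If the data are log-probabilities, as the name normalize_logspace_matrix might suggest (this is a guess), a log-sum-exp step would be needed instead.

```python
import numpy as np

# Made-up data; the array must have a float dtype for in-place division.
matrix = np.array([[1.0, 1.0, 2.0],
                   [2.0, 6.0, 8.0]])

# One sum per row; keepdims=True keeps shape (2, 1) so it broadcasts over columns.
row_sums = matrix.sum(axis=1, keepdims=True)
matrix /= row_sums  # in-place division, no Python-level loop

print(matrix)
print(matrix.sum(axis=1))  # each entry should now be (numerically) 1
```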
I agree with others that a little Cython is well worth learning.
But if you must write C or C++, use a 1d array which overlays the 2d, like this:
// sum1rows.cpp: 2d A as 1d A1
// Unfortunately
//     void f( int m, int n, double a[m][n] ) { ... }
// is valid C but not C++.
// See also
//     http://stackoverflow.com/questions/3959457/high-performance-c-multi-dimensional-arrays
//     http://stackoverflow.com/questions/tagged/multidimensional-array c++
#include <stdio.h>

void sum1( int n, double x[] )  // x /= sum(x)
{
    double sum = 0;  // double, to match the element type
    for( int j = 0; j < n; j ++ )
        sum += x[j];
    for( int j = 0; j < n; j ++ )
        x[j] /= sum;
}

void sum1rows( int nrow, int ncol, double A1[] )  // 1d A1 == 2d A[nrow][ncol]
{
    for( int j = 0; j < nrow*ncol; j += ncol )
        sum1( ncol, &A1[j] );
}

int main( int argc, char** argv )
{
    const int nrow = 100, ncol = 10;  // constants, so A is a plain C++ array
    double A[nrow][ncol];
    for( int j = 0; j < nrow; j ++ )
        for( int k = 0; k < ncol; k ++ )
            A[j][k] = (j+1) * k;

    double* A1 = &A[0][0];  // A as 1d array -- bad practice
    sum1rows( nrow, ncol, A1 );

    for( int j = 0; j < 2; j ++ ){
        for( int k = 0; k < ncol; k ++ ){
            printf( "%.2g ", A[j][k] );
        }
        printf( "\n" );
    }
}
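The index arithmetic behind this overlay is easy to check in a few lines of plain Python. The sketch below reuses the names sum1 and sum1rows from the C++ code, but the signatures are adapted for the illustration (a start offset replaces the pointer &A1[j]) and the data are invented. Element (i, j) of a row-major nrow x ncol matrix lives at flat index i * ncol + j.

```python
def sum1(flat, start, n):
    """Divide the n elements beginning at index start by their sum."""
    total = sum(flat[start:start + n])
    for k in range(start, start + n):
        flat[k] /= total


def sum1rows(nrow, ncol, flat):
    """Normalize each row of a row-major nrow x ncol matrix stored flat."""
    for start in range(0, nrow * ncol, ncol):
        sum1(flat, start, ncol)


nrow, ncol = 2, 4
flat = [1.0, 1.0, 1.0, 1.0,   # row 0
        1.0, 2.0, 3.0, 2.0]   # row 1
sum1rows(nrow, ncol, flat)

i, j = 1, 2
element = flat[i * ncol + j]  # what A[i][j] would be in the 2d view
```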
Added 8 Nov: as you probably know, numpy.reshape can overlay a numpy 2d array with a 1d view to pass to sum1rows, like this:
import numpy as np
A = np.arange(10).reshape((2,5))
A1 = A.reshape(A.size) # a 1d view of A, not a copy
# sum1rows( 2, 5, A1 )
A[1,1] += 10
print("A:", A)
print("A1:", A1)
SciPy has an extension tutorial with example code for arrays.
http://docs.scipy.org/doc/numpy/user/c-info.how-to-extend.html
