I have some code written in Python whose output is a numpy array, and now I want to send that output to C++ code, where the heavy part of the calculations will be performed.
I have tried using Cython's cdef public, but I am running into some issues. I would appreciate your help! Here is my code:
pymodule.pyx:
from pythonmodule import result # result is my numpy array
import numpy as np
cimport numpy as np
cimport cython
#cython.boundscheck(False)
#cython.wraparound(False)
cdef public void cfunc():
    print 'I am in here!!!'
    cdef np.ndarray[np.float64_t, ndim=2, mode='c'] res = result
    print res
Once this is cythonized, I call:
pymain.c:
#include <Python.h>
#include <numpy/arrayobject.h>
#include "pymodule.h"
int main() {
    Py_Initialize();
    initpymodule();
    test(2);
    Py_Finalize();
}
int test(int a)
{
    Py_Initialize();
    initpymodule();
    cfunc();
    return 0;
}
I am getting a NameError for the result variable in C++. I have tried defining it with pointers and calling it indirectly from other functions, but the array remains invisible. I am pretty sure the answer is quite simple, but I just do not get it. Thanks for your help!
Short Answer
The NameError was caused by the fact that Python couldn't find the module: the working directory isn't automatically added to your PYTHONPATH. Calling setenv("PYTHONPATH", ".", 1); in your C/C++ code before Py_Initialize() fixes this.
Longer Answer
There's an easy way to do this, apparently. Given a Python module pythonmodule.py containing an already-created array:
import numpy as np
result = np.arange(20, dtype=np.float64).reshape((2, 10))
You can structure your pymodule.pyx to export that array by using the public keyword. By adding some auxiliary functions, you generally won't need to touch either the Python or the NumPy C-API:
from pythonmodule import result
from libc.stdlib cimport malloc
import numpy as np
cimport numpy as np
cdef public np.ndarray getNPArray():
    """ Return array from pythonmodule. """
    return <np.ndarray>result
cdef public int getShape(np.ndarray arr, int shape):
    """ Return Shape of the Array based on shape par value. """
    return <int>arr.shape[1] if shape else <int>arr.shape[0]
cdef public void copyData(float *** dst, np.ndarray src):
    """ Copy data from src numpy array to dst. """
    cdef float **tmp
    cdef int i, j, m = src.shape[0], n = src.shape[1]
    # Allocate initial pointer
    tmp = <float **>malloc(m * sizeof(float *))
    if not tmp:
        raise MemoryError()
    # Allocate rows
    for j in range(m):
        tmp[j] = <float *>malloc(n * sizeof(float))
        if not tmp[j]:
            raise MemoryError()
    # Copy numpy Array
    for i in range(m):
        for j in range(n):
            tmp[i][j] = src[i, j]
    # Assign pointer to dst
    dst[0] = tmp
Functions getNPArray and getShape return the array and its shape, respectively. copyData copies the array's contents into newly malloc'd rows of floats, so you can then finalize Python and keep working without the interpreter initialized.
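For readers coming from Python, the float ** that copyData builds can be mimicked with ctypes to see its shape: one pointer per row, each row a separate block of C floats (unlike NumPy's single contiguous buffer). This is only a sketch of the layout, not a replacement for copyData; the helper to_row_pointers is hypothetical, and the values are those of the example array above.

```python
import ctypes


def to_row_pointers(rows):
    """Copy a 2d nested list into a C-style float ** (hypothetical helper).

    Mirrors the layout copyData builds: m separately allocated rows of n C
    floats, plus an array of m row pointers. The row buffers are returned
    too, because ctypes frees them once nothing in Python references them.
    """
    m, n = len(rows), len(rows[0])
    row_buffers = [(ctypes.c_float * n)(*row) for row in rows]
    FloatPtr = ctypes.POINTER(ctypes.c_float)
    row_ptrs = (FloatPtr * m)(*[ctypes.cast(buf, FloatPtr) for buf in row_buffers])
    return row_ptrs, row_buffers


# Same values as result = np.arange(20, ...).reshape((2, 10)) above.
data = [[float(10 * i + j) for j in range(10)] for i in range(2)]
ptrs, keepalive = to_row_pointers(data)

# ptrs[i][j] follows the row pointer, like arr[i][j] in printArray.
print(ptrs[1][3])  # 13.0
```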
A sample program (in C, C++ should look identical) would look like this:
#include <Python.h>
#include "numpy/arrayobject.h"
#include "pyxmod.h"
#include <stdio.h>
#include <stdlib.h> // setenv (POSIX; on Windows use _putenv_s)
void printArray(float **arr, int m, int n);
void getArray(float ***arr, int * m, int * n);
int main(int argc, char **argv){
    // Holds data and shapes.
    float **data = NULL;
    int m, n;
    // Gets array and then prints it.
    getArray(&data, &m, &n);
    if (data == NULL)
        return 1;
    printArray(data, m, n);
    return 0;
}
void getArray(float ***data, int * m, int * n){
    // setenv is important, makes python find
    // modules in working directory
    setenv("PYTHONPATH", ".", 1);
    // Initialize interpreter and module
    // (initpyxmod is the Python 2 name; with Python 3, register
    // PyInit_pyxmod via PyImport_AppendInittab before Py_Initialize)
    Py_Initialize();
    initpyxmod();
    // Use Cython functions.
    PyArrayObject *arr = getNPArray();
    *m = getShape(arr, 0);
    *n = getShape(arr, 1);
    copyData(data, arr);
    if (*data == NULL) // copyData could not allocate the rows
        fprintf(stderr, "Data is NULL\n");
    Py_DECREF(arr);
    Py_Finalize();
}
void printArray(float **arr, int m, int n){
    int i, j;
    for(i=0; i < m; i++){
        for(j=0; j < n; j++)
            printf("%f ", arr[i][j]);
        printf("\n");
    }
}
Always remember to set:
setenv("PYTHONPATH", ".", 1);
before you call Py_Initialize so Python can find modules in the working directory.
The rest is pretty straightforward. It might need some additional error checking, and it definitely needs a function to free the allocated memory.
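PYTHONPATH is read once, when an interpreter starts, which is why the setenv call has to come before Py_Initialize. The sketch below illustrates that from the Python side by launching a fresh interpreter with PYTHONPATH set (a temporary directory stands in for the working directory, an assumption for the demo) and checking that the entry shows up in its sys.path; it does not exercise the embedding API itself.

```python
import os
import subprocess
import sys
import tempfile

# A scratch directory stands in for "the working directory" (hypothetical).
with tempfile.TemporaryDirectory() as module_dir:
    env = dict(os.environ)
    env["PYTHONPATH"] = module_dir  # set *before* the interpreter starts

    # Start a fresh interpreter and ask for its module search path.
    completed = subprocess.run(
        [sys.executable, "-c", "import sys; print('\\n'.join(sys.path))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    out = completed.stdout.splitlines()

# PYTHONPATH entries are copied into sys.path at startup.
print(module_dir in out)
```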
Alternative Way Without Cython:
Doing it the way you are attempting is way more hassle than it's worth. You would probably be better off using numpy.save to save your array in a .npy binary file and then using a C++ library that reads that file for you.
Related
Currently I'm learning about ctypes. My goal is to generate a numpy array A in Python from 0 to 4*pi in 500 steps. That array is passed to C code which calculates the tangent of those values. The C code also passes those values back to a numpy array B in Python.
Yesterday I tried simply to convert one value from Python to C and (after some help) succeeded. Today I am trying to pass a whole array, not a single value.
I think it's a good idea to add another function to the C library to process the array. The new function should, in a loop, pass each value of A to the function tan1() and store that value in array B.
I have two issues:
writing the function that processes the numpy array A
Passing the numpy array between python and C code.
I read the following info:
https://nenadmarkus.com/p/numpy-to-native/
How to use NumPy array with ctypes?
Helpful, but I still don't know how to solve my problem.
C code (Only the piece that seems relevant):
double tan1(f) double f;
{
    return sin1(f)/cos1(f);
}
void loop(double A, int n);
{
    double *B;
    B = (double*) malloc(n * sizeof(double));
    for(i=0; i<= n, i++)
    {
        B[i] = tan1(A[i])
    }
}
Python code:
import numpy as np
import ctypes
A = np.array(np.linspace(0,4*np.pi,500), dtype=np.float64)
testlib = ctypes.CDLL('./testlib.so')
testlib.loop.argtypes = ctypes.c_double,
testlib.loop.restype = ctypes.c_double
#print(testlib.tan1(3))
I'm aware that ctypes.c_double is wrong in this context, but that is what I had in the one-value version, and I don't know yet what to substitute.
Could I please get some feedback on how to achieve this goal?
You need to return the dynamically allocated memory, e.g. change your C code to something like:
#include <math.h>
#include <stdlib.h>
#include <stdio.h>
double tan1(double f) {
    return sin(f)/cos(f);
}
double *loop(double *arr, int n) {
    double *b = malloc(n * sizeof(double));
    if (b == NULL)
        return NULL;
    for(int i = 0; i < n; i++) {
        b[i] = tan1(arr[i]);
    }
    return b;
}
void freeArray(double *b) {
    free(b);
}
On the Python side you have to declare the parameter and return types. As mentioned by others in the comments, you should also free the dynamically allocated memory. Note that on the C side, an array passed to a function always decays into a pointer, so you need an additional parameter that tells the function how many elements the array has.
Also, if you return a pointer to double to the Python side, you must specify the size of the array. With np.frombuffer you can work with the data without making a copy of it.
import ctypes
import numpy as np
from ctypes import POINTER, c_double, c_int
testlib = ctypes.CDLL('./testlib.so')
n = 500
dtype = np.float64
input_array = np.array(np.linspace(0, 4 * np.pi, n), dtype=dtype)
input_ptr = input_array.ctypes.data_as(POINTER(c_double))
testlib.loop.argtypes = (POINTER(c_double), c_int)
testlib.loop.restype = POINTER(c_double * n)
testlib.freeArray.argtypes = POINTER(c_double * n),
result_ptr = testlib.loop(input_ptr, n)
result_array = np.frombuffer(result_ptr.contents)
# ...do some processing
for value in result_array:
    print(value)
# free buffer (result_array must not be used after this point)
testlib.freeArray(result_ptr)
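The reason for restype = POINTER(c_double * n) is that a bare double * carries no length, so neither ctypes nor np.frombuffer would know how many elements to expose. A stdlib-only sketch of the same cast, with a ctypes-allocated buffer standing in for the memory that loop() would return (an assumption so the example runs without testlib.so):

```python
import ctypes

n = 5
# Stand-in for the malloc'd block loop() returns; here ctypes owns the memory.
backing = (ctypes.c_double * n)(*[0.5 * i for i in range(n)])
raw_ptr = ctypes.cast(backing, ctypes.POINTER(ctypes.c_double))  # a bare double *

# raw_ptr has no length: raw_ptr[100] would read past the end without any error.
# Casting to "pointer to array of n doubles" attaches the size.
sized = ctypes.cast(raw_ptr, ctypes.POINTER(ctypes.c_double * n)).contents

sized[2] = 42.0
print(len(sized), backing[2])  # 5 42.0 (same memory, nothing was copied)
```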
Is there a way to use AES-NI instructions within Cython code?
Closest I could find is how someone accessed SIMD instructions:
https://groups.google.com/forum/#!msg/cython-users/nTnyI7A6sMc/a6_GnOOsLuQJ
AES-NI in Python thread was not answered:
Python support for AES-NI
You should be able to just declare the intrinsics as if they were normal C functions in Cython. Something like:
cdef extern from "emmintrin.h": # I'm going off the Microsoft documentation for where the headers are
    # define the datatype as an opaque type
    ctypedef struct __m128i:
        pass
    __m128i _mm_set_epi32 (int i3, int i2, int i1, int i0)
cdef extern from "wmmintrin.h": # with GCC/Clang the AES intrinsics need the -maes compiler flag
    __m128i _mm_aesdec_si128(__m128i v,__m128i rkey)
# then in some Cython function
def f():
    cdef __m128i v = _mm_set_epi32(1,2,3,4)
    cdef __m128i key = _mm_set_epi32(5,6,7,8)
    cdef __m128i result = _mm_aesdec_si128(v,key)
As for the question "how do I apply this over a bytes array?": first, get a char* for the bytes array, then iterate over it with range (being careful not to run off the end).
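# assuming you already have an __m128i key
# (_mm_set_epi8 also has to be declared in the "emmintrin.h" block, and
# _mm_extract_epi8, which is SSE4.1, in a "smmintrin.h" block)
cdef __m128i v
cdef __m128i result  # cdef declarations are not allowed inside the loop
cdef char* array = python_bytes_array # auto conversion; use a bytearray if you write back, bytes are immutable
cdef int i, j
# you NEED to ensure that the byte array has a length divisible by
# 16, otherwise you'll probably get a segmentation fault.
for i in range(0,len(python_bytes_array),16):
    # go over in chunks of 16
    v = _mm_set_epi8(array[i+15],array[i+14],array[i+13],
                     # etc... fill in the rest
                     array[i+1], array[i])
    result = _mm_aesdec_si128(v,key)
    # write back to the same place?
    # note: _mm_extract_epi8 needs a compile-time constant index, so this
    # inner loop may have to be unrolled (or use _mm_storeu_si128 instead)
    for j in range(16):
        array[i+j] = _mm_extract_epi8(result,j)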
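The bookkeeping part of that loop (16-byte chunks, no running off the end, writing back in place) can be prototyped in plain Python before moving to Cython. In this sketch iter_blocks is a hypothetical helper, and the XOR on the first byte is only a placeholder for the AES step, not encryption.

```python
def iter_blocks(buf, block_size=16):
    """Yield consecutive block_size-byte windows of buf without copying.

    Raises ValueError before yielding anything when the length is not a
    multiple of block_size, instead of reading past the end.
    """
    view = memoryview(buf)
    if len(view) % block_size:
        raise ValueError("length %d is not a multiple of %d" % (len(view), block_size))
    for start in range(0, len(view), block_size):
        yield view[start:start + block_size]


# A bytearray (not bytes) so that each block can be overwritten in place.
data = bytearray(range(32))
for block in iter_blocks(data):
    # Placeholder transformation standing in for the AES round; NOT encryption.
    block[0] ^= 0xFF
print(data[0], data[16])  # 255 239
```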
Can anyone enlighten me on how to pass a 2D array created in Cython to a cdef function? I can do that with a 1D array, but not with 2D (or higher). Let me illustrate the situation:
This is the C code that I would like to reproduce in cython:
#include <stdio.h>
void print_my_1Darray(int x[]);
void print_my_2Darray(int x[3][4]);
int main(void){
    int arr1D[] = {1,2,3,4,5,6,7,8,9,10,11,12};
    int arr2D[3][4] = {{1,2,3,4},{5,6,7,8},{9,10,11,12}};
    print_my_1Darray(arr1D);
    printf("\n");
    print_my_2Darray(arr2D);
    printf("\n");
    return 0;
}
void print_my_1Darray( int x[] ){
    int i;
    for(i=0; i < 12; i++){
        printf("c[%d] = %d\n",i, x[i]);
    }
}
void print_my_2Darray( int x[3][4] ){
    int i, j;
    for(i=0; i < 3; i++){
        for(j=0; j < 4; j++){
            printf("c[%d][%d] = %d\n",i, j, x[i][j]);
        }
    }
}
And then if I try to reproduce this in Cython like this:
cimport cython
import numpy as np
cimport numpy as cnp
def testfunc():
    cdef int *arr1D = [1,2,3,4,5,6,7,8,9,10,11,12]
    print_my_1D_array(arr1D)
    cdef int *arr2D = [[1,2,3,4], [5,6,7,8], [9,10,11,12]] # <-- WRONG!
    print_my_2D_array(arr2D)
cdef void print_my_1D_array(int c_arr[12]):
    cdef int i
    for i in range(12):
        print c_arr[i]
cdef void print_my_2D_array(int c_arr[3][4]):
    cdef int i, j
    for i in range(3):
        for j in range(4):
            print c_arr[i][j]
and when I compile this pyx script I get the error:
cdef int *arr2D = [[1,2,3,4][5,6,7,8][9,10,11,12]]
print_my_2D_array(arr2D)
^
------------------------------------------------------------
test2.pyx:18:27: Cannot assign type 'int *' to 'int (*)[4]'
It seems that I can create something with the line "cdef int *arr2D = [[1,2,3,4][5,6,7,8][9,10,11,12]]", and it compiles OK until I try to pass it to a function or simply print its members...
Can anyone explain what's happening there, how to create pure-C 2D/3D arrays in Cython, and how to pass them to C-level functions? Also, I am trying to avoid numpy arrays here to avoid Python overhead, as my code will require very fast calculations on arrays.
You need to declare your input array with a fixed (static) size, as int[3][4], instead of as an int *:
cdef int arr2D[3][4]
arr2D[0][:] = [1, 2, 3, 4]
arr2D[1][:] = [5, 6, 7, 8]
arr2D[2][:] = [9, 10, 11, 12]
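The error message makes sense once you look at the memory layout: int[3][4] is a single block of 12 ints, and passing it to a function decays it to int (*)[4], a pointer to rows of 4 ints, which is a different type from int *. ctypes mirrors C's types closely, so a short Python sketch can show both views of the same block (illustration only; it does not involve Cython):

```python
import ctypes

Int4 = ctypes.c_int * 4   # int[4]
Int3x4 = Int4 * 3         # int[3][4]

# Nested tuples initialise the rows, like {{1,2,3,4},{5,6,7,8},{9,10,11,12}} in C.
arr2D = Int3x4((1, 2, 3, 4), (5, 6, 7, 8), (9, 10, 11, 12))

# One contiguous block of 12 ints: no row pointers are stored anywhere.
print(ctypes.sizeof(arr2D) == 12 * ctypes.sizeof(ctypes.c_int))  # True

# What a parameter declared as int c_arr[3][4] really receives: int (*)[4].
row_ptr = ctypes.cast(arr2D, ctypes.POINTER(Int4))
print(list(row_ptr[2]))  # [9, 10, 11, 12]

# An int * sees the same memory as 12 flat ints, so rows must be indexed by hand.
flat = ctypes.cast(arr2D, ctypes.POINTER(ctypes.c_int))
print(flat[1 * 4 + 2], arr2D[1][2])  # 7 7
```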
How does one properly initialize and return a Cython array? For instance:
cdef public double* cyTest(double[] input):
    cdef double output[3]
    for i in xrange(3):
        output[i] = input[i]**2
        print 'loop: ' + str(output[i])
    return output
cdef double* test = [1,2,3]
cdef double* results = cyTest(test)
for i in xrange(3):
    print 'return: ' + str(results[i])
This returns:
loop: 1.0->1.0
loop: 2.0->4.0
loop: 3.0->9.0
return: 1.88706086937e-299
return: 9.7051011575e+236
return: 1.88706086795e-299
So obviously, results still points only to garbage instead of the values it should point to. Admittedly, I am slightly confused by mixing pointer and array syntax, and by which one is preferable/possible in a Cython context.
In the end, I want to call cyTest from a pure C++ function:
#include <Python.h>
#include <iostream>
#include "cyTest.h"
int main() {
    Py_Initialize();
    initcyTest();
    double input[3] = {1,2,3};
    double* output = cyTest(input);
    for(int i = 0; i < 3; i++)
        std::cout << "cout: " << output[i] << std::endl;
    Py_Finalize();
}
This returns similar results:
loop: 1.0->1.0
loop: 2.0->4.0
loop: 3.0->9.0
cout: 1
cout: 6.30058e+077
cout: 6.39301e-308
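Can anyone explain what mistake I'm making? I'd like to keep it as simple as possible; it's just returning an array from Cython to C++, after all. I'll deal with dynamic memory allocation later, unless it turns out to be necessary.
You are returning a pointer to a local array (output), which will not work: output lives on cyTest's stack and becomes invalid as soon as the function returns, so the caller reads whatever later overwrites that memory.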
Try changing your script to:
from cpython.mem cimport PyMem_Malloc
cdef public double * cyTest(double[] input):
    cdef double * output = < double * >PyMem_Malloc( sizeof(double) * 3 )
    for i in xrange(3):
        output[i] = input[i]**2
        print 'loop: ' + str(output[i])
    return output
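And in your C++ code, after you are done using double* output, call PyMem_Free(output); before Py_Finalize(). Memory obtained with PyMem_Malloc must be released with PyMem_Free, not with free.
If you want to use cdef double* results = cyTest(test) in your pyx script, don't forget to call PyMem_Free(results) there as well.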
I am trying to speed up my Numpy code and decided that I wanted to implement one particular function where my code spent most of the time in C.
I'm actually a rookie in C, but I managed to write the function which normalizes every row in a matrix to sum to 1. I can compile it and I tested it with some data (in C) and it does what I want. At that point I was very proud of myself.
Now I'm trying to call my glorious function from Python where it should accept a 2d-Numpy array.
The various things I've tried are
SWIG
SWIG + numpy.i
ctypes
My function has the prototype
void normalize_logspace_matrix(size_t nrow, size_t ncol, double mat[nrow][ncol]);
So it takes a pointer to a variable-length array and modifies it in place.
I tried the following pure SWIG interface file:
%module c_utils
%{
extern void normalize_logspace_matrix(size_t, size_t, double mat[*][*]);
%}
extern void normalize_logspace_matrix(size_t, size_t, double** mat);
Then I would do (on Mac OS X 64bit):
> swig -python c-utils.i
> gcc -fPIC c-utils_wrap.c -o c-utils_wrap.o \
-I/Library/Frameworks/Python.framework/Versions/6.2/include/python2.6/ \
-L/Library/Frameworks/Python.framework/Versions/6.2/lib/python2.6/ -c
c-utils_wrap.c: In function ‘_wrap_normalize_logspace_matrix’:
c-utils_wrap.c:2867: warning: passing argument 3 of ‘normalize_logspace_matrix’ from incompatible pointer type
> g++ -dynamiclib c-utils.o -o _c_utils.so
In Python I then get the following error on importing my module:
>>> import c_utils
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ImportError: dynamic module does not define init function (initc_utils)
Next I tried this approach using SWIG + numpy.i:
%module c_utils
%{
#define SWIG_FILE_WITH_INIT
#include "c-utils.h"
%}
%include "numpy.i"
%init %{
import_array();
%}
%apply ( int DIM1, int DIM2, DATA_TYPE* INPLACE_ARRAY2 )
{(size_t nrow, size_t ncol, double* mat)};
%include "c-utils.h"
However, I don't get any further than this:
> swig -python c-utils.i
c-utils.i:13: Warning 453: Can't apply (int DIM1,int DIM2,DATA_TYPE *INPLACE_ARRAY2). No typemaps are defined.
SWIG doesn't seem to find the typemaps defined in numpy.i, but I don't understand why, because numpy.i is in the same directory and SWIG doesn't complain that it can't find it.
With ctypes I didn't get very far; I got lost in the docs pretty quickly, since I couldn't figure out how to pass it a 2D array and then get the result back.
So could somebody show me the magic trick for making my function available in Python/NumPy?
Unless you have a really good reason not to, you should use cython to interface C and python. (We are starting to use cython instead of raw C inside numpy/scipy themselves).
You can see a simple example in my scikits talkbox (since cython has improved quite a bit since then, I think you could write it better today).
def cslfilter(c_np.ndarray b, c_np.ndarray a, c_np.ndarray x):
    """Fast version of slfilter for a set of frames and filter coefficients.
    More precisely, given rank 2 arrays for coefficients and input, this
    computes:
        for i in range(x.shape[0]):
            y[i] = lfilter(b[i], a[i], x[i])
    This is mostly useful for processing on a set of windows with variable
    filters, e.g. to compute LPC residual from a signal chopped into a set of
    windows.
    Parameters
    ----------
    b: array
        non-recursive (feed-forward) coefficients
    a: array
        recursive (feedback) coefficients
    x: array
        signal to filter
    Note
    ----
    This is a specialized function, and does not handle other types than
    double, nor initial conditions."""
    cdef int na, nb, nfr, i, nx
    cdef double *raw_x, *raw_a, *raw_b, *raw_y
    cdef c_np.ndarray[double, ndim=2] tb
    cdef c_np.ndarray[double, ndim=2] ta
    cdef c_np.ndarray[double, ndim=2] tx
    cdef c_np.ndarray[double, ndim=2] ty
    dt = np.common_type(a, b, x)
    if not dt == np.float64:
        raise ValueError("Only float64 supported for now")
    if not x.ndim == 2:
        raise ValueError("Only input of rank 2 support")
    if not b.ndim == 2:
        raise ValueError("Only b of rank 2 support")
    if not a.ndim == 2:
        raise ValueError("Only a of rank 2 support")
    nfr = a.shape[0]
    if not nfr == b.shape[0]:
        raise ValueError("Number of filters should be the same")
    if not nfr == x.shape[0]:
        raise ValueError("Number of filters and number of frames should be the same")
    tx = np.ascontiguousarray(x, dtype=dt)
    ty = np.ones((x.shape[0], x.shape[1]), dt)
    na = a.shape[1]
    nb = b.shape[1]
    nx = x.shape[1]
    ta = np.ascontiguousarray(np.copy(a), dtype=dt)
    tb = np.ascontiguousarray(np.copy(b), dtype=dt)
    raw_x = <double*>tx.data
    raw_b = <double*>tb.data
    raw_a = <double*>ta.data
    raw_y = <double*>ty.data
    for i in range(nfr):
        filter_double(raw_b, nb, raw_a, na, raw_x, nx, raw_y)
        raw_b += nb
        raw_a += na
        raw_x += nx
        raw_y += nx
    return ty
As you can see, besides the usual argument checking you would do in Python, it is almost the same as Python code (filter_double is a function which can be written in pure C in a separate library if you want to). Of course, since it is compiled code, failing to check your arguments will crash your interpreter instead of raising an exception (recent versions of Cython offer several levels of safety-vs-speed trade-offs, though).
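Those checks matter because the C side only sees a raw address. A minimal, stdlib-only sketch of the same kind of pre-flight validation using the buffer protocol; check_matrix is a hypothetical helper, and it assumes the C code wants nrow*ncol contiguous native doubles, as filter_double does with its double * arguments:

```python
from array import array


def check_matrix(buf, nrow, ncol):
    """Hypothetical pre-flight check before handing buf's address to C code
    that expects nrow*ncol contiguous, native doubles."""
    m = memoryview(buf)
    if m.format != "d":
        raise TypeError("expected native float64 items ('d'), got %r" % m.format)
    if not m.c_contiguous:
        raise ValueError("expected a C-contiguous buffer")
    if m.nbytes != nrow * ncol * m.itemsize:
        raise ValueError("buffer holds %d bytes, expected %d x %d doubles"
                         % (m.nbytes, nrow, ncol))
    return m


ok = check_matrix(array("d", range(6)), 2, 3)      # passes
strided = memoryview(array("d", range(12)))[::2]   # every other element
try:
    check_matrix(strided, 2, 3)
except ValueError as exc:
    print(exc)  # expected a C-contiguous buffer
```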
To answer the real question: SWIG doesn't tell you it can't find any typemaps. It tells you it can't apply the typemap (int DIM1,int DIM2,DATA_TYPE *INPLACE_ARRAY2), which is because there is no typemap defined for DATA_TYPE *. You need to tell it you want to apply it to a double*:
%apply ( int DIM1, int DIM2, double* INPLACE_ARRAY2 )
{(size_t nrow, size_t ncol, double* mat)};
First, are you sure that you were writing the fastest possible numpy code? If by normalise you mean dividing each row by its sum, then you can write fast vectorised code which looks something like this:
matrix /= matrix.sum(axis=1, keepdims=True)
If this is not what you had in mind and you are still sure that you need a fast C extension, I would strongly recommend you write it in cython instead of C. This will save you all the overhead and difficulties in wrapping code, and allow you to write something which looks like python code but which can be made to run as fast as C in most circumstances.
I agree with others that a little Cython is well worth learning.
But if you must write C or C++, use a 1d array which overlays the 2d, like this:
// sum1rows.cpp: 2d A as 1d A1
// Unfortunately
//     void f( int m, int n, double a[m][n] ) { ... }
// is valid c but not c++ .
// See also
// http://stackoverflow.com/questions/3959457/high-performance-c-multi-dimensional-arrays
// http://stackoverflow.com/questions/tagged/multidimensional-array c++
#include <stdio.h>
void sum1( int n, double x[] )  // x /= sum(x)
{
    double sum = 0;
    for( int j = 0; j < n; j ++ )
        sum += x[j];
    for( int j = 0; j < n; j ++ )
        x[j] /= sum;
}
void sum1rows( int nrow, int ncol, double A1[] )  // 1d A1 == 2d A[nrow][ncol]
{
    for( int j = 0; j < nrow*ncol; j += ncol )
        sum1( ncol, &A1[j] );
}
int main( int argc, char** argv )
{
    const int nrow = 100, ncol = 10;  // const: a variable-length array is not standard c++
    double A[nrow][ncol];
    for( int j = 0; j < nrow; j ++ )
        for( int k = 0; k < ncol; k ++ )
            A[j][k] = (j+1) * k;
    double* A1 = &A[0][0];  // A as 1d array -- bad practice
    sum1rows( nrow, ncol, A1 );
    for( int j = 0; j < 2; j ++ ){
        for( int k = 0; k < ncol; k ++ ){
            printf( "%.2g ", A[j][k] );
        }
        printf( "\n" );
    }
}
Added 8 Nov: as you probably know, numpy.reshape can overlay a numpy 2d array with a 1d view to pass to sum1rows, like this:
import numpy as np
A = np.arange(10).reshape((2,5))
A1 = A.reshape(A.size)  # a 1d view of A, not a copy
# sum1rows( 2, 5, A1 )
A[1,1] += 10
print("A:", A)
print("A1:", A1)
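The same overlay can be built from Python with ctypes, which also bears on the ctypes part of the original question: a 2D ctypes array is one contiguous block, so a flat double view of it is what a C function like sum1rows(nrow, ncol, A1) would receive. In this sketch sum1 and sum1rows are re-implemented in Python only to show the indexing; in practice you would pass the flat pointer to the compiled function (with a NumPy array, A.ctypes.data_as(POINTER(c_double)), as in the ctypes answer earlier on this page, gives the equivalent pointer).

```python
import ctypes

nrow, ncol = 3, 4
Row = ctypes.c_double * ncol
Matrix = Row * nrow

# 2d A, filled like the C++ example: A[j][k] = (j+1) * k.
# Nested tuples (not lists) are needed to initialise array-of-array types.
A = Matrix(*[tuple(float((j + 1) * k) for k in range(ncol)) for j in range(nrow)])

# 1d overlay of the same memory, like A1 = &A[0][0] in the C++ code.
A1 = ctypes.cast(A, ctypes.POINTER(ctypes.c_double * (nrow * ncol))).contents


def sum1(x, start, n):
    """x[start:start+n] /= sum(x[start:start+n]), modifying x in place."""
    total = 0.0
    for j in range(start, start + n):
        total += x[j]
    for j in range(start, start + n):
        x[j] /= total


def sum1rows(nrow, ncol, a1):
    """Normalise each row of the flat row-major buffer a1 to sum to 1."""
    for start in range(0, nrow * ncol, ncol):
        sum1(a1, start, ncol)


sum1rows(nrow, ncol, A1)
print(A[2][3])  # 0.5: the 2d view sees the changes made through the 1d view
```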
SciPy has an extension tutorial with example code for arrays.
http://docs.scipy.org/doc/numpy/user/c-info.how-to-extend.html