AES-NI intrinsics in Cython?

Is there a way to use AES-NI instructions within Cython code?
The closest I could find is how someone accessed SIMD instructions:
https://groups.google.com/forum/#!msg/cython-users/nTnyI7A6sMc/a6_GnOOsLuQJ
A thread about AES-NI in Python was not answered:
Python support for AES-NI

You should be able to just define the intrinsics as if they're normal C functions in Cython. Something like
cdef extern from "emmintrin.h":  # header names from Microsoft's documentation; GCC and Clang ship the same headers
    # declare the datatype as an opaque type
    ctypedef struct __m128i:
        pass
    __m128i _mm_set_epi32(int i3, int i2, int i1, int i0)

cdef extern from "wmmintrin.h":
    # AES-NI intrinsics; with GCC/Clang the extension must be compiled with -maes
    __m128i _mm_aesdec_si128(__m128i v, __m128i rkey)

# then in some Cython function
def f():
    cdef __m128i v = _mm_set_epi32(1, 2, 3, 4)
    cdef __m128i key = _mm_set_epi32(5, 6, 7, 8)
    cdef __m128i result = _mm_aesdec_si128(v, key)
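As for "how do I apply this over a bytes array?": first get a pointer to the array's data, then iterate over it in 16-byte steps with range, being careful not to run off the end. Since the result is written back in place, the buffer must be writable, so use a bytearray rather than an (immutable) bytes object.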
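# at module level: unaligned 16-byte load/store (SSE2, also in emmintrin.h)
cdef extern from "emmintrin.h":
    __m128i _mm_loadu_si128(const __m128i* p)
    void _mm_storeu_si128(__m128i* p, __m128i a)

# inside the function, assuming you already have an __m128i key
cdef __m128i v, result
# bytes objects are immutable, so pass a bytearray here
cdef unsigned char[::1] buf = python_bytearray
cdef Py_ssize_t i, n = buf.shape[0]
# you NEED to ensure that the length is divisible by 16,
# otherwise the last load/store runs off the end of the buffer
if n % 16 != 0:
    raise ValueError("length must be a multiple of 16")
for i in range(0, n, 16):
    # go over in chunks of 16 (unaligned load, so no alignment requirement)
    v = _mm_loadu_si128(<__m128i*>&buf[i])
    # note: this is a single AES round; full AES decryption needs the expanded
    # key schedule and several rounds ending with _mm_aesdeclast_si128
    result = _mm_aesdec_si128(v, key)
    # write back to the same place
    _mm_storeu_si128(<__m128i*>&buf[i], result)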
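The control flow of that loop (check the length, walk the buffer in 16-byte blocks, write each transformed block back in place) can be modelled in plain Python to check the bookkeeping before moving to Cython. This is only a sketch: both helper names are hypothetical, and xor_with_key merely XORs bytes as a stand-in for _mm_aesdec_si128, so it is not AES and uses no AES-NI instructions.

```python
def process_blocks_in_place(buf, block_fn, block_size=16):
    """Apply block_fn to each block_size-byte block of buf, writing results back in place."""
    with memoryview(buf) as view:
        # bytes objects are read-only; the Cython loop must not write into them either
        if view.readonly:
            raise TypeError("buffer must be writable (use bytearray, not bytes)")
        # the equivalent of "ensure the length is divisible by 16"
        if len(view) % block_size != 0:
            raise ValueError("length must be a multiple of %d" % block_size)
        for i in range(0, len(view), block_size):
            block = bytes(view[i:i + block_size])     # plays the role of the 16-byte load
            view[i:i + block_size] = block_fn(block)  # plays the role of the 16-byte store


def xor_with_key(key):
    """Hypothetical per-block transform standing in for _mm_aesdec_si128 (XOR, not AES)."""
    def transform(block):
        return bytes(b ^ k for b, k in zip(block, key))
    return transform
```

For example, `process_blocks_in_place(data, xor_with_key(bytes([0xFF] * 16)))` rewrites a 32-byte bytearray in place, while passing `bytes` or a 17-byte buffer raises instead of silently misbehaving.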

Related

Storing data using PyArray_NewFromDescr

I use Cython and I need to store the data as shown below. Earlier I used for loops to copy the data from pus_image[0] into a 3D array, but when running for n frames this became a performance bottleneck. I then used PyArray_NewFromDescr to store the data, which solves the bottleneck, but the displayed images look different from those produced by the previous method, because I am not able to do the increment _puc_image += aoiStride. Could anyone please help me solve this issue?
Code 1:
def LiveAquisition(self, nframes, np.ndarray[np.uint16_t, ndim=3, mode='c'] data):
    cdef:
        int available
        AT_64 sizeInBytes
        AT_64 aoiStride
        AT_WC string[20]
        AT_WC string1[20]
        AT_WC string2[20]
        AT_WC string3[20]
        unsigned char * pBuf
        unsigned char * _puc_image
        int BufSize
        unsigned int i, j, k, l = 0
    for i in range(nframes):
        pBuf = <unsigned char *>calloc(sizeInBytes, sizeof(unsigned char))
        AT_QueueBuffer(<AT_H>self.cameraHandle, pBuf, sizeInBytes)
        print "Frame number is :",
        print i
        response_code = AT_WaitBuffer(<AT_H>self.cameraHandle, &pBuf, &BufSize, 500)
        _puc_image = pBuf
        pus_image = <unsigned short*>pBuf
        for j in range(self.aoiWidth/self.hbin):
            pus_image = <unsigned short*>(_puc_image)
            for k in range(self.aoiHeight/self.vbin):
                data[l][j][k] = pus_image[0]
                pus_image += 1
            _puc_image += aoiStride
    free(pBuf)
    return data
Code 2: using PyArray_NewFromDescr, with these declarations beforehand:
from cpython.ref cimport PyTypeObject
from python_ref cimport Py_INCREF
cdef extern from "<numpy/arrayobject.h>":
    object PyArray_NewFromDescr(PyTypeObject *subtype, np.dtype descr, int nd, np.npy_intp* dims, np.npy_intp* strides, void* data, int flags, object obj)

def LiveAquisition(self, nframes, np.ndarray[np.uint16_t, ndim=3, mode='c'] data):
    cdef:
        int available
        AT_64 sizeInBytes
        AT_64 aoiStride
        AT_WC string[20]
        AT_WC string1[20]
        AT_WC string2[20]
        AT_WC string3[20]
        unsigned char * pBuf
        unsigned char * _puc_image
        int BufSize
        unsigned int i, j, k, l = 0
        np.npy_intp dims[2]
        np.dtype dtype = np.dtype('<B')
    for i in range(nframes):
        pBuf = <unsigned char *>calloc(sizeInBytes, sizeof(unsigned char))
        AT_QueueBuffer(<AT_H>self.cameraHandle, pBuf, sizeInBytes)
        print "Frame number is :",
        print i
        response_code = AT_WaitBuffer(<AT_H>self.cameraHandle, &pBuf, &BufSize, 500)
        Py_INCREF(dtype)
        dims[0] = self.aoiWidth
        dims[1] = self.aoiHeight
        data[i,:,:] = PyArray_NewFromDescr(<PyTypeObject *> np.ndarray, np.dtype('<B'), 2, dims, NULL, pBuf, np.NPY_C_CONTIGUOUS, None)
    free(pBuf)
    return data

There are a few serious errors in the way you're doing this. However, what you're doing is unnecessary, and there's a much simpler approach.
You can simply allocate the data using NumPy, and get the address of the first element of that array:
# earlier
cdef unsigned char[:,::1] p
# in loop (the camera writes sizeInBytes bytes, so p must be at least that large)
p = np.empty((self.aoiWidth, self.aoiHeight), dtype=np.uint8)
pBuf = &p[0,0]  # address of first element of p
# code goes here
data[i,:,:] = p
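The same idea (let a Python object own the memory and hand only its address to the code that fills it) can be sketched in pure Python with ctypes. Here a bytearray stands in for the NumPy array, and fake_camera_fill is a hypothetical stand-in for the driver call; neither belongs to the camera API used in the question.

```python
import ctypes


def acquire_frame(fill_buffer, width, height):
    """Let Python own the frame buffer and pass only its address to fill_buffer."""
    frame = bytearray(width * height)  # Python owns this memory: no calloc/free needed
    # ctypes view over the bytearray; its address plays the role of &p[0, 0]
    cbuf = (ctypes.c_ubyte * len(frame)).from_buffer(frame)
    fill_buffer(ctypes.addressof(cbuf), len(frame))
    del cbuf  # drop the ctypes view so the bytearray is no longer locked
    return frame


def fake_camera_fill(address, size):
    """Hypothetical stand-in for the driver call that writes into the buffer."""
    pattern = bytes(i % 256 for i in range(size))
    ctypes.memmove(address, pattern, size)
```

Because the buffer's lifetime is tied to the Python object, there is nothing to free by hand and no reference counts to manage.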
Errors in what you're doing:
pBuf = <unsigned char *>calloc(sizeInBytes, sizeof(unsigned char))
Here, sizeInBytes is uninitialized, and therefore the size you allocate will be arbitrary.
PyArray_NewFromDescr steals a reference to the descr argument: it takes over one reference without incrementing the reference count, so the caller must own an extra reference to give away. The line
PyArray_NewFromDescr(<PyTypeObject *> np.ndarray, np.dtype('<B'), ...)
will be translated by Cython into something like
temp_dtype = np.dtype('<B') # refcount 1
PyArray_NewFromDescr(<PyTypeObject *> np.ndarray, temp_dtype, ...)
# temp_dtype refcount is still 1
Py_DECREF(temp_dtype) # Cython's own cleanup
# temp_dtype has now been destroyed, but is still being used by your array
It looks like you copied some code that handled this correctly (it called Py_INCREF(dtype) and then passed dtype to PyArray_NewFromDescr), but then ignored that and created your own temporary object instead.
The array created by PyArray_NewFromDescr does not own the data. Therefore you are responsible for deallocating it once it has been used (and only when you're sure it's no longer needed). You only call free once, after the loop, so you are leaking almost all the memory you allocated. Either move the free into the loop, or set the OWNDATA flag to give your new array ownership of the buffer.
In summary, unless you have a good understanding of the Python C API, I recommend not using PyArray_NewFromDescr and allocating your data with NumPy arrays instead.

Passing a numpy array to C++

I have some code written in Python whose output is a NumPy array, and now I want to send that output to C++ code, where the heavy part of the calculations will be performed.
I have tried using Cython's public cdef, but I am running into some issues. I would appreciate your help! Here is my code:
pymodule.pyx:
from pythonmodule import result # result is my numpy array
import numpy as np
cimport numpy as np
cimport cython
#cython.boundscheck(False)
#cython.wraparound(False)
cdef public void cfunc():
    print 'I am in here!!!'
    cdef np.ndarray[np.float64_t, ndim=2, mode='c'] res = result
    print res
Once this is cythonized, I call:
pymain.c:
#include <Python.h>
#include <numpy/arrayobject.h>
#include "pymodule.h"
int main() {
    Py_Initialize();
    initpymodule();
    test(2);
    Py_Finalize();
}
int test(int a)
{
    Py_Initialize();
    initpymodule();
    cfunc();
    return 0;
}
I am getting a NameError for the result variable at C++. I have tried defining it with pointers and calling it indirectly from other functions, but the array remains invisible. I am pretty sure the answer is quite simple, but I just do not get it. Thanks for your help!

Short Answer
The NameError was caused by the fact that Python couldn't find the module: the working directory isn't automatically added to your PYTHONPATH. Calling setenv("PYTHONPATH", ".", 1); in your C/C++ code fixes this.
Longer Answer
There's an easy way to do this, apparently. With a python module pythonmodule.py containing an already created array:
import numpy as np
result = np.arange(20, dtype=np.float64).reshape((2, 10))
You can structure your Cython module (named pyxmod.pyx in the example program) to export that array by using the public keyword. By adding some auxiliary functions, you generally won't need to touch either the Python or the NumPy C API:
from pythonmodule import result
from libc.stdlib cimport malloc
import numpy as np
cimport numpy as np

cdef public np.ndarray getNPArray():
    """ Return array from pythonmodule. """
    return <np.ndarray>result

cdef public int getShape(np.ndarray arr, int shape):
    """ Return Shape of the Array based on shape par value. """
    return <int>arr.shape[1] if shape else <int>arr.shape[0]

cdef public void copyData(float *** dst, np.ndarray src):
    """ Copy data from src numpy array to dst. """
    cdef float **tmp
    cdef int i, j, m = src.shape[0], n=src.shape[1];
    # Allocate initial pointer
    tmp = <float **>malloc(m * sizeof(float *))
    if not tmp:
        raise MemoryError()
    # Allocate rows
    for j in range(m):
        tmp[j] = <float *>malloc(n * sizeof(float))
        if not tmp[j]:
            raise MemoryError()
    # Copy numpy Array
    for i in range(m):
        for j in range(n):
            tmp[i][j] = src[i, j]
    # Assign pointer to dst
    dst[0] = tmp
The functions getNPArray and getShape return the array and its shape, respectively. copyData copies the array's data into a newly allocated float** (converting each float64 element to a C float), so that you can then finalize Python and keep working without the interpreter initialized.
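Two properties of copyData are worth checking: each row is a separate allocation reached through an array of row pointers, and every float64 value is narrowed to a 32-bit C float. A Python/ctypes sketch of the same layout shows both; to_row_pointers is a hypothetical helper for illustration, not part of the answer's module.

```python
import ctypes
import struct


def to_row_pointers(rows):
    """Build a C-style float** (array of row pointers) from nested lists.

    Returns (row_ptrs, row_buffers); row_buffers must stay referenced while
    row_ptrs is in use, because ctypes frees each row when it is collected.
    """
    m, n = len(rows), len(rows[0])
    FloatRow = ctypes.c_float * n
    FloatP = ctypes.POINTER(ctypes.c_float)
    # one buffer per row, like the per-row malloc in copyData
    row_buffers = [FloatRow(*row) for row in rows]
    # the outer array of pointers, like tmp = malloc(m * sizeof(float *))
    row_ptrs = (FloatP * m)(*[ctypes.cast(buf, FloatP) for buf in row_buffers])
    return row_ptrs, row_buffers


# float64 -> C float narrowing, as in tmp[i][j] = src[i, j]
as_c_float = struct.unpack("f", struct.pack("f", 0.1))[0]
```

as_c_float is close to, but not equal to, 0.1; if the C++ side needs full precision, the answer's functions would have to use double instead of float.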
A sample program (in C, C++ should look identical) would look like this:
#include <Python.h>
#include "numpy/arrayobject.h"
#include "pyxmod.h"
#include <stdio.h>
#include <stdlib.h>  /* setenv */

void printArray(float **arr, int m, int n);
void getArray(float ***arr, int * m, int * n);

int main(int argc, char **argv){
    // Holds data and shapes.
    float **data = NULL;
    int m, n;
    // Gets array and then prints it.
    getArray(&data, &m, &n);
    if (data == NULL)
        return 1;
    printArray(data, m, n);
    return 0;
}

void getArray(float ***data, int * m, int * n){
    // setenv is important, makes python find
    // modules in working directory
    setenv("PYTHONPATH", ".", 1);
    // Initialize interpreter and module
    Py_Initialize();
    initpyxmod();
    // Use Cython functions.
    PyArrayObject *arr = getNPArray();
    *m = getShape(arr, 0);
    *n = getShape(arr, 1);
    copyData(data, arr);
    if (*data == NULL){ // copyData leaves *data unset if an allocation failed
        fprintf(stderr, "Data is NULL\n");
        return ;
    }
    Py_DECREF(arr);
    Py_Finalize();
}

void printArray(float **arr, int m, int n){
    int i, j;
    for(i=0; i < m; i++){
        for(j=0; j < n; j++)
            printf("%f ", arr[i][j]);
        printf("\n");
    }
}
Always remember to set:
setenv("PYTHONPATH", ".", 1);
before you call Py_Initialize so Python can find modules in the working directory.
The rest is pretty straightforward. It might need some additional error checking, and it definitely needs a function to free the allocated memory.
Alternate Way w/o Cython:
Doing it the way you are attempting is more hassle than it's worth. You would probably be better off using numpy.save to save your array in a .npy binary file and then using a C++ library that reads that file for you.

Returning an array from a Cdef function

I want to make a pure C-style function which takes an array as an argument (a pointer) and does something with it. But I cannot find out how to define an array argument for a cdef function. Here is some toy code I have made.
cdef void test(double[] array ) except? -2:
    cdef int i,n
    i = 0
    n = len(array)
    for i in range(0,n):
        array[i] = array[i]+1.0
def ctest(a):
    n = len(a)
    #Make a C-array on the heap.
    cdef double *v
    v = <double *>malloc(n*sizeof(double))
    #Copy in the python array
    for i in range(n):
        v[i] = float(a[i])
    #Calling the C-function which does something with the array
    test(v)
    #Putting the changed C-array back into python
    for i in range(n):
        a[i] = v[i]
    free(v)
    return a
The code will not compile. I have searched for how to define C arrays in Cython, but have not found how to do it. The double[] array clearly does not work. I have also tried:
cdef void test(double* array ) except? -2:
I can manage to do the same in pure C, but not in Cython :(
D:\cython-test\ python setup.py build_ext --inplace
Compiling ctest.pyx because it changed.
[1/1] Cythonizing ctest.pyx
Error compiling Cython file:
------------------------------------------------------------
...
from libc.stdlib cimport malloc, free
cdef void test(double[] array):
cdef int i,n
n = len(array)
^
------------------------------------------------------------
ctest.pyx:5:17: Cannot convert 'double *' to Python object
Error compiling Cython file:
------------------------------------------------------------
...
from libc.stdlib cimport malloc, free
cdef void test(double[] array):
cdef int i,n
n = len(array)
for i in range(0,len(array)):
^
------------------------------------------------------------
ctest.pyx:6:30: Cannot convert 'double *' to Python object
Traceback (most recent call last):
File "setup.py", line 10, in <module>
ext_modules = cythonize("ctest.pyx"),
File "C:\Anaconda\lib\site-packages\Cython\Build\Dependencies.py", line 877, in cythonize
cythonize_one(*args)
File "C:\Anaconda\lib\site-packages\Cython\Build\Dependencies.py", line 997, in cythonize_one
raise CompileError(None, pyx_file)
Cython.Compiler.Errors.CompileError: ctest.pyx
E:\GD\UD\Software\BendStiffener\curvmom>
UPDATE:
I have updated my code following all the advice and it compiles now :) But my array still does not update. I expect all entries to be incremented by 5.0, but they are not:
from libc.stdlib cimport malloc, free
cdef void test(double[] array):
    cdef int i,n
    n = sizeof(array)/sizeof(double)
    for i in range(0,n):
        array[i] = array[i]+5.0
def ctest(a):
    n = len(a)
    #Make a C-array on the heap.
    cdef double* v
    v = <double*>malloc(n*sizeof(double))
    #Copy in the python array
    for i in range(n):
        v[i] = float(a[i])
    #Calling the C-function which does something with the array
    test(v)
    #Putting the changed C-array back into python
    for i in range(n):
        a[i] = v[i]
    free(v)
    for x in a:
        print x
    return a
Here is a Python test program for testing my code:
import ctest
a = [0,0,0]
ctest.ctest(a)
So there is still something I am doing wrong. Any suggestion?

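len() is a Python function that works only on Python objects. This is why it won't compile.
For a C array you need another way to get the length, but n = sizeof(array) / sizeof(double) only works for a genuine fixed-size array, not here: as a function parameter, double[] array is just a double* (as the compiler error shows), so sizeof(array) is the size of the pointer (typically 8 bytes on 64-bit platforms) and n comes out as 1 or 0 whatever the real length is. That is why the updated code does not update all entries. Pass the length explicitly instead:
cdef void test(double* array, int n):
and call it as test(v, n).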
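A pointer carries no length information, which is easy to see from Python with ctypes: sizeof on an array covers every element, while sizeof on a pointer is only the size of an address. In this sketch add_to_all is a hypothetical helper that, like a C function taking double*, needs the length passed in explicitly.

```python
import ctypes


def add_to_all(ptr, n, value):
    """Add value to n doubles starting at ptr (a ctypes double*)."""
    # Like a C double*, ptr has no length attached, so n must be passed in.
    for i in range(n):
        ptr[i] += value


values = [0.0, 0.0, 0.0]
buf = (ctypes.c_double * len(values))(*values)

# sizeof of the array covers every element (3 * sizeof(double))...
array_bytes = ctypes.sizeof(buf)
# ...but sizeof of a pointer is only the size of the address itself.
pointer_bytes = ctypes.sizeof(ctypes.POINTER(ctypes.c_double))

ptr = ctypes.cast(buf, ctypes.POINTER(ctypes.c_double))
add_to_all(ptr, len(values), 5.0)
result = list(buf)
```

After the call, result holds 5.0 in every slot, because the loop bound came from the explicit length rather than from sizeof.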

You might want to take a look at typed memoryviews and the buffer interface. These provide a nice interface to array-like data structures such as those underlying NumPy arrays, but can also be used to work with C arrays. From the documentation:
For example, they can handle C arrays and the Cython array type (Cython arrays).
In your case this might help:
cdef void test(double[:] array):
    cdef Py_ssize_t i
    for i in range(array.shape[0]):  # a memoryview knows its own length
        array[i] += 5.0
A double[:] parameter accepts any one-dimensional buffer of doubles, and its elements can then be modified. Because [:] declares a memoryview, all changes are made in the array the memoryview was created on (the variable you passed as the parameter to test). To pass the malloc'd pointer from ctest, cast it to a memoryview of known length, with n declared as a C integer: test(<double[:n]>v).
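Python's built-in memoryview has the same write-through behaviour as a Cython typed memoryview, so the idea can be tried without compiling anything. This is a plain-Python analogue (add_five is a hypothetical helper), not Cython code:

```python
from array import array


def add_five(view):
    """Add 5.0 to every element through a memoryview of doubles."""
    # A memoryview knows its own length and writes through to the
    # underlying buffer instead of working on a copy.
    for i in range(len(view)):
        view[i] += 5.0


data = array('d', [0.0, 0.0, 0.0])
add_five(memoryview(data))
```

No copy-back loop is needed: the array the view was created on is modified directly.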

How to create empty char arrays in Cython without loops

Well, this seems easy, but I can't find a single reference on the web. In C we can create a char array of n null-characters as follows:
char arr[n] = "";
But when I try to do the same in Cython with
cdef char arr[n] = ""
I get this compilation error:
Error compiling Cython file:
------------------------------------------------------------
...
cdef char a[n] = ""
^
------------------------------------------------------------
Syntax error in C variable declaration
Obviously Cython doesn't allow to declare arrays this way, but is there an alternative? I don't want to manually set each item in the array, that is I'm not looking for something like this
cdef char a[10]
for i in range(0, 10, 1):
a[i] = b"\0"

You don't have to set each element to make a length-zero C string. It is sufficient to just zero the first element:
cdef char arr[n]
arr[0] = 0
Next, if you want to zero the whole char array, use memset
from libc.string cimport memset
cdef char arr[n]
memset(arr, 0, n)
And if C purists complain about the 0 instead of '\0', note that the '\0' is a Python string (unicode in Python 3) in Cython. '\0' is not a C char in Cython! memset expects an integer value for its second argument, not a Python string.
If you really want a C '\0' in Cython, you can use Cython's c-prefixed character literal, c'\0'. Alternatively, you can write a helper function in C:
/* zerochar.h */
static int zerochar()
{
return '\0';
}
And now:
cdef extern from "zerochar.h":
int zerochar()
cdef char arr[n]
arr[0] = zerochar()
or
cdef extern from "zerochar.h":
int zerochar()
from libc.string cimport memset
cdef char arr[n]
memset(arr, zerochar(), n)
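The difference between zeroing only the first byte (enough for an empty C string) and zeroing the whole buffer can be checked from Python with ctypes, whose create_string_buffer allocates a C char buffer and whose memset wraps the C function. A small sketch:

```python
import ctypes

n = 8
# An n-byte char buffer that starts out holding non-zero bytes.
buf = ctypes.create_string_buffer(b"garbage", n)

# Zeroing only the first byte already makes it an empty C string...
buf[0] = b"\0"
empty_value = buf.value   # .value stops at the first NUL byte
rest_untouched = buf.raw  # ...but the remaining bytes are unchanged

# memset clears the whole buffer, like memset(arr, 0, n) in Cython.
ctypes.memset(ctypes.addressof(buf), 0, n)
cleared = buf.raw

# create_string_buffer(n) on its own gives zero-filled memory, much like calloc.
fresh = ctypes.create_string_buffer(n).raw

# The integer value of the C character '\0' is simply 0.
nul_value = ord("\0")
```

empty_value is b"" while rest_untouched still contains the old bytes after the first NUL, which is why memset (or calloc) is needed when every byte must be zero.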

In C, single quotes are used for a char ('x') and double quotes for a string ("x"). An 'empty char' does not really make sense; probably what you want is '\0' or just 0.
Maybe:
from libc.stdlib cimport malloc, free
cdef char * test() except NULL:
    cdef int i, n = 10
    cdef char *arr = <char *>malloc(n * sizeof(char))
    if arr == NULL:
        raise MemoryError()
    for i in range(n):
        arr[i] = 0  # the caller must free(arr) when done
    return arr
Edit: calloc does that for you, since the memory it returns is zero-initialized:
void *calloc(size_t count, size_t size);

How about:
cdef char *arr = ['\0']*n

Wrapping C function in Cython and NumPy

I'd like to call my C function from Python, in order to manipulate some NumPy arrays. The function is like this:
void c_func(int *in_array, int n, int *out_array);
where the results are supplied in out_array, whose size I know in advance (it's not my function, actually). In the corresponding .pyx file I try the following, to be able to pass the input to the function from a NumPy array and store the result in a NumPy array:
def pyfunc(np.ndarray[np.int32_t, ndim=1] in_array):
    n = len(in_array)
    out_array = np.zeros((512,), dtype = np.int32)
    mymodule.c_func(<int *> in_array.data, n, <int *> out_array.data)
    return out_array
But I get
"Python objects cannot be cast to pointers of primitive types" error for the output assignment. How do I accomplish this?
(If I require the Python caller to allocate an output array of the proper size, then I can do the following.)
def pyfunc(np.ndarray[np.int32_t, ndim=1] in_array, np.ndarray[np.int32_t, ndim=1] out_array):
    n = len(in_array)
    mymodule.c_func(<int *> in_array.data, n, <int*> out_array.data)
But can I do this in a way that the caller doesn't have to pre-allocate the appropriately sized output array?

You should add cdef np.ndarray before the out_array assignment. Without it, out_array is an untyped Python object, so out_array.data is ordinary Python attribute access and cannot be cast to a C pointer:
def pyfunc(np.ndarray[np.int32_t, ndim=1] in_array):
    cdef np.ndarray out_array = np.zeros((512,), dtype = np.int32)
    n = len(in_array)
    mymodule.c_func(<int *> in_array.data, n, <int *> out_array.data)
    return out_array

Here is an example of how to manipulate NumPy arrays using code written in C/C++ through ctypes.
I wrote a small function in C that squares the numbers from a first array and writes the results to a second array. The number of elements is given by a third parameter. This code is compiled as a shared object.
squares.c, compiled to squares.so (for example with gcc -std=c99 -shared -fPIC -o squares.so squares.c):
void square(double* pin, double* pout, int n) {
    for (int i=0; i<n; ++i) {
        pout[i] = pin[i] * pin[i];
    }
}
In python, you just load the library using ctypes and call the function. The array pointers are obtained from the NumPy ctypes interface.
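import numpy as np
import ctypes
n = 5
a = np.arange(n, dtype=np.double)
b = np.zeros(n, dtype=np.double)
lib = ctypes.cdll.LoadLibrary("./squares.so")
# declare the C signature: void square(double* pin, double* pout, int n)
lib.square.argtypes = [ctypes.POINTER(ctypes.c_double), ctypes.POINTER(ctypes.c_double), ctypes.c_int]
lib.square.restype = None
aptr = a.ctypes.data_as(ctypes.POINTER(ctypes.c_double))
bptr = b.ctypes.data_as(ctypes.POINTER(ctypes.c_double))
lib.square(aptr, bptr, n)
print(b)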
This works for any C library; you just have to know which argument types to pass, possibly rebuilding C structs in Python using ctypes.
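To spare callers from pre-allocating the output, the wrapper itself can allocate it before calling the C function. A sketch using ctypes arrays instead of NumPy; since squares.so may not be built, fake_square is a hypothetical Python callback with the same C signature standing in for the compiled square function, and py_square is likewise a hypothetical wrapper name.

```python
import ctypes

DoubleP = ctypes.POINTER(ctypes.c_double)
# C signature of the routine: void square(double* pin, double* pout, int n)
SquareFn = ctypes.CFUNCTYPE(None, DoubleP, DoubleP, ctypes.c_int)


@SquareFn
def fake_square(pin, pout, n):
    """Hypothetical stand-in for the compiled square(), callable through ctypes."""
    for i in range(n):
        pout[i] = pin[i] * pin[i]


def py_square(values, c_func=fake_square):
    """Square values via c_func; the wrapper, not the caller, allocates the output."""
    n = len(values)
    in_buf = (ctypes.c_double * n)(*values)
    out_buf = (ctypes.c_double * n)()  # zero-initialized output buffer
    c_func(in_buf, out_buf, n)
    return list(out_buf)
```

With the real library, passing lib.square as c_func should behave the same way, provided its argtypes are declared as in the example above.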
