Cython Memoryview as return value - python

Consider this dummy Cython code:
#!python
#cython: boundscheck=False
#cython: wraparound=False
#cython: initializedcheck=False
#cython: cdivision=True
#cython: nonecheck=False
import numpy as np
# iterator function
cdef double[:] f(double[:] data):
    data[0] *= 1.01
    data[1] *= 1.02
    return data

# looping function
cdef double[:] _call_me(int bignumber, double[:] data):
    cdef int ii
    for ii in range(bignumber):
        data = f(data)
    return data

# helper function to allow calls from Python
def call_me(bignumber):
    cdef double[:] data = np.ones(2)
    return _call_me(bignumber, data)
Now, if I do a cython -a on this, it shows the return statements in yellow. I'm doing something similar in a very performance-critical program, and according to profiling this is really slowing my code down. So, why does Cython need Python for these return statements? The annotated file gives a hint:
PyErr_SetString(PyExc_TypeError,"Memoryview return value is not initialized");
Amazingly, a Google search for cython "Memoryview return value is not initialized" gives zero results.

The slow part isn't what you think it is. The slow part is (well... primarily)
data = f(data)
Not the f(data). The data =.
This assigns a struct, which is defined like so:
typedef struct {
    struct __pyx_memoryview_obj *memview;
    char *data;
    Py_ssize_t shape[8];
    Py_ssize_t strides[8];
    Py_ssize_t suboffsets[8];
} __Pyx_memviewslice;
and the assignment mentioned does
__pyx_t_3 = __pyx_f_3cyt_f(__pyx_v_data);
where __pyx_t_3 is of that type. If this is done heavily in a loop as it is, it takes far longer to copy the structs than to do the trivial body of the function. I've done a timing in pure C and it gives similar numbers.
(Edit note: The assigning is actually primarily a problem because it also causes generation of structs and other copies to not be optimised out.)
However, the whole thing seems silly. The only reason to copy the struct would be if something had changed, but nothing has: the memory points at the same place, the data points at the same place, and the shape, strides and suboffsets are the same.
The only way I see to avoid the struct copy is to not change any of what it references (i.e. always return the memoryview that was passed in). That's only possible in circumstances where returning is pointless anyway, like here. Or you can hack at the C, I guess, like I was. Just don't cry if you break something.
Also note that you can make your function nogil, so it can't have anything to do with harking back to Python.
EDIT
C's optimising compiler was throwing me slightly off: I removed some assigning and it optimised away loads of other things as well. The slow path boils down to this:
#include <stdio.h>
#include <sys/types.h>  /* for ssize_t */

struct __pyx_memoryview_obj;

typedef struct {
    struct __pyx_memoryview_obj *memview;
    char *data;
    ssize_t shape[8];
    ssize_t strides[8];
    ssize_t suboffsets[8];
} __Pyx_memviewslice;

static __Pyx_memviewslice __pyx_f_3cyt_f(__Pyx_memviewslice __pyx_v_data) {
    __Pyx_memviewslice __pyx_r = { 0, 0, { 0 }, { 0 }, { 0 } };
    __pyx_r = __pyx_v_data;
    return __pyx_r;
}

int main(void) {
    int i;
    __Pyx_memviewslice __pyx_v_data = { 0, 0, { 0 }, { 0 }, { 0 } };
    for (i = 0; i < 10000000; i++) {
        __pyx_v_data = __pyx_f_3cyt_f(__pyx_v_data);
    }
    return 0;
}
(compile with no optimisations). I'm no C programmer, so apologies if what I've done sucks in some way not directly linked to the fact I've copied computer-generated code.
I know this doesn't help, but I did my best, OK?

Related

Convert Rust vector of tuples to a C compatible structure

Following these answers, I've currently defined a Rust 1.0 function as follows, in order to be callable from Python using ctypes:
use std::vec;
extern crate libc;
use libc::{c_int, c_float, size_t};
use std::slice;

#[no_mangle]
pub extern fn convert_vec(input_lon: *const c_float,
                          lon_size: size_t,
                          input_lat: *const c_float,
                          lat_size: size_t) -> Vec<(i32, i32)> {
    let input_lon = unsafe {
        slice::from_raw_parts(input_lon, lon_size as usize)
    };
    let input_lat = unsafe {
        slice::from_raw_parts(input_lat, lat_size as usize)
    };
    let combined: Vec<(i32, i32)> = input_lon
        .iter()
        .zip(input_lat.iter())
        .map(|each| convert(*each.0, *each.1))
        .collect();
    return combined
}
And I'm setting up the Python part like so:
from ctypes import *
class Int32_2(Structure):
    _fields_ = [("array", c_int32 * 2)]

rust_bng_vec = lib.convert_vec_py
rust_bng_vec.argtypes = [POINTER(c_float), c_size_t,
                         POINTER(c_float), c_size_t]
rust_bng_vec.restype = POINTER(Int32_2)
This seems to be OK, but I'm:
- Not sure how to transform combined (a Vec<(i32, i32)>) to a C-compatible structure, so it can be returned to my Python script.
- Not sure whether I should be returning a reference (return &combined?) and how I would have to annotate the function with the appropriate lifetime specifier if I did.
The most important thing to note is that there is no such thing as a tuple in C. C is the lingua franca of library interoperability, and you will be required to restrict yourself to the abilities of this language. It doesn't matter if you are talking between Rust and another high-level language; you have to speak C.
There may not be tuples in C, but there are structs. A two-element tuple is just a struct with two members!
Let's start with the C code that we would write:
#include <stdio.h>
#include <stdint.h>
typedef struct {
    uint32_t a;
    uint32_t b;
} tuple_t;

typedef struct {
    void *data;
    size_t len;
} array_t;

extern array_t convert_vec(array_t lat, array_t lon);

int main() {
    uint32_t lats[3] = {0, 1, 2};
    uint32_t lons[3] = {9, 8, 7};

    array_t lat = { .data = lats, .len = 3 };
    array_t lon = { .data = lons, .len = 3 };

    array_t fixed = convert_vec(lat, lon);
    tuple_t *real = fixed.data;

    for (int i = 0; i < fixed.len; i++) {
        printf("%d, %d\n", real[i].a, real[i].b);
    }

    return 0;
}
We've defined two structs — one to represent our tuple, and another to represent an array, as we will be passing those back and forth a bit.
We will follow this up by defining the exact same structs in Rust and define them to have the exact same members (types, ordering, names). Importantly, we use #[repr(C)] to let the Rust compiler know to not do anything funky with reordering the data.
extern crate libc;

use std::slice;
use std::mem;

#[repr(C)]
pub struct Tuple {
    a: libc::uint32_t,
    b: libc::uint32_t,
}

#[repr(C)]
pub struct Array {
    data: *const libc::c_void,
    len: libc::size_t,
}

impl Array {
    unsafe fn as_u32_slice(&self) -> &[u32] {
        assert!(!self.data.is_null());
        slice::from_raw_parts(self.data as *const u32, self.len as usize)
    }

    fn from_vec<T>(mut vec: Vec<T>) -> Array {
        // Important to make length and capacity match
        // A better solution is to track both length and capacity
        vec.shrink_to_fit();

        let array = Array {
            data: vec.as_ptr() as *const libc::c_void,
            len: vec.len() as libc::size_t,
        };

        // Whee! Leak the memory, and now the raw pointer (and
        // eventually C) is the owner.
        mem::forget(vec);

        array
    }
}

#[no_mangle]
pub extern fn convert_vec(lon: Array, lat: Array) -> Array {
    let lon = unsafe { lon.as_u32_slice() };
    let lat = unsafe { lat.as_u32_slice() };

    let vec =
        lat.iter().zip(lon.iter())
           .map(|(&lat, &lon)| Tuple { a: lat, b: lon })
           .collect();

    Array::from_vec(vec)
}
We must never accept or return non-repr(C) types across the FFI boundary, so we pass across our Array. Note that there's a good amount of unsafe code, as we have to convert an unknown pointer to data (c_void) to a specific type. That's the price of being generic in C world.
Let's turn our eye to Python now. Basically, we just have to mimic what the C code did:
import ctypes

class FFITuple(ctypes.Structure):
    _fields_ = [("a", ctypes.c_uint32),
                ("b", ctypes.c_uint32)]

class FFIArray(ctypes.Structure):
    _fields_ = [("data", ctypes.c_void_p),
                ("len", ctypes.c_size_t)]

    # Allow implicit conversions from a sequence of 32-bit unsigned
    # integers.
    @classmethod
    def from_param(cls, seq):
        return cls(seq)

    # Wrap sequence of values. You can specify another type besides a
    # 32-bit unsigned integer.
    def __init__(self, seq, data_type=ctypes.c_uint32):
        array_type = data_type * len(seq)
        raw_seq = array_type(*seq)
        self.data = ctypes.cast(raw_seq, ctypes.c_void_p)
        self.len = len(seq)

# A conversion function that cleans up the result value to make it
# nicer to consume.
def void_array_to_tuple_list(array, _func, _args):
    tuple_array = ctypes.cast(array.data, ctypes.POINTER(FFITuple))
    return [tuple_array[i] for i in range(0, array.len)]

lib = ctypes.cdll.LoadLibrary("./target/debug/libtupleffi.dylib")
lib.convert_vec.argtypes = (FFIArray, FFIArray)
lib.convert_vec.restype = FFIArray
lib.convert_vec.errcheck = void_array_to_tuple_list

for tupl in lib.convert_vec([1,2,3], [9,8,7]):
    print tupl.a, tupl.b
Forgive my rudimentary Python. I'm sure an experienced Pythonista could make this look a lot prettier! Thanks to @eryksun for some nice advice on how to make the consumer side of calling the method much nicer.
A word about ownership and memory leaks
In this example code, we've leaked the memory allocated by the Vec. Theoretically, the FFI code now owns the memory, but realistically, it can't do anything useful with it. To have a fully correct example, you'd need to add another method that would accept the pointer back from the callee, transform it back into a Vec, then allow Rust to drop the value. This is the only safe way, as Rust is almost guaranteed to use a different memory allocator than the one your FFI language is using.
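As an illustration only: if the Rust side also exported a hypothetical drop_array(array: Array) function that rebuilds the Vec and lets it drop, the Python side could hand the buffer back roughly like this (continuing the ctypes snippet above, but without the errcheck conversion installed, so the raw FFIArray is still available to return):
lib.drop_array.argtypes = (FFIArray,)
lib.drop_array.restype = None

# Assumes errcheck was NOT set on convert_vec, so we get the raw FFIArray back.
raw = lib.convert_vec([1, 2, 3], [9, 8, 7])
for t in void_array_to_tuple_list(raw, None, None):
    print t.a, t.b
lib.drop_array(raw)   # hand ownership back to Rust so it can free the buffer
This is only a sketch of the shape of such an API; the actual Rust-side implementation of drop_array is not shown above.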
Not sure whether I should be returning a reference and how I would have to annotate the function with the appropriate lifetime specifier if I did
No, you don't want to (read: can't) return a reference. If you could, then the ownership of the item would end with the function call, and the reference would point to nothing. This is why we need to do the two-step dance with mem::forget and returning a raw pointer.

Pass Python list to embedded Rust function

I am learning how to embed Rust functions in Python, and everything works fine if my inputs are ints, but not lists.
If my lib.rs file is:
#[no_mangle]
pub extern fn my_func(x: i32, y: i32) -> i32 {
    return x + y;
}
I can use this as follows:
In [1]: from ctypes import cdll
In [2]: lib = cdll.LoadLibrary("/home/user/RustStuff/embed/target/release/libembed.so")
In [3]: lib.my_func(5,6)
Out[3]: 11
However if I change my lib.rs to:
#[no_mangle]
pub extern fn my_func(my_vec: Vec<i32>) -> i32 {
    let mut my_sum = 0;
    for i in my_vec {
        my_sum += i;
    }
    return my_sum;
}
I can no longer use it in Python (this compiled fine):
In [1]: from ctypes import cdll
In [2]: lib = cdll.LoadLibrary("/home/user/RustStuff/embed/target/release/libembed.so")
In [3]: lib.my_func([2,3,4])
---------------------------------------------------------------------------
ArgumentError Traceback (most recent call last)
<ipython-input-3-454ffc5ba9dd> in <module>()
----> 1 lib.my_func([2,3,4])
ArgumentError: argument 1: <type 'exceptions.TypeError'>: Don't know how to convert parameter 1
The reason I thought this could work is that Python's list and Rust's Vec are both dynamic arrays, but apparently I am missing something here...
Why does my attempt not work? What should I do to fix it?
Don't do this:
#[no_mangle]
pub extern fn my_func(my_vec: Vec<i32>) -> i32 { ... }
You basically never want to accept or return an arbitrary Rust object in an extern function, only ones that are #[repr(C)]. Instead, you should accept something that is representable by C. As 6502 says, the best idea for this particular case would be to accept a pointer and a length.
Rust's Vec is conceptually a pointer to data, a count, and a capacity. You are able to modify a Vec by adding or removing objects, which can cause reallocation to happen. This is doubly bad because it is likely that Python and Rust use different allocators that are not compatible with each other. Segfaults lie this way! You really want a slice.
Instead, do something like this on the Rust side:
extern crate libc;
use libc::{size_t,int32_t};
use std::slice;
#[no_mangle]
pub extern fn my_func(data: *const int32_t, length: size_t) -> int32_t {
    let nums = unsafe { slice::from_raw_parts(data, length as usize) };
    nums.iter().fold(0, |acc, i| acc + i)
}
Namely, you are using the guaranteed-to-match C types, and then converting the pointer and length to something Rust knows how to deal with.
I'm no Pythonista, but this cobbled-together code (with help from How do I convert a Python list into a C array by using ctypes?) seems to work with the Rust I have above:
import ctypes
lib = ctypes.cdll.LoadLibrary("./target/debug/libpython.dylib")
lib.my_func.argtypes = (ctypes.POINTER(ctypes.c_int32), ctypes.c_size_t)
list_to_sum = [1,2,3,4]
c_array = (ctypes.c_int32 * len(list_to_sum))(*list_to_sum)
print lib.my_func(c_array, len(list_to_sum))
Of course, you probably want to wrap that to make it nicer for the caller of your code.
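For example, a thin wrapper along these lines keeps the ctypes plumbing out of the caller's way (just a sketch continuing from the snippet above; sum_list is a made-up name, not part of the library):
def sum_list(numbers):
    # Build a C int32 array from the Python sequence and pass it,
    # together with its length, to the Rust function above.
    c_array = (ctypes.c_int32 * len(numbers))(*numbers)
    return lib.my_func(c_array, len(numbers))

print sum_list([1, 2, 3, 4])   # -> 10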
ctypes is about C bindings, and in C there's no such thing as a dynamic array.
The closest thing you can pass to a C function is a pointer to integer, which however is not a dynamic array, because:
- It doesn't carry the size information
- You cannot grow or shrink the area, just access existing elements
A simple alternative to passing pointers (and being very careful about not running past the size) is a function-based API.
For example:
getNumberOfThings() -> number
getThing(index) -> thing
but the Python code would then become something like:
def func():
    n = getNumberOfThings()
    return [getThing(i) for i in range(n)]
The counterpart (passing a variable number of elements) would be
def func2(L):
    setNumberOfThings(len(L))
    for i, x in enumerate(L):
        setThing(i, x)
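If those helpers were exported from a shared library, the ctypes side might be wired up like this (a sketch; the library name is a placeholder and the exports mirror the hypothetical API above):
import ctypes

lib = ctypes.CDLL("./libthings.so")            # placeholder library name
lib.getNumberOfThings.restype = ctypes.c_int
lib.getThing.argtypes = [ctypes.c_int]
lib.getThing.restype = ctypes.c_int

def func():
    # Ask for the element count once, then fetch each element by index.
    n = lib.getNumberOfThings()
    return [lib.getThing(i) for i in range(n)]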

why does SWIG allow int pointer argument in place of void*, and how to do same for arrays?

I have this class with an int member and an int[2] member, and a getMember accessor method that takes the index of a member and a void*, and fills the (pre-allocated) space behind the void* with that member:
foobar.h:
class Foobar {
public:
    void getMember(int index, void* data) {
        switch (index) {
            case 0:
                *(int *) data = member0;
                break;
            case 1:
                *(int *) data = member1[0];
                *((int *) data + 1) = member1[1];
                break;
        }
    }
    int member0;
    int member1[2];
};
I can then write a SWIG interface to this:
%{
#include "foobar.h"
%}
%include "foobar.h"
Now, if I also add
%include <cpointer.i>
%pointer_functions(int, intp)
I can then do the following in Python:
>>> p = new_intp()
>>> f = Foobar()
>>> f.member0 = 2
>>> f.getMember(0, p)
>>> intp_value(p)
2
Question 1. I have a void* declared and I am passing intp and yet the whole thing works. Why??
Question 2. Assuming you explain to me how the above works, how do I accomplish the same for member1? That is, I added the pointer_functions code to make the above work (magically). What similar thing do I need to add, and what pointer p1 do I pass, so that
>>> f.getMember(1, p1)
works?
Well, I still can't answer Question 1, nevertheless I found the new "magic" way to answer Question 2:
%include <carrays.i>
%array_functions(int, inta);
Now Question 1 is unchanged, and Question 2 becomes, why does it work?
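For reference, the helpers that %array_functions(int, inta) generates (SWIG's documented new_NAME / NAME_getitem / delete_NAME pattern) can then be used from Python roughly like this; a sketch along the lines of the intp session above:
p1 = new_inta(2)                          # allocate an int[2] on the C side
f = Foobar()
f.getMember(1, p1)                        # copies member1[0] and member1[1] into p1
print inta_getitem(p1, 0), inta_getitem(p1, 1)
delete_inta(p1)                           # free the C-side array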
I'd have to check the code generated by SWIG, but my guess is that for a void* function parameter, SWIG can't do any type checking, so any pointer type given to the function will be accepted and passed on. Then in the code you cast the void* to an int*, so as long as the type really is an int*, all is good. If you passed something that is not an int*, you would get undefined behaviour, such as a crash, since you would be overwriting part of an object or, worse, something smaller such as a short.
So this should tell you that what you are doing is rather dangerous. I don't see why you can't declare your function to take an int*, since the pointer can refer to one item or to an array:
void getMember(int index, int* data) {
    switch (index) {
        case 0:
            *data = member0;
            break;
        case 1:
            *data = member1[0];
            *(data + 1) = member1[1];
            break;
    }
}
Then SWIG will generate code that will check that the passed-in type is int* and will throw otherwise. I don't know if SWIG will know that your inta type is compatible with the intp type. If not, you could %extend Foobar with an adapter in your .i file:
%extend Foobar {
    void getMember(int index, int data[2]) {
        $self->getMember(index, data); // C++ knows this is ok
    }
};

Python ctype - How to pass data between C functions

I have a self-made C library that I want to access using python. The problem is that the code consists essentially of two parts, an initialization to read in data from a number of files and a few calculations that need to be done only once. The other part is called in a loop and uses the data generated before repeatedly. To this function I want to pass parameters from python.
My idea was to write two C wrapper functions, "init" and "loop" - "init" reads the data and returns a void pointer to a structure that "loop" can use together with additional parameters that I can pass on from python. Something like
void *init() {
    mystruct *ret = (mystruct *)malloc(sizeof(mystruct));
    /* Fill ret with data */
    return ret;
}

float loop(void *data, float par1, float par2) {
    /* do stuff with data, par1, par2, return result */
}
I tried calling "init" from python as a c_void_p, but since "loop" changes some of the contents of "data" and ctypes' void pointers are immutable, this did not work.
Other solutions to similar problems I saw seem to require knowledge of how much memory "init" would use, and I do not know that.
Is there a way to pass data from one C function to another through python without telling python exactly what or how much it is? Or is there another way to solve my problem?
I tried (and failed) to write a minimum crashing example, and after some debugging it turned out there was a bug in my C code. Thanks to everyone who replied!
Hoping that this might help other people, here is a sort-of-minimal working version (still without separate 'free' - sorry):
pybug.c:
#include <stdio.h>
#include <stdlib.h>

typedef struct inner_struct_s {
    int length;
    float *array;
} inner_struct_t;

typedef struct mystruct_S {
    int id;
    float start;
    float end;
    inner_struct_t *inner;
} mystruct_t;

void init(void **data) {
    int i;
    mystruct_t *mystruct = (mystruct_t *)malloc(sizeof(mystruct_t));
    inner_struct_t *inner = (inner_struct_t *)malloc(sizeof(inner_struct_t));

    inner->length = 10;
    inner->array = calloc(inner->length, sizeof(float));
    for (i = 0; i < inner->length; i++)
        inner->array[i] = 2 * i;

    mystruct->id = 0;
    mystruct->start = 0;
    mystruct->end = inner->length;
    mystruct->inner = inner;

    *data = mystruct;
}

float loop(void *data, float par1, float par2, int newsize) {
    mystruct_t *str = data;
    inner_struct_t *inner = str->inner;
    int i;

    inner->length = newsize;
    inner->array = realloc(inner->array, newsize * sizeof(float));
    for (i = 0; i < inner->length; i++)
        inner->array[i] = par1 + i * par2;

    return inner->array[inner->length - 1];
}
compile as
cc -c -fPIC pybug.c
cc -shared -o libbug.so pybug.o
Run in Python:
from ctypes import *
sl = CDLL('libbug.so')

# What arguments do functions take / return?
sl.init.argtypes = [POINTER(c_void_p)]
sl.loop.restype = c_float
sl.loop.argtypes = [c_void_p, c_float, c_float, c_int]

# Init takes a pointer to a pointer
px = c_void_p()
sl.init(byref(px))

# Call the loop a couple of times
for i in range(10):
    print sl.loop(px, i, 5, 10*i+5)
You should have a corresponding function to free the data buffer when the caller is done. Otherwise I don't see the issue. Just pass the pointer to loop that you get from init.
init.restype = c_void_p
loop.argtypes = [c_void_p, c_float, c_float]
loop.restype = c_float
I'm not sure what you mean by "ctypes' void pointers are immutable", unless you're talking about c_char_p and c_wchar_p. The issue there is that if you pass a Python string as an argument, it uses Python's private pointer to the string buffer. If a function can change the string, you should first copy it to a c_char or c_wchar array.
Here's a simple example showing the problem of passing a Python string (2.x byte string) as an argument to a function that modifies it. In this case it changes index 0 to '\x00':
>>> import os
>>> from ctypes import *
>>> open('tmp.c', 'w').write("void f(char *s) {s[0] = 0;}")
>>> os.system('gcc -shared -fPIC -o tmp.so tmp.c')
0
>>> tmp = CDLL('./tmp.so')
>>> tmp.f.argtypes = [c_void_p]
>>> tmp.f.restype = None
>>> tmp.f('a')
>>> 'a'
'\x00'
>>> s = 'abc'
>>> tmp.f(s)
>>> s
'\x00bc'
This is specific to passing Python strings as arguments. It isn't a problem to pass pointers to data structures that are intended to be mutable, either ctypes data objects such as a Structure, or pointers returned by libraries.
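By contrast, copying the string into a writable ctypes buffer first, as suggested above, leaves the original Python string alone; a sketch reusing the tmp.f function from the session:
>>> s = 'abc'
>>> buf = create_string_buffer(s)   # writable copy, NUL-terminated
>>> tmp.f(cast(buf, c_void_p))      # the C function modifies the copy
>>> buf.raw
'\x00bc\x00'
>>> s
'abc'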
Is your C code in a DLL? If so, you might consider creating a global pointer in there. init() will do any initialization required and set the pointer to newly allocated memory, and loop() will operate on that memory. Also, don't forget to free it up with a close() function.

Is there any way to use pythonappend with SWIG's new builtin feature?

I have a little project that works beautifully with SWIG. In particular, some of my functions return std::vectors, which get translated to tuples in Python. Now, I do a lot of numerics, so I just have SWIG convert these to numpy arrays after they're returned from the c++ code. To do this, I use something like the following in SWIG.
%feature("pythonappend") My::Cool::Namespace::Data() const %{ if isinstance(val, tuple) : val = numpy.array(val) %}
(Actually, there are several functions named Data, some of which return floats, which is why I check that val is actually a tuple.) This works just beautifully.
But, I'd also like to use the -builtin flag that's now available. Calls to these Data functions are rare and mostly interactive, so their slowness is not a problem, but there are other slow loops that speed up significantly with the builtin option.
The problem is that when I use that flag, the pythonappend feature is silently ignored. Now, Data just returns a tuple again. Is there any way I could still return numpy arrays? I tried using typemaps, but it turned into a giant mess.
Edit:
Borealid has answered the question very nicely. Just for completeness, I include a couple of related but subtly different typemaps that I need because I return by const reference and I use vectors of vectors (don't start!). These are different enough that I wouldn't want anyone else stumbling around trying to figure out the minor differences.
%typemap(out) std::vector<int>& {
    npy_intp result_size = $1->size();
    npy_intp dims[1] = { result_size };
    PyArrayObject* npy_arr = (PyArrayObject*)PyArray_SimpleNew(1, dims, NPY_INT);
    int* dat = (int*) PyArray_DATA(npy_arr);
    for (size_t i = 0; i < result_size; ++i) { dat[i] = (*$1)[i]; }
    $result = PyArray_Return(npy_arr);
}

%typemap(out) std::vector<std::vector<int> >& {
    npy_intp result_size = $1->size();
    npy_intp result_size2 = (result_size > 0 ? (*$1)[0].size() : 0);
    npy_intp dims[2] = { result_size, result_size2 };
    PyArrayObject* npy_arr = (PyArrayObject*)PyArray_SimpleNew(2, dims, NPY_INT);
    int* dat = (int*) PyArray_DATA(npy_arr);
    for (size_t i = 0; i < result_size; ++i) {
        for (size_t j = 0; j < result_size2; ++j) {
            dat[i*result_size2 + j] = (*$1)[i][j];
        }
    }
    $result = PyArray_Return(npy_arr);
}
Edit 2:
Though not quite what I was looking for, similar problems may also be solved using @MONK's approach (explained here).
I agree with you that using typemap gets a little messy, but it is the right way to accomplish this task. You are also right that the SWIG documentation does not directly say that %pythonappend is incompatible with -builtin, but it is strongly implied: %pythonappend adds to the Python proxy class, and the Python proxy class does not exist at all in conjunction with the -builtin flag.
Before, what you were doing was having SWIG convert the C++ std::vector objects into Python tuples, and then passing those tuples back down to numpy - where they were converted again.
What you really want to do is convert them once, at the C level.
Here's some code which will turn all std::vector<int> objects into NumPy integer arrays:
%{
#include "numpy/arrayobject.h"
%}

%init %{
import_array();
%}

%typemap(out) std::vector<int> {
    npy_intp result_size = $1.size();
    npy_intp dims[1] = { result_size };
    PyArrayObject* npy_arr = (PyArrayObject*)PyArray_SimpleNew(1, dims, NPY_INT);
    int* dat = (int*) PyArray_DATA(npy_arr);
    for (size_t i = 0; i < result_size; ++i) {
        dat[i] = $1[i];
    }
    $result = PyArray_Return(npy_arr);
}
This uses the C-level numpy functions to construct and return an array. In order, it:
1. Ensures NumPy's arrayobject.h file is included in the C++ output file
2. Causes import_array to be called when the Python module is loaded (otherwise, all NumPy methods will segfault)
3. Maps any returns of std::vector<int> into NumPy arrays with a typemap
This code should be placed before you %import the headers which contain the functions returning std::vector<int>. Other than that restriction, it's entirely self-contained, so it shouldn't add too much subjective "mess" to your codebase.
If you need other vector types, you can just change the NPY_INT and all the int* and int bits, otherwise duplicating the function above.
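With the typemap in place, the wrapped calls hand back ndarrays directly, so the Python-side conversion step disappears. Roughly, assuming a hypothetical SWIG-built module and a free function Data() standing in for the wrapped methods (both names are placeholders, not from the original code):
import mymodule                     # placeholder for your SWIG-built module

data = mymodule.Data()              # wrapped call that returns std::vector<int>
print type(data)                    # now a numpy.ndarray, no tuple step needed
print data.dtype                    # typically int32, from NPY_INT in the typemap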
