I am learning how to embed Rust functions in Python, and everything works fine if my inputs are ints, but not if they are lists.
If my lib.rs file is:
#[no_mangle]
pub extern fn my_func(x: i32, y: i32) -> i32 {
return x + y;
}
I can use this as follows:
In [1]: from ctypes import cdll
In [2]: lib = cdll.LoadLibrary("/home/user/RustStuff/embed/target/release/libembed.so")
In [3]: lib.my_func(5,6)
Out[3]: 11
However if I change my lib.rs to:
#[no_mangle]
pub extern fn my_func(my_vec: Vec<i32>) -> i32 {
let mut my_sum = 0;
for i in my_vec {
my_sum += i;
}
return my_sum;
}
I can no longer use it in Python (this compiled fine):
In [1]: from ctypes import cdll
In [2]: lib = cdll.LoadLibrary("/home/user/RustStuff/embed/target/release/libembed.so")
In [3]: lib.my_func([2,3,4])
---------------------------------------------------------------------------
ArgumentError Traceback (most recent call last)
<ipython-input-3-454ffc5ba9dd> in <module>()
----> 1 lib.my_func([2,3,4])
ArgumentError: argument 1: <type 'exceptions.TypeError'>: Don't know how to convert parameter 1
The reason I thought this could work is that Python's list and Rust's Vec are both dynamic arrays, but apparently I am missing something here...
Why does my attempt not work? What should I do to fix it?
Don't do this:
#[no_mangle]
pub extern fn my_func(my_vec: Vec<i32>) -> i32 { ... }
You basically never want to accept or return an arbitrary Rust object in an extern function, only ones that are #[repr(C)]. Instead, you should accept something that is representable by C. As 6502 says, the best idea for this particular case would be to accept a pointer and a length.
Rust's Vec is conceptually a pointer to data, a count, and a capacity. You are able to modify a Vec by adding or removing objects, which can cause reallocation to happen. This is doubly bad because it is likely that Python and Rust use different allocators that are not compatible with each other. Segfaults lie this way! You really want a slice.
Instead, do something like this on the Rust side:
extern crate libc;
use libc::{size_t,int32_t};
use std::slice;
#[no_mangle]
pub extern fn my_func(data: *const int32_t, length: size_t) -> int32_t {
let nums = unsafe { slice::from_raw_parts(data, length as usize) };
nums.iter().fold(0, |acc, i| acc + i)
}
Namely, you are using the guaranteed-to-match C types, and then converting the pointer and length to something Rust knows how to deal with.
I'm no Pythonista, but this cobbled-together code (with help from How do I convert a Python list into a C array by using ctypes?) seems to work with the Rust I have above:
import ctypes
lib = ctypes.cdll.LoadLibrary("./target/debug/libpython.dylib")
lib.my_func.argtypes = (ctypes.POINTER(ctypes.c_int32), ctypes.c_size_t)
list_to_sum = [1,2,3,4]
c_array = (ctypes.c_int32 * len(list_to_sum))(*list_to_sum)
print(lib.my_func(c_array, len(list_to_sum)))
Of course, you probably want to wrap that to make it nicer for the caller of your code.
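One way such a wrapper might look is sketched below. To keep it runnable without the compiled library, a ctypes callback with the same C signature stands in for the Rust my_func; the names SUM_PROTO and sum_list are made up for this illustration.

```python
import ctypes

# Assumption: this callback only stands in for the Rust `my_func` so the
# sketch runs without the shared library. With the real library you would
# use `lib.my_func` and set its argtypes/restype instead.
SUM_PROTO = ctypes.CFUNCTYPE(
    ctypes.c_int32, ctypes.POINTER(ctypes.c_int32), ctypes.c_size_t
)

@SUM_PROTO
def my_func(data, length):
    # Same contract as the Rust side: a pointer plus an element count.
    return sum(data[i] for i in range(length))

def sum_list(values):
    """Hypothetical wrapper: copy a list into a C array, pass pointer + length."""
    c_array = (ctypes.c_int32 * len(values))(*values)
    return my_func(c_array, len(values))

print(sum_list([1, 2, 3, 4]))
```

Callers then deal only with ordinary Python lists, and the array construction stays in one place.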
ctypes is about C bindings and in C there's no such a thing as a dynamic array.
The closest object you can pass to a C function is a pointer to an integer, which is not a dynamic array because:
It doesn't carry the size information
You cannot grow or shrink the area, just access existing elements
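Both limitations are visible from the Python side too. The following is a small sketch using only ctypes objects (no library is loaded): the array type knows its length, while a pointer cast from it does not.

```python
import ctypes

arr = (ctypes.c_int32 * 4)(10, 20, 30, 40)
ptr = ctypes.cast(arr, ctypes.POINTER(ctypes.c_int32))

print(len(arr))   # the ctypes array type carries its length
print(ptr[2])     # the pointer can only index elements that already exist

# A ctypes pointer has no length of its own.
try:
    len(ptr)
except TypeError:
    pointer_has_len = False
else:
    pointer_has_len = True

# So the element count has to travel alongside the pointer.
values = [ptr[i] for i in range(len(arr))]
```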
A simple alternative to passing pointers (and being very careful not to read past the size) is a function-based API.
For example:
getNumberOfThings() -> number
getThing(index) -> thing
but the Python code would then become like
def func():
    n = getNumberOfThings()
    return [getThing(i) for i in range(n)]
The counterpart (passing a variable number of elements) would be
def func2(L):
    setNumberOfThings(len(L))
    for i, x in enumerate(L):
        setThing(i, x)
Related
I am new to both C and ctypes, but I cannot seem to find an answer on how to do this, particularly with a numpy array.
C Code
// Import/Export Macros
#define DllImport __declspec( dllimport )
#define DllExport __declspec( dllexport )
// Test function for receiving and transmitting arrays
extern "C"
DllExport void c_fun(char **string_array)
{
string_array[0] = "foo";
string_array[1] = "bar";
string_array[2] = "baz";
}
Python Code
import numpy as np
import ctypes
# Load the DLL library...
# Define function argtypes
lib.c_fun.argtypes = [np.ctypeslib.ndpointer(ctypes.c_char, ndim = 2, flags="C_CONTIGUOUS")]
# Initialize, call, and print
string_array = np.empty((3,10),dtype=ctypes.c_char)
lib.c_fun(string_array)
print(string_array)
I am sure there is some encoding/decoding that needs to happen as well, but I am not sure how/which. Thanks!
Addressing the C code part of the question only...
As noted in comments, C does not allow assignments in this way if the three variables shown are defined as char arrays:
string_array[0] = "foo";
string_array[1] = "bar";
string_array[2] = "baz";
Use the following:
strcpy(string_array[0], "foo");
strcpy(string_array[1], "bar");
strcpy(string_array[2], "baz");
And as long as the caller to this function is pre-allocating and freeing memory for the buffers, this part of the solution is now at least syntactically correct.
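On the Python side, that pre-allocation could look roughly like this. It is only a sketch: BUF_LEN is an assumed capacity, and ctypes.memmove stands in for the strcpy calls that the DLL would perform.

```python
import ctypes

BUF_LEN = 10  # assumed capacity per string, including the terminating NUL
words = [b"foo", b"bar", b"baz"]

# Three separate writable buffers, owned (and later freed) by Python.
buffers = [ctypes.create_string_buffer(BUF_LEN) for _ in words]

# The equivalent of `char *string_array[3]`: one pointer per buffer.
string_array = (ctypes.c_char_p * len(buffers))(
    *[ctypes.cast(buf, ctypes.c_char_p) for buf in buffers]
)

# Stand-in for the strcpy calls inside c_fun (assumption: no DLL is loaded here).
for buf, word in zip(buffers, words):
    ctypes.memmove(buf, word, len(word))

# Reading through the pointers yields bytes, which still need decoding.
result = [string_array[i].decode("ascii") for i in range(len(buffers))]
print(result)
```

Note that this is an array of pointers, which is a different memory layout from a contiguous two-dimensional char array.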
But if the strings do indeed need to be immutable to be compatible with Python, then in the caller function allocate memory to create char **string_array such that you can pass an array of 3 pointers as the argument. For example:
char **string_array = malloc(3*sizeof(*string_array));//creates array of 3 pointers.
Then call it as:
c_fun(string_array);
This allows use of the DLL call just as shown in your original post:
DllExport void c_fun(char **string_array)
{
//array of pointers being assigned to addresses of 3 string literals
string_array[0] = "foo";//these will now be immutable strings
string_array[1] = "bar";
string_array[2] = "baz";
}
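From Python, the matching caller might be sketched as follows. A ctypes callback stands in for c_fun so the example runs without the DLL, and WORDS plays the role of the string literals; both are assumptions made for illustration.

```python
import ctypes

# Kept at module level so the bytes objects outlive the call, much like
# string literals inside the DLL would.
WORDS = (b"foo", b"bar", b"baz")

# Assumption: stand-in for `void c_fun(char **string_array)`.
C_FUN_PROTO = ctypes.CFUNCTYPE(None, ctypes.POINTER(ctypes.c_char_p))

@C_FUN_PROTO
def c_fun(string_array):
    # Store one pointer per slot, as the C function does.
    for i, word in enumerate(WORDS):
        string_array[i] = word

# Caller side: an array of three (initially NULL) char pointers.
string_array = (ctypes.c_char_p * 3)()
c_fun(string_array)

# Each element reads back as bytes, so decode to get Python strings.
decoded = [s.decode("ascii") for s in string_array]
print(decoded)
```

With the real DLL, the same array would be passed after setting lib.c_fun.argtypes to [ctypes.POINTER(ctypes.c_char_p)].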
An apparent calling convention mismatch exists where the position and contents of arguments are incorrect when loading a small function using Python's Ctypes module.
In the example I built up while trying to get something working, one positional argument gets another's value while the other gets garbage.
The Ctypes docs state that cdll.LoadLibrary expects the cdecl convention. Resulting standard boilerplate:
# Tell Rustc to output a dynamically linked library
crate-type = ["cdylib"]
// Specify clean symbol and cdecl calling convention
#[no_mangle]
pub extern "cdecl" fn boring_function(
n: *mut size_t,
in_data: *mut [c_ulong],
out_data: *mut [c_double],
garbage: *mut [c_double],
) -> c_int {
//...
Loading our library after build...
lib = ctypes.CDLL("nothing/lib/playtoys.so")
lib.boring_function.restype = ctypes.c_int
Load the result into Python and call it with some initialized data
data_len = 8
in_array_t = ctypes.c_ulong * data_len
out_array_t = ctypes.c_double * data_len
in_array = in_array_t(7, 7, 7, 7, 7, 8, 7, 7)
out_array = out_array_t(10000.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9)
val = ctypes.c_size_t(data_len)
in_array_p = ctypes.byref(in_array)
out_array_p = ctypes.byref(out_array)
n_p = ctypes.byref(val)
garbage = n_p
res = lib.boring_function(n_p,
in_array_p,
# garbage cannot be observed in any callee arg
ctypes.cast(garbage, ctypes.POINTER(out_array_t)),
out_array_p)
Notice the garbage parameter. It is so-named because it winds up containing a garbage address. Note that its position is swapped with out_array_p in the Python call and the Rust declaration.
[src/hello.rs:29] n = 0x00007f56dbce5bc0
[src/hello.rs:30] in_data = 0x00007f56f81e3270
[src/hello.rs:31] out_data = 0x00007f56f81e3230
[src/hello.rs:32] garbage = 0x000000000000000a
in_data, out_data, and n print the correct values in this configuration. The positional swap between garbage and out_data makes this possible.
Other examples using more or fewer arguments reveal similar patterns of intermediate ordered variables containing odd values that resemble addresses earlier in the program or unrelated garbage.
Either I'm missing something in how I set up the calling convention or some special magic in argtypes must be missing. So far I had no luck with changing the declared calling conventions or explicit argtypes. Are there any other knobs I should try turning?
in_data: *mut [c_ulong],
A slice is not an FFI-safe data type. Namely, Rust's slices use fat pointers, which take up two pointer-sized values.
You need to pass the data pointer and length as two separate arguments.
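To get a feel for why the arguments appear shifted, the size difference can be modeled in ctypes. This is an illustration only: FatPointer is a made-up structure, and the field order of a real Rust fat pointer is not something to rely on.

```python
import ctypes

class FatPointer(ctypes.Structure):
    # Hypothetical model of "data pointer + length". It only illustrates
    # the size, not a guaranteed Rust layout.
    _fields_ = [("data", ctypes.c_void_p), ("len", ctypes.c_size_t)]

thin = ctypes.sizeof(ctypes.POINTER(ctypes.c_ulong))  # what Python actually passes
fat = ctypes.sizeof(FatPointer)                        # what a slice pointer needs

print(thin, fat)
```

Python passes one pointer-sized value per argument, while the callee expects two for each slice parameter, so later parameters end up reading values that were meant for other ones.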
See also:
Why can comparing two seemingly equal pointers with == return false?
Rust functions with slice arguments in The Rust FFI Omnibus
The complete example from the Omnibus:
extern crate libc;
use libc::{uint32_t, size_t};
use std::slice;
#[no_mangle]
pub extern fn sum_of_even(n: *const uint32_t, len: size_t) -> uint32_t {
let numbers = unsafe {
assert!(!n.is_null());
slice::from_raw_parts(n, len as usize)
};
let sum =
numbers.iter()
.filter(|&v| v % 2 == 0)
.fold(0, |acc, v| acc + v);
sum as uint32_t
}
#!/usr/bin/env python3
import sys, ctypes
from ctypes import POINTER, c_uint32, c_size_t
prefix = {'win32': ''}.get(sys.platform, 'lib')
extension = {'darwin': '.dylib', 'win32': '.dll'}.get(sys.platform, '.so')
lib = ctypes.cdll.LoadLibrary(prefix + "slice_arguments" + extension)
lib.sum_of_even.argtypes = (POINTER(c_uint32), c_size_t)
lib.sum_of_even.restype = ctypes.c_uint32
def sum_of_even(numbers):
    buf_type = c_uint32 * len(numbers)
    buf = buf_type(*numbers)
    return lib.sum_of_even(buf, len(numbers))
print(sum_of_even([1,2,3,4,5,6]))
Disclaimer: I am the primary author of the Omnibus
Following these answers, I've currently defined a Rust 1.0 function as follows, in order to be callable from Python using ctypes:
use std::vec;
extern crate libc;
use libc::{c_int, c_float, size_t};
use std::slice;
#[no_mangle]
pub extern fn convert_vec(input_lon: *const c_float,
lon_size: size_t,
input_lat: *const c_float,
lat_size: size_t) -> Vec<(i32, i32)> {
let input_lon = unsafe {
slice::from_raw_parts(input_lon, lon_size as usize)
};
let input_lat = unsafe {
slice::from_raw_parts(input_lat, lat_size as usize)
};
let combined: Vec<(i32, i32)> = input_lon
.iter()
.zip(input_lat.iter())
.map(|each| convert(*each.0, *each.1))
.collect();
return combined
}
And I'm setting up the Python part like so:
from ctypes import *
class Int32_2(Structure):
    _fields_ = [("array", c_int32 * 2)]
rust_bng_vec = lib.convert_vec_py
rust_bng_vec.argtypes = [POINTER(c_float), c_size_t,
POINTER(c_float), c_size_t]
rust_bng_vec.restype = POINTER(Int32_2)
This seems to be OK, but I'm:
Not sure how to transform combined (a Vec<(i32, i32)>) to a C-compatible structure, so it can be returned to my Python script.
Not sure whether I should be returning a reference (return &combined?) and how I would have to annotate the function with the appropriate lifetime specifier if I did
The most important thing to note is that there is no such thing as a tuple in C. C is the lingua franca of library interoperability, and you will be required to restrict yourself to abilities of this language. It doesn't matter if you are talking between Rust and another high-level language; you have to speak C.
There may not be tuples in C, but there are structs. A two-element tuple is just a struct with two members!
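As a quick illustration of that idea on the Python side, a list of tuples can be converted to an array of two-field structures and back. The names Pair, to_pair_array, and from_pair_array are invented for this sketch.

```python
import ctypes

class Pair(ctypes.Structure):
    # A two-element tuple expressed as a C-compatible struct.
    _fields_ = [("a", ctypes.c_int32), ("b", ctypes.c_int32)]

def to_pair_array(tuples):
    """Pack a list of (a, b) tuples into a contiguous array of Pair."""
    return (Pair * len(tuples))(*[Pair(a, b) for a, b in tuples])

def from_pair_array(array):
    """Unpack an array of Pair back into Python tuples."""
    return [(p.a, p.b) for p in array]

pairs = to_pair_array([(1, 9), (2, 8), (3, 7)])
print(ctypes.sizeof(Pair))      # two 32-bit members
print(from_pair_array(pairs))
```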
Let's start with the C code that we would write:
#include <stdio.h>
#include <stdint.h>
typedef struct {
uint32_t a;
uint32_t b;
} tuple_t;
typedef struct {
void *data;
size_t len;
} array_t;
extern array_t convert_vec(array_t lat, array_t lon);
int main() {
uint32_t lats[3] = {0, 1, 2};
uint32_t lons[3] = {9, 8, 7};
array_t lat = { .data = lats, .len = 3 };
array_t lon = { .data = lons, .len = 3 };
array_t fixed = convert_vec(lat, lon);
tuple_t *real = fixed.data;
for (int i = 0; i < fixed.len; i++) {
printf("%d, %d\n", real[i].a, real[i].b);
}
return 0;
}
We've defined two structs — one to represent our tuple, and another to represent an array, as we will be passing those back and forth a bit.
We will follow this up by defining the exact same structs in Rust and define them to have the exact same members (types, ordering, names). Importantly, we use #[repr(C)] to let the Rust compiler know to not do anything funky with reordering the data.
extern crate libc;
use std::slice;
use std::mem;
#[repr(C)]
pub struct Tuple {
a: libc::uint32_t,
b: libc::uint32_t,
}
#[repr(C)]
pub struct Array {
data: *const libc::c_void,
len: libc::size_t,
}
impl Array {
unsafe fn as_u32_slice(&self) -> &[u32] {
assert!(!self.data.is_null());
slice::from_raw_parts(self.data as *const u32, self.len as usize)
}
fn from_vec<T>(mut vec: Vec<T>) -> Array {
// Important to make length and capacity match
// A better solution is to track both length and capacity
vec.shrink_to_fit();
let array = Array { data: vec.as_ptr() as *const libc::c_void, len: vec.len() as libc::size_t };
// Whee! Leak the memory, and now the raw pointer (and
// eventually C) is the owner.
mem::forget(vec);
array
}
}
#[no_mangle]
pub extern fn convert_vec(lon: Array, lat: Array) -> Array {
let lon = unsafe { lon.as_u32_slice() };
let lat = unsafe { lat.as_u32_slice() };
let vec =
lat.iter().zip(lon.iter())
.map(|(&lat, &lon)| Tuple { a: lat, b: lon })
.collect();
Array::from_vec(vec)
}
We must never accept or return non-repr(C) types across the FFI boundary, so we pass across our Array. Note that there's a good amount of unsafe code, as we have to convert an unknown pointer to data (c_void) to a specific type. That's the price of being generic in C world.
Let's turn our eye to Python now. Basically, we just have to mimic what the C code did:
import ctypes
class FFITuple(ctypes.Structure):
    _fields_ = [("a", ctypes.c_uint32),
                ("b", ctypes.c_uint32)]
class FFIArray(ctypes.Structure):
    _fields_ = [("data", ctypes.c_void_p),
                ("len", ctypes.c_size_t)]
    # Allow implicit conversions from a sequence of 32-bit unsigned
    # integers.
    @classmethod
    def from_param(cls, seq):
        return cls(seq)
    # Wrap sequence of values. You can specify another type besides a
    # 32-bit unsigned integer.
    def __init__(self, seq, data_type = ctypes.c_uint32):
        array_type = data_type * len(seq)
        raw_seq = array_type(*seq)
        self.data = ctypes.cast(raw_seq, ctypes.c_void_p)
        self.len = len(seq)
# A conversion function that cleans up the result value to make it
# nicer to consume.
def void_array_to_tuple_list(array, _func, _args):
    tuple_array = ctypes.cast(array.data, ctypes.POINTER(FFITuple))
    return [tuple_array[i] for i in range(0, array.len)]
lib = ctypes.cdll.LoadLibrary("./target/debug/libtupleffi.dylib")
lib.convert_vec.argtypes = (FFIArray, FFIArray)
lib.convert_vec.restype = FFIArray
lib.convert_vec.errcheck = void_array_to_tuple_list
for tupl in lib.convert_vec([1,2,3], [9,8,7]):
    print(tupl.a, tupl.b)
Forgive my rudimentary Python. I'm sure an experienced Pythonista could make this look a lot prettier! Thanks to @eryksun for some nice advice on how to make the consumer side of calling the method much nicer.
A word about ownership and memory leaks
In this example code, we've leaked the memory allocated by the Vec. Theoretically, the FFI code now owns the memory, but realistically, it can't do anything useful with it. To have a fully correct example, you'd need to add another method that would accept the pointer back from the caller, transform it back into a Vec, then allow Rust to drop the value. This is the only safe way, as Rust is almost guaranteed to use a different memory allocator than the one your FFI language is using.
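On the consumer side, one possible shape for this is to copy the results into Python-owned objects and then always hand the buffer back. The sketch below assumes a hypothetical release function (called drop_array here) exported by the Rust library; it is replaced by a plain Python stand-in so the example runs on its own, and FFIArray is redefined without the custom constructor for brevity.

```python
import ctypes
from contextlib import contextmanager

class FFITuple(ctypes.Structure):
    _fields_ = [("a", ctypes.c_uint32), ("b", ctypes.c_uint32)]

class FFIArray(ctypes.Structure):
    _fields_ = [("data", ctypes.c_void_p), ("len", ctypes.c_size_t)]

@contextmanager
def owned_by_rust(array, release):
    """Yield the array, then always give it back to the side that allocated it."""
    try:
        yield array
    finally:
        release(array)

def copy_tuples(array):
    """Copy elements into Python tuples before the buffer is released."""
    items = ctypes.cast(array.data, ctypes.POINTER(FFITuple))
    return [(items[i].a, items[i].b) for i in range(array.len)]

# Stand-ins (assumptions): a ctypes array plays the Rust allocation, and
# drop_array plays a hypothetical exported release function.
backing = (FFITuple * 2)(FFITuple(1, 9), FFITuple(2, 8))
released = []

def drop_array(array):
    released.append(array.len)

fake_result = FFIArray(ctypes.addressof(backing), len(backing))
with owned_by_rust(fake_result, drop_array) as array:
    tuples = copy_tuples(array)
print(tuples)
```

Copying before release matters because the pointer must not be used once Rust has dropped the Vec.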
Not sure whether I should be returning a reference and how I would have to annotate the function with the appropriate lifetime specifier if I did
No, you don't want to (read: can't) return a reference. If you could, then the ownership of the item would end with the function call, and the reference would point to nothing. This is why we need to do the two-step dance with mem::forget and returning a raw pointer.
I have a C function that must be callable from C and Python.
I'm having trouble figuring out how to pass a python list of c-type structs,
each of which contains several nested structs, to the c function.
A single one of these structs looks like this in python:
class STATION_MM_NODE(ctypes.Structure):
    _fields_ = [
        ("signal", MM_STRUCT),
        ("noise", MM_STRUCT),
        ("signalWindowLen", ctypes.c_double),
        ("metadata", SAC_PZ)
    ]
And like this in C:
typedef struct stationMMnode {
struct mantleMagStruct *signal;
struct mantleMagStruct *noise;
double signalWindowLen;
SAC_PZ metadata;
} stationMMnode_t;
The c function that takes an array of stationMMnode structs is callable as:
double magnitudeCompute_Mw_Mm_Event(stationMMnode_t **stationMMarray, int numStations);
For instance, I can call it purely from C as in:
int testfunc() {
stationMMnode_t *node1 = malloc(sizeof(struct stationMMnode));
node1->signalWindowLen = 500;
stationMMnode_t *node2 = malloc(sizeof(struct stationMMnode));
node2->signalWindowLen = 100;
struct stationMMnode *nodes[2];
nodes[0] = node1;
nodes[1] = node2;
magnitudeCompute_Mw_Mm_Event(nodes, 2); // Works!
}
In python, I can create a list of nodes that looks similar to the c array of structs:
stationMMnodes = []
...
node = get_stationMMnode() # Returns a STATION_MM_NODE
node.signal = mm_signal
node.noise = mm_noise
node.metadata = sacPoleZero
stationMMnodes.append(node)
...
wrap_lib.magnitudeCompute_Mw_Mm_Event(stationMMnodes, numStations) # Does NOT work
where I've defined the argtypes as:
wrap_lib.magnitudeCompute_Mw_Mm_Event.argtypes = [ctypes.POINTER(STATION_MM_NODE), ctypes.c_int]
The model I'm using above (passing a ctype pointer to a c-style struct to a c function that takes a pointer to struct) seems to work fine when I am passing in a pointer to a single struct, however, for a pointer to an array of structs, it seems to break down. In addition, I am uncertain of what the python memory layout is for a list of structs versus an array of pointers to struct (as the C function is expecting).
Any help would be greatly appreciated!
Update: I found the following link very helpful:
python ctypes array of structs
I solved my problem by:
1. Declaring an array of pointers to my struct:
nodeArrayType = ctypes.POINTER(STATION_MM_NODE) * 1024
nodeArray = nodeArrayType()
nstn = 0
2. Writing a C function to join the member structs into a larger struct (=a node) and return a pointer to that struct - which is stored in nodeArray[].
nodeArray[nstn] = wrap_libmth.libmth.makeNode(node.signal, node.noise, node.metadata)
nstn += 1
3. Fixing the argtype of the C function that receives the pointer to struct array:
wrap_libmth.libmth.magnitudeCompute_Mw_Mm_Event.argtypes = [ctypes.POINTER(ctypes.POINTER(STATION_MM_NODE)), ctypes.c_int]
So ... I have it working, but like most things with Python, I feel like I'm holding the tiger by the tail as I don't fully understand exactly why it works and what (better) alternatives would be (e.g., the C hack makeNode() to return a pointer to a STATION_MM_NODE struct is less than satisfactory - it would be better to generate this struct fully in python).
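For what it's worth, the array of pointers can probably be built entirely in Python, without a helper like makeNode(). The sketch below uses simplified stand-in types (MMStruct, StationNode, no metadata member) and a ctypes callback in place of magnitudeCompute_Mw_Mm_Event, so all of these names are assumptions. Note that the signal and noise members are declared as pointers here, matching the C struct shown above.

```python
import ctypes

class MMStruct(ctypes.Structure):      # simplified stand-in for MM_STRUCT
    _fields_ = [("value", ctypes.c_double)]

class StationNode(ctypes.Structure):   # simplified stand-in for STATION_MM_NODE
    _fields_ = [
        ("signal", ctypes.POINTER(MMStruct)),  # pointer members, as in the C struct
        ("noise", ctypes.POINTER(MMStruct)),
        ("signalWindowLen", ctypes.c_double),
    ]

NodePtr = ctypes.POINTER(StationNode)

# Assumption: callback standing in for a C function that takes
# (stationMMnode_t **, int) and returns a double.
PROTO = ctypes.CFUNCTYPE(ctypes.c_double, ctypes.POINTER(NodePtr), ctypes.c_int)

@PROTO
def total_window_len(nodes, count):
    return sum(nodes[i].contents.signalWindowLen for i in range(count))

signal, noise = MMStruct(1.0), MMStruct(0.5)
nodes = [
    StationNode(ctypes.pointer(signal), ctypes.pointer(noise), length)
    for length in (500.0, 100.0)
]

# A C array of pointers to struct, built from the Python list of nodes.
node_array = (NodePtr * len(nodes))(*[ctypes.pointer(n) for n in nodes])
result = total_window_len(node_array, len(nodes))
print(result)
```

A plain Python list is not laid out like a C array, which is why the list has to be converted into a ctypes array of pointers before the call.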
I have a self-made C library that I want to access using python. The problem is that the code consists essentially of two parts, an initialization to read in data from a number of files and a few calculations that need to be done only once. The other part is called in a loop and uses the data generated before repeatedly. To this function I want to pass parameters from python.
My idea was to write two C wrapper functions, "init" and "loop" - "init" reads the data and returns a void pointer to a structure that "loop" can use together with additional parameters that I can pass on from python. Something like
void *init() {
mystruct *ret = (mystruct *)malloc(sizeof(mystruct));
/* Fill ret with data */
return ret;
}
float loop(void *data, float par1, float par2) {
/* do stuff with data, par1, par2, return result */
}
I tried calling "init" from python as a c_void_p, but since "loop" changes some of the contents of "data" and ctypes' void pointers are immutable, this did not work.
Other solutions to similar problems I saw seem to require knowledge of how much memory "init" would use, and I do not know that.
Is there a way to pass data from one C function to another through python without telling python exactly what or how much it is? Or is there another way to solve my problem?
I tried (and failed) to write a minimum crashing example, and after some debugging it turned out there was a bug in my C code. Thanks to everyone who replied!
Hoping that this might help other people, here is a sort-of-minimal working version (still without separate 'free' - sorry):
pybug.c:
#include <stdio.h>
#include <stdlib.h>
typedef struct inner_struct_s {
int length;
float *array;
} inner_struct_t;
typedef struct mystruct_S {
int id;
float start;
float end;
inner_struct_t *inner;
} mystruct_t;
void init(void **data) {
int i;
mystruct_t *mystruct = (mystruct_t *)malloc(sizeof(mystruct_t));
inner_struct_t *inner = (inner_struct_t *)malloc(sizeof(inner_struct_t));
inner->length = 10;
inner->array = calloc(inner->length, sizeof(float));
for (i=0; i<inner->length; i++)
inner->array[i] = 2*i;
mystruct->id = 0;
mystruct->start = 0;
mystruct->end = inner->length;
mystruct->inner = inner;
*data = mystruct;
}
float loop(void *data, float par1, float par2, int newsize) {
mystruct_t *str = data;
inner_struct_t *inner = str->inner;
int i;
inner->length = newsize;
inner->array = realloc(inner->array, newsize * sizeof(float));
for (i=0; i<inner->length; i++)
inner->array[i] = par1 + i * par2;
return inner->array[inner->length-1];
}
compile as
cc -c -fPIC pybug.c
cc -shared -o libbug.so pybug.o
Run in python:
from ctypes import *
sl = CDLL('libbug.so')
# What arguments do functions take / return?
sl.init.argtypes = [POINTER(c_void_p)]
sl.loop.restype = c_float
sl.loop.argtypes = [c_void_p, c_float, c_float, c_int]
# Init takes a pointer to a pointer
px = c_void_p()
sl.init(byref(px))
# Call the loop a couple of times
for i in range(10):
    print(sl.loop(px, i, 5, 10*i+5))
You should have a corresponding function to free the data buffer when the caller is done. Otherwise I don't see the issue. Just pass the pointer to loop that you get from init.
init.restype = c_void_p
loop.argtypes = [c_void_p, c_float, c_float]
loop.restype = c_float
I'm not sure what you mean by "ctypes' void pointers are immutable", unless you're talking about c_char_p and c_wchar_p. The issue there is if you pass a Python string as an argument it uses Python's private pointer to the string buffer. If a function can change the string, you should first copy it to a c_char or c_wchar array.
Here's a simple example showing the problem of passing a Python string (2.x byte string) as an argument to a function that modifies it. In this case it changes index 0 to '\x00':
>>> import os
>>> from ctypes import *
>>> open('tmp.c', 'w').write("void f(char *s) {s[0] = 0;}")
>>> os.system('gcc -shared -fPIC -o tmp.so tmp.c')
0
>>> tmp = CDLL('./tmp.so')
>>> tmp.f.argtypes = [c_void_p]
>>> tmp.f.restype = None
>>> tmp.f('a')
>>> 'a'
'\x00'
>>> s = 'abc'
>>> tmp.f(s)
>>> s
'\x00bc'
This is specific to passing Python strings as arguments. It isn't a problem to pass pointers to data structures that are intended to be mutable, either ctypes data objects such as a Structure, or pointers returned by libraries.
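Copying the string into a ctypes buffer first, as suggested above, might look like this. ctypes.memset stands in for the C function f so that nothing needs to be compiled; that substitution is an assumption of the sketch.

```python
import ctypes

original = b"abc"

# A writable, NUL-terminated copy owned by ctypes.
buf = ctypes.create_string_buffer(original)

# Stand-in for `void f(char *s) { s[0] = 0; }`: zero the first byte.
ctypes.memset(buf, 0, 1)

print(repr(buf.raw))     # the copy was modified
print(repr(original))    # the Python bytes object is untouched
```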
Is your C code in a DLL? If so, you might consider creating a global pointer in there. init() would do any initialization required and set the pointer to newly allocated memory, and loop() would operate on that memory. Also, don't forget to free it with a close() function.