Dereference FFI pointer in Python to get underlying array

I have a C FFI written in Rust, in src/lib.rs, which looks like the following:
// compile with $ cargo build
extern crate libc;
use self::libc::{size_t, int32_t};
use std::cmp::min;
use std::slice;
#[no_mangle]
pub extern "C" fn rle_new(values_data: *const int32_t, values_length: size_t) -> *mut Rle {
    let values = unsafe { slice::from_raw_parts(values_data, values_length as usize).to_vec() };
    return Box::into_raw(Box::new(Rle::new(values)));
}
#[no_mangle]
pub extern "C" fn rle_free(ptr: *mut Rle) {
    if ptr.is_null() {
        return;
    }
    unsafe {
        Box::from_raw(ptr);
    }
}
#[no_mangle]
pub extern "C" fn rle_values_size(rle: *mut Rle) -> int32_t {
    unsafe { (*rle).values.len() as i32 }
}
#[no_mangle]
pub extern "C" fn rle_values(rle: *mut Rle) -> *mut int32_t {
    unsafe { &mut (*rle).values[0] }
}
#[derive(Debug, PartialEq)]
pub struct Rle {
    pub values: Vec<i32>,
}
impl Rle {
    pub fn new(values: Vec<i32>) -> Self {
        return Rle { values: values };
    }
}
This is my Cargo.toml in the project base folder:
[package]
name = "minimal_example"
version = "0.1.0"
authors = ["Dumbass"]
[dependencies]
libc = "0.2.16"
[lib]
crate-type = ["dylib"] # you might need a different type on linux/windows ?
This is the Python code calling Rust, also put in the base folder:
import os
import sys, ctypes
from ctypes import c_char_p, c_uint32, Structure, POINTER, c_int32, c_size_t, pointer
class RleS(Structure):
    pass
prefix = {'win32': ''}.get(sys.platform, 'lib')
extension = {'darwin': '.dylib', 'win32': '.dll'}.get(sys.platform, '.so')
libpath = os.environ.get("LD_LIBRARY_PATH", "target/debug") + "/"
libpath = libpath + prefix + "minimal_example" + extension
try:
    lib = ctypes.cdll.LoadLibrary(libpath)
except OSError:
    print("Library not found at " + libpath)
    sys.exit()
lib.rle_new.restype = POINTER(RleS)
lib.rle_free.argtypes = (POINTER(RleS), )
lib.rle_values.argtypes = (POINTER(RleS), )
lib.rle_values.restypes = POINTER(c_int32)
lib.rle_values_size.argtypes = (POINTER(RleS), )
lib.rle_values_size.restypes = c_int32
class Rle:
    def __init__(self, values):
        values_length = len(values)
        values_array = (c_int32 * len(values))(*values)
        self.obj = lib.rle_new(values_array, c_size_t(values_length))
    def __enter__(self):
        return self
    def __exit__(self, exc_type, exc_value, traceback):
        lib.rle_free(self.obj)
    def __str__(self):
        values_size = lib.rle_values_size(self.obj)
        print(values_size, "values_size") # prints correct value
        values_pointer = lib.rle_values(self.obj)
        print("values_pointer:", values_pointer)
        ar = ctypes.cast(values_pointer, ctypes.POINTER(ctypes.c_int32)).contents
        print(ar) # segfaults!
rle = Rle([1, 1, 2] * 10)
print(rle)
I have good reason to believe that the C code is correct, since the rle_values_size and rle_values refer to the same object, namely a Rust vector within a struct, and the rle_values_size function works.
However, when I try to dereference the pointer given by rle_values and read it as an array I get segfaults.
I have tried every single permutation of code snippets I have found on Stack Overflow, but it segfaults.
Why is this crashing? What am I doing wrong?
I added the Rust tag since I might be getting the address of the vector in the wrong way.
Ps. If somebody also knows how to read this directly into a numpy array I would upvote that too.
Background info: How do I return an array in a pub extern "C" fn?

The cast should be the first warning sign. Why do you have to cast from the type to what should be the same type? This is because there are simple typos:
lib.rle_values.restype = POINTER(c_int32)
lib.rle_values_size.restype = c_int32
Note that it's supposed to be restype, not restypes.
def __str__(self):
    values_size = lib.rle_values_size(self.obj)
    print(values_size, "values_size")
    values_pointer = lib.rle_values(self.obj)
    print("values_pointer:", values_pointer)
    thing = values_pointer[:values_size]
    return str(thing)
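The slicing step above can be tried without the Rust library. The following is a minimal sketch, assuming a ctypes array as a stand-in for the memory that rle_values would point at; the names buffer and pointer are illustrative only. It shows that .contents dereferences a single element, while slicing with a known length copies the whole run into a Python list.

```python
import ctypes

values = [1, 1, 2] * 3

# Stand-in for the Rust-owned buffer: a C array of int32 owned by Python.
buffer = (ctypes.c_int32 * len(values))(*values)

# What a correctly declared restype of POINTER(c_int32) would hand back.
pointer = ctypes.cast(buffer, ctypes.POINTER(ctypes.c_int32))

# .contents dereferences exactly one element, not the whole array.
first = pointer.contents.value

# A pointer carries no length, so the slice bound must come from elsewhere
# (rle_values_size in the question).
as_list = pointer[:len(values)]
```

The slice copies the data, so the resulting list stays valid even after the underlying buffer is freed.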
It's also better to use as_mut_ptr:
#[no_mangle]
pub extern "C" fn rle_values(rle: *mut Rle) -> *mut int32_t {
    let rle = unsafe { &mut *rle };
    rle.values.as_mut_ptr()
}
Running the program appears to work:
$ LD_LIBRARY_PATH=$PWD/target/debug/ python3 main.py
new
30 values_size
values_pointer: <__main__.LP_c_int object at 0x10f124048>
[1, 1, 2, 1, 1, 2, 1, 1, 2, 1, 1, 2, 1, 1, 2, 1, 1, 2, 1, 1, 2, 1, 1, 2, 1, 1, 2, 1, 1, 2]
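The restype typo is easy to miss because ctypes function objects accept arbitrary attributes without complaint. Here is a small sketch of that behaviour, assuming CPython: it uses ctypes.pythonapi and Py_GetVersion purely as a function that is always available, not because they relate to the Rust library.

```python
import ctypes
import sys

# Py_GetVersion returns a C string (const char *); it is used here only
# because it exists in every CPython process.
get_version = ctypes.pythonapi.Py_GetVersion

# Misspelled attribute: no error is raised, and ctypes never looks at it.
get_version.restypes = ctypes.c_char_p
typo_was_ignored = get_version.restype is not ctypes.c_char_p

# Correct spelling: ctypes now converts the returned pointer to bytes.
get_version.restype = ctypes.c_char_p
version = get_version().decode()
```

With the misspelled attribute, the function keeps its default return type, which is why the pointer in the question came back in a form that needed a cast.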
I'd also recommend the following:
Declare restype for every function. The default ctypes return type is a C int, so leaving it unset for rle_free is not a good idea; the function returns nothing, so its restype should be None (void).
Return an unsigned number for the length of the data; what would -53 items mean?
Reduce the scope of each unsafe block to just the part that is unsafe, plus the code that ensures it is actually safe.
Speaking of which, you could check for NULL pointers in each function:
#[no_mangle]
pub extern "C" fn rle_values_size(rle: *mut Rle) -> int32_t {
    match unsafe { rle.as_ref() } {
        Some(rle) => rle.values.len() as i32,
        None => 0,
    }
}
#[no_mangle]
pub extern "C" fn rle_values(rle: *mut Rle) -> *mut int32_t {
    match unsafe { rle.as_mut() } {
        Some(rle) => rle.values.as_mut_ptr(),
        // Fully qualified so the snippet needs no extra `use` line.
        None => std::ptr::null_mut(),
    }
}
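On the Python side, the recommendation about return types can be sketched as follows. This assumes CPython and borrows PyMem_Malloc and PyMem_Free from ctypes.pythonapi as stand-ins for an allocate/free pair such as rle_new and rle_free; the Rust library itself is not involved.

```python
import ctypes

# Stand-ins for a constructor/destructor pair exported by a C ABI library.
allocate = ctypes.pythonapi.PyMem_Malloc
allocate.argtypes = (ctypes.c_size_t,)
# Without this line ctypes assumes a C int and may truncate the address.
allocate.restype = ctypes.c_void_p

release = ctypes.pythonapi.PyMem_Free
release.argtypes = (ctypes.c_void_p,)
# A void function: declare that there is no return value.
release.restype = None

count = 4
address = allocate(count * ctypes.sizeof(ctypes.c_int32))
if not address:
    raise MemoryError("allocation failed")

# View the raw block as int32 values, fill it, and copy it out.
array = ctypes.cast(ctypes.c_void_p(address), ctypes.POINTER(ctypes.c_int32))
for index in range(count):
    array[index] = index * 10
copied = array[:count]

release(address)
```

Declaring both argtypes and restype for every function means ctypes reports a mismatch as an error instead of silently passing or returning the wrong width.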

Related

How can I access a Rust Iterator from Python using PyO3?

I'm quite new with Rust, and my first 'serious' project has involved writing a Python wrapper for a small Rust library using PyO3. This has mostly been quite painless, but I'm struggling to work out how to expose lazy iterators over Rust Vecs to Python code.
So far, I have been collecting the values produced by the iterator and returning a list, which obviously isn't the best solution. Here's some code which illustrates my problem:
use pyo3::prelude::*;
// The Rust Iterator, from the library I'm wrapping.
pub struct RustIterator<'a> {
    position: usize,
    view: &'a Vec<isize>
}
impl<'a> Iterator for RustIterator<'a> {
    type Item = &'a isize;
    fn next(&mut self) -> Option<Self::Item> {
        let result = self.view.get(self.position);
        if let Some(_) = result { self.position += 1 };
        result
    }
}
// The Rust struct, from the library I'm wrapping.
struct RustStruct {
    v: Vec<isize>
}
impl RustStruct {
    fn iter(&self) -> RustIterator {
        RustIterator{ position: 0, view: &self.v }
    }
}
// The Python wrapper class, which exposes the
// functions of RustStruct in a Python-friendly way.
#[pyclass]
struct PyClass {
    rust_struct: RustStruct,
}
#[pymethods]
impl PyClass {
    #[new]
    fn new(v: Vec<isize>) -> Self {
        let rust_struct = RustStruct { v };
        Self{ rust_struct }
    }
    // This is what I'm doing so far, which works
    // but doesn't iterate lazily.
    fn iter(&self) -> Vec<isize> {
        let mut output_v = Vec::new();
        for item in self.rust_struct.iter() {
            output_v.push(*item);
        }
        output_v
    }
}
I've tried to wrap the RustIterator class with a Python wrapper, but I can't use PyO3's #[pyclass] proc. macro with lifetime parameters. I looked into pyo3::types::PyIterator but this looks like a way to access a Python iterator from Rust rather than the other way around.
How can I access a lazy iterator over RustStruct.v in Python? It's safe to assume that the type contained in the Vec always derives Copy and Clone, and answers which require some code on the Python end are okay (but less ideal).

You can wrap the iterator in its own pyclass and implement the iterator protocol trait (PyIterProtocol) on it by delegating to a Rust iterator. Because #[pyclass] does not accept lifetime parameters, the wrapper has to own its data (here, a copy of the Vec) rather than borrow it.
Not tested, and written against the older PyO3 releases that still provide #[pyproto], but something like:
use pyo3::class::iter::IterNextOutput;
use pyo3::prelude::*;
use pyo3::PyIterProtocol;
#[pyclass]
pub struct PyRustIterator {
    // Owns the values, so no lifetime parameter is needed.
    iter: std::vec::IntoIter<isize>,
}
#[pyproto]
impl PyIterProtocol for PyRustIterator {
    fn __iter__(slf: PyRef<Self>) -> PyRef<Self> {
        slf
    }
    fn __next__(mut slf: PyRefMut<Self>) -> IterNextOutput<isize, &'static str> {
        match slf.iter.next() {
            Some(value) => IterNextOutput::Yield(value),
            None => IterNextOutput::Return("Ended"),
        }
    }
}
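Since the question allows some code on the Python end, another option is to keep the data in Rust and pull one element at a time from a Python generator. This is only a sketch under assumptions: the methods len and get are hypothetical and would have to be added to PyClass with #[pymethods]; FakeWrapper below merely stands in for the compiled class so the example runs on its own.

```python
class FakeWrapper:
    """Stand-in for the PyO3 class; `len` and `get` are hypothetical methods."""

    def __init__(self, values):
        self._values = list(values)

    def len(self):
        return len(self._values)

    def get(self, index):
        return self._values[index]


def lazy_iter(wrapper):
    """Yield elements one at a time instead of building a list up front."""
    for index in range(wrapper.len()):
        yield wrapper.get(index)


gen = lazy_iter(FakeWrapper([3, 1, 4]))
first = next(gen)      # only one element has been fetched so far
rest = list(gen)       # the remaining elements are fetched on demand
```

Each step crosses the language boundary once, so this trades some call overhead for not materialising the whole list.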

Is there a way to use DATA_BLOB in python?

I created a C++ DLL and I'd like to use it from Python. Loading it is not the problem; the problem is the return type of the DLL functions. I have two functions, both of which return a DATA_BLOB, and I haven't found a way to work with DATA_BLOB in Python. How can I use DATA_BLOB in Python?

The return value of a DLL function is conventionally used to report whether the call succeeded. If you need to hand data back to the caller, pass a pointer to a structure as an output parameter and let the function fill it in.
DLL Sample:
BOOL DLLCall1(PDATA_BLOB DataOut) //return DATA_BLOB pointer
{
    DATA_BLOB DataIn;
    BYTE* pbDataInput = (BYTE*)"Hello world of data protection.";
    DWORD cbDataInput = strlen((char*)pbDataInput) + 1;
    //--------------------------------------------------------------------
    // Initialize the DataIn structure.
    DataIn.pbData = pbDataInput;
    DataIn.cbData = cbDataInput;
    CryptProtectData(
        &DataIn,
        L"This is the description string.", // A description string to be
                                            // included with the encrypted data.
        NULL,                               // Optional entropy not used.
        NULL,                               // Reserved.
        NULL,                               // Pass NULL for the prompt structure.
        0,
        DataOut);
    return 1;
}
BOOL DLLCall2(DATA_BLOB DataOut)
{
    LPWSTR pDescrOut = NULL;
    DATA_BLOB DataVerify;
    CRYPTPROTECT_PROMPTSTRUCT PromptStruct;
    ZeroMemory(&PromptStruct, sizeof(PromptStruct));
    PromptStruct.cbSize = sizeof(PromptStruct);
    PromptStruct.dwPromptFlags = CRYPTPROTECT_PROMPT_ON_PROTECT;
    PromptStruct.szPrompt = L"This is a user prompt.";
    if (CryptUnprotectData(
            &DataOut,
            &pDescrOut,
            NULL,          // Optional entropy
            NULL,          // Reserved
            &PromptStruct, // Optional PromptStruct
            0,
            &DataVerify))
    {
        printf("The decrypted data is: %s\n", DataVerify.pbData);
        printf("The description of the data was: %S\n", pDescrOut);
        return 1;
    }
    return 0;
}
python:
from ctypes import *
from ctypes.wintypes import DWORD
import ctypes
class DATA_BLOB(Structure):
    _fields_ = [
        ("cbData", DWORD),
        ("pbData", POINTER(c_char)),
    ]
lib = cdll.LoadLibrary("C:\\Test.dll")
blobOut = DATA_BLOB()
lib.DLLCall1(byref(blobOut))
lib.DLLCall2(blobOut)
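Once a call has filled in the structure, its payload can be copied into a Python bytes object. The sketch below is an assumption-laden illustration: it fills a DATA_BLOB by hand instead of calling the DLL, uses c_uint32 in place of DWORD so it also runs off Windows, and blob_to_bytes is a hypothetical helper name.

```python
import ctypes


class DATA_BLOB(ctypes.Structure):
    # c_uint32 stands in for the Windows DWORD type in this portable sketch.
    _fields_ = [
        ("cbData", ctypes.c_uint32),
        ("pbData", ctypes.POINTER(ctypes.c_char)),
    ]


def blob_to_bytes(blob):
    """Copy exactly cbData bytes out of the blob, embedded NULs included."""
    return ctypes.string_at(blob.pbData, blob.cbData)


# Fill a blob by hand; in the real program the DLL would do this.
payload = b"Hello\x00world"
storage = ctypes.create_string_buffer(payload, len(payload))
blob = DATA_BLOB(len(payload), ctypes.cast(storage, ctypes.POINTER(ctypes.c_char)))

data = blob_to_bytes(blob)
```

Reading by explicit length matters because encrypted data can contain zero bytes, which would cut a NUL-terminated read short.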

Memory leak when trying to free array of CString

The following (MCVE if compiled as a cdylib called libffitest, requires libc as a dependency) demonstrates the problem:
use libc::{c_char, c_void, size_t};
use std::ffi::CString;
use std::mem;
use std::slice;
#[repr(C)]
#[derive(Clone)]
pub struct Array {
    pub data: *const c_void,
    pub len: size_t,
}
#[no_mangle]
pub unsafe extern "C" fn bar() -> Array {
    let v = vec![
        CString::new("Hi There").unwrap().into_raw(),
        CString::new("Hi There").unwrap().into_raw(),
    ];
    v.into()
}
#[no_mangle]
pub extern "C" fn drop_word_array(arr: Array) {
    if arr.data.is_null() {
        return;
    }
    // Convert incoming data to Vec so we own it
    let mut f: Vec<c_char> = arr.into();
    // Deallocate the underlying c_char data by reconstituting it as a CString
    let _: Vec<CString> = unsafe { f.iter_mut().map(|slice| CString::from_raw(slice)).collect() };
}
// Transmute to array for FFI
impl From<Vec<*mut c_char>> for Array {
    fn from(sl: Vec<*mut c_char>) -> Self {
        let array = Array {
            data: sl.as_ptr() as *const c_void,
            len: sl.len() as size_t,
        };
        mem::forget(sl);
        array
    }
}
// Reconstitute from FFI
impl From<Array> for Vec<c_char> {
    fn from(arr: Array) -> Self {
        unsafe { slice::from_raw_parts_mut(arr.data as *mut c_char, arr.len).to_vec() }
    }
}
I thought that by reconstituting the incoming Array as a slice, taking ownership of it as a Vec, then reconstituting the elements as CString, I was freeing any allocated memory, but I'm clearly doing something wrong. Executing this Python script tells me that it's trying to free a pointer that was not allocated:
python(85068,0x10ea015c0) malloc: *** error for object 0x7ffdaa512ca1: pointer being freed was not allocated
import sys
import ctypes
from ctypes import c_void_p, Structure, c_size_t, cast, POINTER, c_char_p
class _FFIArray(Structure):
    """
    Convert sequence of structs to C-compatible void array
    """
    _fields_ = [("data", c_void_p),
                ("len", c_size_t)]
def _arr_to_wordlist(res, _func, _args):
    ls = cast(res.data, POINTER(c_char_p * res.len))[0][:]
    print(ls)
    _drop_wordarray(res)
prefix = {"win32": ""}.get(sys.platform, "lib")
extension = {"darwin": ".dylib", "win32": ".dll"}.get(sys.platform, ".so")
lib = ctypes.cdll.LoadLibrary(prefix + "ffitest" + extension)
lib.bar.argtypes = ()
lib.bar.restype = _FFIArray
lib.bar.errcheck = _arr_to_wordlist
_drop_wordarray = lib.drop_word_array
if __name__ == "__main__":
    lib.bar()

Well, that was a fun one to go through.
Your biggest problem is the following conversion:
impl From<Array> for Vec<c_char> {
    fn from(arr: Array) -> Self {
        unsafe { slice::from_raw_parts_mut(arr.data as *mut c_char, arr.len).to_vec() }
    }
}
You start with what comes out of the FFI boundary as an array of strings (i.e. *mut *mut c_char). For some reason, you decide that all of a sudden, it is a Vec<c_char> and not a Vec<*const c_char> as you would expect for the CString conversion. That's UB #1 - and the cause of your use-after-free.
The unnecessarily convoluted conversions made things even muddier due to the constant juggling between types. If your FFI boundary is Vec<CString>, why do you split the return into two separate calls? That's literally calling for disaster, as it happened.
Consider the following:
impl From<Array> for Vec<CString> {
    fn from(arr: Array) -> Self {
        unsafe {
            slice::from_raw_parts(
                arr.data as *mut *mut c_char,
                arr.len
            )
            .into_iter().map(|r| CString::from_raw(*r))
            .collect()
        }
    }
}
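This gives you a one-step FFI boundary conversion (without the necessity for the second unsafe block in your method), clean types, and every string is freed. Note that slice::from_raw_parts only borrows the outer buffer of pointers, so that allocation still has to be reclaimed separately, for example by rebuilding the original Vec with Vec::from_raw_parts, which also requires its capacity.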
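The layout described above (a pointer to an array of string pointers, plus a length) looks like this from the Python side. This is a minimal sketch, assuming a ctypes-built array in place of the one returned by bar; FFIArray and words_from_array are illustrative names, not part of the original code.

```python
import ctypes


class FFIArray(ctypes.Structure):
    # Mirrors the #[repr(C)] Array struct: an untyped pointer and a length.
    _fields_ = [("data", ctypes.c_void_p), ("len", ctypes.c_size_t)]


def words_from_array(arr):
    """Treat `data` as a pointer to `len` C string pointers and copy them out."""
    pointers = ctypes.cast(arr.data, ctypes.POINTER(ctypes.c_char_p))
    return [pointers[index] for index in range(arr.len)]


# Stand-in for the value returned by `bar`: two NUL-terminated strings.
backing = (ctypes.c_char_p * 2)(b"Hi There", b"Hi There")
arr = FFIArray(ctypes.cast(backing, ctypes.c_void_p), len(backing))

words = words_from_array(arr)
```

Each element is read as a char pointer and copied into bytes, which matches the *mut *mut c_char view on the Rust side rather than a flat buffer of characters.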

What is the proper syntax to bind a typedef type using pybind11?

I have a struct very similar to this:
struct data_point
{
    data_point() = delete;
    data_point(const int& data) :
        m_data(data)
    {}
    int m_data;
};
I also have this type declared as such.
typedef std::vector<data_point> data_list;
The binding for this struct is defined:
PYBIND11_MODULE(data_lib, ref)
{
    py::class_<data_point> dp(ref, "data_point");
    dp.def(py::init<const int&>());
    dp.def_readwrite("m_data", &data_point::m_data);
}
How do I define a binding for the typedef'd list type?
It's not clear to me how to do this from the pybind11 documentation.

For this specific issue, pybind11 automatically converts a std::vector to a Python list when you include "pybind11/stl.h". Thus, a separate binding for this type is unnecessary.
Ex:
#include <vector>
#include <pybind11/pybind11.h>
#include <pybind11/stl.h>
namespace py = pybind11;
struct data_point
{
    data_point() = delete;
    data_point(const int& data) :
        m_data(data)
    {}
    int m_data;
};
std::vector<data_point> make_vec(){
    return {data_point(20), data_point(18)};
}
PYBIND11_MODULE(data_lib, ref)
{
    py::class_<data_point> dp(ref, "data_point");
    dp.def(py::init<const int&>());
    dp.def_readwrite("m_data", &data_point::m_data);
    ref.def("make_vec", &make_vec, "A function that returns a vector of data_points");
}
In Python, once you import the data_lib module, you can call functions that return lists of data_point:
import data_lib
p = data_lib.make_vec()
print(len(p))
Output: 2

Translate the code from Python to C++

Now that I understand how the code works, I would like to translate it to C++.
The original Python code:
def recv_all_until(s, crlf):
    data = ""
    while data[-len(crlf):] != crlf:
        data += s.recv(1)
    return data
Here's what I tried:
std::string recv_all_until(int socket, std::string crlf)
{
    std::string data = "";
    char buffer[1];
    memset(buffer, 0, 1);
    while(data.substr(data.length()-2, data.length()) != crlf)
    {
        if ((recv(socket, buffer, 1, 0)) == 0)
        {
            if (errno != 0)
            {
                close(socket);
                perror("recv");
                exit(1);
            }
        }
        data = data + std::string(buffer);
        memset(buffer, 0, 1);
    }
    return data;
}
But it shows:
terminate called after throwing an instance of 'std::out_of_range'
what(): basic_string::substr
I understand that the problem is inside the while loop since at first the data string is empty. So how to improve this to make it work the same as it works in Python? Thank you.

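The problem is in the first iteration of your while loop:
Since data is an empty string, data.length() is 0. Because length() returns an unsigned type, data.length() - 2 does not produce -2 but wraps around to a very large value, so substr is called with a start position far past the end of the string and throws std::out_of_range.
To fix this, add a check on the length of data to the while condition.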
Also, there's a method of finding such mistakes faster than writing a stackoverflow question about it. Consider reading this article.

If we first change your Python code a bit:
def recv_all_until(s, crlf):
    data = ""
    while not data.endswith(crlf):
        data += s.recv(1)
    return data
What we need to do in C++ becomes much clearer:
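#include <algorithm>
#include <cstdio>
#include <cstdlib>
#include <string>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>
bool ends_with(const std::string& str, const std::string& suffix)
{
    return str.size() >= suffix.size() &&
           std::equal(suffix.rbegin(), suffix.rend(), str.rbegin());
}
std::string recv_all_until(int socket, const std::string& crlf)
{
    std::string data;
    char byte = 0;
    while (!ends_with(data, crlf))
    {
        const ssize_t received = recv(socket, &byte, 1, 0);
        if (received < 0)
        {
            // recv failed; report errno before close() can overwrite it.
            perror("recv");
            close(socket);
            std::exit(1);
        }
        if (received == 0)
        {
            // The peer closed the connection before the terminator arrived.
            break;
        }
        // Append the single byte directly; a one-char buffer is not NUL-terminated.
        data.push_back(byte);
    }
    return data;
}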
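For comparison, here is a sketch of the same loop in Python 3, where recv returns bytes rather than str. It assumes a local socket pair in place of a real network connection, and it stops when the peer closes the connection, which the original Python version does not handle.

```python
import socket


def recv_all_until(sock, terminator):
    """Read one byte at a time until `terminator` ends the data or the peer closes."""
    data = bytearray()
    while not data.endswith(terminator):
        chunk = sock.recv(1)
        if not chunk:
            # An empty result means the connection was closed.
            break
        data += chunk
    return bytes(data)


# A connected pair of local sockets stands in for a real connection.
left, right = socket.socketpair()
with left, right:
    left.sendall(b"HELLO\r\nextra")
    line = recv_all_until(right, b"\r\n")
```

Bytes after the terminator stay in the socket buffer for the next read, just as in the C++ version.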
