I am experimenting with writing a library in Rust that I can call from Python code. I would like to be able to pass a void pointer back to Python so that I can hold state between calls into Rust. However, I get a segfault in Rust when trying to access the pointer again.
Full code samples and crash report: https://gist.github.com/robyoung/3644f13a05c95cb1b947
The code
#![feature(libc)]
#![feature(alloc)]
extern crate libc;
use std::boxed;
pub struct Point {
    x: i64,
    y: i32,
}
#[no_mangle]
pub extern "C" fn start_state() -> *mut Point {
    let point = Box::new(Point{x: 0, y: 10});
    let raw = unsafe { boxed::into_raw(point) };
    println!("{:?}", raw);
    raw
}
#[no_mangle]
pub extern "C" fn continue_state(point: *mut Point) -> i32 {
    println!("{:?}", point);
    let p = unsafe { Box::from_raw(point) };
    println!("{} {}", p.x, p.y);
    0
}
import ctypes
lib = ctypes.cdll.LoadLibrary('target/libpytesttype.so')
lib.start_state.restype = ctypes.c_void_p
pointer = lib.start_state()
print("{:x}".format(pointer))
lib.continue_state(pointer)
The output
0xdc24000
10dc24000
0xdc24000
[1] 64006 segmentation fault python src/main.py
What am I doing wrong?
eryksun nailed it:
On the Python side, you're missing lib.continue_state.argtypes = (ctypes.c_void_p,). Without defining the parameter as a pointer, ctypes uses the default conversion for a Python integer, which truncates the value to 32-bit, e.g. 0x0dc24000. If you're lucky accessing that address triggers a segfault immediately.
My output (with my own padding) was:
0x103424000
103424000
0x 3424000
So the Debug formatter for pointers should be fine. Not sure why your output differs.
After adding
lib.continue_state.argtypes = (ctypes.c_void_p,)
The program ran just fine.
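The truncation can be reproduced without the Rust library. The following is a minimal sketch, assuming a 64-bit platform where a C int is 32 bits wide; the address is the value Python printed in the question and is used purely for illustration.

```python
import ctypes

# Illustrative 64-bit address (the value Python printed in the question).
address = 0x10dc24000

# With no argtypes set, ctypes passes a Python int as a C int,
# which keeps only the low 32 bits of the value.
as_default_int = ctypes.c_int(address).value

# With argtypes = (ctypes.c_void_p,), the pointer-sized value is passed intact.
as_pointer = ctypes.c_void_p(address).value

print(hex(as_default_int))  # 0xdc24000 (upper bits lost)
print(hex(as_pointer))      # 0x10dc24000 on a 64-bit platform
```

The first printed value matches the address the Rust side received before the segfault.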
Related
I am trying to open a DLG box present in MFC using python. I have tried two methods within extern C.
1st method
extern "C"
{
    DLLEXPORT void open_dlg_32k_trim()
    {
        IDD_32KHZ_CTRL_CODE *obj = new IDD_32KHZ_CTRL_CODE;
        // customer function to store first dialog object
        obj->Create(IDD_32K_CONTROL, obj);
        obj->ShowWindow(SW_NORMAL);
    }
}
2nd method
extern "C"
{
    DLLEXPORT void open_dlg_32k_trim()
    {
        IDD_32KHZ_CTRL_CODE ob;
        ob.DoModal();
    }
}
The code compiles successfully on the DLL side, but calling the function from Python throws an error.
Can you help me find what I am doing wrong, and what the first thing to look into is?
I was wondering whether it's possible to write the address of a numpy array to a file (via e.g. ctypeslib.ndpointer or something similar), then open that file in a C++ function, also called through ctypes in the same Python process, read the address back, and convert it to e.g. a C++ double array.
This will all be happening in the same python process.
I am aware that it's possible to pass it as a function argument and that works, but that isn't something I'd need.
This is what the code would look like; don't mind the syntax errors:
test.py
with open(path) as f:
    f.write(matrix.ctypes.data_as(np.ctypeslib.ndpointer(dtype=np.float64, ndim=2, flags='C_CONTIGUOUS')))
and cpp:
void function()
{
... read file, get address stored into double* array;
e.g. then print out the values
}
Where could I be wrong?
I work on a project where we write an np array to a file and then read that file in C++, which is wasteful. I want to try adjusting it to write, and later read, just the address. Sending an ndpointer or something else as a function argument won't work, as that would require editing a big portion of the project.
I think the data of your np.array will be lost once the Python program terminates, so you will not be able to access its memory location after the program ends.
Unfortunately, I don't know how to do this using ctypes, only using the C-API extension.
With it, you access the Python variable directly from C. It is represented by a pointer, so you can access the address of any Python object (and therefore also of ndarrays).
in python you would write:
import c_module
import numpy as np
...
a = np.array([...])
#generate the numpy array
...
c_module.c_fun(a)
and then in your c++ code, you will receive the memory address
static PyObject* py_f_roots(PyObject* self, PyObject* args) {
    PyObject *np_array_py;
    // "O" parses exactly one object argument
    if (!PyArg_ParseTuple(args, "O", &np_array_py))
        return NULL;
    // now np_array_py points to the Python numpy array a
    // to access it, cast it to a PyArrayObject *
    PyArrayObject *np_array = (PyArrayObject *) np_array_py;
    // you can access the data
    double *data = (double *) PyArray_DATA(np_array);
    // Py_RETURN_NONE increments the reference count of None before returning it
    Py_RETURN_NONE;
}
The documentation for the numpy C API
The reference manual for C Python extensions
If the Python and C code are run in the same process, then the address you write from Python will be valid in C. I think you want the following:
test.py
import ctypes as ct
import numpy as np
matrix = np.array([1.1,2.2,3.3,4.4,5.5])
# use binary to write the address
with open('addr.bin','wb') as f:
    # type of pointer doesn't matter just need the address
    f.write(matrix.ctypes.data_as(ct.c_void_p))
# test function to receive the filename
dll = ct.CDLL('./test')
dll.func.argtypes = ct.c_char_p,
dll.func.restype = None
dll.func(b'addr.bin')
test.c
#include <stdio.h>
__declspec(dllexport)
void func(const char* file) {
    double* p;
    FILE* fp = fopen(file,"rb");  // read the pointer
    fread(&p, 1, sizeof(p), fp);
    fclose(fp);
    for(int i = 0; i < 5; ++i)    // dump the elements
        printf("%lf\n", p[i]);
}
Output:
1.100000
2.200000
3.300000
4.400000
5.500000
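The same round trip can be checked from Python alone, without compiling the C side. This is a minimal sketch under stated assumptions: a ctypes array stands in for the numpy buffer (with numpy, the address would come from matrix.ctypes.data), and the file location is a hypothetical temporary path.

```python
import ctypes as ct
import os
import struct
import tempfile

# Stand-in for the numpy buffer (assumption: with numpy, use matrix.ctypes.data).
values = (ct.c_double * 5)(1.1, 2.2, 3.3, 4.4, 5.5)
address = ct.addressof(values)

# Write the address as a native pointer-sized integer.
path = os.path.join(tempfile.mkdtemp(), "addr.bin")
with open(path, "wb") as f:
    f.write(struct.pack("P", address))

# Read it back, as the C function would with fread.
with open(path, "rb") as f:
    (read_back,) = struct.unpack("P", f.read(struct.calcsize("P")))

# Reinterpret the integer as a double*; valid only while `values` is alive
# and only inside the process that wrote the address.
pointer = ct.cast(read_back, ct.POINTER(ct.c_double))
recovered = [pointer[i] for i in range(len(values))]
print(recovered)
```

The file carries only the address, so the reader also needs the element count and dtype from somewhere else, exactly as the C function above hard-codes 5 doubles.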
I am writing a Python extension module in C (actually C++, but this doesn't matter) that performs some calculations in an OpenMP loop, in which it can call a user-provided Python callback function, which operates on numpy arrays. Since the standard CPython does not allow one to use Python API from multiple threads simultaneously, I protect these callbacks by a #pragma omp critical block.
This works well in some cases, but sometimes creates a deadlock, whereby one thread is trying to acquire the openmp critical lock, and the other is waiting for the GIL lock:
Thread 0:
__kmpc_critical_with_hint (in libomp.dylib) + 1109 [0x105f8a6dd]
__kmp_acquire_queuing_lock(kmp_user_lock*, int) (in libomp.dylib) + 9 [0x105fbaec1]
int __kmp_acquire_queuing_lock_timed_template<false>(kmp_queuing_lock*, int) (in libomp.dylib) + 405 [0x105fb6e6c]
__kmp_wait_yield_4 (in libomp.dylib) + 135,128,... [0x105fb0d0e,0x105fb0d07,...]
Thread 1:
PyObject_Call (in Python) + 99 [0x106014202]
??? (in umath.so) load address 0x106657000 + 0x25e51 [0x10667ce51]
??? (in umath.so) load address 0x106657000 + 0x23b0c [0x10667ab0c]
??? (in umath.so) load address 0x106657000 + 0x2117e [0x10667817e]
??? (in umath.so) load address 0x106657000 + 0x21238 [0x106678238]
PyGILState_Ensure (in Python) + 93 [0x1060ab4a7]
PyEval_RestoreThread (in Python) + 62 [0x10608cb0a]
PyThread_acquire_lock (in Python) + 101 [0x1060bc1a4]
_pthread_cond_wait (in libsystem_pthread.dylib) + 767 [0x7fff97d0d728]
__psynch_cvwait (in libsystem_kernel.dylib) + 10 [0x7fff9d464db6]
Curiously, this happens whenever the Python callback function encounters an invalid floating-point value or overflows, printing a warning message to the console. Apparently this upsets some synchronization mutexes and leads to a deadlock shortly after.
Here is a stripped-down but self-contained example code.
#define NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION
#include <Python.h>
#include <numpy/arrayobject.h>
#include <omp.h>
const int NUM_THREADS = 2; // number of OpenMP threads
const int NUM_BLOCKS = 100; // number of loop iterations
//const double MAX_EXP = 500.0; // this is a safe value - exp(500) does not overflow
const double MAX_EXP = 1000.0; // exp(1000) overflows and produces a warning message, which then hangs the whole thing
int main(int argc, char *argv[])
{
    Py_Initialize();
    PyEval_InitThreads();
    PyObject* numpy = PyImport_ImportModule("numpy");
    if(!numpy) {
        printf("Failed to import numpy\n");
        return 1;
    } else printf("numpy imported\n");
    import_array1(1);
    PyObject* fnc = PyObject_GetAttrString(numpy, "exp");
    if(!fnc || !PyCallable_Check(fnc)) {
        printf("Failed to get hold on function\n");
        return 1;
    } else printf("function loaded\n");
    omp_set_num_threads(NUM_THREADS);
#pragma omp parallel for schedule(dynamic)
    for(int i=0; i<NUM_BLOCKS; i++) {
        int tn = omp_get_thread_num();
        printf("Thread %i: block %i\n", tn, i);
#pragma omp critical
        {
            //PyGILState_STATE state = PyGILState_Ensure(); ///< does not help
            npy_intp dims[1] = { random() % 64000 + 1000 };
            PyArrayObject* args = (PyArrayObject*) PyArray_ZEROS(1, dims, NPY_DOUBLE, 0);
            double* raw_data = (double*) PyArray_DATA(args);
            for(npy_intp k=0; k<dims[0]; k++)
                raw_data[k] = random()*MAX_EXP / RAND_MAX;
            printf("Thread %i: calling fnc for block %i with %li points\n", tn, i, dims[0]);
            PyObject* result = PyObject_CallFunctionObjArgs(fnc, args, NULL);
            Py_DECREF(args);
            printf("Thread %i: result[0] for block %i with %li points is %g\n", tn, i, dims[0],
                *((double*)PyArray_GETPTR1((PyArrayObject*)result, 0)));
            Py_XDECREF(result);
            //PyGILState_Release(state);
        }
    }
    Py_Finalize();
}
When I set MAX_EXP=500 in line 7, everything works without warnings and deadlocks, but if I replace it with MAX_EXP=1000, this produces a warning message,
sys:1: RuntimeWarning: overflow encountered in exp
and the next loop iteration never finishes. This behaviour is seen on both Linux and MacOS, Python 2.7 or 3.6 all the same. I tried to add some PyGILState_Ensure() to the code but this doesn't help, and the documentation on these aspects is unclear.
Okay, so the problem turned out to lurk deep inside numpy, namely in the _error_handler() function, which is called whenever an invalid floating-point value is produced (NaN, overflow to infinity, etc.).
This function has several regimes - from ignoring the error completely to raising an exception, but by default it issues a Python warning. In doing so, it temporarily re-acquires GIL, which was released during bulk computation, and that's where the deadlock occurs.
A very similar situation leading to the same problem is discussed here: https://github.com/numpy/numpy/issues/5856
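One way to read the two stack traces is as a lock-order inversion: one thread waits for the critical section while the other, already inside it, waits for the GIL. The sketch below models that reading with two ordinary Python locks. It is an interpretation of the traces rather than numpy's actual code; the lock and thread names are hypothetical, and timeouts replace the indefinite wait so the script terminates.

```python
import threading

critical_section = threading.Lock()  # stand-in for `#pragma omp critical`
gil = threading.Lock()               # stand-in for the GIL (simplified model)
both_holding = threading.Barrier(2)
both_tried = threading.Barrier(2)
acquired_second = {}

def worker(name, first, second):
    with first:
        both_holding.wait()                # each thread now holds its first lock
        got = second.acquire(timeout=0.2)  # a real deadlock would block forever
        acquired_second[name] = got
        both_tried.wait()                  # keep holding until the other gave up
        if got:
            second.release()

threads = [
    # Holds the "GIL" and waits for the critical section (like Thread 0).
    threading.Thread(target=worker, args=("waits_for_critical", gil, critical_section)),
    # Holds the critical section and waits for the "GIL" (like Thread 1).
    threading.Thread(target=worker, args=("waits_for_gil", critical_section, gil)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(acquired_second)
```

Neither thread obtains its second lock, which is the situation the traces show. It goes away if the thread inside the critical section never needs to re-acquire the GIL.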
My workaround was to create a lock-type class that disables numpy warnings for the lifetime of its instance (which is created during the parallelized computation) and restores the original settings once the instance is destroyed. While not ideal, this seems to suffice in my case, though my feeling is that ultimately the culprit is numpy itself. For completeness, here is the code of this class:
/** Lock-type class that temporarily disables warnings that numpy produces on floating-point
overflows or other invalid values.
The reason for doing this is that such warnings involve subtle interference with GIL
when executed in a multi-threading context, leading to deadlocks if a user-defined Python
function is accessed from multiple threads (even after being protected by an OpenMP critical
section). The instance of this class is created (and hence warnings are suppressed)
whenever a user-defined callback function is instantiated, and the previous warning settings
are restored once such a function is deallocated.
*/
class NumpyWarningsDisabler {
    PyObject *seterr, *prevSettings;  ///< pointer to the function and its previous settings
public:
    NumpyWarningsDisabler() : seterr(NULL), prevSettings(NULL)
    {
        PyObject* numpy = PyImport_AddModule("numpy");
        if(!numpy) return;
        seterr = PyObject_GetAttrString(numpy, "seterr");
        if(!seterr) return;
        // store the dictionary corresponding to current settings of numpy warnings subsystem
        prevSettings = PyObject_CallFunction(seterr, const_cast<char*>("s"), "ignore");
        if(!prevSettings) { printf("Failed to suppress numpy warnings\n"); }
        /*else { printf("Ignoring numpy warnings\n"); }*/
    }
    ~NumpyWarningsDisabler()
    {
        if(!seterr || !prevSettings) return;
        // restore the previous settings of numpy warnings subsystem
        PyObject* args = PyTuple_New(0);
        PyObject* result = PyObject_Call(seterr, args, prevSettings);
        Py_DECREF(args);
        if(!result) { printf("Failed to restore numpy warnings\n"); }
        /*else printf("Restored numpy warnings\n");*/
        Py_XDECREF(result);
    }
};
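When the settings can be changed from the Python side instead, the same save-and-restore idea can be sketched with numpy.seterr directly. This is a minimal sketch, assuming numpy is installed; the class name only mirrors the C++ one and is otherwise hypothetical.

```python
import numpy as np

class NumpyWarningsDisabler:
    """Silence numpy floating-point warnings inside a `with` block."""

    def __enter__(self):
        # seterr returns the previous settings as a dict.
        self.prev_settings = np.seterr(all="ignore")
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Restore the settings that were active before the block.
        np.seterr(**self.prev_settings)
        return False

settings_before = np.geterr()
with NumpyWarningsDisabler():
    # exp(1000) overflows to inf, but no RuntimeWarning is issued here.
    result = np.exp(np.array([1000.0]))
settings_after = np.geterr()
print(result[0], settings_before == settings_after)
```

numpy also ships a built-in context manager, numpy.errstate, which covers the same use case without a custom class.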
I've written some C-code to call scipy functions. The body, including variable declarations and using EXIT FAIL to denote messages and cleanup steps, is:
PyObject *module_name, *module = NULL;
PyObject *funct = NULL;
PyObject *output = NULL;
int j;
double dInVal, dOutVal;
Py_Initialize();
module_name = PyString_FromString("scipy.stats");
module = PyImport_Import(module_name);
Py_DECREF(module_name);
if (!module)
    EXIT FAIL
funct = PyObject_GetAttrString(module, "beta");
if (!funct)
    EXIT FAIL
Py_DECREF(module);
for (j=0; j<=10; j++)
{
    dInVal = (double)j/10.0;
    output = PyObject_CallMethod(funct, "ppf", "(f,f,f)", dInVal, 50.0, 50.0);
    if (!output)
        EXIT FAIL
    dOutVal = PyFloat_AsDouble(output);
    Py_DECREF(output);
    printf("%6.3f %6.3f\n", dInVal, dOutVal);
}
Py_DECREF(funct);
Py_Finalize();
When I run this as the main routine, it appears to work fine. However, when I run it as a subroutine, it works for a first call, but fails on any subsequent call.
The code does work as a subroutine after the first call, if I make all of the PyObject pointers static (and include the appropriate flags, so that Python is initialized and the "beta" object is imported only once), but make-everything-static seems like a brute force solution to the problem, especially if the program will eventually include more than one subroutine that calls Python.
My question is, what is the best practice for setting up a C-program, to call scipy from a subroutine?
I'm trying to use ctypes to call a c function that was returned as a pointer from another function. It seems from the documentation that I can do this by declaring the function with CFUNCTYPE, and then creating an instance using the pointer. This, however seems to give me a segfault. Here is some sample code.
sample.c:
#include <stdio.h>
unsigned long long simple(void *ptr)
{
    printf("pointer = %p\n", ptr);
    return (unsigned long long)ptr;
}
void *foo()
{
    return (void *)simple;
}
unsigned long long (*bar)(void *ptr) = simple;
int main()
{
    bar(foo());
    simple(foo());
}
and simple.py:
from ctypes import *
import pdb
_lib = cdll.LoadLibrary('./simple.so')
_simple = _lib.simple
_simple.restype = c_longlong
_simple.argtypes = [ c_void_p ]
_foo = _lib.foo
_bar = CFUNCTYPE(c_int, c_void_p)(_foo())
pdb.set_trace()
_bar(_foo())
Here's a gdb/pdb session:
(gdb) r
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /usr/bin/python simple.py
[Thread debugging using libthread_db enabled]
> .../simple.py(15)<module>()
-> _bar(_foo())
(Pdb) p _foo()
-161909044
(Pdb) cast(_bar,c_void_p).value
18446744073547642572L
(Pdb) _simple(_foo())
pointer = 0xfffffffff65976cc
-161909044
(Pdb) int('fffffffff65976cc',16)
18446744073547642572L
Curiously, if I run using the C main function, I get
$ ./simple
pointer = 0x400524
pointer = 0x400524
which doesn't match the pointer that I get from the python code.
What am I doing wrong here?
Thanks in advance for any guidance you can give!
You are not defining a return type for _foo; try adding:
_foo.restype = c_void_p
ctypes defaults to an int return type, and it looks (from the cast done in your pdb session) like you are on a 64-bit system, meaning that your pointer will be truncated when converted to int. On my system the code seems to work, but that is a 32-bit system (and unfortunately I don't have a 64-bit system available to test on right now).
Also, your _bar definition doesn't really match what is in the C code; I suggest using something like:
_bar = CFUNCTYPE(c_longlong, c_void_p).in_dll(_lib, "bar")
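The effect of the full-width address can be tried without building simple.so. The following is a minimal sketch, not the asker's setup: a Python callback stands in for the C function simple, and the names SimpleProto, address and truncated are illustrative only.

```python
import ctypes as ct

# Prototype matching `unsigned long long simple(void *ptr)` from sample.c.
SimpleProto = ct.CFUNCTYPE(ct.c_ulonglong, ct.c_void_p)

@SimpleProto
def simple(ptr):
    # ctypes hands a void* argument to the callback as an int, or None for NULL.
    return ptr or 0

# Full-width address of the callable, as foo() returns it once restype is c_void_p.
address = ct.cast(simple, ct.c_void_p).value

# What the default int restype would keep: only the low 32 bits.
truncated = ct.c_int(address).value

# Build a callable from the full address, with a prototype that matches the C code.
bar = SimpleProto(address)
result = bar(0x1234)
print(hex(address), hex(truncated & 0xFFFFFFFF), hex(result))
```

On a 64-bit system the first two printed values usually differ, and calling through the truncated one is what leads to the segfault.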