I've written some C code to call scipy functions. The body, with variable declarations included and with EXIT FAIL standing in for the error-message and cleanup steps, is:
PyObject *module_name, *module = NULL;
PyObject *funct = NULL;
PyObject *output = NULL;
int j;
double dInVal, dOutVal;

Py_Initialize();

/* import scipy.stats and get hold of the beta distribution object */
module_name = PyString_FromString("scipy.stats");
module = PyImport_Import(module_name);
Py_DECREF(module_name);
if (!module)
    EXIT FAIL
funct = PyObject_GetAttrString(module, "beta");
if (!funct)
    EXIT FAIL
Py_DECREF(module);

/* evaluate beta.ppf(q, 50, 50) for q = 0.0, 0.1, ..., 1.0 */
for (j=0; j<=10; j++)
{
    dInVal = (double)j/10.0;
    output = PyObject_CallMethod(funct, "ppf", "(f,f,f)", dInVal, 50.0, 50.0);
    if (!output)
        EXIT FAIL
    dOutVal = PyFloat_AsDouble(output);
    Py_DECREF(output);
    printf("%6.3f %6.3f\n", dInVal, dOutVal);
}
Py_DECREF(funct);
Py_Finalize();
When I run this as the main routine, it appears to work fine. However, when I run it as a subroutine, it works on the first call but fails on any subsequent call.
The code does work as a subroutine beyond the first call if I make all of the PyObject pointers static (and add the appropriate flags, so that Python is initialized and the "beta" object is imported only once), but making everything static seems like a brute-force solution, especially if the program will eventually include more than one subroutine that calls Python.
My question is: what is the best practice for setting up a C program so that a subroutine can call scipy?
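For illustration, here is a minimal sketch of the initialize-once structure described above, with the interpreter started and the scipy object cached by a pair of hypothetical helpers called from main(), so that the subroutine itself only performs the cheap per-call work (py_setup, py_teardown and the file-scope beta_funct are names invented for this sketch):
/* Hypothetical one-time setup: initialize Python and cache scipy.stats.beta. */
static PyObject *beta_funct = NULL;

int py_setup(void)
{
    PyObject *module_name, *module;
    Py_Initialize();
    module_name = PyString_FromString("scipy.stats");
    module = PyImport_Import(module_name);
    Py_DECREF(module_name);
    if (!module)
        return -1;
    beta_funct = PyObject_GetAttrString(module, "beta");
    Py_DECREF(module);
    return beta_funct ? 0 : -1;
}

void py_teardown(void)
{
    Py_XDECREF(beta_funct);
    Py_Finalize();
}

/* Any subroutine can then call the cached object repeatedly, e.g. */
double beta_ppf(double q)
{
    PyObject *output = PyObject_CallMethod(beta_funct, "ppf", "(ddd)", q, 50.0, 50.0);
    double result = output ? PyFloat_AsDouble(output) : -1.0;
    Py_XDECREF(output);
    return result;
}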
I am writing a Python extension module in C (actually C++, but this doesn't matter) that performs some calculations in an OpenMP loop, in which it can call a user-provided Python callback function that operates on numpy arrays. Since standard CPython does not allow the Python API to be used from multiple threads simultaneously, I protect these callbacks with a #pragma omp critical block.
This works well in some cases, but sometimes creates a deadlock in which one thread is trying to acquire the OpenMP critical lock while the other is waiting for the GIL:
Thread 0:
__kmpc_critical_with_hint (in libomp.dylib) + 1109 [0x105f8a6dd]
__kmp_acquire_queuing_lock(kmp_user_lock*, int) (in libomp.dylib) + 9 [0x105fbaec1]
int __kmp_acquire_queuing_lock_timed_template<false>(kmp_queuing_lock*, int) (in libomp.dylib) + 405 [0x105fb6e6c]
__kmp_wait_yield_4 (in libomp.dylib) + 135,128,... [0x105fb0d0e,0x105fb0d07,...]
Thread 1:
PyObject_Call (in Python) + 99 [0x106014202]
??? (in umath.so) load address 0x106657000 + 0x25e51 [0x10667ce51]
??? (in umath.so) load address 0x106657000 + 0x23b0c [0x10667ab0c]
??? (in umath.so) load address 0x106657000 + 0x2117e [0x10667817e]
??? (in umath.so) load address 0x106657000 + 0x21238 [0x106678238]
PyGILState_Ensure (in Python) + 93 [0x1060ab4a7]
PyEval_RestoreThread (in Python) + 62 [0x10608cb0a]
PyThread_acquire_lock (in Python) + 101 [0x1060bc1a4]
_pthread_cond_wait (in libsystem_pthread.dylib) + 767 [0x7fff97d0d728]
__psynch_cvwait (in libsystem_kernel.dylib) + 10 [0x7fff9d464db6]
Curiously, this happens whenever the Python callback function encounters an invalid floating-point value or an overflow and prints a warning message to the console. Apparently this upsets some synchronization mutexes and leads to a deadlock shortly afterwards.
Here is a stripped-down but self-contained example:
#define NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION
#include <Python.h>
#include <numpy/arrayobject.h>
#include <omp.h>
const int NUM_THREADS = 2; // number of OpenMP threads
const int NUM_BLOCKS = 100; // number of loop iterations
//const double MAX_EXP = 500.0; // this is a safe value - exp(500) does not overflow
const double MAX_EXP = 1000.0; // exp(1000) overflows and produces a warning message, which then hangs the whole thing
int main(int argc, char *argv[])
{
Py_Initialize();
PyEval_InitThreads();
PyObject* numpy = PyImport_ImportModule("numpy");
if(!numpy) {
printf("Failed to import numpy\n");
return 1;
} else printf("numpy imported\n");
import_array1(1);
PyObject* fnc = PyObject_GetAttrString(numpy, "exp");
if(!fnc || !PyCallable_Check(fnc)) {
printf("Failed to get hold on function\n");
return 1;
} else printf("function loaded\n");
omp_set_num_threads(NUM_THREADS);
#pragma omp parallel for schedule(dynamic)
for(int i=0; i<NUM_BLOCKS; i++) {
int tn = omp_get_thread_num();
printf("Thread %i: block %i\n", tn, i);
#pragma omp critical
{
//PyGILState_STATE state = PyGILState_Ensure(); ///< does not help
npy_intp dims[1] = { random() % 64000 + 1000 };
PyArrayObject* args = (PyArrayObject*) PyArray_ZEROS(1, dims, NPY_DOUBLE, 0);
double* raw_data = (double*) PyArray_DATA(args);
for(npy_intp k=0; k<dims[0]; k++)
raw_data[k] = random()*MAX_EXP / RAND_MAX;
printf("Thread %i: calling fnc for block %i with %li points\n", tn, i, dims[0]);
PyObject* result = PyObject_CallFunctionObjArgs(fnc, args, NULL);
Py_DECREF(args);
printf("Thread %i: result[0] for block %i with %li points is %g\n", tn, i, dims[0],
*((double*)PyArray_GETPTR1((PyArrayObject*)result, 0)));
Py_XDECREF(result);
//PyGILState_Release(state);
}
}
Py_Finalize();
}
When I set MAX_EXP = 500 (the commented-out safe value near the top), everything works without warnings or deadlocks, but if I set MAX_EXP = 1000, this produces a warning message,
sys:1: RuntimeWarning: overflow encountered in exp
and the next loop iteration never finishes. This behaviour is seen on both Linux and macOS, with Python 2.7 and 3.6 alike. I tried adding some PyGILState_Ensure() calls to the code, but this doesn't help, and the documentation on these aspects is unclear.
Okay, so the problem turned out to lurk deep inside numpy, namely in its _error_handler() function, which is called whenever an invalid floating-point value is produced (NaN, overflow to infinity, etc.).
This function has several modes, from ignoring the error completely to raising an exception, but by default it issues a Python warning. In doing so, it temporarily re-acquires the GIL, which was released during the bulk computation, and that's where the deadlock occurs.
A very similar situation leading to the same problem is discussed here: https://github.com/numpy/numpy/issues/5856
My workaround was to create a lock-type class that disables numpy warnings for the lifetime of a class instance (which is created for the duration of the parallelized computation) and restores the original settings once the instance is destroyed. While not ideal, this seems to suffice in my case, though my feeling is that ultimately the culprit is numpy itself. For completeness, here is the code of this class:
/** Lock-type class that temporarily disables warnings that numpy produces on floating-point
overflows or other invalid values.
The reason for doing this is that such warnings involve subtle interference with GIL
when executed in a multi-threading context, leading to deadlocks if a user-defined Python
function is accessed from multiple threads (even after being protected by an OpenMP critical
section). The instance of this class is created (and hence warnings are suppressed)
whenever a user-defined callback function is instantiated, and the previous warning settings
are restored once such a function is deallocated.
*/
class NumpyWarningsDisabler {
PyObject *seterr, *prevSettings; ///< pointer to the function and its previous settings
public:
NumpyWarningsDisabler() : seterr(NULL), prevSettings(NULL)
{
PyObject* numpy = PyImport_AddModule("numpy");
if(!numpy) return;
seterr = PyObject_GetAttrString(numpy, "seterr");
if(!seterr) return;
// store the dictionary corresponding to current settings of numpy warnings subsystem
prevSettings = PyObject_CallFunction(seterr, const_cast<char*>("s"), "ignore");
if(!prevSettings) { printf("Failed to suppress numpy warnings\n"); }
/*else { printf("Ignoring numpy warnings\n"); }*/
}
~NumpyWarningsDisabler()
{
if(!seterr || !prevSettings) return;
// restore the previous settings of numpy warnings subsystem
PyObject* args = PyTuple_New(0);
PyObject* result = PyObject_Call(seterr, args, prevSettings);
Py_DECREF(args);
if(!result) { printf("Failed to restore numpy warnings\n"); }
/*else printf("Restored numpy warnings\n");*/
Py_XDECREF(result);
}
};
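For illustration, a minimal sketch of how the guard might be held for the duration of the computation (the PyCallbackWrapper class and its members are hypothetical and only serve to show the intended lifetime; in my code the disabler is a member of the object that wraps the user-provided Python function):
// Hypothetical wrapper around a user-provided Python callback. Holding a
// NumpyWarningsDisabler as a member keeps numpy warnings suppressed for as
// long as the wrapper (and hence the parallel computation using it) exists.
class PyCallbackWrapper {
    PyObject* func;                    ///< the user-provided Python callable
    NumpyWarningsDisabler noWarnings;  ///< suppresses warnings until destruction
public:
    explicit PyCallbackWrapper(PyObject* f) : func(f) { Py_XINCREF(func); }
    ~PyCallbackWrapper() { Py_XDECREF(func); }

    // called from inside an OpenMP critical section
    PyObject* operator()(PyObject* args) const {
        return PyObject_CallFunctionObjArgs(func, args, NULL);
    }
};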
I am trying to embed Python in a C++ multi-threaded program using the Python/C API (version 3.7.3) on a quad-core ARM 64-bit architecture. A dedicated thread-safe class "PyHandler" takes care of all the Python API calls:
class PyHandler
{
public:
PyHandler();
~PyHandler();
bool run_fun();
// ...
private:
PyGILState_STATE _gstate;
std::mutex _mutex;
};
In the constructor I initialize the Python interpreter:
PyHandler::PyHandler()
{
Py_Initialize();
//PyEval_SaveThread(); // UNCOMMENT TO MAKE EVERYTHING WORK !
}
And in the destructor I undo all initializations:
PyHandler::~PyHandler()
{
_gstate = PyGILState_Ensure();
if (Py_IsInitialized()) // finalize python interpreter
Py_Finalize();
}
Now, in order to make run_fun() callable by only one thread at a time, I use the mutex variable _mutex (see below). On top of this, I call PyGILState_Ensure() to make sure the current thread holds the Python GIL, and call PyGILState_Release() at the end to release it. All the remaining Python calls happen between these two calls:
bool PyHandler::run_fun()
{
std::lock_guard<std::mutex> lockGuard(_mutex);
_gstate = PyGILState_Ensure(); // give the current thread the Python GIL
// Python calls...
PyGILState_Release(_gstate); // release the Python GIL till now assigned to the current thread
return true;
}
Here is what main() looks like:
int main()
{
PyHandler py; // constructor is called !
int n_threads = 10;
std::vector<std::thread> threads;
for (int i = 0; i < n_threads; i++)
threads.push_back(std::thread([&py]() { py.run_fun(); }));
for (int i = 0; i < n_threads; i++)
if (threads[i].joinable())
threads[i].join();
}
Despite all these precautions, the program always deadlocks at the PyGILState_Ensure() line in run_fun() on the very first attempt. BUT when I uncomment the line with PyEval_SaveThread() in the constructor, everything magically works. Why is that?
Notice that I am not calling PyEval_RestoreThread() anywhere. Am I supposed to use the macros Py_BEGIN_ALLOW_THREADS and Py_END_ALLOW_THREADS instead? I thought these macros and PyEval_SaveThread() were only for dealing with Python threads, NOT with non-Python threads as in my case! Am I missing something?
The documentation for my case only mentions the use of PyGILState_Ensure() and PyGILState_Release(). Any help is highly appreciated.
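For concreteness, here is the constructor/destructor pair with the PyEval_SaveThread() line uncommented, together with the matching PyEval_RestoreThread() call that I suspect might belong in the destructor (the _mainState member is hypothetical and exists only for this sketch):
PyHandler::PyHandler()
{
    Py_Initialize();
    // The initializing thread holds the GIL after Py_Initialize(); save the
    // thread state and release the GIL so that other threads can later take
    // it with PyGILState_Ensure().
    _mainState = PyEval_SaveThread(); // hypothetical PyThreadState* member
}

PyHandler::~PyHandler()
{
    // Re-acquire the GIL on the initializing thread before finalizing.
    PyEval_RestoreThread(_mainState);
    if (Py_IsInitialized())
        Py_Finalize();
}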
I have a C++ program that calls a function inside a for loop.
The function does a heavy job: it embeds Python and performs image processing.
My question is, why does it only run on the first iteration?
Main function (I only show the part of the code relevant to this question):
int main(){
    for(int a = 0; a < 5; a++){
        for(int b = 0; b < 5; b++){
            // on every iteration it should go into PyRead(), do the image processing, and compare
            if(PyRead() == 1){
                // some application logic might occur here
            }
            else {
            }
        }
    }
}
The PyRead() function, i.e. the C++ function that enters the Python environment to perform the image processing:
bool PyRead(){
    string data2;
    Py_Initialize();
    PyRun_SimpleString("print 'hahahahahawwwwwwwwwwwww' ");
    char filename[] = "testcapture";
    PyRun_SimpleString("import sys");
    PyRun_SimpleString("sys.path.append(\".\")");
    PyObject * moduleObj = PyImport_ImportModule(filename);
    if (moduleObj)
    {
        PyRun_SimpleString("print 'hahahahaha' ");
        char functionName[] = "test";
        PyObject * functionObj = PyObject_GetAttrString(moduleObj, functionName);
        if (functionObj)
        {
            if (PyCallable_Check(functionObj))
            {
                PyObject * argsObject = PyTuple_New(0);
                if (argsObject)
                {
                    PyObject * resultObject = PyEval_CallObject(functionObj, argsObject);
                    if (resultObject)
                    {
                        if ((resultObject != Py_None) && (PyString_Check(resultObject)))
                        {
                            data2 = PyString_AsString(resultObject);
                        }
                        Py_DECREF(resultObject);
                    }
                    else if (PyErr_Occurred()) PyErr_Print();
                    Py_DECREF(argsObject);
                }
            }
            Py_DECREF(functionObj);
        }
        else PyErr_Clear();
        Py_DECREF(moduleObj);
    }
    Py_Finalize();
    std::cout << "The Python test function returned: " << data2 << std::endl;
    cout << "Data2 \n" << data2;
    if(compareID(data2) == 1)
        return true;
    else
        return false;
}
This is the second time I am asking this question on Stack Overflow; I hope this time the question is clearer!
The program compiles successfully with no errors.
When I run the program, at a=0, b=0 it enters the PyRead() function and returns a value; then it moves on to a=0, b=1, and at that moment the whole program ends.
It is supposed to enter the PyRead() function again, but instead it simply ends the program.
I should strongly emphasize that the PyRead() function takes a long time to run (about 30 seconds).
I have no idea what is happening and am looking for some help.
Thanks.
See the comment in https://docs.python.org/2/c-api/init.html#c.Py_Finalize
Ideally, this frees all memory allocated by the Python interpreter.
Dynamically loaded extension modules loaded by Python are not unloaded.
Some extensions may not work properly if their initialization routine is called more than once
It seems your module does not play well with this function.
A workaround could be to create the script on the fly and call it with a Python subprocess.
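For illustration, a rough sketch of that workaround from the C++ side (assuming the script lives in testcapture.py, prints its result to stdout, and that a POSIX popen() is available; the helper name is made up for this sketch):
#include <cstdio>
#include <string>

// Hypothetical helper: run the Python script in a separate process and capture
// whatever it prints to stdout. This sidesteps repeated Py_Initialize() /
// Py_Finalize() calls in the host process entirely.
std::string runPythonScript(const std::string& scriptPath)
{
    std::string output;
    FILE* pipe = popen(("python " + scriptPath).c_str(), "r");
    if (!pipe)
        return output;
    char buffer[256];
    while (fgets(buffer, sizeof(buffer), pipe))
        output += buffer;
    pclose(pipe);
    return output;
}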
I am experimenting with writing a library in Rust that I can call from Python code. I would like to be able to pass a void pointer back to Python so that I can hold state between calls into Rust. However, I get a segfault in Rust when trying to access the pointer again.
Full code samples and crash report: https://gist.github.com/robyoung/3644f13a05c95cb1b947
The code
#![feature(libc)]
#![feature(alloc)]
extern crate libc;
use std::boxed;
pub struct Point {
x: i64,
y: i32,
}
#[no_mangle]
pub extern "C" fn start_state() -> *mut Point {
let point = Box::new(Point{x: 0, y: 10});
let raw = unsafe { boxed::into_raw(point) };
println!("{:?}", raw);
raw
}
#[no_mangle]
pub extern "C" fn continue_state(point: *mut Point) -> i32 {
println!("{:?}", point);
let p = unsafe { Box::from_raw(point) };
println!("{} {}", p.x, p.y);
0
}
import ctypes
lib = ctypes.cdll.LoadLibrary('target/libpytesttype.so')
lib.start_state.restype = ctypes.c_void_p
pointer = lib.start_state()
print("{:x}".format(pointer))
lib.continue_state(pointer)
The output
0xdc24000
10dc24000
0xdc24000
[1] 64006 segmentation fault python src/main.py
What am I doing wrong?
eryksun nailed it:
On the Python side, you're missing lib.continue_state.argtypes = (ctypes.c_void_p,). Without defining the parameter as a pointer, ctypes uses the default conversion for a Python integer, which truncates the value to 32-bit, e.g. 0x0dc24000. If you're lucky accessing that address triggers a segfault immediately.
My output (with my own padding) was:
0x103424000
103424000
0x 3424000
So the Debug formatter for pointers should be fine. Not sure why your output differs.
After adding
lib.continue_state.argtypes = (ctypes.c_void_p,)
the program ran just fine.
I'm embedding Python in a C++ plug-in. The plug-in calls a Python algorithm dozens of times during each session, each time sending the algorithm different data. So far so good.
But now I have a problem:
The algorithm sometimes takes minutes to produce and return a solution, and during that time the conditions often change, making that solution irrelevant. So what I want is to be able to stop the algorithm at any moment, and run it again immediately afterwards with a different set of data.
Here's the C++ code for embedding Python that I have so far:
void py_embed (void*data){
counter_thread=false;
PyObject *pName, *pModule, *pDict, *pFunc;
//To inform the interpreter about paths to Python run-time libraries
Py_SetProgramName(arg->argv[0]);
if(!gil_init){
gil_init=1;
PyEval_InitThreads();
PyEval_SaveThread();
}
PyGILState_STATE gstate = PyGILState_Ensure();
// Build the name object
pName = PyString_FromString(arg->argv[1]);
if( !pName ){
textfile3<<"Can't build the object "<<endl;
}
// Load the module object
pModule = PyImport_Import(pName);
if( !pModule ){
textfile3<<"Can't import the module "<<endl;
}
// pDict is a borrowed reference
pDict = PyModule_GetDict(pModule);
if( !pDict ){
textfile3<<"Can't get the dict"<<endl;
}
// pFunc is also a borrowed reference
pFunc = PyDict_GetItemString(pDict, arg->argv[2]);
if( !pFunc || !PyCallable_Check(pFunc) ){
textfile3<<"Can't get the function"<<endl;
}
/*Call the algorithm and treat the data that is returned from it
...
...
*/
// Clean up
Py_XDECREF(pArgs2);
Py_XDECREF(pValue2);
Py_DECREF(pModule);
Py_DECREF(pName);
PyGILState_Release(gstate);
counter_thread=true;
_endthread();
};
Edit: the Python algorithm is not my work and I shouldn't change it.
This is based on a cursory knowledge of Python and a quick reading of the Python docs.
PyThreadState_SetAsyncExc lets you inject an exception into a running Python thread.
Run your Python interpreter in some thread. From another thread, acquire the GIL (PyGILState_Ensure()) and then use PyThreadState_SetAsyncExc to inject the exception into the main thread. (This may require some precursor work to teach the Python interpreter about the second thread.)
Unless the Python code you are running is full of "catch alls", this should cause it to terminate execution.
You can also look into the code for creating Python sub-interpreters, which would let you start up a new script while the old one shuts down.
Py_AddPendingCall is also tempting to use, but there are enough warnings around it that it's probably best avoided.
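To make the exception-injection suggestion concrete, here is a rough sketch (not tested against the plug-in above; the target thread id is assumed to have been recorded beforehand, e.g. from PyThreadState_Get()->thread_id while that thread held the GIL, and the exact integer type of the id argument differs between Python 2 and 3):
// Ask the Python code running in the thread with id `target_tid` to raise
// KeyboardInterrupt. The exception is only delivered at the next bytecode
// boundary, and a bare "except:" in the script can swallow it.
void request_stop(long target_tid)
{
    PyGILState_STATE gstate = PyGILState_Ensure();
    int n = PyThreadState_SetAsyncExc(target_tid, PyExc_KeyboardInterrupt);
    if (n != 1) {
        // 0 means no such thread was found; >1 means something went wrong,
        // so cancel the pending exception again.
        PyThreadState_SetAsyncExc(target_tid, NULL);
    }
    PyGILState_Release(gstate);
}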
Sorry, but your choices are limited. You can either change the Python code (ok, it's a plug-in, so not an option) or run it in another PROCESS (with some nice IPC in between). Then you can use the system API to wipe it out.
So, I finally thought of a solution (more of a workaround, really).
Instead of terminating the thread that is running the algorithm (let's call it T1), I create another one (T2) with the set of data that is relevant at that time.
In every thread I do this:
thread_counter+=1; //global variable
int thisthread=thread_counter;
and after the solution from Python is given, I just verify which one is the most "recent", the one from T1 or the one from T2:
if(thisthread==thread_counter){
/*save the solution and treat it */
}
In terms of computational effort this is obviously not the best solution, but it serves my purposes.
Thank you for the help, guys.
I've been thinking about this problem, and I agree that sub-interpreters may provide one possible solution: https://docs.python.org/2/c-api/init.html#sub-interpreter-support. The API supports calls for creating new interpreters and ending existing ones. The bugs-and-caveats section describes some issues that, depending on your architecture, may or may not present a problem.
Another possible solution is to use the Python multiprocessing module and, within your worker thread, test a global variable (something like time_to_die). Then from the parent you grab the GIL, set the variable, release the GIL, and wait for the child to finish. A sketch of the parent side follows.
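For illustration, the parent-side signalling could look roughly like this (a sketch only: "my_algorithm" is a placeholder for the actual module name, and time_to_die follows the naming suggested above and is assumed to be polled by the long-running Python code):
// From the controlling thread: grab the GIL and flip a module-level flag
// that the long-running Python code is assumed to check periodically.
PyGILState_STATE gstate = PyGILState_Ensure();
PyObject* mod = PyImport_AddModule("my_algorithm");   // borrowed reference
if (mod)
    PyObject_SetAttrString(mod, "time_to_die", Py_True);
PyGILState_Release(gstate);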
But then another idea occurred to me: why not just use fork(), initialize your Python interpreter in the child, and, when the parent decides it's time for the Python computation to end, simply kill it? Something like this:
#include <unistd.h>   // fork(), sleep()
#include <signal.h>   // kill(), SIGKILL

void process() {
    pid_t pid = fork();
    if (pid) {
        // in parent
        sleep(60);
        kill(pid, SIGKILL);
    }
    else {
        // in child
        Py_Initialize();
        PyRun_SimpleString("# insert long running python calculation");
    }
}
(This example assumes *nix; if you're on Windows, substitute CreateProcess()/TerminateProcess().)