Embed Python/C API in a multi-threading C++ program

Embed Python/C API in a multi-threading C++ program - python

I am trying to embed Python in a C++ multi-threading program using the Python/C API (version 3.7.3) on a quad-core ARM 64 bit architecture. A dedicated thread-safe class "PyHandler" takes care of all the Python API calls:
class PyHandler
{
public:
PyHandler();
~PyHandler();
bool run_fun();
// ...
private:
PyGILState_STATE _gstate;
std::mutex _mutex;
}
In the constructor I initialize the Python interpreter:
PyHandler::PyHandler()
{
Py_Initialize();
//PyEval_SaveThread(); // UNCOMMENT TO MAKE EVERYTHING WORK !
}
And in the destructor I undo all initializations:
PyHandler::~PyHandler()
{
_gstate = PyGILState_Ensure();
if (Py_IsInitialized()) // finalize python interpreter
Py_Finalize();
}
Now, in order to make run_fun() callable by one thread at a time, I use the mutex variable _mutex (see below). On top of this, I call PyGILState_Ensure() to make sure the current thread holds the python GIL, and call PyGILState_Release() at the end to release it. All the remaining python calls happen within these two calls:
bool PyHandler::run_fun()
{
std::lock_guard<std::mutex> lockGuard(_mutex);
_gstate = PyGILState_Ensure(); // give the current thread the Python GIL
// Python calls...
PyGILState_Release(_gstate); // release the Python GIL till now assigned to the current thread
return true;
}
Here is how the main() looks like:
int main()
{
PyHandler py; // constructor is called !
int n_threads = 10;
std::vector<std::thread> threads;
for (int i = 0; i < n_threads; i++)
threads.push_back(std::thread([&py]() { py.run_fun(); }));
for (int i = 0; i < n_threads; i++)
if (threads[i].joinable())
threads[i].join();
}
Although all precautions, the program always deadlocks at the PyGILState_Ensure() line in run_fun() during the very first attempt. BUT when I uncomment the line with PyEval_SaveThread() in the constructor everything magically works. Why is that ?
Notice that I am not calling PyEval_RestoreThread() anywhere. Am I supposed to use the macros Py_BEGIN_ALLOW_THREADS and Py_END_ALLOW_THREADS instead ? I thought these macros and PyEval_SaveThread() are only used dealing with Python threads and NOT with non-Python threads, as in my case! Am I missing something ?
The documentation for my case, only mentions the use of PyGILState_Ensure() and PyGILState_Release. Any help is highly appreciated.

Related

PYBIND11: Make changes to class object value in another c++ thread when python interpreter is embedded and running in another thread

I am just printing the value of car1.vehicle_id in python. I want it to print "1234" for the first 2 seconds and then when the value is changes in another thread to " 4543" the change should take effect in python. Is this possible or is there a simple example to help me with this?
c++
#include <pybind11/embed.h>
#include <string>
#include <thread>
#include <chrono>
// Define namespace for pybind11
namespace py = pybind11;
class Vehiclee
{
// Access specifier
public:
Vehiclee(){};
~Vehiclee() {}
// Data Members
int vehicle_id;
std::string vehicle_name;
std::string vehicle_color;
// Member Functions()
void printname()
{
std::cout << "Vehicle id is: " << vehicle_id;
std::cout << "Vehicle name is: " << vehicle_name;
std::cout << "Vehicle color is: " << vehicle_color;
}
};
PYBIND11_EMBEDDED_MODULE(embeded, m){
py::class_(m, "Vehiclee")
.def_readonly("vehicle_name", &Vehiclee::vehicle_name)
.def_readonly("vehicle_color", &Vehiclee::vehicle_color)
.def_readonly("vehicle_id", &Vehiclee::vehicle_id);
}
py::scoped_interpreter python{};
Vehiclee car1;
void threadFunc()
{
sleep(2);
std::cout<<"entering thread";
car1.vehicle_id = 4543;
std::cout<<"Modified val in thread";
}
int main() {
// Initialize the python interpreter
// Import all the functions from scripts by file name in the working directory
auto simpleFuncs = py::module::import("simpleFuncs");
// Test if C++ objects can be passed into python functions
car1.vehicle_id = 1234;
std::thread t1(threadFunc);
simpleFuncs.attr("simplePrint")(car1);
t1.join();
return 0;
}
python
> import time
> import importlib
> import embeded
>
> def simplePrint(argument):
> while(1):
> importlib.reload(embeded)
> print(argument.vehicle_id) time.sleep(1)
Current output
always 1234
Required output
1234 (for first 2 secs)
4543 (after 2 secs)

You need to understand the C++ rules for threading. In C++, threads can run far better in parallel than in Python. This is because in C++, threads are by default running entirely separate from each other, whereas Python uses a Global Interpreter Lock which causes a lot of thread synchronization.
So, in this case you do need the threads to synchronize, because the threads share a variable (car1). The challenge is that .def_readonly hides some boilerplate code which doesn't do synchronization - makes sense, because what object should it use to synchronize?
So what you need to do is make getter and setter methods in Vehicle, and add a std::mutex. In every getter and every setter, you lock and unlock this mutex. This is easy with a std::scoped_lock - this will automatically unlock the mutex when the method returns.
There are other options. For vehicle_id you could use a std::atomic_int, but you'd probably still need a getter method. I don't think pybind understands atomic variables.

Returning new PyObject * from C++ to Python eventually segfaults

I am writing the C++ and Python side of a library that exposes some functionality in our software written in C++ to Python scripts. I'm compiling some source files of interest and a wrapper file that looks like below into a shared library and loading that library using ctypes.
extern "C" {
PyObject *py_get_cxx_set_EXAMPLE(void)
{
std::set<long> cset = get_cxx_set_for_python();
PyGILState_STATE gstate = PyGILState_Ensure();
PyObject *pyset = PySet_New(NULL);
for (long c_long: cset)
PySet_Add(pyset, PyLong_FromLong(c_long));
PyGILState_Release(gstate);
return pyset;
}
}
and on the python side:
example_lib.py_get_cxx_set_EXAMPLE.restype = ctypes.py_object
for i in range(0, 1000):
ret = example_lib.py_get_cxx_set_EXAMPLE()
I find that the first few calls would be successful, but the C++ code would segfault in the middle of the loop. Upon GDB'ing, I would find the end of the callstack like this:
#0 0x000055555563244a in PyErr_Occurred ()
#1 0x000055555562a387 in _PyObject_GC_Malloc ()
#2 0x0000555555629ebd in _PyObject_GC_New ()
#3 0x000055555562b23c in PyDict_New ()
#4 0x00007ffff66df9be in python::to_python_object<db::pmbus_diagnostics> (t=...) at python_wrapper/python.hpp:101
It looks like the Python runtime refuses to make more Python objects (in this case, a dict) for me...!
What did I do wrong in the C++ code?
EDIT::
Updated, see answer

OK, I forgot to add the code to acquire and release the global interpreter lock for some class of functions. Sorry for the silly question.
Have faith in Python kids.

Is it C++ for loop had require run time?

I had one c++ program that inside for loop, calling a function.
The function is doing a heavy process, it is embedded with python and performing image processing.
My question is, why can it only run at the first instance of the variable?
Main function (I only show the part of code require in this title):
int main(){
for(int a = 0;a<5;a++){
for(int b=0;b<5;b++){
// I want every increment it go to PyRead() function, doing image processing, and compare
if(PyRead()==1){
// some application might be occur
}
else {
}
}
}
PyRead() function, the function in c++ to go into python environment performing image processing:
bool PyRead(){
string data2;
Py_Initialize();
PyRun_SimpleString("print 'hahahahahawwwwwwwwwwwww' ");
char filename[] = "testcapture";
PyRun_SimpleString("import sys");
PyRun_SimpleString("sys.path.append(\".\")");
PyObject * moduleObj = PyImport_ImportModule(filename);
if (moduleObj)
{
PyRun_SimpleString("print 'hahahahaha' ");
char functionName[] = "test";
PyObject * functionObj = PyObject_GetAttrString(moduleObj, functionName);
if (functionObj)
{
if (PyCallable_Check(functionObj))
{
PyObject * argsObject = PyTuple_New(0);
if (argsObject)
{
PyObject * resultObject = PyEval_CallObject(functionObj, argsObject);
if (resultObject)
{
if ((resultObject != Py_None)&&(PyString_Check(resultObject)))
{
data2 = PyString_AsString(resultObject);
}
Py_DECREF(resultObject);
}
else if (PyErr_Occurred()) PyErr_Print();
Py_DECREF(argsObject);
}
}
Py_DECREF(functionObj);
}
else PyErr_Clear();
Py_DECREF(moduleObj);
}
Py_Finalize();
std::cout << "The Python test function returned: " << data2<< std::endl;
cout << "Data2 \n" << data2;
if(compareID(data2) == 1)
return true;
else
return false;
}
This is second time I ask this question in stack overflow. I hope this time this question will be more clear!
I can successful compile with no error.
When I run the program, I realize at a=0, b=0 it will go to PyRead() function and return value, after that it go to a=0, b=1, at that moment the whole program will end.
It supposes to go to PyRead() function again, but it does not do that and straight ending the program.
I must strongly mention that PyRead() function needed a long time to run (30seconds).
I had no idea what happens, seeking for somehelp. Please focus on the Bold part to understand my question.
Thanks.

See the comment in https://docs.python.org/2/c-api/init.html#c.Py_Finalize
Ideally, this frees all memory allocated by the Python interpreter.
Dynamically loaded extension modules loaded by Python are not unloaded.
Some extensions may not work properly if their initialization routine is called more than once
It seems your module, does not play well with this function.
A workaround can be - create the script on the fly and call it with python subprocess.

Stop embedded python

I'm embedding python in a C++ plug-in. The plug-in calls a python algorithm dozens of times during each session, each time sending the algorithm different data. So far so good
But now I have a problem:
The algorithm takes sometimes minutes to solve and to return a solution, and during that time often the conditions change making that solution irrelevant. So, what I want is to stop the running of the algorithm at any moment, and run it immediately after with other set of data.
Here's the C++ code for embedding python that I have so far:
void py_embed (void*data){
counter_thread=false;
PyObject *pName, *pModule, *pDict, *pFunc;
//To inform the interpreter about paths to Python run-time libraries
Py_SetProgramName(arg->argv[0]);
if(!gil_init){
gil_init=1;
PyEval_InitThreads();
PyEval_SaveThread();
}
PyGILState_STATE gstate = PyGILState_Ensure();
// Build the name object
pName = PyString_FromString(arg->argv[1]);
if( !pName ){
textfile3<<"Can't build the object "<<endl;
}
// Load the module object
pModule = PyImport_Import(pName);
if( !pModule ){
textfile3<<"Can't import the module "<<endl;
}
// pDict is a borrowed reference
pDict = PyModule_GetDict(pModule);
if( !pDict ){
textfile3<<"Can't get the dict"<<endl;
}
// pFunc is also a borrowed reference
pFunc = PyDict_GetItemString(pDict, arg->argv[2]);
if( !pFunc || !PyCallable_Check(pFunc) ){
textfile3<<"Can't get the function"<<endl;
}
/*Call the algorithm and treat the data that is returned from it
...
...
*/
// Clean up
Py_XDECREF(pArgs2);
Py_XDECREF(pValue2);
Py_DECREF(pModule);
Py_DECREF(pName);
PyGILState_Release(gstate);
counter_thread=true;
_endthread();
};
Edit: The python's algorithm is not my work and I shouldn't change it

This is based off of a cursory knowledge of python, and reading the python docs quickly.
PyThreadState_SetAsyncExc lets you inject an exception into a running python thread.
Run your python interpreter in some thread. In another thread, PyGILState_STATE then PyThreadState_SetAsyncExc into the main thread. (This may require some precursor work to teach the python interpreter about the 2nd thread).
Unless the python code you are running is full of "catch alls", this should cause it to terminate execution.
You can also look into the code to create python sub-interpreters, which would let you start up a new script while the old one shuts down.
Py_AddPendingCall is also tempting to use, but there are enough warnings around it maybe not.

Sorry, but your choices are short. You can either change the python code (ok, plugin - not an option) or run it on another PROCESS (with some nice ipc between). Then you can use the system api to wipe it out.

So, I finally thought of a solution (more of a workaround really).
Instead of terminating the thread that is running the algorithm - let's call it T1 -, I create another one -T2 - with the set of data that is relevant at that time.
In every thread i do this:
thread_counter+=1; //global variable
int thisthread=thread_counter;
and after the solution from python is given I just verify which is the most "recent", the one from T1 or from T2:
if(thisthread==thread_counter){
/*save the solution and treat it */
}
Is terms of computer effort this is not the best solution obviously, but it serves my purposes.
Thank you for the help guys

I've been thinking about this problem, and I agree that sub interpreters may provide you one possible solution https://docs.python.org/2/c-api/init.html#sub-interpreter-support. It supports calls for creating new interpreters and ending existing ones. The bugs & caveats sections describes some issues that depending on your architecture may or may not present a problem.
Another possible solution is to use the python multiprocessing module, and within your worker thread test a global variable (something like time_to_die). Then from the parent, you grab the GIL, set the variable, release the GIL and wait for the child to finish.
But then another idea ocurred to me. Why not just use fork(), init your python interpreter in the child and when the parent decides it's time for the python thread to end, just kill it. Something like this:
void process() {
int pid = fork();
if (pid) {
// in parent
sleep(60);
kill(pid, 9);
}
else{
// in child
Py_Initialize();
PyRun_SimpleString("# insert long running python calculation");
}
}
(This example assumes *nix, if you're on windows, substitute CreateProcess()/TerminateProcess())

How can I handle IPC between C and Python?

I have a an application with two processes, one in C and one in Python. The C process is where all the heavy lifting is done, while the Python process handles the user interface.
The C program writes to a large-ish buffer 4 times per second, and the Python process reads this data. To this point the communication to the Python process has been done by AMQP. I would much rather setup some for of memory sharing between the two processes to reduce overhead and increase performance.
What are my options here? Ideally I would simply have the Python process read the physical memory straight (preferable from memory and not from disk), and then taking care of race conditions with Semaphores or something similar. This is however something I have little experience with, so I'd appreciate any help I can get.
I am using Linux btw.

This question has been asked for a long time. I believe the questioner already has the answer, so I wrote this answer for people later coming.
/*C code*/
#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#define GETEKYDIR ("/tmp")
#define PROJECTID (2333)
#define SHMSIZE (1024)
void err_exit(char *buf) {
fprintf(stderr, "%s\n", buf);
exit(1);
}
int
main(int argc, char **argv)
{
key_t key = ftok(GETEKYDIR, PROJECTID);
if ( key < 0 )
err_exit("ftok error");
int shmid;
shmid = shmget(key, SHMSIZE, IPC_CREAT | IPC_EXCL | 0664);
if ( shmid == -1 ) {
if ( errno == EEXIST ) {
printf("shared memeory already exist\n");
shmid = shmget(key ,0, 0);
printf("reference shmid = %d\n", shmid);
} else {
perror("errno");
err_exit("shmget error");
}
}
char *addr;
/* Do not to specific the address to attach
* and attach for read & write*/
if ( (addr = shmat(shmid, 0, 0) ) == (void*)-1) {
if (shmctl(shmid, IPC_RMID, NULL) == -1)
err_exit("shmctl error");
else {
printf("Attach shared memory failed\n");
printf("remove shared memory identifier successful\n");
}
err_exit("shmat error");
}
strcpy( addr, "Shared memory test\n" );
printf("Enter to exit");
getchar();
if ( shmdt(addr) < 0)
err_exit("shmdt error");
if (shmctl(shmid, IPC_RMID, NULL) == -1)
err_exit("shmctl error");
else {
printf("Finally\n");
printf("remove shared memory identifier successful\n");
}
return 0;
}
#python
# Install sysv_ipc module firstly if you don't have this
import sysv_ipc as ipc
def main():
path = "/tmp"
key = ipc.ftok(path, 2333)
shm = ipc.SharedMemory(key, 0, 0)
#I found if we do not attach ourselves
#it will attach as ReadOnly.
shm.attach(0,0)
buf = shm.read(19)
print(buf)
shm.detach()
pass
if __name__ == '__main__':
main()
The C program need to be executed firstly and do not just stop it before python code executed, it will create the shared memory segment and write something into it. Then Python code attach the same segment and read data from it.
After done the all things, press enter key to stop C program and remove shared memory ID.
We can see more about SharedMemory for python in here:
http://semanchuk.com/philip/sysv_ipc/#shared_memory

Suggestion #1:
The simplest way should be using TCP. You mentioned your data size is large. Unless your data size is too huge, you should be fine using TCP. Ensure you make separate threads in C and Python for transmitting/receiving data over TCP.
Suggestion #2:
Python supports wrappers over C. One popular wrapper is ctypes - http://docs.python.org/2/library/ctypes.html
Assuming you are familiar with IPC between two C programs through shared-memory, you can write a C-wrapper for your python program which reads data from the shared memory.
Also check the following diccussion which talks about IPC between python and C++:
Simple IPC between C++ and Python (cross platform)

How about writing the weight-lifting code as a library in C and then providing a Python module as wrapper around it? That is actually a pretty usual approach, in particular it allows prototyping and profiling in Python and then moving the performance-critical parts to C.
If you really have a reason to need two processes, there is an XMLRPC package in Python that should facilitate such IPC tasks. In any case, use an existing framework instead of inventing your own IPC, unless you can really prove that performance requires it.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Embed Python/C API in a multi-threading C++ program - python

Related

PYBIND11: Make changes to class object value in another c++ thread when python interpreter is embedded and running in another thread

Returning new PyObject * from C++ to Python eventually segfaults

Is it C++ for loop had require run time?

Stop embedded python

How can I handle IPC between C and Python?

Categories

Resources