I'm embedding Python in a C++ plug-in. The plug-in calls a Python algorithm dozens of times during each session, each time sending the algorithm different data. So far so good.
But now I have a problem:
The algorithm sometimes takes minutes to return a solution, and during that time the conditions often change, making that solution irrelevant. So what I want is to be able to stop the algorithm at any moment and immediately run it again with another set of data.
Here's the C++ code for embedding python that I have so far:
void py_embed (void*data){
counter_thread=false;
PyObject *pName, *pModule, *pDict, *pFunc;
//To inform the interpreter about paths to Python run-time libraries
Py_SetProgramName(arg->argv[0]);
if(!gil_init){
gil_init=1;
PyEval_InitThreads();
PyEval_SaveThread();
}
PyGILState_STATE gstate = PyGILState_Ensure();
// Build the name object
pName = PyString_FromString(arg->argv[1]);
if( !pName ){
textfile3<<"Can't build the object "<<endl;
}
// Load the module object
pModule = PyImport_Import(pName);
if( !pModule ){
textfile3<<"Can't import the module "<<endl;
}
// pDict is a borrowed reference
pDict = PyModule_GetDict(pModule);
if( !pDict ){
textfile3<<"Can't get the dict"<<endl;
}
// pFunc is also a borrowed reference
pFunc = PyDict_GetItemString(pDict, arg->argv[2]);
if( !pFunc || !PyCallable_Check(pFunc) ){
textfile3<<"Can't get the function"<<endl;
}
/*Call the algorithm and treat the data that is returned from it
...
...
*/
// Clean up
Py_XDECREF(pArgs2);
Py_XDECREF(pValue2);
Py_DECREF(pModule);
Py_DECREF(pName);
PyGILState_Release(gstate);
counter_thread=true;
_endthread();
};
Edit: The Python algorithm is not my work, and I shouldn't change it.
This is based on a cursory knowledge of Python and a quick read of the Python docs.
PyThreadState_SetAsyncExc lets you inject an exception into a running Python thread.
Run your Python interpreter in some thread. In another thread, acquire the GIL with PyGILState_Ensure, then call PyThreadState_SetAsyncExc to raise an exception in the thread that is running the interpreter. (This may require some precursor work to teach the Python interpreter about the second thread.)
Unless the Python code you are running is full of catch-all exception handlers, this should cause it to terminate execution.
You can also look into creating Python sub-interpreters, which would let you start up a new script while the old one shuts down.
Py_AddPendingCall is also tempting to use, but the documentation attaches enough warnings to it that it is probably best avoided.
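To make the PyThreadState_SetAsyncExc idea concrete, here is a minimal sketch that calls the same C API function from Python through ctypes.pythonapi instead of from C++. The Cancelled exception, the async_raise helper and the sleep loop standing in for the algorithm are all assumptions made for illustration. Note that the exception is only delivered while the target thread is executing Python bytecode, so a thread stuck inside one long C call would not be interrupted until that call returns.

```python
import ctypes
import threading
import time


class Cancelled(Exception):
    """Hypothetical exception type used to interrupt the worker."""


def async_raise(thread_id, exc_type):
    """Ask CPython to raise exc_type in the thread with the given id.

    Returns the number of thread states modified (1 means the request
    was registered for exactly one thread).
    """
    return ctypes.pythonapi.PyThreadState_SetAsyncExc(
        ctypes.c_ulong(thread_id), ctypes.py_object(exc_type))


outcome = {}


def long_running():
    # Stand-in for the algorithm: loops until an exception is injected.
    try:
        while True:
            time.sleep(0.01)
    except Cancelled:
        outcome["cancelled"] = True


worker = threading.Thread(target=long_running)
worker.start()
time.sleep(0.1)                      # let the worker get going
modified = async_raise(worker.ident, Cancelled)
worker.join(timeout=5)
```

In the C++ plug-in the equivalent call would be made while holding the GIL, using the thread id of the thread that runs the algorithm.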
Sorry, but your options are limited. You can either change the Python code (which, for a plug-in whose algorithm you don't own, is not an option) or run it in another PROCESS (with some IPC between the two). Then you can use the system API to kill it.
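As a rough illustration of the separate-process option, the sketch below starts a child interpreter and kills it through the operating system. It is written in Python for brevity; in the plug-in the parent would be the C++ code. The SLOW_JOB script is a made-up stand-in for the real algorithm, and the IPC needed to pass data in and results out is left out.

```python
import subprocess
import sys

# Stand-in for the slow algorithm: a child interpreter that never finishes on its own.
SLOW_JOB = "import time\nwhile True:\n    time.sleep(0.1)\n"

child = subprocess.Popen([sys.executable, "-c", SLOW_JOB])
still_running = child.poll() is None   # the job has not finished yet

child.terminate()        # SIGTERM on POSIX, TerminateProcess() on Windows
child.wait(timeout=10)   # reap the child so it does not linger
exit_status = child.returncode
```

Because the child owns its own interpreter, killing it cannot leave the parent's Python state half-updated, which is the main attraction of this approach.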
So, I finally thought of a solution (more of a workaround, really).
Instead of terminating the thread that is running the algorithm (let's call it T1), I create another one (T2) with the set of data that is relevant at that time.
In every thread I do this:
thread_counter+=1; //global variable
int thisthread=thread_counter;
and after Python returns the solution, I check which one is the most recent, the one from T1 or the one from T2:
if(thisthread==thread_counter){
/*save the solution and treat it */
}
In terms of computational effort this is obviously not the best solution, but it serves my purposes.
Thank you for the help, guys.
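The "latest request wins" workaround above can be sketched as follows. This is an illustration in Python, not the plug-in's C++ code: the names ticket_lock, latest_ticket, submit and worker are hypothetical, the sum() call stands in for the algorithm, and the Event gates exist only to force T1 to finish after T2. One difference from the snippet above is that the counter is updated under a lock, since an unprotected increment of a shared counter from several threads can race.

```python
import threading

ticket_lock = threading.Lock()
latest_ticket = 0          # plays the role of thread_counter
accepted = []              # solutions that were still relevant when they arrived


def worker(my_ticket, data, gate):
    gate.wait()                       # stands in for the minutes-long Python call
    solution = sum(data)              # hypothetical "algorithm"
    with ticket_lock:
        if my_ticket == latest_ticket:    # nobody asked for newer data meanwhile
            accepted.append(solution)


def submit(data, gate):
    """Take a ticket in the calling thread, then start the worker."""
    global latest_ticket
    with ticket_lock:
        latest_ticket += 1
        my_ticket = latest_ticket
    thread = threading.Thread(target=worker, args=(my_ticket, data, gate))
    thread.start()
    return thread


gate1, gate2 = threading.Event(), threading.Event()
t1 = submit([1, 2, 3], gate1)     # T1: data that will become stale
t2 = submit([10, 20], gate2)      # T2: the data that is relevant now
gate2.set()
t2.join()                         # T2 finishes first and is accepted
gate1.set()
t1.join()                         # T1 finishes late and is ignored
print(accepted)                   # prints [30]
```

The stale computation still runs to completion, so the cost noted above remains; only its result is discarded.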
I've been thinking about this problem, and I agree that sub-interpreters may provide one possible solution: https://docs.python.org/2/c-api/init.html#sub-interpreter-support. The API supports creating new interpreters and ending existing ones. The "Bugs and caveats" section describes some issues that, depending on your architecture, may or may not be a problem.
Another possible solution is to use the Python multiprocessing module and, within your worker, test a global variable (something like time_to_die). Then from the parent, you grab the GIL, set the variable, release the GIL and wait for the child to finish.
But then another idea occurred to me. Why not just use fork(), initialize the Python interpreter in the child, and, when the parent decides it's time for the Python code to stop, just kill the child? Something like this:
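#include <Python.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
void process() {
    pid_t pid = fork();
    if (pid < 0) {
        // fork failed: never pass -1 to kill(), it would signal every process we own
        return;
    }
    if (pid > 0) {
        // in parent: wait, then kill the child
        sleep(60);
        kill(pid, SIGKILL);
        waitpid(pid, NULL, 0);  // reap the child so it does not become a zombie
    } else {
        // in child
        Py_Initialize();
        PyRun_SimpleString("# insert long running python calculation");
        _exit(0);
    }
}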
(This example assumes *nix; if you're on Windows, substitute CreateProcess()/TerminateProcess().)
I am trying to embed Python in a C++ multi-threading program using the Python/C API (version 3.7.3) on a quad-core ARM 64 bit architecture. A dedicated thread-safe class "PyHandler" takes care of all the Python API calls:
class PyHandler
{
public:
PyHandler();
~PyHandler();
bool run_fun();
// ...
private:
PyGILState_STATE _gstate;
std::mutex _mutex;
};
In the constructor I initialize the Python interpreter:
PyHandler::PyHandler()
{
Py_Initialize();
//PyEval_SaveThread(); // UNCOMMENT TO MAKE EVERYTHING WORK !
}
And in the destructor I undo all initializations:
PyHandler::~PyHandler()
{
_gstate = PyGILState_Ensure();
if (Py_IsInitialized()) // finalize python interpreter
Py_Finalize();
}
Now, in order to make run_fun() callable by one thread at a time, I use the mutex variable _mutex (see below). On top of this, I call PyGILState_Ensure() to make sure the current thread holds the python GIL, and call PyGILState_Release() at the end to release it. All the remaining python calls happen within these two calls:
bool PyHandler::run_fun()
{
std::lock_guard<std::mutex> lockGuard(_mutex);
_gstate = PyGILState_Ensure(); // give the current thread the Python GIL
// Python calls...
PyGILState_Release(_gstate); // release the Python GIL till now assigned to the current thread
return true;
}
Here is what main() looks like:
int main()
{
PyHandler py; // constructor is called !
int n_threads = 10;
std::vector<std::thread> threads;
for (int i = 0; i < n_threads; i++)
threads.push_back(std::thread([&py]() { py.run_fun(); }));
for (int i = 0; i < n_threads; i++)
if (threads[i].joinable())
threads[i].join();
}
Despite all these precautions, the program always deadlocks at the PyGILState_Ensure() line in run_fun() on the very first attempt. BUT when I uncomment the line with PyEval_SaveThread() in the constructor, everything magically works. Why is that?
Notice that I am not calling PyEval_RestoreThread() anywhere. Am I supposed to use the macros Py_BEGIN_ALLOW_THREADS and Py_END_ALLOW_THREADS instead? I thought these macros and PyEval_SaveThread() are only used when dealing with Python threads and NOT with non-Python threads, as in my case! Am I missing something?
The documentation for my case only mentions the use of PyGILState_Ensure() and PyGILState_Release(). Any help is highly appreciated.
I am trying to use Jedi (https://github.com/davidhalter/jedi) to create a custom Python editor in C++. It works perfectly, but it is a bit slow and stalls for a short while, so I call those functions from inside a C++ thread. When I do that, I sometimes get a stack overflow error.
Here is my code:
//~ Create Script Instance class
PyObject* pScript = PyObject_Call(
PyObject_GetAttrString(GetJediModule(), "Script"),
PyTuple_Pack(1, PyString_FromString(TCHAR_TO_UTF8(*Source))),
NULL);
if (pScript == NULL)
{
UE_LOG(LogTemp, Verbose, TEXT("unable to get Script Class from Jedi Module"));
Py_DECREF(pScript);
ClearPython();
return;
}
//~ Call complete method from Script Class
PyObject* Result = PyObject_Call(
PyObject_GetAttrString(pScript, "complete"),
PyTuple_Pack(2, Py_BuildValue("i", Line), Py_BuildValue("i", Offset)),
NULL);
if (Result == NULL)
{
UE_LOG(LogTemp, Verbose, TEXT("unable to call complete method from Script class"));
Py_DECREF(Result);
ClearPython();
return;
}
The error happens when I call PyObject_Call. I assume the thread is the cause, since everything works perfectly when I call the function from the main thread, but the call stack doesn't tell me anything useful, just an error inside python.dll.
Well, I found the answer just by luck. It is possible to choose the stack size when launching a thread in UE, and I was using a tiny value of 1024. After a small modification I have been testing for 3 hours without any crashes, so I guess it is safe to assume it is working now.
Here is how I set the stack size (the third argument is the stack size):
Thread = FRunnableThread::Create(this, TEXT("FAutoCompleteWorker"), 8 * 8 * 4096, TPri_Normal);
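For comparison, the same knob exists in Python's own threading module. The sketch below is not Unreal code; it only shows the general idea of requesting a larger stack before starting a worker thread, using the same 8 * 8 * 4096 byte value. The recurse function is a hypothetical workload added for illustration.

```python
import threading

STACK_SIZE = 8 * 8 * 4096  # 262144 bytes, the value used in the answer above


def recurse(depth):
    # Hypothetical workload that needs some stack depth.
    return 0 if depth == 0 else 1 + recurse(depth - 1)


results = []
previous = threading.stack_size(STACK_SIZE)   # applies to threads created from now on
try:
    worker = threading.Thread(target=lambda: results.append(recurse(50)))
    worker.start()
    worker.join()
finally:
    threading.stack_size(previous)            # restore the earlier setting
```

As in the UE case, the size has to be chosen before the thread is created; it cannot be changed for a thread that is already running.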
I currently have a piece of hardware connected to C++ code using the MFC (Windows programming) framework. Basically the hardware is passing in image frames to my C++ code. In my C++ code, I am then calling a Python script using the CPython (Python embedding in C++) API to execute a model on that image. I've been noticing some weird behavior with the images though.
My C++ code is executing my Python script perfectly until some frame in the range of 80-90. After that point, my C++ code, for some reason, just stops executing the Python script. Despite that, the C++ code is still running normally - EXCEPT for the fact (which I just stated) that it's not executing the Python script.
Something to note: my Python script takes 5 seconds to execute the FIRST time, but then only 0.02 seconds to execute each frame after that first frame (I think due to the model getting set up).
At first, I thought it was a problem with the speed, so I replaced all my Python code with just a "time.sleep()" call with varying time, and, even if I sleep 5 seconds each C++ call to Python still always gets executed. As a result, I don't think it's a matter of the total time. For instance, if I do "time.sleep(1)" which sleeps for a second (which is longer than my Python script execution time AFTER the first frame), my Python script still always gets executed.
Does anyone have any idea why this might be happening? Could it be because of the uneven running times? Since it's taking 5 seconds to run the first frame and then significantly faster for each frame after that. Could it be that the Python is somehow unable to catch up after that time period?
This is my first time executing C++/Python on hardware, so I'm new to this. Any help would be greatly appreciated!
To give some idea of my code, here is a snippet:
if (pFuncFrame && PyCallable_Check(pFuncFrame)) {
PyObject* pArgs = PyTuple_New(1);
PyTuple_SetItem(pArgs, 0, PyUnicode_FromString("img.bmp"));
PyObject_CallObject(pFuncFrame, pArgs);
std::cout << "Called the frame function";
}
else {
std::cout << "Did not get the frame function";
}
I'm willing to bet that the first execution ends in a Python exception which isn't cleared until you execute some new Python statement in the second iteration, which therefore fails immediately. I recommend fixing the memory leaks and adding some error handling code to get some diagnostics (which will be useful either way). For example (I haven't tried it, since you didn't provide a compilable example, but the following shouldn't be too far off):
if (pFuncFrame && PyCallable_Check(pFuncFrame)) {
    PyObject* pArgs = PyTuple_New(1);
    PyTuple_SetItem(pArgs, 0, PyUnicode_FromString("img.bmp"));
    PyObject* res = PyObject_CallObject(pFuncFrame, pArgs);
    if (!res) {
        if (PyErr_Occurred()) PyErr_Print();
        else std::cerr << "Python exception without error set\n";
    } else {
        Py_DECREF(res);
        std::cout << "Called the frame function";
    }
    Py_DECREF(pArgs);
}
else {
    std::cout << "Did not get the frame function";
}
I have an application with two processes, one in C and one in Python. The C process is where all the heavy lifting is done, while the Python process handles the user interface.
The C program writes to a large-ish buffer 4 times per second, and the Python process reads this data. Up to this point, communication with the Python process has been done over AMQP. I would much rather set up some form of memory sharing between the two processes to reduce overhead and increase performance.
What are my options here? Ideally I would simply have the Python process read the physical memory directly (preferably from memory and not from disk), and then take care of race conditions with semaphores or something similar. This is, however, something I have little experience with, so I'd appreciate any help I can get.
I am using Linux, by the way.
This question was asked a long time ago. I believe the asker already has an answer, so I am writing this one for people who come across it later.
/* C code */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#define GETKEYDIR ("/tmp")
#define PROJECTID (2333)
#define SHMSIZE (1024)
void err_exit(char *buf) {
    fprintf(stderr, "%s\n", buf);
    exit(1);
}
int
main(int argc, char **argv)
{
    key_t key = ftok(GETKEYDIR, PROJECTID);
    if ( key < 0 )
        err_exit("ftok error");
    int shmid;
    shmid = shmget(key, SHMSIZE, IPC_CREAT | IPC_EXCL | 0664);
    if ( shmid == -1 ) {
        if ( errno == EEXIST ) {
            printf("shared memory already exists\n");
            shmid = shmget(key, 0, 0);
            printf("reference shmid = %d\n", shmid);
        } else {
            perror("errno");
            err_exit("shmget error");
        }
    }
    char *addr;
    /* Do not specify the address to attach to,
     * and attach for read & write */
    if ( (addr = shmat(shmid, 0, 0) ) == (void*)-1) {
        if (shmctl(shmid, IPC_RMID, NULL) == -1)
            err_exit("shmctl error");
        else {
            printf("Attach shared memory failed\n");
            printf("remove shared memory identifier successful\n");
        }
        err_exit("shmat error");
    }
    strcpy( addr, "Shared memory test\n" );
    printf("Enter to exit");
    getchar();
    if ( shmdt(addr) < 0)
        err_exit("shmdt error");
    if (shmctl(shmid, IPC_RMID, NULL) == -1)
        err_exit("shmctl error");
    else {
        printf("Finally\n");
        printf("remove shared memory identifier successful\n");
    }
    return 0;
}
# Python
# Install the sysv_ipc module first if you don't have it
import sysv_ipc as ipc
def main():
    path = "/tmp"
    key = ipc.ftok(path, 2333)
    shm = ipc.SharedMemory(key, 0, 0)
    # I found that if we do not attach ourselves,
    # it will attach as read-only.
    shm.attach(0, 0)
    buf = shm.read(19)
    print(buf)
    shm.detach()
if __name__ == '__main__':
    main()
The C program needs to be run first, and it must still be running when the Python code executes: it creates the shared memory segment and writes something into it. The Python code then attaches to the same segment and reads the data from it.
When everything is done, press the Enter key to stop the C program and remove the shared memory ID.
You can read more about SharedMemory in the sysv_ipc documentation here:
http://semanchuk.com/philip/sysv_ipc/#shared_memory
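If installing sysv_ipc is not an option, the Python standard library (3.8 and later) ships multiprocessing.shared_memory. The sketch below only demonstrates the create / attach-by-name / read / clean-up cycle inside a single script; both the "writer" and the "reader" are Python here, which is an assumption made to keep the example self-contained. On POSIX systems this module is backed by POSIX shared memory rather than System V, so it would not attach to the shmget() segment created by the C program above; a C writer would have to use shm_open() and mmap() with the same name instead.

```python
from multiprocessing import shared_memory

MESSAGE = b"Shared memory test\n"

# "Writer" side: create a named 1024-byte segment and copy the message in.
writer = shared_memory.SharedMemory(create=True, size=1024)
try:
    writer.buf[:len(MESSAGE)] = MESSAGE

    # "Reader" side: attach to the existing segment by name and copy the bytes out.
    reader = shared_memory.SharedMemory(name=writer.name)
    try:
        received = bytes(reader.buf[:len(MESSAGE)])
    finally:
        reader.close()
finally:
    writer.close()
    writer.unlink()   # remove the segment once nobody needs it

print(received)
```

As with the System V version, nothing here synchronises the two sides; a semaphore or similar is still needed if the reader can run while the writer is updating the buffer.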
Suggestion #1:
The simplest way should be to use TCP. You mentioned that your data is large; unless it is truly huge, TCP should be fine. Make sure you use separate threads in C and Python for transmitting and receiving data over TCP.
Suggestion #2:
Python supports wrappers over C. One popular wrapper is ctypes - http://docs.python.org/2/library/ctypes.html
Assuming you are familiar with IPC between two C programs through shared-memory, you can write a C-wrapper for your python program which reads data from the shared memory.
Also check the following discussion, which talks about IPC between Python and C++:
Simple IPC between C++ and Python (cross platform)
How about writing the heavy-lifting code as a library in C and then providing a Python module as a wrapper around it? That is actually a pretty common approach; in particular, it allows prototyping and profiling in Python and then moving the performance-critical parts to C.
If you really have a reason to need two processes, there is an XMLRPC package in Python that should facilitate such IPC tasks. In any case, use an existing framework instead of inventing your own IPC, unless you can really prove that performance requires it.
I want to extend a large C project with some new functionality, but I really want to write it in Python. Basically, I want to call Python code from C code. However, Python->C wrappers like SWIG allow for the OPPOSITE, that is writing C modules and calling C from Python.
I'm considering an approach involving IPC or RPC (I don't mind having multiple processes); that is, having my pure-Python component run in a separate process (on the same machine) and having my C project communicate with it by reading from and writing to a socket (or Unix pipe). My Python component can read from and write to the same socket to communicate. Is that a reasonable approach? Is there something better, like some special RPC mechanism?
Thanks for the answers so far. However, I'd like to focus on IPC-based approaches, since I want my Python program to run in a separate process from my C program. I don't want to embed a Python interpreter. Thanks!
I recommend the approaches detailed here. It starts by explaining how to execute strings of Python code, then from there details how to set up a Python environment to interact with your C program, call Python functions from your C code, manipulate Python objects from your C code, etc.
EDIT: If you really want to go the route of IPC, then you'll want to use the struct module or better yet, protlib. Most communication between a Python and C process revolves around passing structs back and forth, either over a socket or through shared memory.
I recommend creating a Command struct with fields and codes to represent commands and their arguments. I can't give much more specific advice without knowing more about what you want to accomplish, but in general I recommend the protlib library, since it's what I use to communicate between C and Python programs (disclaimer: I am the author of protlib).
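As a sketch of the struct-passing idea using only the standard library struct module (not protlib), the example below packs a made-up Command record and sends it across a socket pair. The field layout, the CMD_ADD code and the recv_exact helper are assumptions for illustration; in the real setup one end of the socket would be held by the C program, which would declare a matching packed struct.

```python
import socket
import struct

# Hypothetical wire format: uint16 command code, uint16 argument count, two doubles.
# "<" selects little-endian with no padding, so the C struct must be packed to match.
COMMAND = struct.Struct("<HHdd")
CMD_ADD = 1  # hypothetical command code


def recv_exact(sock, size):
    """Read exactly size bytes, since recv() may return fewer."""
    chunks = []
    while size > 0:
        chunk = sock.recv(size)
        if not chunk:
            raise ConnectionError("peer closed the connection")
        chunks.append(chunk)
        size -= len(chunk)
    return b"".join(chunks)


c_side, py_side = socket.socketpair()   # c_side stands in for the C process
try:
    c_side.sendall(COMMAND.pack(CMD_ADD, 2, 12.3, 45.6))
    code, argc, x, y = COMMAND.unpack(recv_exact(py_side, COMMAND.size))
    result = x + y if code == CMD_ADD else None
finally:
    c_side.close()
    py_side.close()
```

Fixing the byte order and padding explicitly in the format string avoids surprises when the two programs are built with different compilers or run on different architectures.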
Have you considered just wrapping your python application in a shell script and invoking it from within your C application?
Not the most elegant solution, but it is very simple.
See the relevant chapter in the manual: http://docs.python.org/extending/
Essentially you'll have to embed the python interpreter into your program.
I haven't used an IPC approach for Python<->C communication but it should work pretty well. I would have the C program do a standard fork-exec and use redirected stdin and stdout in the child process for the communication. A nice text-based communication will make it very easy to develop and test the Python program.
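The redirected stdin/stdout idea can be tried out without writing any C first. In the sketch below a Python parent stands in for the C program and talks to a child interpreter over pipes; the one-line text protocol ("add ..." and "quit") and the CHILD_SOURCE script are invented for illustration.

```python
import subprocess
import sys

# Child: reads one command per line on stdin, answers on stdout.
CHILD_SOURCE = r'''
import sys
for line in sys.stdin:
    command, *args = line.split()
    if command == "add":
        print(sum(float(a) for a in args), flush=True)
    elif command == "quit":
        break
'''

child = subprocess.Popen(
    [sys.executable, "-c", CHILD_SOURCE],
    stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True)

child.stdin.write("add 1.5 2.5\n")
child.stdin.flush()                       # make sure the child sees the request
reply = child.stdout.readline().strip()   # "4.0"

child.stdin.write("quit\n")
child.stdin.flush()
child.wait(timeout=10)
child.stdin.close()
child.stdout.close()
```

Because the protocol is plain text, the Python side can also be exercised by hand from a terminal, which is the testing convenience mentioned above.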
If I had decided to go with IPC, I'd probably splurge with XML-RPC -- cross-platform, lets you easily put the Python server project on a different node later if you want, has many excellent implementations (see here for many, including C and Python ones, and here for the simple XML-RPC server that's part of the Python standard library -- not as highly scalable as other approaches but probably fine and convenient for your use case).
It may not be a perfect IPC approach for all cases (or even a perfect RPC one, by all means!), but the convenience, flexibility, robustness, and broad range of implementations outweigh a lot of minor defects, in my opinion.
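Here is a minimal sketch of the XML-RPC route using the server and client that ship with the Python 3 standard library. The add_numbers function is a placeholder, the server runs in a background thread only so that the example fits in one script, and the Python client stands in for the C program, which would use a C XML-RPC library instead.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


def add_numbers(x, y):
    """Placeholder for the functionality implemented in Python."""
    return x + y


# Port 0 lets the OS pick a free port; a real deployment would use a fixed one.
server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
server.register_function(add_numbers)
port = server.server_address[1]
thread = threading.Thread(target=server.serve_forever, daemon=True)
thread.start()

# Client side: in the real setup this call would come from the C process.
with ServerProxy(f"http://127.0.0.1:{port}/") as proxy:
    total = proxy.add_numbers(12, 30)

server.shutdown()
server.server_close()
print(total)   # prints 42
```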
This seems quite nice http://thrift.apache.org/, there is even a book about it.
Details:
The Apache Thrift software framework, for scalable cross-language
services development, combines a software stack with a code generation
engine to build services that work efficiently and seamlessly between
C++, Java, Python, PHP, Ruby, Erlang, Perl, Haskell, C#, Cocoa,
JavaScript, Node.js, Smalltalk, OCaml and Delphi and other languages.
I've used the "standard" approach of Embedding Python in Another Application. But it's complicated/tedious. Each new function in Python is painful to implement.
I saw an example of Calling PyPy from C. It uses CFFI to simplify the interface but it requires PyPy, not Python. Read and understand this example first, at least at a high level.
I modified the C/PyPy example to work with Python. Here's how to call Python from C using CFFI.
My example is more complicated because I implemented three functions in Python instead of one. I wanted to cover additional aspects of passing data back and forth.
The complicated part is now isolated to passing the address of api to Python. That only has to be implemented once. After that it's easy to add new functions in Python.
interface.h
// These are the three functions that I implemented in Python.
// Any additional function would be added here.
struct API {
double (*add_numbers)(double x, double y);
char* (*dump_buffer)(char *buffer, int buffer_size);
int (*release_object)(char *obj);
};
test_cffi.c
//
// Calling Python from C.
// Based on Calling PyPy from C:
// http://doc.pypy.org/en/latest/embedding.html#more-complete-example
//
#include <stdio.h>
#include <assert.h>
#include "Python.h"
#include "interface.h"
struct API api; /* global var */
int main(int argc, char *argv[])
{
int rc;
// Start Python interpreter and initialize "api" in interface.py using
// old style "Embedding Python in Another Application":
// https://docs.python.org/2/extending/embedding.html#embedding-python-in-another-application
PyObject *pName, *pModule, *py_results;
PyObject *fill_api;
#define PYVERIFY(exp) if ((exp) == 0) { fprintf(stderr, "%s[%d]: ", __FILE__, __LINE__); PyErr_Print(); exit(1); }
Py_SetProgramName(argv[0]); /* optional but recommended */
Py_Initialize();
PyRun_SimpleString(
"import sys;"
"sys.path.insert(0, '.')" );
PYVERIFY( pName = PyString_FromString("interface") )
PYVERIFY( pModule = PyImport_Import(pName) )
Py_DECREF(pName);
PYVERIFY( fill_api = PyObject_GetAttrString(pModule, "fill_api") )
// "k" = [unsigned long],
// see https://docs.python.org/2/c-api/arg.html#c.Py_BuildValue
PYVERIFY( py_results = PyObject_CallFunction(fill_api, "k", &api) )
assert(py_results == Py_None);
// Call Python function from C using cffi.
printf("sum: %f\n", api.add_numbers(12.3, 45.6));
// More complex example.
char buffer[20];
char * result = api.dump_buffer(buffer, sizeof buffer);
assert(result != 0);
printf("buffer: %s\n", result);
// Let Python perform garbage collection on result now.
rc = api.release_object(result);
assert(rc == 0);
// Close Python interpreter.
Py_Finalize();
return 0;
}
interface.py
import cffi
import sys
import traceback
ffi = cffi.FFI()
ffi.cdef(file('interface.h').read())
# Hold references to objects to prevent garbage collection.
noGCDict = {}
# Add two numbers.
# This function was copied from the PyPy example.
@ffi.callback("double (double, double)")
def add_numbers(x, y):
    return x + y
# Convert input buffer to repr(buffer).
@ffi.callback("char *(char*, int)")
def dump_buffer(buffer, buffer_len):
    try:
        # First attempt to access data in buffer.
        # Using the ffi/lib objects:
        # http://cffi.readthedocs.org/en/latest/using.html#using-the-ffi-lib-objects
        # One char at time, Looks inefficient.
        #data = ''.join([buffer[i] for i in xrange(buffer_len)])
        # Second attempt.
        # FFI Interface:
        # http://cffi.readthedocs.org/en/latest/using.html#ffi-interface
        # Works but doc says "str() gives inconsistent results".
        #data = str( ffi.buffer(buffer, buffer_len) )
        # Convert C buffer to Python str.
        # Doc says [:] is recommended instead of str().
        data = ffi.buffer(buffer, buffer_len)[:]
        # The goal is to return repr(data)
        # but it has to be converted to a C buffer.
        result = ffi.new('char []', repr(data))
        # Save reference to data so it's not freed until released by C program.
        noGCDict[ffi.addressof(result)] = result
        return result
    except:
        print >>sys.stderr, traceback.format_exc()
        return ffi.NULL
# Release object so that Python can reclaim the memory.
@ffi.callback("int (char*)")
def release_object(ptr):
    try:
        del noGCDict[ptr]
        return 0
    except:
        print >>sys.stderr, traceback.format_exc()
        return 1
def fill_api(ptr):
    global api
    api = ffi.cast("struct API*", ptr)
    api.add_numbers = add_numbers
    api.dump_buffer = dump_buffer
    api.release_object = release_object
Compile:
gcc -o test_cffi test_cffi.c -I/home/jmudd/pgsql-native/Python-2.7.10.install/include/python2.7 -L/home/jmudd/pgsql-native/Python-2.7.10.install/lib -lpython2.7
Execute:
$ test_cffi
sum: 57.900000
buffer: 'T\x9e\x04\x08\xa8\x93\xff\xbf]\x86\x04\x08\x00\x00\x00\x00\x00\x00\x00\x00'
$
A few tips for making this work with Python 3:
file() is not supported; use open() instead:
ffi.cdef(open('interface.h').read())
PyObject* PyStr_FromString(const char *u)
Create a PyStr from a UTF-8 encoded null-terminated character buffer.
Python 2: PyString_FromString
Python 3: PyUnicode_FromString
Change to: PYVERIFY( pName = PyUnicode_FromString("interface") )
Program name
wchar_t *name = Py_DecodeLocale(argv[0], NULL);
Py_SetProgramName(name);
To compile:
gcc cc.c -o cc -I/usr/include/python3.6m -I/usr/include/x86_64-linux-gnu/python3.6m -lpython3.6m
I butchered the dump_buffer def into the following; maybe it will give you some ideas:
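def get_prediction(buffer, buffer_len):
    try:
        data = ffi.buffer(buffer, buffer_len)[:]
        result = ffi.new('char []', data)
        print('\n I am doing something here here........', data)
        resultA = ffi.new('char []', b"Failed")  ### New message
        # Note: with the next line commented out, nothing keeps resultA alive
        # after this function returns.
        ##noGCDict[ffi.addressof(resultA)] = resultA
        return resultA
    except:
        # Python 3: print is a function, so "print >>sys.stderr, ..." no longer works
        print(traceback.format_exc(), file=sys.stderr)
        return ffi.NULL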
Hopefully it will help and save you some time
Apparently, if Python code could be compiled to a Win32 DLL, that would solve the problem,
in the same way that converting C# code to Win32 DLLs makes it usable from any development tool.