Send 50KB array from C++ to Python (NumPy) on Windows - python

I have a C++ and a Python application on Windows (7+). I wish to send a ~50KB array of binary data (int[], float[], or double[]) from the C++ application to a NumPy array in the Python application in real time. I want <100ms latency, but can handle up to 500ms. I'm unsure of the correct way to do this.
I believe NumPy technically stores its arrays as just an array of binary data just like C++ (assuming a reasonable C++ compiler, like modern MSVC or GCC). Therefore technically it should be very easy, but I haven't been able to identify a good way to do this.
My current plan would be to use a memory mapped file, and then handle locking the memory-mapped file with more traditional IPC such as the Win32 message pump or a semaphore.
I'm however not sure whether NumPy can read straight from a memory mapped file. It can create a memory-map to a file on disk with numpy.memmap, but this doesn't seem to work for pure memory mapped file where I just have a name or a handle.
I don't know if this is the right approach. Maybe I can get it to work, but ideally I would also want to do it the right way and not be surprised by nasty consequences of me coding stuff I don't understand.
I would appreciate any help or pointers to material that might help me figure out the correct way to do this.
UPDATE:
My C++ code (proof-of-concept) would look like this:
// Host application.
// Creates 20 byte memory mapped file with name "Global\test_mmap_file" and
// containing 5 uint32s.
// Note: Requires SeCreateGlobalPrivilege to create global memory mapped
// file
#include <windows.h>
#include <iostream>
#include <cassert>
int main()
{
HANDLE file_mapping_handle = NULL;
unsigned int* buffer = 0;
assert(sizeof(unsigned int) == 4); // Require compatibility with np.uint32
const size_t buffer_sz = 5 * sizeof(unsigned int);
file_mapping_handle = CreateFileMapping(
INVALID_HANDLE_VALUE,
NULL,
PAGE_READWRITE,
0,
buffer_sz,
L"Global\\test_mmap_file");
if (!file_mapping_handle)
{
std::cout << "CreateFileMapping failed (Host).\n";
std::cout << "Error code: 0x" << std::hex << GetLastError() << std::endl;
std::cin.get();
return 1;
}
buffer = (unsigned int*)MapViewOfFile(
file_mapping_handle,
FILE_MAP_ALL_ACCESS,
0,
0,
buffer_sz);
if (!buffer)
{
CloseHandle(file_mapping_handle);
std::cout << " MapViewOfFile failed (Host).\n";
std::cout << "Error code: 0x" << std::hex << GetLastError() << std::endl;
std::cin.get();
return 1;
}
buffer[0] = 2;
buffer[1] = 3;
buffer[2] = 5;
buffer[3] = 7;
buffer[4] = 11;
std::cout << "Data sent, press enter to exit.\n";
std::cin.get();
UnmapViewOfFile(buffer);
CloseHandle(file_mapping_handle);
return 0;
}
I wanted some way to access this shared memory from Python and create a numpy array. I tried,
import numpy as np
L_mm = np.memmap('Global\\test_mmap_file')
but this fails as Global\test_mmap_file is not a filename. Following the hints given by abarnert I constructed the following client program which seems to work:
import numpy as np
import mmap
mm = mmap.mmap(0,20,'Global\\test_mmap_file')
L = np.frombuffer(mm,dtype = np.uint32)
print (L)
mm.close()
This requires admin privileges for both programs to run (or granting the user the SeCreateGlobalPrivilege right, "Create global objects"). However, I think this could easily be avoided by not giving the shared memory a global name, and instead duplicating the handle and passing it to the Python program. It also doesn't control access to the shared memory properly, but that should be easy with a semaphore or some other such construct.
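A minimal, standard-library-only sketch of the reading side is given here, under stated assumptions: an anonymous mapping stands in for the named one (so it runs without the C++ host), and struct.pack_into plays the role of the C++ writer. With NumPy, the equivalent of the memoryview cast would be np.frombuffer(mm, dtype=np.uint32), followed by .copy() if the data must outlive the mapping. One detail it illustrates: in Python 3, mmap.close() raises BufferError while another object still exports the buffer, so the view has to be released (or the array dropped) first.

```python
import mmap
import struct

SIZE = 5 * 4  # five 32-bit unsigned integers, as in the C++ host

# Assumption: an anonymous mapping stands in for the named one. On Windows
# the client from the question would instead call
# mmap.mmap(0, SIZE, 'Global\\test_mmap_file').
mm = mmap.mmap(-1, SIZE)

# Play the role of the C++ writer: store 2, 3, 5, 7, 11 as native-order uint32.
struct.pack_into("=5I", mm, 0, 2, 3, 5, 7, 11)

# Reinterpret the raw bytes as unsigned 32-bit integers without copying.
view = memoryview(mm).cast("I")
values = list(view)  # copy the numbers out while the mapping is still open
view.release()       # drop the exported buffer so close() does not raise
mm.close()

print(values)
```

Copying the values out before closing keeps the result valid after the mapping is gone; a zero-copy view is only safe for as long as the mapping stays open.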

Related

Writing address of a numpy array to a file and then opening it in C++ via ctypes

I was wondering if it's possible to actually write in a file an address of a numpy array, via e.g. ctypeslib.ndpointer or something similar and then open this file in a C++ function, also called through ctypes in the same python process and read this address, convert it to e.g. C++ double array.
This will all be happening in the same python process.
I am aware that it's possible to pass it as a function argument and that works, but that isn't something I'd need.
This is what the code would look like; don't mind the syntax errors:
test.py
with open(path) as f:
    f.write(matrix.ctypes.data_as(np.ctypeslib.ndpointer(dtype=np.float64, ndim=2, flags='C_CONTIGUOUS')))
and cpp:
void function()
{
... read file, get address stored into double* array;
e.g. then print out the values
}
Where could I be wrong?
I work on a project where we are writing an np array to a file and then reading that file in cpp, which is wasteful. I want to try adjusting it to write, and later read, just this address. Sending an ndpointer or something else as a function argument won't work, as that would require editing a big portion of the project.
I think that the data of your np.array will be lost once the Python program terminates; therefore you will not be able to access its memory location once the program ends.
Unfortunately, I don't know how to do it using ctypes, only by writing a C-API extension.
With it, you access the Python variable directly from C. It is represented by a pointer, so you can access the address of any Python object (and therefore also of ndarrays).
In Python you would write:
import c_module
import numpy as np
...
a = np.array([...])
#generate the numpy array
...
c_module.c_fun(a)
and then in your C++ code you will receive the memory address:
static PyObject* py_f_roots(PyObject* self, PyObject* args) {
PyObject *np_array_py;
// "O" parses exactly one object argument
if (!PyArg_ParseTuple(args, "O", &np_array_py))
return NULL;
// now np_array_py points to the Python NumPy array a
// to access it, cast it to a PyArrayObject *
PyArrayObject *np_array = (PyArrayObject *) np_array_py;
// you can access the data
double *data = (double *) PyArray_DATA(np_array);
// Py_RETURN_NONE increments the reference count of None before returning it
Py_RETURN_NONE;
}
The documentation for the NumPy C API
The reference manual for Python C extensions
If the Python and C code are run in the same process, then the address you write from Python will be valid in C. I think you want the following:
test.py
import ctypes as ct
import numpy as np
matrix = np.array([1.1,2.2,3.3,4.4,5.5])
# use binary to write the address
with open('addr.bin','wb') as f:
# type of pointer doesn't matter just need the address
f.write(matrix.ctypes.data_as(ct.c_void_p))
# test function to receive the filename
dll = ct.CDLL('./test')
dll.func.argtypes = ct.c_char_p,
dll.func.restype = None
dll.func(b'addr.bin')
test.c
#include <stdio.h>
__declspec(dllexport)
void func(const char* file) {
double* p;
FILE* fp = fopen(file,"rb"); // read the pointer
fread(&p, 1, sizeof(p), fp);
fclose(fp);
for(int i = 0; i < 5; ++i) // dump the elements
printf("%lf\n", p[i]);
}
Output:
1.100000
2.200000
3.300000
4.400000
5.500000
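The claim that an address stays valid within one process can be checked without building a DLL. The following is only a sketch under stated assumptions: a ctypes array stands in for the NumPy buffer (with NumPy the address would come from matrix.ctypes.data), and a bytes object stands in for the contents of addr.bin.

```python
import ctypes as ct
import struct

# Assumption: a ctypes array stands in for the NumPy array's data buffer.
values = (ct.c_double * 5)(1.1, 2.2, 3.3, 4.4, 5.5)

# Serialize the address as a pointer-sized native integer ("P" format).
# These bytes are what would be written to addr.bin.
blob = struct.pack("P", ct.addressof(values))

# Later, in the same process: recover the address and alias the same memory.
(addr,) = struct.unpack("P", blob)
alias = (ct.c_double * 5).from_address(addr)

alias[0] = 9.9     # writes through to the original buffer
print(values[0])   # the original array sees the change
```

The alias does not keep the original buffer alive, so this only works while the owning object (here values, or the NumPy array in the answer above) still exists.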

PYBIND11: Make changes to class object value in another c++ thread when python interpreter is embedded and running in another thread

I am just printing the value of car1.vehicle_id in Python. I want it to print "1234" for the first 2 seconds, and then, when the value is changed in another thread to "4543", the change should take effect in Python. Is this possible, or is there a simple example to help me with this?
c++
#include <pybind11/embed.h>
#include <iostream>
#include <string>
#include <thread>
#include <chrono>
// Define namespace for pybind11
namespace py = pybind11;
class Vehiclee
{
// Access specifier
public:
Vehiclee(){};
~Vehiclee() {}
// Data Members
int vehicle_id;
std::string vehicle_name;
std::string vehicle_color;
// Member Functions()
void printname()
{
std::cout << "Vehicle id is: " << vehicle_id;
std::cout << "Vehicle name is: " << vehicle_name;
std::cout << "Vehicle color is: " << vehicle_color;
}
};
PYBIND11_EMBEDDED_MODULE(embeded, m){
py::class_<Vehiclee>(m, "Vehiclee")
.def_readonly("vehicle_name", &Vehiclee::vehicle_name)
.def_readonly("vehicle_color", &Vehiclee::vehicle_color)
.def_readonly("vehicle_id", &Vehiclee::vehicle_id);
}
py::scoped_interpreter python{};
Vehiclee car1;
void threadFunc()
{
sleep(2);
std::cout<<"entering thread";
car1.vehicle_id = 4543;
std::cout<<"Modified val in thread";
}
int main() {
// Initialize the python interpreter
// Import all the functions from scripts by file name in the working directory
auto simpleFuncs = py::module::import("simpleFuncs");
// Test if C++ objects can be passed into python functions
car1.vehicle_id = 1234;
std::thread t1(threadFunc);
simpleFuncs.attr("simplePrint")(car1);
t1.join();
return 0;
}
python
import time
import importlib
import embeded
def simplePrint(argument):
    while(1):
        importlib.reload(embeded)
        print(argument.vehicle_id)
        time.sleep(1)
Current output
always 1234
Required output
1234 (for first 2 secs)
4543 (after 2 secs)
You need to understand the C++ rules for threading. In C++, threads can run far better in parallel than in Python. This is because in C++, threads are by default running entirely separate from each other, whereas Python uses a Global Interpreter Lock which causes a lot of thread synchronization.
So, in this case you do need the threads to synchronize, because the threads share a variable (car1). The challenge is that .def_readonly hides some boilerplate code which doesn't do synchronization - makes sense, because what object should it use to synchronize?
So what you need to do is add getter and setter methods to Vehiclee, plus a std::mutex member. In every getter and every setter, you lock and unlock this mutex. This is easy with a std::scoped_lock (C++17), which automatically unlocks the mutex when the method returns.
There are other options. For vehicle_id you could use a std::atomic_int, but you'd probably still need a getter method. I don't think pybind understands atomic variables.
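The locked getter/setter pattern described in this answer can be sketched as follows. This is a Python analogue only, not the pybind11 code: threading.Lock plays the role of std::mutex, the with block plays the role of std::scoped_lock, and the class and method names are hypothetical.

```python
import threading

class Vehicle:
    """Hypothetical analogue of Vehiclee with a lock-guarded id."""

    def __init__(self, vehicle_id):
        self._lock = threading.Lock()
        self._vehicle_id = vehicle_id

    def get_vehicle_id(self):
        # The lock is released automatically when the with block exits.
        with self._lock:
            return self._vehicle_id

    def set_vehicle_id(self, value):
        with self._lock:
            self._vehicle_id = value

car = Vehicle(1234)
before = car.get_vehicle_id()

# A second thread changes the value through the setter.
writer = threading.Thread(target=car.set_vehicle_id, args=(4543,))
writer.start()
writer.join()

after = car.get_vehicle_id()
print(before, after)
```

Both threads operate on the same object here; the pattern only helps if the reader and the writer really share one instance rather than each holding its own copy.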

Creating a basic PyTupleObject using Python's C API

I'm having difficulty with creating a PyTupleObject using the Python C api.
#include "Python.h"
#include <iostream>
int main() {
int err;
Py_ssize_t size = 2;
PyObject *the_tuple = PyTuple_New(size); // this line crashes the program
if (!the_tuple)
std::cerr << "the tuple is null" << std::endl;
err = PyTuple_SetItem(the_tuple, (Py_ssize_t) 0, PyLong_FromLong((long) 5.7));
if (err < 0) {
std::cerr << "first set item failed" << std::endl;
}
err = PyTuple_SetItem(the_tuple, (Py_ssize_t) 1, PyLong_FromLong((long) 5.7));
if (err < 0) {
std::cerr << "second set item failed" << std::endl;
}
return 0;
}
crashes with
Process finished with exit code -1073741819 (0xC0000005)
But so does everything else I've tried so far. Any ideas what I'm doing wrong? Note that I'm just trying to run this as a C++ program, as I'm just trying to test the code before adding a SWIG typemap.
The commenter @asynts is correct that you need to initialize the interpreter via Py_Initialize if you want to interact with Python objects (you are, in fact, embedding Python). There is a subset of API functions that can safely be called without initializing the interpreter, but creating Python objects does not fall within this subset.
Py_BuildValue may "work" (as in, not causing a segfault with those specific arguments), but it will cause issues elsewhere in the code if you try to do anything with the result without having initialized the interpreter.
It seems that you're trying to extend Python rather than embed it, but you're embedding it to test the extension code. You may want to refer to the official documentation for extending Python with C/C++ to guide you through this process.

C++ Python module crashes in Blender but not in Python console

Problem
I am trying to call my C++ code from Python 3.7 with Blender 2.82a (also happens in 2.83). The code should optimize a camera path. It can be used without Blender, however, I use Blender to setup a scene with camera path and query depth values in a scene.
I tried calling the optimization function in C++ and in the Python console. Both worked without any problems. The problem is, when I call it in Blender, Blender crashes.
This is the crash report:
# Blender 2.83.0, Commit date: 2020-06-03 14:38, Hash 211b6c29f771
# backtrace
./blender(BLI_system_backtrace+0x1d) [0x6989e9d]
./blender() [0xc1548f]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x153c0) [0x7fa5fb3dc3c0]
/lib/x86_64-linux-gnu/libpthread.so.0(raise+0xcb) [0x7fa5fb3dc24b]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x153c0) [0x7fa5fb3dc3c0]
./blender(_ZN5Eigen8IOFormatD1Ev+0xa3) [0x179bc43]
/home/name/Programs/blender-2.83.0-linux64/2.83/python/lib/python3.7/optFlowCam.cpython-37m-x86_64-linux-gnu.so(_Z2_zRK6CameraRKN5Eigen6MatrixIdLi9ELi1ELi0ELi9ELi1EEEiiRKSt8functionIFdRK3RayEE+0x2e2) [0x7fa5d1538e72]
/home/name/Programs/blender-2.83.0-linux64/2.83/python/lib/python3.7/optFlowCam.cpython-37m-x86_64-linux-gnu.so(_ZN11OpticalFlow13GradPathErrorERKSt6vectorI6CameraSaIS1_EEiiRKSt8functionIFdRK3RayEEd+0x5a7) [0x7fa5d1539c77]
/home/name/Programs/blender-2.83.0-linux64/2.83/python/lib/python3.7/optFlowCam.cpython-37m-x86_64-linux-gnu.so(_Z16_gradientDescentRKSt6vectorI6CameraSaIS0_EEiiRKSt8functionIFdRK3RayEEd+0x54b) [0x7fa5d153b5fb]
/home/name/Programs/blender-2.83.0-linux64/2.83/python/lib/python3.7/optFlowCam.cpython-37m-x86_64-linux-gnu.so(_ZN11OpticalFlow12OptimizePathERKSt6vectorI6CameraSaIS1_EEiiRKSt8functionIFdRK3RayEEdNS_18OptimizationMethodE+0x22) [0x7fa5d153bcb2]
/home/name/Programs/blender-2.83.0-linux64/2.83/python/lib/python3.7/optFlowCam.cpython-37m-x86_64-linux-gnu.so(+0x3d910) [0x7fa5d1533910]
/home/name/Programs/blender-2.83.0-linux64/2.83/python/lib/python3.7/optFlowCam.cpython-37m-x86_64-linux-gnu.so(+0x317ed) [0x7fa5d15277ed]
./blender(_PyMethodDef_RawFastCallKeywords+0x2f3) [0x570f373]
./blender(_PyCFunction_FastCallKeywords+0x25) [0x570f3f5]
./blender(_PyEval_EvalFrameDefault+0x7468) [0xc0fb48]
./blender(_PyEval_EvalCodeWithName+0xadc) [0x57c0d8c]
./blender(PyEval_EvalCodeEx+0x3e) [0x57c0ebe]
./blender(PyEval_EvalCode+0x1b) [0x57c0eeb]
./blender() [0x11f35ac]
./blender() [0x1600cde]
./blender() [0xec6a93]
./blender() [0xec6d07]
./blender(WM_operator_name_call_ptr+0x1a) [0xec720a]
./blender() [0x14f2082]
./blender() [0x15020d5]
./blender() [0xeca877]
./blender() [0xecaecc]
./blender(wm_event_do_handlers+0x310) [0xecb5e0]
./blender(WM_main+0x20) [0xec2230]
./blender(main+0x321) [0xb4bfd1]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3) [0x7fa5facb50b3]
./blender() [0xc11c0c]
I am using Eigen for linear algebra calculations and pybind11 to compile it into a python module. The Eigen types are all fixed size as I don't need them to be dynamic (possible reason for the issue). I compile with gcc (Ubuntu 9.3.0-10ubuntu2) 9.3.0 on Ubuntu 20.04. I currently use c++11 standard but it's not a requirement.
Current findings
With faulthandler.enable() it gives me
Fatal Python error: Segmentation fault
Current thread 0x00007fa5fab18040 (most recent call first):
File "/Text", line 16 in <module>
I already found that it crashes on the same line in the program, which is when the result of a matrix-vector multiplication is supposed to be returned and inserted into a std::vector. I printed the vector and the matrix prior, to make sure they do not contain garbage and that worked fine.
I also tried to store it in an intermediate variable and print it and then it crashes on printing. The multiplication itself seems not to cause the segfault.
I figured I would try calling the function where it occurs directly from Blender, but then it works and returns a result without a segfault.
I suspect it to be some kind of memory alignment issue and tried everything suggested here and in the Eigen documentation. Namely, I use Eigen::aligned_allocator in every std::vector, only pass the Eigen objects as const &, and have EIGEN_MAKE_ALIGNED_OPERATOR_NEW in the camera and ray classes, which have Eigen type members.
Using #define EIGEN_DONT_VECTORIZE and #define EIGEN_DISABLE_UNALIGNED_ARRAY_ASSERT only gave me partial success. It doesn't crash at the same line as before anymore. Weirdly enough, if I also add a cout before the return, the function finishes and returns.
Parts where crashes occur:
The project is not public and the C++ code is quite lengthy, so I only include parts of it. Let me know if you need more. The rest looks very similar, so if there is something conceptually wrong, it is probably wrong in here too. It is not a minimal example (and it contains some debug prints), as I don't know why it happens and the error is not always on the same part.
// in header
// this helped somehow
#define EIGEN_DONT_VECTORIZE
#define EIGEN_DISABLE_UNALIGNED_ARRAY_ASSERT
// ***********************************
#include <iostream>
#include <numeric>
#include <array>
#include <vector>
#include <Eigen/Dense>
#include <Eigen/StdVector>
#include "matrix_types.h"
#include "camera.h"
// *******************************************************************************
// Vector9d is a typedef
// in cpp
Vector9d _z(const Camera& cam, const Vector9d& derivX, int x_dir, int y_dir, const std::function<double(const Ray&)>& depthTest){
Eigen::IOFormat HeavyFmt(Eigen::FullPrecision, 0, ", ", ",\n", "[", "]", "[", "]");
Matrix9d M0 = OpticalFlow::M(cam, x_dir, y_dir, depthTest);
std::cout << "M_\n" << M0.format(HeavyFmt) << "\n" <<std::endl;
Vector9d z = M0 * derivX;
return z;
}
std::vector<Vector9d, Eigen::aligned_allocator<Vector9d>> OpticalFlow::GradPathError(const std::vector<Camera>& pathPositions, int x_dir, int y_dir, const std::function<double(const Ray&)>& depthTest, double h){
int n = pathPositions.size()-1;
Eigen::IOFormat HeavyFmt(Eigen::FullPrecision);
Eigen::IOFormat HeavyMtxFmt(Eigen::FullPrecision, 0, ", ", ",\n", "[", "]", "[", "]");
std::vector<Vector9d, Eigen::aligned_allocator<Vector9d>> gradPE;
gradPE.reserve(n+1);
// save values that will be used more often in calculations
std::vector<Vector9d, Eigen::aligned_allocator<Vector9d>> derivXs;
std::vector<Vector9d, Eigen::aligned_allocator<Vector9d>> zs;
std::vector<std::array<Matrix9d, 9>> gradMs;
derivXs.reserve(n+1);
zs.reserve(n+1);
gradMs.reserve(n+1);
for(int i = 0; i<n+1; ++i){
derivXs.push_back(_derivCamPath(pathPositions, i));
Camera cam = pathPositions[i];
Vector9d derivX = _derivCamPath(pathPositions, i);
zs.push_back(_z(cam, derivX, x_dir, y_dir, depthTest)); // <--- crashed here, if vectorization not turned off
gradMs.push_back(GradM(cam, x_dir, y_dir, depthTest, h));
}
for(int i = 0; i<n+1; ++i){
Vector9d derivZ = _derivZ(zs, i);
std::cout << "Zt_" << i << "\n" << derivZ.format(HeavyFmt) << "\n" << std::endl;
gradPE.push_back(1.0/(n+1) * _w(derivZ, derivXs[i], gradMs[i]));
}
// if this is included and vectorization turned off, it doesn't crash
// std::cout << "end" << std::endl;
return gradPE; // <-- crash here if vectorization is off
}
I hope someone can help me find the cause or what else I can try to track it down further. I am not very experienced with C++, so the code might have obvious issues.
I think I found the cause.
This line ./blender(_ZN5Eigen8IOFormatD1Ev+0xd3) [0x2041673] actually named the culprit in a not so readable format. I used gdb to debug the python call from Blender and in the backtrace, the same line is #0 0x0000000002041673 in Eigen::IOFormat::~IOFormat() ()
The problem was I used Eigen::IOFormat in my program to debug print the matrices and vectors for easier copying into another program where I just wanted to check if the values are correct. You can see it in the excerpt code I posted in the question. I only used it in those two functions and the segfault only occurred in those two functions. There might be something else wrong but for now, it seems to work.

How can I handle IPC between C and Python?

I have an application with two processes, one in C and one in Python. The C process is where all the heavy lifting is done, while the Python process handles the user interface.
The C program writes to a large-ish buffer 4 times per second, and the Python process reads this data. Until now the communication to the Python process has been done via AMQP. I would much rather set up some form of memory sharing between the two processes to reduce overhead and increase performance.
What are my options here? Ideally I would simply have the Python process read the physical memory directly (preferably from memory and not from disk), and then take care of race conditions with semaphores or something similar. This is, however, something I have little experience with, so I'd appreciate any help I can get.
I am using Linux, by the way.
This question was asked a long time ago, and I believe the asker already has an answer, so I am writing this answer for people who come across it later.
/*C code*/
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#define GETEKYDIR ("/tmp")
#define PROJECTID (2333)
#define SHMSIZE (1024)
void err_exit(char *buf) {
fprintf(stderr, "%s\n", buf);
exit(1);
}
int
main(int argc, char **argv)
{
key_t key = ftok(GETEKYDIR, PROJECTID);
if ( key < 0 )
err_exit("ftok error");
int shmid;
shmid = shmget(key, SHMSIZE, IPC_CREAT | IPC_EXCL | 0664);
if ( shmid == -1 ) {
if ( errno == EEXIST ) {
printf("shared memory already exists\n");
shmid = shmget(key ,0, 0);
printf("reference shmid = %d\n", shmid);
} else {
perror("errno");
err_exit("shmget error");
}
}
char *addr;
/* Do not specify an address to attach at,
 * and attach for reading and writing */
if ( (addr = shmat(shmid, 0, 0) ) == (void*)-1) {
if (shmctl(shmid, IPC_RMID, NULL) == -1)
err_exit("shmctl error");
else {
printf("Attach shared memory failed\n");
printf("remove shared memory identifier successful\n");
}
err_exit("shmat error");
}
strcpy( addr, "Shared memory test\n" );
printf("Enter to exit");
getchar();
if ( shmdt(addr) < 0)
err_exit("shmdt error");
if (shmctl(shmid, IPC_RMID, NULL) == -1)
err_exit("shmctl error");
else {
printf("Finally\n");
printf("remove shared memory identifier successful\n");
}
return 0;
}
# Python
# Install the sysv_ipc module first if you don't have it
import sysv_ipc as ipc
def main():
    path = "/tmp"
    key = ipc.ftok(path, 2333)
    shm = ipc.SharedMemory(key, 0, 0)
    # I found that if we do not attach ourselves,
    # it will attach as read-only.
    shm.attach(0, 0)
    buf = shm.read(19)
    print(buf)
    shm.detach()
if __name__ == '__main__':
    main()
The C program needs to be executed first, and must not be stopped before the Python code has run; it creates the shared memory segment and writes something into it. The Python code then attaches to the same segment and reads data from it.
When everything is done, press the Enter key to stop the C program and remove the shared memory segment.
More about SharedMemory for Python can be found here:
http://semanchuk.com/philip/sysv_ipc/#shared_memory
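For readers who prefer to stay within the standard library, a rough sketch using multiprocessing.shared_memory (Python 3.8+) follows. Note that this is a different mechanism from the System V segment above: it uses named shared memory of its own and cannot attach to the segment created by the C program via shmget. Both "sides" run in one process here purely for illustration; normally the reader would be a separate process that is told the segment name.

```python
from multiprocessing import shared_memory

MESSAGE = b"Shared memory test\n"

# Writer side: create a named segment and copy the message into it.
writer = shared_memory.SharedMemory(create=True, size=1024)
writer.buf[:len(MESSAGE)] = MESSAGE

# Reader side: attach to the same segment by name.
reader = shared_memory.SharedMemory(name=writer.name)
received = bytes(reader.buf[:len(MESSAGE)])  # copy out before closing

reader.close()
writer.close()
writer.unlink()  # remove the segment once every process is done with it

print(received)
```

As with the C example, nothing here guards against the reader seeing a half-written buffer; a semaphore or similar is still needed for that.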
Suggestion #1:
The simplest way should be to use TCP. You mentioned your data size is large. Unless your data is truly huge, you should be fine using TCP. Make sure you create separate threads in C and Python for transmitting/receiving data over TCP.
Suggestion #2:
Python supports wrappers over C. One popular wrapper is ctypes - http://docs.python.org/2/library/ctypes.html
Assuming you are familiar with IPC between two C programs through shared memory, you can write a C wrapper for your Python program which reads data from the shared memory.
Also check the following discussion, which talks about IPC between Python and C++:
Simple IPC between C++ and Python (cross platform)
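Suggestion #1 can be sketched in a few lines. TCP is a byte stream without message boundaries, so the sketch below prefixes each buffer with its length; this framing, the helper names (send_frame, recv_frame, recv_exact) and the loopback setup are assumptions made for illustration, and both ends are written in Python here although the sender would be the C process in practice.

```python
import socket
import struct
import threading

def recv_exact(sock, n):
    """Read exactly n bytes, looping because recv may return less."""
    chunks = []
    while n > 0:
        chunk = sock.recv(n)
        if not chunk:
            raise ConnectionError("peer closed before the full message arrived")
        chunks.append(chunk)
        n -= len(chunk)
    return b"".join(chunks)

def send_frame(sock, payload):
    # 4-byte big-endian length prefix, then the payload itself.
    sock.sendall(struct.pack("!I", len(payload)) + payload)

def recv_frame(sock):
    (length,) = struct.unpack("!I", recv_exact(sock, 4))
    return recv_exact(sock, length)

server = socket.create_server(("127.0.0.1", 0))  # port 0: let the OS choose
port = server.getsockname()[1]
received = []

def serve_once():
    conn, _ = server.accept()
    with conn:
        received.append(recv_frame(conn))

receiver = threading.Thread(target=serve_once)
receiver.start()

payload = bytes(range(256)) * 200  # 51200 bytes of sample data
with socket.create_connection(("127.0.0.1", port)) as client:
    send_frame(client, payload)

receiver.join()
server.close()
print(len(received[0]))
```

The receiving thread mirrors the advice above to keep network I/O off the main (UI) thread.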
How about writing the heavy-lifting code as a library in C and then providing a Python module as a wrapper around it? That is actually a pretty common approach; in particular, it allows prototyping and profiling in Python and then moving the performance-critical parts to C.
If you really have a reason to need two processes, there is an XMLRPC package in Python that should facilitate such IPC tasks. In any case, use an existing framework instead of inventing your own IPC, unless you can really prove that performance requires it.
