Problem
I am trying to call my C++ code from Python 3.7 in Blender 2.82a (it also happens in 2.83). The code should optimize a camera path. It can be used without Blender; however, I use Blender to set up a scene with a camera path and to query depth values in that scene.
I tried calling the optimization function from C++ and from the Python console. Both worked without any problems. However, when I call it from within Blender, Blender crashes.
This is the crash report:
# Blender 2.83.0, Commit date: 2020-06-03 14:38, Hash 211b6c29f771
# backtrace
./blender(BLI_system_backtrace+0x1d) [0x6989e9d]
./blender() [0xc1548f]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x153c0) [0x7fa5fb3dc3c0]
/lib/x86_64-linux-gnu/libpthread.so.0(raise+0xcb) [0x7fa5fb3dc24b]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x153c0) [0x7fa5fb3dc3c0]
./blender(_ZN5Eigen8IOFormatD1Ev+0xa3) [0x179bc43]
/home/name/Programs/blender-2.83.0-linux64/2.83/python/lib/python3.7/optFlowCam.cpython-37m-x86_64-linux-gnu.so(_Z2_zRK6CameraRKN5Eigen6MatrixIdLi9ELi1ELi0ELi9ELi1EEEiiRKSt8functionIFdRK3RayEE+0x2e2) [0x7fa5d1538e72]
/home/name/Programs/blender-2.83.0-linux64/2.83/python/lib/python3.7/optFlowCam.cpython-37m-x86_64-linux-gnu.so(_ZN11OpticalFlow13GradPathErrorERKSt6vectorI6CameraSaIS1_EEiiRKSt8functionIFdRK3RayEEd+0x5a7) [0x7fa5d1539c77]
/home/name/Programs/blender-2.83.0-linux64/2.83/python/lib/python3.7/optFlowCam.cpython-37m-x86_64-linux-gnu.so(_Z16_gradientDescentRKSt6vectorI6CameraSaIS0_EEiiRKSt8functionIFdRK3RayEEd+0x54b) [0x7fa5d153b5fb]
/home/name/Programs/blender-2.83.0-linux64/2.83/python/lib/python3.7/optFlowCam.cpython-37m-x86_64-linux-gnu.so(_ZN11OpticalFlow12OptimizePathERKSt6vectorI6CameraSaIS1_EEiiRKSt8functionIFdRK3RayEEdNS_18OptimizationMethodE+0x22) [0x7fa5d153bcb2]
/home/name/Programs/blender-2.83.0-linux64/2.83/python/lib/python3.7/optFlowCam.cpython-37m-x86_64-linux-gnu.so(+0x3d910) [0x7fa5d1533910]
/home/name/Programs/blender-2.83.0-linux64/2.83/python/lib/python3.7/optFlowCam.cpython-37m-x86_64-linux-gnu.so(+0x317ed) [0x7fa5d15277ed]
./blender(_PyMethodDef_RawFastCallKeywords+0x2f3) [0x570f373]
./blender(_PyCFunction_FastCallKeywords+0x25) [0x570f3f5]
./blender(_PyEval_EvalFrameDefault+0x7468) [0xc0fb48]
./blender(_PyEval_EvalCodeWithName+0xadc) [0x57c0d8c]
./blender(PyEval_EvalCodeEx+0x3e) [0x57c0ebe]
./blender(PyEval_EvalCode+0x1b) [0x57c0eeb]
./blender() [0x11f35ac]
./blender() [0x1600cde]
./blender() [0xec6a93]
./blender() [0xec6d07]
./blender(WM_operator_name_call_ptr+0x1a) [0xec720a]
./blender() [0x14f2082]
./blender() [0x15020d5]
./blender() [0xeca877]
./blender() [0xecaecc]
./blender(wm_event_do_handlers+0x310) [0xecb5e0]
./blender(WM_main+0x20) [0xec2230]
./blender(main+0x321) [0xb4bfd1]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3) [0x7fa5facb50b3]
./blender() [0xc11c0c]
I am using Eigen for linear algebra calculations and pybind11 to build the code as a Python module. The Eigen types are all fixed size, as I don't need them to be dynamic (a possible reason for the issue). I compile with gcc (Ubuntu 9.3.0-10ubuntu2) 9.3.0 on Ubuntu 20.04. I currently use the C++11 standard, but it's not a requirement.
Current findings
With faulthandler.enable() it gives me
Fatal Python error: Segmentation fault
Current thread 0x00007fa5fab18040 (most recent call first):
File "/Text", line 16 in <module>
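For reference, a minimal sketch of how faulthandler can be switched on before the extension is called. It assumes the traceback should go to a log file, since stderr is not always visible when a script runs inside a host application; the log path is a hypothetical choice.

```python
import faulthandler
import os
import tempfile

# Hypothetical log location; any writable path works.
log_path = os.path.join(tempfile.gettempdir(), "blender_fault.log")

# The file object must stay open for as long as faulthandler is enabled,
# because the handler writes to its file descriptor when a fatal signal arrives.
log_file = open(log_path, "w")
faulthandler.enable(file=log_file, all_threads=True)

# ... import and call the extension module here ...
```

faulthandler only reports the Python-level frame (here line 16 of the text block), so it narrows down which call crashed but not where inside the C++ code.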
I found that it always crashes on the same line of the program: where the result of a matrix-vector multiplication is returned and inserted into a std::vector. I printed the vector and the matrix beforehand to make sure they do not contain garbage, and that worked fine.
I also tried storing the result in an intermediate variable and printing it; it then crashes on printing. The multiplication itself does not seem to cause the segfault.
I figured I would try calling the function where the crash occurs directly from Blender, but then it works and returns a result without a segfault.
I suspect it to be some kind of memory alignment issue and tried everything suggested here and in the Eigen documentation. Namely, I use Eigen::aligned_allocator in every std::vector, only pass Eigen objects as const &, and have EIGEN_MAKE_ALIGNED_OPERATOR_NEW in the Camera and Ray classes, which have Eigen-type members.
Using #define EIGEN_DONT_VECTORIZE and #define EIGEN_DISABLE_UNALIGNED_ARRAY_ASSERT only gave me partial success: it no longer crashes on the same line as before. Weirdly enough, if I also add a cout before the return, the function finishes and returns.
Parts where crashes occur:
The project is not public and the C++ code is quite lengthy, so I only include parts of it. Let me know if you need more. The rest looks very similar, so if there is something conceptually wrong, it is probably wrong in here too. It is not a minimal example (and it contains some debug prints), as I don't know why it happens and the error is not always on the same part.
// in header
// this helped somehow
#define EIGEN_DONT_VECTORIZE
#define EIGEN_DISABLE_UNALIGNED_ARRAY_ASSERT
// ***********************************
#include <iostream>
#include <numeric>
#include <array>
#include <vector>
#include <Eigen/Dense>
#include <Eigen/StdVector>
#include "matrix_types.h"
#include "camera.h"
// *******************************************************************************
// Vector9d is a typedef
// in cpp
Vector9d _z(const Camera& cam, const Vector9d& derivX, int x_dir, int y_dir, const std::function<double(const Ray&)>& depthTest){
Eigen::IOFormat HeavyFmt(Eigen::FullPrecision, 0, ", ", ",\n", "[", "]", "[", "]");
Matrix9d M0 = OpticalFlow::M(cam, x_dir, y_dir, depthTest);
std::cout << "M_\n" << M0.format(HeavyFmt) << "\n" <<std::endl;
Vector9d z = M0 * derivX;
return z;
}
std::vector<Vector9d, Eigen::aligned_allocator<Vector9d>> OpticalFlow::GradPathError(const std::vector<Camera>& pathPositions, int x_dir, int y_dir, const std::function<double(const Ray&)>& depthTest, double h){
int n = pathPositions.size()-1;
Eigen::IOFormat HeavyFmt(Eigen::FullPrecision);
Eigen::IOFormat HeavyMtxFmt(Eigen::FullPrecision, 0, ", ", ",\n", "[", "]", "[", "]");
std::vector<Vector9d, Eigen::aligned_allocator<Vector9d>> gradPE;
gradPE.reserve(n+1);
// save values that will be used more often in calculations
std::vector<Vector9d, Eigen::aligned_allocator<Vector9d>> derivXs;
std::vector<Vector9d, Eigen::aligned_allocator<Vector9d>> zs;
std::vector<std::array<Matrix9d, 9>> gradMs;
derivXs.reserve(n+1);
zs.reserve(n+1);
gradMs.reserve(n+1);
for(int i = 0; i<n+1; ++i){
derivXs.push_back(_derivCamPath(pathPositions, i));
Camera cam = pathPositions[i];
Vector9d derivX = _derivCamPath(pathPositions, i);
zs.push_back(_z(cam, derivX, x_dir, y_dir, depthTest)); // <--- crashed here, if vectorization not turned off
gradMs.push_back(GradM(cam, x_dir, y_dir, depthTest, h));
}
for(int i = 0; i<n+1; ++i){
Vector9d derivZ = _derivZ(zs, i);
std::cout << "Zt_" << i << "\n" << derivZ.format(HeavyFmt) << "\n" << std::endl;
gradPE.push_back(1.0/(n+1) * _w(derivZ, derivXs[i], gradMs[i]));
}
// if this is included and vectorization turned off, it doesn't crash
// std::cout << "end" << std::endl;
return gradPE; // <-- crash here if vectorization is off
}
I hope someone can help me find the cause or what else I can try to track it down further. I am not very experienced with C++, so the code might have obvious issues.
I think I found the cause.
This line of the backtrace, ./blender(_ZN5Eigen8IOFormatD1Ev+0xd3) [0x2041673], actually names the culprit, just in mangled form. I used gdb to debug the Python call from Blender, and in its backtrace the same frame reads #0 0x0000000002041673 in Eigen::IOFormat::~IOFormat() ()
The problem was that I used Eigen::IOFormat in my program to debug-print the matrices and vectors, for easier copying into another program where I just wanted to check whether the values are correct. You can see it in the code excerpt I posted in the question. I only used it in those two functions, and the segfault only occurred in those two functions. There might be something else wrong, but for now it seems to work.
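The answer does not say why the destructor misbehaves. One possible explanation, which is an assumption and not something confirmed above, is that the frame resolves inside ./blender because the host executable exports its own copy of that Eigen symbol and the extension binds to it. A rough way to check whether the running process already exports a given symbol is sketched below (POSIX only); the helper name host_exports is made up for illustration.

```python
import ctypes

def host_exports(symbol):
    """Return True if the running process already exports `symbol`.

    POSIX only: ctypes.CDLL(None) opens a handle to the main program,
    and symbol lookup through it also covers the libraries loaded with it.
    """
    process = ctypes.CDLL(None)
    # Attribute lookup on a CDLL raises AttributeError for unknown symbols,
    # so hasattr() turns the lookup into a plain yes/no answer.
    return hasattr(process, symbol)

# Mangled name of Eigen::IOFormat::~IOFormat(), copied from the backtrace.
print(host_exports("_ZN5Eigen8IOFormatD1Ev"))
```

Run from Blender's Python console before importing the extension, a True result would be consistent with the clash described above; run from a plain interpreter it is expected to be False.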
Related
I'm having difficulty with creating a PyTupleObject using the Python C api.
#include "Python.h"
#include <iostream>
int main() {
int err;
Py_ssize_t size = 2;
PyObject *the_tuple = PyTuple_New(size); // this line crashes the program
if (!the_tuple)
std::cerr << "the tuple is null" << std::endl;
err = PyTuple_SetItem(the_tuple, (Py_ssize_t) 0, PyLong_FromLong((long) 5.7));
if (err < 0) {
std::cerr << "first set item failed" << std::endl;
}
err = PyTuple_SetItem(the_tuple, (Py_ssize_t) 1, PyLong_FromLong((long) 5.7));
if (err < 0) {
std::cerr << "second set item failed" << std::endl;
}
return 0;
}
crashes with
Process finished with exit code -1073741819 (0xC0000005)
But so does everything else I've tried so far. Any ideas what I'm doing wrong? Note that I'm just trying to run this as a C++ program, as I want to test the code before adding a SWIG typemap.
The commenter @asynts is correct in that you need to initialize the interpreter via Py_Initialize if you want to interact with Python objects (you are, in fact, embedding Python). There is a subset of API functions that can safely be called without initializing the interpreter, but creating Python objects does not fall within this subset.
Py_BuildValue may "work" (as in, not creating a segfault with those specific arguments), but it will cause issues elsewhere in the code if you try to do anything with it without having initialized the interpreter.
It seems that you're trying to extend Python rather than embed it, but you're embedding it to test the extension code. You may want to refer to the official documentation for extending Python with C/C++ to guide you through this process.
I am writing a Python extension module in C (actually C++, but this doesn't matter) that performs some calculations in an OpenMP loop, in which it can call a user-provided Python callback function, which operates on numpy arrays. Since the standard CPython does not allow one to use Python API from multiple threads simultaneously, I protect these callbacks by a #pragma omp critical block.
This works well in some cases, but sometimes creates a deadlock, whereby one thread is trying to acquire the openmp critical lock, and the other is waiting for the GIL lock:
Thread 0:
__kmpc_critical_with_hint (in libomp.dylib) + 1109 [0x105f8a6dd]
__kmp_acquire_queuing_lock(kmp_user_lock*, int) (in libomp.dylib) + 9 [0x105fbaec1]
int __kmp_acquire_queuing_lock_timed_template<false>(kmp_queuing_lock*, int) (in libomp.dylib) + 405 [0x105fb6e6c]
__kmp_wait_yield_4 (in libomp.dylib) + 135,128,... [0x105fb0d0e,0x105fb0d07,...]
Thread 1:
PyObject_Call (in Python) + 99 [0x106014202]
??? (in umath.so) load address 0x106657000 + 0x25e51 [0x10667ce51]
??? (in umath.so) load address 0x106657000 + 0x23b0c [0x10667ab0c]
??? (in umath.so) load address 0x106657000 + 0x2117e [0x10667817e]
??? (in umath.so) load address 0x106657000 + 0x21238 [0x106678238]
PyGILState_Ensure (in Python) + 93 [0x1060ab4a7]
PyEval_RestoreThread (in Python) + 62 [0x10608cb0a]
PyThread_acquire_lock (in Python) + 101 [0x1060bc1a4]
_pthread_cond_wait (in libsystem_pthread.dylib) + 767 [0x7fff97d0d728]
__psynch_cvwait (in libsystem_kernel.dylib) + 10 [0x7fff9d464db6]
Curiously, this happens whenever the Python callback function encounters an invalid floating-point value or overflows, printing a warning message to the console. Apparently this upsets some synchronization mutexes and leads to a deadlock shortly after.
Here is a stripped-down but self-contained example code.
#define NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION
#include <Python.h>
#include <numpy/arrayobject.h>
#include <omp.h>
const int NUM_THREADS = 2; // number of OpenMP threads
const int NUM_BLOCKS = 100; // number of loop iterations
//const double MAX_EXP = 500.0; // this is a safe value - exp(500) does not overflow
const double MAX_EXP = 1000.0; // exp(1000) overflows and produces a warning message, which then hangs the whole thing
int main(int argc, char *argv[])
{
Py_Initialize();
PyEval_InitThreads();
PyObject* numpy = PyImport_ImportModule("numpy");
if(!numpy) {
printf("Failed to import numpy\n");
return 1;
} else printf("numpy imported\n");
import_array1(1);
PyObject* fnc = PyObject_GetAttrString(numpy, "exp");
if(!fnc || !PyCallable_Check(fnc)) {
printf("Failed to get hold on function\n");
return 1;
} else printf("function loaded\n");
omp_set_num_threads(NUM_THREADS);
#pragma omp parallel for schedule(dynamic)
for(int i=0; i<NUM_BLOCKS; i++) {
int tn = omp_get_thread_num();
printf("Thread %i: block %i\n", tn, i);
#pragma omp critical
{
//PyGILState_STATE state = PyGILState_Ensure(); ///< does not help
npy_intp dims[1] = { random() % 64000 + 1000 };
PyArrayObject* args = (PyArrayObject*) PyArray_ZEROS(1, dims, NPY_DOUBLE, 0);
double* raw_data = (double*) PyArray_DATA(args);
for(npy_intp k=0; k<dims[0]; k++)
raw_data[k] = random()*MAX_EXP / RAND_MAX;
printf("Thread %i: calling fnc for block %i with %li points\n", tn, i, dims[0]);
PyObject* result = PyObject_CallFunctionObjArgs(fnc, args, NULL);
Py_DECREF(args);
printf("Thread %i: result[0] for block %i with %li points is %g\n", tn, i, dims[0],
*((double*)PyArray_GETPTR1((PyArrayObject*)result, 0)));
Py_XDECREF(result);
//PyGILState_Release(state);
}
}
Py_Finalize();
}
When I set MAX_EXP=500 in line 7, everything works without warnings and deadlocks, but if I replace it with MAX_EXP=1000, this produces a warning message,
sys:1: RuntimeWarning: overflow encountered in exp
and the next loop iteration never finishes. This behaviour is seen on both Linux and MacOS, Python 2.7 or 3.6 all the same. I tried to add some PyGILState_Ensure() to the code but this doesn't help, and the documentation on these aspects is unclear.
Okay, so the problem turned out to lurk deep inside numpy, namely in the _error_handler() function, which is called whenever an invalid floating-point value is produced (NaN, overflow to infinity, etc.).
This function has several regimes, from ignoring the error completely to raising an exception, but by default it issues a Python warning. In doing so, it temporarily re-acquires the GIL, which was released during the bulk computation, and that's where the deadlock occurs.
A very similar situation leading to the same problem is discussed here: https://github.com/numpy/numpy/issues/5856
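The shape of this deadlock is a lock-order inversion: one thread holds the critical section and waits for the GIL, while the other holds the GIL and waits for the critical section. The sketch below reproduces only that pattern with two ordinary threading locks (stand-ins, not the real GIL or OpenMP lock) and uses timeouts so that it reports the stall instead of hanging.

```python
import threading

gil = threading.Lock()        # stand-in for the interpreter lock
critical = threading.Lock()   # stand-in for the OpenMP critical section
both_hold = threading.Barrier(2)
both_tried = threading.Barrier(2)
results = {}

def worker(name, first, second):
    with first:
        both_hold.wait()       # from here on, each thread holds one lock
        # Try to take the lock the other thread is holding; give up after 0.2 s.
        results[name] = second.acquire(timeout=0.2)
        if results[name]:
            second.release()
        both_tried.wait()      # keep `first` held until both attempts are over

threads = [
    # Thread inside the callback: owns the critical section, needs the GIL.
    threading.Thread(target=worker, args=("callback_thread", critical, gil)),
    # Thread waiting to enter the callback: owns the GIL, needs the critical section.
    threading.Thread(target=worker, args=("waiting_thread", gil, critical)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results)
```

Neither acquire succeeds; without the timeouts both threads would block forever, which matches the two stack traces in the question.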
My workaround was to create a lock-type class that disables numpy warnings for the lifetime of an instance (which is created during the parallelized computation) and restores the original settings once the instance is destroyed. While not ideal, this seems to suffice in my case, though my feeling is that ultimately the culprit is numpy itself. For completeness, here is the code of this class:
/** Lock-type class that temporarily disables warnings that numpy produces on floating-point
overflows or other invalid values.
The reason for doing this is that such warnings involve subtle interference with GIL
when executed in a multi-threading context, leading to deadlocks if a user-defined Python
function is accessed from multiple threads (even after being protected by an OpenMP critical
section). The instance of this class is created (and hence warnings are suppressed)
whenever a user-defined callback function is instantiated, and the previous warning settings
are restored once such a function is deallocated.
*/
class NumpyWarningsDisabler {
PyObject *seterr, *prevSettings; ///< pointer to the function and its previous settings
public:
NumpyWarningsDisabler() : seterr(NULL), prevSettings(NULL)
{
PyObject* numpy = PyImport_AddModule("numpy");
if(!numpy) return;
seterr = PyObject_GetAttrString(numpy, "seterr");
if(!seterr) return;
// store the dictionary corresponding to current settings of numpy warnings subsystem
prevSettings = PyObject_CallFunction(seterr, const_cast<char*>("s"), "ignore");
if(!prevSettings) { printf("Failed to suppress numpy warnings\n"); }
/*else { printf("Ignoring numpy warnings\n"); }*/
}
~NumpyWarningsDisabler()
{
if(!seterr || !prevSettings) return;
// restore the previous settings of numpy warnings subsystem
PyObject* args = PyTuple_New(0);
PyObject* result = PyObject_Call(seterr, args, prevSettings);
Py_DECREF(args);
if(!result) { printf("Failed to restore numpy warnings\n"); }
/*else printf("Restored numpy warnings\n");*/
Py_XDECREF(result);
}
};
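When the callback itself can be edited on the Python side, numpy's np.errstate context manager gives a comparable suppress-and-restore behaviour without touching the C++ code. This is a sketch of that alternative, not the approach used in the answer; quiet_exp is a hypothetical callback.

```python
import warnings
import numpy as np

def quiet_exp(values):
    """Hypothetical user callback: exp() with floating-point warnings ignored."""
    # errstate restores the previous error settings when the block exits,
    # mirroring what the constructor/destructor pair above does via seterr.
    with np.errstate(all="ignore"):
        return np.exp(values)

# Record any warnings raised while the callback runs.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = quiet_exp(np.array([1000.0]))

print(result, len(caught))
```

The overflow still produces inf in the result; only the warning, and with it the code path that needs the GIL for reporting, is skipped.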
I'm relatively new to Python and this is my first attempt at writing a C extension.
Background
In my Python 3.X project I need to load and parse large binary files (10-100MB) to extract data for further processing. The binary content is organized in frames: headers followed by a variable amount of data. Due to the low performance in Python I decided to go for a C extension to speed up the loading part.
The standalone C code outperforms Python by a factor of 20x to 500x, so I am pretty satisfied with it.
The problem: the memory keeps growing when I invoke the function from my C-extension multiple times within the same Python module.
my_c_ext.c
#include <Python.h>
#include <numpy/arrayobject.h>
#include "my_c_ext.h"
static unsigned short *X, *Y;
static PyObject* c_load(PyObject* self, PyObject* args)
{
char *filename;
if(!PyArg_ParseTuple(args, "s", &filename))
return NULL;
PyObject *PyX, *PyY;
__load(filename);
npy_intp dims[1] = {n_events};
PyX = PyArray_SimpleNewFromData(1, dims, NPY_UINT16, X);
PyArray_ENABLEFLAGS((PyArrayObject*)PyX, NPY_ARRAY_OWNDATA);
PyY = PyArray_SimpleNewFromData(1, dims, NPY_UINT16, Y);
PyArray_ENABLEFLAGS((PyArrayObject*)PyY, NPY_ARRAY_OWNDATA);
PyObject *xy = Py_BuildValue("NN", PyX, PyY);
return xy;
}
...
//More Python C-extension boilerplate (methods, etc..)
...
void __load(char *) {
// open file, extract frame header and compute new_size
X = realloc(X, new_size * sizeof(*X));
Y = realloc(Y, new_size * sizeof(*Y));
X[i] = ...
Y[i] = ...
return;
}
test.py
import my_c_ext as ce
binary_files = ['file1.bin',...,'fileN.bin']
for f in binary_files:
    x,y = ce.c_load(f)
    del x,y
Here I am deleting the returned objects in hope of lowering memory usage.
After reading several posts (e.g. this, this and this), I am still stuck.
I tried adding and removing the PyArray_ENABLEFLAGS call that sets the NPY_ARRAY_OWNDATA flag, without seeing any difference. It is not yet clear to me whether NPY_ARRAY_OWNDATA implies a free(X) in C. If I explicitly free the arrays in C, I run into a segfault when trying to load the second file in the for loop in test.py.
Any idea what I am doing wrong?
This looks like a memory management disaster. NPY_ARRAY_OWNDATA should cause it to call free on the data (or at least PyArray_free which isn't necessarily the same thing...).
However once this is done you still have the global variables X and Y pointing to a now-invalid area of memory. You then call realloc on those invalid pointers. At this point you're well into undefined behaviour and so anything could happen.
If it's a global variable then the memory needs to be managed globally, not by Numpy. If the memory is managed by the Numpy array then you need to ensure that you store no other way to access it except through that Numpy array. Anything else is going to cause you problems.
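The distinction between an array that borrows memory and one that owns it can be seen from Python through the owndata flag. The sketch below uses a bytearray as a stand-in for a buffer managed outside numpy; it illustrates the ownership rule only and is not a fix for the extension.

```python
import numpy as np

backing = bytearray(8)                           # memory managed outside numpy
view = np.frombuffer(backing, dtype=np.uint16)   # borrows that memory
owned = view.copy()                              # numpy allocates and owns this one

# The borrowed array reports owndata=False, the copy owndata=True.
print(view.flags.owndata, owned.flags.owndata)

# Writing through the original owner is visible in the view but not in the copy.
backing[0] = 1
print(view[0], owned[0])
```

In the extension, the equivalent of the copy would be to hand numpy a buffer that no global pointer refers to afterwards, so that exactly one party is responsible for freeing it.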
I have made a module in Python using SimpleITK, which I tried to speed up by reimplementing in C++. It turns out to be quite a lot slower.
The bottleneck is the usage of the DisplacementFieldJacobianDeterminantFilter.
These two snippets give an example of the usage of the filters.
1000 generations: C++ = 55s, python = 8s
Should I expect the c++ to be faster?
import sys
import SimpleITK as sitk
def test_DJD(label_path, ngen):
    im = sitk.ReadImage(label_path)
    for i in range(ngen):
        jacobian = sitk.DisplacementFieldJacobianDeterminant(im)
if __name__ == '__main__':
    label = sys.argv[1]
    ngen = int(sys.argv[2])
    test_DJD(label, ngen)
And the c++ code
#include <sstream>
#include <string>
#include "itkImage.h"
#include "itkVector.h"
#include "itkImageFileReader.h"
#include "itkDisplacementFieldJacobianDeterminantFilter.h"
typedef itk::Vector<float, 3> VectorType;
typedef itk::Image<VectorType, 3> VectorImageType;
typedef itk::DisplacementFieldJacobianDeterminantFilter<VectorImageType > JacFilterType;
typedef itk::Image<float, 3> FloatImageType;
int main(int argc, char** argv) {
std::string idealJacPath = argv[1];
std::string numGensString = argv[2];
int numGens;
std::istringstream ( numGensString ) >> numGens;
typedef itk::ImageFileReader<VectorImageType> VectorReaderType;
VectorReaderType::Pointer reader=VectorReaderType::New();
reader->SetFileName(idealJacPath);
reader->Update();
VectorImageType::Pointer vectorImage=reader->GetOutput();
JacFilterType::Pointer jacFilter = JacFilterType::New();
FloatImageType::Pointer generatedJac = FloatImageType::New();
for (int i =0; i < numGens; i++){
jacFilter->SetInput(vectorImage);
jacFilter->Update();
jacFilter->Modified();
generatedJac = jacFilter->GetOutput();
}
return 0;
}
I'm using C++ ITK 4.8.2, compiled in 'release' mode on Ubuntu 15.4, and Python SimpleITK v 9.0.
You seem to be benchmarking using loops. Using loops for benchmarking is not good practice, because compilers and interpreters apply a lot of optimizations to them.
I believe that in here
for i in range(ngen):
    jacobian = sitk.DisplacementFieldJacobianDeterminant(im)
The python interpreter most probably realized that you are only using the last value assigned to the jacobian variable, therefore executing only ONE iteration of the loop. This is a very common loop optimization.
On the other hand, since you call a couple of dynamic methods in the C++ version (jacFilter->Update();), it is possible that the compiler could not infer that the other calls are unused, making your C++ version slower since all the invocations of the DisplacementFieldJacobianDeterminant::update method are actually made.
Another possible cause is that the ITK pipeline in Python is not being forced to update, as you call explicitly the jacFilter->Modified() in C++ but this is not explicit in the Python version.
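Whether the Python loop really skips iterations can be checked directly by counting calls. The sketch below uses a hypothetical stand-in, fake_filter, in place of sitk.DisplacementFieldJacobianDeterminant so that it runs without SimpleITK.

```python
import time

calls = 0

def fake_filter(image):
    """Hypothetical stand-in for sitk.DisplacementFieldJacobianDeterminant."""
    global calls
    calls += 1
    return image

def run(ngen):
    start = time.perf_counter()
    for _ in range(ngen):
        # The result is overwritten on every pass, as in the question.
        jacobian = fake_filter("image")
    return time.perf_counter() - start

elapsed = run(1000)
print(calls, "calls in", elapsed, "seconds")
```

In CPython the counter reaches ngen, because the interpreter executes every call even when the result is discarded. If the same holds for the real filter, the timing gap is more likely to come from how the two versions trigger the ITK pipeline than from loop elimination.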
I have a C++ and a Python application on Windows (7+). I wish to send a ~50KB array of binary data (int[], float[], or double[]) from the C++ application to a NumPy array in the Python application in real time. I want <100ms latency, but can handle up to 500ms. I'm unsure of the correct way to do this.
I believe NumPy technically stores its arrays as just an array of binary data just like C++ (assuming a reasonable C++ compiler, like modern MSVC or GCC). Therefore technically it should be very easy, but I haven't been able to identify a good way to do this.
My current plan would be to use a memory mapped file, and then handle locking the memory-mapped file with more traditional IPC such as the Win32 message pump or a semaphore.
However, I'm not sure whether NumPy can read straight from a memory-mapped file. It can create a memory map of a file on disk with numpy.memmap, but this doesn't seem to work for a pure memory-mapped file where I just have a name or a handle.
I don't know if this is the right approach. Maybe I can get it to work, but ideally I would also want to do it the right way and not be surprised by nasty consequences of me coding stuff I don't understand.
I would appreciate any help or pointers to material that might help me figure out the correct way to do this.
UPDATE:
My C++ code (proof-of-concept) would look like this:
// Host application.
// Creates 20 byte memory mapped file with name "Global\test_mmap_file" and
// containing 5 uint32s.
// Note: Requires SeCreateGlobalPrivilege to create global memory mapped file
#include <windows.h>
#include <iostream>
#include <cassert>
int main()
{
HANDLE file_mapping_handle = NULL;
unsigned int* buffer = 0;
assert(sizeof(unsigned int) == 4); // Require compatibility with np.uint32
const size_t buffer_sz = 5 * sizeof(unsigned int);
file_mapping_handle = CreateFileMapping(
INVALID_HANDLE_VALUE,
NULL,
PAGE_READWRITE,
0,
buffer_sz,
L"Global\\test_mmap_file");
if (!file_mapping_handle)
{
std::cout << "CreateFileMapping failed (Host).\n";
std::cout << "Error code: 0x" << std::hex << GetLastError() << std::endl;
std::cin.get();
return 1;
}
buffer = (unsigned int*)MapViewOfFile(
file_mapping_handle,
FILE_MAP_ALL_ACCESS,
0,
0,
buffer_sz);
if (!buffer)
{
CloseHandle(file_mapping_handle);
std::cout << " MapViewOfFile failed (Host).\n";
std::cout << "Error code: 0x" << std::hex << GetLastError() << std::endl;
std::cin.get();
return 1;
}
buffer[0] = 2;
buffer[1] = 3;
buffer[2] = 5;
buffer[3] = 7;
buffer[4] = 11;
std::cout << "Data sent, press enter to exit.\n";
std::cin.get();
UnmapViewOfFile(buffer);
CloseHandle(file_mapping_handle);
return 0;
}
I wanted some way to access this shared memory from Python and create a numpy array. I tried,
import numpy as np
L_mm = np.memmap('Global\\test_mmap_file')
but this fails as Global\test_mmap_file is not a filename. Following the hints given by abarnert I constructed the following client program which seems to work:
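import numpy as np
import mmap
# -1 requests a mapping not backed by a file on disk; the tag name selects the shared mapping (Windows only)
mm = mmap.mmap(-1, 20, tagname='Global\\test_mmap_file')
L = np.frombuffer(mm, dtype=np.uint32)
print(L)
# drop the array before closing: it is a view onto the mapping's memory
del L
mm.close()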
This requires admin privileges for both programs to run (or giving the user the SeCreateGlobalPrivilege right). However, I think this could easily be avoided by not giving the shared memory a global name, and instead duplicating the handle and passing it to the Python program. It also doesn't control access to the shared memory properly, but that should be easy to add with a semaphore or some other such construct.
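The byte layout that both sides must agree on can be tried out without a second process. The sketch below uses an anonymous mapping as a stand-in for the named one, so it also runs outside Windows, and plays the writer and the reader in the same script; it is an illustration of the data format only.

```python
import mmap
import struct

N_VALUES = 5
SIZE = N_VALUES * 4  # five 32-bit unsigned integers

# Anonymous mapping used as a stand-in; on Windows the named mapping would be
# opened with mmap.mmap(-1, SIZE, tagname="Global\\test_mmap_file") instead.
mm = mmap.mmap(-1, SIZE)

# Writer side (the C++ host in the question): five uint32 values in native byte order.
mm[:] = struct.pack("=5I", 2, 3, 5, 7, 11)

# Reader side: decode the same bytes back into integers.
values = struct.unpack("=5I", mm[:SIZE])
print(values)

mm.close()
```

np.frombuffer(mm, dtype=np.uint32), as used in the client above, reads the same bytes without copying; struct is used here only to keep the sketch free of dependencies.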