I am using SWIG with Python to call a function myfunc() written in C++. Every time I call myfunc() I have to generate the same huge sparse matrix. What I would like to do instead is to create the matrix once, then pass a pointer to the matrix back to Python, without reallocating the space on every call. What I fear is that this could cause some kind of memory leak.
What is the best way to do this?
The matrix is an Eigen::SparseMatrix.
Would it perhaps be safe to simply pass a pointer back and forth? Python would not know how to handle it, but as long as the space stays allocated, would I be able to reuse the pointer in C++?
This is precisely how SWIG handles an unknown object: it passes a pointer to the object around, together with some type information (a string). If a function takes a pointer of that type as an argument, SWIG will happily pass it that pointer. See the SWIG docs here.
You just have to make sure the types match up, i.e., you cannot pass, say, a MatrixXd* to Python and use it in a function taking a MatrixBase<MatrixXd>*, since SWIG will not know that the types are compatible.
Also, for unknown objects (at least for pointers to them), SWIG will not do any memory management, so you will need to allocate and deallocate the object on the C++ side.
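The allocate-once / reuse / free-on-the-C++-side pattern might look like the sketch below. This is only illustrative: a plain struct stands in for the Eigen::SparseMatrix, and all of the function names (create_matrix, myfunc, destroy_matrix) are made up for the example, not taken from the question's actual code.

```cpp
#include <cstddef>
#include <cassert>

// Stand-in for the expensive-to-build object; in the real code this
// would be an Eigen::SparseMatrix<double>.
struct BigMatrix {
    std::size_t rows, cols;
    double sum;  // pretend payload
};

// Build the matrix once; SWIG wraps the returned pointer as an
// opaque Python object that can be passed back in later calls.
BigMatrix* create_matrix() {
    BigMatrix* m = new BigMatrix{10000, 10000, 0.0};
    // ... the expensive fill happens here, exactly once ...
    return m;
}

// Called many times from Python with the same opaque pointer;
// no reallocation takes place.
double myfunc(BigMatrix* m) {
    m->sum += 1.0;  // reuse the already-allocated data
    return m->sum;
}

// SWIG will not free the object for you, so export an explicit
// destructor and call it from Python when you are done.
void destroy_matrix(BigMatrix* m) {
    delete m;
}
```

From Python you would then hold the wrapped pointer returned by create_matrix() and pass it to every myfunc() call, calling destroy_matrix() once at the end.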
Related
I'm using python bytes objects to pass some data to natively implemented methods using the CFFI library, for example:
from cffi import FFI
ffi = FFI()
lib = ffi.dlopen(libname)
ffi.cdef("""
void foo(char*);
""")
x = b'abc123'
lib.foo(x)
As far as I understand, the pointer received by the native method is that of the actual underlying buffer behind the x bytes object. This works fine 99% of the time, but sometimes the pointer seems to get invalidated and the data it points to contains garbage some time after the native call has finished. The native code keeps the pointer around after returning from the initial call and expects the data still to be there, and the Python code makes sure to keep a reference to x so that the pointer, hopefully, remains valid.
In these cases, if I call a native method with the same bytes object again, I can see that I get a different pointer: it points to the same value but is located at a different address. This indicates that the underlying buffer behind the bytes object has moved (if my assumption that CFFI extracts a pointer to the underlying array contained by the bytes object is correct, and no temporary copy is created anywhere), even though, to the best of my knowledge, the bytes object has not been modified in any way (the code is part of a large codebase, but I'm reasonably sure the bytes objects are not modified directly by code).
What could be happening here? Is my assumption that CFFI gets a pointer to the actual internal buffer of the bytes object incorrect? Is Python perhaps allowed to silently reallocate the buffers behind bytes objects for garbage collection or memory compaction reasons, and does it do so unaware that I am holding a pointer to the buffer? I'm using PyPy instead of the default Python interpreter, if that makes a difference.
Your guess is the correct answer. The (documented) guarantee is only that the pointer passed in this case is valid for the duration of the call.
PyPy's garbage collector can move objects in memory, if they are small enough that doing so is a win in overall performance. When doing such a cffi call, though, pypy will generally mark the object as "pinned" for the duration of the call (unless there are already too many pinned objects and adding more would seriously hurt future GC performance; in this rare case it will make a copy anyway and free it afterwards).
If your C code needs to access the memory after the call has returned, you have to make a copy explicitly, e.g. with ffi.new("char[]", mybytes), and keep that object alive as long as needed.
I have a problem with a project I am working on and am not sure about the best way to resolve it.
Basically I am pushing a slow Python algorithm into a C++ shared library that I am using to do a lot of the numerically intense work. One of the C++ functions has the form:
const int* some_function(inputs) {
    // does some stuff
    int* return_array = new int[10];
    // fills the return array with a few values
    return return_array;
}
I.e., it returns an array. This array is interpreted within Python using numpy's ndpointer, as per:
lib.some_function.restype = ndpointer(dtype=c_int, shape=(10,))
I have a couple of questions that I have been fretting over for a while:
1) I have dynamically allocated memory here. Given that I am calling this function through the shared library from Python, am I causing a memory leak? My program is long-running and I will likely call this function millions of times, so this is important.
2) Is there a better data structure I could be using? If this were a pure C++ function I would return a vector, but from googling around, that seems to be a non-ideal solution in Python with ctypes. I also have other functions in the C++ library that call this function. Since I have just written the function and am about to write the others, I know to delete[] the returned pointer after use in those functions. However, I am unsatisfied with the current situation: if someone other than myself (or indeed myself in a few months) uses this function, there is a relatively high chance of future memory leaks.
Thanks!
Yes, you are leaking memory. It is not possible for the Python code to automatically free the pointed-to memory, since it has no idea how that memory was allocated. You need to provide a corresponding de-allocation function (which calls delete[]) and tell Python how to call it (possibly using a wrapper framework, as recommended by @RichardHidges).
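A minimal sketch of such a pair of functions follows; the fill values and the name free_result are illustrative, not taken from the question's code.

```cpp
#include <cassert>

// Allocates the result; ownership passes to the caller, who must
// eventually hand the pointer back to free_result().
const int* some_function() {
    int* return_array = new int[10];
    for (int i = 0; i < 10; ++i)
        return_array[i] = i * i;  // fill with example values
    return return_array;
}

// Matching de-allocator, exported alongside some_function so the
// Python side (e.g. via ctypes) can release the memory when done.
void free_result(const int* p) {
    delete[] p;
}
```

On the Python side you would declare free_result with ctypes as well and call it for every pointer some_function returned, once you have copied or finished using the data.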
You probably want to consider using either SWIG or boost::python
There's an example of converting a std::vector to a python list using boost::python here:
std::vector to boost::python::list
Here is the link for SWIG:
http://www.swig.org
I have a compatibility library that uses SWIG to access a C++ library. I would find it useful to be able to create a SWIG-wrapped Python object inside this layer (as opposed to accepting the C++ object as an argument or returning one). I.e. I want the PyObject* that points to the SWIG-wrapped C++ object.
I discovered that the SWIG_NewPointerObj function does exactly this. The SWIG-generated xx_wrap.cpp file uses this function, but it's also made available in the header emitted by swig -python -external-runtime swigpyrun.h
HOWEVER, I cannot find any reference to what the last argument to this function is. It appears that it specifies the ownership of the object, but there is no documentation that says what each of the options mean (or even what they all are).
It appears that the following are acceptable values:
0
SWIG_POINTER_OWN
SWIG_POINTER_NOSHADOW
SWIG_POINTER_NEW = OWN + NOSHADOW
SWIG_POINTER_DISOWN (I'm not sure if SWIG_NewPointerObj accepts this)
SWIG_POINTER_IMPLICIT_CONV (I'm not sure if SWIG_NewPointerObj accepts this)
I want to create an object that is used only in my wrapping layer. I want to create it out of my own pointer to the C++ object (so I can change the C++ object's value and have the change reflected in the Python object). I need it so it can be passed to a Python callback function. I want to keep this one instance throughout the life of the program, so that I don't waste time creating and destroying identical objects for each callback. Which option is appropriate, and what do I Py_INCREF?
When you create new pointer objects with SWIG_NewPointerObj, you may pass the following flags:
SWIG_POINTER_OWN
SWIG_POINTER_NOSHADOW
If SWIG_POINTER_OWN is set, the destructor of the underlying C++ class will be called when the Python pointer is finalized. By default, the destructor will not be called; see the Memory Management section of the SWIG documentation.
For your use case, you don't need to set any flags at all.
From what I can see in the sources, if SWIG_POINTER_NOSHADOW is set, then a basic wrapped pointer is returned. You will not be able to access member variables in Python. All you'll have is an opaque pointer.
Reference: /usr/share/swig/2.0.7/python/pyrun.swg
I need to call a function in a C library from python, which would free() the parameter.
So I tried create_string_buffer(), but it seems that this buffer would later be freed by Python as well, which would make the buffer be freed twice.
I read on the web that Python refcounts its buffers and frees them when there are no references left. So how can I create a buffer that Python will not touch afterwards? Thanks.
example:
I load the library with lib = cdll.LoadLibrary("libxxx.so") and then call the function with path = create_string_buffer(topdir) and lib.load(path). However, the load function in libxxx.so frees its argument, and later path is freed again by Python, so it is freed twice.
Try the following, in the given order:
1. Try by all means to manage your memory in Python, for example using create_string_buffer(). If you can control the behaviour of the C function, modify it so that it does not free() the buffer.
2. If the library function you call frees the buffer after using it, there must be some library function that allocates the buffer (or the library is broken).
3. Of course you could call malloc() via ctypes, but this would break all good practices of memory management. Use it as a last resort; it will almost certainly introduce hard-to-find bugs at some later time.
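When a library frees its argument, it normally also exports the matching allocator, so that allocation and deallocation happen in the same C runtime. A sketch of what that pairing looks like on the library side (the names lib_alloc_string and lib_load are hypothetical, standing in for the question's load()):

```cpp
#include <cstring>
#include <cstdlib>
#include <cassert>

// Exported allocator: callers obtain buffers from here, never from
// Python-managed memory, so the library is free to free() them.
char* lib_alloc_string(const char* s) {
    char* p = static_cast<char*>(std::malloc(std::strlen(s) + 1));
    std::strcpy(p, s);
    return p;
}

// Takes ownership of `path` and frees it, like the load() in the
// question. The caller must not touch `path` afterwards.
int lib_load(char* path) {
    int ok = (path[0] != '\0');  // pretend to use the path
    std::free(path);             // the library frees its own allocation
    return ok;
}
```

From ctypes you would call lib_alloc_string to build the argument and then hand it to lib_load, keeping no Python-side buffer object that would be freed a second time.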
I'm considering integrating some C code into a Python system (Django), and I was considering using the Python/C API. The alternative is two separate processes with IPC, but I'm looking into direct interaction first. I'm new to Python, so I'm trying to get a feel for the right direction to take.
Is it possible for a call to a C initialiser function to malloc a block of memory (and put something in it) and return a handle to it back to the Python script (a pointer to the start of the memory block)? The allocated memory should remain on the heap after the init function returns. The Python script can then call subsequent C functions (passing as an argument the pointer to the start of the memory) and each function can do some thinking and return a value to the Python script. Finally, there's another C function to deallocate the memory.
Assume that the application is single-threaded and that after the init function, the memory is only read from so concurrency isn't an issue. The amount of memory will be a few hundred megabytes.
Is this even possible? Will Python let me malloc from the heap and allow the allocation to stay there? Will it come from the Python process's memory? Will Python try to clear it up (i.e., does it do its own memory allocation and not expect any other code to interfere with its address space)?
Could I just return the byte array as a Python-managed string (or similar datatype) and pass the reference back as an argument to the C call? Would Python be OK with such a large string?
Would I be better off doing this with a separate process and IPC?
Cython
You can certainly use the C API to do what you want. You'll create a class in C, which can hold onto any memory it wants. That memory doesn't have to be exposed to Python at all if you don't want.
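The init / compute / deallocate lifecycle described in the question can be sketched on the C side like this. All names here (Engine, engine_init, engine_lookup, engine_free) are hypothetical, and a small vector stands in for the few hundred megabytes of read-only data:

```cpp
#include <vector>
#include <cstddef>
#include <cassert>

// Opaque state allocated once by the init function; Python only ever
// sees the pointer, so none of this needs to be exposed.
struct Engine {
    std::vector<int> table;  // stands in for the large read-only data
};

// Init: allocate on the heap and fill; the memory survives the call.
Engine* engine_init(std::size_t n) {
    Engine* e = new Engine;
    e->table.resize(n);
    for (std::size_t i = 0; i < n; ++i)
        e->table[i] = static_cast<int>(i * 2);  // read-only after init
    return e;
}

// Subsequent calls receive the handle back and only read from it,
// so single-threaded use has no concurrency issues.
int engine_lookup(const Engine* e, std::size_t i) {
    return e->table[i];
}

// Explicit teardown, called from Python when the script is done.
void engine_free(Engine* e) {
    delete e;
}
```

Python never inspects the Engine pointer; it just stores the handle returned by engine_init and passes it to the other two functions, which keeps Python's own memory management entirely out of the picture.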
If you are comfortable building C DLLs, and don't need to perform Python operations in C, then ctypes might be your best bet.