I'm relatively new to Python and this is my first attempt at writing a C extension.
Background
In my Python 3.x project I need to load and parse large binary files (10-100 MB) to extract data for further processing. The binary content is organized in frames: headers followed by a variable amount of data. Because the pure-Python parser was too slow, I decided to write a C extension to speed up the loading part.
The standalone C code outperforms Python by a factor of 20-500x, so I am pretty satisfied with it.
The problem: memory usage keeps growing when I invoke the function from my C extension multiple times within the same Python module.
my_c_ext.c
#include <Python.h>
#include <numpy/arrayobject.h>
#include "my_c_ext.h"
static unsigned short *X, *Y;
static PyObject* c_load(PyObject* self, PyObject* args)
{
    char *filename;
    if (!PyArg_ParseTuple(args, "s", &filename))
        return NULL;

    PyObject *PyX, *PyY;
    __load(filename);

    npy_intp dims[1] = {n_events};
    PyX = PyArray_SimpleNewFromData(1, dims, NPY_UINT16, X);
    PyArray_ENABLEFLAGS((PyArrayObject*)PyX, NPY_ARRAY_OWNDATA);
    PyY = PyArray_SimpleNewFromData(1, dims, NPY_UINT16, Y);
    PyArray_ENABLEFLAGS((PyArrayObject*)PyY, NPY_ARRAY_OWNDATA);

    PyObject *xy = Py_BuildValue("NN", PyX, PyY);
    return xy;
}
...
//More Python C-extension boilerplate (methods, etc..)
...
void __load(char *filename) {
    // open file, extract frame header and compute new_size
    X = realloc(X, new_size * sizeof(*X));
    Y = realloc(Y, new_size * sizeof(*Y));
    X[i] = ...
    Y[i] = ...
    return;
}
test.py
import my_c_ext as ce
binary_files = ['file1.bin',...,'fileN.bin']
for f in binary_files:
    x, y = ce.c_load(f)
    del x, y
Here I am deleting the returned objects in the hope of lowering memory usage.
After reading several related posts, I am still stuck.
I tried adding and removing the PyArray_ENABLEFLAGS call that sets the NPY_ARRAY_OWNDATA flag, without seeing any difference. It is not yet clear to me whether NPY_ARRAY_OWNDATA implies a free(X) in C. If I explicitly free the arrays in C, I run into a segfault when trying to load the second file in the for loop in test.py.
Any idea what I am doing wrong?
This looks like a memory management disaster. NPY_ARRAY_OWNDATA should cause it to call free on the data (or at least PyArray_free, which isn't necessarily the same thing...).
However, once this is done you still have the global variables X and Y pointing to a now-invalid area of memory. You then call realloc on those invalid pointers. At this point you're well into undefined behaviour, so anything could happen.
If it's a global variable then the memory needs to be managed globally, not by Numpy. If the memory is managed by the Numpy array then you need to ensure that you store no other way to access it except through that Numpy array. Anything else is going to cause you problems.
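To see what NPY_ARRAY_OWNDATA amounts to, here is a small pure-Python sketch (assuming numpy is installed; the flag surfaces in Python as arr.flags.owndata): at most one array owns a given buffer, and the buffer is released only when its owner is destroyed.

```python
import numpy as np

# A freshly allocated array owns its buffer: when its refcount hits
# zero, numpy frees the memory. This is what OWNDATA means.
a = np.arange(4, dtype=np.uint16)
print(a.flags.owndata)   # True

# A view borrows the same buffer; it does NOT own it, and it holds a
# reference to the owner so the buffer cannot be freed underneath it.
v = a[1:]
print(v.flags.owndata)   # False
print(v.base is a)       # True
```

Setting NPY_ARRAY_OWNDATA on an array wrapped around a global C pointer hands exactly this responsibility to numpy, which is why the global must not be freed or realloc'd afterwards.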
Related
I was wondering if it's possible to write the address of a numpy array to a file, e.g. via ctypeslib.ndpointer or something similar, then open this file in a C++ function (also called through ctypes in the same Python process), read the address, and convert it to e.g. a C++ double array.
This will all be happening in the same python process.
I am aware that it's possible to pass it as a function argument, and that works, but that isn't an option in my case.
This is roughly how the code would look (don't mind the syntax errors):
test.py
with open(path) as f:
    f.write(matrix.ctypes.data_as(np.ctypeslib.ndpointer(dtype=np.float64, ndim=2, flags='C_CONTIGUOUS')))
and cpp:
void function()
{
... read file, get address stored into double* array;
e.g. then print out the values
}
Where could I be going wrong?
I work on a project where we write an np array to a file and then read that file in C++, which is wasteful. I want to try adjusting it to write, and later read, just this address. Sending an ndpointer or something else as a function argument won't work, as that would require editing a big portion of the project.
I think the data of your np.array will be lost once the Python program terminates, so you will not be able to access its memory location once the program ends.
Unfortunately, I don't know how to do it using ctypes, only using the C-API extension.
With it, you access the Python variable directly from C. It is represented by a pointer, so you can access the address of any Python object (and therefore also of ndarrays).
In Python you would write:
import c_module
import numpy as np
...
a = np.array([...])  # generate the numpy array
...
c_module.c_fun(a)
and then in your C++ code you will receive the memory address:
static PyObject* py_f_roots(PyObject* self, PyObject* args) {
    PyObject *np_array_py;
    if (!PyArg_ParseTuple(args, "O", &np_array_py))
        return NULL;
    // now np_array_py points to the memory cell of the Python numpy array a
    // if you want to access it you need to cast it to a PyArrayObject *
    PyArrayObject *np_array = (PyArrayObject *) np_array_py;
    // you can access the data
    double *data = (double *) PyArray_DATA(np_array);
    Py_RETURN_NONE;
}
The documentation for the numpy C API
The reference manual for C Python extensions
If the Python and C code are run in the same process, then the address you write from Python will be valid in C. I think you want the following:
test.py
import ctypes as ct
import numpy as np
matrix = np.array([1.1,2.2,3.3,4.4,5.5])
# use binary mode to write the address
with open('addr.bin','wb') as f:
    # the type of the pointer doesn't matter, we just need the address
    f.write(matrix.ctypes.data_as(ct.c_void_p))

# test function receives the filename
dll = ct.CDLL('./test')
dll.func.argtypes = ct.c_char_p,
dll.func.restype = None
dll.func(b'addr.bin')
test.c
#include <stdio.h>
__declspec(dllexport)
void func(const char* file) {
    double* p;
    FILE* fp = fopen(file, "rb"); // read the pointer
    fread(&p, 1, sizeof(p), fp);
    fclose(fp);
    for(int i = 0; i < 5; ++i)    // dump the elements
        printf("%lf\n", p[i]);
}
Output:
1.100000
2.200000
3.300000
4.400000
5.500000
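The same-process assumption the answer relies on can also be checked from pure Python with ctypes alone, without the file or the C side. A minimal sketch (it only uses the public numpy/ctypes APIs; the address stays valid exactly as long as matrix keeps the buffer alive):

```python
import ctypes as ct
import numpy as np

matrix = np.array([1.1, 2.2, 3.3, 4.4, 5.5])

# reinterpret the buffer address as a double* -- valid only inside this
# process and only while 'matrix' is alive
p = matrix.ctypes.data_as(ct.POINTER(ct.c_double))

values = [p[i] for i in range(matrix.size)]
print(values)  # [1.1, 2.2, 3.3, 4.4, 5.5]
```

If matrix were garbage-collected (or the address were read in a different process), the same pointer would be dangling, which is the failure mode to watch for with the file-based approach.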
I'm stuck wrapping a C DLL with Python.
The DLL side is a provided black box, and I made an additional wrapper class.
Python gives the C DLL a numpy ndarray; the C DLL processes it and returns it.
When I delete the array in C, the whole program dies, with or without a deep copy. Below is brief code to reproduce it:
python main
import mypyd, numpy
myobject = mypyd.myimpro()
image = readimg()  # ndarray
result = myobject.process(image)  # <- process dies here with exit code -1073741819
dllmain header
BOOST_PYTHON_MODULE(mypyd){
    class_<mywrapclass>("myimpro")
        .def("process", &mywrapclass::process);
}
c wrapper header
namespace np = boost::python::numpy;

class mywrapclass{
    dllhandler m_handler;
    myimagestruct process(np::ndarray);
};
c wrapper code
mywrapclass::mywrapclass(){
    Py_Initialize();
    np::initialize();
    m_handler = initInDLL();
}

np::ndarray mywrapclass::process(np::ndarray input){
    myimagestruct image = ndarray2struct(input);    // deep copy the data array
    myimagestruct result = processInDLL(m_handler, image);
    np::ndarray ret = struct2ndarray(result);       // also deep copy the data array, just in case
    return ret;
}
C header for the DLL (I can't modify this code; it comes with the DLL):
typedef struct {
    void *data;
    ... // size, type, etc.
} myimagestruct;

typedef void * dllhandler;

dllhandler initInDLL();
myimagestruct processInDLL(dllhandler, myimagestruct);
Sorry for the rough code. Initially I naively thought boost.python or deep copying would solve this heap memory problem. I have tried almost every keyword that came to mind but couldn't even find a starting point.
The DLL works perfectly without the boost wrapping, so the problem must be on my side.
Thanks for reading anyway.
The following minimal example of calling a python function from C++ has a memory leak on my system:
script.py:
import tensorflow

def foo(param):
    return "something"
main.cpp:
#include "python3.5/Python.h"
#include <iostream>
#include <string>
int main()
{
    Py_Initialize();
    PyRun_SimpleString("import sys");
    PyRun_SimpleString("if not hasattr(sys,'argv'): sys.argv = ['']");
    PyRun_SimpleString("sys.path.append('./')");

    PyObject* moduleName = PyUnicode_FromString("script");
    PyObject* pModule = PyImport_Import(moduleName);
    PyObject* fooFunc = PyObject_GetAttrString(pModule, "foo");
    PyObject* param = PyUnicode_FromString("dummy");
    PyObject* args = PyTuple_Pack(1, param);
    PyObject* result = PyObject_CallObject(fooFunc, args);

    Py_CLEAR(result);
    Py_CLEAR(args);
    Py_CLEAR(param);
    Py_CLEAR(fooFunc);
    Py_CLEAR(pModule);
    Py_CLEAR(moduleName);
    Py_Finalize();
}
compiled with
g++ -std=c++11 main.cpp $(python3-config --cflags) $(python3-config --ldflags) -o main
and run with valgrind
valgrind --leak-check=yes ./main
produces the following summary
LEAK SUMMARY:
==24155== definitely lost: 161,840 bytes in 103 blocks
==24155== indirectly lost: 33 bytes in 2 blocks
==24155== possibly lost: 184,791 bytes in 132 blocks
==24155== still reachable: 14,067,324 bytes in 130,118 blocks
==24155== of which reachable via heuristic:
==24155== stdstring : 2,273,096 bytes in 43,865 blocks
==24155== suppressed: 0 bytes in 0 blocks
I'm using Linux Mint 18.2 Sonya, g++ 5.4.0, Python 3.5.2 and TensorFlow 1.4.1.
Removing import tensorflow makes the leak disappear. Is this a bug in TensorFlow or did I do something wrong? (I expect the latter to be true.)
Additionally when I create a Keras layer in Python
#script.py
from keras.layers import Input

def foo(param):
    a = Input(shape=(32,))
    return "str"
and run the call to Python from C++ repeatedly
//main.cpp
#include "python3.5/Python.h"
#include <iostream>
#include <string>
int main()
{
    Py_Initialize();
    PyRun_SimpleString("import sys");
    PyRun_SimpleString("if not hasattr(sys,'argv'): sys.argv = ['']");
    PyRun_SimpleString("sys.path.append('./')");

    PyObject* moduleName = PyUnicode_FromString("script");
    PyObject* pModule = PyImport_Import(moduleName);

    for (int i = 0; i < 10000000; ++i)
    {
        std::cout << i << std::endl;
        PyObject* fooFunc = PyObject_GetAttrString(pModule, "foo");
        PyObject* param = PyUnicode_FromString("dummy");
        PyObject* args = PyTuple_Pack(1, param);
        PyObject* result = PyObject_CallObject(fooFunc, args);
        Py_CLEAR(result);
        Py_CLEAR(args);
        Py_CLEAR(param);
        Py_CLEAR(fooFunc);
    }

    Py_CLEAR(pModule);
    Py_CLEAR(moduleName);
    Py_Finalize();
}
the memory consumption of the application continuously grows ad infinitum during runtime.
So I guess there is something fundamentally wrong with the way I call the Python function from C++, but what is it?
There are two different types of "memory leaks" in your question.
Valgrind is telling you about the first type. It is pretty usual for Python modules to "leak" memory in this sense: it is mostly globals that are allocated/initialized when the module is loaded. Because the module is loaded only once per process, it's not a big problem.
A well-known example is numpy's PyArray_API: it must be initialized via _import_array, is then never deleted, and stays in memory until the Python interpreter shuts down.
So it is a "memory leak" by design; you can argue whether it is good design or not, but at the end of the day there is nothing you can do about it.
I don't have enough insight into the tensorflow module to pinpoint the places where such memory leaks happen, but I'm pretty sure it's nothing you should worry about.
The second "memory leak" is more subtle.
You can get a lead by comparing the valgrind output for 10^4 and 10^5 iterations of the loop: there will be almost no difference! There is, however, a difference in peak memory consumption.
Differently from C++, Python has a garbage collector, so you cannot know exactly when an object is destroyed. CPython uses reference counting: when a reference count drops to 0, the object is destroyed. However, when there is a cycle of references (e.g. object A holds a reference to object B and object B holds a reference to object A) it is not so simple: the garbage collector needs to iterate through all objects to find such no-longer-used cycles.
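A minimal pure-Python illustration of that difference (a sketch using only the standard library; the Node class is made up for the demo): plain reference counting can never free a cycle, only the cycle collector can.

```python
import gc
import weakref

class Node:
    def __init__(self):
        self.other = None

gc.disable()               # leave only reference counting active

a, b = Node(), Node()
a.other, b.other = b, a    # a -> b -> a: a reference cycle
probe = weakref.ref(a)     # lets us observe when 'a' is really freed

del a, b                   # the cycle keeps both refcounts above zero,
print(probe() is None)     # False: the objects are still alive

gc.collect()               # the collector walks objects, finds the cycle
print(probe() is None)     # True: and frees it
gc.enable()
```

This is why peak memory can grow between collector runs even though no reference-counted object is "leaked" in the valgrind sense.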
One could think that keras.layers.Input has such a cycle somewhere (and it does), but that is not the reason for this "memory leak", which can also be observed in pure Python.
We can use the objgraph package to inspect the references. Let's run the following Python script:
#pure.py
from keras.layers import Input
import gc
import sys
import objgraph

def foo(param):
    a = Input(shape=(1280,))
    return "str"

### MAIN :
print("Counts at the beginning:")
objgraph.show_most_common_types()
objgraph.show_growth(limit=7)

for i in range(int(sys.argv[1])):
    foo(" ")

gc.collect()  # just to be sure

print("\n\n\n Counts at the end")
objgraph.show_most_common_types()
objgraph.show_growth(limit=7)

import random
objgraph.show_chain(
    objgraph.find_backref_chain(
        random.choice(objgraph.by_type('Tensor')),  # take some random tensor
        objgraph.is_proper_module),
    filename='chain.png')
and run it:
$ python pure.py 1000
We can see the following: at the end there are exactly 1000 Tensors, which means none of our created objects got disposed of!
If we take a look at the chain that keeps a tensor object alive (created with objgraph.show_chain), we see that there is a tensorflow Graph object in which all tensors are registered, and they stay there until the session is closed.
So much for the theory. However, neither:
#close session and free resources:
import keras
keras.backend.get_session().close()  # free all resources

print("\n\n\n Counts after session.close():")
objgraph.show_most_common_types()
nor the solution proposed here:
with tf.Graph().as_default(), tf.Session() as sess:
    for step in range(int(sys.argv[1])):
        foo(" ")

works for the current tensorflow version. This is probably a bug.
In a nutshell: you are doing nothing wrong in your C++ code; there are no memory leaks you are responsible for. In fact, you would see exactly the same memory consumption if you called the function foo from a pure Python script over and over again.
All created Tensors are registered in a Graph object and are not released automatically; you must release them by closing the backend session, which however doesn't work due to a bug in the current tensorflow version 1.4.0.
I am writing a Python extension in C (on Linux, Ubuntu 14.04) and ran into an issue with dynamic memory allocation. I searched through SO and found several posts on free() calls causing similar errors because free() tries to release memory that was not dynamically allocated. But I don't know if/how that is a problem in the code below:
#include <Python.h>
#include <stdio.h>
#include <stdlib.h>
static PyObject* tirepy_process_data(PyObject* self, PyObject *args)
{
    FILE* rawfile = NULL;
    char* rawfilename = (char *) malloc(128*sizeof(char));
    if(rawfilename == NULL)
        printf("malloc() failed.\n");
    memset(rawfilename, 0, 128);
    const int num_datapts = 0; /* just some integer variable */
    if (!PyArg_ParseTuple(args, "si", &rawfilename, &num_datapts)) {
        return NULL;
    }
    /* Here I am going to open the file, read the contents and close it again */
    printf("Raw file name is: %s \n", rawfilename);
    free(rawfilename);
    return Py_BuildValue("i", num_profiles);
}
The output is:
Raw file name is: \home\location_to_file\
*** Error in `/usr/bin/python': free(): invalid pointer: 0xb7514244 ***
According to the documentation:
These formats allow to access an object as a contiguous chunk of memory. You don’t have to provide raw storage for the returned unicode or bytes area. Also, you won’t have to release any memory yourself, except with the es, es#, et and et# formats.
(Emphasis is added by me)
So you do not need to first allocate memory with malloc(). You also do not need to free() the memory afterwards.
Your error occurs because you are trying to free memory that was provided/allocated by Python. So C (malloc/free) is unable to free it, as it is unknown to the C runtime.
Please consult the API docs for PyArg_ParseTuple: https://docs.python.org/2/c-api/arg.html
You shall NOT pass a pointer to allocated memory, nor shall you free it afterwards.
I am creating a small GUI system and I would like to do my rendering with Python and the cairo and pystacia libraries. For the C++/Python interaction I am using Boost.Python, but I am having trouble with pointers.
I have seen this kind of question asked a few times but didn't quite understand how to solve it.
If I have a struct/class with only a pointer for the image data:
struct ImageData{
    unsigned char* data;
    void setData(unsigned char* data) { this->data = data; } // let's assume there is more code here that manages memory
    unsigned char* getData() { return data; }
};
how can I make this available so that Python can do this (C++):
ImageData myimage;
myimage.data = some_image_data;
global["myimage"] = python::ptr(&myimage);
and in python:
import mymodule
from mymodule import ImageData
myimagedata = myimage.getData()
#manipulate with image data and those manipulations can be read from C++ data pointer that is passed
My code works for basic method calls on the passed ptr to the class. This is probably a basic use case, but I haven't been able to make it work. I tried shared_ptr but failed. Should it be solved using shared_ptr, a proxy class, or some other way?
You have a problem here with the locality of your variable: myimage will get deleted when you go out of its scope. To fix this, you can create it with dynamic memory allocation and then move the pointer to Python:
ImageData * myimage = new ImageData();
myimage->data = new ImageRawData(some_image_data); // assuming copy of the buffer
global["myimage"] = boost::python::ptr(myimage);
Please note that this does not take care of memory handling on the Python side. You should use boost::python::handle<> to state correctly that you are transferring memory management to Python.