I am writing a Python extension in C on Linux (Ubuntu 14.04) and ran into an issue with dynamic memory allocation. I searched through SO and found several posts about free() causing similar errors when it is asked to release memory that was not dynamically allocated, but I don't know if/how that applies to the code below:
#include <Python.h>
#include <stdio.h>
#include <stdlib.h>
static PyObject* tirepy_process_data(PyObject* self, PyObject *args)
{
FILE* rawfile = NULL;
char* rawfilename = (char *) malloc(128*sizeof(char));
if(rawfilename == NULL)
printf("malloc() failed.\n");
memset(rawfilename, 0, 128);
const int num_datapts = 0; /* Just some integer variable */
if (!PyArg_ParseTuple(args, "si", &rawfilename, &num_datapts)) {
return NULL;
}
/* Here I am going top open the file, read the contents and close it again */
printf("Raw file name is: %s \n", rawfilename);
free(rawfilename);
return Py_BuildValue("i", num_datapts);
}
The output is:
Raw file name is: \home\location_to_file\
*** Error in `/usr/bin/python': free(): invalid pointer: 0xb7514244 ***
According to the documentation:
These formats allow to access an object as a contiguous chunk of memory. You don’t have to provide raw storage for the returned unicode or bytes area. Also, you won’t have to release any memory yourself, except with the es, es#, et and et# formats.
(Emphasis is added by me)
So you do not need to first allocate memory with malloc(). You also do not need to free() the memory afterwards.
Your error occurs because you are trying to free() memory that is owned by Python: with the "s" format, PyArg_ParseTuple overwrites your pointer with the address of a buffer inside the Python string object, so free() is handed a pointer it never allocated (and your own malloc'd buffer is simply leaked).
Please see the API docs for PyArg_ParseTuple: https://docs.python.org/2/c-api/arg.html
You should not pass a pointer to memory you allocated yourself, and you should not free() the buffer afterwards.
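For illustration, here is a minimal sketch (not your exact code; it returns the parsed num_datapts in place of the undefined num_profiles) of how the function could look once the manual allocation is removed:

#include <Python.h>
#include <stdio.h>

static PyObject* tirepy_process_data(PyObject* self, PyObject* args)
{
    char* rawfilename = NULL; /* "s" makes this point into the Python string object */
    int num_datapts = 0;

    if (!PyArg_ParseTuple(args, "si", &rawfilename, &num_datapts))
        return NULL;

    /* Here you open the file, read the contents and close it again */
    printf("Raw file name is: %s \n", rawfilename);

    /* no free(rawfilename): the buffer is owned by the Python string */
    return Py_BuildValue("i", num_datapts);
}

If you do want a buffer that is yours to release, the es/es#/et/et# converters mentioned in the quote are the ones that allocate one for you, and that buffer is released with PyMem_Free().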
I was wondering whether it's possible to write the address of a numpy array to a file (via e.g. ctypeslib.ndpointer or something similar), then open that file in a C++ function that is also called through ctypes in the same Python process, read the address back and convert it to, e.g., a C++ double array.
This will all be happening in the same python process.
I am aware that it's possible to pass it as a function argument and that works, but that isn't something I'd need.
This is roughly what the code would look like (don't mind the syntax errors):
test.py
with open(path) as f:
    f.write(matrix.ctypes.data_as(np.ctypeslib.ndpointer(dtype=np.float64, ndim=2, flags='C_CONTIGUOUS')))
and cpp:
void function()
{
// ... read the file, get the address stored into a double* array,
// e.g. then print out the values
}
Where could I be wrong?
I work on a project where we are writing an np array to a file and then reading that file in cpp, which is wasteful. I want to try adjusting it to write, and later read, just this address. Sending an ndpointer or something else as a function argument won't work, as that would require editing a big portion of the project.
I think that the data of your np.array will be lost once the Python program terminates, so you will not be able to access its memory location after the program ends.
Unfortunately, I don't know how to do it using ctypes, only with a C-API extension.
With it, you access the Python variable directly from C. It is represented by a pointer, so you can get the address of any Python object (and therefore also of ndarrays).
In Python you would write:
import c_module
import numpy as np
...
a = np.array([...])
#generate the numpy array
...
c_module.c_fun(a)
and then in your C++ code you will receive the memory address:
static PyObject* py_f_roots(PyObject* self, PyObject* args) {
PyObject *np_array_py;
if (!PyArg_ParseTuple(args, "OO", &np_array_py))
return NULL;
//now np_array_py points to the memory cell of the python numpy array a
//if you want to access it you need to cast it to a PyArrayObject *
PyArrayObject *np_array = (PyArrayObject *) np_array_py;
//you can access the data
double *data = (double *) PyArray_DATA(np_array);
Py_RETURN_NONE;
}
The documentation for the NumPy C API
The reference manual for C Python extensions
If the Python and C code are run in the same process, then the address you write from Python will be valid in C. I think you want the following:
test.py
import ctypes as ct
import numpy as np
matrix = np.array([1.1,2.2,3.3,4.4,5.5])
# use binary to write the address
with open('addr.bin','wb') as f:
    # type of pointer doesn't matter, we just need the address
    f.write(matrix.ctypes.data_as(ct.c_void_p))
# test function to receive the filename
dll = ct.CDLL('./test')
dll.func.argtypes = ct.c_char_p,
dll.func.restype = None
dll.func(b'addr.bin')
test.c
#include <stdio.h>
__declspec(dllexport)
void func(const char* file) {
double* p;
FILE* fp = fopen(file,"rb"); // read the pointer
fread(&p, 1, sizeof(p), fp);
fclose(fp);
for(int i = 0; i < 5; ++i) // dump the elements
printf("%lf\n", p[i]);
}
Output:
1.100000
2.200000
3.300000
4.400000
5.500000
So I have a C program that I am running from Python, but I am getting a segmentation fault error. When I run the C program alone, it runs fine. The C program interfaces with a fingerprint sensor using the fprint lib.
#include <poll.h>
#include <stdlib.h>
#include <sys/time.h>
#include <stdio.h>
#include <libfprint/fprint.h>
int main(){
struct fp_dscv_dev **devices;
struct fp_dev *device;
struct fp_img **img;
int r;
r=fp_init();
if(r<0){
printf("Error");
return 1;
}
devices=fp_discover_devs();
if(devices){
device=fp_dev_open(*devices);
fp_dscv_devs_free(devices);
}
if(device==NULL){
printf("NO Device\n");
return 1;
}else{
printf("Yes\n");
}
int caps;
caps=fp_dev_img_capture(device,0,img);
printf("bloody status %i \n",caps);
//save the fingerprint image to file. ** this is the block that causes the segmentation fault **
int imrstx;
imrstx=fp_img_save_to_file(*img,"enrolledx.pgm");
fp_img_free(*img);
fp_exit();
return 0;
}
the python code
from ctypes import *
so_file = "/home/arkounts/Desktop/pythonsdk/capture.so"
my_functions = CDLL(so_file)
a=my_functions.main()
print(a)
print("Done")
The capture.so is built and accessed in Python. But when calling it from Python, I get a segmentation fault. What could be my problem?
Thanks a lot.
Although I am unfamiliar with libfprint, after taking a look at your code and comparing it with the documentation, I see two issues with your code that can both cause a segmentation fault:
First issue:
According to the documentation of the function fp_discover_devs, NULL is returned on error. On success, a NULL-terminated list is returned, which may be empty.
In the following code, you check for failure/success, but don't check for an empty list:
devices=fp_discover_devs();
if(devices){
device=fp_dev_open(*devices);
fp_dscv_devs_free(devices);
}
If devices is non-NULL, but empty, then devices[0] (which is equivalent to *devices) is NULL. In that case, you pass this NULL pointer to fp_dev_open. This may cause a segmentation fault.
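For example, the check could be extended along these lines (just a sketch, not tested against libfprint):

devices = fp_discover_devs();
if (devices == NULL || devices[0] == NULL) {
    printf("No device found\n");
    return 1;
}
device = fp_dev_open(devices[0]);
fp_dscv_devs_free(devices);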
I don't think that this is the reason for your segmentation fault though, because this error in your code would only be triggered if an empty list were returned.
Second issue:
The last parameter of fp_dev_img_capture should be a pointer to an allocated variable of type struct fp_img *. This tells the function the address of the variable that it should write to. However, with the code
struct fp_img **img;
[...]
caps=fp_dev_img_capture(device,0,img);
you are passing that function a wild pointer, because img does not point to any valid object. This can cause a segmentation fault as soon as the wild pointer is dereferenced by the function or cause some other kind of undefined behavior, such as overwriting other variables in your program.
I suggest you write the following code instead:
struct fp_img *img;
[...]
caps=fp_dev_img_capture(device,0,&img);
Now the third parameter is pointing to a valid object (to the variable img).
Since img is now a single pointer and not a double pointer, you must pass img instead of *img to the functions fp_img_save_to_file and fp_img_free.
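Put together, the relevant part could look roughly like this (a sketch that keeps your variable names):

struct fp_img *img = NULL;
int caps, imrstx;

caps = fp_dev_img_capture(device, 0, &img);   /* pass the address of img */
printf("bloody status %i \n", caps);

imrstx = fp_img_save_to_file(img, "enrolledx.pgm");  /* img, not *img */
fp_img_free(img);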
This second issue is probably the reason for your segmentation fault. It seems that you were just "lucky" that your program did not segfault as a standalone program.
I'm relatively new to Python and this is my first attempt at writing a C extension.
Background
In my Python 3.X project I need to load and parse large binary files (10-100MB) to extract data for further processing. The binary content is organized in frames: headers followed by a variable amount of data. Due to the low performance in Python, I decided to go for a C extension to speed up the loading part.
The standalone C code outperforms Python by a factor in between 20x-500x so I am pretty satisfied with it.
The problem: the memory keeps growing when I invoke the function from my C-extension multiple times within the same Python module.
my_c_ext.c
#include <Python.h>
#include <numpy/arrayobject.h>
#include "my_c_ext.h"
static unsigned short *X, *Y;
static PyObject* c_load(PyObject* self, PyObject* args)
{
char *filename;
if(!PyArg_ParseTuple(args, "s", &filename))
return NULL;
PyObject *PyX, *PyY;
__load(filename);
npy_intp dims[1] = {n_events};
PyX = PyArray_SimpleNewFromData(1, dims, NPY_UINT16, X);
PyArray_ENABLEFLAGS((PyArrayObject*)PyX, NPY_ARRAY_OWNDATA);
PyY = PyArray_SimpleNewFromData(1, dims, NPY_UINT16, Y);
PyArray_ENABLEFLAGS((PyArrayObject*)PyY, NPY_ARRAY_OWNDATA);
PyObject *xy = Py_BuildValue("NN", PyX, PyY);
return xy;
}
...
//More Python C-extension boilerplate (methods, etc..)
...
void __load(char *filename) {
// open file, extract frame header and compute new_size
X = realloc(X, new_size * sizeof(*X));
Y = realloc(Y, new_size * sizeof(*Y));
X[i] = ...
Y[i] = ...
return;
}
test.py
import my_c_ext as ce
binary_files = ['file1.bin',...,'fileN.bin']
for f in binary_files:
    x,y = ce.c_load(f)
    del x,y
Here I am deleting the returned objects in hope of lowering memory usage.
After reading several posts (e.g. this, this and this), I am still stuck.
I tried adding/removing the PyArray_ENABLEFLAGS call that sets the NPY_ARRAY_OWNDATA flag without noticing any difference. It is not yet clear to me whether NPY_ARRAY_OWNDATA implies a free(X) in C. If I explicitly free the arrays in C, I run into a segfault when trying to load the second file in the for loop in test.py.
Any idea of what am I doing wrong?
This looks like a memory management disaster. NPY_ARRAY_OWNDATA should cause it to call free on the data (or at least PyArray_free which isn't necessarily the same thing...).
However, once this is done you still have the global variables X and Y pointing to a now-invalid area of memory. You then call realloc on those invalid pointers. At this point you're well into undefined behaviour, and so anything could happen.
If it's a global variable then the memory needs to be managed globally, not by Numpy. If the memory is managed by the Numpy array then you need to ensure that you store no other way to access it except through that Numpy array. Anything else is going to cause you problems.
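One way out, as a sketch of what I mean (the helper __load_into is hypothetical, standing in for your __load but allocating fresh buffers on every call and returning their length), is to drop the globals entirely and let the arrays own their data:

/* in my_c_ext.c: no more static X, Y; everything lives inside c_load */
static PyObject* c_load(PyObject* self, PyObject* args)
{
    char *filename;
    if (!PyArg_ParseTuple(args, "s", &filename))
        return NULL;

    /* hypothetical loader: allocates x and y (ideally with PyDataMem_NEW so
       NumPy's deallocator matches the allocator) and returns the event count */
    unsigned short *x = NULL, *y = NULL;
    npy_intp n_events = __load_into(filename, &x, &y);

    npy_intp dims[1] = {n_events};
    PyObject *PyX = PyArray_SimpleNewFromData(1, dims, NPY_UINT16, x);
    PyObject *PyY = PyArray_SimpleNewFromData(1, dims, NPY_UINT16, y);
    PyArray_ENABLEFLAGS((PyArrayObject*)PyX, NPY_ARRAY_OWNDATA);
    PyArray_ENABLEFLAGS((PyArrayObject*)PyY, NPY_ARRAY_OWNDATA);

    /* x and y are now owned by the arrays: keep no other copy of the pointers,
       and never realloc()/free() them on the C side */
    return Py_BuildValue("NN", PyX, PyY);
}

Each call then builds its arrays over fresh buffers, and each buffer is freed exactly once, when the last Python reference to its array disappears.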
The following minimal example of calling a python function from C++ has a memory leak on my system:
script.py:
import tensorflow
def foo(param):
    return "something"
main.cpp:
#include "python3.5/Python.h"
#include <iostream>
#include <string>
int main()
{
Py_Initialize();
PyRun_SimpleString("import sys");
PyRun_SimpleString("if not hasattr(sys,'argv'): sys.argv = ['']");
PyRun_SimpleString("sys.path.append('./')");
PyObject* moduleName = PyUnicode_FromString("script");
PyObject* pModule = PyImport_Import(moduleName);
PyObject* fooFunc = PyObject_GetAttrString(pModule, "foo");
PyObject* param = PyUnicode_FromString("dummy");
PyObject* args = PyTuple_Pack(1, param);
PyObject* result = PyObject_CallObject(fooFunc, args);
Py_CLEAR(result);
Py_CLEAR(args);
Py_CLEAR(param);
Py_CLEAR(fooFunc);
Py_CLEAR(pModule);
Py_CLEAR(moduleName);
Py_Finalize();
}
compiled with
g++ -std=c++11 main.cpp $(python3-config --cflags) $(python3-config --ldflags) -o main
and run with valgrind
valgrind --leak-check=yes ./main
produces the following summary
LEAK SUMMARY:
==24155== definitely lost: 161,840 bytes in 103 blocks
==24155== indirectly lost: 33 bytes in 2 blocks
==24155== possibly lost: 184,791 bytes in 132 blocks
==24155== still reachable: 14,067,324 bytes in 130,118 blocks
==24155== of which reachable via heuristic:
==24155== stdstring : 2,273,096 bytes in 43,865 blocks
==24155== suppressed: 0 bytes in 0 blocks
I'm using Linux Mint 18.2 Sonya, g++ 5.4.0, Python 3.5.2 and TensorFlow 1.4.1.
Removing import tensorflow makes the leak disappear. Is this a bug in TensorFlow or did I do something wrong? (I expect the latter to be true.)
Additionally, when I create a Keras layer in Python
#script.py
from keras.layers import Input
def foo(param):
    a = Input(shape=(32,))
    return "str"
and run the call to Python from C++ repeatedly
//main.cpp
#include "python3.5/Python.h"
#include <iostream>
#include <string>
int main()
{
Py_Initialize();
PyRun_SimpleString("import sys");
PyRun_SimpleString("if not hasattr(sys,'argv'): sys.argv = ['']");
PyRun_SimpleString("sys.path.append('./')");
PyObject* moduleName = PyUnicode_FromString("script");
PyObject* pModule = PyImport_Import(moduleName);
for (int i = 0; i < 10000000; ++i)
{
std::cout << i << std::endl;
PyObject* fooFunc = PyObject_GetAttrString(pModule, "foo");
PyObject* param = PyUnicode_FromString("dummy");
PyObject* args = PyTuple_Pack(1, param);
PyObject* result = PyObject_CallObject(fooFunc, args);
Py_CLEAR(result);
Py_CLEAR(args);
Py_CLEAR(param);
Py_CLEAR(fooFunc);
}
Py_CLEAR(pModule);
Py_CLEAR(moduleName);
Py_Finalize();
}
the memory consumption of the application continuously grows ad infinitum during runtime.
So I guess there is something fundamentally wrong with the way I call the python function from C++, but what is it?
There are two different types of "memory leaks" in your question.
Valgrind is telling you about the first type of memory leak. However, it is pretty usual for Python modules to "leak" memory - it is mostly some globals which are allocated/initialized when the module is loaded. And because the module is loaded only once in Python, it's not a big problem.
A well-known example is numpy's PyArray_API: it must be initialized via _import_array, is then never deleted and stays in memory until the Python interpreter is shut down.
So it is a "memory leak" by design; you can argue whether that is a good design or not, but at the end of the day there is nothing you can do about it.
I don't have enough insight into the tensorflow-module to pin-point the places where such memory leaks happen, but I'm pretty sure that it's nothing you should worry about.
The second "memory leak" is more subtle.
You can get a lead when you compare the valgrind output for 10^4 and 10^5 iterations of the loop - there will be almost no difference! There is, however, a difference in the peak memory consumption.
Unlike C++, Python has a garbage collector - so you cannot know exactly when an object is destroyed. CPython uses reference counting, so when a reference count reaches 0, the object is destroyed. However, when there is a cycle of references (e.g. object A holds a reference to object B and object B holds a reference to object A) it is not so simple: the garbage collector needs to iterate through all objects to find such no-longer-used cycles.
One could think that keras.layers.Input has such a cycle somewhere (and this is true), but this is not the reason for this "memory leak", which can also be observed in pure Python.
We can use the objgraph package to inspect the references; let's run the following Python script:
#pure.py
from keras.layers import Input
import gc
import sys
import objgraph
def foo(param):
    a = Input(shape=(1280,))
    return "str"
### MAIN :
print("Counts at the beginning:")
objgraph.show_most_common_types()
objgraph.show_growth(limit=7)
for i in range(int(sys.argv[1])):
    foo(" ")
gc.collect()# just to be sure
print("\n\n\n Counts at the end")
objgraph.show_most_common_types()
objgraph.show_growth(limit=7)
import random
objgraph.show_chain(
    objgraph.find_backref_chain(
        random.choice(objgraph.by_type('Tensor')), #take some random tensor
        objgraph.is_proper_module),
    filename='chain.png')
and run it:
>>> python pure.py 1000
We can see the following: at the end there are exactly 1000 Tensors, which means none of our created objects got disposed of!
If we take a look at the chain which keeps a tensor object alive (it was created with objgraph.show_chain), we see that there is a tensorflow Graph object where all tensors are registered, and they stay there until the session is closed.
So far the theory. However, neither:
#close session and free resources:
import keras
keras.backend.get_session().close()#free all resources
print("\n\n\n Counts after session.close():")
objgraph.show_most_common_types()
nor the solution proposed here:
with tf.Graph().as_default(), tf.Session() as sess:
    for step in range(int(sys.argv[1])):
        foo(" ")
works for the current tensorflow version, which is probably a bug.
In a nutshell: you are doing nothing wrong in your C++ code; there are no memory leaks you are responsible for. In fact, you would see exactly the same memory consumption if you called the function foo from a pure Python script over and over again.
All created Tensors are registered in a Graph object and aren't automatically released; you must release them by closing the backend session - which, however, doesn't work due to a bug in the current tensorflow version 1.4.0.
I am trying to log all the output of a program written in Python and C. However, printing from Python causes IOError: [Errno 9] Bad file descriptor.
Please, does anyone know what the problem is and how to fix it?
PS: It's on Windows XP, Python 2.6 and MinGW GCC
#include <windows.h>
#include <fcntl.h>
#include "Python.h"
int main()
{
int fds[2];
_pipe(fds, 1024, O_BINARY);
_dup2(fds[1], 1);
setvbuf(stdout, NULL, _IONBF, 0);
/* alternative version: */
// HANDLE hReadPipe, hWritePipe;
// int fd;
// DWORD nr;
// CreatePipe(&hReadPipe, &hWritePipe, NULL, 0);
// fd = _open_osfhandle((intptr_t)hWritePipe, _O_BINARY);
// _dup2(fd, 1);
// setvbuf(stdout, NULL, _IONBF, 0);
write(1, "write\n", 6);
printf("printf\n");
Py_Initialize();
PyRun_SimpleString("print 'print'"); // this breaks
Py_Finalize();
char buffer[1024];
fprintf(stderr, "buffer size: %d\n", read(fds[0], buffer, 1024)); // should always be more than 0
/* alternative version: */
// CloseHandle(hWritePipe);
// char buffer[1024];
// ReadFile(hReadPipe, buffer, 1024, &nr, NULL);
// fprintf(stderr, "buffer size: %d\n", nr); // should always be more than 0
}
I think it could be to do with different C runtimes. You can't pass file descriptors between different C runtimes - Python is built with MSVC (you will need to check which version) - so you could try to make MinGW build against the same C runtime. I think there are options to do this in MinGW, like -lmsvcrt80 (or whichever version is appropriate), but for licensing reasons they can't distribute the libraries, so you will have to find them on your system. Sorry I don't have any more details on that for now, but hopefully it's a start for some googling.
A simpler way would be to just do it all in Python: make a class which exposes a write and perhaps a flush method and assign it to sys.stdout. E.g. for a file you can just pass an open file object - it's probably straightforward to do a similar thing for your pipe. Then just import it and sys and set sys.stdout in a PyRun_SimpleString.
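A rough sketch of that idea (the _Logger class and log.txt are made up, and the class is defined inline rather than imported from a module; a pipe-backed writer would follow the same pattern), done entirely through PyRun_SimpleString so no file descriptor ever crosses a runtime boundary:

#include "Python.h"

int main()
{
    Py_Initialize();
    /* install a Python-level writer as sys.stdout (Python 2 syntax to match the question) */
    PyRun_SimpleString(
        "import sys\n"
        "class _Logger(object):\n"
        "    def __init__(self, path):\n"
        "        self._f = open(path, 'a')\n"
        "    def write(self, text):\n"
        "        self._f.write(text)\n"
        "    def flush(self):\n"
        "        self._f.flush()\n"
        "sys.stdout = _Logger('log.txt')\n");
    PyRun_SimpleString("print 'print'"); /* now lands in log.txt instead of fd 1 */
    Py_Finalize();
    return 0;
}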