I hit a Python segmentation fault while developing a Python C extension module.
After debugging, it turned out that one of the pools currently in use had its freeblock set to 0xffffffff.
Core Dump gdb frames:
(gdb) frame 0
#0 PyObject_Malloc (nbytes=53) at ../Objects/obmalloc.c:837
837 in ../Objects/obmalloc.c
(gdb) p bp
$6 = (block *) 0xffffffffffffffff <error: Cannot access memory at address 0xffffffffffffffff>
Relevant code:
void *PyObject_Malloc(size_t nbytes) {
    ...
    /*
     * This implicitly redirects malloc(0).
     */
    if ((nbytes - 1) < SMALL_REQUEST_THRESHOLD) {
        LOCK();
        /*
         * Most frequent paths first
         */
        size = (uint)(nbytes - 1) >> ALIGNMENT_SHIFT;
        pool = usedpools[size + size];
        if (pool != pool->nextpool) {
            /*
             * There is a used pool for this size class.
             * Pick up the head block of its free list.
             */
            ++pool->ref.count;
            bp = pool->freeblock;
            assert(bp != NULL);
            if ((pool->freeblock = *(block **)bp) != NULL) {
                UNLOCK();
                return (void *)bp;
            }
            ...
}
As shown above, the allocator reads the value that the pool's freeblock (bp) points to, but bp is 0xffffffff, which contradicts how Python's memory management is supposed to work.
So the question is: when, and why, would the freeblock pointer be assigned 0xffffffff?
It turned out that I had not been handling the GIL properly, which triggered a multithreading error somewhere and left freeblock with a bad value.
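To illustrate the GIL requirement in ctypes terms, here is a minimal sketch (CPython only). It relies on the documented difference between the two loader classes: functions called through a PyDLL handle such as ctypes.pythonapi run with the GIL held, while a CDLL handle releases the GIL around each call. The helper name new_list_with_gil_held is made up for this example.

```python
import ctypes

# ctypes.pythonapi is a PyDLL handle to the running interpreter, so calls made
# through it keep the GIL held. A CDLL handle would release the GIL around
# each call, which is unsafe for any function that touches Python objects
# (and therefore the obmalloc pools).
PyList_New = ctypes.pythonapi.PyList_New
PyList_New.argtypes = [ctypes.c_ssize_t]
PyList_New.restype = ctypes.py_object  # ctypes treats the result as a new reference


def new_list_with_gil_held():
    """Allocate an empty list through the C API while the GIL is held."""
    return PyList_New(0)
```

In a C extension with its own threads, the equivalent rule is to acquire the GIL (for example with PyGILState_Ensure) before any call into the Python C API and to release it afterwards.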
I wrote Python code that calls into C code through ctypes.
The C code is called many times in a for loop.
The C code is as follows:
test.h
#include<Python.h>
PyObject *getFeature(wchar_t *text);
// where the unigram is a Set Object with type 'PySetObject'
test.c
#include<test.h>
PyObject *getFeature(wchar_t *text)
{
int ret = -1;
PyObject *featureList = PyList_New(0);
PyObject *curString = PyUnicode_FromWideChar(text, 2);
ret = PyList_Append(featureList, curString);
Py_DECREF(curString);
return featureList;
}
I then compiled it into a shared library called libtest.so, which I load into the Python code with ctypes like this:
test.py
import ctypes
import os
dir_path = 'path/to/the/libtest.so'
feature_extractor = ctypes.PyDLL(
    os.path.join(dir_path, 'libtest.so'))
get_feature_c = feature_extractor.getFeature
get_feature_c.argtypes = [
    ctypes.c_wchar_p, ctypes.py_object]
get_feature_c.restype = ctypes.py_object
def get_feature(text):
    return [text[:2]]
times = 100000
for i in range(times):
    res = get_feature_c('ncd')  # the memory size will become larger and larger.
for i in range(times):
    res = get_feature('ncd')  # the memory will remain in a fixed size.
I monitor the program's memory usage with the top command and find that it grows in step with the number of loop iterations.
But when I call the pure Python function instead, the memory stays at a steady size.
I assume that the memory is not released correctly after each call of the C function. So how do I release and control the memory after each call?
BTW: I have simplified the question here; the full C function is in the linked C code, and there is no memory leak in the C code.
The code in your example doesn't leak:
#include<test.h>
PyObject *getFeature(wchar_t *text)
{
int ret = -1;
PyObject *featureList = PyList_New(0);
// Create new reference to "curString" (allocates memory)
PyObject *curString = PyUnicode_FromWideChar(text, 2);
// Add "curString" to "featureList", incrementing reference count
ret = PyList_Append(featureList, curString);
// "curString" no longer used, reduce reference count.
Py_DECREF(curString);
// Correctly returns a single reference to the list,
// which contains a single reference to a string
return featureList;
}
When res is reassigned to the return value of get_feature_c, the previous value of res (a list) has its reference count decremented. If that count reaches zero (it does), the reference count of each item in the list is decremented as well, each item whose count reaches zero is freed, and then the list object itself is freed.
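The same bookkeeping can be observed from pure Python with sys.getrefcount. This is only a sketch, and the function name refcount_deltas is invented for the example; it reports changes relative to a baseline because the absolute numbers vary between CPython versions.

```python
import sys


def refcount_deltas():
    """Return (references added by list.append, references left after the list dies)."""
    item = object()
    base = sys.getrefcount(item)
    holder = []
    holder.append(item)                       # the list now holds one extra reference
    held = sys.getrefcount(item) - base
    del holder                                # freeing the list drops the reference it held
    released = sys.getrefcount(item) - base
    return held, released
```

On CPython this returns (1, 0): appending adds exactly one reference, and destroying the list gives it back, which is what the C function relies on after its Py_DECREF.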
But in your referenced C code, there are many leaks due to not calling Py_DECREF. When you leak a reference, the object's reference count never reaches zero, so the object is never freed, which creates a memory leak:
// Create a new object with "PyUnicode_FromWideChar",
// Add another reference via "featureList",
// so leaked reference to the object.
ret = PyList_Append(featureList, PyUnicode_FromWideChar(charCurrentFeature, 2));
Also here:
PyObject *bigrams1 = PySet_New(0);
// each "PyUnicode_FromWideChar" leaks a reference.
ret = PySet_Add(unigrams1, PyUnicode_FromWideChar(L"据", 1));
ret = PySet_Add(unigrams1, PyUnicode_FromWideChar(L"nc", 2));
ret = PySet_Add(unigrams1, PyUnicode_FromWideChar(L"ckd", 3));
ret = PySet_Add(unigrams1, PyUnicode_FromWideChar(L"nc.3e", 5));
You can test if your code leaks references with a debug build of your test DLL and a debug build of Python. I'll demonstrate with a Windows build:
test.c - debug build compiled with Microsoft Visual Studio
cl /LD /MDd /W3 /Ic:\python310\include test.c -link /libpath:c:\python310\libs
#ifdef _WIN32
# define API __declspec(dllexport)
#else
# define API
#endif
#include <Python.h>
API PyObject *getFeature(wchar_t *text)
{
int ret = -1;
PyObject *featureList = PyList_New(0);
PyObject *curString = PyUnicode_FromWideChar(text, 2); // allocates curString (1st reference)
ret = PyList_Append(featureList, curString); // Creates 2nd reference to curString in featureList
Py_DECREF(curString); // curString no longer used
return featureList;
}
test.py
import ctypes as ct
import sys
feature_extractor = ct.PyDLL('./test')
get_feature_c = feature_extractor.getFeature
get_feature_c.argtypes = ct.c_wchar_p,  # OP example code had error here
get_feature_c.restype = ct.py_object
def get_feature(text):
    return [text[:2]]
times = 10
for i in range(times):
    print(sys.gettotalrefcount())  # Only available in debug build of Python
    res = get_feature_c('ncd')
Output when run with debug build of Python to enable sys.gettotalrefcount(), and note that total reference count doesn't grow over loops:
C:\>python_d test.py
70904
70910
70910
70910
70910
70910
70910
70910
70910
70910
Now with Py_DECREF commented out a reference is leaked every loop:
70904
70911
70912
70913
70914
70915
70916
70917
70918
70919
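If a debug build of Python is not at hand, a coarser check is possible in a release build by watching the reference count of one probe object across a call. The sketch below is an assumption-laden illustration rather than a replacement for the debug build: leaky and balanced are hypothetical stand-ins for C functions, simulated here with the real C-API functions Py_IncRef and Py_DecRef reached through ctypes.pythonapi.

```python
import ctypes
import sys

Py_IncRef = ctypes.pythonapi.Py_IncRef
Py_IncRef.argtypes = [ctypes.py_object]
Py_IncRef.restype = None
Py_DecRef = ctypes.pythonapi.Py_DecRef
Py_DecRef.argtypes = [ctypes.py_object]
Py_DecRef.restype = None


def leaked_references(func, obj):
    """Return how many references to obj remain outstanding after func(obj)."""
    before = sys.getrefcount(obj)
    func(obj)
    return sys.getrefcount(obj) - before


def leaky(obj):
    # Hypothetical stand-in for C code that takes a reference and forgets Py_DECREF.
    Py_IncRef(obj)


def balanced(obj):
    # Hypothetical stand-in for C code that releases every reference it takes.
    Py_IncRef(obj)
    Py_DecRef(obj)
```

Calling leaked_references(leaky, probe) reports one outstanding reference, while the balanced variant reports none. This only tracks the single object passed in, so it will not notice objects created and leaked inside the C function, which is what sys.gettotalrefcount() catches.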
Recently I read an article about CPython memory model: https://rushter.com/blog/python-memory-managment/. The article shows the following structure that CPython uses to manage single pool:
struct pool_header {
union { block *_padding;
uint count; } ref; /* number of allocated blocks */
block *freeblock; /* pool's free list head */
struct pool_header *nextpool; /* next pool of this size class */
struct pool_header *prevpool; /* previous pool "" */
uint arenaindex; /* index into arenas of base adr */
uint szidx; /* block size class index */
uint nextoffset; /* bytes to virgin block */
uint maxnextoffset; /* largest valid nextoffset */
};
What I don't get is how CPython finds this header so that it can update it when a block becomes free. Am I right that it relies on a low-level trick: if you allocate a 4 KB page, its pointer is aligned, so you can find the start of the page by zeroing out several low bits (possibly 12, because 2^12 = 4096) of the block's address?
It just rounds the block address down to the nearest pool-aligned value:
/* Round pointer P down to the closest pool-aligned address <= P, as a poolp */
#define POOL_ADDR(P) ((poolp)_Py_ALIGN_DOWN((P), POOL_SIZE))
While the rounding is essentially as you hypothesized, it is valid not because of the low-level trick you hypothesized, but simply because CPython manually ensures pools have the necessary alignment. When CPython allocates the big chunk of memory used for an arena's pools, it sets the arena's pool_address to the first pool-aligned address in that big chunk of memory:
/* pool_address <- first pool-aligned address in the arena
nfreepools <- number of whole pools that fit after alignment */
arenaobj->pool_address = (block*)arenaobj->address;
arenaobj->nfreepools = MAX_POOLS_IN_ARENA;
excess = (uint)(arenaobj->address & POOL_SIZE_MASK);
if (excess != 0) {
--arenaobj->nfreepools;
arenaobj->pool_address += POOL_SIZE - excess;
}
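The arithmetic in both snippets can be sketched with plain integers. The POOL_SIZE of 4096 below is an assumption taken from the question; the real value depends on the CPython version and build, and the function names are invented for this example.

```python
POOL_SIZE = 4096                 # assumed pool size (4 KiB, as in the question)
POOL_SIZE_MASK = POOL_SIZE - 1   # low bits that give the offset inside a pool


def pool_addr(block_address):
    """Round a block address down to the start of its pool (what POOL_ADDR does)."""
    return block_address & ~POOL_SIZE_MASK


def first_pool_address(arena_address, max_pools):
    """Mirror the arena setup: skip ahead to the first pool-aligned address."""
    excess = arena_address & POOL_SIZE_MASK
    if excess == 0:
        return arena_address, max_pools
    # An unaligned arena loses one pool to the alignment gap.
    return arena_address + POOL_SIZE - excess, max_pools - 1
```

For example, pool_addr(0x7f0000001234) gives 0x7f0000001000, and an arena starting at 0x10010 with room for 64 pools ends up with its first pool at 0x11000 and 63 usable pools.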
I am using ctypes to try and speed up my code.
My problem is similar to the one in this tutorial : https://cvstuff.wordpress.com/2014/11/27/wraping-c-code-with-python-ctypes-memory-and-pointers/
As pointed out in the tutorial I should free the memory after using the C function. Here is my C code
//C functions
double* getStuff(double *R_list, int items){
    double results[items];
    double* results_p;
    for(int i = 0; i < items; i++){
        res = calculation; // do some calculation
        results[i] = res;
    }
    results_p = results;
    printf("C allocated address %p \n", results_p);
    return results_p;
}
void free_mem(double *a){
    printf("freeing address: %p\n", a);
    free(a);
}
Which I compile with gcc -shared -Wl,-lgsl,-soname, simps -o libsimps.so -fPIC simps.c
And python:
# Python
from ctypes import *
import numpy as np
mydll = CDLL("libsimps.so")
mydll.getStuff.restype = POINTER(c_double)
mydll.getStuff.argtypes = [POINTER(c_double),c_int]
mydll.free_mem.restype = None
mydll.free_mem.argtypes = [POINTER(c_double)]
R = np.logspace(np.log10(0.011),1, 100, dtype = float) #input
tracers = c_int(len(R))
R_c = R.ctypes.data_as(POINTER(c_double))
for_list = mydll.getStuff(R_c,tracers)
print 'Python allocated', hex(for_list)
for_list_py = np.array(np.fromiter(for_list, dtype=np.float64, count=len(R)))
mydll.free_mem(for_list)
Up to the last line the code does what I want it to and the for_list_py values are correct. However, when I try to free the memory, I get a Segmentation fault and on closer inspection the address associated with for_list --> hex(for_list) is different to the one allocated to results_p within C part of the code.
As pointed out in this question, Python ctypes: how to free memory? Getting invalid pointer error , for_list will return the same address if mydll.getStuff.restype is set to c_void_p. But then I struggle to put the actual values I want into for_list_py. This is what I've tried:
cast(for_list, POINTER(c_double) )
for_list_py = np.array(np.fromiter(for_list, dtype=np.float64, count=len(R)))
mydll.free_mem(for_list)
where the cast operation seems to change for_list into an integer. I'm fairly new to C and very confused. Do I need to free that chunk of memory? If so, how do I do that whilst also keeping the output in a numpy array? Thanks!
Edit: It appears that the address allocated in C and the one I'm trying to free are the same, though I still receive a Segmentation fault.
C allocated address 0x7ffe559a3960
freeing address: 0x7ffe559a3960
Segmentation fault
If I do print for_list I get <__main__.LP_c_double object at 0x7fe2fc93ab00>
Conclusion
Just to let everyone know, I've struggled with ctypes for a bit.
I've ended up opting for SWIG instead of ctypes. I've found that the code runs faster on the whole (compared to the version presented here). I found this documentation on dealing with memory deallocation in SWIG very useful: https://scipy-cookbook.readthedocs.io/items/SWIG_Memory_Deallocation.html. SWIG also gives you a very easy way of dealing with NumPy n-dimensional arrays.
results is a local array in getStuff's stack frame, so once getStuff returns, that memory is no longer valid. It also never came from malloc, so passing its address to free() crashes the program.
Try this instead:
double* getStuff(double *R_list, int items)
{
    // Heap memory outlives the call; the caller must release it with free_mem()
    double* results_p = malloc(sizeof(*results_p) * (items + 1));
    if (results_p == NULL)
    {
        // handle error
        return NULL;
    }
    for(int i = 0; i < items; i++)
    {
        double res = calculation; // do some calculation
        results_p[i] = res;
    }
    printf("C allocated address %p \n", (void *)results_p);
    return results_p;
}
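On the Python side, the usual pattern is to copy the values out of the C buffer first and free it afterwards. The sketch below assumes a POSIX system and uses libc's malloc and free directly as a stand-in for getStuff and free_mem, since libsimps.so is not available here; copy_doubles_and_free is a made-up name. Note that ctypes.cast returns a new pointer object rather than modifying its argument, so its result has to be assigned.

```python
import ctypes
import ctypes.util

# Assumption: POSIX. Fall back to the symbols already loaded in the process
# if find_library cannot locate libc.
libc = ctypes.CDLL(ctypes.util.find_library("c") or None)
libc.malloc.argtypes = [ctypes.c_size_t]
libc.malloc.restype = ctypes.c_void_p   # keep the full 64-bit address
libc.free.argtypes = [ctypes.c_void_p]
libc.free.restype = None


def copy_doubles_and_free(count):
    """Fill a malloc'ed double buffer, copy it into a Python list, then free it."""
    address = libc.malloc(count * ctypes.sizeof(ctypes.c_double))
    if not address:
        raise MemoryError("malloc failed")
    try:
        doubles = ctypes.cast(address, ctypes.POINTER(ctypes.c_double))
        for i in range(count):
            doubles[i] = i * 0.5        # stands in for the C-side calculation
        values = doubles[:count]        # slicing copies the data into a list
    finally:
        libc.free(address)              # safe: the data has already been copied
    return values
```

With NumPy, the copy step could instead be np.fromiter(doubles, dtype=np.float64, count=count), as in the question, as long as it happens before the free.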
I'm a Python veteran, but haven't dabbled much in C. After half a day of not finding anything on the internet that works for me, I thought I would ask here and get the help I need.
What I want to do is write a simple C function that accepts a string and returns a different string. I plan to bind this function in several languages (Java, Obj-C, Python, etc.) so I think it has to be pure C?
Here's what I have so far. Notice I get a segfault when trying to retrieve the value in Python.
hello.c
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
const char* hello(char* name) {
static char greeting[100] = "Hello, ";
strcat(greeting, name);
strcat(greeting, "!\n");
printf("%s\n", greeting);
return greeting;
}
main.py
import ctypes
hello = ctypes.cdll.LoadLibrary('./hello.so')
name = "Frank"
c_name = ctypes.c_char_p(name)
foo = hello.hello(c_name)
print c_name.value # this comes back fine
print ctypes.c_char_p(foo).value # segfault
I've read that the segfault is caused by C releasing the memory that was initially allocated for the returned string. Maybe I'm just barking up the wrong tree?
What's the proper way to accomplish what I want?
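A plain local greeting array is allocated on the stack, and its stack frame is destroyed when the function returns. (Declaring it static, as in the question, keeps it alive after the return, but every call then appends to the same 100-byte buffer.) You could allocate the memory dynamically: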
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
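const char* hello(char* name) {
    char* greeting = malloc(100);
    if (greeting == NULL) return NULL;
    // snprintf takes the destination buffer and its size before the format string
    snprintf(greeting, 100, "Hello, %s!\n", name);
    printf("%s\n", greeting);
    return greeting;
}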
But that's only part of the battle because now you have a memory leak. You could plug that with another ctypes call to free().
...or a much better approach is to read up on the official C binding to python (python 2.x at http://docs.python.org/2/c-api/ and python 3.x at http://docs.python.org/3/c-api/). Have your C function create a python string object and hand that back. It will be garbage collected by python automatically. Since you are writing the C side, you don't have to play the ctypes game.
...edit..
I didn't compile and test, but I think this .py would work:
import ctypes
import ctypes.util
# define the interface
lib = ctypes.cdll.LoadLibrary('./hello.so')
# find lib on linux or windows
libc = ctypes.CDLL(ctypes.util.find_library('c'))
# declare the functions we use
lib.hello.argtypes = (ctypes.c_char_p,)
# c_void_p keeps the raw address; c_char_p would be converted to bytes and the pointer lost
lib.hello.restype = ctypes.c_void_p
libc.free.argtypes = (ctypes.c_void_p,)
libc.free.restype = None
# wrap hello to make sure the free is done
def hello(name):
    _result = lib.hello(name)
    result = ctypes.string_at(_result)  # copy the C string into a Python bytes object
    libc.free(_result)
    return result
# do the deed
print(hello(b"Frank"))
In hello.c you return a local array. You have to return a pointer to an array, which has to be dynamically allocated using malloc.
char* hello(char* name)
{
char hello[] = "Hello ";
char excla[] = "!\n";
char *greeting = malloc ( sizeof(char) * ( strlen(name) + strlen(hello) + strlen(excla) + 1 ) );
if( greeting == NULL) exit(1);
strcpy( greeting , hello);
strcat(greeting, name);
strcat(greeting, excla);
return greeting;
}
I ran into this same problem today and found that you must override the default return type (int) by setting restype on the function. See "Return types" in the ctypes documentation.
import ctypes
hello = ctypes.cdll.LoadLibrary('./hello.so')
name = "Frank"
c_name = ctypes.c_char_p(name)
hello.hello.restype = ctypes.c_char_p # override the default return type (int)
foo = hello.hello(c_name)
print c_name.value
print ctypes.c_char_p(foo).value
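The choice of restype matters for two reasons: the default c_int can truncate a 64-bit pointer, and c_char_p makes ctypes copy the string into bytes and discard the address, so the buffer can no longer be freed. A sketch of the c_void_p alternative follows, assuming a POSIX libc; strdup stands in for a C function that returns a malloc'ed string, and dup_and_free is a name invented here.

```python
import ctypes
import ctypes.util

# Assumption: POSIX libc providing strdup and free.
libc = ctypes.CDLL(ctypes.util.find_library("c") or None)
libc.strdup.argtypes = [ctypes.c_char_p]
libc.strdup.restype = ctypes.c_void_p   # raw address, not truncated to int
libc.free.argtypes = [ctypes.c_void_p]
libc.free.restype = None


def dup_and_free(text):
    """Get a malloc'ed C string, copy it into bytes, then release the C memory."""
    address = libc.strdup(text)
    if not address:
        raise MemoryError("strdup failed")
    try:
        return ctypes.string_at(address)  # copies up to the terminating NUL
    finally:
        libc.free(address)
```

For a function that returns a pointer to static storage, as hello() in the question does, restype = ctypes.c_char_p is enough, because nothing needs to be freed.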
I also ran into the same problem but used a different approach. I was supposed to find the string in a list of strings that matches a certain value.
Basically, I initialized a char array with the size of the longest string in my list, then passed it as an argument to my function to hold the result.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
void find_gline(char **ganal_lines, /*line array*/
size_t size, /*array size*/
char *idnb, /* id number for check */
char *resline) {
/*Iterates over lines and finds the one that contains idnb
then affects the result to the resline*/
for (size_t i = 0; i < size; i++) {
char *line = ganal_lines[i];
if (strstr(line, idnb) != NULL) {
size_t llen = strlen(line);
for (size_t k = 0; k < llen; k++) {
resline[k] = line[k];
}
return;
}
}
return;
}
This function was wrapped by the corresponding python function:
def find_gline_wrap(lines: list, arg: str, cdll):
    """Return the first line in lines (a list of bytes) that contains arg."""
    # set arg types
    mlen = max(len(line) for line in lines)  # length of the longest string in the list
    linelen = len(lines)
    line_array = ctypes.c_char_p * linelen
    cdll.find_gline.argtypes = [
        line_array,
        ctypes.c_size_t,
        ctypes.c_char_p,
        ctypes.c_char_p,
    ]
    cdll.find_gline.restype = None
    #
    argbyte = bytes(arg, "utf-8")
    ganal_lines = line_array(*lines)
    size = ctypes.c_size_t(linelen)
    idnb = ctypes.c_char_p(argbyte)
    # writable, zero-filled buffer with room for the longest line plus the terminating NUL
    resline = ctypes.create_string_buffer(mlen + 1)
    cdll.find_gline(ganal_lines, size, idnb, resline)
    # .value stops at the first NUL byte, so there is nothing to strip
    result = resline.value.decode("utf-8")
    return result
Here's what happens when a function returns a pointer to one of its local arrays, and why it breaks. When hello() is called, the C stack pointer is moved, making room for any memory the function needs. Along with some function-call overhead, all of the function's non-static locals are managed there. So a local char greeting[100] means that 100 bytes of the newly claimed stack are for that string. You then use some functions that manipulate that memory. At the end, you return a pointer to the greeting memory. And then you return from the call, at which point the stack pointer is moved back to its original, pre-call position. So those 100 bytes that were on the stack for the duration of your call are essentially up for grabs again as the stack is further manipulated. At that point, who knows what happens to them; they are likely overwritten by later calls. And when you try to access that memory as if it were still valid, you get garbage or a segfault. (An array declared static, as in the posted code, lives in static storage rather than on the stack, so it survives the return.)
To get around this, you need to manage that memory differently. You can have your function allocate the memory on the heap, but you'll need to make sure it gets free()'d later by your binding. Or you can write your function so that the binding language passes it a chunk of memory to be used.
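A sketch of the second option, where the caller owns the buffer and C only fills it. It assumes a POSIX libc and uses strncat as a stand-in for a hello() that writes into a caller-supplied buffer; greet and its size parameter are made up for this illustration.

```python
import ctypes
import ctypes.util

# Assumption: POSIX libc providing strncat.
libc = ctypes.CDLL(ctypes.util.find_library("c") or None)
libc.strncat.argtypes = [ctypes.c_char_p, ctypes.c_char_p, ctypes.c_size_t]
libc.strncat.restype = ctypes.c_void_p  # returns dest; the value is not needed


def greet(name, size=100):
    """Build the greeting inside a buffer that Python allocates and owns."""
    buf = ctypes.create_string_buffer(size)   # zero-filled, writable, freed by Python
    for part in (b"Hello, ", name, b"!"):
        room = size - 1 - len(buf.value)       # always leave space for the NUL
        libc.strncat(buf, part, room)
    return buf.value
```

Because Python allocated the buffer, nothing has to be freed on the C side, and the same C function can be bound from other languages that supply their own memory.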
This is single-threaded code.
In particular: ahocorasick Python extension module (easy_install ahocorasick).
I isolated the problem to a trivial example:
import ahocorasick
t = ahocorasick.KeywordTree()
t.add("a")
When I run it in gdb, all is fine, and the same happens when I enter these instructions into the Python CLI. However, when I try to run the script normally, I get a segfault.
To make it even weirder, the line that causes the segfault (identified by core dump analysis) is a plain int increment (see the bottom of the function body).
I'm completely stuck at this point. What can I do?
int
aho_corasick_addstring(aho_corasick_t *in, unsigned char *string, size_t n)
{
aho_corasick_t* g = in;
aho_corasick_state_t *state,*s = NULL;
int j = 0;
state = g->zerostate;
// As long as we have transitions follow them
while( j != n &&
(s = aho_corasick_goto_get(state,*(string+j))) != FAIL )
{
state = s;
++j;
}
if ( j == n ) {
/* dyoo: added so that if a keyword ends up in a prefix
of another, we still mark that as a match.*/
aho_corasick_output(s) = j;
return 0;
}
while( j != n )
{
// Create new state
if ( (s = xalloc(sizeof(aho_corasick_state_t))) == NULL )
return -1;
s->id = g->newstate++;
debug(printf("allocating state %d\n", s->id)); /* debug */
s->depth = state->depth + 1;
/* FIXME: check the error return value of
aho_corasick_goto_initialize. */
aho_corasick_goto_initialize(s);
// Create transition
aho_corasick_goto_set(state,*(string+j), s);
debug(printf("%u -> %c -> %u\n",state->id,*(string+j),s->id));
state = s;
aho_corasick_output(s) = 0;
aho_corasick_fail(s) = NULL;
++j; // <--- HERE!
}
aho_corasick_output(s) = n;
return 0;
}
There are other tools you can use that will find faults that do not necessarily crash the program.
valgrind, electric fence, purify, coverity, and lint-like tools may be able to help you.
You might need to build your own Python in some cases for this to be usable. Also, for memory corruption issues, there is (or was; I haven't built extensions in a while) a way to make Python use direct memory allocation instead of Python's own allocator.
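On Python 3.6 and later, one way to get that direct allocation without rebuilding is the PYTHONMALLOC environment variable; setting it to malloc routes allocations through the C library so that tools like valgrind see each one. The sketch below just launches a child interpreter with that setting; run_with_system_malloc is a name invented for this example, and in practice the command would be wrapped in the memory checker of your choice.

```python
import os
import subprocess
import sys


def run_with_system_malloc(code):
    """Run a Python snippet in a child interpreter that bypasses pymalloc."""
    env = dict(os.environ, PYTHONMALLOC="malloc")
    completed = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        capture_output=True,
        text=True,
        check=True,   # raises CalledProcessError if the child crashes
    )
    return completed.stdout
```

A segfault in the child shows up as a CalledProcessError with a negative return code on POSIX, which makes this usable as a crude regression check for the crashing script.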
Have you tried translating that while loop to a for loop? Maybe there's some subtle misunderstanding with the ++j that will disappear if you use something more intuitive.