Need guidance regarding reference counting - python

I'm chasing a memory leak that seems to come from a long-running process which contains a C extension that I wrote. I've been poring over the code and the Extensions docs and I'm sure it's correct but I'd like to make sure regarding the reference handling of PyList and PyDict.
From the docs I gather that PyDict_SetItem() borrows references to both key and value, hence I have to DECREF them after inserting. PyList_SetItem() and PyTuple_SetItem() steal a reference to the inserted item so I don't have to DECREF. Correct?
Creating a dict:
PyObject *dict = PyDict_New();
if (dict) {
    for (i = 0; i < length; ++i) {
        PyObject *key, *value;
        key = parse_string(ctx); /* returns a PyString */
        if (key) {
            value = parse_object(ctx); /* returns some PyObject */
            if (value) {
                PyDict_SetItem(dict, key, value);
                Py_DECREF(value); /* correct? */
            }
            Py_DECREF(key); /* correct? */
        }
        if (!key || !value) {
            Py_DECREF(dict);
            dict = NULL;
            break;
        }
    }
}
return dict;
Creating a list:
PyObject *list = PyList_New(length);
if (list) {
    PyObject *item;
    for (i = 0; i < length; ++i) {
        item = parse_object(ctx); /* returns some PyObject */
        if (item) {
            PyList_SetItem(list, i, item);
            /* No DECREF here */
        } else {
            Py_DECREF(list);
            list = NULL;
            break;
        }
    }
}
return list;
The parse_* functions don't need extra scrutiny: they only create objects on their last line, for example:
return PyLong_FromLong(...);
If they encounter an error, they don't create any object but set an exception earlier in the function body:
return PyErr_Format(...);
EDIT
Here's some output from valgrind --leak-check=full. Clearly it is my code leaking memory, but why? Why is PyDict_New at the top of the (recursive) chain? Does that mean that the dict created here doesn't get DECREF'd when the whole thing is garbage collected?
Just to be clear here: When I build a nested data structure of Python types in C and then DECREF the topmost instance, Python will recursively DECREF all the contents of the structure, won't it?
==4357== at 0x4C29BE3: malloc (vg_replace_malloc.c:299)
==4357== by 0x4F20DBC: PyObject_Malloc (in /usr/lib64/libpython3.6m.so.1.0)
==4357== by 0x4FC0F98: _PyObject_GC_Malloc (in /usr/lib64/libpython3.6m.so.1.0)
==4357== by 0x4FC102C: _PyObject_GC_New (in /usr/lib64/libpython3.6m.so.1.0)
==4357== by 0x4F11EC0: PyDict_New (in /usr/lib64/libpython3.6m.so.1.0)
==4357== by 0xE5821BA: parse_dict (parser.c:350)
==4357== by 0xE581987: parse_object (parser.c:675)
==4357== by 0xE5821F0: parse_dict (parser.c:358)
==4357== by 0xE581987: parse_object (parser.c:675)
==4357== by 0xE5823CE: parse (parser.c:727)

I had forgotten to Py_DECREF(item) after PyList_Append(list, item) in a seemingly unrelated piece of code. PyList_SetItem() steals a reference to the item; PyList_Append() does not.

Related

How can I print the content of PyByteArrayObject*?

I am using PyArg_ParseTuple to parse a bytearray sent from Python with the Y format specifier.
Y (bytearray) [PyByteArrayObject *]
Requires that the Python object is a bytearray object, without attempting any conversion.
Raises TypeError if the object is not a bytearray object.
In C code I am doing:
static PyObject* py_write(PyObject* self, PyObject* args)
{
    PyByteArrayObject* obj;
    PyArg_ParseTuple(args, "Y", &obj);
    .
    .
    .
The Python script is sending the following data:
arr = bytearray()
arr.append(0x2)
arr.append(0x0)
How do I loop over the PyByteArrayObject* in C so that it prints 2 and 0?
Rather than poking at implementation details, you should go through the documented API. In particular, access the data buffer through PyByteArray_AS_STRING or PyByteArray_AsString rather than through direct struct member access:
char *data = PyByteArray_AS_STRING(bytearray);
Py_ssize_t len = PyByteArray_GET_SIZE(bytearray);
for (Py_ssize_t i = 0; i < len; i++) {
    do_whatever_with(data[i]);
}
Note that everything in the public API takes the bytearray as a PyObject *, not a PyByteArrayObject *.
With the help of the comment section, I found the definition of PyByteArrayObject:
/* Object layout */
typedef struct {
    PyObject_VAR_HEAD
    Py_ssize_t ob_alloc;   /* How many bytes allocated in ob_bytes */
    char *ob_bytes;        /* Physical backing buffer */
    char *ob_start;        /* Logical start inside ob_bytes */
    Py_ssize_t ob_exports; /* How many buffer exports */
} PyByteArrayObject;
And the actual code for the loop:
PyByteArrayObject* obj;
if (!PyArg_ParseTuple(args, "Y", &obj))
    return NULL;
Py_ssize_t i = 0;
for (i = 0; i < PyByteArray_GET_SIZE(obj); i++)
    printf("%u\n", (unsigned char)obj->ob_bytes[i]); /* cast so bytes >= 0x80 are not printed as negative values */
And I got the expected output.
Even better, simply use the documented API:
char* s = PyByteArray_AsString((PyObject*)obj); /* the API takes a PyObject* */
Py_ssize_t i = 0;
for (i = 0; i < PyByteArray_GET_SIZE(obj); i++)
    printf("%u\n", (unsigned char)s[i]);
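One detail worth keeping in mind when printing the buffer: it is a plain char array, and char is signed on many platforms, so a byte such as 0xF0 can turn into a negative value before printf sees it. A minimal sketch of the conversion in plain C, independent of the Python API (byte_value and print_bytes are hypothetical helper names used only for illustration):

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: returns the byte at index i as a value in 0..255,
   regardless of whether plain char is signed on this platform. */
unsigned int byte_value(const char *data, size_t i)
{
    return (unsigned int)(unsigned char)data[i];
}

/* Hypothetical helper: prints one byte per line as an unsigned decimal. */
void print_bytes(const char *data, size_t len)
{
    assert(data != NULL);
    for (size_t i = 0; i < len; i++)
        printf("%u\n", byte_value(data, i));
}
```

In an extension function, the pointer and length would come from PyByteArray_AsString and PyByteArray_Size (or the corresponding macros) on the parsed object.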

How to convert a returned Python dictionary to a C++ std::map<string, string>

I'm calling Python from C++, and trying to perform some data conversions.
For example, if I call the following Python function
def getAMap():
data = {}
data["AnItem 1"] = "Item value 1"
data["AnItem 2"] = "Item value 2"
return data
from C++ as:
PyObject *pValue= PyObject_CallObject(pFunc, NULL);
where pFunc is a PyObject* that points to the getAMap python function.
Code for setting up pFunc omitted for clarity.
The returned pointer, pValue, points to a Python dictionary.
The question is how to get the dictionary into a std::map on the C++ side as smoothly as possible.
I'm using C++ Builder bcc32 compiler that can't handle any fancy template code, like boost python, or C++11 syntax.
(Changed question as the python object is a dictionary, not a tuple)
It's pretty ugly, but I came up with this:
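std::map<std::string, std::string> my_map;
// PyObject_CallObject returns a new reference, or NULL on error
PyObject *pDict = PyObject_CallObject(pFunc, NULL);
if (pDict != NULL && PyDict_Check(pDict)) {
    // Both are new Python list objects
    PyObject *pKeys = PyDict_Keys(pDict);
    PyObject *pValues = PyDict_Values(pDict);
    if (pKeys != NULL && pValues != NULL) {
        Py_ssize_t n = PyList_Size(pKeys);
        for (Py_ssize_t i = 0; i < n; ++i) {
            // PyList_GetItem returns a borrowed reference; PyString_AsString returns a char*
            const char *k = PyString_AsString(PyList_GetItem(pKeys, i));
            const char *v = PyString_AsString(PyList_GetItem(pValues, i));
            if (k != NULL && v != NULL)
                my_map.insert(std::pair<std::string, std::string>(k, v));
        }
    }
    // Release the lists and the dictionary when done
    Py_XDECREF(pKeys);
    Py_XDECREF(pValues);
}
Py_XDECREF(pDict);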

Python C Wrapper Memory Leak

I am moderately experienced in python and C but new to writing python modules as wrappers on C functions. For a project I needed one function named "score" to run much faster than I was able to get in python so I coded it in C and literally just want to be able to call it from python. It takes in a python list of integers and I want the C function to get an array of integers, the length of that array, and then return an integer back to python. Here is my current (working) solution.
static PyObject *module_score(PyObject *self, PyObject *args) {
    int i, size, value, *gene;
    PyObject *seq, *data;
    /* Parse the input tuple */
    if (!PyArg_ParseTuple(args, "O", &data))
        return NULL;
    seq = PySequence_Fast(data, "expected a sequence");
    size = PySequence_Size(seq);
    gene = (int*) PyMem_Malloc(size * sizeof(int));
    for (i = 0; i < size; i++)
        gene[i] = PyInt_AsLong(PySequence_Fast_GET_ITEM(seq, i));
    /* Call the external C function*/
    value = score(gene, size);
    PyMem_Free(gene);
    /* Build the output tuple */
    PyObject *ret = Py_BuildValue("i", value);
    return ret;
}
This works but seems to leak memory and at a rate I can't ignore. I made sure that the leak is happening in the shown function by temporarily making the score function just return 0 and still saw the leaking behavior. I had thought that the call to PyMem_Free should take care of the PyMem_Malloc'ed storage but my current guess is that something in this function is getting allocated and retained on each call since the leaking behavior is proportional to the number of calls to this function. Am I not doing the sequence to array conversion correctly or am I possibly returning the ending value inefficiently? Any help is appreciated.
PySequence_Fast returns a new reference, so seq must be released with Py_DECREF when you are done with it. You should check whether seq is NULL, too.
Something like (untested):
static PyObject *module_score(PyObject *self, PyObject *args) {
    int i, size, value, *gene;
    long temp;
    PyObject *seq, *data;
    /* Parse the input tuple */
    if (!PyArg_ParseTuple(args, "O", &data))
        return NULL;
    /* PySequence_Fast returns a new reference that must be released */
    if (!(seq = PySequence_Fast(data, "expected a sequence")))
        return NULL;
    size = PySequence_Size(seq);
    gene = (int*) PyMem_Malloc(size * sizeof(int));
    if (gene == NULL) {
        Py_DECREF(seq);
        return PyErr_NoMemory();
    }
    for (i = 0; i < size; i++) {
        temp = PyInt_AsLong(PySequence_Fast_GET_ITEM(seq, i));
        if (temp == -1 && PyErr_Occurred()) {
            /* Release everything acquired so far before bailing out */
            PyMem_Free(gene);
            Py_DECREF(seq);
            PyErr_SetString(PyExc_ValueError, "an integer value is required");
            return NULL;
        }
        /* Do whatever you need to verify temp will fit in an int */
        gene[i] = (int)temp;
    }
    /* Call the external C function*/
    value = score(gene, size);
    PyMem_Free(gene);
    Py_DECREF(seq);
    /* Build the return value */
    PyObject *ret = Py_BuildValue("i", value);
    return ret;
}

SWIG -- Using typemap inside of extend

I have a c++ class written and I am using SWIG to make a Python version of my class. I would like to overload the constructor so that it can take in Python lists. For example:
>>> import example
>>> a = example.Array([1,2,3,4])
I was attempting to use the typemap feature in swig, but the scope of typemap does not include code in extend
Here is a similar example to what I have...
%typemap(in) double[]
{
    if (!PyList_Check($input))
        return NULL;
    int size = PyList_Size($input);
    int i = 0;
    $1 = (double *) malloc((size+1)*sizeof(double));
    for (i = 0; i < size; i++)
    {
        PyObject *o = PyList_GetItem($input,i);
        if (PyNumber_Check(o))
            $1[i] = PyFloat_AsDouble(o);
        else
        {
            PyErr_SetString(PyExc_TypeError,"list must contain numbers");
            free($1);
            return NULL;
        }
    }
    $1[i] = 0;
}
%include "Array.h"
%extend Array
{
    Array(double lst[])
    {
        Array *a = new Array();
        ...
        /* do stuff with lst[] */
        ...
        return a;
    }
}
I know the typemap is working correctly (I wrote a small test function that just prints out elements in the double[]).
I attempted putting the typemap inside the extend clause, but that did not solve the problem.
Maybe there is another way to use Python Lists inside of the extend, but I could not find any examples.
Thanks for the help in advance.
You're really close: instead of a double lst[], extend with std::list<double>:
%include "std_list.i" // or std_vector.i
%include "Array.h"
%extend Array
{
    Array(const std::list<double>& numbers) {
        Array* arr = new Array;
        // ... put the items of "numbers" into "arr", then
        return arr; // interpreter will take ownership
    }
}
SWIG should automatically convert the Python list to the std::list.

What is the proper usage of PyArg_ParseTuple

I am using PyArg_ParseTuple in what seems to be exactly the documented way, yet the code still fails to work. I am using Python 2.7.
This is my C code for the Python Extension I am writing:
static PyObject* tpp(PyObject* self, PyObject* args)
{
    PyObject* obj;
    PyObject* seq;
    int i, len;
    PyObject* item;
    int arrayValue, temp;
    if (!PyArg_ParseTuple(args, "O", &obj)){
        printf("Item is not a list\n");
        return NULL;
    }
    seq = PySequence_Fast(obj, "expected a sequence");
    len = PySequence_Size(obj);
    arrayValue = -5;
    printf("[\n");
    for (i = 0; i < len; i++) {
        item = PySequence_Fast_GET_ITEM(seq, i);
        // printf("%d : %d, PyArg: ", item, *item);
        // PyArg_ParseTuple(item, "I", &temp);
        PyObject* objectsRepresentation = PyObject_Repr(item);
        const char* s = PyString_AsString(objectsRepresentation);
        printf("%s\n", s);
        PyObject* objType = PyObject_Type(item);
        PyObject* objTypeString = PyObject_Repr(objType);
        const char* sType = PyString_AsString(objTypeString);
        printf("%s\n", sType);
        if (PyArg_ParseTuple(item, "i", &arrayValue) != 0){
            printf("%d\n", arrayValue);
            printf("horray!\n");
        }
    }
    Py_DECREF(seq);
    printf("]\n");
    printf("Item is a list!\n");
    Py_RETURN_NONE;
}
Then I just build the extension and go to the terminal
import et
and then
et.tpp([1,2])
never prints anything from this block:
if (PyArg_ParseTuple(item, "i", &arrayValue) != 0){
    printf("%d\n", arrayValue);
    printf("horray!\n");
}
I checked the type, as you can see in the code, of the elements in the list, and it prints 'int'. Yet for some reason PyArg_ParseTuple is having errors.
I need to be able to access information from lists in python to copy some data, pass it to my C code elsewhere, and then return the result to python.
Thank you so much!
The answer is to use long PyInt_AsLong(PyObject *io)
"long PyInt_AsLong(PyObject *io) Will first attempt to cast the object to a PyIntObject, if it is not already one, and then return its value. If there is an error, -1 is returned, and the caller should check PyErr_Occurred() to find out whether there was an error, or whether the value just happened to be -1."
This is from http://docs.python.org/2/c-api/int.html, the official Python 2 C API documentation for plain integer objects, which lists all the relevant functions.
Note that this function returns a long. A simple cast to int should suffice if the expected values are known to be small.
PyArg_ParseTuple() is about parsing tuples only, as the name suggests. In your code, item is an int, not a tuple. In order to convert an int object to a C value, you need to use arrayValue = PyInt_AsLong(item). Note that it returns a C long, not an int, so you should declare arrayValue as a long.
(EDIT: previously I mentioned PyInt_FromLong by mistake.)
