Returning objects to Python from C - python

I've read the documentation for the Python C-API, and even written a few extension modules. However, I'm still a bit unclear on the exact semantics when it comes to returning Python objects from a C function.
The limited examples in the Python docs usually show a C function which returns the result of Py_BuildValue. Now, Py_BuildValue returns a New Reference, and transfers ownership of this reference over to the interpreter. So, can I extrapolate from this that it is a general rule that any object returned to Python must be a new reference, and that returning an object from a C function is the same as transferring ownership of the object over to the interpreter?
If so, what about cases where you return an object that is already owned by something? For example, suppose you write a C function which takes in a PyObject* which is a tuple, and you call PyTuple_GetItem on it and return the result. PyTuple_GetItem returns a borrowed reference - meaning that the item is still "owned" by the tuple. So, does a C function which returns the result of something like PyTuple_GetItem have to INCREF the result before returning it to the interpreter?
For example:
static PyObject* my_extension_module(PyObject* tup)
{
    PyObject* item = PyTuple_GetItem(tup, 1);
    if (!item) { /* handle error */ }
    return item; // <--- DO WE NEED TO INCREF BEFORE RETURNING HERE?
}

Python expects any function you expose to it to return a new reference, yes. If you only have a borrowed reference, you have to call Py_INCREF to give a new reference. If you return a borrowed reference, Python will proceed to call Py_DECREF (when it's done with the reference), and eventually that will cause the object to be freed while it is still in use.
(It's not uncommon for that "eventual" freeing to happen during interpreter exit, in which case it may go unnoticed, but it's still a mistake to return borrowed references.)
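The ownership rule can be observed from Python itself, at least on CPython, where sys.getrefcount reports an object's reference count. The sketch below is only an analogy for what Py_INCREF and Py_DECREF do in C. The helper name refcount_delta is hypothetical, and because absolute counts are an implementation detail, only differences are compared.

```python
import sys


def refcount_delta():
    """Return how the item's refcount changes as an extra owner comes and goes."""
    item = object()
    tup = (0, item)               # the tuple owns one reference to item

    before = sys.getrefcount(item)
    extra_owner = tup[1]          # Python-level analogue of INCREF-then-return
    after = sys.getrefcount(item)

    del extra_owner               # analogue of the caller's eventual Py_DECREF
    restored = sys.getrefcount(item)
    return after - before, restored - before


delta, net = refcount_delta()
```

If the extra owner had never taken its own reference, the later release would have dropped the count below what the tuple relies on, which is the bug that returning a borrowed reference causes in C.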

Related

How does PyObject.ob_type get initialized/set in CPython 2.7

I am investigating how isinstance() func works for CPython 2.7
Now I have an example with two Python files: lib1.py lib2.py
In lib1.py:
from a.b import lib2
def func_h():
    ob = lib2.X()
    print(isinstance(ob, lib2.X))
In lib2.py:
class X(object):
    a = 1
The results print: True
Then I dig into CPython source code to
https://github.com/python/cpython/blob/ad65d09fd02512b2ccf500f6c11063f705c9cd28/Objects/abstract.c#L2945
where CPython did this check:
if (Py_TYPE(inst) == (PyTypeObject *)cls)
    return 1;
where Py_TYPE() is macro
#define Py_TYPE(ob) (((PyObject*)(ob))->ob_type)
Does anyone know how CPython initializes or sets ob_type when the program starts?
Normally, this happens in PyObject_Init (or PyObject_InitVar, which I won't mention again, but there are equivalent variations across the board), or the PyObject_INIT macro (which does the same thing in a faster way, but one that isn't guaranteed to be binary-compatible with other interpreter builds on the same platform). The docs for PyObject_Init say:
Initialize a newly-allocated object op with its type and initial reference. Returns the initialized object. If type indicates that the object participates in the cyclic garbage detector, it is added to the detector’s set of observed objects. Other fields of the object are not affected.
You can see the source in object.c:
PyObject *
PyObject_Init(PyObject *op, PyTypeObject *tp)
{
    if (op == NULL)
        return PyErr_NoMemory();
    /* Any changes should be reflected in PyObject_INIT (objimpl.h) */
    Py_TYPE(op) = tp;
    _Py_NewReference(op);
    return op;
}
For more details, see the comments in objimpl.h.
When an object is constructed from Python (or via the high-level C API):
The type's __new__ method or tp_new slot gets called.
This usually inherits from or supers to object_new, which calls PyType_GenericNew.
… or it delegates to some other constructor (which ultimately gets you back here)
… or returns some already existing object
… but if not, it must call tp_alloc manually.
PyType_GenericNew calls the type's tp_alloc slot (there's no Python special method for this).
This usually inherits from or supers to PyType_GenericAlloc, which calls the PyObject_INIT macro.
… but if not, tp_alloc must call one of the PyObject_Init-family functions or macros, or do the same thing itself.
Code in C extension modules, and internal interpreter code may:
Use the same high-level API
… or call PyObject_New, which allocates the object and calls PyObject_Init on it, and casts the result pointer
… or just call PyObject_Init directly (when it knows the type it's constructing doesn't customize tp_new, tp_alloc, or tp_init)
… or construct objects manually, but at some point it must call one of the PyObject_Init family directly or indirectly, or do the same thing itself, just as with custom tp_alloc
… or allocate constant objects statically rather than on the heap, like Py_None and many builtin and extension type objects, in which case the type (which also has to be a static constant, of course) is just specified in the struct initializer.
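The fact that the type is set during allocation, before any initializer runs, can be checked from Python. This is a minimal sketch: the class X mirrors the one in the question, the initialized attribute is added purely for illustration, and which C functions run underneath is taken from the description above rather than demonstrated here.

```python
class X(object):
    def __init__(self):
        self.initialized = True


# object.__new__ runs the allocation path only; X.__init__ is never called.
raw = object.__new__(X)

type_is_set = type(raw) is X                        # the type is already in place
init_was_skipped = not hasattr(raw, "initialized")  # __init__ has not run
```

So by the time isinstance() compares the object's type against the class, the type pointer has been in place since allocation.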

Are __next__ and __str__ invoked by the equivalent next and str functions internally?

From Learning python book 5th Edition:
Page 421, footnote2:
Technically speaking, the for loop calls the internal equivalent of I.__next__, instead of the next(I) used here, though there is rarely any difference between the two. Your manual iterations can generally use either call scheme.
What exactly does this mean? Does it mean that I.__next__ is invoked by a C function, rather than by the next built-in function, in the for loop or any built-in iteration context?
Page 914:
__str__ is tried first for the print operation and the str built-in function (the internal equivalent of which print runs). It generally should return a user-friendly display.
Aside from the book, does Python call __str__ or __next__ using C functions internally, as I understood from the book?
Python C implementations use C functions that are essentially the same thing as the Python functions, in that the Python functions like str() and next() are usually thin wrappers around the C functions.
These C functions then take care of calling the right hook; this could be the C version of the hook (a slot in a structure pointing to a function), or the Python function on a class.
Now, both str() and next() are a little more than wrappers here, because there is additional functionality defined by these functions that require a little more implementation work; next() takes a 2nd argument that defines a default, for example.
So I'll take len() as an example instead. The function is defined in the builtin_len() C function:
static PyObject *
builtin_len(PyObject *self, PyObject *v)
{
    Py_ssize_t res;
    res = PyObject_Size(v);
    if (res < 0 && PyErr_Occurred())
        return NULL;
    return PyInt_FromSsize_t(res);
}
Note the call PyObject_Size(); that's what C code would use to get the length of an object. The rest is just error handling and producing a Python int object.
PyObject_Size() then is implemented like this:
Py_ssize_t
PyObject_Size(PyObject *o)
{
    PySequenceMethods *m;
    if (o == NULL) {
        null_error();
        return -1;
    }
    m = o->ob_type->tp_as_sequence;
    if (m && m->sq_length)
        return m->sq_length(o);
    return PyMapping_Size(o);
}
It takes a PyObject structure, finds the ob_type structure from there, which has an optional tp_as_sequence structure, which can define a sq_length function pointer. If it exists, it is called to produce the actual length. Different types can define that function, and a special C structure for Python instances can handle redirecting back to a Python method.
All this shows that Python's internal implementation uses a lot of abstractions to implement objects, allowing both C-defined types and Python classes to be treated the same, mostly. If you want to dig deeper, the Python documentation has full coverage of the C-API, including a dedicated tutorial.
Circling back to your original two functions, the internal equivalent of next() is PyIter_Next(), and str(), as used for string conversions of arbitrary objects, is PyObject_Str().
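One visible consequence of the hook living on the type is that the built-in functions ignore a same-named attribute set on the instance. The sketch below illustrates this with len(), plus the extra default argument that next() layers on top of __next__. The class name Sized is hypothetical.

```python
class Sized(object):
    def __len__(self):
        return 3


s = Sized()
s.__len__ = lambda: 99              # instance attribute; not consulted by len()

via_builtin = len(s)                # goes through the hook defined on the type
via_attribute = s.__len__()         # ordinary attribute lookup on the instance

fallback = next(iter([]), "empty")  # next() adds a default on top of __next__
```

Here via_builtin is 3 while via_attribute is 99, which is one of the rare cases where the built-in and the explicit method call differ.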

Inverse of id() function to demonstrate call by ref and value [duplicate]

Let's say I have an id of a Python object, which I retrieved by doing id(thing). How do I find thing again by the id number I was given?
If the object is still there, this can be done by ctypes:
import ctypes
a = "hello world"
print(ctypes.cast(id(a), ctypes.py_object).value)
output:
hello world
If you don't know whether the object is still there, this is a recipe for undefined behavior and weird crashes or worse, so be careful.
You'll probably want to consider implementing it another way. Are you aware of the weakref module?
The Python weakref module lets you keep references, dictionary references, and proxies to objects without having those references count in the reference counter. They're like symbolic links.
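A minimal sketch of the weakref approach, assuming you control the point where objects are created and can register them. The names Thing, register, and lookup are hypothetical, and the behavior after deletion relies on CPython freeing the object as soon as its last strong reference goes away.

```python
import gc
import weakref


class Thing(object):
    """Hypothetical example class; plain object() instances cannot be weakly referenced."""


# Maps id -> object without keeping the object alive.
_registry = weakref.WeakValueDictionary()


def register(obj):
    _registry[id(obj)] = obj
    return id(obj)


def lookup(obj_id):
    # Returns None once the object has been collected.
    return _registry.get(obj_id)


thing = Thing()
key = register(thing)
found_while_alive = lookup(key) is thing

del thing
gc.collect()
found_after_delete = lookup(key)
```

Unlike the ctypes cast, a lookup for a dead object returns None here instead of touching freed memory.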
You can use the gc module to get all the objects currently tracked by the Python garbage collector.
import gc
def objects_by_id(id_):
    for obj in gc.get_objects():
        if id(obj) == id_:
            return obj
    raise Exception("Not found")
Short answer, you can't.
Long answer, you can maintain a dict for mapping IDs to objects, or look the ID up by exhaustive search of gc.get_objects(), but this will create one of two problems: either the dict's reference will keep the object alive and prevent GC, or (if it's a WeakValue dict or you use gc.get_objects()) the ID may be deallocated and reused for a completely different object.
Basically, if you're trying to do this, you probably need to do something differently.
Just mentioning this module for completeness. This code by Bill Bumgarner includes a C extension to do what you want without looping throughout every object in existence.
The code for the function is quite straightforward. Every Python object is represented in C by a pointer to a PyObject struct. Because id(x) is just the memory address of this struct, we can retrieve the Python object just by treating x as a pointer to a PyObject, then calling Py_INCREF to tell the garbage collector that we're creating a new reference to the object.
static PyObject *
di_di(PyObject *self, PyObject *args)
{
    PyObject *obj;
    if (!PyArg_ParseTuple(args, "l:di", &obj))
        return NULL;
    Py_INCREF(obj);
    return obj;
}
If the original object no longer exists then the result is undefined. It may crash, but it could also return a reference to a new object that's taken the location of the old one in memory.
eGenix mxTools library does provide such a function, although marked as "expert-only": mx.Tools.makeref(id)
This will do:
a = 0
id_a = id(a)
variables = {**locals(), **globals()}
for var in variables:
    exec('var_id=id(%s)'%var)
    if var_id == id_a:
        exec('the_variable=%s'%var)
print(the_variable)
print(id(the_variable))
But I suggest implementing a more decent way.

How to pass an array from C to an embedded python script

I am running into some problems and would like some help. I have a piece of code that embeds a Python script. The script contains a function that expects to receive an array as an argument (in this case I am using a numpy array within the Python script).
I would like to know how can I pass an array from C to the embedded python script as an argument for the function within the script. More specifically can someone show me a simple example of this.
Really, the best answer here is probably to use numpy arrays exclusively, even from your C code. But if that's not possible, then you have the same problem as any code that shares data between C types and Python types.
In general, there are at least five options for sharing data between C and Python:
1. Create a Python list or other object to pass.
2. Define a new Python type (in your C code) to wrap and represent the array, with the same methods you'd define for a sequence object in Python (__getitem__, etc.).
3. Cast the pointer to the array to intptr_t, or to an explicit ctypes type, or just leave it un-cast; then use ctypes on the Python side to access it.
4. Cast the pointer to the array to const char * and pass it as a str (or, in Py3, bytes), and use struct or ctypes on the Python side to access it.
5. Create an object matching the buffer protocol, and again use struct or ctypes on the Python side.
In your case, you want to use numpy.arrays in Python. So, the general cases become:
1. Create a numpy.array to pass.
2. (probably not appropriate)
3. Pass the pointer to the array as-is, and from Python, use ctypes to get it into a type that numpy can convert into an array.
4. Cast the pointer to the array to const char * and pass it as a str (or, in Py3, bytes), which is already a type that numpy can convert into an array.
5. Create an object matching the buffer protocol, and which again I believe numpy can convert directly.
For 1, here's how to do it with a list, just because it's a very simple example (and I already wrote it…):
PyObject *makelist(int array[], size_t size) {
    PyObject *l = PyList_New(size);
    for (size_t i = 0; i != size; ++i) {
        PyList_SET_ITEM(l, i, PyInt_FromLong(array[i]));
    }
    return l;
}
And here's the numpy.array equivalent (assuming you can rely on the C array not to be deleted—see Creating arrays in the docs for more details on your options here):
PyObject *makearray(int array[], size_t size) {
    /* Requires import_array() to have been called during initialization. */
    npy_intp dim = size;
    return PyArray_SimpleNewFromData(1, &dim, NPY_INT, (void *)array);
}
At any rate, however you do this, you will end up with something that looks like a PyObject * from C (and has a single refcount), so you can pass it as a function argument, while on the Python side it will look like a numpy.array, list, bytes, or whatever else is appropriate.
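The pointer-passing option from the list above (hand over a raw address and rebuild a typed view with ctypes) can be sketched in pure Python. In this sketch a ctypes array stands in for the array owned by the C side, and the variable names are hypothetical; in a real embedding the address and length would arrive as function arguments. A numpy array could then be built on top of the view, which is not shown here.

```python
import ctypes

# Stand-in for the array owned by the C side of the program.
c_array = (ctypes.c_int * 4)(10, 20, 30, 40)
address = ctypes.addressof(c_array)
length = len(c_array)

# "Python side": rebuild a typed view from nothing but the address and length.
view = (ctypes.c_int * length).from_address(address)
values = list(view)

view[0] = 99                  # writes through to the original memory
shared = c_array[0] == 99     # True: no copy was made
```

Because nothing is copied, the view is only valid while the C array is alive, which is the same lifetime caveat that applies to PyArray_SimpleNewFromData.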
Now, how do you actually pass function arguments? Well, the sample code in Pure Embedding that you referenced in your comment shows how to do this, but doesn't really explain what's going on. There's actually more explanation in the extending docs than the embedding docs, specifically, Calling Python Functions from C. Also, keep in mind that the standard library source code is chock full of examples of this (although some of them aren't as readable as they could be, either because of optimization, or just because they haven't been updated to take advantage of new simplified C API features).
Skip the first example about getting a Python function from Python, because presumably you already have that. The second example (and the paragraph right above it) shows the easy way to do it: creating an argument tuple with Py_BuildValue. So, let's say we want to call a function you've got stored in myfunc with the list mylist returned by that makelist function above. Here's what you do:
if (!PyCallable_Check(myfunc)) {
    PyErr_SetString(PyExc_TypeError, "function is not callable?!");
    return NULL;
}
/* "O" (uppercase) passes an existing Python object; lowercase "o" is not a valid format unit. */
PyObject *arglist = Py_BuildValue("(O)", mylist);
PyObject *result = PyObject_CallObject(myfunc, arglist);
Py_DECREF(arglist);
return result;
You can skip the callable check if you're sure you've got a valid callable object, of course. (And it's usually better to check when you first get myfunc, if appropriate, because you can give both earlier and better error feedback that way.)
If you want to actually understand what's going on, try it without Py_BuildValue. As the docs say, the second argument to PyObject_CallObject is a tuple, and PyObject_CallObject(callable_object, args) is equivalent to apply(callable_object, args), which is equivalent to callable_object(*args). So, if you wanted to call myfunc(mylist) in Python, you have to turn that into, effectively, myfunc(*(mylist,)) so you can translate it to C. You can construct a tuple like this:
PyObject *arglist = PyTuple_Pack(1, mylist);
But usually, Py_BuildValue is easier (especially if you haven't already packed everything up as Python objects), and the intention in your code is clearer (just as using PyArg_ParseTuple is simpler and clearer than using explicit tuple functions in the other direction).
So, how do you get that myfunc? Well, if you've created the function from the embedding code, just keep the pointer around. If you want it passed in from the Python code, that's exactly what the first example does. If you want to, e.g., look it up by name from a module or other context, the APIs for concrete types like PyModule and abstract types like PyMapping are pretty simple, and it's generally obvious how to convert Python code into the equivalent C code, even if the result is mostly ugly boilerplate.
Putting it all together, let's say I've got a C array of integers, and I want to import mymodule and call a function mymodule.myfunc(mylist) that returns an int. Here's a stripped-down example (not actually tested, and no error handling, but it should show all the parts):
int callModuleFunc(int array[], size_t size) {
    PyObject *mymodule = PyImport_ImportModule("mymodule");
    PyObject *myfunc = PyObject_GetAttrString(mymodule, "myfunc");
    PyObject *mylist = PyList_New(size);
    for (size_t i = 0; i != size; ++i) {
        /* PyList_SET_ITEM steals the reference to the new int. */
        PyList_SET_ITEM(mylist, i, PyInt_FromLong(array[i]));
    }
    /* "(O)" builds a 1-tuple that holds its own reference to mylist. */
    PyObject *arglist = Py_BuildValue("(O)", mylist);
    PyObject *result = PyObject_CallObject(myfunc, arglist);
    int retval = (int)PyInt_AsLong(result);
    Py_DECREF(result);
    Py_DECREF(arglist);
    Py_DECREF(mylist);
    Py_DECREF(myfunc);
    Py_DECREF(mymodule);
    return retval;
}
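For comparison, here is a sketch of the Python code that the C function above is translating, with each line annotated with the C API call it corresponds to. The function name call_module_func is hypothetical, and math.fsum is used only because it is a standard-library function that accepts a list.

```python
import importlib


def call_module_func(module_name, func_name, values):
    module = importlib.import_module(module_name)   # PyImport_ImportModule
    func = getattr(module, func_name)               # PyObject_GetAttrString
    if not callable(func):                          # PyCallable_Check
        raise TypeError("function is not callable")
    args = (list(values),)                          # Py_BuildValue("(O)", mylist)
    result = func(*args)                            # PyObject_CallObject
    return int(result)                              # PyInt_AsLong


total = call_module_func("math", "fsum", [1, 2, 3])
```

Every step that can raise in Python corresponds to a NULL (or -1) return that the C version would need to check once error handling is added.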
If you're using C++, you probably want to look into some kind of scope-guard/janitor/etc. to handle all those Py_DECREF calls, especially once you start doing proper error handling (which usually means early return NULL calls peppered through the function). If you're using C++11 or Boost, a unique_ptr with Py_DecRef as its deleter, e.g. std::unique_ptr<PyObject, decltype(&Py_DecRef)>, may be all you need.
But really, a better way to reduce all that ugly boilerplate, if you plan to do a lot of C<->Python communication, is to look at all of the familiar frameworks designed for improving extending Python—Cython, boost::python, etc. Even though you're embedding, you're effectively doing the same work as extending, so they can help in the same ways.
For that matter, some of them also have tools to help the embedding part, if you search around the docs. For example, you can write your main program in Cython, using both C code and Python code, and cython --embed. You may want to cross your fingers and/or sacrifice some chickens, but if it works, it's amazingly simple and productive. Boost isn't nearly as trivial to get started, but once you've got things together, almost everything is done in exactly the way you'd expect, and just works, and that's just as true for embedding as extending. And so on.
The Python function will need a Python object to be passed in. Since you want that Python object to be a NumPy array, you should use one of the NumPy C-API functions for creating arrays; PyArray_SimpleNewFromData() is probably a good start. It will use the buffer provided, without copying the data.
That said, it is almost always easier to write the main program in Python and use a C extension module for the C code. This approach makes it easier to let Python do the memory management, and the ctypes module together with NumPy's ctypes support (numpy.ctypeslib) makes it easy to pass a NumPy array to a C function.

Boost.Python function pointers as class constructor argument

I have a C++ class that requires a function pointer in its constructor (float(*myfunction)(vector<float>*))
I've already exposed some function pointers to Python.
The ideal way to use this class is something like this:
import mymodule
mymodule.some_class(mymodule.some_function)
So I tell Boost about this class like so:
class_<SomeClass>("some_class", init<float(*)(vector<float>*)>);
But I get:
error: no matching function for call to 'register_shared_ptr1(Sample (*)(std::vector<double, std::allocator<double> >*))'
when I try to compile it.
So, does anyone have any ideas on how I can fix the error without losing the flexibility gained from function pointers (ie no falling back to strings that indicate which function to call)?
Also, the main point of writing this code in C++ is for speed. So it would be nice if I was still able to keep that benefit (the function pointer gets assigned to a member variable during initialization and will get called over a million times later on).
OK, so this is a fairly difficult question to answer in general. The root cause of your problem is that there really is no python type which is exactly equivalent to a C function pointer. Python functions are sort-of close, but their interface doesn't match for a few reasons.
Firstly, I want to mention the technique for wrapping a constructor from here:
http://wiki.python.org/moin/boost.python/HowTo#namedconstructors.2BAC8factories.28asPythoninitializers.29. This lets you write an __init__ function for your object that doesn't directly correspond to an actual C++ constructor. Note also, that you might have to specify boost::python::no_init in the boost::python::class_ construction, and then def a real __init__ function later, if your object isn't default-constructible.
Back to the question:
Is there only a small set of functions that you'll usually want to pass in? In that case, you could just declare a special enum (or specialized class), make an overload of your constructor that accepts the enum, and use that to look up the real function pointer. You can't directly call the functions yourself from python using this approach, but it's not that bad, and the performance will be the same as using real function pointers.
If you want to provide a general approach that will work for any python callable, things get more complex. You'll have to add a constructor to your C++ object that accepts a general functor, e.g. using boost::function or std::tr1::function. You could replace the existing constructor if you wanted, because function pointers will convert to this type correctly.
So, assuming you've added a boost::function constructor to SomeClass, you should add these functions to your python wrapping code:
struct WrapPythonCallable
{
    typedef float * result_type;
    explicit WrapPythonCallable(const boost::python::object & wrapped)
        : wrapped_(wrapped)
    { }
    float * operator()(vector<float>* arg) const
    {
        //Do whatever you need to do to convert into a
        //boost::python::object here
        boost::python::object arg_as_python_object = /* ... */;
        //Call out to python with the object - note that wrapped_
        //is callable using an operator() overload, and returns
        //a boost::python::object.
        //Also, the call can throw boost::python::error_already_set -
        //you might want to handle that here.
        boost::python::object result_object = wrapped_(arg_as_python_object);
        //Do whatever you need to do to extract a float * from result_object,
        //maybe using boost::python::extract
        float * result = /* ... */;
        return result;
    }
    boost::python::object wrapped_;
};
//This function is the "constructor wrapper" that you'll add to SomeClass.
//Change the return type to match the holder type for SomeClass, like if it's
//held using a shared_ptr.
std::auto_ptr<SomeClass> CreateSomeClassFromPython(
    const boost::python::object & callable)
{
    return std::auto_ptr<SomeClass>(
        new SomeClass(WrapPythonCallable(callable)));
}
//Later, when telling Boost.Python about SomeClass:
class_<SomeClass>("some_class", no_init)
    .def("__init__", make_constructor(&CreateSomeClassFromPython));
I've left out details on how to convert pointers to and from python - that's obviously something that you'll have to work out, because there are object lifetime issues there.
If you need to call the function pointers that you'll pass in to this function from Python, then you'll need to def these functions using Boost.Python at some point. This second approach will work fine with these def'd functions, but calling them will be slow, because objects will be unnecessarily converted to and from Python every time they're called.
To fix this, you can modify CreateSomeClassFromPython to recognize known or common function objects, and replace them with their real function pointers. You can compare python objects' identity in C++ using object1.ptr() == object2.ptr(), equivalent to id(object1) == id(object2) in python.
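The identity-based dispatch described above can be sketched in Python to show the logic, independent of Boost.Python. Everything here is hypothetical: mean and peak stand in for functions exposed from C++, and the returned strings stand in for the real function pointers that the C++ wrapper would select.

```python
def mean(values):
    return sum(values) / float(len(values))


def peak(values):
    return max(values)


# Pairs of (known callable, name of the fast native implementation).
_KNOWN = ((mean, "native_mean"), (peak, "native_peak"))


def choose_implementation(func):
    for known, native_name in _KNOWN:
        if func is known:          # same test as object1.ptr() == object2.ptr()
            return native_name
    return "python_callback"       # fall back to the generic wrapper


picked_known = choose_implementation(mean)
picked_other = choose_implementation(lambda values: 0.0)
```

Only callables that are the very same object as a registered one take the fast path; anything else, including an equivalent lambda, goes through the slower generic wrapper.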
Finally, you can of course combine the general approach with the enum approach. Be aware when doing this, that boost::python's overloading rules are different from C++'s, and this can bite you when dealing with functions like CreateSomeClassFromPython. Boost.Python tests functions in the order that they are def'd to see if the runtime arguments can be converted to the C++ argument types. So, CreateSomeClassFromPython will prevent single-argument constructors def'd later than it from being used, because its argument matches any python object. Be sure to put it after other single-argument __init__ functions.
If you find yourself doing this sort of thing a lot, then you might want to look at the general boost::function wrapping technique (mentioned on the same page with the named constructor technique): http://wiki.python.org/moin/boost.python/HowTo?action=AttachFile&do=view&target=py_boost_function.hpp.
