operator.index with custom class instance


I have a simple class below,
class MyClass(int):
    def __index__(self):
        return 1
According to operator.index documentation,
operator.index(a)
Return a converted to an integer. Equivalent to a.__index__()
But when I use operator.index with a MyClass instance, I get 100 instead of 1 (I do get 1 if I call a.__index__() directly). Why is that?
>>> a = MyClass(100)
>>>
>>> import operator
>>> print(operator.index(a))
100
>>> print(a.__index__())
1

This actually appears to be a deep-rooted issue in cpython. If you look at the source code for operator.py, you can see the definition of index:
def index(a):
    "Same as a.__index__()."
    return a.__index__()
So...why is it not equivalent? It's literally calling __index__. Well, at the bottom of the source, there's the culprit:
try:
    from _operator import *
except ImportError:
    pass
else:
    from _operator import __doc__
It's overwriting the definitions with a native _operator module. In fact, if you comment this out (either by modifying the actual library or making your own fake operator.py* and importing that), it works. So, we can find the source code for the native _operator library, and look at the related part:
static PyObject *
_operator_index(PyObject *module, PyObject *a)
{
    return PyNumber_Index(a);
}
So, it's a wrapper around the PyNumber_Index function. PyNumber_Index is a wrapper around _PyNumber_Index, so we can look at that:
PyObject *
_PyNumber_Index(PyObject *item)
{
    PyObject *result = NULL;
    if (item == NULL) {
        return null_error();
    }
    if (PyLong_Check(item)) {
        Py_INCREF(item);
        return item;
    }
    if (!_PyIndex_Check(item)) {
        PyErr_Format(PyExc_TypeError,
                     "'%.200s' object cannot be interpreted "
                     "as an integer", Py_TYPE(item)->tp_name);
        return NULL;
    }
    result = Py_TYPE(item)->tp_as_number->nb_index(item);
    if (!result || PyLong_CheckExact(result))
        return result;
    if (!PyLong_Check(result)) {
        PyErr_Format(PyExc_TypeError,
                     "__index__ returned non-int (type %.200s)",
                     Py_TYPE(result)->tp_name);
        Py_DECREF(result);
        return NULL;
    }
    /* Issue #17576: warn if 'result' not of exact type int. */
    if (PyErr_WarnFormat(PyExc_DeprecationWarning, 1,
            "__index__ returned non-int (type %.200s). "
            "The ability to return an instance of a strict subclass of int "
            "is deprecated, and may be removed in a future version of Python.",
            Py_TYPE(result)->tp_name)) {
        Py_DECREF(result);
        return NULL;
    }
    return result;
}
PyObject *
PyNumber_Index(PyObject *item)
{
    PyObject *result = _PyNumber_Index(item);
    if (result != NULL && !PyLong_CheckExact(result)) {
        Py_SETREF(result, _PyLong_Copy((PyLongObject *)result));
    }
    return result;
}
You can see that before it even calls nb_index (the C slot for __index__), it calls PyLong_Check on the argument, and if that check succeeds, it returns the item unmodified. PyLong_Check is a macro that checks whether the object is an int or an instance of an int subclass (Python's int is a PyLong in C):
#define PyLong_Check(op) \
PyType_FastSubclass(Py_TYPE(op), Py_TPFLAGS_LONG_SUBCLASS)
#define PyLong_CheckExact(op) Py_IS_TYPE(op, &PyLong_Type)
So the takeaway is that, for whatever reason (probably speed), int subclasses don't get their __index__ method called; the argument is instead _PyLong_Copy'd into the return value. This happens only in the native _operator module, not in the pure-Python operator.py. The conflict between the two implementations, together with the inconsistent documentation, leads me to believe that this is an issue in either the documentation or the implementation, and you may want to report it.
It's more likely a documentation issue than an implementation issue, as CPython has a habit of sacrificing correctness for speed: (nan,) == (nan,) but nan != nan.
* You may have to name it something like fake_operator.py then import it with import fake_operator as operator
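As a rough illustration, the C logic above can be modeled in pure Python. This is only a sketch: index_like_c and Seven are hypothetical names (not part of the standard library), and the model assumes a CPython version where PyNumber_Index copies int subclass instances into an exact int, as in the listing above.

```python
import operator


class MyClass(int):
    def __index__(self):
        return 1


def index_like_c(obj):
    """Hypothetical pure-Python model of PyNumber_Index (illustration only)."""
    if isinstance(obj, int):
        # Mirrors the PyLong_Check short-circuit: __index__ is never called.
        # int.__int__ bypasses any override and yields an exact int.
        return int.__int__(obj)
    index_method = getattr(type(obj), "__index__", None)
    if index_method is None:
        raise TypeError(
            f"'{type(obj).__name__}' object cannot be interpreted as an integer"
        )
    result = index_method(obj)
    if not isinstance(result, int):
        raise TypeError(
            f"__index__ returned non-int (type {type(result).__name__})"
        )
    return int.__int__(result)


class Seven:
    """Not an int subclass, so its __index__ is honored."""

    def __index__(self):
        return 7


a = MyClass(100)
print(operator.index(a), index_like_c(a), a.__index__())  # 100 100 1
print(operator.index(Seven()), index_like_c(Seven()))     # 7 7
```

The model makes the asymmetry visible: only objects that are not already ints ever reach the __index__ lookup.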

This is because your type is an int subclass. __index__ will not be used because the instance is already an integer. That much is by design, and unlikely to be considered a bug in CPython. PyPy behaves the same.
In _operator.c:
static PyObject *
_operator_index(PyObject *module, PyObject *a)
/*[clinic end generated code: output=d972b0764ac305fc input=6f54d50ea64a579c]*/
{
    return PyNumber_Index(a);
}
Note that the Python code in operator.py is generally not used; it is only a fallback for when the compiled _operator module is not available. That explains why the result differs from a.__index__().
In abstract.c, cropped after the relevant PyLong_Check part:
/* Return an exact Python int from the object item.
   Raise TypeError if the result is not an int
   or if the object cannot be interpreted as an index.
*/
PyObject *
PyNumber_Index(PyObject *item)
{
    PyObject *result = _PyNumber_Index(item);
    if (result != NULL && !PyLong_CheckExact(result)) {
        Py_SETREF(result, _PyLong_Copy((PyLongObject *)result));
    }
    return result;
}
...
/* Return a Python int from the object item.
   Can return an instance of int subclass.
   Raise TypeError if the result is not an int
   or if the object cannot be interpreted as an index.
*/
PyObject *
_PyNumber_Index(PyObject *item)
{
    PyObject *result = NULL;
    if (item == NULL) {
        return null_error();
    }
    if (PyLong_Check(item)) {
        Py_INCREF(item);
        return item; /* <---- short-circuited here */
    }
    ...
}
The documentation for operator.index is inaccurate, so this may be considered a minor documentation issue:
>>> import operator
>>> operator.index.__doc__
'Same as a.__index__()'
So, why isn't __index__ considered for integers? The probable answer is found in PEP 357, under the discussion section titled Speed:
Implementation should not slow down Python because integers and long integers used as indexes will complete in the same number of instructions. The only change will be that what used to generate an error will now be acceptable.
We do not want to slow down the most common case for slicing with integers, having to check for an nb_index slot every time.
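The same short-circuit can be observed wherever CPython needs an index, not just in operator.index. A small sketch, assuming CPython and reusing the MyClass definition from the question:

```python
class MyClass(int):
    def __index__(self):
        return 1


items = ["zero", "one", "two"]

# List indexing uses the underlying int value (2), not __index__() (1).
picked = items[MyClass(2)]

# hex() also converts through the index protocol and sees the int value 255.
as_hex = hex(MyClass(255))

print(picked, as_hex)  # two 0xff
```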

Update
This answer is incorrect; I misread the documentation. See Aplet123's answer instead. Tl;dr the problem is actually that the C implementation doesn't match the documentation and Python implementation. The C implementation is more like a if isinstance(a, int) else a.__index__().
To prove it, try defining MyClass.__int__(). The outcome will be the same.
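A minimal sketch of that experiment (the return values 2 and 1 are arbitrary, chosen only so that each method is distinguishable):

```python
import operator


class MyClass(int):
    def __int__(self):
        return 2

    def __index__(self):
        return 1


a = MyClass(100)
result = operator.index(a)
print(result)  # 100: neither __int__ nor __index__ was consulted
```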
Original answer
See the documentation for object.__index__():
object.__index__(self)
Called to implement operator.index(), and whenever Python needs to losslessly convert the numeric object to an integer object (such as in slicing, or in the built-in bin(), hex() and oct() functions). Presence of this method indicates that the numeric object is an integer type. Must return an integer.
If __int__(), __float__() and __complex__() are not defined then corresponding built-in functions int(), float() and complex() fall back to __index__().
(added bold)
a.__int__() exists, so its return value is used instead.
>>> a.__int__
<method-wrapper '__int__' of MyClass object at 0x7f2c5f0f4ec8>
>>> a.__int__()
100

Related

Updating elements of an array using the Python3/C API

I have a module method which takes in a python list, and then outputs the same list with all items multiplied by 100.
I've attempted to follow the C intro here as closely as possible but am still running into issues.
static PyObject *
test_update_list(PyObject *self, PyObject *args)
{
    PyObject *listObj = NULL;
    PyObject *item = NULL;
    PyObject *mult = PyLong_FromLong(100);
    PyObject *incremented_item = NULL;
    if (!PyArg_ParseTuple(args, "O", &listObj))
    {
        return NULL;
    }
    /* get the number of lines passed to us */
    Py_ssize_t numLines = PyList_Size(listObj);
    /* should raise an error here. */
    if (numLines < 0) return NULL; /* Not a list */
    for (Py_ssize_t i=0; i<numLines; i++) {
        // pick the item
        item = PyList_GetItem(listObj, i);
        if (mult == NULL)
            goto error;
        // increment it
        incremented_item = PyNumber_Add(item, mult);
        if (incremented_item == NULL)
            goto error;
        // update the list item
        if (PyObject_SetItem(listObj, i, incremented_item) < 0)
            goto error;
    }
error:
    Py_XDECREF(item);
    Py_XDECREF(mult);
    Py_XDECREF(incremented_item);
    return listObj;
};
The above compiles fine; however, when I run it in ipython, I get the error below.
If I take away the error handling I get a seg fault.
---------------------------------------------------------------------------
SystemError Traceback (most recent call last)
SystemError: null argument to internal routine
The above exception was the direct cause of the following exception:
SystemError Traceback (most recent call last)
<ipython-input-3-da275aa3369f> in <module>()
----> 1 testadd.test_update_list([1,2,3])
SystemError: <built-in function ulist> returned a result with an error set
Any help is appreciated.

So you have a number of issues that all need to be corrected. I've listed them all under separate headings so you can go through them one at a time.
Always returning listObj
When you get an error in your for loop, you would goto the error label, which was still returning the list. By returning this list you hide that there was an error in your function. You must always return NULL when you expect your function to raise an exception.
Does not increment listObj ref count on return
When your function is invoked, you are given borrowed references to your arguments. When you return one of those arguments, you are creating a new reference to your list, and so you must increment its reference count. Otherwise the interpreter's reference count will be one lower than the number of actual references to the object. This ends in a bug where the interpreter deallocates your list while there is still one reference to it rather than zero! This could result in a seg fault, or in the worst case, in random parts of the program accessing memory that has since been deallocated and reallocated for some other object.
Uses PyObject_SetItem with primitive
PyObject_SetItem can be used with dicts and any other class that implements obj[key] = val, so its key argument must be a PyObject *. You cannot supply it with an argument of type Py_ssize_t. Instead, use PyList_SetItem, which takes a Py_ssize_t as its index argument.
Bad memory handling of item and incremented_item
PyObject_SetItem and PyList_SetItem both handle decreasing the reference count of the object that was already at the position being set. So we don't need to worry about managing the reference count of item, as we are only working with a reference borrowed from the list. PyList_SetItem also steals a reference to incremented_item, so once we switch to it we don't need to manage that reference count either. (PyObject_SetItem does not steal a reference; with it, we would still have to Py_DECREF incremented_item ourselves.)
Memory leak on incorrect arguments
For example, consider calling your function with an int. You create a new reference to the 100 int object, but because you return NULL rather than goto error, this reference is lost. You need to handle such scenarios differently. In my solution, I move the PyLong_FromLong call to after the argument and type checking, so that we only create this new* object once we are guaranteed to use it.
Working code
Side note: I removed the goto statements as there was only one left, and so it made more sense to do the error handling at that point rather than later.
static PyObject *
testadd_update_list(PyObject *self, PyObject *args)
{
    PyObject *listObj = NULL;
    PyObject *item = NULL;
    PyObject *mult = NULL;
    PyObject *incremented_item = NULL;
    Py_ssize_t numLines;
    if (!PyArg_ParseTuple(args, "O:update_list", &listObj))
    {
        return NULL;
    }
    if (!PyList_Check(listObj)) {
        PyErr_BadArgument();
        return NULL;
    }
    /* get the number of lines passed to us */
    // Don't want to rely on the error checking of this function as it gives a weird stack trace.
    // Instead, we use PyList_Check() and PyErr_BadArgument() as above. Since listObj is definitely
    // a list now, PyList_Size will never raise an error, and so we could use
    // PyList_GET_SIZE(listObj) instead.
    numLines = PyList_Size(listObj);
    // only initialise mult here, otherwise the above returns would create a memory leak
    mult = PyLong_FromLong(100);
    if (mult == NULL) {
        return NULL;
    }
    for (Py_ssize_t i=0; i<numLines; i++) {
        // pick the item
        // It is possible for this line to raise an error, but our invariants should
        // ensure no error is ever raised. `listObj` is always of type list and `i` is always
        // in bounds.
        item = PyList_GetItem(listObj, i);
        // increment it, and check for type errors or memory errors
        incremented_item = PyNumber_Add(item, mult);
        if (incremented_item == NULL) {
            // ERROR!
            Py_DECREF(mult);
            return NULL;
        }
        // update the list item
        // We definitely have a list, and our index is in bounds, so we should never see an error
        // here.
        PyList_SetItem(listObj, i, incremented_item);
        // PyList_SetItem steals our reference to incremented_item, and so we must be careful in
        // how we handle incremented_item now. Either incremented_item will not be our
        // responsibility any more or it is NULL. As such, we can just remove our Py_XDECREF call
    }
    // success!
    // We are returning a *new reference* to listObj. We must increment its ref count as a result!
    Py_INCREF(listObj);
    Py_DECREF(mult);
    return listObj;
}
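When testing an extension like this, it can help to compare it against a pure-Python model of the intended behavior. The sketch below assumes the semantics of the C code above (add 100 to each item in place and return the same list object); the name update_list_reference is hypothetical.

```python
def update_list_reference(values):
    """Hypothetical pure-Python model of the C function above."""
    if not isinstance(values, list):
        # The C version reports this case via PyErr_BadArgument(),
        # which also raises TypeError.
        raise TypeError("expected a list")
    for i in range(len(values)):
        # Same operation as PyNumber_Add(item, mult) followed by PyList_SetItem.
        values[i] = values[i] + 100
    # The same list object is returned, mutated in place.
    return values


data = [1, 2, 3]
out = update_list_reference(data)
print(out is data, data)  # True [101, 102, 103]
```

As in the C version, a failure partway through (for example a str item) leaves the earlier items already updated.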
Footnote:
* PyLong_FromLong(100) doesn't actually create a new object, but rather returns a new reference to an existing object. In CPython, small integers (-5 through 256) are all cached, and the same object is returned whenever one is needed. This is an implementation detail meant to avoid repeatedly allocating and deallocating integers for small values, and so improve the performance of Python.
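The caching described in the footnote can be observed from Python. This is a sketch that assumes CPython; other implementations may behave differently, hence the guard. The helper name same_object is hypothetical.

```python
import sys


def same_object(value):
    """Build the same int twice at run time and compare object identity."""
    # Going through str() keeps the compiler from merging constants.
    first = int(str(value))
    second = int(str(value))
    return first is second


if sys.implementation.name == "cpython":
    print(same_object(100))      # True: both names refer to the cached object
    print(same_object(10 ** 6))  # False: two separately allocated objects
```

This is an implementation detail only; code should never rely on `is` for comparing integers.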

Python C extension segfault

I'm venturing into C extensions for the first time, and am somewhat new to C as well. I've got a working C extension; however, if I repeatedly call the utility in Python, I eventually get a segmentation fault: 11.
#include <Python.h>
static PyObject *getasof(PyObject *self, PyObject *args) {
    PyObject *fmap;
    long dt;
    if (!PyArg_ParseTuple(args, "Ol", &fmap, &dt))
        return NULL;
    long length = PyList_Size(fmap);
    for (int i = 0; i < length; i++) {
        PyObject *event = PyList_GetItem(fmap, i);
        long dti = PyInt_AsLong(PyList_GetItem(event, 0));
        if (dti > dt) {
            PyObject *output = PyList_GetItem(event, 1);
            return output;
        }
    }
    Py_RETURN_NONE;
};
The function args are
a time series (list of lists): ex [[1, 'a'], [5, 'b']]
a time (long): ex 4
And it's supposed to iterate over the list of lists til it finds a value greater than the time given. Then return that value. As I mentioned, it correctly returns the answer, but if I call it enough times, it segfaults.
My gut feeling is that this has to do with reference counting, but I'm not familiar enough with the concept to know if this is the direct cause.
Any help would be appreciated.

"My gut feeling is that this has to do with reference counting..." Your instincts are correct.
PyList_GetItem returns a borrowed reference, which means your function doesn't "own" a reference to the item. So there is a problem here:
PyObject *output = PyList_GetItem(event, 1);
return output;
You don't own a reference to the item, but you return it to the caller, so the caller doesn't own a reference either. The caller will run into a problem if the item is garbage collected while the caller is still trying to use it. So you'll need to increase the reference count of the item before you return it:
PyObject *output = PyList_GetItem(event, 1);
Py_INCREF(output);
return output;
That assumes that PyList_GetItem(event, 1) doesn't fail! Except for PyArg_ParseTuple, you aren't checking the return values of the C API functions, which means you are assuming the input argument always has the exact structure that you expect. That's fine while you're testing code and figuring out how this works, but eventually you should be checking the return values of the C API functions for failure, and handling it appropriately.
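For checking the extension's results while adding that error handling, a pure-Python model of the lookup described in the question can serve as a reference. This is only a sketch; getasof_reference is a hypothetical name, and it assumes each event is a two-item [time, value] list as in the question's example.

```python
def getasof_reference(fmap, dt):
    """Return the value of the first event whose time is greater than dt."""
    for timestamp, value in fmap:
        if timestamp > dt:
            return value
    # Mirrors Py_RETURN_NONE when no event qualifies.
    return None


series = [[1, "a"], [5, "b"]]
print(getasof_reference(series, 4))  # b
```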

Do PyObject_GetItem and PyObject_SetItem work on PyType_List and PyType_Dict types?

Documentation for PyObject_GetItem and PyObject_SetItem here states:
PyObject* PyObject_GetItem(PyObject *o, PyObject *key)
Return value: New reference.
Return element of o corresponding to the object key or NULL on failure.
This is the equivalent of the Python expression o[key].
int PyObject_SetItem(PyObject *o, PyObject *key, PyObject *v)
Map the object key to the value v. Returns -1 on failure.
This is the equivalent of the Python statement o[key] = v.
foo[key] syntax implies PyType_Dict
However, it doesn't state whether it also works on PyType_List, in which case key would be an index, i.e. a positive PyType_Long, or maybe a type that converts into one, e.g. a PyType_Bytes containing "42".
Does this function work for both containers?
I would expect it to; such a design to be in keeping with Python's "it does everything you would expect it to do" philosophy.
Furthermore, the project I'm looking at has a comment forewarning:
// PyObject_SetItem is too weird to be using from C++
// so it is intentionally omitted.
Should I be worried about this? What could it possibly mean? And has it been fixed for Python3?

Both functions work for all containers that support indexed accesses, be they dict, list, tuple, string, bytes, and so on.
I'm not sure why PyCXX has that comment; it may be due to the fact that Python's dynamic typing does not always mesh well with languages with static typing.

The answer is that it supports both!
You can find from the source code:
pi@piBookAir.local ~ /Users/pi/Downloads/Python-3.4.2:
~ grep -R "PyObject_GetItem" .
:
./Objects/abstract.c:PyObject_GetItem(PyObject *o, PyObject *key)
:
And looking in abstract.c:
PyObject *
PyObject_GetItem(PyObject *o, PyObject *key)
{
    PyMappingMethods *m;
    if (o == NULL || key == NULL)
        return null_error();
    m = o->ob_type->tp_as_mapping;
    if (m && m->mp_subscript)
        return m->mp_subscript(o, key);
    if (o->ob_type->tp_as_sequence) {
        if (PyIndex_Check(key)) {
            Py_ssize_t key_value;
            key_value = PyNumber_AsSsize_t(key, PyExc_IndexError);
            if (key_value == -1 && PyErr_Occurred())
                return NULL;
            return PySequence_GetItem(o, key_value);
        }
        else if (o->ob_type->tp_as_sequence->sq_item)
            return type_error("sequence index must "
                              "be integer, not '%.200s'", key);
    }
    return type_error("'%.200s' object is not subscriptable", o);
}
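Since the documentation describes these functions as equivalent to o[key] and o[key] = v, the behavior can be explored from Python without writing any C. The sketch below uses operator.getitem and operator.setitem as stand-ins for the subscript expressions; the Second class is a hypothetical key type added only for illustration. It also shows that a bytes key is not converted into an index.

```python
import operator

numbers = [10, 20, 30]
table = {"answer": 42}

operator.setitem(numbers, 1, 99)     # same as numbers[1] = 99
operator.setitem(table, "extra", 7)  # same as table["extra"] = 7
print(operator.getitem(numbers, 1), operator.getitem(table, "answer"))  # 99 42


class Second:
    """Hypothetical key type: a list accepts it because it defines __index__."""

    def __index__(self):
        return 2


print(numbers[Second()])  # 30

try:
    numbers[b"1"]  # bytes do not define __index__, so no conversion happens
except TypeError as exc:
    print("TypeError:", exc)
```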

Define __eq__ for Python C extension type

I'm having trouble trying to implement __eq__ for a Rect class I wrote as a C extension. I tried defining a method called __eq__, but Python seems to override it.
static PyObject *
Rect___eq__(Rect *self, PyObject *other)
{
    Rect *rect = (Rect *) other;
    if (self->x != rect->x || self->y != rect->y ||
            self->width != rect->width || self->height != rect->height) {
        Py_RETURN_FALSE;
    } else {
        Py_RETURN_TRUE;
    }
}
static PyMethodDef Rect_methods[] = {
    {"__eq__", (PyCFunction)Rect___eq__, METH_VARARGS,
     "Compare Rects" },
    {NULL} /* Sentinel */
};
It seems no matter what I do, Python defaults to "is" behavior:
>>> a = Rect(1, 2, 3, 4)
>>> b = Rect(1, 2, 3, 4)
>>> a == b
False
>>> a == a
True

When working with new types defined in C, you need to define tp_richcompare. Below is an implementation of rich compare for a type that always compares larger than all other types (except itself):
static PyObject *
Largest_richcompare(PyObject *self, PyObject *other, int op)
{
    PyObject *result = NULL;
    if (UndefinedObject_Check(other)) {
        result = Py_NotImplemented;
    }
    else {
        switch (op) {
        case Py_LT:
            result = Py_False;
            break;
        case Py_LE:
            result = (LargestObject_Check(other)) ? Py_True : Py_False;
            break;
        case Py_EQ:
            result = (LargestObject_Check(other)) ? Py_True : Py_False;
            break;
        case Py_NE:
            result = (LargestObject_Check(other)) ? Py_False : Py_True;
            break;
        case Py_GT:
            result = (LargestObject_Check(other)) ? Py_False : Py_True;
            break;
        case Py_GE:
            result = Py_True;
            break;
        }
    }
    Py_XINCREF(result);
    return result;
}
If you are using Python 3.x, you add it to the type object like this:
(richcmpfunc)&Largest_richcompare, /* tp_richcompare */
If you are using Python 2.x, there is an extra step involved. Rich comparisons were added during the lifetime of Python 2.x, and for a few versions of Python a C extension could optionally define tp_richcompare. To inform Python 2.x that your type implements rich comparisons, you need to modify tp_flags by or-ing in Py_TPFLAGS_HAVE_RICHCOMPARE.
Py_TPFLAGS_DEFAULT|Py_TPFLAGS_HAVE_RICHCOMPARE, /* tp_flags */
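For comparison, here is a pure-Python sketch of the semantics a tp_richcompare for Rect would presumably need to reproduce. It is a hypothetical stand-in for the C type, not the asker's code. Note that it checks the type of other before reading its fields, which the C function in the question skips by casting directly.

```python
class Rect:
    """Hypothetical pure-Python stand-in for the C Rect type."""

    def __init__(self, x, y, width, height):
        self.x = x
        self.y = y
        self.width = width
        self.height = height

    def _key(self):
        return (self.x, self.y, self.width, self.height)

    # Defining __eq__ alone makes instances unhashable in Python 3;
    # add __hash__ as well if Rects must be usable as dict keys.
    def __eq__(self, other):
        if not isinstance(other, Rect):
            # Same role as returning Py_NotImplemented from tp_richcompare.
            return NotImplemented
        return self._key() == other._key()


a = Rect(1, 2, 3, 4)
b = Rect(1, 2, 3, 4)
print(a == b, a != Rect(0, 2, 3, 4), a == 5)  # True True False
```

In Python 3 the != result is derived from __eq__ automatically; in C, the Py_NE case has to be handled (or declined with Py_NotImplemented) explicitly.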

When you declare your PyTypeObject, there's a field for a "rich comparison" function, which plays the role that __cmp__ plays for Python classes (http://docs.python.org/py3k/extending/newtypes.html#object-comparison). The documentation calls __cmp__ "nonrich", as opposed to __eq__, __gt__, etc., which are "rich". Semantics aside, it provides basically the same functionality, though I'm not too sure why __eq__ doesn't work...
As another aside, I'd suggest that anyone else writing C extension classes or modules take a look at Cython, which does add a dependency (though only a build dependency) but makes writing extensions much less of a headache.

Implementing nb_inplace_add results in returning a read-only buffer object

I'm writing an implementation of the in-place add operation. But for some reason I sometimes get a read-only buffer as the result (while adding a custom extension class and an integer...).
The relevant code is:
static PyObject *
ModPoly_InPlaceAdd(PyObject *self, PyObject *other)
{
    if (!ModPoly_Check(self)) {
        //Since it's in-place addition the control flow should never
        // enter here(I suppose)
        if (!ModPoly_Check(other)) {
            PyErr_SetString(PyExc_TypeError, "Neither argument is a ModPolynomial.");
            return NULL;
        }
        return ModPoly_InPlaceAdd(other, self);
    } else {
        if (!PyInt_Check(other) && !PyLong_Check(other)) {
            Py_INCREF(Py_NotImplemented);
            return Py_NotImplemented;
        }
    }
    ModPoly *Tself = (ModPoly *)self;
    PyObject *tmp, *tmp2;
    tmp = PyNumber_Add(Tself->ob_item[0], other);
    tmp2 = PyNumber_Remainder(tmp, Tself->n_modulus);
    Py_DECREF(tmp);
    tmp = Tself->ob_item[0];
    Tself->ob_item[0] = tmp2;
    Py_DECREF(tmp);
    return (PyObject *)Tself;
}
If, instead of returning (PyObject *)Tself (or simply "self"), I raise an exception, the original object gets updated correctly [checked using some printf]. If I use the Py_RETURN_NONE macro, it correctly turns the ModPoly into None (on the Python side).
What am I doing wrong? I'm returning a pointer to a ModPoly object, how can this become a buffer? And I don't see any operation on those pointers.
example usage:
>>> from algebra import polynomials
>>> pol = polynomials.ModPolynomial(3,17)
>>> pol += 5
>>> pol
<read-only buffer ptr 0xf31420, size 4 at 0xe6faf0>
I've tried changing the return line to:
printf("%d\n", (int)ModPoly_Check(self));
return self;
and it prints 1 when adding in-place (meaning that the value returned is of type ModPolynomial...)

According to the documentation, the inplace add operation for an object returns a new reference.
By returning self directly without calling Py_INCREF on it, your object will be freed while it is still referenced. If some other object is allocated the same piece of memory, those references would now give you the new object.
