Equality Comparison with NumPy Instance Invokes `__bool__` - python

I have defined a class where its __ge__ method returns an instance of itself, and whose __bool__ method is not allowed to be invoked (similar to a Pandas Series).
Why is X.__bool__ invoked during np.int8(0) <= x, but not for any of the other examples? Who is invoking it? I have read the Data Model docs but I haven’t found my answer there.
import numpy as np
import pandas as pd
class X:
    def __bool__(self):
        print(f"{self}.__bool__")
        assert False
    def __ge__(self, other):
        print(f"{self}.__ge__")
        return X()
x = X()
np.int8(0) <= x
# Console output:
# <__main__.X object at 0x000001BAC70D5C70>.__ge__
# <__main__.X object at 0x000001BAC70D5D90>.__bool__
# Traceback (most recent call last):
#   File "<stdin>", line 1, in <module>
#   File "<stdin>", line 4, in __bool__
# AssertionError
0 <= x
# Console output:
# <__main__.X object at 0x000001BAC70D5C70>.__ge__
# <__main__.X object at 0x000001BAC70D5DF0>
x >= np.int8(0)
# Console output:
# <__main__.X object at 0x000001BAC70D5C70>.__ge__
# <__main__.X object at 0x000001BAC70D5D30>
pd_ge = pd.Series.__ge__
def ge_wrapper(self, other):
    print("pd.Series.__ge__")
    return pd_ge(self, other)
pd.Series.__ge__ = ge_wrapper
pd_bool = pd.Series.__bool__
def bool_wrapper(self):
    print("pd.Series.__bool__")
    return pd_bool(self)
pd.Series.__bool__ = bool_wrapper
np.int8(0) <= pd.Series([1,2,3])
# Console output:
# pd.Series.__ge__
# 0    True
# 1    True
# 2    True
# dtype: bool

I suspect that np.int8.__le__ is defined so that, instead of returning NotImplemented and letting X.__ge__ take over, it tries to return something like not (np.int8(0) > x), and then np.int8.__gt__ returns NotImplemented. Once the reflected method on x returns an instance of X rather than a Boolean value, we need to call x.__bool__() in order to compute the value of not x.
(Still trying to track down where int8.__gt__ is defined to confirm.)
(Update: not quite. int8 uses a single generic rich comparison function that simply converts the value to a 0-dimensional array, then returns the result of PyObject_RichCompare on the array and x.)
I did find this function that appears to ultimately implement np.int8.__le__:
static NPY_INLINE int
rational_le(rational x, rational y) {
    return !rational_lt(y,x);
}
It's not clear to me how we avoid reaching this function if one of the arguments (like X) is not a NumPy type. I think I give up.

TL;DR
X.__array_priority__ = 1000
The biggest hint is that it works with a pd.Series.
First I tried having X inherit from pd.Series. This worked (i.e. __bool__ was no longer called).
To determine whether NumPy is using an isinstance check or duck-typing approach, I removed the explicit inheritance and added (based on this answer):
@property
def __class__(self):
    return pd.Series
The operation no longer worked (i.e. __bool__ was called).
So now I think we can conclude that NumPy is using a duck-typing approach, and I checked which attributes are being accessed on X.
I added the following to X:
def __getattribute__(self, item):
    print("getattr", item)
    return object.__getattribute__(self, item)
Again instantiating X as x, and invoking np.int8(0) <= x, we get:
getattr __array_priority__
getattr __array_priority__
getattr __array_priority__
getattr __array_struct__
getattr __array_interface__
getattr __array__
getattr __array_prepare__
<__main__.X object at 0x000002022AB5DBE0>.__ge__
<__main__.X object at 0x000002021A73BE50>.__bool__
getattr __array_struct__
getattr __array_interface__
getattr __array__
Traceback (most recent call last):
  File "<stdin>", line 32, in <module>
    np.int8(0) <= x
  File "<stdin>", line 21, in __bool__
    assert False
AssertionError
Ah-ha! What is __array_priority__? Who cares, really. With a little digging, all we need to know is that NDFrame (from which pd.Series inherits) sets this value to 1000.
If we add X.__array_priority__ = 1000, it works! __bool__ is no longer called.
What made this so difficult (I believe) is that the NumPy code didn't show up in the call stack because it is written in C. I could investigate further if I tried out the suggestion here.
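A minimal, self-contained sketch of the fix, assuming NumPy is installed and that its scalar comparisons still honor the legacy __array_priority__ attribute (the value 1000 only mirrors what NDFrame uses). With the attribute set, the NumPy scalar appears to return NotImplemented, so Python falls back to the reflected x.__ge__(np.int8(0)) and returns its result untouched; nothing needs a truth value. Here __bool__ raises TypeError instead of using assert, so the guard still holds under python -O.

```python
import numpy as np

class X:
    # Same value pandas' NDFrame uses; a higher priority than the NumPy
    # scalar's makes the scalar defer the comparison to X.
    __array_priority__ = 1000

    def __bool__(self):
        raise TypeError("the truth value of X is ambiguous")

    def __ge__(self, other):
        return X()

x = X()
# np.int8.__le__ defers, so Python calls the reflected x.__ge__(np.int8(0)).
result = np.int8(0) <= x
print(type(result).__name__)  # X
```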

Related

behavior of builtin functions when assigned as class attributes

I would like to assign a function as a class attribute and have it remain unbound when accessed through an instance. I understand that this can be achieved using the staticmethod descriptor. But the behavior seems to be different for built-in functions, and I would like to replicate that.
def abs_(value):
    return abs(value)
class Test:
    func_1 = abs
    func_2 = len
    func_3 = abs_
    func_4 = staticmethod(abs_)
>>> test = Test()
>>> test.func_1
<built-in function abs>
>>> test.func_2
<built-in function len>
>>> test.func_3
<bound method abs_ of <__main__.Test object at 0x10436d910>>
In this case, the built-in functions are unbound, and the user-defined function abs_ is bound to the instance. And obviously all functions work except func_3, since it is a bound method.
>>> test.func_1(-1)
1
>>> test.func_3(-1)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: abs_() takes 1 positional argument but 2 were given
How do built-in functions achieve this, and is there a way to replicate the behavior (so that the function remains unbound)? Thank you!
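A sketch of why this happens: binding is done by the descriptor protocol. Functions written in Python define __get__, which turns them into bound methods when looked up through an instance, while built-in functions such as abs and len have no __get__, so they come back unchanged. Besides staticmethod, any callable object whose class defines __call__ but not __get__ behaves the same way; the Unbound wrapper below is a hypothetical helper for illustration, not a standard library class.

```python
import types

def abs_(value):
    return abs(value)

class Unbound:
    """Hypothetical wrapper: callable, but not a descriptor (no __get__)."""

    def __init__(self, func):
        self.func = func

    def __call__(self, *args, **kwargs):
        return self.func(*args, **kwargs)

class Test:
    func_1 = abs                 # built-in: no __get__, returned as-is
    func_3 = abs_                # Python function: __get__ creates a bound method
    func_4 = staticmethod(abs_)  # staticmethod.__get__ returns the bare function
    func_5 = Unbound(abs_)       # its class has no __get__, so it is never bound

test = Test()
print(hasattr(abs, "__get__"), hasattr(abs_, "__get__"))  # False True
print(isinstance(test.func_3, types.MethodType))          # True
print(test.func_1(-1), test.func_4(-1), test.func_5(-1))  # 1 1 1
```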

compare two custom lists python

I'm having trouble comparing two lists of objects in Python.
I'm converting a message into an instance of this class:
class StatusMessage(object):
    def __init__(self, conversation_id, platform):
        self.__conversation_id = str(conversation_id)
        self.__platform = str(platform)
    @property
    def conversation_id(self):
        return self.__conversation_id
    @property
    def platform(self):
        return self.__platform
Now I create two lists of StatusMessage objects:
>>> expected = []
>>> expected.append(StatusMessage(1, "abc"))
>>> expected.append(StatusMessage(2, "bbc"))
>>> actual = []
>>> actual.append(StatusMessage(1, "abc"))
>>> actual.append(StatusMessage(2, "bbc"))
and then I compare the two lists using
>>> cmp(actual, expected)
or
>>> len(set(expected_messages_list).difference(actual_list)) == 0
I keep getting failures.
When I debug and compare each item within the lists individually, like
>>> actual[0].conversation_id == expected[0].conversation_id
>>> actual[0].platform == expected[0].platform
then I always see
True
But the following returns -1:
>>> cmp(actual[0], expected[0])
Why is this so? What am I missing?
You must tell Python how to check two instances of class StatusMessage for equality.
For example, adding the method
def __eq__(self, other):
    return (self is other) or (self.conversation_id, self.platform) == (other.conversation_id, other.platform)
will have the following effect:
>>> cmp(expected,actual)
0
>>> expected == actual
True
If you want to use cmp with your StatusMessage objects, consider implementing the __lt__ and __gt__ methods as well. I don't know by which rule you want one instance to count as less than or greater than another.
In addition, consider returning False or error-checking for comparing a StatusMessage object with an arbitrary object that has no conversation_id or platform attribute. Otherwise, you will get an AttributeError:
>>> actual[0] == 1
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "a.py", line 16, in __eq__
return (self is other) or (self.conversation_id, self.platform) == (other.conversation_id, other.platform)
AttributeError: 'int' object has no attribute 'conversation_id'
You can find one reason why the self is other check is a good idea here (possibly unexpected results in multithreaded applications).
Because you are trying to compare two custom objects, you have to define what makes the objects equal or not. You do this by defining the __eq__() method on the StatusMessage class:
class StatusMessage(object):
    def __eq__(self, other):
        return (self.conversation_id == other.conversation_id and
                self.platform == other.platform)
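In Python 3, cmp() no longer exists, and defining __eq__ without __hash__ makes instances unhashable, so the set-difference check from the question would raise TypeError. A sketch that combines the suggestions above, assuming equality should depend only on conversation_id and platform: __eq__ returns NotImplemented for non-StatusMessage operands (so actual[0] == 1 is simply False instead of raising AttributeError), and __hash__ is built from the same fields. The _key helper is a hypothetical name introduced for this sketch.

```python
class StatusMessage:
    def __init__(self, conversation_id, platform):
        self.__conversation_id = str(conversation_id)
        self.__platform = str(platform)

    @property
    def conversation_id(self):
        return self.__conversation_id

    @property
    def platform(self):
        return self.__platform

    def _key(self):
        # Hypothetical helper: the fields assumed to define equality.
        return (self.__conversation_id, self.__platform)

    def __eq__(self, other):
        if not isinstance(other, StatusMessage):
            # Python then tries other.__eq__, and finally falls back to identity.
            return NotImplemented
        return self._key() == other._key()

    def __hash__(self):
        # Must agree with __eq__ so instances work in sets and as dict keys.
        return hash(self._key())

expected = [StatusMessage(1, "abc"), StatusMessage(2, "bbc")]
actual = [StatusMessage(1, "abc"), StatusMessage(2, "bbc")]
print(expected == actual)                # True
print(len(set(expected) - set(actual)))  # 0
print(actual[0] == 1)                    # False
```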

Fraction object doesn't have __int__ but int(Fraction(...)) still works

In Python, when you have an object you can convert it to an integer using the int function.
For example int(1.3) will return 1. This works internally by using the __int__ magic method of the object, in this particular case float.__int__.
In Python Fraction objects can be used to construct exact fractions.
from fractions import Fraction
x = Fraction(4, 3)
Fraction objects lack an __int__ method, but you can still call int() on them and get a sensible integer back. I was wondering how this was possible with no __int__ method being defined.
In [38]: x = Fraction(4, 3)
In [39]: int(x)
Out[39]: 1
The __trunc__ method is used.
>>> class X(object):
        def __trunc__(self):
            return 2.
>>> int(X())
2
__float__ does not work:
>>> class X(object):
        def __float__(self):
            return 2.
>>> int(X())
Traceback (most recent call last):
  File "<pyshell#7>", line 1, in <module>
    int(X())
TypeError: int() argument must be a string, a bytes-like object or a number, not 'X'
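The CPython source shows when __trunc__ is used. Note that this describes Python versions before 3.11: since 3.11, Fraction defines __int__ itself, and the int() fallback to __trunc__ is deprecated.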
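A sketch for a hypothetical numeric type Ratio that works with both int() and math.trunc() without relying on the __trunc__ fallback: int() looks for __int__ first, while math.trunc() always calls __trunc__, so defining __int__ in terms of __trunc__ covers both. Ratio is made up for illustration; it is not part of the fractions module.

```python
import math
from fractions import Fraction

class Ratio:
    """Hypothetical minimal numeric type used only for illustration."""

    def __init__(self, num, den):
        self.num, self.den = num, den

    def __trunc__(self):
        # Truncate toward zero, as float and Fraction do.
        q = abs(self.num) // abs(self.den)
        same_sign = (self.num >= 0) == (self.den > 0)
        return q if same_sign else -q

    def __int__(self):
        # int() uses __int__ directly, so no __trunc__ fallback is involved.
        return self.__trunc__()

print(int(Ratio(4, 3)), math.trunc(Ratio(-4, 3)))  # 1 -1
print(int(Fraction(4, 3)))                         # 1
```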

Method inside a method in Python

I have seen source code where more than one method is called on an object, e.g. x.y().z(). Can someone please explain this to me? Does this mean that z() is inside y(), or what?
This calls the method y() on object x, then the method z() is called on the result of y() and that entire line is the result of method z().
For example
friendsFavePizzaTopping = person.getBestFriend().getFavoritePizzaTopping()
This would set friendsFavePizzaTopping to the person's best friend's favorite pizza topping.
Important to note: getBestFriend() must return an object that has the method getFavoritePizzaTopping(). If it does not, an AttributeError will be raised.
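A runnable version of the pizza example, assuming a hypothetical Person class (the method names follow the answer; the class itself is made up). It shows both the left-to-right evaluation and the AttributeError that appears when the first call returns something without the second method, here None.

```python
class Person:
    """Hypothetical class for illustration only."""

    def __init__(self, topping, best_friend=None):
        self._topping = topping
        self._best_friend = best_friend

    def getBestFriend(self):
        return self._best_friend

    def getFavoritePizzaTopping(self):
        return self._topping

alice = Person("mushroom")
bob = Person("pepperoni", best_friend=alice)

# bob.getBestFriend() runs first and returns alice;
# getFavoritePizzaTopping() is then called on alice.
friendsFavePizzaTopping = bob.getBestFriend().getFavoritePizzaTopping()
print(friendsFavePizzaTopping)  # mushroom

# alice has no best friend, so getBestFriend() returns None,
# and None has no getFavoritePizzaTopping attribute.
try:
    alice.getBestFriend().getFavoritePizzaTopping()
except AttributeError as exc:
    print("AttributeError:", exc)
```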
Each method is evaluated in turn, left to right. Consider:
>>> s='HELLO'
>>> s.lower()
'hello'
>>> s='HELLO '
>>> s.lower()
'hello '
>>> s.lower().strip()
'hello'
>>> s.lower().strip().upper()
'HELLO'
>>> s.lower().strip().upper().replace('H', 'h')
'hELLO'
The requirement is that each object in the chain has the method that is called on it next. Often that means the objects are of similar types, or at least share compatible methods or an understood cast.
As an example, consider this class:
class Foo:
    def __init__(self, name):
        self.name=name
    def m1(self):
        return Foo(self.name+'=>m1')
    def m2(self):
        return Foo(self.name+'=>m2')
    def __repr__(self):
        return '{}: {}'.format(id(self), self.name)
    def m3(self):
        return .25 # return is no longer a Foo
Notice that Foo is treated as immutable: each method returns a new object (a new Foo for m1 and m2, or a float for m3). Now try those methods:
>>> foo = Foo('init')
>>> foo
4463545376: init
>>> foo.m1()
4463545304: init=>m1
^^^^ different object id
>>> foo
4463545376: init
^^^^ foo is unchanged, because you need to assign the result to change it
Now assign:
>>> foo=foo.m1().m2()
>>> foo
4464102576: init=>m1=>m2
Now use m3(), and the result will be a float, not a Foo:
>>> foo=foo.m1().m2().m3()
>>> foo
0.25
Now foo is a float, so Foo methods are no longer available:
>>> foo.m1()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'float' object has no attribute 'm1'
But you can use float methods:
>>> foo.as_integer_ratio()
(1, 4)
In the case of:
x.y().z()
You're almost always looking at immutable objects. Mutable objects don't return anything that would HAVE a function like that (for the most part, but I'm simplifying). For instance...
class x:
    def __init__(self):
        self.y_done = False
        self.z_done = False
    def y(self):
        new_x = x()
        new_x.y_done = True
        return new_x
    def z(self):
        new_x = x()
        new_x.z_done = True
        return new_x
You can see that each of x.y and x.z returns an x object. That object is used to make the consecutive call, e.g. in x.y().z(), x.z is not called on x, but on x.y().
x.y().z() =>
tmp = x.y()
result = tmp.z()
In @dawg's excellent example, he's using strings (which are immutable in Python) whose methods return strings.
string = 'hello'
string.upper() # returns a NEW string with value "HELLO"
string.upper().replace("E","O") # returns a NEW string that's based off "HELLO"
string.upper().replace("E","O") + "W"
# "HOLLOW"
The . "operator" is Python syntax for attribute access. x.y is (nearly) identical to
getattr(x, 'y')
so x.y() is (nearly) identical to
getattr(x, 'y')()
(I say "nearly identical" because it's possible to customize attribute access for a user-defined class. From here on out, I'll assume no such customization is done, and you can assume that x.y is in fact identical to getattr(x, 'y').)
If the thing that x.y() returns has an attribute z such that
foo = getattr(x, 'y')
bar = getattr(foo(), 'z')()
is legal, then you can chain the calls together without needing the name foo in the middle:
bar = getattr(getattr(x, 'y')(), 'z')()
Converting back to dot notation gives you
bar = getattr(x.y(), 'z')()
or simply
bar = x.y().z()
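A quick check of the getattr equivalence, assuming two hypothetical classes: Outer stands in for the type of x, and Inner for whatever y() returns. All three spellings perform the same two lookups and two calls.

```python
class Inner:
    """Hypothetical class: the type of object that y() returns."""

    def z(self):
        return "result of z"

class Outer:
    """Hypothetical class standing in for the type of x."""

    def y(self):
        return Inner()

x = Outer()

# Spelled out with a temporary name, nested getattr, and plain dot notation.
foo = getattr(x, 'y')
step_by_step = getattr(foo(), 'z')()
nested = getattr(getattr(x, 'y')(), 'z')()
chained = x.y().z()
print(step_by_step == nested == chained)  # True
```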
x.y().z() means that the object x has a method y(), and the object returned by x.y() has a method z(). If you first want to apply y() to x and then apply z() to the result, you write x.y().z(). This is like:
val = x.y()
result = val.z()
Example:
my_dict = {'key':'value'}
my_dict is a dict object. my_dict.get('key') returns 'value', which is a str object, so now I can call any str method on it, like this:
my_dict.get('key').upper()
This will return 'VALUE'.
That is (sometimes a sign of) bad code.
It violates the Law of Demeter. Here is a quote from Wikipedia explaining what is meant:
Each unit should have only limited knowledge about other units: only units "closely" related to the current unit.
Each unit should only talk to its friends; don't talk to strangers.
Only talk to your immediate friends.
Suppose you have a car, which itself has an engine:
class Car:
    def __init__(self):
        self._engine=None
    @property
    def engine(self):
        return self._engine
    @engine.setter
    def engine(self, value):
        self._engine = value
class Porsche_engine:
    def start(self):
        print("starting")
So if you make a new car and set the engine to Porsche you could do the following:
>>> from car import *
>>> c=Car()
>>> e=Porsche_engine()
>>> c.engine=e
>>> c.engine.start()
starting
If you make this call from another object, that object has knowledge not only of the Car but also of its engine, which is bad design.
Additionally, if you do not know whether a Car has an engine, calling start directly
>>> c=Car()
>>> c.engine.start()
may result in an error:
AttributeError: 'NoneType' object has no attribute 'start'
Edit:
To avoid (further) misunderstandings and misreadings of what I am saying:
There are two usages:
1) As I pointed out, having objects call methods on other objects returned from a third object violates LoD. This is one way to read the question.
2) An exception to that is method chaining, which is not bad design.
A better design would be for the Car itself to have a start() method that delegates to the engine.
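A sketch of that delegation, reusing the Car and Porsche_engine names from the answer. The Car.start() method and the choice to raise RuntimeError when no engine is installed are assumptions for illustration; returning a status value would be an equally reasonable policy. Callers now only talk to the Car.

```python
class Porsche_engine:
    def start(self):
        print("starting")

class Car:
    def __init__(self, engine=None):
        self._engine = engine

    @property
    def engine(self):
        return self._engine

    @engine.setter
    def engine(self, value):
        self._engine = value

    def start(self):
        """Delegate to the engine so callers never reach through the Car."""
        if self._engine is None:
            # Assumed policy: a clear error instead of AttributeError on None.
            raise RuntimeError("car has no engine")
        self._engine.start()

c = Car()
c.engine = Porsche_engine()
c.start()  # prints "starting" without the caller touching c.engine
```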

What is the defined behavior in Python for no return statement being reached?

Given the following Python code:
def avg(a):
    if len(a):
        return sum(a) / len(a)
What is the language-defined behavior of avg when the length of a is zero? Or is its behavior unspecified by the language, and thus not to be relied upon in Python code?
The default return value is None.
From the documentation on Calls:
A call always returns some value, possibly None, unless it raises an exception. How this value is computed depends on the type of the callable object.
If len(a) is 0, that will be treated as a False-like value, and your return statement won't be reached. When the flow of control drops out of the bottom of a function with no explicit return statement being reached, Python functions implicitly return None:
>>> print(avg([]))
None
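To make the rule concrete, a small sketch comparing the question's avg with two variants (the names avg_bare_return and avg_explicit are made up for illustration): falling off the end of the function, a bare return, and return None all hand the caller None.

```python
def avg(a):
    if len(a):
        return sum(a) / len(a)
    # For an empty sequence, control falls off the end: the result is None.

def avg_bare_return(a):
    if len(a):
        return sum(a) / len(a)
    return  # a return with no expression also produces None

def avg_explicit(a):
    if len(a):
        return sum(a) / len(a)
    return None  # spells out what the other two do implicitly

for f in (avg, avg_bare_return, avg_explicit):
    print(f([]), f([1, 2, 3]))  # None 2.0
```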
If len(a) is not defined (in other words, if the object has no __len__() method), you'll get a TypeError:
>>> print(avg(False))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 2, in avg
TypeError: object of type 'bool' has no len()
