Add (+) a custom class object to a numeric object in Python

I know how to use magic methods to define mathematical operators in a custom class.
class CustomDataClass:
    def __init__(self, data):
        self.data = data
    def __str__(self):
        return str(self.data)
    def __add__(self, obj):
        self.data += obj
        return self

a = CustomDataClass(1)
Then in the console you could do
print(a+1)
>>> 2
What if I want to do it in the opposite order?
print(1+a)
This would throw an error. I am curious how I would go about it. How did they do this in pandas, for example?
import pandas as pd
a = pd.Series(1)
print(1+a)
>>> 0    2
dtype: int64

You can implement __radd__, which will be called if the left-hand side's __add__ method returns NotImplemented (which is the case for 1 + a, since int does not know how to add your object).
Quoting the docs on __r*__ methods:
These functions are only called if the left operand does not support the corresponding operation and the operands are of different types. For instance, to evaluate the expression x - y, where y is an instance of a class that has an __rsub__() method, y.__rsub__(x) is called if x.__sub__(y) returns NotImplemented.
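For example, a minimal sketch based on the class from the question (here __add__ returns a new instance instead of mutating self, purely to keep the example side-effect free):

class CustomDataClass:
    def __init__(self, data):
        self.data = data
    def __str__(self):
        return str(self.data)
    def __add__(self, obj):
        return CustomDataClass(self.data + obj)
    def __radd__(self, obj):
        # called for `1 + a`, after int.__add__ returns NotImplemented
        return CustomDataClass(obj + self.data)

a = CustomDataClass(1)
print(a + 1)  # 2
print(1 + a)  # 2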

Related

Modify an attribute of an already defined class in Python (and run its definition again)

I am trying to modify an already defined class by changing an attribute's value. Importantly, I want this change to propagate internally.
For example, consider this class:
class Base:
    x = 1
    y = 2 * x
    # Other attributes and methods might follow

assert Base.x == 1
assert Base.y == 2
I would like to change x to 2, making it equivalent to this.
class Base:
    x = 2
    y = 2 * x

assert Base.x == 2
assert Base.y == 4
But I would like to do it in the following way:
Base = injector(Base, x=2)
Is there a way to achieve this WITHOUT recompiling the original class's source code?
The effect you want to achieve belongs to the realm of "reactive programming" - a programming paradigm (from where the now ubiquitous JavaScript library got its name as an inspiration).
While Python has a lot of mechanisms to allow that, one needs to write their code to actually make use of those mechanisms.
By default, plain Python code like the one in your example uses the imperative paradigm, which is eager: whenever an expression is encountered, it is executed, and the result of that expression is used (in this case, the result is stored in the class attribute).
Python is flexible enough that, once you write a codebase that allows this kind of reactive code to take place, users of your codebase don't have to be aware of it, and things work more or less "magically".
But, as stated above, that is not free. For the case of being able to redefine y when x changes in
class Base:
    x = 1
    y = 2 * x
There are a couple of paths that can be followed, but the most important point is this: at the time the "*" operator is executed (which happens while Python is parsing the class body), at least one side of the operation must no longer be a plain number, but a special object that implements a custom __mul__ (or, in this case, __rmul__) method. Then, instead of storing a resulting number in y, the expression itself is stored somewhere, and when y is retrieved as a class attribute, another mechanism forces the expression to resolve.
If you want this at instance level, rather than at class level, it would be easier to implement. But keep in mind that you'd have to define each operator on your special "source" class for primitive values.
Also, both this and the easier, instance-level descriptor approach using property are "lazily evaluated": that means the value for y is calculated when it is used (it can be cached if it will be used more than once). If you want to evaluate it whenever x is assigned (and not when y is consumed), that will require other mechanisms, although caching the lazy approach can reduce the need for eager evaluation to the point where it should not be needed.
1 - Before digging in
Python's easiest way to do code like this is simply to write the expressions to be calculated as functions, and use the property built-in as a descriptor to retrieve their values. The drawback is small: you just have to wrap your expressions in a function (and then wrap that function in something that will add the descriptor properties to it, such as property). The gain is huge: you are free to use any Python code inside your expression, including function calls, object instantiation, I/O, and the like. (Note that the other approach requires wiring up each desired operator, just to get started.)
The plain "101" approach to have what you want working for instances of Base is:
class Base:
    x = 1

    @property
    def y(self):
        return self.x * 2

b = Base()
b.y
-> 2
Base.x = 3
b.y
-> 6
The work of property can be rewritten so that retrieving y from the class, instead of an instance, achieves the effect as well (this is still easier than the other approach).
If this works for you, I'd recommend doing it. And if you need to cache y's value until x actually changes, that can be done with ordinary code, as in the sketch below.
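A rough sketch of that class-level, cached variant (the ComputedFromX descriptor is my own illustration, not part of the answer):

class ComputedFromX:
    """Hypothetical descriptor: recomputes y from the class's current x,
    caching the result per class until x changes."""
    def __init__(self, func):
        self.func = func
        self._cache = {}  # {cls: (x_value, result)}
    def __get__(self, instance, owner):
        x = owner.x
        cached = self._cache.get(owner)
        if cached is None or cached[0] != x:
            cached = (x, self.func(owner))
            self._cache[owner] = cached
        return cached[1]

class Base:
    x = 1
    y = ComputedFromX(lambda cls: 2 * cls.x)

Base.y   # 2 - works on the class itself, not just on instances
Base.x = 3
Base.y   # 6 - recomputed because x changed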
2 - Exactly what you asked for, with a metaclass
As stated above, Python would need to know about the special status of your y attribute when calculating its expression 2 * x; by assignment time, it is already too late.
Fortunately, Python 3 allows class bodies to run in a custom namespace for the attribute assignments, by implementing the __prepare__ method in a metaclass. That namespace can then record all that takes place and replace primitive attributes of interest with specially crafted objects implementing __mul__ and other special methods.
Going this way could even allow values to be eagerly calculated, so they can work as plain Python objects, while registering enough information that a special injector function could recreate the class, redoing all the attributes that depend on expressions. It could also implement lazy evaluation, somewhat as described above.
from collections import UserDict
import operator

class Reactive:
    def __init__(self, value):
        self._initial_value = value
        self.values = {}

    def __set_name__(self, owner, name):
        self.name = name
        self.values[owner] = self._initial_value

    def __get__(self, instance, owner):
        return self.values[owner]

    def __set__(self, instance, value):
        raise AttributeError("value can't be set directly - call 'injector' to change this value")

    def value(self, cls=None):
        return self.values.get(cls, self._initial_value)

    op1 = value

    @property
    def result(self):
        return self.value

    # dynamically populate magic methods for operation overloading:
    for name in "mul add sub truediv pow contains".split():
        op = getattr(operator, name)
        locals()[f"__{name}__"] = (lambda operator: (lambda self, other: ReactiveExpr(self, other, operator)))(op)
        locals()[f"__r{name}__"] = (lambda operator: (lambda self, other: ReactiveExpr(other, self, operator)))(op)


class ReactiveExpr(Reactive):
    def __init__(self, value, op2, operator):
        self.op2 = op2
        self.operator = operator
        super().__init__(value)

    def result(self, cls):
        op1, op2 = self.op1(cls), self.op2
        if isinstance(op1, Reactive):
            op1 = op1.result(cls)
        if isinstance(op2, Reactive):
            op2 = op2.result(cls)
        return self.operator(op1, op2)

    def __get__(self, instance, owner):
        return self.result(owner)


class AuxDict(UserDict):
    def __init__(self, *args, _parent, **kwargs):
        self.parent = _parent
        super().__init__(*args, **kwargs)

    def __setitem__(self, item, value):
        if isinstance(value, self.parent.reacttypes) and not item.startswith("_"):
            value = Reactive(value)
        super().__setitem__(item, value)


class MetaReact(type):
    reacttypes = (int, float, str, bytes, list, tuple, dict)

    def __prepare__(*args, **kwargs):
        return AuxDict(_parent=__class__)

    def __new__(mcls, name, bases, ns, **kwargs):
        pre_registry = {}
        cls = super().__new__(mcls, name, bases, ns.data, **kwargs)
        # for name, obj in ns.items():
        #     if isinstance(obj, ReactiveExpr):
        #         pre_registry[name] = obj
        #         setattr(cls, name, obj.result())
        for name, reactive in pre_registry.items():
            _registry[cls, name] = reactive
        return cls


def injector(cls, inplace=False, **kwargs):
    original = cls
    if not inplace:
        cls = type(cls.__name__, cls.__bases__, dict(cls.__dict__))
    for name, attr in cls.__dict__.items():
        if isinstance(attr, Reactive):
            if isinstance(attr, ReactiveExpr) and name in kwargs:
                raise AttributeError("Expression attributes can't be modified by injector")
            attr.values[cls] = kwargs.get(name, attr.values[original])
    return cls


class Base(metaclass=MetaReact):
    x = 1
    y = 2 * x
And, after pasting the snippet above in a REPL, here is the result of using injector:
In [97]: Base2 = injector(Base, x=5)
In [98]: Base2.y
Out[98]: 10
The idea is complicated by the fact that the Base class is declared with dependent, dynamically evaluated attributes. While we can inspect a class's static attributes, I think there's no way of recovering the dynamic expression other than parsing the class's source code, finding and replacing the "injected" attribute name with its value, and exec/eval-ing the definition again. But that's the route you wanted to avoid (moreover, you presumably expect injector to work uniformly for all classes).
If you want to rely on dynamically evaluated attributes, define the dependent attribute as a lambda function.
class Base:
    x = 1
    y = lambda: 2 * Base.x

Base.x = 2
print(Base.y())  # 4

Special method like __str__ that returns a number representation of an object

Say I have a Python class as follows:
class TestClass():
    value = 20

    def __str__(self):
        return str(self.value)
The __str__ method will automatically be called any time I try to use an instance of TestClass as a string, like in print. Is there any equivalent for treating it as a number? For example, in
an_object = TestClass()
if an_object > 30:
    ...
where some hypothetical __num__ function would be automatically called to interpret the object as a number. How could this be easily done?
Ideally I'd like to avoid overloading every normal mathematical operator.
You can provide __float__(), __int__(), and/or __complex__() methods to convert objects to numbers. There is also a __round__() method you can provide for custom rounding; these are documented in the data model reference. The __bool__() method technically fits here too, since Booleans are a subclass of integers in Python.
While Python does implicitly convert objects to strings for e.g. print(), it never converts objects to numbers without you saying to. Thus, Foo() + 42 isn't valid just because Foo has an __int__ method. You have to explicitly use int() or float() or complex() on them. At least that way, you know what you're getting just by reading the code.
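For illustration, a minimal sketch (the class name is made up) showing that the conversion is always explicit: float(obj) goes through __float__, but obj + 1 still fails.

class Metres:
    def __init__(self, value):
        self.value = value
    def __float__(self):
        return float(self.value)
    def __int__(self):
        return int(self.value)

m = Metres(2.5)
float(m)   # 2.5 - explicit conversion via __float__
int(m)     # 2   - explicit conversion via __int__
# m + 1    # TypeError: __float__ alone does not make arithmetic work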
To get classes to actually behave like numbers, you have to implement all the special methods for the operations that numbers participate in, including arithmetic and comparisons. As you note, this gets annoying. You can, however, write a mixin class so that at least you only have to write it once. Such as:
class NumberMixin(object):
    def __eq__(self, other): return self.__num__() == self.__getval__(other)
    # other comparison methods
    def __add__(self, other): return self.__num__() + self.__getval__(other)
    def __radd__(self, other): return self.__getval__(other) + self.__num__()
    # etc., I'm not going to write them all out, are you crazy?
This class expects two special methods on the class it's mixed in with.
__num__() - converts self to a number. Usually this will be an alias for the conversion method for the most precise type supported by the object. For example, your class might have __int__() and __float__() methods, but __int__() will truncate the number, so you assign __num__ = __float__ in your class definition. On the other hand, if your class has a natural integral value, you might want to provide __float__ so it can also be converted to a float, but you'd use __num__ = __int__ since it should behave like an integer.
__getval__() - a static method that obtains the numeric value from another object. This is useful when you want to be able to support operations with objects other than numeric types. For example, when comparing, you might want to be able to compare to objects of your own type, as well as to traditional numeric types. You can write __getval__() to fish out the right attribute or call the right method of those other objects. Of course with your own instances you can just rely on float() to do the right thing, but __getval__() lets you be as flexible as you like in what you accept.
A simple example class using this mixin:
class FauxFloat(NumberMixin):
    def __init__(self, value): self.value = float(value)
    def __int__(self): return int(self.value)
    def __float__(self): return float(self.value)
    def __round__(self, digits=0): return round(self.value, digits)
    def __str__(self): return str(self.value)
    __repr__ = __str__
    __num__ = __float__

    @staticmethod
    def __getval__(obj):
        if isinstance(obj, FauxFloat):
            return float(obj)
        if hasattr(type(obj), "__num__") and callable(type(obj).__num__):
            return type(obj).__num__(obj)  # don't call dunder method on instance
        try:
            return float(obj)
        except TypeError:
            return int(obj)

ff = FauxFloat(42)
print(ff + 13)  # 55.0
For extra credit, you could register your class so it'll be seen as a subclass of an appropriate abstract base class:
import numbers
numbers.Real.register(FauxFloat)
issubclass(FauxFloat, numbers.Real) # True
For extra extra credit, you might also create a global num() function that calls __num__() on objects that have it, otherwise falling back to the older methods.
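A minimal sketch of such a num() helper (the name and the fallback order are assumptions, not a standard API):

def num(obj):
    cls = type(obj)
    if hasattr(cls, "__num__"):
        return cls.__num__(obj)  # call the dunder on the type, not the instance
    try:
        return float(obj)
    except TypeError:
        return int(obj)

num(FauxFloat(3.5))  # 3.5
num(7)               # 7.0 (falls back to float())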
In the case of numbers it's a bit more complicated, but it's possible! You have to override your class's comparison operators to fit your needs.
operator.__lt__(a, b)  # less than
operator.__le__(a, b)  # less than or equal
operator.__eq__(a, b)  # equal
operator.__ne__(a, b)  # not equal
operator.__ge__(a, b)  # greater than or equal
operator.__gt__(a, b)  # greater than
Python Operators
Looks like you need the __gt__ method.
class A:
    val = 0

    def __gt__(self, other):
        return self.val > other

a = A()
a.val = 12
a > 10
If you just want to cast the object to int, you should define the __int__ method (or __float__).
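A minimal sketch of that cast-only approach (the Distance class is just an illustration):

class Distance:
    def __init__(self, metres):
        self.metres = metres
    def __int__(self):
        return int(self.metres)

d = Distance(12.7)
int(d)       # 12
int(d) > 10  # True - compare after an explicit cast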

Overriding special methods on builtin types

Can magic methods be overridden outside of a class?
When I do something like this
def __int__(x):
    return x + 5

a = 5
print(int(a))
it prints '5' instead of '10'. Am I doing something wrong, or can magic methods simply not be overridden outside of a class?
Short answer: not really.
You cannot arbitrarily change the behaviour of int(), a builtin function (which internally calls __int__()), on builtin types such as int.
You can however change the behaviour of custom objects like this:
Example:
class Foo(object):
    def __init__(self, value):
        self.value = value

    def __add__(self, other):
        self.value += other

    def __repr__(self):
        return "<Foo(value={0:d})>".format(self.value)
Demo:
>>> x = Foo(5)
>>> x + 5
>>> x
<Foo(value=10)>
This example implements two special methods:
__repr__(), which gets called by repr()
__add__(), which gets called by the + operator.
Update: As per the comments above; technically you can redefine the builtin function int. Example:
def int(x):
    return x + 5

int(5)  # returns 10
However, this is not recommended and does not change the actual behaviour of the object x.
Update #2: The reason you cannot change the behaviour of builtin types (without modifying the underlying source or using Cython or ctypes) is that builtin types in Python are not exposed or mutable to the user, unlike in homoiconic languages (see: Homoiconicity). -- Even then I'm not really sure you can with Cython/ctypes; but the real question is "Why do you want to do this?"
Update #3: See Python's documentation on Data Model (object.__complex__ for example).
You can redefine a top-level __int__ function, but nobody ever calls that.
As implied in the Data Model documentation, when you write int(x), that calls x.__int__(), not __int__(x).
And even that isn't really true. First, __int__ is a special method, meaning it's allowed to call type(x).__int__(x) rather than x.__int__(), but that doesn't matter here. Second, it's not required to call __int__ unless you give it something that isn't already an int (and call it with the one-argument form). So, it's as if it were written like this:
def int(x, base=None):
    if base is not None:
        return do_basey_stuff(x, base)
    if isinstance(x, int):
        return x
    return type(x).__int__(x)
So, there is no way to change what int(5) will do… short of just shadowing the builtin int function with a different builtin/global/local function of the same name, of course.
But what if you wanted to, say, change int(5.5)? That's not an int, so it's going to call float.__int__(5.5). So, all we have to do is monkeypatch that, right?
Well, yes, except that Python allows builtin types to be immutable, and most of the builtin types in CPython are. So, if you try it:
>>> _real_float_int = float.__int__
>>> def _float_int(self):
...     return _real_float_int(self) + 5
>>> _float_int(5.5)
10
>>> float.__int__ = _float_int
TypeError: can't set attributes of built-in/extension type 'float'
However, if you're defining your own types, that's a different story:
>>> class MyFloat(float):
...     def __int__(self):
...         return super().__int__() + 5
>>> f = MyFloat(5.5)
>>> int(f)
10

How can I delegate to the __add__ method of a superclass?

Say I have this class:
class MyString(str):
    def someExtraMethod(self):
        pass
I'd like to be able to do
a = MyString("Hello ")
b = MyString("World")
(a + b).someExtraMethod()
("a" + b).someExtraMethod()
(a + "b").someExtraMethod()
Running as is:
AttributeError: 'str' object has no attribute 'someExtraMethod'
Obviously that doesn't work. So I add this:
def __add__(self, other):
    return MyString(super(MyString, self) + other)

def __radd__(self, other):
    return MyString(other + super(MyString, self))
TypeError: cannot concatenate 'str' and 'super' objects
Hmmm, ok. super doesn't appear to respect operator overloading. Perhaps:
def __add__(self, other):
    return MyString(super(MyString, self).__add__(other))

def __radd__(self, other):
    return MyString(super(MyString, self).__radd__(other))
AttributeError: 'super' object has no attribute '__radd__'
Still no luck. What should I be doing here?
I usually use the super(MyClass, self).__method__(other) syntax; in your case this does not work because str does not provide __radd__. But you can convert the instance of your class into a string using str.
To whoever said that their version 3 works: it does not:
>>> class MyString(str):
...     def __add__(self, other):
...         print 'called'
...         return MyString(super(MyString, self).__add__(other))
...
>>> 'test' + MyString('test')
'testtest'
>>> ('test' + MyString('test')).__class__
<type 'str'>
And if you implement __radd__ that way, you get an AttributeError (see the note below).
Anyway, I'd avoid using built-ins as base types. As you can see, some details may be tricky, and you must also redefine all the operations that they support, otherwise the result will become an instance of the built-in and not an instance of your class.
I think in most cases it's easier to use delegation instead of inheritance.
Also, if you simply want to add a single method, then you may try to use a function on plain strings instead.
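For the single-method case, that could be as small as (sketch):

def some_extra_method(s):
    # works on any str, including the result of ordinary concatenation
    print(s)

some_extra_method("Hello " + "World")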
I add here a bit of explanation about why str does not provide __radd__ and what's going on when Python executes a BINARY_ADD opcode (the one that does the +).
The absence of str.__radd__ is due to the fact that string objects implement the concatenation operator of sequences and not the numeric addition operation. The same is true for the other sequences such as list or tuple. These are the same at the "Python level" but actually occupy two different "slots" in the C structures.
The numeric + operator, which in Python is defined by two different methods (__add__ and __radd__), is actually a single C function that is called with swapped arguments to simulate a call to __radd__.
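That asymmetry is visible from Python itself:

hasattr(str, '__add__')   # True  - concatenation
hasattr(str, '__radd__')  # False - no reflected method on str
hasattr(int, '__radd__')  # True  - numbers provide both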
Now, you could think that implementing only MyString.__add__ would fix your problem, since str does not implement __radd__, but that's not true:
>>> class MyString(str):
...     def __add__(self, s):
...         print '__add__'
...         return MyString(str(self) + s)
...
>>> 'test' + MyString('test')
'testtest'
As you can see MyString.__add__ is not called, but if we swap arguments:
>>> MyString('test') + 'test'
__add__
'testtest'
It is called, so what's happening?
The answer is in the documentation which states that:
For objects x and y, first x.__op__(y) is tried. If this is not implemented or returns NotImplemented, y.__rop__(x) is tried. If this is also not implemented or returns NotImplemented, a TypeError exception is raised. But see the following exception:
Exception to the previous item: if the left operand is an instance of a built-in type or a new-style class, and the right operand is an instance of a proper subclass of that type or class and overrides the base's __rop__() method, the right operand's __rop__() method is tried before the left operand's __op__() method.
This is done so that a subclass can completely override binary operators. Otherwise, the left operand's __op__() method would always accept the right operand: when an instance of a given class is expected, an instance of a subclass of that class is always acceptable.
Which means you must implement all the methods of str and the __r*__ methods, otherwise you'll still have problems with argument order.
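One way to avoid writing every wrapper by hand is to generate them. A rough sketch (the set of wrapped methods below is illustrative, not exhaustive):

class MyString(str):
    def someExtraMethod(self):
        print(self)

def _returning_mystring(name):
    str_method = getattr(str, name)
    def wrapper(self, *args, **kwargs):
        result = str_method(self, *args, **kwargs)
        return MyString(result) if type(result) is str else result
    return wrapper

# wrap a handful of str methods so they hand back MyString instead of str
for _name in ("__add__", "upper", "lower", "strip", "replace"):
    setattr(MyString, _name, _returning_mystring(_name))

# str has no __radd__ slot, so the reflected case still needs its own method
MyString.__radd__ = lambda self, other: MyString(other + str(self))

(MyString("Hello ") + "World").someExtraMethod()
("Hello " + MyString("World")).someExtraMethod()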
Maybe:
class MyString(str):
    def m(self):
        print(self)

    def __add__(self, other):
        return MyString(str(self) + other)

    def __radd__(self, other):
        return MyString(other + str(self))

a = MyString("Hello ")
b = MyString("World")
(a + b).m()
Update: your last version with super works for me.

Reuse existing objects for immutable objects?

In Python, how is it possible to reuse existing equal immutable objects (as is done for str)? Can this be done just by defining a __hash__ method, or does it require more complicated measures?
If you want to create via the class constructor and have it return a previously created object then you will need to provide a __new__ method (because by the time you get to __init__ the object has already been created).
Here is a simple example - if the value used to initialise has been seen before then a previously created object is returned rather than a new one created:
class Cached(object):
    """Simple example of immutable object reuse."""

    def __init__(self, i):
        self.i = i

    def __new__(cls, i, _cache={}):
        try:
            return _cache[i]
        except KeyError:
            # you must call __new__ on the base class
            x = super(Cached, cls).__new__(cls)
            x.__init__(i)
            _cache[i] = x
            return x
Note that for this example you can use anything to initialise as long as it's hashable. And just to show that objects really are being reused:
>>> a = Cached(100)
>>> b = Cached(200)
>>> c = Cached(100)
>>> a is b
False
>>> a is c
True
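One practical refinement (my own suggestion, not part of the answer above): using a weakref.WeakValueDictionary as the cache lets cached instances be garbage-collected once nothing else references them. A rough sketch:

import weakref

class Cached:
    _cache = weakref.WeakValueDictionary()

    def __new__(cls, i):
        obj = cls._cache.get(i)
        if obj is None:
            obj = super().__new__(cls)
            obj.i = i
            cls._cache[i] = obj
        return obj

a = Cached(100)
b = Cached(100)
a is b  # True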
There are two 'software engineering' solutions to this that don't require any low-level knowledge of Python. They apply in the following scenarios:
First Scenario: Objects of your class are 'equal' if they are constructed with the same constructor parameters, and equality won't change over time after construction. Solution: Use a factory that hashes the constructor parameters:
class MyClass:
    def __init__(self, someint, someotherint):
        self.a = someint
        self.b = someotherint

cachedict = {}

def construct_myobject(someint, someotherint):
    if (someint, someotherint) not in cachedict:
        cachedict[(someint, someotherint)] = MyClass(someint, someotherint)
    return cachedict[(someint, someotherint)]
This approach essentially limits the instances of your class to one unique object per distinct input pair. There are obvious drawbacks as well: not all types are easily hashable and so on.
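If the constructor arguments are hashable, the same factory idea can also be expressed with functools.lru_cache instead of a hand-rolled dict; a minimal sketch:

from functools import lru_cache

class MyClass:
    def __init__(self, someint, someotherint):
        self.a = someint
        self.b = someotherint

@lru_cache(maxsize=None)
def construct_myobject(someint, someotherint):
    return MyClass(someint, someotherint)

construct_myobject(1, 2) is construct_myobject(1, 2)  # True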
Second Scenario: Objects of your class are mutable and their 'equality' may change over time. Solution: define a class-level registry of equal instances:
class MyClass:
    registry = {}

    def __init__(self, someint, someotherint, third):
        MyClass.registry[id(self)] = (someint, someotherint)
        self.someint = someint
        self.someotherint = someotherint
        self.third = third

    def __eq__(self, other):
        return MyClass.registry[id(self)] == MyClass.registry[id(other)]

    def update(self, someint, someotherint):
        MyClass.registry[id(self)] = (someint, someotherint)
In this example, objects with the same someint, someotherint pair are equal, while the third parameter does not factor in. The trick is to keep the parameters in registry in sync. As an alternative to update, you could override __setattr__ (and, if needed, __getattr__) for your class instead; this would ensure that any assignment like foo.someint = y is kept in sync with your class-level dictionary (see the sketch below).
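A minimal sketch of that __setattr__ variant (the attribute names are just the ones used above): any assignment to one of the equality-relevant attributes updates the registry automatically.

class MyClass:
    registry = {}
    _tracked = ("someint", "someotherint")

    def __init__(self, someint, someotherint, third):
        self.someint = someint
        self.someotherint = someotherint
        self.third = third

    def __setattr__(self, name, value):
        super().__setattr__(name, value)
        if name in self._tracked:
            MyClass.registry[id(self)] = tuple(
                getattr(self, n, None) for n in self._tracked
            )

    def __eq__(self, other):
        return MyClass.registry[id(self)] == MyClass.registry[id(other)]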
I believe you would have to keep a dict {args: object} of instances already created, then override the class' __new__ method to check in that dictionary, and return the relevant object if it already existed. Note that I haven't implemented or tested this idea. Of course, strings are handled at the C level.
