Suppose I want a wrapper class Image around a numpy array. My goal is to let it behave just like a 2D array but with some additional functionality (which is not important here). I am doing this because inheriting from numpy's ndarray is much more troublesome.
import numpy as np

class Image(object):
    def __init__(self, data: np.ndarray):
        self._data = np.array(data)

    def __getitem__(self, item):
        return self._data.__getitem__(item)

    def __setitem__(self, key, value):
        self._data.__setitem__(key, value)

    def __getattr__(self, item):
        # delegates array's attributes and methods, except dunders.
        try:
            return getattr(self._data, item)
        except AttributeError:
            raise AttributeError()

    # binary operations
    def __add__(self, other):
        return Image(self._data.__add__(other))

    def __sub__(self, other):
        return Image(self._data.__sub__(other))

    # many more follow ... How to avoid this redundancy?
As you can see, I want all the magic methods for numeric operations, just like a normal numpy array, but with the return values being Image instances. The implementations of these magic methods (__add__, __sub__, __truediv__, and so on) are almost identical, which feels silly. My question is whether there is a way to avoid this redundancy.
Beyond what specifically I am doing here, is there a way to code up the magic methods in one place via some meta-programming technique, or is it just impossible? I have read a bit about Python metaclasses, but it's still not clear to me.
Note that __getattr__ won't handle delegation for magic methods. See this.
Edit
Just to clarify, I understand inheritance is a general solution for a problem like this, though my experience is very limited. But I feel that inheriting from numpy's ndarray really isn't a good idea, because an ndarray subclass needs to handle view casting and ufuncs (see this), and when you use your subclass in other Python libraries you also need to think about how your array subclass gets along with other array subclasses. See my stupid gh-issue. That's why I am looking for alternatives.
The magic methods are always looked up on the class and bypass __getattribute__ entirely, so you must define them in the class. https://docs.python.org/3/reference/datamodel.html#special-lookup
However, you can save yourself some typing:
import operator

def make_bin_op(oper):
    def op(self, other):
        if isinstance(other, Image):
            return Image(oper(self._data, other._data))
        else:
            return Image(oper(self._data, other))
    return op

class Image:
    ...

    __add__ = make_bin_op(operator.add)
    __sub__ = make_bin_op(operator.sub)
If you want, you could make a dict of operator dunder names and the corresponding operators and add them all with a class decorator, e.g.
OPER_DICT = {'__add__': operator.add, '__sub__': operator.sub, ...}

def add_operators(cls):
    for k, v in OPER_DICT.items():
        setattr(cls, k, make_bin_op(v))
    return cls  # a class decorator must give the class back

@add_operators
class Image:
    ...
You could use a metaclass to do the same thing. However, you probably don't want to use a metaclass unless you really understand what's going on.
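For completeness, here is a minimal sketch of the metaclass variant. The name OperatorInjectorMeta is made up for illustration, and it reuses the make_bin_op helper and OPER_DICT from above; the effect is exactly the same as the class decorator:

import numpy as np

class OperatorInjectorMeta(type):
    def __new__(mcls, name, bases, namespace):
        cls = super().__new__(mcls, name, bases, namespace)
        for dunder, oper in OPER_DICT.items():   # OPER_DICT / make_bin_op as defined above
            setattr(cls, dunder, make_bin_op(oper))
        return cls

class Image(metaclass=OperatorInjectorMeta):
    def __init__(self, data):
        self._data = np.array(data)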
What you're after is the concept called inheritance, a key part of object-oriented programming (see Wikipedia here).
When you define your class with class Image(object):, that means Image is a subclass of object, which is a built-in type that does very little. Your functionality is added on to that more-or-less blank concept. But if instead you defined your class with class Image(np.ndarray):, then Image would be a subclass of ndarray, which means it would inherit all the default functionality of the ndarray class. Essentially, any class method you want to leave as-is you simply shouldn't redefine. If you don't write a __getitem__ function, it uses the one defined on ndarray.
If you need to add additional functionality in any of those methods, you can still redefine them (called overriding) and then use super().__getitem__ (or whatever) to access the function defined in the parent class. This often happens with __init__, for example.
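As a minimal illustration of overriding plus super() (a sketch using a plain list subclass, since the mechanics are the same for any parent class; the name LoudList is made up):

class LoudList(list):
    def __getitem__(self, index):
        print("fetching item", index)
        return super().__getitem__(index)   # fall back to list's own implementation

items = LoudList([10, 20, 30])
items[0]   # prints "fetching item 0" and returns 10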
For a more thorough explanation, take a look at the chapter on inheritance in Think Python.
Related
I want to define a pair of classes that are almost identical, except that the class methods are decorated in two different ways. Currently, I just have a factory function that takes the decorator as an argument, constructs the class using that decorator, and returns the class. Greatly simplified, something like this works:
# Defined in mymodule.py
def class_factory(decorator):
    class C:
        @decorator
        def fancy_func(self, x):
            # some fanciness
            return x
    return C

C1 = class_factory(decorator1)
C2 = class_factory(decorator2)
And I can use these as usual:
import mymodule
c1 = mymodule.C1()
c2 = mymodule.C2()
I'm not entirely comfortable with this, for a number of reasons. First, a purely aesthetic reason: the types of both objects display as mymodule.class_factory.<locals>.C. They're not actually identical, but they look like it, and it causes problems with the documentation. Second, my class is pretty complicated. I'd actually like to use inheritance and mixins and so on, but in any case, those other classes also need access to the decorators. So currently, I make several factories, and call the parent class factories inside the child class factory, and the child inherits from the parents created in this way. But this means I can't really use the resulting parents as classes outside the factory.
So my questions are
Is there a better design pattern for this sort of thing? It would be really convenient if there were some way to use inheritance, where the decorators are actually methods in a class, and I inherit in two different ways.
Is there anything wrong with changing the <locals> part of the class name by just altering C.__qualname__ before returning?
To be a bit more specific: I want one version of the class to work extremely quickly with numpy arrays, and I want another version of the class to work with arbitrary python objects — especially sympy expressions. So for the first, I decorate with @numba.guvectorize (and relatives). This means I actually need to pass numba some signatures, so I can't just rely on numba falling back to object mode for the second case. But for simplicity, I think we can ignore the issue of signatures here. For the second case, I basically make a no-op decorator that ignores signatures and does nothing to the function.
Here's an approach using __init_subclass__. I use keyword arguments here, but you could easily change it so the decorators are defined as methods on C1 and C2 and are applied in __init_subclass__.
def passthru(f):
    return f

class BaseC:
    def __init_subclass__(cls, /, decorator=passthru, **kwargs):
        super().__init_subclass__(**kwargs)
        # if you also have class attributes or methods you don't want to decorate,
        # you might need to maintain an explicit list of decoratable methods
        for attr in dir(cls):
            if not attr.startswith('__'):
                setattr(cls, attr, decorator(getattr(cls, attr)))

    def fancy_func(self, x):
        # some fanciness
        return x

def two(f):
    return lambda self, x: "surprise"

class C1(BaseC):
    pass

class C2(BaseC, decorator=two):
    pass

print(C1().fancy_func(42))
print(C2().fancy_func(42))

# further subclassing
class C3(C2):
    pass

print(C3().fancy_func(42))
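Run as-is, this prints 42 and then surprise twice: C1 keeps the undecorated fancy_func, while C2 and its subclass C3 get the replacement installed by the two decorator.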
I took @Jasmijn's suggestion of using __init_subclass__. But since I really need multiple decorators (jit, guvectorize, and sometimes neither, even when using numba with other methods), I tweaked it a little. Rather than jitting every public method, I use decorators to flag methods with attributes explaining how to compile them.
I decorate the individual methods much like I would have originally, indicating whether to jit or whatnot. But these decorators don't actually do any compilation; they just add hidden attributes to the functions indicating whether and how to apply the actual decorators. Then, when a subclass is created, __init_subclass__ loops through, looking for these attributes on all the subclass's methods, and applying any requested compilation.
I turn this into a pretty general class, named Jitter below. Any class that wants the option of jitting in multiple ways can just inherit from this class and decorate methods with Jitter.jit or Jitter.guvectorize. By default, nothing much happens to those functions, so the first child class of Jitter can be used with sympy, for example. But I can also inherit from such a class while adding the relevant keyword(s) to the class definition, enabling jitting in the subclass. Here's the Jitter class:
class Jitter:
    def jit(f):
        f._jit = True
        return f

    def guvectorize(*args, **kwargs):
        def wrapper(f):
            f._guvectorize = (args, kwargs)
            return f
        return wrapper

    def __init_subclass__(cls, /, jit=None, guvectorize=None, **kwargs):
        super().__init_subclass__(**kwargs)
        for attr_name in dir(cls):
            attr = getattr(cls, attr_name)
            if jit is not None and hasattr(attr, '_jit'):
                setattr(cls, attr_name, jit(attr))
            elif guvectorize is not None and hasattr(attr, '_guvectorize'):
                args, kwargs = getattr(attr, '_guvectorize')
                setattr(cls, attr_name, guvectorize(*args, **kwargs)(attr))
Now, I can inherit from this class very conveniently:
import numba as nb

class Adder(Jitter):
    @Jitter.jit
    def add(x, y):
        return x + y

class NumbaAdder(Adder, jit=nb.njit):
    pass
Here, Adder.add is a regular python function that just happens to have a _jit attribute, but NumbaAdder.add is a numba jit function. For more realistic code, I would use the same Jitter class and the same NumbaAdder class, but would put all the complexity into the Adder class.
Note that we could decorate with Adder.jit, but this would be precisely the same as decorating with Jitter.jit, because Adder.jit doesn't get changed (if at all) until after the decorators in the class definition have already been applied, so we still need to loop through and apply the jit functions with __init_subclass__.
I would like to create a class which defines a particular interface, and then require all subclasses to conform to this interface. For example, I would like to define a class
class Interface:
    def __init__(self, arg1):
        pass

    def foo(self, bar):
        pass
and then be assured that if I am holding any object a of type A, a subclass of Interface, then I can call a.foo(2) and it will work.
It looked like this question almost addressed the problem, but in that case it is up to the subclass to explicitly change its metaclass.
Ideally what I'm looking for is something similar to Traits and Impls from Rust, where I can specify a particular Trait and a list of methods that trait needs to define, and then I can be assured that any object with that Trait has those methods defined.
Is there any way to do this in Python?
So, first, just to state the obvious: Python has a built-in mechanism to test for the existence of methods and attributes in derived classes - it just does not check their signatures.
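For example, the standard abc machinery verifies that abstract methods exist, but only at instantiation time and without looking at their signatures; a minimal sketch:

from abc import ABC, abstractmethod

class Interface(ABC):
    @abstractmethod
    def foo(self, bar): ...

class A(Interface):
    pass   # foo not implemented

A()   # TypeError: can't instantiate abstract class A with abstract method foo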
Second, a nice package to look at is zope.interface. Despite the zope namespace, it is a completely stand-alone package that allows really neat ways of having objects that can expose multiple interfaces, but only when needed - and then frees up the namespaces. It does involve some learning until one gets used to it, but it can be quite powerful and provides very nice patterns for large projects.
It was devised for Python 2, when Python had a lot fewer features than it does nowadays - and I think it does not perform automatic interface checking (one has to manually call a method to find out whether a class is compliant) - but automating this call would be easy, nonetheless.
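For illustration, a rough sketch of that kind of manual check, assuming zope.interface is installed (verifyClass raises zope.interface.Invalid if a declared method is missing or has an incompatible signature):

from zope.interface import Interface, implementer
from zope.interface.verify import verifyClass

class IGreeter(Interface):
    def greet(name):
        """Return a greeting for name."""

@implementer(IGreeter)
class Greeter:
    def greet(self, name):
        return "hello " + name

verifyClass(IGreeter, Greeter)   # passes; raises on a non-compliant class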
Third, the linked accepted answer at How to enforce method signature for child classes? almost works, and could be good enough with just one change. The problem with that example is that it hardcodes a call to type to create the new class, and does not pass type.__new__ the information about the metaclass itself. Replace the line:
return type(name, baseClasses, d)
with:
return super().__new__(cls, name, baseClasses, d)
And then make the base class - the one defining your required methods - use the metaclass; it will be inherited normally by any subclasses. (Just use Python 3's syntax for specifying metaclasses.)
Sorry - that example is Python 2; it requires a change in another line as well, so I had better repost it:
import inspect
from types import FunctionType

class BadSignatureException(Exception):
    pass

# from https://stackoverflow.com/a/23257774/108205
class SignatureCheckerMeta(type):
    def __new__(mcls, name, baseClasses, d):
        # For each method in d, check to see if any base class already
        # defined a method with that name. If so, make sure the
        # signatures are the same.
        for methodName in d:
            f = d[methodName]
            if not isinstance(f, FunctionType):
                continue  # skip non-function attributes such as __qualname__
            for baseClass in baseClasses:
                try:
                    fBase = getattr(baseClass, methodName)
                    # use inspect.getfullargspec on Python 3.11+, where getargspec was removed
                    if not inspect.getargspec(f) == inspect.getargspec(fBase):
                        raise BadSignatureException(str(methodName))
                except AttributeError:
                    # This method was not defined in this base class,
                    # so just go to the next base class.
                    continue
        return super().__new__(mcls, name, baseClasses, d)
On reviewing that, I see that there is no mechanism in it to enforce that a method is actually implemented. I.e. if a method with the same name exists in the derived class, its signature is enforced, but if it does not exist at all in the derived class, the code above won't find out about it (and the method on the superclass will be called - that might be a desired behavior).
The answer:
Fourth - Although that will work, it can be a bit rough, since it means any method that overrides another method in any superclass will have to conform to its signature, and even compatible signatures would break. Maybe it would be nicer to build upon the existing ABCMeta and @abstractmethod mechanisms, as those already handle all the corner cases. Note, however, that the example below builds on the code above and checks signatures at class creation time, while the abstractmethod mechanism in Python performs its check when the class is instantiated. Leaving that untouched lets you work with a large class hierarchy which keeps some abstract methods in intermediate classes, so that only the final, concrete classes have to implement all methods.
Just use this instead of ABCMeta as the metaclass for your interface classes, and mark the methods whose signatures you want checked as @abstractmethod as usual.
import inspect
from abc import ABCMeta

class M(ABCMeta):
    def __init__(cls, name, bases, attrs):
        errors = []
        for base_cls in bases:
            for meth_name in getattr(base_cls, "__abstractmethods__", ()):
                orig_argspec = inspect.getfullargspec(getattr(base_cls, meth_name))
                target_argspec = inspect.getfullargspec(getattr(cls, meth_name))
                if orig_argspec != target_argspec:
                    errors.append(f"Abstract method {meth_name!r} not implemented with correct signature in {cls.__name__!r}. Expected {orig_argspec}.")
        if errors:
            raise TypeError("\n".join(errors))
        super().__init__(name, bases, attrs)
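A quick sketch of how it is used (class names are only for illustration); the signature mismatch is reported when the offending class is defined:

from abc import abstractmethod

class Interface(metaclass=M):
    @abstractmethod
    def foo(self, bar):
        pass

class Good(Interface):
    def foo(self, bar):            # same signature: accepted
        return bar

class Bad(Interface):              # TypeError raised here, at class creation time
    def foo(self, bar, extra):
        return bar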
You could follow the pyspark pattern, where the method of the base class performs (optional) argument validity checking, and then calls a "non-public" method of the subclass, for example:
class Regressor():
    def fit(self, X, y):
        self._check_arguments(X, y)
        self._fit(X, y)

    def _check_arguments(self, X, y):
        if True:
            pass
        else:
            raise ValueError('Invalid arguments.')

class LinearRegressor(Regressor):
    def _fit(self, X, y):
        # code here
        ...
I wrote a class that can handle integers with arbitrary precision (just for learning purposes). The class takes a string representation of an integer and converts it into an instance of BigInt for further calculations.
Oftentimes you need the numbers zero and one, so I thought it would be helpful if the class could return these. I tried the following:
class BigInt():
    zero = BigInt("0")

    def __init__(self, value):
        ####yada-yada####
This doesn't work. Error: "name 'BigInt' is not defined"
Then I tried the following:
class BigInt():
    __zero = None

    @staticmethod
    def zero():
        if BigInt.__zero is None:
            BigInt.__zero = BigInt('0')
        return BigInt.__zero

    def __init__(self, value):
        ####yada-yada####
This actually works very well. What I don't like is that zero is a method (and thus has to be called with BigInt.zero()) which is counterintuitive since it should just refer to a fixed value.
So I tried changing zero to become a property, but then writing BigInt.zero returns an instance of the class property instead of BigInt because of the decorator used. That instance cannot be used for calculations because of the wrong type.
Is there a way around this issue?
A static property...? We call a static property an "attribute". This is not Java; Python is a dynamically typed language, and such a construct would really be overcomplicating matters.
Just do this, setting a class attribute:
class BigInt:
    def __init__(self, value):
        ...

BigInt.zero = BigInt("0")
If you want it to be entirely defined within the class, do it using a class decorator (but be aware it's just a fancier way of writing the same thing).
def add_zero(cls):
    cls.zero = cls("0")
    return cls

@add_zero
class BigInt:
    ...
The question is contradictory: static and property don't go together in this way. Static attributes in Python are simply ones that are only assigned once, and the language itself includes a very large number of these (most strings are interned, all integers below a certain value are pre-constructed, etc.; see e.g. the string module). The easiest approach is to statically assign the attributes after construction, as wim illustrates:
class Foo:
    ...

Foo.first = Foo()
...
Or, as he further suggested, using a class decorator to perform the assignments, which is functionally the same as the above. A decorator is, effectively, a function that is given the "decorated" function or class as an argument and must return something to replace the original. That may be the original object, say modified with some annotations, or an entirely different object. The original (decorated) function may or may not be called, as appropriate for the decorator.
def preload(**values):
    def inner(cls):
        for k, v in values.items():
            setattr(cls, k, cls(v))
        return cls
    return inner
This can then be used dynamically:
@preload(zero=0, one=1)
class Foo:
    ...
If the purpose is to save some time on common integer values, a defaultdict mapping integers to constructed BigInts could be useful as a form of caching and streamlined construction / singleton storage. (E.g. BigInt.numbers[27])
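A sketch of that idea: a plain defaultdict factory never sees the missing key, so a small dict subclass with __missing__ gives the BigInt.numbers[27] style of access. The names are illustrative, and BigInt is assumed to accept a decimal string as in the question:

class _BigIntCache(dict):
    def __missing__(self, key):
        value = self[key] = BigInt(str(key))   # build once on first access
        return value

BigInt.numbers = _BigIntCache()

zero = BigInt.numbers[0]
assert BigInt.numbers[0] is zero   # reused afterwards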
However, the problem of utilizing @property at the class level intrigued me, so I did some digging. It is entirely possible to make use of "descriptor protocol objects" (which the @property decorator returns) at the class level if you punt the attribute up the object model hierarchy, to the metaclass.
class Foo(type):
    @property
    def bar(cls):
        print("I'm a", cls)
        return 27

class Bar(metaclass=Foo):
    ...
>>> Bar.bar
I'm a <class '__main__.Bar'>
27
Notably, this attribute is not accessible from instances:
>>> Bar().bar
AttributeError: 'Bar' object has no attribute 'bar'
Hope this helps!
I have a class that has a numpy.ndarray as a member and behaves similarly to ndarray by overloading __getitem__ and __getattr__:
class Foo(object):
    def __init__(self, values):
        # numpy.ndarray
        self._values = values

    def __getitem__(self, key):
        return self._values[key]

    def __getattr__(self, name):
        return getattr(self._values, name)
Thus I can use numpy attributes and methods like shape, size, ... directly on an object of this class. I can also do things like obj.__add__(1), which will add 1 to obj._values. However, obj + 1 raises "unsupported operand type(s)". I would like obj + 1 to behave the same as obj.__add__(1). Is this possible without adding __add__ to Foo?
I can see what you're trying to do here, but it's not going to work the way you think it should. This is a very non-obvious subtlety in Python.
What you're thinking is that when you do obj + 1, Python is actually calling obj.__add__(1) and that, failing to find an __add__ attribute on obj, it will fall through to its __getattr__.
But this is not exactly how it works for arithmetic operators, whose implementation is actually significantly more complicated. The special method is looked up on the type of obj, not on the instance, so __getattr__ is never consulted. In this case, since the Foo class does not define an __add__ method, Python attempts to call the right-hand operand's __radd__ (for "right add") method to see if 1 knows how to add itself to the left-hand operand. It does not, so you get an exception.
There are other subtleties involving type slots that I won't get into.
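You can see the difference directly with a stripped-down version of the Foo class from the question:

import numpy as np

class Foo:
    def __init__(self, values):
        self._values = values
    def __getattr__(self, name):
        return getattr(self._values, name)

f = Foo(np.array([1, 2, 3]))
print(f.__add__(1))   # explicit attribute access goes through __getattr__: [2 3 4]
try:
    f + 1             # the + operator looks __add__ up on the Foo class itself and skips __getattr__
except TypeError as exc:
    print(exc)        # unsupported operand type(s) for +: 'Foo' and 'int'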
If you want your class to act as a proxy for an ndarray you have a few options. It really depends on what you're actually trying to accomplish, which you might consider asking in a separate question. You might just be able to subclass ndarray directly and implement your additional functionality in the subclass.
If you don't want a subclass of ndarray you might also consider using a proxy, such as the ObjectProxy from wrapt. This may or may not be what you want. It will make an object that walks, talks, quacks like, and is even named ndarray, though you can still subclass ObjectProxy to override methods that you don't want proxied.
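A minimal sketch of the wrapt route, assuming wrapt is installed (the class name and the extra method are made up):

import numpy as np
import wrapt

class ArrayProxy(wrapt.ObjectProxy):
    def fancy_sum(self):
        return self.__wrapped__.sum()   # the wrapped ndarray

p = ArrayProxy(np.array([1, 2, 3]))
print(p + 1)          # operators are forwarded to the wrapped array: [2 3 4]
print(p.shape)        # attribute access is forwarded: (3,)
print(p.fancy_sum())  # plus your own additions: 6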
Otherwise there's the tedious manual method.
I have a class which is essentially a collection/list of things. But I want to add some extra functions to this list. What I would like, is the following:
I have an instance li = MyFancyList(). Variable li should behave as if it were a list whenever I use it as a list: [e for e in li], li.extend(...), for e in li.
Plus it should have some special functions like li.fancyPrint(), li.getAMetric(), li.getName().
I currently use the following approach:
class MyFancyList:
    def __iter__(self):
        return iter(self.li)   # self.li holds the underlying list

    def fancyFunc(self):
        # do something fancy
        ...
This is OK for usage as an iterator, like [e for e in li], but I do not get the full list behavior, like li.extend(...).
A first guess is to have MyFancyList inherit from list. But is that the recommended, Pythonic way to do it? If yes, what should I consider? If not, what would be a better approach?
If you want only part of the list behavior, use composition (i.e. your instances hold a reference to an actual list) and implement only the methods necessary for the behavior you desire. These methods should delegate the work to the actual list any instance of your class holds a reference to, for example:
def __getitem__(self, item):
    return self.li[item]  # delegate to li.__getitem__
Implementing __getitem__ alone will give you a surprising amount of features, for example iteration and slicing.
>>> class WrappedList:
...     def __init__(self, lst):
...         self._lst = lst
...     def __getitem__(self, item):
...         return self._lst[item]
...
>>> w = WrappedList([1, 2, 3])
>>> for x in w:
...     x
...
1
2
3
>>> w[1:]
[2, 3]
If you want the full behavior of a list, inherit from collections.UserList. UserList is a full Python implementation of the list datatype.
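For example, a minimal sketch (fancyPrint taken from the question):

from collections import UserList

class MyFancyList(UserList):
    def fancyPrint(self):
        print("fancy contents:", self.data)   # UserList keeps the underlying list in .data

li = MyFancyList([1, 2, 3])
li.extend([4, 5])       # full list behaviour out of the box
li.fancyPrint()         # fancy contents: [1, 2, 3, 4, 5]
print(type(li[1:]))     # slices stay MyFancyList instances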
So why not inherit from list directly?
One major problem with inheriting directly from list (or any other builtin written in C) is that the code of the builtins may or may not call special methods overridden in classes defined by the user. Here's a relevant excerpt from the pypy docs:
Officially, CPython has no rule at all for when exactly overridden method of subclasses of built-in types get implicitly called or not. As an approximation, these methods are never called by other built-in methods of the same object. For example, an overridden __getitem__ in a subclass of dict will not be called by e.g. the built-in get method.
Another quote, from Luciano Ramalho's Fluent Python, page 351:
Subclassing built-in types like dict or list or str directly is error-prone because the built-in methods mostly ignore user-defined overrides. Instead of subclassing the built-ins, derive your classes from UserDict, UserList and UserString from the collections module, which are designed to be easily extended.
... and more, page 370+:
Misbehaving built-ins: bug or feature?
The built-in dict, list and str types are essential building blocks of Python itself, so they must be fast — any performance issues in them would severely impact pretty much everything else. That's why CPython adopted the shortcuts that cause their built-in methods to misbehave by not cooperating with methods overridden by subclasses.
After playing around a bit, the issues with the list builtin seem to be less critical (I tried to break it in Python 3.4 for a while but did not find a really obvious unexpected behavior), but I still wanted to post a demonstration of what can happen in principle, so here's one with a dict and a UserDict:
>>> from collections import UserDict
>>> class MyDict(dict):
...     def __setitem__(self, key, value):
...         super().__setitem__(key, [value])
...
>>> d = MyDict(a=1)
>>> d
{'a': 1}
>>> class MyUserDict(UserDict):
...     def __setitem__(self, key, value):
...         super().__setitem__(key, [value])
...
>>> m = MyUserDict(a=1)
>>> m
{'a': [1]}
As you can see, the __init__ method from dict ignored the overridden __setitem__ method, while the __init__ method from our UserDict did not.
The simplest solution here is to inherit from the list class:
class MyFancyList(list):
    def fancyFunc(self):
        # do something fancy
        ...
You can then use MyFancyList type as a list, and use its specific methods.
Inheritance introduces a strong coupling between your object and list. The approach you implement is basically a proxy object.
Which approach to use depends heavily on how you will use the object. If it has to be a list, then inheritance is probably a good choice.
EDIT: as pointed out by @acdr, some methods that return a list copy should be overridden in order to return a MyFancyList instead of a list.
A simple way to implement that:
class MyFancyList(list):
    def fancyFunc(self):
        # do something fancy
        ...

    def __add__(self, *args, **kwargs):
        return MyFancyList(super().__add__(*args, **kwargs))
If you don't want to redefine every method of list, I suggest the following approach:
class MyList:
    def __init__(self, list_):
        self.li = list_

    def __getattr__(self, method):
        return getattr(self.li, method)
This would make methods like append, extend and so on, work out of the box. Beware, however, that magic methods (e.g. __len__, __getitem__ etc.) are not going to work in this case, so you should at least redeclare them like this:
class MyList:
    def __init__(self, list_):
        self.li = list_

    def __getattr__(self, method):
        return getattr(self.li, method)

    def __len__(self):
        return len(self.li)

    def __getitem__(self, item):
        return self.li[item]

    def fancyPrint(self):
        # do whatever you want...
        ...
Please note that, in this case, if you want to override a method of list (extend, for instance), you can just declare your own version so that the call won't go through __getattr__. For instance:
class MyList:
    def __init__(self, list_):
        self.li = list_

    def __getattr__(self, method):
        return getattr(self.li, method)

    def __len__(self):
        return len(self.li)

    def __getitem__(self, item):
        return self.li[item]

    def fancyPrint(self):
        # do whatever you want...
        ...

    def extend(self, list_):
        # your own version of extend
        ...
Based on the two example methods you included in your post (fancyPrint, getAMetric), it doesn't seem that you need to store any extra state in your lists. If that is the case, you're best off simply declaring these as free functions and ignoring subtyping altogether; this completely avoids problems like list vs UserList, fragile edge cases like return types for __add__, unexpected Liskov issues, etc. Instead, you can write your functions, write your unit tests for their output, and rest assured that everything will work exactly as intended.
As an added benefit, this means your functions will work with any iterable types (such as generator expressions) without any extra effort.
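For example, free-function versions of the methods from the question (a sketch; the metric is just a placeholder computation):

def fancy_print(items):
    for item in items:
        print("*", item)

def get_a_metric(items):
    values = list(items)
    return sum(values) / len(values)   # placeholder metric

fancy_print([1, 2, 3])                  # works on plain lists
fancy_print(x * x for x in range(3))    # and on generator expressions, no extra effort needed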