Python class inheritance call order - python

There is a famous Python example:
class A(object):
    def go(self):
        print("go A go!")
class B(A):
    def go(self):
        super(B, self).go()
        print("go B go!")
class C(A):
    def go(self):
        super(C, self).go()
        print("go C go!")
class D(B,C):
    def go(self):
        super(D, self).go()
        print("go D go!")
d = D()
d.go()
#go A go!
#go C go!
#go B go!
#go D go!
I have two questions. First, B calls A and C calls A, so I expected "go A go!" to appear twice. Why does it appear only once? Second, why does the output come in this order?

Since Python 2.3, method resolution has used an algorithm called C3 Linearization (borrowed from Dylan). Wikipedia has a nice article on it.
As the name implies, the idea is to force the method resolution graph to be a straight line, even if the inheritance graph isn't. That means A is not going to appear twice, by design.
Why? Well, for one thing, it completely avoids the "diamond problem" that plagues multiple inheritance in many other languages. (Or maybe it's more accurate to say that many other languages either ban MI or restrict it to pure "interfaces" because they didn't have a solution to the problem as it exists in C++.)
The original Python explanation—including the motivation behind it—is available in The Python 2.3 Method Resolution Order. This is a bit technical, but worth reading if you're interested.
You might also want to read the original Dylan paper, which goes into more detail about what's wrong with non-linear MRO graphs, and the challenges of coming up with a linearization that's monotonic (i.e., it goes in the order you expect, or at least the order you expect once you get over the fact that it's linear), and so on.
And if you want a deeper understanding of how type() works under the covers, or just want to see what's changed between 2.3 and 3.7 (e.g., the way __mro__ gets created and updated—although magic 3.x super is elsewhere), there's really no better place than the CPython source.
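To make the linearization idea concrete, here is a minimal sketch of the C3 merge in plain Python. It is not the CPython implementation; the helper names c3_merge and c3_linearize are made up for illustration, and the sketch is only checked against the diamond from the question.

```python
def c3_merge(sequences):
    """Merge several linearizations following the C3 rule."""
    sequences = [list(s) for s in sequences if s]
    result = []
    while sequences:
        for seq in sequences:
            head = seq[0]
            # A head is acceptable only if it is not in the tail of any list.
            if not any(head in other[1:] for other in sequences):
                break
        else:
            raise TypeError("inconsistent hierarchy: no valid linearization")
        result.append(head)
        # Remove the chosen class everywhere and drop lists that became empty.
        sequences = [[c for c in seq if c is not head] for seq in sequences]
        sequences = [seq for seq in sequences if seq]
    return result


def c3_linearize(cls):
    """Return the C3 linearization of cls as a list of classes."""
    bases = list(cls.__bases__)
    return [cls] + c3_merge([c3_linearize(b) for b in bases] + [bases])


class A(object): pass
class B(A): pass
class C(A): pass
class D(B, C): pass

print([k.__name__ for k in c3_linearize(D)])  # ['D', 'B', 'C', 'A', 'object']
print(c3_linearize(D) == list(D.__mro__))     # True
```

A is kept back until both B and C have been placed, because it still sits in the tail of C's linearization when B is taken. That is why it appears once, after C.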

The class super does not just recover the superclass. It instantiates an object that looks up methods in the context of a given method resolution order (MRO). Every class has an MRO that you can access through the __mro__ attribute.
D.__mro__ # (D, B, C, A, object)
So when given a class and an instance, super first recovers the MRO from the type of that instance. When you try to recover an attribute from the super object, it returns it from the first class following the provided class that has such an attribute.
If you were to implement the behaviour of super in Python, it would look something like this.
class super:
    def __init__(self, cls, instance):
        if not isinstance(cls, type):
            raise TypeError('super() argument 1 must be type')
        if isinstance(instance, cls):
            self.mro = type(instance).__mro__
        elif isinstance(instance, type) and issubclass(instance, cls):
            self.mro = instance.__mro__
        else:
            raise TypeError('super(type, obj): obj must be an instance or subtype of type')
        self.cls = cls
        self.instance = instance
    def __getattr__(self, attr):
        cls_index = self.mro.index(self.cls)
        # Search only the classes that follow cls in the MRO
        for supercls in self.mro[cls_index + 1:]:
            if hasattr(supercls, attr):
                break
        else:
            # No class after cls provides the attribute
            raise AttributeError(attr)
        # The actual implementation binds instances to methods before returning
        return getattr(supercls, attr)
So back to your example: when you call super(B, self).go, it recovers the __mro__ of type(self), which is D. It then picks go from the first class following B in the MRO that has such an attribute.
So in this case, since type(self).__mro__ is (D, B, C, A, object), the first class following B that has the attribute go is C and not A.
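You can check this directly. The sketch below assumes Python 3 (where C.go is a plain function) and returns lists instead of printing, so the call order is easy to compare.

```python
class A(object):
    def go(self):
        return ["A"]

class B(A):
    def go(self):
        return super(B, self).go() + ["B"]

class C(A):
    def go(self):
        return super(C, self).go() + ["C"]

class D(B, C):
    def go(self):
        return super(D, self).go() + ["D"]

d = D()
# Starting after B in D's MRO, the first class that defines go is C.
print(super(B, d).go.__func__ is C.go)    # True
# For a plain B instance the MRO is (B, A, object), so the same call reaches A.
print(super(B, B()).go.__func__ is A.go)  # True
print(d.go())                             # ['A', 'C', 'B', 'D']
```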
If you want details on how Python determines the mro, then I suggest abarnert's answer.

Related

Better inheritance when calling grandparent method

In a python program, I implemented some class A, and a subclass B which extends some of A's methods and implements new ones. So it's something that looks like this.
class A:
    def method(self):
        print("This is A's method.")
class B(A):
    def method(self):
        super().method()
        print("This is B's method.")
    def another_method(self):
        pass
Later, I wanted to use an object which would have access to all of B's methods, except for a small change in method: this object would have to first call A.method, and then do other things, different from what I added in B.method. I thought it was natural to introduce a class C which would inherit from B but modify method, so I defined C like this.
class C(B):
    def method(self):
        super(B, self).method()
        print("This is C's method.")
Everything seems to work as I expect. However, I stumbled across this question and this one, which both address similar problems as what I described here. In both posts, someone quickly added a comment to say that calling a grandparent method in a child when the parent has overridden this method is a sign that there is something wrong with the inheritance design.
How should I have coded this? What would be a better design? Of course this is a toy example and I guess the answer might depend on the actual classes I defined. To give a more complete picture, let's say the method method from the example represents a single method in my actual program, but the method another_method represents many different methods.
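One possible restructuring, offered only as a sketch: move what B and C have in common into an intermediate class, so that neither needs to skip over the other. The class name Shared is hypothetical, and whether this fits depends on the real classes.

```python
class A:
    def method(self):
        return ["A"]

class Shared(A):
    # Hypothetical intermediate class: holds what B and C have in common.
    def another_method(self):
        return "shared"

class B(Shared):
    def method(self):
        return super().method() + ["B"]

class C(Shared):
    def method(self):
        return super().method() + ["C"]

print(B().method())          # ['A', 'B']
print(C().method())          # ['A', 'C']
print(C().another_method())  # shared
```

With this layout C is no longer a B, so it only works if nothing relies on isinstance(c, B).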

Multiple inheritance: overridden methods containing super()-calls

With the file super5.py:
class A:
    def m(self):
        print("m of A called")
class B(A):
    def m(self):
        print("m of B called")
        super().m()
class C(A):
    def m(self):
        print("m of C called")
        super().m()
class D(B,C):
    def m(self):
        print("m of D called")
        super().m()
we can do the following:
>>> from super5 import D
>>> x = D()
>>> x.m()
m of D called
m of B called
m of C called
m of A called
To me, this doesn't make sense, because when I execute x.m(), I expect the following to happen:
1. The first line of m of D is executed and thus "m of D called" is output.
2. The second line, super().m(), is executed, which first takes us to m of B.
3. In m of B, "m of B called" is first output, and then m of A is executed due to the super().m() call in m of B, and "m of A called" is output.
4. m of C is executed in a fashion analogous to 3.
As you can see, what I expect to see is:
m of D called
m of B called
m of A called
m of C called
m of A called
Why am I wrong? Is python somehow keeping track of the number of super() calls to a particular superclass and limiting the execution to 1?
No, Python keeps track of all superclasses in a special __mro__ attribute (Method Resolution Order, available on new-style classes):
print(D.__mro__)
You get:
(<class 'D'>, <class 'B'>, <class 'C'>, <class 'A'>, <class 'object'>)
So, when you call super, it follows this list in order.
See this question: What does mro() do?.
Everything is explained in the official document in the chapter "Multiple Inheritance".
For most purposes, in the simplest cases, you can think of the search for attributes inherited from a parent class as depth-first, left-to-right, not searching twice in the same class where there is an overlap in the hierarchy. Thus, if an attribute is not found in DerivedClassName, it is searched for in Base1, then (recursively) in the base classes of Base1, and if it was not found there, it was searched for in Base2, and so on.
In fact, it is slightly more complex than that; the method resolution order changes dynamically to support cooperative calls to super(). This approach is known in some other multiple-inheritance languages as call-next-method and is more powerful than the super call found in single-inheritance languages.
Dynamic ordering is necessary because all cases of multiple inheritance exhibit one or more diamond relationships (where at least one of the parent classes can be accessed through multiple paths from the bottommost class). For example, all classes inherit from object, so any case of multiple inheritance provides more than one path to reach object. To keep the base classes from being accessed more than once, the dynamic algorithm linearizes the search order in a way that preserves the left-to-right ordering specified in each class, that calls each parent only once, and that is monotonic (meaning that a class can be subclassed without affecting the precedence order of its parents). Taken together, these properties make it possible to design reliable and extensible classes with multiple inheritance.
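A small sketch of what the ordering guarantees mean in practice: Python refuses to create a class whose bases cannot be linearized consistently. The class names Bad and Good are made up for the example.

```python
class A:
    pass

class B(A):
    pass

# Listing A before its own subclass B asks for A to precede B,
# while B's linearization requires B to precede A.
try:
    type("Bad", (A, B), {})
except TypeError as exc:
    print("rejected:", exc)

# The reverse order is consistent and is accepted.
Good = type("Good", (B, A), {})
print([k.__name__ for k in Good.__mro__])  # ['Good', 'B', 'A', 'object']
```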

Python Parent/Child class method call

Python 2.7.6 on Linux.
I'm using a test class that inherits from a parent. The parent class holds a number of fields that are common to many child classes, and I need to call the parent setUp method to initialize the fields. Is calling ParentClass.setUp(self) the correct way to do this? Here's a simple example:
import unittest
class RESTTest(unittest.TestCase):
    def setUp(self):
        self.host = host
        self.port = port
        self.protocol = protocol
        self.context = context
class HistoryTest(RESTTest):
    def setUp(self):
        RESTTest.setUp(self)
        self.endpoint = history_endpoint
        self.url = "%s://%s:%s/%s/%s" %(self.protocol, self.host, self.port, self.context, self.endpoint)
    def testMe(self):
        self.assertTrue(True)
if __name__ == '__main__':
    unittest.main()
Is this correct? It seems to work.
You would use super for that. The general form is:
super(ChildClass, self).method(args)
In your case:
class HistoryTest(RESTTest):
    def setUp(self):
        super(HistoryTest, self).setUp()
        ...
In Python 3 you may write:
class HistoryTest(RESTTest):
    def setUp(self):
        super().setUp()
        ...
which is simpler.
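Here is a self-contained Python 3 sketch you can run to confirm that the parent setUp runs first. The host, port and URL values are placeholders assumed for illustration, not taken from the question.

```python
import unittest

class RESTTest(unittest.TestCase):
    def setUp(self):
        # Placeholder values, assumed for illustration only.
        self.host = "localhost"
        self.port = 8080

class HistoryTest(RESTTest):
    def setUp(self):
        super().setUp()  # runs RESTTest.setUp first
        self.url = "http://%s:%s/history" % (self.host, self.port)

    def test_url(self):
        self.assertEqual(self.url, "http://localhost:8080/history")

# Run only HistoryTest so the example does not depend on the command line.
suite = unittest.TestLoader().loadTestsFromTestCase(HistoryTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(result.wasSuccessful())
```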
See this answer:
super() lets you avoid referring to the base class explicitly, which can be nice. But the main advantage comes with multiple inheritance, where all sorts of fun stuff can happen. See the standard docs on super if you haven't already.
Multiple inheritance
To (try to) answer the question in your comment:
How do you specify which super method you want to call?
From what I understand of the philosophy of multiple inheritance (in Python), you don't. I mean, super, along with the Method Resolution Order (MRO) should do things right and select the appropriate methods. (Yes methods is a plural, see below.)
There are a lot of blog posts / SO answers about this you can find with keywords "multiple inheritance", "diamond", "MRO", "super", etc. This article provides a Python 3 example I found surprising and didn't find in other sources:
class A:
    def m(self):
        print("m of A called")
class B(A):
    def m(self):
        print("m of B called")
        super().m()
class C(A):
    def m(self):
        print("m of C called")
        super().m()
class D(B,C):
    def m(self):
        print("m of D called")
        super().m()
D().m()
m of D called
m of B called
m of C called
m of A called
See? Both B.m() and C.m() are called thanks to super, which seems like the right thing to do considering D inherits from both B and C.
I suggest you play with this example like I just did. Adding a few prints, you'll see that, when calling D().m(), the super().m() statement in class B itself calls C.m(). Whereas, of course, if you call B().m() (B instance, not D instance), only A.m() is called. In other words, super().m() in B is aware of the class of the instance it is dealing with and behaves accordingly.
Using super everywhere sounds like the silver bullet, but you need to make sure all classes in the inheritance schema are cooperative (another keyword to dig for) and don't break the chain, for instance when expecting additional parameters in child classes.
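As a sketch of what "cooperative" can look like when subclasses need extra parameters: each class takes the keyword arguments it knows about and forwards the rest along the MRO. The class names here are invented for the example, and this is one common convention rather than the only way to do it.

```python
class Base:
    def __init__(self, **kwargs):
        # End of the chain: by now every keyword should have been consumed.
        super().__init__(**kwargs)

class Named(Base):
    def __init__(self, name, **kwargs):
        super().__init__(**kwargs)  # forward what this class does not use
        self.name = name

class Sized(Base):
    def __init__(self, size, **kwargs):
        super().__init__(**kwargs)
        self.size = size

class Box(Named, Sized):
    pass

box = Box(name="crate", size=3)
print(box.name, box.size)  # crate 3
```

If Named.__init__ called Base.__init__(self) directly instead of super().__init__(**kwargs), Sized.__init__ would never run for a Box, which is the broken chain mentioned above.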

How does Python's super() actually work, in the general case?

There are a lot of great resources on super(), including this great blog post that pops up a lot, as well as many questions on Stack Overflow. However I feel like they all stop short of explaining how it works in the most general case (with arbitrary inheritance graphs), as well as what is going on under the hood.
Consider this basic example of diamond inheritance:
class A(object):
    def foo(self):
        print 'A foo'
class B(A):
    def foo(self):
        print 'B foo before'
        super(B, self).foo()
        print 'B foo after'
class C(A):
    def foo(self):
        print 'C foo before'
        super(C, self).foo()
        print 'C foo after'
class D(B, C):
    def foo(self):
        print 'D foo before'
        super(D, self).foo()
        print 'D foo after'
If you read up on Python's rules for method resolution order from sources like this or look up the wikipedia page for C3 linearization, you will see that the MRO must be (D, B, C, A, object). This is of course confirmed by D.__mro__:
(<class '__main__.D'>, <class '__main__.B'>, <class '__main__.C'>, <class '__main__.A'>, <type 'object'>)
And
d = D()
d.foo()
prints
D foo before
B foo before
C foo before
A foo
C foo after
B foo after
D foo after
which matches the MRO. However, consider that above super(B, self).foo() in B actually calls C.foo, whereas in b = B(); b.foo() it would simply go straight to A.foo. Clearly using super(B, self).foo() is not simply a shortcut for A.foo(self) as is sometimes taught.
super() is then obviously aware of the previous calls before it and the overall MRO the chain is trying to follow. I can see two ways this might be accomplished. The first is to do something like passing the super object itself as the self argument to the next method in the chain, which would act like the original object but also contain this information. However this also seems like it would break a lot of things (super(D, d) is d is false) and by doing a little experimenting I can see this isn't the case.
The other option is to have some sort of global context that stores the MRO and the current position in it. I imagine the algorithm for super goes something like:
1. Is there currently a context we are working in? If not, create one which contains a queue. Get the MRO for the class argument, push all elements except for the first into the queue.
2. Pop the next element from the current context's MRO queue, use it as the current class when constructing the super instance.
3. When a method is accessed from the super instance, look it up in the current class and call it using the same context.
However, this doesn't account for weird things like using a different base class as the first argument to a call to super, or even calling a different method on it. I would like to know the general algorithm for this. Also, if this context exists somewhere, can I inspect it? Can I muck with it? Terrible idea of course, but Python typically expects you to be a mature adult even if you're not.
This also introduces a lot of design considerations. If I wrote B thinking only of its relation to A, then later someone else writes C and a third person writes D, my B.foo() method has to call super in a way that is compatible with C.foo() even though it didn't exist at the time I wrote it! If I want my class to be easily extensible I will need to account for this, but I am not sure if it is more complicated than simply making sure all versions of foo have identical signatures. There is also the question of when to put code before or after the call to super, even if it does not make any difference considering B's base classes only.
super() is then obviously aware of the previous calls before it
It's not. When you do super(B, self).foo, super knows the MRO because that's just type(self).__mro__, and it knows that it should start looking for foo at the point in the MRO immediately after B. A rough pure-Python equivalent would be
class super(object):
    def __init__(self, klass, obj):
        self.klass = klass
        self.obj = obj
    def __getattr__(self, attrname):
        classes = iter(type(self.obj).__mro__)
        # search the MRO to find self.klass
        for klass in classes:
            if klass is self.klass:
                break
        # start searching for attrname at the next class after self.klass
        for klass in classes:
            if attrname in klass.__dict__:
                attr = klass.__dict__[attrname]
                break
        else:
            raise AttributeError
        # handle methods and other descriptors
        try:
            return attr.__get__(self.obj, type(self.obj))
        except AttributeError:
            return attr
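There is no hidden context to inspect, but the super object itself carries the three pieces of state the equivalent above relies on. The sketch below assumes CPython 3, where they are exposed as __thisclass__, __self__ and __self_class__; it also shows that any class in the MRO can be used as the starting point.

```python
class A(object):
    def foo(self):
        return ["A"]

class B(A):
    def foo(self):
        return ["B"] + super(B, self).foo()

class C(A):
    def foo(self):
        return ["C"] + super(C, self).foo()

class D(B, C):
    def foo(self):
        return ["D"] + super(D, self).foo()

d = D()
s = super(B, d)
print(s.__thisclass__ is B)   # True: the class the search starts after
print(s.__self__ is d)        # True: the instance that methods get bound to
print(s.__self_class__ is D)  # True: the class whose MRO is searched

# A different starting class just moves the starting point in D's MRO.
print(super(C, d).foo())  # ['A']
print(super(D, d).foo())  # ['B', 'C', 'A']
```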
If I wrote B thinking only of its relation to A, then later someone else writes C and a third person writes D, my B.foo() method has to call super in a way that is compatible with C.foo() even though it didn't exist at the time I wrote it!
There's no expectation that you should be able to multiple-inherit from arbitrary classes. Unless foo is specifically designed to be overloaded by sibling classes in a multiple-inheritance situation, D should not exist.

Not inherit some (chosen) methods from derived classes

Say that, for example, I have two (python (3.3)) classes a and b with their own methods:
class a:
    def m1(self):
        print("Hi 1")
    def m2(self):
        print("Hi 2")
    ##...other methods...
class b(a):
    def k1(self):
        print("Other hi")
How do I make it so that class b inherits all methods from a except (for example) m2? (besides copy/paste, that doesn't count.) So the expression a.m2() would be legitimate, but b.m2() would throw an AttributeError.
You can get the effect that you want by making 'a' and 'b' siblings rather than parent and child. This might work for you:
class p:
    def m1(self):
        print("Hi 1")
class a(p):
    def m2(self):
        print("Hi 2")
class b(p):
    def k1(self):
        print("Other hi")
So these calls are now all valid, while the others (such as b().m2()) will raise AttributeError:
a().m1()
a().m2()
b().m1()
b().k1()
Why would you want to do that? The whole point of class inheritance is to be able to test that instances of b are also instances of a; isinstance(b(), a) is True for a reason. By removing methods from b you are breaking that model badly.
Instead make a have fewer methods, and add c to have those that b doesn't need:
class a:
    def m1(self):
        print("Hi 1")
    ##...other methods...
class b(a):
    def k1(self):
        print("Other hi")
class c(a):
    def m2(self):
        print("Hi 2")
Or, you could not inherit from a and just copy methods from a as needed:
class b:
    # copied methods
    m1 = a.m1
    def k1(self):
        print("Other hi")
Now "b is-a a" is no longer true, so the expectation that all of a's methods are implemented won't be there anymore.
If a is entirely out of your control and there are too many methods to copy, perhaps use proxying with __getattr__ and passing through anything but m2. A last ditch method could be to implement m2 and raise AttributeError, but that should be a last resort only.
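A minimal sketch of the proxying idea, assuming a can be constructed without arguments. The attribute names _wrapped and _hidden are invented for the example, and the methods return strings instead of printing so the result is easy to check.

```python
class a:
    def m1(self):
        return "Hi 1"

    def m2(self):
        return "Hi 2"

class b:
    """Wraps an instance of a and forwards everything except the hidden names."""
    _hidden = frozenset({"m2"})

    def __init__(self):
        self._wrapped = a()

    def __getattr__(self, name):
        # Only called when normal lookup on b fails.
        if name in self._hidden:
            raise AttributeError(name)
        return getattr(self._wrapped, name)

    def k1(self):
        return "Other hi"

obj = b()
print(obj.m1())            # Hi 1
print(obj.k1())            # Other hi
print(hasattr(obj, "m2"))  # False
```

Note that isinstance(obj, a) is False here, which is consistent with b no longer promising all of a's methods.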
If b inherits from a, it's supposed to inherit every method. That's what inheritance is about. But if it's necessary to do this (note that it is not recommended), you could override the m2() method so that it raises an exception when called.
class b(a):
    def m2(self):
        raise Exception("...")
That does not really make sense. The point of inheritance is so that the inheriting object is exactly compatible to the base type. This is the Liskov substitution principle: Wherever an object of the original type is acceptable, an object of the derived type also will be.
If you change your derived type to not have some members of the base type, then you are violating this principle. With the dynamic typing in Python, it wouldn’t be that much of a problem but it’s still violating the idea behind it.
So you really just shouldn’t do that.
