How does Python's super() actually work, in the general case?

There are a lot of great resources on super(), including this great blog post that pops up a lot, as well as many questions on Stack Overflow. However, I feel like they all stop short of explaining how it works in the most general case (with arbitrary inheritance graphs), as well as what is going on under the hood.
Consider this basic example of diamond inheritance:
class A(object):
    def foo(self):
        print 'A foo'
class B(A):
    def foo(self):
        print 'B foo before'
        super(B, self).foo()
        print 'B foo after'
class C(A):
    def foo(self):
        print 'C foo before'
        super(C, self).foo()
        print 'C foo after'
class D(B, C):
    def foo(self):
        print 'D foo before'
        super(D, self).foo()
        print 'D foo after'
If you read up on Python's rules for method resolution order from sources like this or look up the Wikipedia page for C3 linearization, you will see that the MRO must be (D, B, C, A, object). This is of course confirmed by D.__mro__:
(<class '__main__.D'>, <class '__main__.B'>, <class '__main__.C'>, <class '__main__.A'>, <type 'object'>)
And
d = D()
d.foo()
prints
D foo before
B foo before
C foo before
A foo
C foo after
B foo after
D foo after
which matches the MRO. However, consider that above super(B, self).foo() in B actually calls C.foo, whereas in b = B(); b.foo() it would simply go straight to A.foo. Clearly using super(B, self).foo() is not simply a shortcut for A.foo(self) as is sometimes taught.
super() is then obviously aware of the previous calls before it and the overall MRO the chain is trying to follow. I can see two ways this might be accomplished. The first is to do something like passing the super object itself as the self argument to the next method in the chain, which would act like the original object but also contain this information. However this also seems like it would break a lot of things (super(D, d) is d is false) and by doing a little experimenting I can see this isn't the case.
The other option is to have some sort of global context that stores the MRO and the current position in it. I imagine the algorithm for super goes something like:
1. Is there currently a context we are working in? If not, create one which contains a queue. Get the MRO for the class argument, push all elements except for the first into the queue.
2. Pop the next element from the current context's MRO queue, use it as the current class when constructing the super instance.
3. When a method is accessed from the super instance, look it up in the current class and call it using the same context.
However, this doesn't account for weird things like using a different base class as the first argument to a call to super, or even calling a different method on it. I would like to know the general algorithm for this. Also, if this context exists somewhere, can I inspect it? Can I muck with it? Terrible idea of course, but Python typically expects you to be a mature adult even if you're not.
This also introduces a lot of design considerations. If I wrote B thinking only of its relation to A, then later someone else writes C and a third person writes D, my B.foo() method has to call super in a way that is compatible with C.foo() even though it didn't exist at the time I wrote it! If I want my class to be easily extensible I will need to account for this, but I am not sure if it is more complicated than simply making sure all versions of foo have identical signatures. There is also the question of when to put code before or after the call to super, even if it does not make any difference considering B's base classes only.

super() is then obviously aware of the previous calls before it
It's not. When you do super(B, self).foo, super knows the MRO because that's just type(self).__mro__, and it knows that it should start looking for foo at the point in the MRO immediately after B. A rough pure-Python equivalent would be
class super(object):
    def __init__(self, klass, obj):
        self.klass = klass
        self.obj = obj
    def __getattr__(self, attrname):
        classes = iter(type(self.obj).__mro__)
        # search the MRO to find self.klass
        for klass in classes:
            if klass is self.klass:
                break
        # start searching for attrname at the next class after self.klass
        for klass in classes:
            if attrname in klass.__dict__:
                attr = klass.__dict__[attrname]
                break
        else:
            raise AttributeError
        # handle methods and other descriptors
        try:
            return attr.__get__(self.obj, type(self.obj))
        except AttributeError:
            return attr
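One way to check that no call history or global context is involved is to inspect a real super object. In CPython it exposes what it stores as __thisclass__, __self__ and __self_class__. The sketch below uses Python 3 syntax and hypothetical classes that mirror the question's diamond; it is an illustration rather than part of the original answer.

```python
class A:
    def foo(self):
        return "A.foo"

class B(A):
    def foo(self):
        return "B.foo"

class C(A):
    def foo(self):
        return "C.foo"

class D(B, C):
    pass

d = D()
proxy = super(B, d)

# The proxy records only these three things; there is no call stack or queue.
print(proxy.__thisclass__ is B)    # True
print(proxy.__self__ is d)         # True
print(proxy.__self_class__ is D)   # True

# Lookup starts after B in type(d).__mro__, so it finds C.foo for a D instance...
print(proxy.foo())                 # C.foo
# ...but A.foo for a plain B instance, whose MRO is (B, A, object).
print(super(B, B()).foo())         # A.foo
```

The same super(B, ...) expression gives different results only because the second argument has a different type, and therefore a different MRO.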
If I wrote B thinking only of its relation to A, then later someone else writes C and a third person writes D, my B.foo() method has to call super in a way that is compatible with C.foo() even though it didn't exist at the time I wrote it!
There's no expectation that you should be able to multiple-inherit from arbitrary classes. Unless foo is specifically designed to be overridden cooperatively by sibling classes in a multiple-inheritance situation, D should not exist.
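For a method that is meant to be cooperative, one commonly recommended pattern is to give the hierarchy a root class that ends the chain, and to have every override accept and forward keyword arguments it does not use. The following is a minimal Python 3 sketch under those assumptions; Root, b_arg and c_arg are hypothetical names chosen for illustration.

```python
class Root:
    def foo(self, **kwargs):
        # End of the chain: deliberately does not call super().foo().
        return []

class B(Root):
    def foo(self, b_arg=None, **kwargs):
        # Consume our own keyword and forward the rest to whoever is next.
        return ["B got %r" % (b_arg,)] + super().foo(**kwargs)

class C(Root):
    def foo(self, c_arg=None, **kwargs):
        return ["C got %r" % (c_arg,)] + super().foo(**kwargs)

class D(B, C):
    def foo(self, **kwargs):
        return ["D"] + super().foo(**kwargs)

print(D().foo(b_arg=1, c_arg=2))   # ['D', 'B got 1', 'C got 2']
print(B().foo(b_arg=3))            # ['B got 3']
```

B never mentions C, yet it works both alone and as a sibling of C, because it forwards whatever it does not understand.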

Related

Can a dynamically added function access the owner object in python?

I'm making a program in python in which specific instances of an object must be decorated with new functions built at runtime.
I've seen very simple examples of adding functions to objects through MethodType:
import types
def foo():
    print("foo")
class A:
    bar = "bar"
a = A()
a.foo = types.MethodType(foo, a)
But none of the examples I've seen show how a function added in this manner can refer to the new owner's attributes. As far as I know, even though this binds the foo() function to the instance a, foo() must still be a pure function, and cannot contain references to anything local.
In my case, I need functions to change attributes of the object they are added to. Here are two examples of the kind of thing I need to be able to do:
class A:
    foo = "foo"
    def printme():
        print(foo)
def nofoo():
    foo = "bar"
def printBar():
    if foo != "foo":
        self.printme()
I would then need a way to add a copy of a nofoo() or printBar() to an A object in such a way that they can access the object attributes named foo and the function named printme() correctly.
So, is this possible? Is there a way to do this kind of programming in vanilla Python? Or, at least, is there a programming pattern that achieves this kind of behavior?
P.S.: In my system, I also add attributes dynamically to objects. Your first thought then might be "How can I ever be sure that the object I'm adding the nofoo() function to actually has an attribute named foo?", but I also have a fairly robust tag system that makes sure that I never try to add a nofoo() function to an object that hasn't a foo variable. The reason I mention this is that solutions that look at the class definition aren't very useful to me.

As said in the comments, your function actually must take at least one parameter: self, the instance the method is being called on. The self parameter can be used as it would be used in a normal instance method. Here is an example:
>>> from types import MethodType
>>>
>>> class Class:
        def method(self):
            print('method run')
>>> cls = Class()
>>>
>>> def func(self): # must accept one argument, `self`
        self.method()
>>> cls.func = MethodType(func, cls)
>>> cls.func()
method run
>>>
Without your function accepting self, an exception would be raised:
>>> def func():
        self.method()
>>> cls.func = MethodType(func, cls)
>>> cls.func()
Traceback (most recent call last):
  File "<pyshell#21>", line 1, in <module>
    cls.func()
TypeError: func() takes 0 positional arguments but 1 was given
>>>

import types
class A:
    def __init__(self):
        self.foo = "foo"
    def printme(self):
        print(self.foo)
def nofoo(self):
    self.foo = "bar"
a = A()
a.nofoo = types.MethodType(nofoo, a)
a.nofoo()
a.printme()
prints
bar
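To make the per-instance nature of this binding explicit, here is a small sketch in which the function is attached to one instance only. The names first and second are hypothetical; the point is only that self refers to the instance passed to MethodType, and that other instances of the class are unaffected.

```python
import types

class A:
    def __init__(self):
        self.foo = "foo"

def nofoo(self):
    # self is whichever instance the function was bound to.
    self.foo = "bar"

first = A()
second = A()
first.nofoo = types.MethodType(nofoo, first)
first.nofoo()

print(first.foo)                  # bar
print(second.foo)                 # foo
print(hasattr(second, "nofoo"))   # False
```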

It's not entirely clear what you're trying to do, and I'm worried that whatever it is may be a bad idea. However, I can explain how to do what you're asking, even if it isn't what you want, or should want. I'll point out that it's very uncommon to want to do the second version below, and even rarer to want to do the third version, but Python does allow them both, because "even rarer than very uncommon" still isn't "never". And, in the same spirit…
The short answer is "yes". A dynamically-added method can access the owner object exactly the same way a normal method can.
First, here's a normal, non-dynamic method:
class C:
    def meth(self):
        return self.x
c = C()
c.x = 3
c.meth()
Obviously, with a normal method like this, when you call c.meth(), the c ends up as the value of the self parameter, so self.x is c.x, which is 3.
Now, here's how you dynamically add a method to a class:
class C:
    pass
c = C()
c.x = 3
def meth(self):
    print(self.x)
C.meth = meth
c.meth()
This is actually doing exactly the same thing. (Well, we've left another name for the same function object sitting around in globals, but that's the only difference.) If C.meth is the same function it was in the first version, then obviously whatever magic made c.meth() work in the first version will do the exact same thing here.
(This used to be slightly more complicated in Python 2, because of unbound methods, and classic classes too… but fortunately you don't have to worry about that.)
Finally, here's how you dynamically add a method to an instance:
import types
class C:
    pass
c = C()
c.x = 3
def meth(self):
    print(self.x)
c.meth = types.MethodType(meth, c)
c.meth()
Here, you actually have to know the magic that makes c.meth() work in the first two cases. So read the Descriptor HOWTO. After that, it should be obvious.
But if you just want to pretend that Guido is a wizard (Raymond definitely is a wizard) and it's magic… Well, in the first two versions, Guido's magic wand creates a special bound method object whenever you ask for c.meth, but even he isn't magical enough to do that when C.meth doesn't exist. But we can painstakingly create that same bound method object and store it as c.meth. After that, we're going to get the same thing we stored whenever we ask for c.meth, which we explicitly built as the same thing we got in the first two examples, so it'll obviously do the same thing.
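For readers who would rather see the "magic" than take it on faith, the sketch below (Python 3) builds the bound method both ways: through the function's __get__, which is what attribute lookup on the class does, and by hand with types.MethodType. It is an illustration of the descriptor protocol and not a description of CPython's internal code path.

```python
import types

class C:
    def meth(self):
        return self.x

c = C()
c.x = 3

# What attribute lookup does for a function found on the class:
via_descriptor = C.__dict__["meth"].__get__(c, C)
# What the instance-level version builds by hand:
by_hand = types.MethodType(C.__dict__["meth"], c)

print(via_descriptor.__self__ is c, by_hand.__self__ is c)   # True True
print(via_descriptor.__func__ is by_hand.__func__)           # True
print(via_descriptor == by_hand, by_hand())                  # True 3
```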
But what if we did this:
class C:
    pass
c = C()
c.x = 3
def meth(self):
    print(self.x)
c.meth = meth
c.meth(c)
Here, you're not letting Guido do his descriptor magic to create c.meth, and you're not doing it manually, you're just sticking a regular function there. Which means if you want anything to show up as the self parameter, you have to explicitly pass it as an argument, as in that silly c.meth(c) line at the end. But if you're willing to do that, then even this one works. No matter how self ends up as c, self.x is going to be c.x.

Python class inheritance call order

There is a famous Python example
class A(object):
    def go(self):
        print("go A go!")
class B(A):
    def go(self):
        super(B, self).go()
        print("go B go!")
class C(A):
    def go(self):
        super(C, self).go()
        print("go C go!")
class D(B,C):
    def go(self):
        super(D, self).go()
        print("go D go!")
d = D()
d.go()
#go A go!
#go C go!
#go B go!
#go D go!
I have several questions. The first one is B calls A and C calls A so I expect A to appear twice. The second question is about the order.

Since Python 2.3, method resolution has used an algorithm called C3 Linearization (borrowed from Dylan). Wikipedia has a nice article on it.
As the name implies, the idea is to force the method resolution graph to be a straight line, even if the inheritance graph isn't. Which means A is not going to appear twice, by design.
Why? Well, for one thing, it completely avoids the "diamond problem" that plagues multiple inheritance in many other languages. (Or maybe it's more accurate to say that many other languages either ban MI or restrict it to pure "interfaces" because they didn't have a solution to the problem as it exists in C++.)
The original Python explanation—including the motivation behind it—is available in The Python 2.3 Method Resolution Order. This is a bit technical, but worth reading if you're interested.
You might also want to read the original Dylan paper, which goes into more detail about what's wrong with non-linear MRO graphs, and the challenges of coming up with a linearization that's monotonic (i.e., it goes in the order you expect, or at least the order you expect once you get over the fact that it's linear), and so on.
And if you want a deeper understanding of how type() works under the covers, or just want to see what's changed between 2.3 and 3.7 (e.g., the way __mro__ gets created and updated—although magic 3.x super is elsewhere), there's really no better place than the CPython source.
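As a rough illustration of what the linearization does, here is a small pure-Python sketch of the C3 merge. The function names c3_merge and c3_linearize are hypothetical; CPython's actual implementation is written in C, so treat this as a model to experiment with rather than the real code.

```python
def c3_merge(sequences):
    """Merge several linearizations following the C3 rule."""
    sequences = [list(s) for s in sequences if s]
    result = []
    while sequences:
        for seq in sequences:
            head = seq[0]
            # A head is acceptable only if it is in no other sequence's tail.
            if not any(head in other[1:] for other in sequences):
                break
        else:
            raise TypeError("inconsistent hierarchy, no C3 linearization")
        result.append(head)
        # Remove the chosen class everywhere and drop emptied sequences.
        sequences = [[c for c in seq if c is not head] for seq in sequences]
        sequences = [seq for seq in sequences if seq]
    return result

def c3_linearize(cls):
    """Class, then the merge of its bases' linearizations and the bases list."""
    bases = list(cls.__bases__)
    return [cls] + c3_merge([c3_linearize(b) for b in bases] + [bases])

class A(object):
    pass

class B(A):
    pass

class C(A):
    pass

class D(B, C):
    pass

print([c.__name__ for c in c3_linearize(D)])   # ['D', 'B', 'C', 'A', 'object']
print(c3_linearize(D) == list(D.__mro__))      # True
```

A appears once, after both B and C, because it is still in the tail of C's linearization when B has been consumed.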

The class super does not just recover the superclass. It instantiates an object that looks up methods in the context of a given method resolution order (MRO). Every class has an MRO that you can access through the __mro__ attribute.
D.__mro__ # (D, B, C, A, object)
So when given a class and an instance, super first recovers the mro from that instance. When you try to recover an attribute from the super object, it returns it from the first class following the provided class that has such an attribute.
If you were to implement the behaviour of super in Python, it would look something like this.
class super:
    def __init__(self, cls, instance):
        if not isinstance(cls, type):
            raise TypeError('super() argument 1 must be type')
        if isinstance(instance, cls):
            self.mro = type(instance).__mro__
        elif isinstance(instance, type) and issubclass(instance, cls):
            self.mro = instance.__mro__
        else:
            raise TypeError('super(type, obj): obj must be an instance or subtype of type')
        self.cls = cls
        self.instance = instance
    def __getattr__(self, attr):
        cls_index = self.mro.index(self.cls)
        for supercls in self.mro[cls_index + 1:]:
            if hasattr(supercls, attr): break
        # The actual implementation binds instances to methods before returning
        return getattr(supercls, attr)
So back to your example, when you call super(B, self).go, it recovers the __mro__ of self, which is of type D. It then picks go from the first class following B in the mro that has such an attribute.
So in this case, since type(self).__mro__ is (D, B, C, A, object), the first class following B that has the attribute go is C and not A.
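The lookup rule can be checked by hand. The sketch below (Python 3, with go returning a string instead of printing) uses a hypothetical helper, next_provider, that slices the instance's MRO after the given class and reports the first class that defines the attribute.

```python
class A:
    def go(self):
        return "A"

class B(A):
    def go(self):
        return "B"

class C(A):
    def go(self):
        return "C"

class D(B, C):
    pass

def next_provider(cls, instance, name):
    """Hypothetical helper: first class after cls in the instance's MRO defining name."""
    mro = type(instance).__mro__
    for candidate in mro[mro.index(cls) + 1:]:
        if name in candidate.__dict__:
            return candidate
    raise AttributeError(name)

print(next_provider(B, D(), "go").__name__)   # C
print(next_provider(B, B(), "go").__name__)   # A
print(super(B, D()).go())                     # C
```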
If you want details on how Python determines the mro, then I suggest abarnert's answer.

Multiple inheritance: overridden methods containing super()-calls

With the file super5.py:
class A:
    def m(self):
        print("m of A called")
class B(A):
    def m(self):
        print("m of B called")
        super().m()
class C(A):
    def m(self):
        print("m of C called")
        super().m()
class D(B,C):
    def m(self):
        print("m of D called")
        super().m()
we can do the following:
>>> from super5 import D
>>> x = D()
>>> x.m()
m of D called
m of B called
m of C called
m of A called
To me, this doesn't make sense, because when I execute x.m(), I expect the following to happen:
1. The first line of m of D is executed and thus "m of D called" is output.
2. The second line, super().m(), is executed, which first takes us to m of B.
3. In m of B, "m of B called" is first output, and then m of A is executed due to the super().m() call in m of B, and "m of A called" is output.
4. m of C is executed in a fashion analogous to 3.
As you can see, what I expect to see is:
m of D called
m of B called
m of A called
m of C called
m of A called
Why am I wrong? Is python somehow keeping track of the number of super() calls to a particular superclass and limiting the execution to 1?

No, Python keeps track of all the superclasses in a special __mro__ attribute (the Method Resolution Order of new-style classes):
print(D.__mro__)
You get:
(<class 'D'>, <class 'B'>, <class 'C'>, <class 'A'>, <class 'object'>)
So, when you call super, it follows this list in order.
See this question: What does mro() do?.
Everything is explained in the official document in the chapter "Multiple Inheritance".
For most purposes, in the simplest cases, you can think of the search for attributes inherited from a parent class as depth-first, left-to-right, not searching twice in the same class where there is an overlap in the hierarchy. Thus, if an attribute is not found in DerivedClassName, it is searched for in Base1, then (recursively) in the base classes of Base1, and if it was not found there, it was searched for in Base2, and so on.
In fact, it is slightly more complex than that; the method resolution order changes dynamically to support cooperative calls to super(). This approach is known in some other multiple-inheritance languages as call-next-method and is more powerful than the super call found in single-inheritance languages.
Dynamic ordering is necessary because all cases of multiple inheritance exhibit one or more diamond relationships (where at least one of the parent classes can be accessed through multiple paths from the bottommost class). For example, all classes inherit from object, so any case of multiple inheritance provides more than one path to reach object. To keep the base classes from being accessed more than once, the dynamic algorithm linearizes the search order in a way that preserves the left-to-right ordering specified in each class, that calls each parent only once, and that is monotonic (meaning that a class can be subclassed without affecting the precedence order of its parents). Taken together, these properties make it possible to design reliable and extensible classes with multiple inheritance.
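The ordering constraints described in that passage can be observed directly: Python refuses to create a class whose bases cannot be linearized consistently. A minimal sketch, with hypothetical class names Fine and Broken:

```python
class A:
    pass

class B(A):
    pass

class Fine(B, A):
    pass

print([c.__name__ for c in Fine.__mro__])   # ['Fine', 'B', 'A', 'object']

message = None
try:
    # A is listed before its own subclass B, which contradicts B's MRO (B, A).
    class Broken(A, B):
        pass
except TypeError as exc:
    message = str(exc)
    print("rejected:", message)
```

Listing A before B asks for A to precede B, while B's own linearization requires B to precede A, so no consistent order exists and the class statement raises TypeError.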

Understanding python's super method, Why D().test() will return 'B-->C' and not 'B-->A'

I have looked at other questions here regarding Python's super() method, but I am still finding it difficult to understand the whole concept.
I am also looking at the example in the book Pro Python.
The example referenced there is
class A(object):
    def test(self):
        return 'A'
class B(A):
    def test(self):
        return 'B-->' + super(B, self).test()
class C(A):
    def test(self):
        return 'C'
class D(B, C):
    pass
>>> A().test()
'A'
>>> B().test()
'B-->A'
>>> C().test()
'C'
>>> D().test()
'B-->C'
>>> A.__mro__
(__main__.A, object)
>>> B.__mro__
(__main__.B, __main__.A, object)
>>> C.__mro__
(__main__.C, __main__.A, object)
>>> D.__mro__
(__main__.D, __main__.B, __main__.C, __main__.A, object)
Why does D().test() give the output 'B-->C' instead of 'B-->A'?
The explanation in the book is
In the most common case, which includes the usage shown here, super() takes two arguments: a class and an instance of that class. As our example here has shown, the instance object determines which MRO will be used to resolve any attributes on the resulting object. The provided class determines a subset of that MRO, because super() only uses those entries in the MRO that occur after the class provided.
I still find the explanation a bit difficult to understand. This might be a possible duplicate, and questions similar to this have been asked many times, but if I get an understanding of this I might be able to understand the other questions better.
Understanding Python super() with __init__() methods
What does 'super' do in Python?
python, inheritance, super() method
[python]: confused by super()

If you want to know why Python chose this specific MRO algorithm, the discussion is in the mailing list archives, and briefly summarized in The Python 2.3 Method Resolution Order.
But really, it comes down to this: Python 2.2's method resolution was broken when dealing with multiple inheritance, and the first thing anyone suggested to fix it was to borrow the C3 algorithm from Dylan, and nobody had any problem with it or suggested anything better, and therefore Python uses C3.
If you're more interested in the general advantages (and disadvantages) of C3 against other algorithms…
BrenBarn's and florquake's answers give the basics to this question. Python's super() considered super! from Raymond Hettinger's blog is a much longer and more detailed discussion in the same vein, and definitely worth reading.
A Monotonic Superclass Linearization for Dylan is the original paper describing the design. Of course Dylan is a very different language from Python, and this is an academic paper, but the rationale is still pretty nice.
Finally, The Python 2.3 Method Resolution Order (the same docs linked above) has some discussion on the benefits.
And you'd need to learn a lot about the alternatives, and about how they are and aren't appropriate to Python, to go any farther. Or, if you want deeper information on SO, you'll need to ask more specific questions.
Finally, if you're asking the "how" question:
When you call D().test(), it's obviously calling the code you defined in B's test method. And B.__mro__ is (__main__.B, __main__.A, object). So, how can that super(B, self).test() possibly call C's test method instead of A's?
The key here is that the MRO is based on the type of self, not based on the type B where the test method was defined. If you were to print(type(self)) inside the test functions, you'd see that it's D, not B.
So, super(B, self) actually gets self.__class__.__mro__ (in this case, D.__mro__), finds B in the list, and returns the next thing after it. Pretty simple.
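This is easy to confirm by having B.test report the type of self. The sketch below is a Python 3 variant of the question's classes with that one change; the extra text in the return value is added for illustration only.

```python
class A:
    def test(self):
        return "A"

class B(A):
    def test(self):
        # Report which class the instance really belongs to, then delegate.
        return "B(self is a %s)-->" % type(self).__name__ + super().test()

class C(A):
    def test(self):
        return "C"

class D(B, C):
    pass

print(B().test())   # B(self is a B)-->A
print(D().test())   # B(self is a D)-->C
```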
But that doesn't explain how the MRO works, just what it does. How does D().test() call the method from B, but with a self that's a D?
First, notice that D().test, D.test and B.test are not the same function, because they're not functions at all; they're methods. (I'm assuming Python 2.x here. Things are a little different—mainly simpler—in 3.x.)
A method is basically an object with im_func, im_class, and im_self members. When you call a method, all you're doing is calling its im_func, with its im_self (if not None) crammed in as an extra argument at the start.
So, our three examples all have the same im_func, which actually is the function you defined inside B. But the first two have D rather than B for im_class, and the first also has a D instance instead of None for im_self. So, that's how calling it ends up passing the D instance as self.
So, how does D().test end up with that im_self and im_class? Where does that get created? That's the fun part. For a full description, read the Descriptor HowTo Guide, but briefly:
Whenever you write foo.bar, what actually happens is equivalent to a call to getattr(foo, 'bar'), which does something like this (ignoring instance attributes, __getattr__, __getattribute__, slots, builtins, etc.):
def getattr(obj, name):
    for cls in obj.__class__.__mro__:
        try:
            desc = cls.__dict__[name]
        except KeyError:
            pass
        else:
            # descriptor protocol: __get__(instance, owner)
            return desc.__get__(obj, obj.__class__)
That .__get__() at the end is the magic bit. If you look at a function (say, B.test.im_func), you'll see that it actually has a __get__ method. And what it does is to create a bound method, with im_func as itself, im_class as the class obj.__class__, and im_self as the object obj.
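The same idea can be tried in Python 3, where bound methods expose __func__ and __self__ instead of im_func and im_self (there is no im_class). The helper simple_getattr below is a hypothetical, deliberately simplified lookup that only considers class attributes; it is a sketch of the mechanism, not a replacement for the built-in getattr.

```python
def simple_getattr(obj, name):
    """Hypothetical, simplified lookup: class attributes only, no instance dict."""
    for cls in type(obj).__mro__:
        if name in cls.__dict__:
            desc = cls.__dict__[name]
            if hasattr(desc, "__get__"):
                # Descriptor protocol: __get__(instance, owner).
                return desc.__get__(obj, type(obj))
            return desc
    raise AttributeError(name)

class A:
    def test(self):
        return "A"

class B(A):
    def test(self):
        return "B-->" + super().test()

class C(A):
    def test(self):
        return "C"

class D(B, C):
    pass

d = D()
bound = simple_getattr(d, "test")
print(bound.__func__ is B.__dict__["test"])   # True
print(bound.__self__ is d)                    # True
print(bound())                                # B-->C
```

The function found is the one defined in B, but the object it is bound to is the D instance, which is why super() inside it continues along D's MRO.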

The short answer is that the method resolution order is roughly "breadth first". That is, it goes through all the base classes at a given level of ancestry before going to any of their superclasses. So if D inherits from B and C, which both inherit from A, the MRO always has B and C before A.
Another way to think about it is that if the order went B->A, then A.test would be called before C.test, even though C is a subclass of A. You generally want a subclass implementation to be invoked before the superclass one (because the subclass one might want to totally override the superclass and not invoke it at all).
A longer explanation can be found here. You can also find useful information by googling or searching Stack Overflow for questions about "Python method resolution order" or "Python MRO".

super() is basically how you tell Python "Do what this object's other classes say."
When each of your classes has only one parent (single inheritance), super() will simply refer you to the parent class. (I guess you've already understood this part.)
But when you use multiple base classes, as you did in your example, things start to get a little more complicated. In this case, Python ensures that if you call super() everywhere, every class's method gets called.
A (somewhat nonsensical) example:
class Animal(object):
    def make_sounds(self):
        pass
class Duck(Animal):
    def make_sounds(self):
        print 'quack!'
        super(Duck, self).make_sounds()
class Pig(Animal):
    def make_sounds(self):
        print 'oink!'
        super(Pig, self).make_sounds()
# Let's try crossing some animals
class DuckPig(Duck, Pig):
    pass
my_duck_pig = DuckPig()
my_duck_pig.make_sounds()
# quack!
# oink!
You would want your DuckPig to say quack! and oink!, after all, it's a pig and a duck, right? Well, that's what super() is for.

Python: Calling subclass's method within super method call

EDIT: My assumptions were wrong; the code below does in fact work like I wanted it to. The behaviour I was observing turned out to be caused by another bit of the code I was working with that I overlooked because it looked completely unrelated.
So I have some code structured like this:
class B(object):
    def foo(self):
        print "spam"
    def bar(self):
        do_something()
        self.foo()
        do_something_else()
class C(B):
    def foo(self):
        print "ham"
    def bar(self):
        super(C, self).bar()
        print "eggs"
c = C()
c.bar()
This would do_something(), print "spam", do_something_else(), and then print "eggs". However, what I want to do is do_something(), print "ham", do_something_else(), and then print "eggs". In other words, I want to call B's bar method from C's bar method, but I want it to call C's foo method, not B's.
Is there a way to do this? Note that in the actual code I'm dealing with, both the B class and the code that actually calls c.bar() are part of an evolving third-party library, so mucking with that would be a last resort.

The code you have posted does what you want.
When B.bar calls self.foo() when self is an object of type C, it will call C.foo.
This works because self.foo is looked up first on the instance self and then along the method resolution order of self's actual type, the C3-linearised tuple of the class and its base classes (here (C, B, object)). C comes before B in that tuple, so the lookup returns C.foo rather than B.foo.
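A runnable Python 3 version of the question's structure shows this. Since do_something() and do_something_else() are not defined in the question, the sketch records markers in a list named calls instead; those stand-ins are assumptions made for illustration.

```python
calls = []

class B:
    def foo(self):
        calls.append("spam")

    def bar(self):
        calls.append("before")   # stands in for do_something()
        self.foo()               # looked up on type(self), not on B
        calls.append("after")    # stands in for do_something_else()

class C(B):
    def foo(self):
        calls.append("ham")

    def bar(self):
        super().bar()
        calls.append("eggs")

C().bar()
print(calls)   # ['before', 'ham', 'after', 'eggs']
```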
