class Foo(object):
    pass

foo = Foo()

def bar(self):
    print('bar')

Foo.bar = bar
foo.bar()  # bar
Coming from JavaScript: if a "class" prototype is augmented with a certain attribute, it is known that all instances of that "class" will have that attribute in their prototype chain, hence no modifications have to be made to any of its instances or "sub-classes".
In that sense, how can a Class-based language like Python achieve Monkey patching?
The real question is, how can it not? In Python, classes are first-class objects in their own right. Attribute access on instances of a class is resolved by looking up attributes on the instance, and then the class, and then the parent classes (in the method resolution order.) These lookups are all done at runtime (as is everything in Python.) If you add an attribute to a class after you create an instance, the instance will still "see" the new attribute, simply because nothing prevents it.
In other words, it works because Python doesn't cache attributes (unless your code does), because it doesn't use negative caching or shadow classes or any of the optimization techniques that would inhibit it (or, when Python implementations do use them, they take into account that the class might change), and because everything happens at runtime.
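To make that concrete, here is a small sketch (mine, not from the answer above) showing the lookup happening at access time:

class Foo(object):
    pass

foo = Foo()
Foo.greeting = 'hello'   # added to the class after foo already exists
print(foo.greeting)      # hello - found on the class at access time
foo.greeting = 'hi'      # an instance attribute shadows the class one
print(foo.greeting)      # hi
del foo.greeting
print(foo.greeting)      # hello - the class attribute is visible again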
I just read through a bunch of documentation, and as far as I can tell, the whole story of how foo.bar is resolved is as follows:
1. Can we find foo.__getattribute__ by the following process? If so, use the result of foo.__getattribute__('bar').
   - (Looking up __getattribute__ will not cause infinite recursion, but the implementation of it might.)
   - (In reality, we will always find __getattribute__ in new-style objects, as a default implementation is provided in object - but that implementation is of the following process. ;) )
   - (If we define a __getattribute__ method in Foo, and access foo.__getattribute__, foo.__getattribute__('__getattribute__') will be called! But this does not imply infinite recursion - if you are careful. ;) )
2. Is bar a "special" name for an attribute provided by the Python runtime (e.g. __dict__, __class__, __bases__, __mro__)? If so, use that. (As far as I can tell, __getattribute__ falls into this category, which avoids infinite recursion.)
3. Is bar in the foo.__dict__ dict? If so, use foo.__dict__['bar'].
4. Does foo.__mro__ exist (i.e., is foo actually a class)? If so, for each base-class base in foo.__mro__[1:] (note that the first one would be foo itself, which we already searched):
   - Is bar in base.__dict__? If so:
     - Let x be base.__dict__['bar'].
     - Can we find (again, recursively, but it won't cause a problem) x.__get__? If so, use x.__get__(foo, foo.__class__). (Note that the function bar is, itself, an object, and the Python compiler automatically gives functions a __get__ attribute which is designed to be used this way.)
     - Otherwise, use x.
5. For each base-class base of foo.__class__.__mro__:
   (Note that this recursion is not a problem: those attributes should always exist, and fall into the "provided by the Python runtime" case. foo.__class__.__mro__[0] will always be foo.__class__, i.e. Foo in our example. Note also that we do this even if foo.__mro__ exists. This is because classes have a class, too: its name is type, and it provides, among other things, the method used to calculate __mro__ attributes in the first place.)
   - Is bar in base.__dict__? If so:
     - Let x be base.__dict__['bar'].
     - Can we find (again, recursively, but it won't cause a problem) x.__get__? If so, use x.__get__(foo, foo.__class__). (Note, again, that functions automatically get a __get__ attribute designed to be used this way.)
     - Otherwise, use x.
6. If we still haven't found something to use: can we find foo.__getattr__ by the preceding process? If so, use the result of foo.__getattr__('bar').
7. If everything failed, raise AttributeError.
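As a rough pure-Python sketch of the class-side portion of that process (my simplification - it ignores the special-name and __getattr__ steps, and real CPython consults data descriptors on the class before the instance dict):

def lookup(obj, name):
    # 1. Instance dictionary first (simplified, per the caveat above)
    if name in obj.__dict__:
        return obj.__dict__[name]
    # 2. Walk the MRO of the object's class
    for base in type(obj).__mro__:
        if name in base.__dict__:
            x = base.__dict__[name]
            # 3. If x is a descriptor, invoke its __get__
            if hasattr(type(x), '__get__'):
                return x.__get__(obj, type(obj))
            return x
    raise AttributeError(name)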
bar.__get__ is not really a function - it's a "method-wrapper" - but you can imagine it being implemented vaguely like this:
# Somewhere in the Python internals
class __method_wrapper(object):
    def __init__(self, func):
        self.func = func
    def __call__(self, obj, cls):
        return lambda *args, **kwargs: self.func(obj, *args, **kwargs)
        # Except it actually returns a "bound method" object
        # that uses cls for its __repr__,
        # and there is a __repr__ for the method_wrapper that I *think*
        # uses the hashcode of the underlying function, rather than of itself,
        # but I'm not sure.

# Automatically done after compiling bar
bar.__get__ = __method_wrapper(bar)
The "binding" that happens within the __get__ automatically attached to bar (called a descriptor), by the way, is more or less the reason why you have to specify self parameters explicitly for Python methods. In Javascript, this itself is magical; in Python, it is merely the process of binding things to self that is magical. ;)
And yes, you can explicitly set a __get__ method on your own objects and have it do special things when you set a class attribute to an instance of the object and then access it from an instance of that other class. Python is extremely reflective. :) But if you want to learn how to do that, and get a really full understanding of the situation, you have a lot of reading to do. ;)
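As a small taste, here is a minimal custom descriptor (all names here are made up for illustration):

class Logged(object):
    """A descriptor that announces every attribute access."""
    def __init__(self, value):
        self.value = value
    def __get__(self, obj, objtype=None):
        print('accessed via', obj)
        return self.value

class Quux(object):
    answer = Logged(42)

print(Quux().answer)   # prints the access message, then 42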
When defining a method on a class in Python, it looks something like this:
class MyClass(object):
    def __init__(self, x, y):
        self.x = x
        self.y = y
But in some other languages, such as C#, you have a reference to the object that the method is bound to with the "this" keyword without declaring it as an argument in the method prototype.
Was this an intentional language design decision in Python or are there some implementation details that require the passing of "self" as an argument?
I like to quote Tim Peters' Zen of Python: "Explicit is better than implicit."
In Java and C++, 'this.' can be deduced, except when you have variable names that make it impossible to deduce. So you sometimes need it and sometimes don't.
Python elects to make things like this explicit rather than based on a rule.
Additionally, since nothing is implied or assumed, parts of the implementation are exposed. self.__class__, self.__dict__ and other "internal" structures are available in an obvious way.
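For instance (a trivial sketch):

class Pair(object):
    def __init__(self, a, b):
        self.a, self.b = a, b
    def describe(self):
        # the "internal" structures are ordinary attributes on self
        return self.__class__.__name__, sorted(self.__dict__)

print(Pair(1, 2).describe())   # ('Pair', ['a', 'b'])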
It's to minimize the difference between methods and functions. It allows you to easily generate methods in metaclasses, or add methods at runtime to pre-existing classes.
e.g.
>>> class C:
... def foo(self):
... print("Hi!")
...
>>>
>>> def bar(self):
... print("Bork bork bork!")
...
>>>
>>> c = C()
>>> C.bar = bar
>>> c.bar()
Bork bork bork!
>>> c.foo()
Hi!
>>>
It also (as far as I know) makes the implementation of the Python runtime easier.
I suggest that one should read Guido van Rossum's blog on this topic - Why explicit self has to stay.
When a method definition is decorated, we don't know whether to automatically give it a 'self' parameter or not: the decorator could turn the function into a static method (which has no 'self'), or a class method (which has a funny kind of self that refers to a class instead of an instance), or it could do something completely different (it's trivial to write a decorator that implements '@classmethod' or '@staticmethod' in pure Python). There's no way, without knowing what the decorator does, to decide whether to endow the method being defined with an implicit 'self' argument or not.
I reject hacks like special-casing '@classmethod' and '@staticmethod'.
Python doesn't force you to use "self". You can give it whatever name you want; you just have to remember that the first argument in a method definition header is a reference to the object.
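For example (a sketch - the name this works just as well, even if it is frowned upon):

class Greeter:
    def greet(this, name):   # any name works for the first parameter
        print("Hello,", name)

Greeter().greet("world")     # Hello, world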
Also allows you to do this (in short: invoking Outer(3).create_inner_class(4)().weird_sum_with_closure_scope(5) will return 12, but will do so in the craziest of ways):
class Outer(object):
    def __init__(self, outer_num):
        self.outer_num = outer_num

    def create_inner_class(outer_self, inner_arg):
        class Inner(object):
            def weird_sum_with_closure_scope(inner_self, num):
                # outer_self and inner_arg are captured from the enclosing scopes
                return num + outer_self.outer_num + inner_arg
        return Inner
Of course, this is harder to imagine in languages like Java and C#. By making the self reference explicit, you're free to refer to any object by that self reference. Also, such a way of playing with classes at runtime is harder to do in the more static languages - not that it's necessarily good or bad. It's just that the explicit self allows all this craziness to exist.
Moreover, imagine this: We'd like to customize the behavior of methods (for profiling, or some crazy black magic). This can lead us to think: what if we had a class Method whose behavior we could override or control?
Well here it is:
from functools import partial

class MagicMethod(object):
    """Does black magic when called"""
    def __get__(self, obj, obj_type):
        # This binds the <obj> instance to the <innocent_self> parameter
        # of the method MagicMethod.invoke
        return partial(self.invoke, obj)

    def invoke(magic_self, innocent_self, *args, **kwargs):
        # do black magic here
        ...
        print(magic_self, innocent_self, args, kwargs)

class InnocentClass(object):
    magic_method = MagicMethod()
And now: InnocentClass().magic_method() will act as expected. The method will be bound with the innocent_self parameter to the InnocentClass instance, and with magic_self to the MagicMethod instance. Weird, huh? It's like having two keywords, this1 and this2, in languages like Java and C#. Magic like this allows frameworks to do stuff that would otherwise be much more verbose.
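A quick usage sketch (the exact output will vary):

inst = InnocentClass()
inst.magic_method(1, 2, key='value')
# prints something like:
# <__main__.MagicMethod object at 0x...> <__main__.InnocentClass object at 0x...> (1, 2) {'key': 'value'}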
Again, I don't want to comment on the ethics of this stuff. I just wanted to show things that would be harder to do without an explicit self reference.
I think it has to do with PEP 227:
Names in class scope are not accessible. Names are resolved in the innermost enclosing function scope. If a class definition occurs in a chain of nested scopes, the resolution process skips class definitions. This rule prevents odd interactions between class attributes and local variable access. If a name binding operation occurs in a class definition, it creates an attribute on the resulting class object. To access this variable in a method, or in a function nested within a method, an attribute reference must be used, either via self or via the class name.
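A small sketch of the rule in action:

x = 'global'

class C(object):
    x = 'class attribute'
    def show(self):
        # name resolution in methods skips the class scope,
        # so this finds the global x, not the class attribute:
        print(x)
        # the class attribute must be reached explicitly:
        print(self.x, C.x)

C().show()
# global
# class attribute class attribute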
I think the real reason, besides "The Zen of Python", is that functions are first-class citizens in Python, which essentially makes them objects. Now the fundamental issue is: if your functions are objects as well, then, in the object-oriented paradigm, how would you send messages to objects when the messages themselves are objects?
Looks like a chicken-and-egg problem. To reduce this paradox, the only possible way is to either pass a context of execution to methods or detect it. But since Python can have nested functions, detection would be impossible, as the context of execution would change for inner functions.
This means the only possible solution is to explicitly pass 'self' (the context of execution).
So I believe it is an implementation problem; the Zen came much later.
As explained in self in Python, Demystified:
anything like obj.meth(args) becomes Class.meth(obj, args). The calling process is automatic while the receiving process is not (it's explicit). This is the reason the first parameter of a function in a class must be the object itself.
class Point(object):
    def __init__(self, x=0, y=0):
        self.x = x
        self.y = y

    def distance(self):
        """Find distance from origin"""
        return (self.x**2 + self.y**2) ** 0.5
Invocations:
>>> p1 = Point(6,8)
>>> p1.distance()
10.0
__init__() defines three parameters but we just passed two (6 and 8). Similarly, distance() requires one argument but zero were passed.
Why is Python not complaining about this argument number mismatch?
Generally, when we call a method with some arguments, the corresponding class function is called by placing the method's object before the first argument. So, anything like obj.meth(args) becomes Class.meth(obj, args). The calling process is automatic while the receiving process is not (it's explicit).
This is the reason the first parameter of a function in a class must be the object itself. Writing this parameter as self is merely a convention. It is not a keyword and has no special meaning in Python. We could use other names (like this), but I strongly suggest you don't: using names other than self is frowned upon by most developers and degrades the readability of the code ("Readability counts").
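Continuing the Point example above, the two call forms can be placed side by side:

p1 = Point(6, 8)
print(p1.distance())        # 10.0 - the automatic obj.meth() form
print(Point.distance(p1))   # 10.0 - the explicit Class.meth(obj) form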
...
In the first example, self.x is an instance attribute whereas x is a local variable. They are not the same, and they lie in different namespaces.
Self Is Here To Stay
Many have proposed to make self a keyword in Python, like this in C++ and Java. This would eliminate the redundant use of explicit self from the formal parameter list in methods. While this idea seems promising, it's not going to happen. At least not in the near future. The main reason is backward compatibility. Here is a blog from the creator of Python himself explaining why the explicit self has to stay.
The 'self' parameter holds the current calling object.
class class_name:
    class_variable = 0

    def method_name(self, arg):
        self.var = arg

obj = class_name()
obj.method_name(42)
Here, the self argument holds the object obj. Hence, the statement self.var denotes obj.var.
There is also another very simple answer: according to the Zen of Python, "explicit is better than implicit."
Lately, I've been studying Python's class instantiation process to really understand what happens under the hood when creating a class instance. But while playing around with test code, I came across something I don't understand.
Consider this dummy class
class Foo():
    def test(self):
        print("I'm using test()")
Normally, if I wanted to use the Foo.test instance method, I would go and create an instance of Foo and call the method explicitly, like so:
foo_inst = Foo()
foo_inst.test()
>>>> I'm using test()
But I found that calling it the following way ends up with the same result:
Foo.test(Foo)
>>>> I'm using test()
Here I don't actually create an instance, but I'm still accessing Foo's instance method. Why and how is this working in the context of Python? I mean, self normally refers to the current instance of the class, but I'm not technically creating a class instance in this case.
print(Foo()) #This is a Foo object
>>>> <__main__.Foo object at ...>
print(Foo) #This is not
>>>> <class '__main__.Foo'>
Props to everyone that led me there in the comments section.
The answer to this question relies on two fundamentals of Python:
Duck typing
Everything is an object
Indeed, even if self is Python's idiom to reference the current class instance, you technically can pass whatever object you want because of how Python handles typing.
Now, the other confusion that brought me here is that I wasn't creating an object in my second example. But, the thing is, Foo is already an object internally.
This can be tested empirically like so,
print(type(Foo))
<class 'type'>
So, we now know that Foo is an instance of class type and therefore can be passed as self even though it is not an instance of itself.
Basically, if I were to manipulate self as if it were a Foo instance in my test method, I would have problems when calling it as in my second example.
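For example (a sketch of the kind of problem meant here):

class Foo():
    def __init__(self):
        self.value = 42
    def test(self):
        print(self.value)

Foo().test()    # 42
Foo.test(Foo)   # AttributeError: type object 'Foo' has no attribute 'value'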
A few notes on your question (and answer). First, everything really is an object. Even a class is an object, so there is the class of the class (called a metaclass), which is type in this case.
Second, more relevant to your case: methods are, more or less, class attributes, not instance attributes. In Python, when you have an object obj, an instance of Class, and you access obj.x, Python first looks into obj, and then into Class. That's what happens when you access a method from an instance: methods are just special class attributes, so they can be accessed from both the instance and the class. And since you are not using any instance attributes of the self that is passed to the test(self) function, the object that is passed is irrelevant.
To understand that in depth, you should read about the descriptor protocol, if you are not familiar with it. It explains a lot about how things work in Python. It allows Python classes and objects to be essentially dictionaries with some special attributes (very similar to JavaScript objects and methods).
Regarding class instantiation, read about __new__ and metaclasses.
I have a Python class Foo and a more memory-conscious version LightFoo. The information contained in their attributes is ultimately the same but is encoded differently.
A handful of Foo's methods will be completely re-written for LightFoo, but for most of them it will be fine to cast the LightFoo instance to a Foo and call the corresponding Foo method. For example, LightFoo might include:
def total(self):
    return self.fooize().total()
If Foo has 100 methods, though, this gets really tedious. What would be really convenient is to set up LightFoo to inherit from Foo and have the casting step somehow inserted by default for all methods not found in LightFoo. I'm pretty sure this isn't possible, but it seems like there must be a better approach than writing a block like the one above for each of Foo's methods.
If fooize() is doing some custom processing to convert the object to a Foo, you can do this by just defining __getattr__, which is called when you access an attribute that can't be found. You can call fooize() there.
def __getattr__(self, name):
    return getattr(self.fooize(), name)
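For instance, here is a self-contained sketch with hypothetical Foo/LightFoo encodings (the conversion logic is made up for illustration):

class Foo(object):
    def __init__(self, data):
        self.data = data
    def total(self):
        return sum(self.data)

class LightFoo(object):
    def __init__(self, packed):
        self.packed = packed
    def fooize(self):
        # hypothetical conversion; the real decoding depends on your encoding
        return Foo(list(self.packed))
    def __getattr__(self, name):
        # only called when normal lookup fails, so methods LightFoo
        # defines itself are never intercepted
        return getattr(self.fooize(), name)

light = LightFoo((1, 2, 3))
print(light.total())   # 6 - delegated through a temporary Foo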
Otherwise, you can just inherit from Foo and let the nonexistent methods fall back to the superclass.
Recently I faced a problem in a C-based Python extension while trying to instantiate objects without calling their constructor -- which is a requirement of the extension.
The class to be used to create instances is obtained dynamically: at some point, I have an instance x whose class I wish to use to create other instances, so I store x.__class__ for later use -- let this value be klass.
At a later point, I invoke PyInstance_NewRaw(klass, PyDict_New()) and then, the problem arises. It seems that if klass is an old-style class, the result of that call is the desired new instance. However, if it is a new-style class, the result is NULL and the exception raised is:
SystemError: ../Objects/classobject.c:521: bad argument to internal function
For the record, I'm using Python version 2.7.5. Googling around, I found no more than one other person looking for a solution (and it seemed to me he was doing a workaround, but didn't detail it).
For the record #2: the instances the extension is creating are proxies for these same x instances -- the x.__class__ and x.__dict__'s are known, so the extension is spawning new instances based on __class__ (using the aforementioned C function) and setting the respective __dict__ on each new instance (those __dict__'s contain inter-process shared-memory data). Not only is it conceptually problematic to call an instance's __init__ a second time (first: its state is already known; second: the expected behavior for constructors is that they are called exactly once per instance), it is also impractical, since the extension cannot figure out the arguments and their order to call __init__() for each instance in the system. Also, changing the __init__ of each class in the system whose instances may be proxies, and making them aware of the proxy mechanism they will be subjected to, is conceptually problematic (they shouldn't know about it) and impractical.
So, my question is: how to perform the same behavior of PyInstance_NewRaw regardless of the instance's class style?
The type of an instance of a new-style class isn't instance, it's the class itself. So the PyInstance_* functions aren't even meaningful for new-style classes.
In fact, the documentation explicitly explains this:
Note that the class objects described here represent old-style classes, which will go away in Python 3. When creating new types for extension modules, you will want to work with type objects (section Type Objects).
So, you will have to write code that checks whether klass is an old-style or new-style class and does the appropriate thing for each case. An old-style class's type is PyClass_Type, while a new-style class's type is either PyType_Type, or a custom metaclass.
Meanwhile, there is no direct equivalent of PyInstance_NewRaw for new-style classes. Or, rather, the direct equivalent (calling its tp_alloc slot and then adding a dict) will give you a non-functional instance. You could try to duplicate all the other appropriate work, but that's going to be tricky. Alternatively, you could use tp_new, but that will do the wrong thing if there's a custom __new__ function in the class (or any of its bases). See the rejected patches from #5180 for some ideas.
But really, what you're trying to do is probably not a good idea in the first place. Maybe if you explained why this is a requirement, and what you're trying to do, there would be a better way to do it.
If the goal is to build objects by creating a new uninitialized instance of the class, then copying over its __dict__ from an initialized prototype, there's a much easier solution that I think will work for you:
__class__ is a writeable attribute. So (showing it in Python; the C API is basically the same, just a lot more verbose, and I'd probably screw up the refcounting somewhere):
import types

class NewStyleDummy(object):
    pass

def make_instance(cls, instance_dict):
    if isinstance(cls, types.ClassType):
        obj = do_old_style_thing(cls)
    else:
        obj = NewStyleDummy()
        obj.__class__ = cls
    obj.__dict__ = instance_dict
    return obj
The new object will be an instance of cls—in particular, it will have the same class dictionary, including the MRO, metaclass, etc.
This won't work if cls has a metaclass that's required for its construction, or a custom __new__ method, or __slots__… but then your design of copying over the __dict__ doesn't make any sense in those cases anyway. I believe that in any case where anything could possibly work, this simple solution will work.
Calling cls.__new__ seems like a good solution at first, but it actually isn't. Let me explain the background.
When you do this:
foo = Foo(1, 2)
(where Foo is a new-style class), it gets converted into something like this pseudocode:
foo = Foo.__new__(Foo, 1, 2)
if isinstance(foo, Foo):
    foo.__init__(1, 2)
The problem is that, if Foo or one of its bases has defined a __new__ method, it will expect to get the arguments from the constructor call, just like an __init__ method will.
As you explained in your question, you don't know the constructor call arguments—in fact, that's the main reason you can't call the normal __init__ method in the first place. So, you can't call __new__ either.
The base implementation of __new__ accepts and ignores any arguments it's given. So, if none of your classes has a __new__ override or a __metaclass__, you will happen to get away with this, because of a quirk in object.__new__ (a quirk which works differently in Python 3.x, by the way). But those are the exact same cases the previous solution can handle, and that solution works for much more obvious reason.
Put another way: The previous solution depends on nobody defining __new__ because it never calls __new__. This solution depends on nobody defining __new__ because it calls __new__ with the wrong arguments.
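To illustrate with toy classes (a sketch):

class Plain(object):
    def __init__(self, a, b):
        self.a, self.b = a, b

class Fancy(object):
    def __new__(cls, a, b):
        # a custom __new__ that expects the constructor arguments
        obj = super(Fancy, cls).__new__(cls)
        obj.checksum = a + b
        return obj
    def __init__(self, a, b):
        self.a, self.b = a, b

uninitialized = Plain.__new__(Plain)  # works: object.__new__ tolerates this
broken = Fancy.__new__(Fancy)         # TypeError: missing arguments a and b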
I am new to Python. I think non-class objects do not have a __bases__ attribute whereas class objects do, but I am not sure. How does Python/CPython check whether an object is a class or a non-class, and pass the correct arguments to the object's descriptor attribute accordingly during attribute access?
============================================
Updated:
I was learning how __getattribute__ and descriptors cooperate to make bound methods. I was wondering how class objects and non-class objects invoke the descriptor's __get__ differently. I thought those two kinds of objects shared the same __getattribute__ CPython function, and that that function would have to know whether the invoking object was a class or non-class. But I was wrong. This article explains it well:
http://docs.python.org/dev/howto/descriptor.html#functions-and-methods
So class objects use type.__getattribute__ whereas non-class objects use object.__getattribute__. They are different CPython functions, and super has a third __getattribute__ CPython implementation as well.
However, about the super one, the above article states:
The object returned by super() also has a custom __getattribute__() method for invoking descriptors. The call super(B, obj).m() searches obj.__class__.__mro__ for the base class A immediately following B and then returns A.__dict__['m'].__get__(obj, A). If not a descriptor, m is returned unchanged. If not in the dictionary, m reverts to a search using object.__getattribute__().
The statement above didn't seem to match my experiment with Python 3.1. What I saw is the following, which is reasonable to me:
super(B, obj).m ---> A.__dict__['m'].__get__(obj, type(obj))
objclass = type(obj)
super(B, objclass).m ---> A.__dict__['m'].__get__(None, objclass)
A was never passed to __get__
It is reasonable to me because I believe objclass's (rather than A's) mro chain is the one needed within m, especially in the second case.
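For reference, a small script of my own (using a trivial descriptor to expose the arguments) reproduces this on Python 3:

class Probe(object):
    def __get__(self, obj, objtype=None):
        # simply report what __get__ received
        return (obj, objtype)

class A(object):
    m = Probe()

class B(A):
    pass

obj = B()
print(super(B, obj).m)   # (<__main__.B object at 0x...>, <class '__main__.B'>)
print(super(B, B).m)     # (None, <class '__main__.B'>)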
Was I doing something wrong? Or I didn't understand it correctly?
As the commenters asked: Why do you care? Usually that's a sign of not using Python the way it was meant to be used.
A very powerful concept of Python is duck typing. You don't care about the type or class of an object as long as it exposes the attributes you need.
How about inspect.isclass(objectname)?
More info here: http://docs.python.org/library/inspect.html