When defining a method on a class in Python, it looks something like this:
class MyClass(object):
    def __init__(self, x, y):
        self.x = x
        self.y = y
But in some other languages, such as C#, you can refer to the object the method is bound to with the "this" keyword, without declaring it as a parameter in the method signature.
Was this an intentional language design decision in Python or are there some implementation details that require the passing of "self" as an argument?
I like to quote Peters' Zen of Python. "Explicit is better than implicit."
In Java and C++, 'this.' can be deduced, except when a local variable or parameter has the same name as a member, which makes it impossible to deduce. So you sometimes need it and sometimes don't.
Python elects to make things like this explicit rather than based on a rule.
Additionally, since nothing is implied or assumed, parts of the implementation are exposed. self.__class__, self.__dict__ and other "internal" structures are available in an obvious way.
It's to minimize the difference between methods and functions. It allows you to easily generate methods in metaclasses, or add methods at runtime to pre-existing classes.
e.g.
>>> class C:
...     def foo(self):
...         print("Hi!")
...
>>>
>>> def bar(self):
...     print("Bork bork bork!")
...
>>>
>>> c = C()
>>> C.bar = bar
>>> c.bar()
Bork bork bork!
>>> c.foo()
Hi!
>>>
It also (as far as I know) makes the implementation of the Python runtime easier.
I suggest reading Guido van Rossum's blog post on this topic, "Why explicit self has to stay":
When a method definition is decorated, we don't know whether to automatically give it a 'self' parameter or not: the decorator could turn the function into a static method (which has no 'self'), or a class method (which has a funny kind of self that refers to a class instead of an instance), or it could do something completely different (it's trivial to write a decorator that implements '@classmethod' or '@staticmethod' in pure Python). There's no way without knowing what the decorator does whether to endow the method being defined with an implicit 'self' argument or not.
I reject hacks like special-casing '@classmethod' and '@staticmethod'.
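To make Guido's point concrete, here is a minimal pure-Python sketch of decorators that behave like staticmethod and classmethod. The names my_staticmethod, my_classmethod and Greeter are hypothetical, and the real built-ins are implemented in C with extra features; this only shows that the decorator, not the compiler, decides what gets passed as the first argument.

```python
class my_staticmethod:
    """Hypothetical pure-Python stand-in for the built-in staticmethod."""

    def __init__(self, func):
        self.func = func

    def __get__(self, obj, objtype=None):
        # Hand back the plain function: no instance or class is ever
        # inserted as an implicit first argument.
        return self.func


class my_classmethod:
    """Hypothetical pure-Python stand-in for the built-in classmethod."""

    def __init__(self, func):
        self.func = func

    def __get__(self, obj, objtype=None):
        if objtype is None:
            objtype = type(obj)

        # Bind the *class*, not the instance, as the first argument.
        def bound(*args, **kwargs):
            return self.func(objtype, *args, **kwargs)

        return bound


class Greeter:
    @my_staticmethod
    def add(a, b):
        return a + b

    @my_classmethod
    def name(cls):
        return cls.__name__


print(Greeter.add(1, 2))   # 3
print(Greeter().name())    # Greeter
```

Because this decision happens at attribute-lookup time inside `__get__`, nothing at the point of the `def` tells the compiler whether an implicit self should exist.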
Python doesn't force you to use "self". You can give it whatever name you want; just remember that the first parameter in a method definition header is a reference to the object.
It also allows you to do this (in short, invoking Outer(3).create_inner_class(4)().weird_sum_with_closure_scope(5) will return 12, but will do so in the craziest of ways):
class Outer(object):
    def __init__(self, outer_num):
        self.outer_num = outer_num

    def create_inner_class(outer_self, inner_arg):
        class Inner(object):
            # Both outer_self and inner_arg are reached through the
            # closure of create_inner_class, not through the class body.
            def weird_sum_with_closure_scope(inner_self, num):
                return num + outer_self.outer_num + inner_arg
        return Inner
Of course, this is harder to imagine in languages like Java and C#. By making the self reference explicit, you're free to refer to any object by that self reference. Also, such a way of playing with classes at runtime is harder to do in the more static languages, not that it's necessarily good or bad. It's just that the explicit self allows all this craziness to exist.
Moreover, imagine this: We'd like to customize the behavior of methods (for profiling, or some crazy black magic). This can lead us to think: what if we had a class Method whose behavior we could override or control?
Well here it is:
from functools import partial

class MagicMethod(object):
    """Does black magic when called"""

    def __get__(self, obj, obj_type=None):
        # self.invoke is already bound to this MagicMethod (magic_self);
        # partial additionally binds the owning instance (obj) to the
        # innocent_self parameter of MagicMethod.invoke
        return partial(self.invoke, obj)

    def invoke(magic_self, innocent_self, *args, **kwargs):
        # do black magic here
        ...
        print(magic_self, innocent_self, args, kwargs)

class InnocentClass(object):
    magic_method = MagicMethod()
And now InnocentClass().magic_method() will act as expected. The method will be bound with the innocent_self parameter to the InnocentClass instance, and with magic_self to the MagicMethod instance. Weird, huh? It's like having two keywords, this1 and this2, in languages like Java and C#. Magic like this allows frameworks to do things that would otherwise be much more verbose.
Again, I don't want to comment on the ethics of this stuff. I just wanted to show things that would be harder to do without an explicit self reference.
I think it has to do with PEP 227:
Names in class scope are not accessible. Names are resolved in the innermost enclosing function scope. If a class definition occurs in a chain of nested scopes, the resolution process skips class definitions. This rule prevents odd interactions between class attributes and local variable access. If a name binding operation occurs in a class definition, it creates an attribute on the resulting class object. To access this variable in a method, or in a function nested within a method, an attribute reference must be used, either via self or via the class name.
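A minimal sketch of the PEP 227 rule quoted above, assuming Python 3; the class Counter and its attribute step are hypothetical names. A name bound in the class body becomes a class attribute, and a method can only reach it through an attribute reference.

```python
class Counter:
    step = 2  # bound in class scope, so it becomes a class attribute

    def bad_increment(self, n):
        # Class scope is skipped when resolving names inside a method,
        # so `step` is looked up as a global here and raises NameError.
        return n + step

    def good_increment(self, n):
        # Going through self (or Counter.step) finds the class attribute.
        return n + self.step


c = Counter()
print(c.good_increment(1))  # 3
```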
I think the real reason, besides "The Zen of Python", is that functions are first-class citizens in Python.
That essentially makes them objects. Now the fundamental issue is: if your functions are objects as well, then in the object-oriented paradigm, how would you send messages to objects when the messages themselves are objects?
It looks like a chicken-and-egg problem. To resolve this paradox, the only possible ways are to either pass a context of execution to methods or detect it. But since Python can have nested functions, detecting it would be impossible, as the context of execution would change for inner functions.
This means the only possible solution is to explicitly pass 'self' (the context of execution).
So I believe it is an implementation problem; the Zen came much later.
As explained in "self in Python, Demystified":
anything like obj.meth(args) becomes Class.meth(obj, args). The calling process is automatic while the receiving process is not (it's explicit). This is the reason the first parameter of a function in a class must be the object itself.
class Point(object):
    def __init__(self, x=0, y=0):
        self.x = x
        self.y = y

    def distance(self):
        """Find distance from origin"""
        return (self.x**2 + self.y**2) ** 0.5
Invocations:
>>> p1 = Point(6,8)
>>> p1.distance()
10.0
__init__() defines three parameters but we only passed two (6 and 8). Similarly, distance() requires one but zero arguments were passed.
Why is Python not complaining about this argument number mismatch?
Generally, when we call a method with some arguments, the corresponding class function is called by placing the method's object before the first argument. So, anything like obj.meth(args) becomes Class.meth(obj, args). The calling process is automatic while the receiving process is not (it's explicit).
This is the reason the first parameter of a function in a class must be the object itself. Writing this parameter as self is merely a convention. It is not a keyword and has no special meaning in Python. We could use other names (like this), but I strongly suggest you don't. Using names other than self is frowned upon by most developers and degrades the readability of the code ("Readability counts").
...
In the first example, self.x is an instance attribute whereas x is a local variable. They are not the same and lie in different namespaces.
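The equivalence obj.meth(args) == Class.meth(obj, args), and the fact that self is only a conventional name, can be checked directly. This sketch reuses the Point class from the article; the scaled method is a hypothetical addition that deliberately names its first parameter `this` to show that the name is legal, if unidiomatic.

```python
class Point(object):
    def __init__(self, x=0, y=0):
        self.x = x
        self.y = y

    def distance(self):
        """Find distance from origin"""
        return (self.x**2 + self.y**2) ** 0.5

    # Hypothetical method: the first parameter works under any name.
    def scaled(this, factor):
        return Point(this.x * factor, this.y * factor)


p1 = Point(6, 8)
print(p1.distance())       # 10.0
print(Point.distance(p1))  # 10.0, the same call with the object passed explicitly
print(p1.scaled(2).x)      # 12
```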
Self Is Here To Stay
Many have proposed to make self a keyword in Python, like this in C++ and Java. This would eliminate the redundant use of explicit self from the formal parameter list in methods. While this idea seems promising, it's not going to happen. At least not in the near future. The main reason is backward compatibility. Here is a blog post from the creator of Python himself explaining why the explicit self has to stay.
The 'self' parameter holds the object the method was called on.
class class_name:
    class_variable = None  # a class attribute needs a value

    def method_name(self, arg):
        self.var = arg

obj = class_name()
obj.method_name("some value")
Here, the self argument holds the object obj. Hence, the statement self.var denotes obj.var.
There is also another very simple answer: according to the Zen of Python, "explicit is better than implicit".
Related
The following code is of course totally pointless; it's not supposed to do anything but illustrate what I'm confused about:
class func():
    def __call__(self, x):
        raise Exception("func.__call__ error")

def double(x):
    return 2*x

doubler = func()
doubler.__call__ = double
print(doubler(2))
Can someone explain why this works? I would have expected that if I wanted to set doubler.__call__ to something it would be a function that takes two variables; I'd expect the code above to raise some sort of too-many-parameters error. What gets passed to what, when?
(And then: How could I set doubler.__call__ to a function that will actually have access to both "self" and "x"?)
(Context: An admittedly silly of-academic-interest example of why I might want to set an instance method this way: Each computable instance needs its own Approx method; creating a separate subclass for each instance seems "wrong"...)
Edit: Probably a better example, making it clear it has nothing to do with magic-method magic:
class func():
    def call(self, x):
        raise Exception("func.call error")

def double(x):
    return 2*x

doubler = func()
doubler.call = double
print(doubler.call(2))
On third thought, probably the following is the right way to do it. (i) It seems cleaner somehow, using the Python object model instead of tinkering with it; (ii) even 24 hours ago, with my then much cruder understanding, I would have expected it to work; somehow in this version it simply seems to make sense to me that the function passed to the constructor should take only one variable; (iii) it seems to work regardless of whether I inherit from object, which I think means it would also work in 3.0.
class func3(object):
    def __init__(self, f):
        self.f = f

    def __call__(self, x):
        return self.f(x)

def double(x):
    return 2.0*x

f3 = func3(double)
print(f3(2))
When you assign to doubler.__call__, you're binding a function to an instance attribute. This hides the class attribute of the same name that was created in the class statement.
Python's method binding only kicks in when you are looking up a class attribute via an instance. If the attribute's value is a descriptor (which functions are), then the descriptor's __get__ method gets called with appropriate parameters. For a function object, that binds the method to the instance (so self gets passed in automatically as the first argument).
Your first example wouldn't actually work in Python 3, only in Python 2. That's because in Python 2 you're creating an "old-style" class, which does all its method lookups on the instance. In new-style classes (which you can get in Python 2 by inheriting from object, or by default in Python 3), __special__ methods, when they're invoked by the interpreter (e.g. when you do doubler(2) to run doubler.__call__) are looked up only in the class, not in the instance's attributes. So your first example won't work with a new-style class, but the version that uses a normal method (call instead of __call__) would be fine.
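A small sketch of both cases, assuming Python 3 (new-style classes only); it also shows one way to answer the follow-up about giving a per-instance function access to both self and x. The function double_with_self is a hypothetical name; types.MethodType is the standard-library type of bound methods and does the same binding a function's __get__ performs.

```python
import types


class func:
    def __call__(self, x):
        raise Exception("func.__call__ error")

    def call(self, x):
        raise Exception("func.call error")


def double(x):
    return 2 * x


doubler = func()

# Normal method name: the instance attribute shadows the class attribute.
# It is not found on the class, so no binding happens and double gets only x.
doubler.call = double
print(doubler.call(2))  # 4

# Special method name: doubler(2) looks __call__ up on the class in
# Python 3, so the instance attribute is ignored and func.__call__ runs.
doubler.__call__ = double
try:
    doubler(2)
except Exception as exc:
    print(exc)  # func.__call__ error


# Hypothetical per-instance function that needs both self and x.
def double_with_self(self, x):
    return self, 2 * x


# Bind it explicitly so the instance is passed as the first argument.
doubler.call = types.MethodType(double_with_self, doubler)
print(doubler.call(2)[1])  # 4
```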
This is something between an answer to the question and a continuation of the question. I was kindly referred to another thread where more or less the same question was answered. I didn't follow the answers in that thread very well, being ignorant of the things the people there are talking about, hence the Question: Is what I say below correct? (If yes then this is an answer to the question above; if no I'd appreciate someone explaining why not...)
(i) Since I assign a function to an instance of func instead of to the class, it is now an "instance method", as opposed to a "class method".
(ii) And that's why it's not passed the instance as the first parameter; that happens with class methods but not with instance methods...
I'm looking at the source code for a trie implementation
On lines 80-85:
def keys(self, prefix=[]):
    return self.__keys__(prefix)

def __keys__(self, prefix=[], seen=[]):
    result = []
    # etc.
What is def __keys__? Is that a magic object that is self-created? If so, is this poor code? Or does __keys__ exist as a standard Python magic method? I can't find it anywhere in the Python documentation, though.
Why is it legal for the function to call self.__keys__ before def __keys__ is even instantiated? Wouldn't def __keys__ have to go before def keys (since keys calls __keys__)?
For your second question: it is legal. The functions of a class are defined when the class gets defined, so you can be sure both functions will be defined before keys() is called. The same logic applies to normal functions; we can do:
>>> def a():
...     b()
...
>>> def b():
...     print("In B()")
...
>>> a()
In B()
This is legal because both a() and b() are defined before a() is called. It would only be illegal if you tried to call a() before b() was defined. Note that defining a function does not automatically call it, and Python does not check, at the time a function is defined, whether the functions it uses are defined (that only happens at runtime, when the function is called, and in that case it raises a NameError).
For your first question, I do not know of any magic method called __keys__(), and I cannot find it in the documentation either.
All of the real "magic methods" are in the data model documentation; __keys__ isn't one of them. The style guide says:
Never invent such names; only use them as documented.
so yes, making up a new one is bad form (the convention would have been to call it _keys).
The second part of your question doesn't make sense; even if this wasn't a class, there is no need to define methods and functions in the order they're called. As long as they exist by the time the call actually gets made, it's not a problem. I tend to define public methods before private ones, even though the former may call the latter, simply for the reader's convenience.
There is no magic method named __keys__(), so as you suspected this is just poor naming.
The code in the class definition can be in any order. All that matters is that the definition has been made by the time the actual call is made downstream.
There is no magic method named __keys__, so it's just poor naming. Looking at the code, the author just wanted a private method that is used internally, including from the public method keys. As you can see, __keys__ accepts an additional argument.
About the second question, there is no need to define the functions in the same order as they are called. They will be available by the time the code actually runs.
The compilation of a class in Python is done way before the class is instantiated.
Whenever a class is created, the body of the class block is executed. Every def in it creates a plain function (or, if decorated, a classmethod/staticmethod object), and these are stored in the class's __dict__. When a new instance is created, nothing is copied over to it: when you later access instance.keys, Python finds the function in the type's __dict__ and binds it to the instance at that moment, producing a bound method.
Therefore, at the moment of calling instance.keys(), the class already has both the keys and __keys__ methods.
Also, as far as I know, there is no __keys__ method anywhere in the data model.
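A minimal sketch of the ordering point, using the conventional single-underscore name _keys for the helper. The Trie class here is a hypothetical cut-down stand-in, not the implementation from the question; only the method order matters.

```python
class Trie:
    """Hypothetical cut-down class; only the method order matters here."""

    def __init__(self):
        self._words = ["to", "tea", "ten"]

    def keys(self, prefix=""):
        # _keys is defined below, which is fine: the name is looked up
        # when keys() runs, not when keys is defined.
        return self._keys(prefix)

    def _keys(self, prefix):
        return [w for w in self._words if w.startswith(prefix)]


t = Trie()
print(t.keys("te"))  # ['tea', 'ten']
print("keys" in vars(Trie), "_keys" in vars(Trie))  # True True
print("keys" in vars(t))  # False: methods live on the class, not the instance
```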
I just can't see why we need to use @staticmethod. Let's start with an example.
class test1:
    def __init__(self, value):
        self.value = value

    @staticmethod
    def static_add_one(value):
        return value + 1

    @property
    def new_val(self):
        self.value = self.static_add_one(self.value)
        return self.value

a = test1(3)
print(a.new_val)  ## >>> 4

class test2:
    def __init__(self, value):
        self.value = value

    def static_add_one(self, value):
        return value + 1

    @property
    def new_val(self):
        self.value = self.static_add_one(self.value)
        return self.value

b = test2(3)
print(b.new_val)  ## >>> 4
In the example above, the method static_add_one in both classes does not require the instance of the class (self) in its calculation.
The method static_add_one in class test1 is decorated with @staticmethod and works properly.
But at the same time, the method static_add_one in class test2, which has no @staticmethod decoration, also works properly, by using a trick that provides a self in the arguments but doesn't use it at all.
So what is the benefit of using @staticmethod? Does it improve performance? Or is it just due to the Zen of Python, which states that "Explicit is better than implicit"?
The reason to use staticmethod is if you have something that could be written as a standalone function (not part of any class), but you want to keep it within the class because it's somehow semantically related to the class. (For instance, it could be a function that doesn't require any information from the class, but whose behavior is specific to the class, so that subclasses might want to override it.) In many cases, it could make just as much sense to write something as a standalone function instead of a staticmethod.
Your example isn't really the same. A key difference is that, even though you don't use self, you still need an instance to call static_add_one --- you can't call it directly on the class with test2.static_add_one(1). So there is a genuine difference in behavior there. The most serious "rival" to a staticmethod isn't a regular method that ignores self, but a standalone function.
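The behavioural difference described above can be checked with cut-down versions of the question's two classes (only the method in question is kept), assuming Python 3.

```python
class test1:
    @staticmethod
    def static_add_one(value):
        return value + 1


class test2:
    def static_add_one(self, value):
        return value + 1


print(test1.static_add_one(1))    # 2: callable on the class itself
print(test2().static_add_one(1))  # 2: but only through an instance

try:
    # Accessed on the class, this is a plain function, so 1 fills `self`
    # and `value` is missing.
    test2.static_add_one(1)
except TypeError as exc:
    print("TypeError:", exc)
```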
Today I suddenly found a benefit of using @staticmethod.
If you created a staticmethod within a class, you don't need to create an instance of the class before using the staticmethod.
For example,
class File1:
    def __init__(self, path):
        out = self.parse(path)

    def parse(self, path):
        x = ...  # parsing work goes here
        return x

class File2:
    def __init__(self, path):
        out = self.parse(path)

    @staticmethod
    def parse(path):
        x = ...  # parsing work goes here
        return x

if __name__ == '__main__':
    path = 'abc.txt'
    File2.parse(path)  # Goal!!! No instance needed
    File1.parse(path)  # TypeError: parse() is missing an argument, since no instance is bound to self
Since the method parse is strongly related to the classes File1 and File2, it is more natural to put it inside the class. However, sometimes this parse method may also be used in other classes under some circumstances. If you want to do so using File1, you must create an instance of File1 before calling the method parse. While using staticmethod in the class File2, you may directly call the method by using the syntax File2.parse.
This makes your work more convenient and natural.
I will add something other answers didn't mention. It's not only a matter of modularity, of putting something next to other logically related parts. It's also that the method could be non-static at another point of the hierarchy (i.e. in a subclass or superclass) and thus participate in polymorphism (type-based dispatching). So if you put that function outside the class, you will be precluding subclasses from effectively overriding it. Now, say you realize you don't need self in method C.f of class C. You have three options:
Put it outside the class. But we just decided against this.
Do nothing new: while unused, still keep the self parameter.
Declare that you are not using the self parameter, while still letting other C methods call f as self.f, which is required if you want to keep open the possibility of further overrides of f that do depend on some instance state.
Option 2 demands less conceptual baggage (you already have to know about self and methods-as-bound-functions, because it's the more general case). But you may still prefer to be explicit about self not being used (and the interpreter could even reward you with some optimization, not having to partially apply a function to self). In that case, you pick option 3 and add @staticmethod on top of your function.
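A minimal sketch of option 3, with hypothetical class names: the base class declares label as a static method because it needs no state, yet callers still go through self.label(), so a subclass can override it with an ordinary instance method that does use state.

```python
class Shape:
    @staticmethod
    def label():
        # No instance state is needed here...
        return "shape"

    def describe(self):
        # ...but it is still called as self.label(), so overrides apply.
        return "This is a " + self.label()


class NamedShape(Shape):
    def __init__(self, name):
        self.name = name

    def label(self):
        # The override is free to be a normal instance method.
        return self.name


print(Shape().describe())                 # This is a shape
print(NamedShape("triangle").describe())  # This is a triangle
```

Moving label out to module level would remove that override point, which is the trade-off this answer is describing.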
Use @staticmethod for methods that don't need to operate on a specific object, but that you still want located in the scope of the class (as opposed to module scope).
Your example in test2.static_add_one wastes its time passing an unused self parameter, but otherwise works the same as test1.static_add_one. Note that this extraneous parameter can't be optimized away.
One example I can think of is in a Django project I have, where a model class represents a database table, and an object of that class represents a record. There are some functions used by the class that are stand-alone and do not need an object to operate on, for example a function that converts a title into a "slug", which is a representation of the title that follows the character set limits imposed by URL syntax. The function that converts a title to a slug is declared as a staticmethod precisely to strongly associate it with the class that uses it.
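A framework-free sketch of that pattern, assuming a simple lowercase-and-hyphenate slug rule. Article and its slugify method are hypothetical illustrations, not the Django project's code or Django's own slug helper.

```python
import re


class Article:
    def __init__(self, title):
        self.title = title
        self.slug = self.slugify(title)

    @staticmethod
    def slugify(title):
        # Lowercase, collapse each run of non-alphanumeric characters into
        # one hyphen, then trim hyphens from both ends.
        return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


print(Article("Why explicit self has to stay").slug)  # why-explicit-self-has-to-stay
print(Article.slugify("Hello, World!"))               # hello-world
```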
class Foo(object):
    pass

foo = Foo()

def bar(self):
    print('bar')

Foo.bar = bar
foo.bar()  # bar
Coming from JavaScript: if a "class" prototype is augmented with an attribute, all instances of that "class" have that attribute in their prototype chain, so no modification has to be made to any of its instances or "sub-classes".
In that sense, how can a class-based language like Python achieve monkey patching?
The real question is, how can it not? In Python, classes are first-class objects in their own right. Attribute access on instances of a class is resolved by looking up attributes on the instance, and then the class, and then the parent classes (in the method resolution order.) These lookups are all done at runtime (as is everything in Python.) If you add an attribute to a class after you create an instance, the instance will still "see" the new attribute, simply because nothing prevents it.
In other words, it works because Python doesn't cache attributes (unless your code does), because it doesn't use negative caching or shadow classes or any of the optimization techniques that would inhibit it (or, when Python implementations do, they take into account that the class might change), and because everything happens at runtime.
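A short sketch of that runtime lookup, assuming Python 3; SubFoo is a hypothetical subclass added to show that subclasses pick up the patch too, and the instance-level override mirrors an own property shadowing the prototype chain in JavaScript.

```python
class Foo(object):
    pass


class SubFoo(Foo):
    pass


foo = Foo()
sub = SubFoo()


def bar(self):
    return "bar"


# Patch the class after the instances already exist.
Foo.bar = bar
print(foo.bar())  # bar: found on Foo at lookup time
print(sub.bar())  # bar: found through SubFoo.__mro__

# An instance attribute of the same name shadows the class attribute.
foo.bar = lambda: "instance bar"
print(foo.bar())  # instance bar
print(sub.bar())  # bar
```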
I just read through a bunch of documentation, and as far as I can tell, the whole story of how foo.bar is resolved is as follows:
1. Can we find foo.__getattribute__ by the following process? If so, use the result of foo.__getattribute__('bar').
   - (Looking up __getattribute__ will not cause infinite recursion, but the implementation of it might.)
   - (In reality, we will always find __getattribute__ in new-style objects, as a default implementation is provided in object, but that implementation is of the following process. ;) )
   - (If we define a __getattribute__ method in Foo and access foo.__getattribute__, foo.__getattribute__('__getattribute__') will be called! But this does not imply infinite recursion, if you are careful ;) )
2. Is bar a "special" name for an attribute provided by the Python runtime (e.g. __dict__, __class__, __bases__, __mro__)? If so, use that. (As far as I can tell, __getattribute__ falls into this category, which avoids infinite recursion.)
3. Is bar in the foo.__dict__ dict? If so, use foo.__dict__['bar']. (One exception: if a class in foo.__class__.__mro__ defines bar as a data descriptor, i.e. an object with __set__ or __delete__ such as a property, that descriptor's __get__ wins even over foo.__dict__.)
4. Does foo.__mro__ exist (i.e., is foo actually a class)? If so, for each base class base in foo.__mro__[1:] (the first entry is foo itself, which we already searched):
   - Is bar in base.__dict__? If so, let x be base.__dict__['bar'].
     - Can we find (again, recursively, but it won't cause a problem) x.__get__? If so, use x.__get__(None, foo).
     - Otherwise, use x.
5. For each base class base of foo.__class__.__mro__:
   - (Note that this recursion is not a problem: those attributes should always exist, and fall into the "provided by the Python runtime" case. foo.__class__.__mro__[0] will always be foo.__class__, i.e. Foo in our example.)
   - (Note that we do this even if foo.__mro__ exists. This is because classes have a class, too: its name is type, and it provides, among other things, the method used to calculate __mro__ attributes in the first place.)
   - Is bar in base.__dict__? If so, let x be base.__dict__['bar'].
     - Can we find (again, recursively, but it won't cause a problem) x.__get__? If so, use x.__get__(foo, foo.__class__). (Note that the function bar is, itself, an object, and every function has a __get__ method, defined on the function type, which is designed to be used this way.)
     - Otherwise, use x.
6. If we still haven't found something to use: can we find foo.__getattr__ by the preceding process? If so, use the result of foo.__getattr__('bar').
7. If everything failed, raise AttributeError.
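A small experiment, assuming Python 3, makes the main precedences concrete, including the rule that a data descriptor (an object defining __set__ or __delete__, such as a property) on the class takes priority even over the instance __dict__. The class Lookup is hypothetical and exists only for this comparison.

```python
class Lookup:
    """Hypothetical class used only to compare lookup precedences."""

    @property
    def prop(self):
        # property is a data descriptor (it defines __set__)
        return "from property"

    def method(self):
        # a plain function is a non-data descriptor (only __get__)
        return "from method"

    def __getattr__(self, name):
        # only called after normal lookup has failed
        return "fallback for " + name


obj = Lookup()
# Write straight into the instance dict, bypassing normal attribute setting.
obj.__dict__["prop"] = "from instance dict"
obj.__dict__["method"] = lambda: "from instance dict"

print(obj.prop)      # from property: data descriptor beats the instance dict
print(obj.method())  # from instance dict: instance dict beats a non-data descriptor
print(obj.missing)   # fallback for missing: __getattr__ is the last resort
```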
bar.__get__ is not really a function - it's a "method-wrapper" - but you can imagine it being implemented vaguely like this:
# Somewhere in the Python internals (conceptually; the real thing is written in C)
class __method_wrapper(object):
    def __init__(self, func):
        self.func = func

    def __call__(self, obj, cls=None):
        # In reality this returns a "bound method" object, not a lambda,
        # that uses cls for its __repr__
        # and there is a __repr__ for the method_wrapper that I *think*
        # uses the hashcode of the underlying function, rather than of itself,
        # but I'm not sure.
        return lambda *args, **kwargs: self.func(obj, *args, **kwargs)

# Automatically done after compiling bar (conceptually; assigning
# bar.__get__ by hand would not change how attribute lookup works)
bar.__get__ = __method_wrapper(bar)
The "binding" that happens within the __get__ automatically attached to bar (called a descriptor), by the way, is more or less the reason why you have to specify self parameters explicitly for Python methods. In JavaScript, this itself is magical; in Python, it is merely the process of binding things to self that is magical. ;)
And yes, you can explicitly set a __get__ method on your own objects and have it do special things when you set a class attribute to an instance of the object and then access it from an instance of that other class. Python is extremely reflective. :) But if you want to learn how to do that, and get a really full understanding of the situation, you have a lot of reading to do. ;)
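The binding performed by a function's __get__, described above, can be observed directly. A minimal sketch assuming Python 3, reusing the Foo/bar example from the question (with bar returning a string instead of printing):

```python
class Foo(object):
    pass


def bar(self):
    return "bar called on a " + type(self).__name__


Foo.bar = bar
foo = Foo()

# Roughly what foo.bar does: find the function on the class, then call
# its __get__ with the instance and the class to bind it.
bound = Foo.__dict__["bar"].__get__(foo, Foo)

print(bound())                # bar called on a Foo
print(bound.__self__ is foo)  # True: the stored instance is passed as self
print(bound.__func__ is bar)  # True: the underlying plain function
print(foo.bar == bound)       # True
```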
I'm teaching myself Python and I see the following in Dive into Python section 5.3:
By convention, the first argument of any Python class method (the reference to the current instance) is called self. This argument fills the role of the reserved word this in C++ or Java, but self is not a reserved word in Python, merely a naming convention. Nonetheless, please don't call it anything but self; this is a very strong convention.
Considering that self is not a Python keyword, I'm guessing that it can sometimes be useful to use something else. Are there any such cases? If not, why is it not a keyword?
No, unless you want to confuse every other programmer that looks at your code after you write it. self is not a keyword because it is an identifier. It could have been a keyword and the fact that it isn't one was a design decision.
As a side observation, note that Pilgrim is committing a common misuse of terms here: a class method is quite a different thing from an instance method, which is what he's talking about here. As Wikipedia puts it, "a method is a subroutine that is exclusively associated either with a class (in which case it is called a class method or a static method) or with an object (in which case it is an instance method).". Python's built-ins include a staticmethod type, to make static methods, and a classmethod type, to make class methods, each generally used as a decorator; if you don't use either, a def in a class body makes an instance method. E.g.:
>>> class X(object):
...     def noclass(self): print(self)
...     @classmethod
...     def withclass(cls): print(cls)
...
>>> x = X()
>>> x.noclass()
<__main__.X object at 0x698d0>
>>> x.withclass()
<class '__main__.X'>
>>>
As you see, the instance method noclass gets the instance as its argument, but the class method withclass gets the class instead.
So it would be extremely confusing and misleading to use self as the name of the first parameter of a class method: the convention in this case is instead to use cls, as in my example above. While this IS just a convention, there is no real good reason for violating it -- any more than there would be, say, for naming a variable number_of_cats if the purpose of the variable is counting dogs!-)
The only case of this I've seen is when you define a function outside of a class definition, and then assign it to the class, e.g.:
class Foo(object):
    def bar(self):
        # Do something with 'self'
        return self

def baz(inst):
    return inst.bar()

Foo.baz = baz
In this case, self is a little strange to use, because the function could be applied to many classes. Most often I've seen inst or cls used instead.
I once had some code like (and I apologize for lack of creativity in the example):
class Animal:
    def __init__(self, volume=1):
        self.volume = volume
        self.description = "Animal"

    def Sound(self):
        pass

    def GetADog(self, newvolume):
        class Dog(Animal):
            def Sound(this):
                return self.description + ": " + ("woof" * this.volume)
        return Dog(newvolume)
Then we have output like:
>>> a = Animal(3)
>>> d = a.GetADog(2)
>>> d.Sound()
'Animal: woofwoof'
I wasn't sure if self within the Dog class would shadow self within the Animal class, so I opted to make Dog's reference the word "this" instead. In my opinion and for that particular application, that was more clear to me.
Because it is a convention, not language syntax. There is a Python style guide that people who program in Python follow. This way libraries have a familiar look and feel. Python places a lot of emphasis on readability, and consistency is an important part of this.
I think that the main reason self is used by convention rather than being a Python keyword is because it's simpler to have all methods/functions take arguments the same way rather than having to put together different argument forms for functions, class methods, instance methods, etc.
Note that if you have an actual class method (i.e. one defined using the classmethod decorator), the convention is to use "cls" instead of "self".