I've been striving mightily for three days to wrap my head around __init__ and "self", starting at Learn Python the Hard Way exercise 42, and moving on to read parts of the Python documentation, Alan Gauld's chapter on Object-Oriented Programming, Stack threads like this one on "self", and this one, and frankly, I'm getting ready to hit myself in the face with a brick until I pass out.
That being said, I've noticed a really common convention in initial __init__ definitions, which is to follow up with (self, foo) and then immediately declare, within that definition, that self.foo = foo.
From LPTHW, ex42:
class Game(object):
    def __init__(self, start):
        self.quips = ["a list", "of phrases", "here"]
        self.start = start
From Alan Gauld:
def __init__(self,val): self.val = val
I'm in that horrible space where I can see that there's just One Big Thing I'm not getting, and it's remaining opaque no matter how much I read about it and try to figure it out. Maybe if somebody can explain this little bit of consistency to me, the light will turn on. Is this because we need to say that "foo," the variable, will always be equal to the (foo) parameter, which is itself contained in the "self" parameter that's automatically assigned to the def it's attached to?
You might want to study up on object-oriented programming.
Loosely speaking, when you say
class Game(object):
    def __init__(self, start):
        self.start = start
you're saying:
I have a type of "thing" named Game
Whenever a new Game is created, it will ask me for some extra piece of information, start. (This is because the Game's initializer, named __init__, asks for this information.)
The initializer (also referred to as the "constructor", although that's a slight misnomer) needs to know which object (which was created just a moment ago) it's initializing. That's the first parameter -- which is usually called self by convention (but which you could call anything else...).
The game probably needs to remember what the start I gave it was. So it stores this information "inside" itself, by creating an instance variable also named start (nothing special, it's just whatever name you want), and assigning the value of the start parameter to the start variable.
If it doesn't store the value of the parameter, it won't have that information available for later use.
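To make that last point concrete, here is a minimal sketch. The ForgetfulGame class and the describe method are made up for illustration and are not part of the question's code; the only difference between the two classes is whether __init__ stores its argument on self.

```python
class Game(object):
    def __init__(self, start):
        # Keep the argument on the instance so later method calls can see it.
        self.start = start

    def describe(self):
        # Hypothetical helper method, added only for this illustration.
        return "The game starts at " + self.start


class ForgetfulGame(object):
    def __init__(self, start):
        # 'start' is only a local name here; it is gone once __init__ returns.
        pass

    def describe(self):
        return "The game starts at " + self.start


game = Game("the central corridor")
print(game.describe())

try:
    ForgetfulGame("the central corridor").describe()
except AttributeError as error:
    print("ForgetfulGame lost it:", error)
```

The second call fails with an AttributeError because nothing ever created a start attribute on that instance.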
Hope this explains what's happening.
I'm not quite sure what you're missing, so let me hit some basic items.
There are two "special" initialization names in a Python class object, one that is relatively rare for users to worry about, called __new__, and one that is much more usual, called __init__.
When you invoke a class-object constructor, e.g. (based on your example) x = Game(args), this first calls Game.__new__ to obtain memory in which to hold the object, and then Game.__init__ to fill in that memory. Most of the time, you can allow the underlying object.__new__ to allocate the memory, and you just need to fill it in. (You can use your own allocator for special weird rare cases like objects that never change and may share identities, the way ordinary integers do for instance. It's also for "metaclasses" that do weird stuff. But that's all a topic for much later.)
Your Game.__init__ function is called with "all the arguments to the constructor" plus one stashed in the front, which is the memory allocated for that object itself. (For "ordinary" objects that's mostly a dictionary of "attributes", plus the magic glue for classes, but for objects with __slots__ the attributes dictionary is omitted.) Naming that first argument self is just a convention—but don't violate it, people will hate you if you do. :-)
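If you want to watch that two-step sequence happen, a small sketch like the following should do. The calls list and the overridden __new__ are only there for illustration; normally you would not define __new__ at all.

```python
calls = []  # illustration only: records which special method ran, and in what order


class Game(object):
    def __new__(cls, *args, **kwargs):
        calls.append("__new__")
        # object.__new__ does the actual allocation; it only wants the class.
        return super(Game, cls).__new__(cls)

    def __init__(self, start):
        calls.append("__init__")
        # 'self' is the object that __new__ just returned.
        self.start = start


x = Game("central_corridor")
print(calls)
print(x.start)
```

The argument passed to Game(...) is handed to both methods, but only __init__ does anything with it here.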
There's nothing that requires you to save all the arguments to the constructor. You can set any or all instance attributes you like:
class Weird(object):
    def __init__(self, required_arg1, required_arg2, optional_arg3 = 'spam'):
        self.irrelevant = False
    def __str__(self):
        ...
The thing is that a Weird() instance is pretty useless after initialization, because you're required to pass two arguments that are simply thrown away, and given a third optional argument that is also thrown away:
x = Weird(42, 0.0, 'maybe')
The only point in requiring those thrown-away arguments is for future expansion, as it were (you might have these unused fields during early development). So if you're not immediately using and/or saving arguments to __init__, something is definitely weird in Weird.
Incidentally, the only reason for using (object) in the class definition is to indicate to Python 2.x that this is a "new-style" class (as distinguished from very-old-Python "instance only" classes). But it's generally best to use it—it makes what I said above about object.__new__ true, for instance :-) —until Python 3, where the old-style stuff is gone entirely.
Parameter names should be meaningful, to convey the role they play in the function/method or some information about their content.
You can see parameters of constructors to be even more important because they are often required for the working of the new instance and contain information which is needed in other methods of the class as well.
Imagine you have a Game class which accepts a playerList.
class Game:
    def __init__(self, playerList):
        self.playerList = playerList # or self.players = playerList
    def printPlayerList(self):
        print(self.playerList) # or print(self.players)
This list is needed in various methods of the class. Hence it makes sense to assign it to self.playerList. You could also assign it to self.players, whatever you feel more comfortable with and you think is understandable. But if you don't assign it to self.<somename> it won't be accessible in other methods.
So there is nothing special about how to name parameters/attributes/etc (there are some special class methods though), but using meaningful names makes the code easier to understand. Or would you understand the meaning of the above class if you had:
class G:
    def __init__(self, x):
        self.y = x
    def ppl(self):
        print(self.y)
? :) It does exactly the same but is harder to understand...
The best way to explain this, I guess, is by example. Writing a game, have a parent class for all character classes that establishes the basic methods and values, with a whole horde of child classes for race and profession.
So you've got Elf, Orc, Human classes that modify basic stats, and then classes like Tank, Rogue, Spellcaster, that do additional modifications. By the use of super(), it's fairly simple to do:
class ElfTank( Elf, Tank ): pass
And get the combined modifications you want.
Is there a way to dynamically create an object and specify its parents without making nine predetermined classes (in the case of three professions and three races)?
Note: This is an example because it's much easier for me to explain any OOP thing using either PacMan or D&D. The actual application involves anywhere from two to six parents, and would have involved waaaaaaaaay too much backstory.
EDIT: Okay, based on the feedback, and the digging into some of the sidelinks, and some tinkering of my own, I'm going to answer my own question.
These solutions will, technically, work. But, be aware, they're all really bad ideas.
If you don't need to pass any arguments to the constructor:
def makeMeAClass( *parents ):
    class dynamicClass( *parents ):
        def __init__(self):
            super().__init__()
    return dynamicClass()
If you want to pass an argument to the constructor, you can simply put it before you start listing the parent classes. (This will work for any number of arguments, so long as it's a consistent number of arguments.)
def MakeMeAClassWithAnArgument( argument, *parents ):
    class dynamicClass( *parents ):
        def __init__(self, argument):
            super().__init__(argument)
    return dynamicClass( argument )
If you really want to get crazy, you can even have a dynamic number of arguments passed to the constructor, but they have to be specified by keyword.
def MakeMeAClassWithDynamicArguments( *parents, **arguments ):
    class dynamicClass( *parents ):
        def __init__(self, **arguments):
            super().__init__(**arguments)
    return dynamicClass(**arguments)
But, there are better ways to do it, particularly if you're going to end up with multiple objects of the same parentage-- you'll end up with multiple dynamically-generated object classes that are exactly identical, but taking up multiple places in memory, which sorta defeats a lot of the benefits of OOP.
I have encountered enough resistance to this sort of thing while researching it to accept "Don't Do That" that I haven't really dug further into why. I welcome comments from the more qualified for those who need to hear the followup "No, Really, Don't Do That."
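For what it's worth, one way to avoid the duplicate-class problem is to build the class with the three-argument form of the built-in type() and cache the result per combination of parents. This is only a sketch under assumptions: make_class, _class_cache, Character and the strength numbers are all invented for illustration, and the same "think twice" caveats apply.

```python
_class_cache = {}  # hypothetical cache: tuple of parent classes -> generated class


def make_class(*parents):
    """Return one shared class per combination of parents."""
    if parents not in _class_cache:
        name = "".join(parent.__name__ for parent in parents)
        # type(name, bases, namespace) creates a class at runtime.
        _class_cache[parents] = type(name, parents, {})
    return _class_cache[parents]


class Character(object):
    def __init__(self):
        self.strength = 10


class Elf(Character):
    def __init__(self):
        super().__init__()
        self.strength -= 2


class Tank(Character):
    def __init__(self):
        super().__init__()
        self.strength += 5


ElfTank = make_class(Elf, Tank)
hero = ElfTank()
print(ElfTank.__name__, hero.strength)
print(make_class(Elf, Tank) is ElfTank)
```

Because the generated class is cached, asking for the same parents twice hands back the same class object, so you only pay for each combination once.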
When defining a method on a class in Python, it looks something like this:
class MyClass(object):
    def __init__(self, x, y):
        self.x = x
        self.y = y
But in some other languages, such as C#, you have a reference to the object that the method is bound to with the "this" keyword without declaring it as an argument in the method prototype.
Was this an intentional language design decision in Python or are there some implementation details that require the passing of "self" as an argument?
I like to quote Peters' Zen of Python. "Explicit is better than implicit."
In Java and C++, 'this.' can be deduced, except when you have variable names that make it impossible to deduce. So you sometimes need it and sometimes don't.
Python elects to make things like this explicit rather than based on a rule.
Additionally, since nothing is implied or assumed, parts of the implementation are exposed. self.__class__, self.__dict__ and other "internal" structures are available in an obvious way.
It's to minimize the difference between methods and functions. It allows you to easily generate methods in metaclasses, or add methods at runtime to pre-existing classes.
e.g.
>>> class C:
...     def foo(self):
...         print("Hi!")
...
>>>
>>> def bar(self):
...     print("Bork bork bork!")
...
>>>
>>> c = C()
>>> C.bar = bar
>>> c.bar()
Bork bork bork!
>>> c.foo()
Hi!
>>>
It also (as far as I know) makes the implementation of the Python runtime easier.
I suggest that one should read Guido van Rossum's blog on this topic - Why explicit self has to stay.
When a method definition is decorated, we don't know whether to automatically give it a 'self' parameter or not: the decorator could turn the function into a static method (which has no 'self'), or a class method (which has a funny kind of self that refers to a class instead of an instance), or it could do something completely different (it's trivial to write a decorator that implements '@classmethod' or '@staticmethod' in pure Python). There's no way without knowing what the decorator does whether to endow the method being defined with an implicit 'self' argument or not.
I reject hacks like special-casing '@classmethod' and '@staticmethod'.
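To illustrate the "trivial to write in pure Python" remark, here is a rough sketch of both decorators built on the descriptor protocol. The names my_staticmethod, my_classmethod and Demo are made up for this example; this is not how CPython implements the built-ins, just an approximation of their behavior.

```python
from functools import partial


class my_staticmethod(object):
    """Rough pure-Python stand-in for staticmethod (illustration only)."""
    def __init__(self, func):
        self.func = func

    def __get__(self, obj, objtype=None):
        # Hand back the plain function: nothing gets bound.
        return self.func


class my_classmethod(object):
    """Rough pure-Python stand-in for classmethod (illustration only)."""
    def __init__(self, func):
        self.func = func

    def __get__(self, obj, objtype=None):
        if objtype is None:
            objtype = type(obj)
        # Bind the class, not the instance, as the first argument.
        return partial(self.func, objtype)


class Demo(object):
    @my_staticmethod
    def add(a, b):
        return a + b

    @my_classmethod
    def name(cls):
        return cls.__name__

    def plain(self):
        return self


d = Demo()
print(Demo.add(1, 2), d.add(1, 2))
print(Demo.name(), d.name())
print(d.plain() is d)
```

All three functions in Demo are written the same way; only the decorator decides what, if anything, ends up in the first parameter. That is the point of the quote: the compiler cannot know this ahead of time.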
Python doesn't force you to use "self". You can give it whatever name you want. You just have to remember that the first argument in a method definition header is a reference to the object.
It also allows you to do this (in short, invoking Outer(3).create_inner_class(4)().weird_sum_with_closure_scope(5) will return 12, but will do so in the craziest of ways):
class Outer(object):
    def __init__(self, outer_num):
        self.outer_num = outer_num
    def create_inner_class(outer_self, inner_arg):
        class Inner(object):
            # A class body cannot rebind an enclosing name to itself
            # ("inner_arg = inner_arg" raises NameError), so use another name.
            stored_arg = inner_arg
            def weird_sum_with_closure_scope(inner_self, num):
                return num + outer_self.outer_num + inner_arg
        return Inner
Of course, this is harder to imagine in languages like Java and C#. By making the self reference explicit, you're free to refer to any object by that self reference. Also, such a way of playing with classes at runtime is harder to do in the more static languages, not that it's necessarily good or bad. It's just that the explicit self allows all this craziness to exist.
Moreover, imagine this: We'd like to customize the behavior of methods (for profiling, or some crazy black magic). This can lead us to think: what if we had a class Method whose behavior we could override or control?
Well here it is:
from functools import partial

class MagicMethod(object):
    """Does black magic when called"""
    def __get__(self, obj, obj_type):
        # This binds the <other> class instance to the <innocent_self> parameter
        # of the method MagicMethod.invoke
        return partial(self.invoke, obj)
    def invoke(magic_self, innocent_self, *args, **kwargs):
        # do black magic here
        ...
        print(magic_self, innocent_self, args, kwargs)

class InnocentClass(object):
    magic_method = MagicMethod()
And now: InnocentClass().magic_method() will act like expected. The method will be bound with the innocent_self parameter to InnocentClass, and with the magic_self to the MagicMethod instance. Weird huh? It's like having 2 keywords this1 and this2 in languages like Java and C#. Magic like this allows frameworks to do stuff that would otherwise be much more verbose.
Again, I don't want to comment on the ethics of this stuff. I just wanted to show things that would be harder to do without an explicit self reference.
I think it has to do with PEP 227:
Names in class scope are not accessible. Names are resolved in the
innermost enclosing function scope. If a class definition occurs in a
chain of nested scopes, the resolution process skips class
definitions. This rule prevents odd interactions between class
attributes and local variable access. If a name binding operation
occurs in a class definition, it creates an attribute on the resulting
class object. To access this variable in a method, or in a function
nested within a method, an attribute reference must be used, either
via self or via the class name.
I think the real reason, besides "The Zen of Python", is that functions are first-class citizens in Python.
That essentially makes them objects. The fundamental issue is this: if your functions are objects as well, then in the object-oriented paradigm, how would you send messages to objects when the messages themselves are objects?
It looks like a chicken-and-egg problem. To resolve this paradox, the only possible way is to either pass a context of execution to methods or detect it. But since Python can have nested functions, detecting it would be impossible, as the context of execution would change for inner functions.
This means the only possible solution is to explicitly pass 'self' (the context of execution).
So I believe it is an implementation problem; the Zen came much later.
As explained in self in Python, Demystified
anything like obj.meth(args) becomes Class.meth(obj, args). The calling process is automatic while the receiving process is not (it's explicit). This is the reason the first parameter of a function in class must be the object itself.
class Point(object):
    def __init__(self,x = 0,y = 0):
        self.x = x
        self.y = y
    def distance(self):
        """Find distance from origin"""
        return (self.x**2 + self.y**2) ** 0.5
Invocations:
>>> p1 = Point(6,8)
>>> p1.distance()
10.0
__init__() defines three parameters but we just passed two (6 and 8). Similarly distance() requires one but zero arguments were passed.
Why is Python not complaining about this argument number mismatch?
Generally, when we call a method with some arguments, the corresponding class function is called by placing the method's object before the first argument. So, anything like obj.meth(args) becomes Class.meth(obj, args). The calling process is automatic while the receiving process is not (it's explicit).
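You can check that equivalence directly. The sketch below reuses the Point class from above; the variable names bound_result and explicit_result are just for illustration.

```python
class Point(object):
    def __init__(self, x=0, y=0):
        self.x = x
        self.y = y

    def distance(self):
        """Find distance from origin"""
        return (self.x**2 + self.y**2) ** 0.5


p1 = Point(6, 8)

# The usual spelling: Python supplies p1 as the first argument for us.
bound_result = p1.distance()

# The spelled-out form: we pass the instance ourselves.
explicit_result = Point.distance(p1)

print(bound_result, explicit_result)

# The bound method object remembers which instance it belongs to.
print(p1.distance.__self__ is p1)
```

Both calls run the same function with the same object as self, which is why the argument counts only appear to be off by one.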
This is the reason the first parameter of a function in class must be the object itself. Writing this parameter as self is merely a convention. It is not a keyword and has no special meaning in Python. We could use other names (like this) but I strongly suggest you not to. Using names other than self is frowned upon by most developers and degrades the readability of the code ("Readability counts").
...
In the first example, self.x is an instance attribute whereas x is a local variable. They are not the same and lie in different namespaces.
Self Is Here To Stay
Many have proposed to make self a keyword in Python, like this in C++ and Java. This would eliminate the redundant use of explicit self from the formal parameter list in methods. While this idea seems promising, it's not going to happen. At least not in the near future. The main reason is backward compatibility. Here is a blog from the creator of Python himself explaining why the explicit self has to stay.
The 'self' parameter keeps the current calling object.
class class_name:
    class_variable = None
    def method_name(self, arg):
        self.var = arg
obj = class_name()
obj.method_name("value")
Here, the self argument holds the object obj. Hence, the statement self.var denotes obj.var.
There is also another very simple answer: according to the zen of python, "explicit is better than implicit".
I have been trying to fully understand this for a while now, and practically speaking I think I understand what happens but I can't seem to find anywhere that confirms whether I understood it correctly:
class test(object):
    def __init__(self, this):
        self.something = this
example = test("writing")
My question is: In the above example, is it correct that self is simply a stand-in for the instance I am creating? Meaning that when I create an instance and assign it to "example", then "example" is put in place of self, and behind the scenes something resembling this happens:
class test(object):
    def __init__(example, this):
        example.something = this
example = test("writing")
Furthermore, does that also mean that as long as I am still working with this on a class basis (say in tandem with another class) I should still be using self.something, while I should be using example.something if I am working with it on an instance level?
I hope that made some sense. I'm still trying to wrap my head properly around all of it, so let me know if I need to try and rephrase it.
For reference sake, should someone else end up asking the same, this reply: Python __init__ and self what do they do? almost did the trick for me, and only really left me a bit in doubt about the above questions.
This is correct. self is the instance of the class (i.e. the object) and you use it inside the class code (inside its methods).
While the first argument can be named something else (example in your second code), the convention is that we always use self or the code might be highly confusing for other programmers. But you got the gist right by doing that, the example variable in the class (i.e. the self in your first code) and the example variable outside of the class is basically the same thing.
By the way, I'd also avoid the following two things:
having a class name that starts with a lowercase letter,
using a variable name this (since a variable named this does in some other languages essentially what self does in Python).
In Python, variables do not "contain" objects, they refer to them. So:
class test(object):
    def __init__(self, this):
        self.something = this
example = test("writing")
In this case example is a reference to the new object, but so is self. It is perfectly legal, and common, to have multiple references to the same object.
If you did:
another = example
this would not create a new object but have another reference to the same object. another, example (and self) would be references to the same single object.
You can test this by looking at the object's unique identifier, using id(). Add:
another = example
print(id(another))
print(id(example))
You will find that their ids are the same.
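The same check works for self, which is really what the question was asking. In this sketch the me method and the id_seen_in_init attribute are additions made purely for the demonstration (and the class is capitalized as Test):

```python
class Test(object):
    def __init__(self, this):
        self.something = this
        # Record the identity of 'self' so it can be compared from outside.
        self.id_seen_in_init = id(self)

    def me(self):
        # Hand back whatever object 'self' refers to.
        return self


example = Test("writing")
another = example

print(example.me() is example)
print(another is example)
print(example.id_seen_in_init == id(example))
```

All three comparisons come out true: self inside the methods, example, and another are just three names for one object.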
Python classes have no concept of public/private, so we are told to not touch something that starts with an underscore unless we created it. But does this not require complete knowledge of all classes from which we inherit, directly or indirectly? Witness:
class Base(object):
    def __init__(self):
        super(Base, self).__init__()
        self._foo = 0
    def foo(self):
        return self._foo + 1

class Sub(Base):
    def __init__(self):
        super(Sub, self).__init__()
        self._foo = None

Sub().foo()
Expectedly, a TypeError is raised when None + 1 is evaluated. So I have to know that _foo exists in the base class. To get around this, __foo can be used instead, which solves the problem by mangling the name. This seems to be, if not elegant, an acceptable solution. However, what happens if Base inherits from a class (in a separate package) called Sub? Now __foo in my Sub overrides __foo in the grandparent Sub.
This implies that I have to know the entire inheritance chain, including all "private" objects each uses. The fact that Python is dynamically-typed makes this even harder, since there are no declarations to search for. The worst part, however, is probably the fact Base might inherit from object right now, but in some future release, it switches to inheriting from Sub. Clearly if I know Sub is inherited from, I can rename my class, however annoying that is. But I can't see into the future.
Is this not a case where a true private data type would prevent a problem? How, in Python, can I be sure that I'm not accidentally stepping on somebody's toes if those toes might spring into existence at some point in the future?
EDIT: I've apparently not made clear the primary question. I'm familiar with name mangling and the difference between a single and a double underscore. The question is: how do I deal with the fact that I might clash with classes whose existence I don't know of right now? If my parent class (which is in a package I did not write) happens to start inheriting from a class with the same name as my class, even name mangling won't help. Am I wrong in seeing this as a (corner) case that true private members would solve, but that Python has trouble with?
EDIT: As requested, the following is a full example:
File parent.py:
class Sub(object):
    def __init__(self):
        self.__foo = 12
    def foo(self):
        return self.__foo + 1

class Base(Sub):
    pass
File sub.py:
import parent

class Sub(parent.Base):
    def __init__(self):
        super(Sub, self).__init__()
        self.__foo = None

Sub().foo()
The grandparent's foo is called, but my __foo is used.
Obviously you wouldn't write code like this yourself, but parent could easily be provided by a third party, the details of which could change at any time.
Use private names (instead of protected ones), starting with a double underscore:
class Sub(Base):
    def __init__(self):
        super(Sub, self).__init__()
        self.__foo = None
#            ^^
will not conflict with _foo or __foo in Base. This is because Python replaces the double underscore with a single underscore and the name of the class; the following two lines are equivalent:
class Sub(Base):
    def x(self):
        self.__foo = None # .. is the same as ..
        self._Sub__foo = None
(In response to the edit:) The chance that two classes in a class hierarchy not only have the same name, but that they are both using the same property name, and are both using the private mangled (__) form is so minuscule that it can be safely ignored in practice (I for one haven't heard of a single case so far).
In theory, however, you are correct in that in order to formally verify correctness of a program, one must know the entire inheritance chain. Luckily, formal verification usually requires a fixed set of libraries in any case.
This is in the spirit of the Zen of Python, which includes
practicality beats purity.
Name mangling includes the class so your Base.__foo and Sub.__foo will have different names. This was the entire reason for adding the name mangling feature to Python in the first place. One will be _Base__foo, the other _Sub__foo.
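A quick way to see the two mangled names side by side is to look at the instance's attribute dictionary. This sketch uses the Base and Sub classes from the question with _foo changed to __foo; the variable name sub is just for illustration.

```python
class Base(object):
    def __init__(self):
        self.__foo = 0          # stored as _Base__foo

    def foo(self):
        return self.__foo + 1   # reads _Base__foo


class Sub(Base):
    def __init__(self):
        super(Sub, self).__init__()
        self.__foo = None       # stored as _Sub__foo, so Base's value survives


sub = Sub()
print(sorted(vars(sub)))
print(sub.foo())
```

The instance ends up with two separate attributes, and Base.foo keeps working because it never sees the one Sub created.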
Many people prefer to use composition (has-a) instead of inheritance (is-a) for some of these very reasons.
This implies that I have to know the entire inheritance chain. . .
Yes, you should know the entire inheritance chain, or the docs for the object you are directly sub-classing should tell you what you need to know.
Subclassing is an advanced feature, and should be treated with care.
A good example of docs specifying what should be overridden in a subclass is the threading.Thread class:
This class represents an activity that is run in a separate thread of control. There are two ways to specify the activity: by passing a callable object to the constructor, or by overriding the run() method in a subclass. No other methods (except for the constructor) should be overridden in a subclass. In other words, only override the __init__() and run() methods of this class.
How often do you modify base classes in inheritance chains to introduce inheritance from a class with the same name as a subclass further down the chain???
Less flippantly, yes, you have to know the code you are working with. You certainly have to know the public names being used, after all. Python being Python, discovering the public names in use by your ancestor classes takes pretty much the same effort as discovering the private ones.
In years of Python programming, I have never found this to be much of an issue in practice. When you're naming instance variables, you should have a pretty good idea whether (a) a name is generic enough that it's likely to be used in other contexts and (b) the class you're writing is likely to be involved in an inheritance hierarchy with other unknown classes. In such cases, you think a bit more carefully about the names you're using; self.value isn't a great idea for an attribute name, and neither is something like Adaptor a great class name.
In contrast, I have run into difficulties with the overuse of double-underscore names a number of times. Python being Python, even "private" names tend to be accessed by code defined outside the class. You might think that it would always be bad practice to let an external function access "private" attributes, but what about things like getattr and hasattr? The invocation of them can be in the class's own code, so the class is still controlling all access to the private attributes, but they still don't work without you doing the name-mangling manually. If Python had actually-enforced private variables you couldn't use functions like those on them at all. These days I tend to reserve double-underscore names for cases when I'm writing something very generic like a decorator, metaclass, or mixin that needs to add a "secret attribute" to the instances of the (unknown) classes it's applied to.
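Here is a small sketch of the getattr/hasattr problem described above. The Account class and its read_by_name method are hypothetical; the point is only that mangling applies to identifiers in the source code, not to strings.

```python
class Account(object):
    def __init__(self):
        self.__balance = 100    # actually stored as _Account__balance

    def read_by_name(self, name):
        # The string is not mangled, even though this code lives in the class.
        return getattr(self, name, "missing")


acct = Account()
print(acct.read_by_name("__balance"))
print(acct.read_by_name("_Account__balance"))
print(hasattr(acct, "__balance"))
```

The lookup only succeeds when the mangled name is spelled out by hand, which is the manual name-mangling chore mentioned above.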
And of course there's the standard dynamic language argument: the reality is that you have to test your code thoroughly to have much justification in making the claim "my software works". Such testing will be very unlikely to miss the bugs caused by accidentally clashing names. If you are not doing that testing, then many more uncaught bugs will be introduced by other means than by accidental name clashes.
In summation, the lack of private variables is just not that big a deal in idiomatic Python code in practice, and the addition of true private variables would cause more frequent problems in other ways IMHO.
Mangling happens with double underscores. Single underscores are more of a "please don't".
You don't need to know all the details of all parent classes (note that deep inheritance is usually best avoided), because you can still dir() and help() and any other form of introspection you can come up with.
As noted, you can use name mangling. However, you can stick with a single underscore (or none!) if you document your code adequately - you should not have so many private variables that this proves to be a problem. Just say if a method relies on a private variable, and add either the variable, or the name of the method to the class docstring to alert users.
Further, if you create unit tests, you should create tests that check invariants on members, and accordingly these should be able to show up such name clashes.
If you really want to have "private" variables, and for whatever reason name-mangling doesn't meet your needs, you can factor your private state into another object:
class Foo(object):
    class Stateholder(object): pass
    def __init__(self):
        self._state = self.Stateholder()
        self._state.private = 1
I'm fairly new to object oriented programming so some of the abstraction ideas are a little blurry to me. I'm writing an interpreter for an old game language. Part of this has made me need to implement custom types from said language and place them on a stack to be manipulated as needed.
Now, I can put a string on a list. I can put a number on a list, and I've even found I can put symbols on a list. But I'm a bit fuzzy on how I would put a custom object instance on a list when I can't just drop it into a variable (since, after all, I don't know how many there will be and can't go about defining them by hand while the code is running :)
I've made a class for one of the simplest data types-- a DBREF. The DBREF just contains a Database reference number. I can't just use an integer, string, dictionary, etc, because there are type-checking mechanisms in the language I have to implement and that would confuse matters, since those are already used elsewhere in their closest analogues.
Here is my code and my reasoning behind it:
class dbref:
    dbnumber=0
    def __init__(self, number):
        global number
        dbnumber=number
    def getdbref:
        global number
        return number
I create a class named dbref. All it does (for now) is take a number and store it in a variable. My hope is that if I were to do:
examplelist=[ dbref(5) ]
That the dbref object would be on the stack. Is that possible? Further, will I be able to do:
if typeof(examplelist[0]) is dbref:
    print "It's a DBREF."
else:
    print "Nope."
...or am I misunderstanding how Python classes work? Also, is my class definition wonky in any way?
If you used...
class dbref:
    dbnumber=0
that would share the same number among all instances of the class, because dbnumber would be a class attribute, rather than an instance attribute. Try this instead:
class dbref(object):
    def __init__(self, number):
        self.dbnumber = number
    def getdbref(self):
        return self.dbnumber
self is a reference to the object instance itself that's automatically passed by Python when you call one of the instance's methods.
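As for the other parts of the question: yes, instances can go straight into a list, and the usual type check is isinstance() (Python has no typeof). The sketch below assumes the corrected dbref class above; the Shared class, the stack contents and the kinds list are made up for illustration.

```python
class dbref(object):
    def __init__(self, number):
        self.dbnumber = number

    def getdbref(self):
        return self.dbnumber


# Instances are ordinary values, so they can sit on a list next to anything else.
stack = [dbref(5), "a string", 42, dbref(7)]
kinds = ["DBREF" if isinstance(item, dbref) else "other" for item in stack]
print(kinds)
print(stack[0].getdbref(), stack[3].getdbref())


class Shared(object):
    dbnumber = 0  # class attribute: one value shared through the class


a, b = Shared(), Shared()
Shared.dbnumber = 9
print(a.dbnumber, b.dbnumber)
```

Each dbref keeps its own number, while both Shared instances see the single class-level value change at once.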