I'm coming from the Java world and reading Bruce Eckels' Python 3 Patterns, Recipes and Idioms.
While reading about classes, it goes on to say that in Python there is no need to declare instance variables. You just use them in the constructor, and boom, they are there.
So for example:
class Simple:
def __init__(self, s):
print("inside the simple constructor")
self.s = s
def show(self):
print(self.s)
def showMsg(self, msg):
print(msg + ':', self.show())
If that’s true, then any object of class Simple can just change the value of variable s outside of the class.
For example:
if __name__ == "__main__":
x = Simple("constructor argument")
x.s = "test15" # this changes the value
x.show()
x.showMsg("A message")
In Java, we have been taught about public/private/protected variables. Those keywords make sense because at times you want variables in a class to which no one outside the class has access to.
Why is that not required in Python?
It's cultural. In Python, you don't write to other classes' instance or class variables. In Java, nothing prevents you from doing the same if you really want to - after all, you can always edit the source of the class itself to achieve the same effect. Python drops that pretence of security and encourages programmers to be responsible. In practice, this works very nicely.
If you want to emulate private variables for some reason, you can always use the __ prefix from PEP 8. Python mangles the names of variables like __foo so that they're not easily visible to code outside the namespace that contains them (although you can get around it if you're determined enough, just like you can get around Java's protections if you work at it).
By the same convention, the _ prefix means _variable should be used internally in the class (or module) only, even if you're not technically prevented from accessing it from somewhere else. You don't play around with another class's variables that look like __foo or _bar.
Private variables in Python is more or less a hack: the interpreter intentionally renames the variable.
class A:
def __init__(self):
self.__var = 123
def printVar(self):
print self.__var
Now, if you try to access __var outside the class definition, it will fail:
>>> x = A()
>>> x.__var # this will return error: "A has no attribute __var"
>>> x.printVar() # this gives back 123
But you can easily get away with this:
>>> x.__dict__ # this will show everything that is contained in object x
# which in this case is something like {'_A__var' : 123}
>>> x._A__var = 456 # you now know the masked name of private variables
>>> x.printVar() # this gives back 456
You probably know that methods in OOP are invoked like this: x.printVar() => A.printVar(x). If A.printVar() can access some field in x, this field can also be accessed outside A.printVar()... After all, functions are created for reusability, and there isn't any special power given to the statements inside.
As correctly mentioned by many of the comments above, let's not forget the main goal of Access Modifiers: To help users of code understand what is supposed to change and what is supposed not to. When you see a private field you don't mess around with it. So it's mostly syntactic sugar which is easily achieved in Python by the _ and __.
Python does not have any private variables like C++ or Java does. You could access any member variable at any time if wanted, too. However, you don't need private variables in Python, because in Python it is not bad to expose your classes' member variables. If you have the need to encapsulate a member variable, you can do this by using "#property" later on without breaking existing client code.
In Python, the single underscore "_" is used to indicate that a method or variable is not considered as part of the public API of a class and that this part of the API could change between different versions. You can use these methods and variables, but your code could break, if you use a newer version of this class.
The double underscore "__" does not mean a "private variable". You use it to define variables which are "class local" and which can not be easily overridden by subclasses. It mangles the variables name.
For example:
class A(object):
def __init__(self):
self.__foobar = None # Will be automatically mangled to self._A__foobar
class B(A):
def __init__(self):
self.__foobar = 1 # Will be automatically mangled to self._B__foobar
self.__foobar's name is automatically mangled to self._A__foobar in class A. In class B it is mangled to self._B__foobar. So every subclass can define its own variable __foobar without overriding its parents variable(s). But nothing prevents you from accessing variables beginning with double underscores. However, name mangling prevents you from calling this variables /methods incidentally.
I strongly recommend you watch Raymond Hettinger's Python's class development toolkit from PyCon 2013, which gives a good example why and how you should use #property and "__"-instance variables.
If you have exposed public variables and you have the need to encapsulate them, then you can use #property. Therefore you can start with the simplest solution possible. You can leave member variables public unless you have a concrete reason to not do so. Here is an example:
class Distance:
def __init__(self, meter):
self.meter = meter
d = Distance(1.0)
print(d.meter)
# prints 1.0
class Distance:
def __init__(self, meter):
# Customer request: Distances must be stored in millimeters.
# Public available internals must be changed.
# This would break client code in C++.
# This is why you never expose public variables in C++ or Java.
# However, this is Python.
self.millimeter = meter * 1000
# In Python we have #property to the rescue.
#property
def meter(self):
return self.millimeter *0.001
#meter.setter
def meter(self, value):
self.millimeter = value * 1000
d = Distance(1.0)
print(d.meter)
# prints 1.0
There is a variation of private variables in the underscore convention.
In [5]: class Test(object):
...: def __private_method(self):
...: return "Boo"
...: def public_method(self):
...: return self.__private_method()
...:
In [6]: x = Test()
In [7]: x.public_method()
Out[7]: 'Boo'
In [8]: x.__private_method()
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-8-fa17ce05d8bc> in <module>()
----> 1 x.__private_method()
AttributeError: 'Test' object has no attribute '__private_method'
There are some subtle differences, but for the sake of programming pattern ideological purity, it's good enough.
There are examples out there of #private decorators that more closely implement the concept, but your mileage may vary. Arguably, one could also write a class definition that uses meta.
As mentioned earlier, you can indicate that a variable or method is private by prefixing it with an underscore. If you don't feel like this is enough, you can always use the property decorator. Here's an example:
class Foo:
def __init__(self, bar):
self._bar = bar
#property
def bar(self):
"""Getter for '_bar'."""
return self._bar
This way, someone or something that references bar is actually referencing the return value of the bar function rather than the variable itself, and therefore it can be accessed but not changed. However, if someone really wanted to, they could simply use _bar and assign a new value to it. There is no surefire way to prevent someone from accessing variables and methods that you wish to hide, as has been said repeatedly. However, using property is the clearest message you can send that a variable is not to be edited. property can also be used for more complex getter/setter/deleter access paths, as explained here: https://docs.python.org/3/library/functions.html#property
Python has limited support for private identifiers, through a feature that automatically prepends the class name to any identifiers starting with two underscores. This is transparent to the programmer, for the most part, but the net effect is that any variables named this way can be used as private variables.
See here for more on that.
In general, Python's implementation of object orientation is a bit primitive compared to other languages. But I enjoy this, actually. It's a very conceptually simple implementation and fits well with the dynamic style of the language.
The only time I ever use private variables is when I need to do other things when writing to or reading from the variable and as such I need to force the use of a setter and/or getter.
Again this goes to culture, as already stated. I've been working on projects where reading and writing other classes variables was free-for-all. When one implementation became deprecated it took a lot longer to identify all code paths that used that function. When use of setters and getters was forced, a debug statement could easily be written to identify that the deprecated method had been called and the code path that calls it.
When you are on a project where anyone can write an extension, notifying users about deprecated methods that are to disappear in a few releases hence is vital to keep module breakage at a minimum upon upgrades.
So my answer is; if you and your colleagues maintain a simple code set then protecting class variables is not always necessary. If you are writing an extensible system then it becomes imperative when changes to the core is made that needs to be caught by all extensions using the code.
"In java, we have been taught about public/private/protected variables"
"Why is that not required in python?"
For the same reason, it's not required in Java.
You're free to use -- or not use private and protected.
As a Python and Java programmer, I've found that private and protected are very, very important design concepts. But as a practical matter, in tens of thousands of lines of Java and Python, I've never actually used private or protected.
Why not?
Here's my question "protected from whom?"
Other programmers on my team? They have the source. What does protected mean when they can change it?
Other programmers on other teams? They work for the same company. They can -- with a phone call -- get the source.
Clients? It's work-for-hire programming (generally). The clients (generally) own the code.
So, who -- precisely -- am I protecting it from?
In Python 3, if you just want to "encapsulate" the class attributes, like in Java, you can just do the same thing like this:
class Simple:
def __init__(self, str):
print("inside the simple constructor")
self.__s = str
def show(self):
print(self.__s)
def showMsg(self, msg):
print(msg + ':', self.show())
To instantiate this do:
ss = Simple("lol")
ss.show()
Note that: print(ss.__s) will throw an error.
In practice, Python 3 will obfuscate the global attribute name. It is turning this like a "private" attribute, like in Java. The attribute's name is still global, but in an inaccessible way, like a private attribute in other languages.
But don't be afraid of it. It doesn't matter. It does the job too. ;)
Private and protected concepts are very important. But Python is just a tool for prototyping and rapid development with restricted resources available for development, and that is why some of the protection levels are not so strictly followed in Python. You can use "__" in a class member. It works properly, but it does not look good enough. Each access to such field contains these characters.
Also, you can notice that the Python OOP concept is not perfect. Smalltalk or Ruby are much closer to a pure OOP concept. Even C# or Java are closer.
Python is a very good tool. But it is a simplified OOP language. Syntactically and conceptually simplified. The main goal of Python's existence is to bring to developers the possibility to write easy readable code with a high abstraction level in a very fast manner.
Here's how I handle Python 3 class fields:
class MyClass:
def __init__(self, public_read_variable, private_variable):
self.public_read_variable_ = public_read_variable
self.__private_variable = private_variable
I access the __private_variable with two underscores only inside MyClass methods.
I do read access of the public_read_variable_ with one underscore
outside the class, but never modify the variable:
my_class = MyClass("public", "private")
print(my_class.public_read_variable_) # OK
my_class.public_read_variable_ = 'another value' # NOT OK, don't do that.
So I’m new to Python but I have a background in C# and JavaScript. Python feels like a mix of the two in terms of features. JavaScript also struggles in this area and the way around it here, is to create a closure. This prevents access to data you don’t want to expose by returning a different object.
def print_msg(msg):
# This is the outer enclosing function
def printer():
# This is the nested function
print(msg)
return printer # returns the nested function
# Now let's try calling this function.
# Output: Hello
another = print_msg("Hello")
another()
https://www.programiz.com/python-programming/closure
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Closures#emulating_private_methods_with_closures
About sources (to change the access rights and thus bypass language encapsulation like Java or C++):
You don't always have the sources and even if you do, the sources are managed by a system that only allows certain programmers to access a source (in a professional context). Often, every programmer is responsible for certain classes and therefore knows what he can and cannot do. The source manager also locks the sources being modified and of course, manages the access rights of programmers.
So I trust more in software than in human, by experience. So convention is good, but multiple protections are better, like access management (real private variable) + sources management.
I have been thinking about private class attributes and methods (named members in further reading) since I have started to develop a package that I want to publish. The thought behind it were never to make it impossible to overwrite these members, but to have a warning for those who touch them. I came up with a few solutions that might help. The first solution is used in one of my favorite Python books, Fluent Python.
Upsides of technique 1:
It is unlikely to be overwritten by accident.
It is easily understood and implemented.
Its easier to handle than leading double underscore for instance attributes.
*In the book the hash-symbol was used, but you could use integer converted to strings as well. In Python it is forbidden to use klass.1
class Technique1:
def __init__(self, name, value):
setattr(self, f'private#{name}', value)
setattr(self, f'1{name}', value)
Downsides of technique 1:
Methods are not easily protected with this technique though. It is possible.
Attribute lookups are just possible via getattr
Still no warning to the user
Another solution I came across was to write __setattr__. Pros:
It is easily implemented and understood
It works with methods
Lookup is not affected
The user gets a warning or error
class Demonstration:
def __init__(self):
self.a = 1
def method(self):
return None
def __setattr__(self, name, value):
if not getattr(self, name, None):
super().__setattr__(name, value)
else:
raise ValueError(f'Already reserved name: {name}')
d = Demonstration()
#d.a = 2
d.method = None
Cons:
You can still overwrite the class
To have variables not just constants, you need to map allowed input.
Subclasses can still overwrite methods
To prevent subclasses from overwriting methods you can use __init_subclass__:
class Demonstration:
__protected = ['method']
def method(self):
return None
def __init_subclass__(cls):
protected_methods = Demonstration.__protected
subclass_methods = dir(cls)
for i in protected_methods:
p = getattr(Demonstration,i)
j = getattr(cls, i)
if not p is j:
raise ValueError(f'Protected method "{i}" was touched')
You see, there are ways to protect your class members, but it isn't any guarantee that users don't overwrite them anyway. This should just give you some ideas. In the end, you could also use a meta class, but this might open up new dangers to encounter. The techniques used here are also very simple minded and you should definitely take a look at the documentation, you can find useful feature to this technique and customize them to your need.
Related
This question already has answers here:
What is the purpose of the `self` parameter? Why is it needed?
(26 answers)
Closed 6 months ago.
When defining a method on a class in Python, it looks something like this:
class MyClass(object):
def __init__(self, x, y):
self.x = x
self.y = y
But in some other languages, such as C#, you have a reference to the object that the method is bound to with the "this" keyword without declaring it as an argument in the method prototype.
Was this an intentional language design decision in Python or are there some implementation details that require the passing of "self" as an argument?
I like to quote Peters' Zen of Python. "Explicit is better than implicit."
In Java and C++, 'this.' can be deduced, except when you have variable names that make it impossible to deduce. So you sometimes need it and sometimes don't.
Python elects to make things like this explicit rather than based on a rule.
Additionally, since nothing is implied or assumed, parts of the implementation are exposed. self.__class__, self.__dict__ and other "internal" structures are available in an obvious way.
It's to minimize the difference between methods and functions. It allows you to easily generate methods in metaclasses, or add methods at runtime to pre-existing classes.
e.g.
>>> class C:
... def foo(self):
... print("Hi!")
...
>>>
>>> def bar(self):
... print("Bork bork bork!")
...
>>>
>>> c = C()
>>> C.bar = bar
>>> c.bar()
Bork bork bork!
>>> c.foo()
Hi!
>>>
It also (as far as I know) makes the implementation of the python runtime easier.
I suggest that one should read Guido van Rossum's blog on this topic - Why explicit self has to stay.
When a method definition is decorated, we don't know whether to automatically give it a 'self' parameter or not: the decorator could turn the function into a static method (which has no 'self'), or a class method (which has a funny kind of self that refers to a class instead of an instance), or it could do something completely different (it's trivial to write a decorator that implements '#classmethod' or '#staticmethod' in pure Python). There's no way without knowing what the decorator does whether to endow the method being defined with an implicit 'self' argument or not.
I reject hacks like special-casing '#classmethod' and '#staticmethod'.
Python doesn't force you on using "self". You can give it whatever name you want. You just have to remember that the first argument in a method definition header is a reference to the object.
Also allows you to do this: (in short, invoking Outer(3).create_inner_class(4)().weird_sum_with_closure_scope(5) will return 12, but will do so in the craziest of ways.
class Outer(object):
def __init__(self, outer_num):
self.outer_num = outer_num
def create_inner_class(outer_self, inner_arg):
class Inner(object):
inner_arg = inner_arg
def weird_sum_with_closure_scope(inner_self, num)
return num + outer_self.outer_num + inner_arg
return Inner
Of course, this is harder to imagine in languages like Java and C#. By making the self reference explicit, you're free to refer to any object by that self reference. Also, such a way of playing with classes at runtime is harder to do in the more static languages - not that's it's necessarily good or bad. It's just that the explicit self allows all this craziness to exist.
Moreover, imagine this: We'd like to customize the behavior of methods (for profiling, or some crazy black magic). This can lead us to think: what if we had a class Method whose behavior we could override or control?
Well here it is:
from functools import partial
class MagicMethod(object):
"""Does black magic when called"""
def __get__(self, obj, obj_type):
# This binds the <other> class instance to the <innocent_self> parameter
# of the method MagicMethod.invoke
return partial(self.invoke, obj)
def invoke(magic_self, innocent_self, *args, **kwargs):
# do black magic here
...
print magic_self, innocent_self, args, kwargs
class InnocentClass(object):
magic_method = MagicMethod()
And now: InnocentClass().magic_method() will act like expected. The method will be bound with the innocent_self parameter to InnocentClass, and with the magic_self to the MagicMethod instance. Weird huh? It's like having 2 keywords this1 and this2 in languages like Java and C#. Magic like this allows frameworks to do stuff that would otherwise be much more verbose.
Again, I don't want to comment on the ethics of this stuff. I just wanted to show things that would be harder to do without an explicit self reference.
I think it has to do with PEP 227:
Names in class scope are not accessible. Names are resolved in the
innermost enclosing function scope. If a class definition occurs in a
chain of nested scopes, the resolution process skips class
definitions. This rule prevents odd interactions between class
attributes and local variable access. If a name binding operation
occurs in a class definition, it creates an attribute on the resulting
class object. To access this variable in a method, or in a function
nested within a method, an attribute reference must be used, either
via self or via the class name.
I think the real reason besides "The Zen of Python" is that Functions are first class citizens in Python.
Which essentially makes them an Object. Now The fundamental issue is if your functions are object as well then, in Object oriented paradigm how would you send messages to Objects when the messages themselves are objects ?
Looks like a chicken egg problem, to reduce this paradox, the only possible way is to either pass a context of execution to methods or detect it. But since python can have nested functions it would be impossible to do so as the context of execution would change for inner functions.
This means the only possible solution is to explicitly pass 'self' (The context of execution).
So i believe it is a implementation problem the Zen came much later.
As explained in self in Python, Demystified
anything like obj.meth(args) becomes Class.meth(obj, args). The calling process is automatic while the receiving process is not (its explicit). This is the reason the first parameter of a function in class must be the object itself.
class Point(object):
def __init__(self,x = 0,y = 0):
self.x = x
self.y = y
def distance(self):
"""Find distance from origin"""
return (self.x**2 + self.y**2) ** 0.5
Invocations:
>>> p1 = Point(6,8)
>>> p1.distance()
10.0
init() defines three parameters but we just passed two (6 and 8). Similarly distance() requires one but zero arguments were passed.
Why is Python not complaining about this argument number mismatch?
Generally, when we call a method with some arguments, the corresponding class function is called by placing the method's object before the first argument. So, anything like obj.meth(args) becomes Class.meth(obj, args). The calling process is automatic while the receiving process is not (its explicit).
This is the reason the first parameter of a function in class must be the object itself. Writing this parameter as self is merely a convention. It is not a keyword and has no special meaning in Python. We could use other names (like this) but I strongly suggest you not to. Using names other than self is frowned upon by most developers and degrades the readability of the code ("Readability counts").
...
In, the first example self.x is an instance attribute whereas x is a local variable. They are not the same and lie in different namespaces.
Self Is Here To Stay
Many have proposed to make self a keyword in Python, like this in C++ and Java. This would eliminate the redundant use of explicit self from the formal parameter list in methods. While this idea seems promising, it's not going to happen. At least not in the near future. The main reason is backward compatibility. Here is a blog from the creator of Python himself explaining why the explicit self has to stay.
The 'self' parameter keeps the current calling object.
class class_name:
class_variable
def method_name(self,arg):
self.var=arg
obj=class_name()
obj.method_name()
here, the self argument holds the object obj. Hence, the statement self.var denotes obj.var
There is also another very simple answer: according to the zen of python, "explicit is better than implicit".
I've created an example class (a bitmask class) which has 4 really simple functions. I've also created a unit-test for this class.
import unittest
class BitMask:
def __init__(self):
self.__mask = 0
def set(self, slot):
self.__mask |= (1 << slot)
def remove(self, slot):
self.__mask &= ~(1 << slot)
def has(self, slot):
return (self.__mask >> slot) & 1
def clear(self):
self.__mask = 0
class TestBitmask(unittest.TestCase):
def setUp(self):
self.bitmask = BitMask()
def test_set_on_valid_input(self):
self.bitmask.set(5)
self.assertEqual(self.bitmask.has(5), True)
def test_has_on_valid_input(self):
self.bitmask.set(5)
self.assertEqual(self.bitmask.has(5), True)
def test_remove_on_valid_input(self):
self.bitmask.set(5)
self.bitmask.remove(5)
self.assertEqual(self.bitmask.has(5), False)
def test_clear(self):
for i in range(16):
self.bitmask.set(i)
self.bitmask.clear()
for j in range(16):
with self.subTest(j=j):
self.assertEqual(self.bitmask.has(j), False)
The problem I'm facing is that all these tests requires both the set and has methods for setting and checking values in the bitmask, but these methods are untested. I cannot confirm that one is correct without knowing that the other one is.
This example class isn't the first time I've experienced this issue. It usually occurs when I need to set up and check values/states within a class in order to test a method.
I've tried to find resources that explain this, but unfortunately their examples only use pure functions or where the changed attribute can be read directly. I could solve the problem by extracting the methods to be pure functions, or using a read-only property that returns the attribute __mask.
But is this the preferred approach? If not, how do I test a method that needs to be set up and/or checked using untested methods?
Not sure this answers your question, as it deals with changing of initial class design,
but here it goes.
You make a lazy class with no constructor or property , which hides the state of your
object. It is not the set or has methods that are untested, it is the issue of
state of the object being unknown. Have you had a .value property to reveal
self.__mask, this would solve a question of testing .set() and has().
Also I would strongly consider a default value in constructor, which makes it a better-looking
instantination and allows easier testing (some advice on avoiding setters in python is here).
def __init__(self, mask=0):
self.__mask = mask
If there any design considerations that prevent you from having a .value property,
perhaps an `__eq__ method can be used, if __init__ accepts a value.
a = BitMask(0)
b = BitMask(5)
a.set(5)
assert a == b
Of course, you can challenge that on how is __eq__tested itself.
Finally, perhaps you are failiar with patching or monkey-patching - a
technique to block something inside a object under test or make it work differently
(eg imitate web response without actual call). With any of the libraries for pathcing
I think you would still endup-performing a kind of x.__mask = value assignment, which
is not too reasonable for a small, nice, and locally-defined class like one here.
Hope it helps in line of what you are exploring.
I would’ve used single underscore instead of double, and just looked directly at the _mask in unit test.
Python doesn’t really have private attributes or methods, even double underscore attributes are accessible on your instance like this: obj._BitMask__mask.
Double underscore is used when you want subclasses to not overwrite the attribute of superclass. To indicate “private” you should use single underscore.
Allowing access to private fields is a part of python's design, so using this ability responsibly is not considered wrong, doubly so if you are accessing your own class.
The rationale behind "Do not touch the private fields" is that you as the developer can mess something up with the internals of the class, also private interface of s library can change at any point and break your code.
When you are writing unit tests you are not afraid of messing with your own class, and is accepting that you have to change unit test if you change your class, so this programming idiom is not useful for you to apply.
In Python, I have a class that I've built.
However, there is one method where I apply a rather specific type of substring-search procedure. This procedure could be a standalone function by itself (it just requires a needle a haystack string), but it feels odd to have the function outside the class, because my class depends on it.
What is the typical design paradigm for this? Is it typical to just have myClassName.py with the main class, as well as all the support functions outside the class itself, in the same file? Or is it better to have the support function embedded within the class at the expense of modularity?
You can create a staticmethod, like so:
class yo:
#staticmethod
def say_hi():
print "Hi there!"
Then, you can do this:
>>> yo.say_hi()
Hi there!
>>> a = yo()
>>> a.say_hi()
Hi there!
They can be used non-statically, and statically (if that makes sense).
About where to put your functions...
If a method is required by a class, and it is appropriate for the method to perform data that is specific to the class, then make it a method. This is what you would want:
class yo:
self.message = "Hello there!"
def say_message(self):
print self.message
My say_message relies on the data that is particular to the instance of a class.
If you feel the need to have a function, in addition to the class method, by all means go ahead. Use whichever one is more appropriate in your script. There are many examples of this, including in the python built-ins. Take generator objects for example:
a = my_new_generator()
a.next()
Can also be done as:
a = my_new_generator()
next(a)
Use whichever is more appropriate, and obviously whichever one is more readable. :)
If you can think or any reason to override this function one day, make it a staticmethod, else a plain function is just ok - FWIW, your class probably depends on much more than this simple function. And if you cannot think of any reason for anyone else to ever use this function, keep it in the same module as your class.
As a side note: "myClassName.py" is definitly unpythonic. First because module names should be all_lower, then because the one-module-per-class stuff is a nonsense in Python - we group related classes and functions (and exceptions and whatnots) together.
If the search method you are talking about is really so specific and you will never need to reuse it somewhere else, I do not see any reason to make it static. The fact that it doesn't require access to instance variables doesn't make it static by definition.
If there is a possibility, that this method is going to be reused, refactor it into a helper/utility class (no static again).
ADDED:
Just wanted to add, that when you consider something being static or not, think about how method name relates to the class name. Does this method name makes more sense when used in class context or object context?
I have my pseudo Interface, which I implement several times. Each implementation is supposed to store a variable that basically defines a path to a file (a template). Because these classes are produced by a factory, I don't know which subclass is going to come up, therefore, I want to make it possible to access a class variable via an instance method.
This does not really pose a problem, however, when I inherit from my Interface, I don't want to implement the getHidden() method (in the following) several times. But calling it, the way it is written down there, will always yield hidden.
class MySuperInterface(object):
# class Variable
__much = "hidden"
# instance method
def getHidden(self):
print self.__class__.__much
class MySub(MySuperInterface):
__much = "this is different per subclass!"
def soTroublesome(self):
print self.__class__.__much
Execution
>>> sub = MySub() # I don't know this class!
>>> sub.getHidden()
hidden
>>> sub.soTroublesome()
this is different per subclass!
So, how can I implement getHidden() to access the instance's class' classvariable. I know, that the information is available, as I checked with dir(), but I have no idea how to access it.
Again, I don't want to end up with a class/static method, because I don't know the class that gets out of my factory!
Thanks!
Just don't use the "__" in the class variable name and you are set.
Some Python write ups and "documentation" say that the "__" prefix is the Python way to get
"private" members in a class. They are wrong.
The "__" prefix does exactly what is happening to you: ensure that the variable accessed in a method inside a class access the variable defined in that exact class (and not any of the classes that inherit from it).
The way it works is simple: at compile time, the names prefixed with "__" are "mangled", i.e. changed like this: __much -> _classname__much:
>>> class Hidden(object):
... __hidden = False
...
>>> dir(Hidden)
['_Hidden__hidden', '__class__', ...]
Therefore your getHidden method will always look for the __much variable thathas had its name actually changed to _MySuperInterface__much, while the variable you want has had its name changed to _MySub__much.
If you as much as use a single underscore "_" to mean by convention the variable should not be used outside of the class, your code would work as you expect.
You should definitely learn and try how python member access works. Use interactive Python shell (python in the command line), import your modules and experiment.
print self.__class__.__much
You don't typically need to access class members via __class__ when reading. Access them as normal members instead:
print self.__much
Python supports private members by name mangling, changing __much member of MySuperInterface class to _MySuperInterface__much. This is to avoid accessing the private member from child classes which is exactly what you are trying to do.
When you, for any reason, need to access it, you can simply use the mangled name:
print self._MySuperInterface__much
print self._MySub__much
You should typically avoid it and rethink your code instead. The easiest way to do that is to use a single underscore that still denotes internal implementation member but doesn't trigger the name mangling.
Python classes have no concept of public/private, so we are told to not touch something that starts with an underscore unless we created it. But does this not require complete knowledge of all classes from which we inherit, directly or indirectly? Witness:
class Base(object):
def __init__(self):
super(Base, self).__init__()
self._foo = 0
def foo(self):
return self._foo + 1
class Sub(Base):
def __init__(self):
super(Sub, self).__init__()
self._foo = None
Sub().foo()
Expectedly, a TypeError is raised when None + 1 is evaluated. So I have to know that _foo exists in the base class. To get around this, __foo can be used instead, which solves the problem by mangling the name. This seems to be, if not elegant, an acceptable solution. However, what happens if Base inherits from a class (in a separate package) called Sub? Now __foo in my Sub overrides __foo in the grandparent Sub.
This implies that I have to know the entire inheritance chain, including all "private" objects each uses. The fact that Python is dynamically-typed makes this even harder, since there are no declarations to search for. The worst part, however, is probably the fact Base might inherit from object right now, but in some future release, it switches to inheriting from Sub. Clearly if I know Sub is inherited from, I can rename my class, however annoying that is. But I can't see into the future.
Is this not a case where a true private data type would prevent a problem? How, in Python, can I be sure that I'm not accidentally stepping on somebody's toes if those toes might spring into existence at some point in the future?
EDIT: I've apparently not made clear the primary question. I'm familiar with name mangling and the difference between a single and a double underscore. The question is: how do I deal with the fact that I might clash with classes whose existence I don't know of right now? If my parent class (which is in a package I did not write) happens to start inheriting from a class with the same name as my class, even name mangling won't help. Am I wrong in seeing this as a (corner) case that true private members would solve, but that Python has trouble with?
EDIT: As requested, the following is a full example:
File parent.py:
class Sub(object):
def __init__(self):
self.__foo = 12
def foo(self):
return self.__foo + 1
class Base(Sub):
pass
File sub.py:
import parent
class Sub(parent.Base):
def __init__(self):
super(Sub, self).__init__()
self.__foo = None
Sub().foo()
The grandparent's foo is called, but my __foo is used.
Obviously you wouldn't write code like this yourself, but parent could easily be provided by a third party, the details of which could change at any time.
Use private names (instead of protected ones), starting with a double underscore:
class Sub(Base):
def __init__(self):
super(Sub, self).__init__()
self.__foo = None
# ^^
will not conflict with _foo or __foo in Base. This is because Python replaces the double underscore with a single underscore and the name of the class; the following two lines are equivalent:
class Sub(Base):
def x(self):
self.__foo = None # .. is the same as ..
self._Sub__foo = None
(In response to the edit:) The chance that two classes in a class hierarchy not only have the same name, but that they are both using the same property name, and are both using the private mangled (__) form is so minuscule that it can be safely ignored in practice (I for one haven't heard of a single case so far).
In theory, however, you are correct in that in order to formally verify correctness of a program, one most know the entire inheritance chain. Luckily, formal verification usually requires a fixed set of libraries in any case.
This is in the spirit of the Zen of Python, which includes
practicality beats purity.
Name mangling includes the class so your Base.__foo and Sub.__foo will have different names. This was the entire reason for adding the name mangling feature to Python in the first place. One will be _Base__foo, the other _Sub__foo.
Many people prefer to use composition (has-a) instead of inheritance (is-a) for some of these very reasons.
This implies that I have to know the entire inheritance chain. . .
Yes, you should know the entire inheritance chain, or the docs for the object you are directly sub-classing should tell you what you need to know.
Subclassing is an advanced feature, and should be treated with care.
A good example of docs specifying what should be overridden in a subclass is the threading class:
This class represents an activity that is run in a separate thread of control. There are two ways to specify the activity: by passing a callable object to the constructor, or by overriding the run() method in a subclass. No other methods (except for the constructor) should be overridden in a subclass. In other words, only override the __init__() and run() methods of this class.
How often do you modify base classes in inheritance chains to introduce inheritance from a class with the same name as a subclass further down the chain???
Less flippantly, yes, you have to know the code you are working with. You certainly have to know the public names being used, after all. Python being python, discovering the public names in use by your ancestor classes takes pretty much the same effort as discovering the private ones.
In years of Python programming, I have never found this to be much of an issue in practice. When you're naming instance variables, you should have a pretty good idea whether (a) a name is generic enough that it's likely to be used in other contexts and (b) the class you're writing is likely to be involved in an inheritance hierarchy with other unknown classes. In such cases, you think a bit more carefully about the names you're using; self.value isn't a great idea for an attribute name, and neither is something like Adaptor a great class name.
In contrast, I have run into difficulties with the overuse of double-underscore names a number of times. Python being Python, even "private" names tend to be accessed by code defined outside the class. You might think that it would always be bad practice to let an external function access "private" attributes, but what about things like getattr and hasattr? The invocation of them can be in the class's own code, so the class is still controlling all access to the private attributes, but they still don't work without you doing the name-mangling manually. If Python had actually-enforced private variables you couldn't use functions like those on them at all. These days I tend to reserve double-underscore names for cases when I'm writing something very generic like a decorator, metaclass, or mixin that needs to add a "secret attribute" to the instances of the (unknown) classes it's applied to.
And of course there's the standard dynamic language argument: the reality is that you have to test your code thoroughly to have much justification in making the claim "my software works". Such testing will be very unlikely to miss the bugs caused by accidentally clashing names. If you are not doing that testing, then many more uncaught bugs will be introduced by other means than by accidental name clashes.
In summation, the lack of private variables is just not that big a deal in idiomatic Python code in practice, and the addition of true private variables would cause more frequent problems in other ways IMHO.
Mangling happens with double underscores. Single underscores are more of a "please don't".
You don't need to know all the details of all parent classes (note that deep inheritance is usually best avoided), because you can still dir() and help() and any other form of introspection you can come up with.
As noted, you can use name mangling. However, you can stick with a single underscore (or none!) if you document your code adequately - you should not have so many private variables that this proves to be a problem. Just say if a method relies on a private variable, and add either the variable, or the name of the method to the class docstring to alert users.
Further, if you create unit tests, you should create tests that check invariants on members, and accordingly these should be able to show up such name clashes.
If you really want to have "private" variables, and for whatever reason name-mangling doesn't meet your needs, you can factor your private state into another object:
class Foo(object):
class Stateholder(object): pass
def __init__(self):
self._state = Stateholder()
self.state.private = 1