Python provides private name mangling for class methods and attributes.
Are there any concrete cases where this feature is required, or is it just a carry over from Java and C++?
Please describe a use case where Python name mangling should be used, if any?
Also, I'm not interested in the case where the author is merely trying to prevent accidental external attribute access. I believe this use case is not aligned with the Python programming model.
It's partly to prevent accidental internal attribute access. Here's an example:
In your code, which is a library:
class YourClass:
def __init__(self):
self.__thing = 1 # Your private member, not part of your API
In my code, in which I'm inheriting from your library class:
class MyClass(YourClass):
def __init__(self):
# ...
self.__thing = "My thing" # My private member; the name is a coincidence
Without private name mangling, my accidental reuse of your name would break your library.
From PEP 8:
If your class is intended to be subclassed, and you have attributes that you do not want subclasses to use, consider naming them with double leading underscores and no trailing underscores. This invokes Python's name mangling algorithm, where the name of the class is mangled into the attribute name. This helps avoid attribute name collisions should subclasses inadvertently contain attributes with the same name.
(Emphasis added)
All previous answers are correct but here is another reason with an example. Name Mangling is needed in python because to avoid problems that could be caused by overriding attributes. In other words, in order to override, the Python interpreter has to be able to build distinct id for child method versus parent method and using __ (double underscore) enable python to do this. In below example, without __help this code would not work.
class Parent:
def __init__(self):
self.__help("will take child to school")
def help(self, activities):
print("parent",activities)
__help = help # private copy of original help() method
class Child(Parent):
def help(self, activities, days): # notice this has 3 arguments and overrides the Parent.help()
self.activities = activities
self.days = days
print ("child will do",self.activities, self.days)
# the goal was to extend and override the Parent class to list the child activities too
print ("list parent & child responsibilities")
c = Child()
c.help("laundry","Saturdays")
The name mangling is there to prevent accidental external attribute access. Mostly, it's there to make sure that there are no name clashes.
Related
I would like to have a function in my class, which I am going to use only inside methods of this class. I will not call it outside the implementations of these methods. In C++, I would use a method declared in the private section of the class. What is the best way to implement such a function in Python?
I am thinking of using a static decorator for this case. Can I use a function without any decorators and the self word?
Python doesn't have the concept of private methods or attributes. It's all about how you implement your class. But you can use pseudo-private variables (name mangling); any variable preceded by __(two underscores) becomes a pseudo-private variable.
From the documentation:
Since there is a valid use-case for class-private members (namely to
avoid name clashes of names with names defined by subclasses), there
is limited support for such a mechanism, called name mangling. Any
identifier of the form __spam (at least two leading underscores, at
most one trailing underscore) is textually replaced with
_classname__spam, where classname is the current class name with leading underscore(s) stripped. This mangling is done without regard
to the syntactic position of the identifier, as long as it occurs
within the definition of a class.
class A:
def __private(self):
pass
So __private now actually becomes _A__private.
Example of a static method:
>>> class A:
... #staticmethod # Not required in Python 3.x
... def __private():
... print 'hello'
...
>>> A._A__private()
hello
Python doesn't have the concept of 'private' the way many other languages do. It is built on the consenting adult principle that says that users of your code will use it responsibly. By convention, attributes starting with a single or double leading underscore will be treated as part of the internal implementation, but they are not actually hidden from users. Double underscore will cause name mangling of the attribute name though.
Also, note that self is only special by convention, not by any feature of the language. Instance methods, when called as members of an instance, are implicitly passed the instance as a first argument, but in the implementation of the method itself, that argument can technically be named any arbitrary thing you want. self is just the convention for ease of understanding code. As a result, not including self in the signature of a method has no actual functional effect other than causing the implicit instance argument to be assigned to the next variable name in the signature.
This is of course different for class methods, which receive the instance of the class object itself as an implicit first argument, and static methods, which receive no implicit arguments at all.
Python just doesn't do private. If you like you can follow convention and precede the name with a single underscore, but it's up to other coders to respect that in a gentlemanly† fashion
† or gentlewomanly
There is plenty of great stuff here with obfuscation using leading underscores. Personally, I benefit greatly from the language design decision to make everything public as it reduces the time it takes to understand and use new modules.
However, if you're determined to implement private attributes/methods and you're willing to be unpythonic, you could do something along the lines of:
from pprint import pprint
# CamelCase because it 'acts' like a class
def SneakyCounter():
class SneakyCounterInternal(object):
def __init__(self):
self.counter = 0
def add_two(self):
self.increment()
self.increment()
def increment(self):
self.counter += 1
def reset(self):
print 'count prior to reset: {}'.format(self.counter)
self.counter = 0
sneaky_counter = SneakyCounterInternal()
class SneakyCounterExternal(object):
def add_two(self):
sneaky_counter.add_two()
def reset(self):
sneaky_counter.reset()
return SneakyCounterExternal()
# counter attribute is not accessible from out here
sneaky_counter = SneakyCounter()
sneaky_counter.add_two()
sneaky_counter.add_two()
sneaky_counter.reset()
# `increment` and `counter` not exposed (AFAIK)
pprint(dir(sneaky_counter))
It is hard to imagine a case where you'd want to do this, but it is possible.
You just don't do it:
The Pythonic way is to not document those methods/members using docstrings, only with "real" code comments. And the convention is to append a single or a double underscore to them;
Then you can use double underscores in front of your member, so they are made local to the class (it's mostly name mangling, i.e., the real name of the member outside of the class becomes: instance.__classname_membername). It's useful to avoid conflicts when using inheritance, or create a "private space" between children of a class.
As far as I can tell, it is possible to "hide" variables using metaclasses, but that violates the whole philosophy of Python, so I won't go into details about that.
I know, there are no 'real' private/protected methods in Python. This approach isn't meant to hide anything; I just want to understand what Python does.
class Parent(object):
def _protected(self):
pass
def __private(self):
pass
class Child(Parent):
def foo(self):
self._protected() # This works
def bar(self):
self.__private() # This doesn't work, I get a AttributeError:
# 'Child' object has no attribute '_Child__private'
So, does this behaviour mean, that 'protected' methods will be inherited but 'private' won't at all?
Or did I miss anything?
Python has no privacy model, there are no access modifiers like in C++, C# or Java. There are no truly 'protected' or 'private' attributes.
Names with a leading double underscore and no trailing double underscore are mangled to protect them from clashes when inherited. Subclasses can define their own __private() method and these will not interfere with the same name on the parent class. Such names are considered class private; they are still accessible from outside the class but are far less likely to accidentally clash.
Mangling is done by prepending any such name with an extra underscore and the class name (regardless of how the name is used or if it exists), effectively giving them a namespace. In the Parent class, any __private identifier is replaced (at compilation time) by the name _Parent__private, while in the Child class the identifier is replaced by _Child__private, everywhere in the class definition.
The following will work:
class Child(Parent):
def foo(self):
self._protected()
def bar(self):
self._Parent__private()
See Reserved classes of identifiers in the lexical analysis documentation:
__*
Class-private names. Names in this category, when used within the context of a class definition, are re-written to use a mangled form to help avoid name clashes between “private” attributes of base and derived classes.
and the referenced documentation on names:
Private name mangling: When an identifier that textually occurs in a class definition begins with two or more underscore characters and does not end in two or more underscores, it is considered a private name of that class. Private names are transformed to a longer form before code is generated for them. The transformation inserts the class name, with leading underscores removed and a single underscore inserted, in front of the name. For example, the identifier __spam occurring in a class named Ham will be transformed to _Ham__spam. This transformation is independent of the syntactical context in which the identifier is used.
Don't use class-private names unless you specifically want to avoid having to tell developers that want to subclass your class that they can't use certain names or risk breaking your class. Outside of published frameworks and libraries, there is little use for this feature.
The PEP 8 Python Style Guide has this to say about private name mangling:
If your class is intended to be subclassed, and you have attributes
that you do not want subclasses to use, consider naming them with
double leading underscores and no trailing underscores. This invokes
Python's name mangling algorithm, where the name of the class is
mangled into the attribute name. This helps avoid attribute name
collisions should subclasses inadvertently contain attributes with the
same name.
Note 1: Note that only the simple class name is used in the mangled
name, so if a subclass chooses both the same class name and attribute
name, you can still get name collisions.
Note 2: Name mangling can make certain uses, such as debugging and
__getattr__(), less convenient. However the name mangling algorithm
is well documented and easy to perform manually.
Note 3: Not everyone likes name mangling. Try to balance the need to
avoid accidental name clashes with potential use by advanced callers.
The double __ attribute is changed to _ClassName__method_name which makes it more private than the semantic privacy implied by _method_name.
You can technically still get at it if you'd really like to, but presumably no one is going to do that, so for maintenance of code abstraction reasons, the method might as well be private at that point.
class Parent(object):
def _protected(self):
pass
def __private(self):
print("Is it really private?")
class Child(Parent):
def foo(self):
self._protected()
def bar(self):
self.__private()
c = Child()
c._Parent__private()
This has the additional upside (or some would say primary upside) of allowing a method to not collide with child class method names.
By declaring your data member private :
__private()
you simply can't access it from outside the class
Python supports a technique called name mangling.
This feature turns class member prefixed with two underscores into:
_className.memberName
if you want to access it from Child() you can use: self._Parent__private()
Also PEP8 says
Use one leading underscore only for non-public methods and instance
variables.
To avoid name clashes with subclasses, use two leading underscores to
invoke Python's name mangling rules.
Python mangles these names with the class name: if class Foo has an
attribute named __a, it cannot be accessed by Foo.__a. (An insistent
user could still gain access by calling Foo._Foo__a.) Generally,
double leading underscores should be used only to avoid name conflicts
with attributes in classes designed to be subclassed.
You should stay away from _such_methods too, by convention. I mean you should treat them as private
Although this is an old question, I encountered it and found a nice workaround.
In the case you name mangled on the parent class because you wanted to mimic a protected function, but still wanted to access the function in an easy manner on the child class.
parent_class_private_func_list = [func for func in dir(Child) if func.startswith ('_Parent__')]
for parent_private_func in parent_class_private_func_list:
setattr(self, parent_private_func.replace("_Parent__", "_Child"), getattr(self, parent_private_func))
The idea is manually replacing the parents function name into one fitting to the current namespace.
After adding this in the init function of the child class, you can call the function in an easy manner.
self.__private()
AFAIK, in the second case Python perform "name mangling", so the name of the __private method of the parent class is really:
_Parent__private
And you cannot use it in child in this form neither
Basically I have a process class that takes in 3 variables. process as in the name, time and io.
I put that process into a ready queue then in another class I have put the first object from readyqueue into self.__ current. Usually when I put the object into a variable eg. current = process('hi', 6, False). I can say current._process.__time and it returns >>> 6.
Can anyone explain how i can do this with self.__current.
class Process:
def __init__(self,process,time,io = False):
self.__process = process
self.__time = time
self.__io = io
class cpu(Queue):
def __init__(self):
self.timequantum = 1
self.__current = []
The issue you have is because you're using attribute names that start (and don't end) with two underscores. When you do that in a method, it invokes Python's "name mangling" feature, which transforms a name like __process into _Process__process. That makes it awkward to access from other code.
The purpose of the name mangling system is to allow mixin classes to assign attributes on objects that may have any number of other attributes. Since the mangling adds the name of the class where the attribute is being used (not the name of class of the object), the mixin-author can be fairly confident that no other class will accidentally use the same attribute name.
You probably don't want name mangling, so you should not use __ at the start of your attribute names. If you want the attributes to be "private", use a single underscore, which indicates that the attribute is not part of the public API of the class. This is a convention, so for debugging or testing you can still access the _ prefixed attributes if you need to.
Or if you want other code to use the attributes, just use bare names. In Python, it's perfectly acceptable to have public attributes as part of an object's API. (Other programming languages discourage that, but in Python you can reimplement an attribute-API with getter and setter functions using a property later if you need to.)
Python classes have no concept of public/private, so we are told to not touch something that starts with an underscore unless we created it. But does this not require complete knowledge of all classes from which we inherit, directly or indirectly? Witness:
class Base(object):
def __init__(self):
super(Base, self).__init__()
self._foo = 0
def foo(self):
return self._foo + 1
class Sub(Base):
def __init__(self):
super(Sub, self).__init__()
self._foo = None
Sub().foo()
Expectedly, a TypeError is raised when None + 1 is evaluated. So I have to know that _foo exists in the base class. To get around this, __foo can be used instead, which solves the problem by mangling the name. This seems to be, if not elegant, an acceptable solution. However, what happens if Base inherits from a class (in a separate package) called Sub? Now __foo in my Sub overrides __foo in the grandparent Sub.
This implies that I have to know the entire inheritance chain, including all "private" objects each uses. The fact that Python is dynamically-typed makes this even harder, since there are no declarations to search for. The worst part, however, is probably the fact Base might inherit from object right now, but in some future release, it switches to inheriting from Sub. Clearly if I know Sub is inherited from, I can rename my class, however annoying that is. But I can't see into the future.
Is this not a case where a true private data type would prevent a problem? How, in Python, can I be sure that I'm not accidentally stepping on somebody's toes if those toes might spring into existence at some point in the future?
EDIT: I've apparently not made clear the primary question. I'm familiar with name mangling and the difference between a single and a double underscore. The question is: how do I deal with the fact that I might clash with classes whose existence I don't know of right now? If my parent class (which is in a package I did not write) happens to start inheriting from a class with the same name as my class, even name mangling won't help. Am I wrong in seeing this as a (corner) case that true private members would solve, but that Python has trouble with?
EDIT: As requested, the following is a full example:
File parent.py:
class Sub(object):
def __init__(self):
self.__foo = 12
def foo(self):
return self.__foo + 1
class Base(Sub):
pass
File sub.py:
import parent
class Sub(parent.Base):
def __init__(self):
super(Sub, self).__init__()
self.__foo = None
Sub().foo()
The grandparent's foo is called, but my __foo is used.
Obviously you wouldn't write code like this yourself, but parent could easily be provided by a third party, the details of which could change at any time.
Use private names (instead of protected ones), starting with a double underscore:
class Sub(Base):
def __init__(self):
super(Sub, self).__init__()
self.__foo = None
# ^^
will not conflict with _foo or __foo in Base. This is because Python replaces the double underscore with a single underscore and the name of the class; the following two lines are equivalent:
class Sub(Base):
def x(self):
self.__foo = None # .. is the same as ..
self._Sub__foo = None
(In response to the edit:) The chance that two classes in a class hierarchy not only have the same name, but that they are both using the same property name, and are both using the private mangled (__) form is so minuscule that it can be safely ignored in practice (I for one haven't heard of a single case so far).
In theory, however, you are correct in that in order to formally verify correctness of a program, one most know the entire inheritance chain. Luckily, formal verification usually requires a fixed set of libraries in any case.
This is in the spirit of the Zen of Python, which includes
practicality beats purity.
Name mangling includes the class so your Base.__foo and Sub.__foo will have different names. This was the entire reason for adding the name mangling feature to Python in the first place. One will be _Base__foo, the other _Sub__foo.
Many people prefer to use composition (has-a) instead of inheritance (is-a) for some of these very reasons.
This implies that I have to know the entire inheritance chain. . .
Yes, you should know the entire inheritance chain, or the docs for the object you are directly sub-classing should tell you what you need to know.
Subclassing is an advanced feature, and should be treated with care.
A good example of docs specifying what should be overridden in a subclass is the threading class:
This class represents an activity that is run in a separate thread of control. There are two ways to specify the activity: by passing a callable object to the constructor, or by overriding the run() method in a subclass. No other methods (except for the constructor) should be overridden in a subclass. In other words, only override the __init__() and run() methods of this class.
How often do you modify base classes in inheritance chains to introduce inheritance from a class with the same name as a subclass further down the chain???
Less flippantly, yes, you have to know the code you are working with. You certainly have to know the public names being used, after all. Python being python, discovering the public names in use by your ancestor classes takes pretty much the same effort as discovering the private ones.
In years of Python programming, I have never found this to be much of an issue in practice. When you're naming instance variables, you should have a pretty good idea whether (a) a name is generic enough that it's likely to be used in other contexts and (b) the class you're writing is likely to be involved in an inheritance hierarchy with other unknown classes. In such cases, you think a bit more carefully about the names you're using; self.value isn't a great idea for an attribute name, and neither is something like Adaptor a great class name.
In contrast, I have run into difficulties with the overuse of double-underscore names a number of times. Python being Python, even "private" names tend to be accessed by code defined outside the class. You might think that it would always be bad practice to let an external function access "private" attributes, but what about things like getattr and hasattr? The invocation of them can be in the class's own code, so the class is still controlling all access to the private attributes, but they still don't work without you doing the name-mangling manually. If Python had actually-enforced private variables you couldn't use functions like those on them at all. These days I tend to reserve double-underscore names for cases when I'm writing something very generic like a decorator, metaclass, or mixin that needs to add a "secret attribute" to the instances of the (unknown) classes it's applied to.
And of course there's the standard dynamic language argument: the reality is that you have to test your code thoroughly to have much justification in making the claim "my software works". Such testing will be very unlikely to miss the bugs caused by accidentally clashing names. If you are not doing that testing, then many more uncaught bugs will be introduced by other means than by accidental name clashes.
In summation, the lack of private variables is just not that big a deal in idiomatic Python code in practice, and the addition of true private variables would cause more frequent problems in other ways IMHO.
Mangling happens with double underscores. Single underscores are more of a "please don't".
You don't need to know all the details of all parent classes (note that deep inheritance is usually best avoided), because you can still dir() and help() and any other form of introspection you can come up with.
As noted, you can use name mangling. However, you can stick with a single underscore (or none!) if you document your code adequately - you should not have so many private variables that this proves to be a problem. Just say if a method relies on a private variable, and add either the variable, or the name of the method to the class docstring to alert users.
Further, if you create unit tests, you should create tests that check invariants on members, and accordingly these should be able to show up such name clashes.
If you really want to have "private" variables, and for whatever reason name-mangling doesn't meet your needs, you can factor your private state into another object:
class Foo(object):
class Stateholder(object): pass
def __init__(self):
self._state = Stateholder()
self.state.private = 1
Python provides private name mangling for class methods and attributes.
Are there any concrete cases where this feature is required, or is it just a carry over from Java and C++?
Please describe a use case where Python name mangling should be used, if any?
Also, I'm not interested in the case where the author is merely trying to prevent accidental external attribute access. I believe this use case is not aligned with the Python programming model.
It's partly to prevent accidental internal attribute access. Here's an example:
In your code, which is a library:
class YourClass:
def __init__(self):
self.__thing = 1 # Your private member, not part of your API
In my code, in which I'm inheriting from your library class:
class MyClass(YourClass):
def __init__(self):
# ...
self.__thing = "My thing" # My private member; the name is a coincidence
Without private name mangling, my accidental reuse of your name would break your library.
From PEP 8:
If your class is intended to be subclassed, and you have attributes that you do not want subclasses to use, consider naming them with double leading underscores and no trailing underscores. This invokes Python's name mangling algorithm, where the name of the class is mangled into the attribute name. This helps avoid attribute name collisions should subclasses inadvertently contain attributes with the same name.
(Emphasis added)
All previous answers are correct but here is another reason with an example. Name Mangling is needed in python because to avoid problems that could be caused by overriding attributes. In other words, in order to override, the Python interpreter has to be able to build distinct id for child method versus parent method and using __ (double underscore) enable python to do this. In below example, without __help this code would not work.
class Parent:
def __init__(self):
self.__help("will take child to school")
def help(self, activities):
print("parent",activities)
__help = help # private copy of original help() method
class Child(Parent):
def help(self, activities, days): # notice this has 3 arguments and overrides the Parent.help()
self.activities = activities
self.days = days
print ("child will do",self.activities, self.days)
# the goal was to extend and override the Parent class to list the child activities too
print ("list parent & child responsibilities")
c = Child()
c.help("laundry","Saturdays")
The name mangling is there to prevent accidental external attribute access. Mostly, it's there to make sure that there are no name clashes.