single underscore vs double underscore encapsulation in python

I see that, with the help of underscores, one can declare private members in a class. With one underscore the member can still be accessed from the main code, but with two it cannot. If two underscores make the variable private, then why is there a single-underscore form? What is the purpose of a single-underscore variable?
class Temp:
    def __init__(self):
        self.a = 123
        self._b = 123
        self.__c = 123

obj = Temp()
print(obj.a)
print(obj._b)
print(obj.__c)

Here's why that's the "standard," which is a little different from other languages.
No underscore indicates a public name that users of the class can touch, modify, and use.
One underscore marks an implementation detail that usually (note the term usually) should only be referenced in subclasses, or if you know what you're doing. The beautiful thing about Python is that we're all adults here: if someone wants to access something for some really custom purpose, they should be able to.
Two underscores cause the name to be mangled to include the class name, like so: _Temp__c, behind the scenes, to prevent your variables from clashing with a subclass's. However, I would stay away from defaulting to two, because it's not a great habit and is generally unnecessary. There are arguments and other posts about it that you can read up on.
Note: the interpreter treats variables/methods with a single leading underscore exactly like those without one. It's just a convention that's not enforced, but rather accepted by the community, to mark a name as private.
Note #2: There is an exception described by Matthias for non-class methods
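As a quick check of the three cases above, here is a minimal sketch that reuses the Temp class from the question and inspects the instance directly. The variable names (names, mangled_value, direct_access_failed) are only for illustration.

```python
class Temp:
    def __init__(self):
        self.a = 123      # public
        self._b = 123     # internal by convention only
        self.__c = 123    # stored as _Temp__c (name mangling)


obj = Temp()

# The instance dictionary shows the names that were actually stored.
names = sorted(vars(obj))

# The mangled name is still reachable from outside the class.
mangled_value = obj._Temp__c

# The unmangled spelling does not exist on the instance.
try:
    obj.__c
    direct_access_failed = False
except AttributeError:
    direct_access_failed = True

print(names)
print(mangled_value, direct_access_failed)
```

So the double underscore does not hide anything; it only renames the attribute so that it is harder to collide with by accident.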

In Python, there are no "private" instance variables that cannot be accessed except from inside an object.
However, most Python code follows a convention: a name prefixed with an underscore, e.g. _xyz, should be treated as a non-public part of the API, whether it is a function, a method, or a data member.

Related

Why is changing class variables considered a bad practice? [duplicate]

I would like to have a function in my class, which I am going to use only inside methods of this class. I will not call it outside the implementations of these methods. In C++, I would use a method declared in the private section of the class. What is the best way to implement such a function in Python?
I am thinking of using the staticmethod decorator for this case. Can I use a function without any decorators and without the self parameter?

Python doesn't have the concept of private methods or attributes. It's all about how you implement your class. But you can use pseudo-private variables (name mangling); any variable preceded by __ (two underscores) becomes a pseudo-private variable.
From the documentation:
Since there is a valid use-case for class-private members (namely to
avoid name clashes of names with names defined by subclasses), there
is limited support for such a mechanism, called name mangling. Any
identifier of the form __spam (at least two leading underscores, at
most one trailing underscore) is textually replaced with
_classname__spam, where classname is the current class name with leading underscore(s) stripped. This mangling is done without regard
to the syntactic position of the identifier, as long as it occurs
within the definition of a class.
class A:
    def __private(self):
        pass
So __private now actually becomes _A__private.
Example of a static method:
>>> class A:
...     @staticmethod  # Not required in Python 3.x when called on the class
...     def __private():
...         print('hello')
...
>>> A._A__private()
hello

Python doesn't have the concept of 'private' the way many other languages do. It is built on the consenting adult principle that says that users of your code will use it responsibly. By convention, attributes starting with a single or double leading underscore will be treated as part of the internal implementation, but they are not actually hidden from users. Double underscore will cause name mangling of the attribute name though.
Also, note that self is only special by convention, not by any feature of the language. Instance methods, when called as members of an instance, are implicitly passed the instance as a first argument, but in the implementation of the method itself, that argument can technically be named any arbitrary thing you want. self is just the convention for ease of understanding code. As a result, not including self in the signature of a method has no actual functional effect other than causing the implicit instance argument to be assigned to the next variable name in the signature.
This is of course different for class methods, which receive the class object itself as an implicit first argument, and static methods, which receive no implicit arguments at all.
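To make the three kinds of methods concrete, here is a minimal sketch of a class with an internal helper. The Report class and its method names are hypothetical; the first parameter of summary is deliberately called this to show that self is only a convention.

```python
class Report:
    def __init__(self, values):
        self._values = list(values)   # internal by convention

    def summary(this):
        # Instance method: the instance arrives as the first argument,
        # whatever that parameter happens to be called.
        return this._format(sum(this._values))

    @staticmethod
    def _format(total):
        # Internal helper that needs no instance, so it gets no implicit argument.
        return f"total={total}"

    @classmethod
    def empty(cls):
        # Class method: the class object arrives as the first argument.
        return cls([])


filled = Report([1, 2, 3]).summary()
blank = Report.empty().summary()
print(filled, blank)
```

The leading underscore on _format signals "for use inside this class" to readers; nothing stops outside code from calling it.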

Python just doesn't do private. If you like you can follow convention and precede the name with a single underscore, but it's up to other coders to respect that in a gentlemanly† fashion
† or gentlewomanly

There is plenty of great stuff here with obfuscation using leading underscores. Personally, I benefit greatly from the language design decision to make everything public as it reduces the time it takes to understand and use new modules.
However, if you're determined to implement private attributes/methods and you're willing to be unpythonic, you could do something along the lines of:
from pprint import pprint

# CamelCase because it 'acts' like a class
def SneakyCounter():

    class SneakyCounterInternal(object):
        def __init__(self):
            self.counter = 0

        def add_two(self):
            self.increment()
            self.increment()

        def increment(self):
            self.counter += 1

        def reset(self):
            print('count prior to reset: {}'.format(self.counter))
            self.counter = 0

    # Held only in the enclosing function's scope, not on the returned object
    sneaky_counter = SneakyCounterInternal()

    class SneakyCounterExternal(object):
        def add_two(self):
            sneaky_counter.add_two()

        def reset(self):
            sneaky_counter.reset()

    return SneakyCounterExternal()

# counter attribute is not accessible from out here
sneaky_counter = SneakyCounter()
sneaky_counter.add_two()
sneaky_counter.add_two()
sneaky_counter.reset()

# `increment` and `counter` not exposed (AFAIK)
pprint(dir(sneaky_counter))
It is hard to imagine a case where you'd want to do this, but it is possible.

You just don't do it:
The Pythonic way is to not document those methods/members using docstrings, only with "real" code comments. And the convention is to prepend a single or a double underscore to them.
Then you can use double underscores in front of your member, so they are made local to the class (it's mostly name mangling, i.e., the real name of the member outside of the class becomes: instance._classname__membername). It's useful to avoid conflicts when using inheritance, or to create a "private space" between children of a class.
As far as I can tell, it is possible to "hide" variables using metaclasses, but that violates the whole philosophy of Python, so I won't go into details about that.

Are variables / methods with names beginning with an underscore internal or protected?

Consider the following snippet:
class MyClass:
    def __init__(self, i):
        self._i = i

    def _print(self):
        print(self._i)

my_obj = MyClass(5)
print(my_obj._i)
my_obj._print()
Are MyClass._i and MyClass._print considered protected or internal members?
According to official Python docs, Classes § Private Variables:
there is a convention that is followed by most Python code: a name
prefixed with an underscore (e.g. _spam) should be treated as a
non-public part of the API (whether it is a function, a method or a
data member). It should be considered an implementation detail and
subject to change without notice.
So, the way I understand these docs, preceding a name with an underscore means "internal": intended to be used by the package it is defined in and not outside of this package. And the above snippet is, therefore, correct.
However: PyCharm issues warnings if I open the above snippet:
Access to a protected member _i of a class
Access to a protected member _print of a class
Why "protected"?
Am I missing something? Is there a convention that I'm not aware of that requires me to take extra steps to distinguish between protected and internal?

You said in the question:
So, the way I understand these docs, preceding a name with an underscore means "internal": intended to be used by the package it is defined in and not outside of this package.
Usually "internal" or "protected" attributes or methods of a class are intended to be used by that class only, not even outside the class in the whole package/module.
Module attributes and functions: yes, those are intended to be used inside the module only.
That is why you are getting the warnings; the convention is even more restrictive than you seem to think.
The script will work regardless; "protected" or "internal" variables and methods are just a convention.
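If outside code genuinely needs the value, one common way to stay within the convention is to keep the underscore names for use inside the class and expose a public, read-only view. The sketch below is only an illustration: the property name i and the methods show and _format are assumptions, not part of the question's code.

```python
class MyClass:
    def __init__(self, i):
        self._i = i              # internal storage, touched only inside the class

    @property
    def i(self):
        # Public, read-only access for callers outside the class.
        return self._i

    def show(self):
        # Public method that delegates to an internal helper.
        return self._format()

    def _format(self):
        return f"i={self._i}"


my_obj = MyClass(5)
value = my_obj.i
text = my_obj.show()

# There is no setter, so assignment through the public name is rejected.
try:
    my_obj.i = 10
    assignment_rejected = False
except AttributeError:
    assignment_rejected = True

print(value, text, assignment_rejected)
```

Outside code now only touches i and show(), so no underscore-prefixed member is accessed from outside the class.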

Does Python require intimate knowledge of all classes in the inheritance chain?

Python classes have no concept of public/private, so we are told to not touch something that starts with an underscore unless we created it. But does this not require complete knowledge of all classes from which we inherit, directly or indirectly? Witness:
class Base(object):
    def __init__(self):
        super(Base, self).__init__()
        self._foo = 0

    def foo(self):
        return self._foo + 1

class Sub(Base):
    def __init__(self):
        super(Sub, self).__init__()
        self._foo = None

Sub().foo()
Expectedly, a TypeError is raised when None + 1 is evaluated. So I have to know that _foo exists in the base class. To get around this, __foo can be used instead, which solves the problem by mangling the name. This seems to be, if not elegant, an acceptable solution. However, what happens if Base inherits from a class (in a separate package) called Sub? Now __foo in my Sub overrides __foo in the grandparent Sub.
This implies that I have to know the entire inheritance chain, including all "private" objects each uses. The fact that Python is dynamically-typed makes this even harder, since there are no declarations to search for. The worst part, however, is probably the fact Base might inherit from object right now, but in some future release, it switches to inheriting from Sub. Clearly if I know Sub is inherited from, I can rename my class, however annoying that is. But I can't see into the future.
Is this not a case where a true private data type would prevent a problem? How, in Python, can I be sure that I'm not accidentally stepping on somebody's toes if those toes might spring into existence at some point in the future?
EDIT: I've apparently not made clear the primary question. I'm familiar with name mangling and the difference between a single and a double underscore. The question is: how do I deal with the fact that I might clash with classes whose existence I don't know of right now? If my parent class (which is in a package I did not write) happens to start inheriting from a class with the same name as my class, even name mangling won't help. Am I wrong in seeing this as a (corner) case that true private members would solve, but that Python has trouble with?
EDIT: As requested, the following is a full example:
File parent.py:
class Sub(object):
    def __init__(self):
        self.__foo = 12

    def foo(self):
        return self.__foo + 1

class Base(Sub):
    pass
File sub.py:
import parent

class Sub(parent.Base):
    def __init__(self):
        super(Sub, self).__init__()
        self.__foo = None

Sub().foo()
The grandparent's foo is called, but my __foo is used.
Obviously you wouldn't write code like this yourself, but parent could easily be provided by a third party, the details of which could change at any time.
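The two-file example can be condensed into a single-file sketch for experimenting. This version assumes Python 3 (hence the zero-argument super()), and it rebinds the name Sub within one module to stand in for the two separate modules; the variables stored and outcome are only for illustration.

```python
class Sub:                      # stands in for parent.Sub (the grandparent)
    def __init__(self):
        self.__foo = 12         # mangled to _Sub__foo

    def foo(self):
        return self.__foo + 1   # reads _Sub__foo


class Base(Sub):                # stands in for parent.Base
    pass


class Sub(Base):                # "my" class, which happens to share the name
    def __init__(self):
        super().__init__()
        self.__foo = None       # also mangled to _Sub__foo: overwrites the 12


instance = Sub()
stored = dict(vars(instance))   # only one attribute exists

try:
    instance.foo()
    outcome = "no error"
except TypeError:
    outcome = "TypeError"       # None + 1 in the grandparent's foo()

print(stored, outcome)
```

Because mangling uses only the bare class name, not the module or the position in the hierarchy, both classes write to the same attribute.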

Use private names (instead of protected ones), starting with a double underscore:
class Sub(Base):
    def __init__(self):
        super(Sub, self).__init__()
        self.__foo = None
        #    ^^
will not conflict with _foo or __foo in Base. This is because Python prefixes a double-underscore name with a single underscore and the name of the enclosing class; the following two lines are equivalent:
class Sub(Base):
    def x(self):
        self.__foo = None  # .. is the same as ..
        self._Sub__foo = None
(In response to the edit:) The chance that two classes in a class hierarchy not only have the same name, but that they are both using the same property name, and are both using the private mangled (__) form is so minuscule that it can be safely ignored in practice (I for one haven't heard of a single case so far).
In theory, however, you are correct in that, in order to formally verify the correctness of a program, one must know the entire inheritance chain. Luckily, formal verification usually requires a fixed set of libraries in any case.
This is in the spirit of the Zen of Python, which includes
practicality beats purity.

Name mangling includes the class so your Base.__foo and Sub.__foo will have different names. This was the entire reason for adding the name mangling feature to Python in the first place. One will be _Base__foo, the other _Sub__foo.
Many people prefer to use composition (has-a) instead of inheritance (is-a) for some of these very reasons.
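The point about _Base__foo and _Sub__foo can be checked with a minimal sketch based on the question's first example, switched to double underscores. It assumes Python 3; the variable names stored and result are only for illustration.

```python
class Base:
    def __init__(self):
        self.__foo = 0          # stored as _Base__foo

    def foo(self):
        return self.__foo + 1   # always reads _Base__foo


class Sub(Base):
    def __init__(self):
        super().__init__()
        self.__foo = None       # stored as _Sub__foo, a separate attribute


instance = Sub()
stored = dict(vars(instance))
result = instance.foo()         # Base's own attribute is untouched

print(stored, result)
```

Both attributes coexist on the same instance, so the subclass no longer breaks the base class method.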

This implies that I have to know the entire inheritance chain. . .
Yes, you should know the entire inheritance chain, or the docs for the object you are directly sub-classing should tell you what you need to know.
Subclassing is an advanced feature, and should be treated with care.
A good example of docs specifying what should be overridden in a subclass is the threading.Thread class:
This class represents an activity that is run in a separate thread of control. There are two ways to specify the activity: by passing a callable object to the constructor, or by overriding the run() method in a subclass. No other methods (except for the constructor) should be overridden in a subclass. In other words, only override the __init__() and run() methods of this class.

How often do you modify base classes in inheritance chains to introduce inheritance from a class with the same name as a subclass further down the chain???
Less flippantly, yes, you have to know the code you are working with. You certainly have to know the public names being used, after all. Python being Python, discovering the public names in use by your ancestor classes takes pretty much the same effort as discovering the private ones.
In years of Python programming, I have never found this to be much of an issue in practice. When you're naming instance variables, you should have a pretty good idea whether (a) a name is generic enough that it's likely to be used in other contexts and (b) the class you're writing is likely to be involved in an inheritance hierarchy with other unknown classes. In such cases, you think a bit more carefully about the names you're using; self.value isn't a great idea for an attribute name, and neither is something like Adaptor a great class name.
In contrast, I have run into difficulties with the overuse of double-underscore names a number of times. Python being Python, even "private" names tend to be accessed by code defined outside the class. You might think that it would always be bad practice to let an external function access "private" attributes, but what about things like getattr and hasattr? The invocation of them can be in the class's own code, so the class is still controlling all access to the private attributes, but they still don't work without you doing the name-mangling manually. If Python had actually-enforced private variables you couldn't use functions like those on them at all. These days I tend to reserve double-underscore names for cases when I'm writing something very generic like a decorator, metaclass, or mixin that needs to add a "secret attribute" to the instances of the (unknown) classes it's applied to.
And of course there's the standard dynamic language argument: the reality is that you have to test your code thoroughly to have much justification in making the claim "my software works". Such testing will be very unlikely to miss the bugs caused by accidentally clashing names. If you are not doing that testing, then many more uncaught bugs will be introduced by other means than by accidental name clashes.
In summation, the lack of private variables is just not that big a deal in idiomatic Python code in practice, and the addition of true private variables would cause more frequent problems in other ways IMHO.
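The getattr/hasattr difficulty mentioned above comes from the fact that mangling is applied to identifiers in the source, not to strings. A minimal sketch follows; the Account class and its method names are hypothetical.

```python
class Account:
    def __init__(self):
        self.__balance = 100    # stored as _Account__balance

    def lookup_unmangled(self):
        # The string is not rewritten, so this looks for a literal "__balance".
        return hasattr(self, "__balance")

    def lookup_mangled(self):
        # Spelling out the mangled name by hand is required for string lookups.
        return getattr(self, "_Account__balance", None)


account = Account()
found_unmangled = account.lookup_unmangled()
found_mangled = account.lookup_mangled()
print(found_unmangled, found_mangled)
```

Even inside the class's own methods, string-based access has to use the mangled name, which is one reason to reserve double underscores for cases that really need them.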

Mangling happens with double underscores. Single underscores are more of a "please don't".
You don't need to know all the details of all parent classes (note that deep inheritance is usually best avoided), because you can still dir() and help() and any other form of introspection you can come up with.

As noted, you can use name mangling. However, you can stick with a single underscore (or none!) if you document your code adequately - you should not have so many private variables that this proves to be a problem. Just say if a method relies on a private variable, and add either the variable, or the name of the method to the class docstring to alert users.
Further, if you create unit tests, you should create tests that check invariants on members, and accordingly these should be able to show up such name clashes.
If you really want to have "private" variables, and for whatever reason name-mangling doesn't meet your needs, you can factor your private state into another object:
class Foo(object):
    class Stateholder(object):
        pass

    def __init__(self):
        # The nested class must be reached through self (or Foo)
        self._state = self.Stateholder()
        self._state.private = 1

epydoc hide some class functions?

I have some methods in my class which are only meant to be used by other methods of the class. I've prefixed their names with '_'. Can I hide those functions from epydoc? Is it a good idea?
Should I use '_' or double underscore? To be honest I didn't get the difference after reading about them in some places. Should this naming convention be used only on module/class (instance) functions? Or also variables?

If you want to hide all private methods and private variables, pass option '--no-private' to epydoc.
Note that - for epydoc - a method or variable is private if:
its name starts with an underscore '_' and
its name does not end with an underscore '_' and
you did not include its name in the special __all__ list.
Alternatively you can use the 'undocumented' tag to force epydoc to completely ignore certain methods or variables.
For instance (and here I assume a reStructuredText kind of formatting):
class MyClass:
    """Some neat description

    :undocumented: x
    """
    def _y(self): pass
    def x(self): pass
    def z(self): pass
will result in documentation that contains only _y (unless you used the '--no-private' option) and z. There will be nothing about x, even though it is not private.
Whether private methods should be visible at all or not in the final documentation is a matter of taste. To me, documentation is read by people who don't or should not have any interest in the internal implementation. Private methods are best hidden completely.

What's the difference between _b and b in this place

class a:
    def __init__(self):
        self._b()  # why use _b here, not b? What's the difference?
        self._c = 'cccc'  # why use _c here, not c? What's the difference?

    def _b():
        print('bbbb')
a.py:
class a:
    def __init__(self):
        self._b()  # why use _b here, not b? What's the difference?
        self._c = 'cccc'  # why use _c here, not c? What's the difference?

    def _b(self):
        print('bbbb')
b.py:
from a import *

b = a()
b._b()
print(b._c)
It prints:
bbbb
bbbb
bbbb
bbbb
cccc
Why are these printed? Aren't _b and _c private variables?

Prefixing a variable or function name with an underscore is a convention in Python to indicate that the variable is private. From the docs:
“Private” instance variables that cannot be accessed except from inside an object don’t exist in Python. However, there is a convention that is followed by most Python code: a name prefixed with an underscore (e.g. _spam) should be treated as a non-public part of the API (whether it is a function, a method or a data member). It should be considered an implementation detail and subject to change without notice.

Per PEP 8 (http://www.python.org/dev/peps/pep-0008/), a single underscore is intended to denote a "generally private" method or attribute. The interpreter itself has no interaction with the single underscore on class members; it's more of a convention. The double underscore, OTOH, has significance within the interpreter.
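One narrow case where a leading underscore does have an effect is at module level: PEP 8 notes that from M import * does not import names that start with an underscore. The sketch below simulates this with a throwaway in-memory module so that it runs as a single file; the module name demo_mod and the names inside it are hypothetical.

```python
import sys
import types

# Build a throwaway in-memory module (the name demo_mod is hypothetical).
demo_mod = types.ModuleType("demo_mod")
exec("public_name = 1\n_internal_name = 2", demo_mod.__dict__)
sys.modules["demo_mod"] = demo_mod

# Run a star-import in a fresh namespace and see which names arrive.
namespace = {}
exec("from demo_mod import *", namespace)
imported = sorted(name for name in namespace if name != "__builtins__")

# Explicit attribute access still works; nothing is actually hidden.
still_reachable = demo_mod._internal_name

print(imported)
print(still_reachable)
```

This applies to names defined in a module, not to attributes of a class, which is why b._b() and b._c in the question work after from a import *.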

Basically, the idea is that in Python convention, there are three levels of encapsulation/hiding.
Public attributes and methods: These are meant to be accessed by other classes without restriction.
Private attributes and methods (prefixed with an underscore): These are meant to be hidden from the outside world. The underscore serves as something like a warning to say: you shouldn't be touching this method unless you really know what you're doing. Primarily, this is used to differentiate between end users (the lesser mortals) and other developers.
There is a third level called mangled (prefixed with two underscores). Python rewrites such names to include the class name, and they are basically not to be touched by anyone who is not the author of that code. They are important to the core functioning of the class and should not be touched, because misuse may lead to unwanted or unplanned behavior.
In your case, the '_b' method is a private method. The author of this code wants to let you know that it is not meant for public use; rather, if you are writing a wrapper around this class or something of that nature (basically, if you are a developer using this class), then you may use that method. Otherwise, it is better that you don't.
Also, note that a single underscore does not change how a name is looked up. When self._b is called, the interpreter finds _b in the definition of the class a, exactly as it would for any other attribute.

Difference between single and double underscore:
class Foo(object):
    def regular_method(self):
        print('ok')

    def _soft_private(self):
        print('ok')

    def __mangled_private(self):
        print('ok')

f = Foo()
f.regular_method()
# prints ok
f._soft_private()
# prints ok
try:
    f.__mangled_private()
except AttributeError:
    pass
# error! no such attribute. BUT:
f._Foo__mangled_private()
# prints ok
So in essence there is no real "privacy" in Python, only the convention of using a single underscore to warn developers using your API that they do so at their own risk.

You'll notice that b isn't defined anywhere. While _b is.
Unless you're asking why that naming convention is being used, in which case, you should ask the author of that code.

The underscore is simply used to denote that the variable is private.

The class object a has an attribute _b which is found when asking for self._b. The same class object a has no attribute b.
_b and b are as different as beehive and zulu.
You will get a TypeError when calling self._b() in the first version, because Python implicitly passes the instance as the first argument to a bound method and that _b accepts no arguments. The signature should be:
def _b(self):
    print('bbbb')
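The difference between the two signatures can be seen in a minimal sketch. The class names Broken and Fixed are hypothetical, and the methods return the string instead of printing it so the result is easy to inspect.

```python
class Broken:
    def _b():               # no parameter to receive the instance
        return 'bbbb'


class Fixed:
    def _b(self):           # the instance is received as self
        return 'bbbb'


# Calling through an instance passes that instance implicitly,
# which the zero-parameter version cannot accept.
try:
    Broken()._b()
    error = None
except TypeError as exc:
    error = str(exc)

result = Fixed()._b()
print(error)
print(result)
```

The leading underscore plays no part in the failure; only the missing parameter does.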
