Referring to Python PEP8:
__double_leading_and_trailing_underscore__ : "magic" objects or attributes that live in user-controlled namespaces. E.g. __init__,
__import__ or __file__. Never invent such names; only use them as documented.
I browsed through many questions related to the use of underscores in Python and I think I have understood the answers to most of them (things like private attributes, name mangling, etc.). I think I have also understood the aforementioned use of double leading and trailing underscores. I guess it's for protecting functions like __init__ which are similar to constructors in languages like C++ and Java.
But then shouldn't it be called community-controlled namespaces (by community I mean the Python community)? What does the author mean when he says user-controlled namespaces? In fact it seems the intent is the opposite: users should not (normally) trifle with these namespaces.
User-controlled namespaces means namespaces where a user, programming in Python, controls what names exist and what values they have. In other words, basically user-created APIs. What it means is that you shouldn't design an API that relies on new __doubleunderscore_names__ that you make up.
"Namespace" here does not refer to the naming convention but to the actual programming scope. For instance, each function has a local namespace for its local variables; a module has a global namespace for its global variables; etc. Users absolutely will use these namespaces -- you will create your own variables, classes, functions, etc.. What it's saying is that you shouldn't make up new magic-looking names and put them in your namespaces.
User-controlled namespaces are namespaces like global variables or object attributes. A Python programmer can put whatever names he or she so chooses into those namespaces; community disapproval can't stop it. Double-dunder names like __init__ and __file__ live in those namespaces along with ordinary names defined by programmers. The PEP 8 recommendation is that users not create non-standard names that look like the standard magic names.
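To make that concrete, here is a minimal sketch (the class `Point` and its attribute names are made up for illustration) showing that the documented magic names and the names you choose yourself sit side by side in the same namespaces:

```python
class Point:
    """Hypothetical class used only for illustration."""

    def __init__(self, x, y):  # documented magic name: fine to define
        self.x = x             # ordinary names chosen by the programmer
        self.y = y


p = Point(1, 2)

# The instance namespace holds exactly the names we put there.
instance_names = sorted(vars(p))
print(instance_names)  # ['x', 'y']

# The class namespace holds our __init__ next to names Python added itself.
class_names = set(vars(Point))
print("__init__" in class_names)  # True
print("__doc__" in class_names)   # True
```

Nothing in the language stops you from adding `__my_own_magic__` to either namespace; PEP 8 is only asking you not to.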
A very experienced engineer has told me to NEVER use double underscores when defining methods and variables inside a class, because they are reserved for magic methods, and to use only a single underscore. I understand that double underscores make attributes private to the class, and a single underscore makes them protected. I also understand that protected attributes are just a mutual understanding between developers. I find it hard to believe that private attributes should never be used; if so, why was the concept created in the first place? So my questions are:
Is it really bad practice to use double underscores even when it makes sense to make attributes non public?
Since protected attributes are "not really protected", wouldn't it make sense to just make them private, because fewer mistakes would be made this way?
Here are some corrections to your statements that will hopefully clarify what you are asking about:
Magic methods and attributes are prefixed and suffixed by a double underscore. A double underscore only in the prefix is specifically to make things private.
In both Python 2 and Python 3, attributes that are only prefixed with a double underscore (and do not also end with one) get their name mangled inside a class to make them more private. You will be unable to access them outside the class using the literal name. This can cause issues outside of classes, so do not use a double-underscore prefix for, say, module-level attributes: see "How to access private variable of Python module from class". However, do use them in classes to make things private. If the feature was not intended to be used, it would not have been added to Python.
As far as privacy and protection goes in general, there is no such concept in Python. It is just an expectation that object oriented programmers have coming in from other languages, so there is an established convention for marking attributes as private.
The single underscore prefix is generally the preferred way to mark things as private because it does not mangle the name, leaving privacy at the discretion of the API's user. This sort of privacy/protection is really more of a way to indicate that the attribute is an implementation detail that may change in future versions. There is nothing stopping you from using the attribute, especially if you are OK with your code breaking when it is linked against different versions of libraries.
Keep in mind that even mangled names follow a fixed pattern for a given version of Python. The mangling is intended more to prevent you from accidentally overriding something you didn't intend to than to make attributes truly private. It just prepends an underscore and the class name to your attribute name (so __attr defined in class Foo is stored as _Foo__attr), which means you can still access it directly if you know how.
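A small sketch of the difference between the two prefixes; the class `Account` and its attributes are hypothetical and exist only to show what ends up stored on the instance:

```python
class Account:
    """Hypothetical class used only for illustration."""

    def __init__(self):
        self._balance = 10   # single underscore: convention only, no mangling
        self.__pin = 1234    # double underscore: stored as _Account__pin


acct = Account()

# The instance dictionary shows the names that are really stored.
stored_names = sorted(vars(acct))
print(stored_names)  # ['_Account__pin', '_balance']

# The literal name does not exist when looked up from outside the class...
literal_name_found = hasattr(acct, "__pin")
print(literal_name_found)  # False

# ...but the mangled name is an ordinary attribute anyone can read.
print(acct._Account__pin)  # 1234
```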
Here is a good description of pretty much everything I just wrote from the docs: https://docs.python.org/2/tutorial/classes.html#private-variables-and-class-local-references
I have heard a lot about namespaces as a programming language feature, so I cannot help wondering:
Does Python have a so-called namespace feature?
Quote from Python documentation:
A namespace is a mapping from names to objects. Most namespaces are currently implemented as Python dictionaries, but that’s normally not noticeable in any way (except for performance), and it may change in the future. Examples of namespaces are: the set of built-in names (containing functions such as abs(), and built-in exception names); the global names in a module; and the local names in a function invocation. In a sense the set of attributes of an object also form a namespace. The important thing to know about namespaces is that there is absolutely no relation between names in different namespaces; for instance, two different modules may both define a function maximize without confusion — users of the modules must prefix it with the module name.
For a good overview, see the "Python Scopes and Namespaces" section of the Python tutorial at https://docs.python.org/2/tutorial/classes.html#python-scopes-and-namespaces.
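As a rough illustration of the quoted passage, the sketch below builds two stand-in modules in memory (the names `stats` and `layout` are invented here) that both define `maximize`, and then peeks at a function's local namespace:

```python
import types

# Two stand-in "modules" (hypothetical names) that each define maximize.
stats = types.ModuleType("stats")
layout = types.ModuleType("layout")

stats.maximize = lambda values: max(values)
layout.maximize = lambda window: "{} maximized".format(window)

# Same name, different namespaces, different objects: no confusion.
print(stats.maximize([3, 7, 5]))   # 7
print(layout.maximize("editor"))   # editor maximized

# A module's namespace is an ordinary dictionary under the hood.
print(vars(stats)["maximize"] is stats.maximize)  # True


def area(width, height):
    result = width * height
    # Snapshot of the local namespace of this particular call.
    return result, dict(locals())


value, local_names = area(2, 3)
print(sorted(local_names))  # ['height', 'result', 'width']
```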
In regards to naming conventions for class attributes, PEP 8 states:
__double_leading_and_trailing_underscore__: "magic" objects or attributes that live in user-controlled namespaces. E.g. __init__, __import__ or __file__. Never invent such names; only use them as documented.
Does this imply that one should never use this convention, or that it should only be used in a "user-controlled namespace"?
I have seen this used in other libraries not directly included within Python a few times. From what I gather, using this convention is more specific to implementing an API.
A couple of examples would be providing an __acl__ to a class in Pyramid, or adding __tablename__ to a class in SQLAlchemy.
Is using this convention in the context of an API OK, or should it only ever be used / reserved for Python?
Pyramid and SQLAlchemy have disobeyed the instruction not to invent such names.
It's not clear to me what you mean by "using this convention in the context of an API", but once they've invented the name, you don't have much choice but to use it as documented by them.
There's no difference between you inventing such names, and inventing them only in user-controlled namespaces. Since you're a user of Python, any namespace that you could put the name into is user-controlled. If you're modifying the Python source to add a new extension to the language that requires some "magic" name, then you can invent one. I expect if you're doing this, you'll usually be in communication with GvR one way or another, so you can ask his opinion directly :-)
What's happening here is that the library authors want a name that none of their users will use by accident to mean something else. The Python language also wants names that no user will use by accident to mean something else, so it "reserves" names of a particular format for use by the language. But then the library-writer has decided to "steal" one of those reserved names and use it anyway, because they feel that's the best way to avoid a clash with one of their users. They have tens of thousands of users, most of whom they don't know anything about. But there's only one Python language spec, and they have access to it. So if a clash occurs the library developers will know about it, which is the plus side, but it will be their fault and difficult to fix, which is the minus side.
Perhaps they're hoping that by using it, they have de facto reserved it for themselves, and that GvR will choose never to use a name that a popular library has already used. Or perhaps they've discussed it on the relevant mailing lists and obtained an exception to the usual rule -- I don't know whether there's a process for that.
In cherryPy for example, there are files like:
__init__.py
_cptools.py
How are they different? What does this mean?
__...__ means a reserved Python name (both in filenames and in other names). You shouldn't invent your own names using the double-underscore notation; and if you use the existing ones, they have special functionality.
In this particular example, __init__.py defines the 'main' unit for a package; it also causes Python to treat the specific directory as a package. It is the unit that will be used when you call import cherryPy (and cherryPy is a directory). This is briefly explained in the Modules tutorial.
Another example is the __eq__ method which provides equality comparison for a class. You are allowed to call those methods directly (and you use them implicitly when you use the == operator, for example); however, newer Python versions may define more such methods and thus you shouldn't invent your own __-names because they might then collide. You can find quite a detailed list of such methods in Data model docs.
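For instance, a minimal sketch of a documented magic method in use; the class `Version` is hypothetical, and only `__init__` and `__eq__` are names defined by Python's data model:

```python
class Version:
    """Hypothetical class used only for illustration."""

    def __init__(self, major, minor):
        self.major = major
        self.minor = minor

    def __eq__(self, other):
        # Returning NotImplemented lets Python try the other operand.
        if not isinstance(other, Version):
            return NotImplemented
        return (self.major, self.minor) == (other.major, other.minor)


a = Version(1, 2)
b = Version(1, 2)

print(a == b)        # True: the == operator calls __eq__ implicitly
print(a.__eq__(b))   # True: calling it directly is allowed too
print(a == "1.2")    # False: neither side knows how to compare, so identity is used
```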
_... is often used as an 'internal' name. For example, modules starting with _ shouldn't be used directly; similarly, methods starting with _ are supposedly private, and so on. It's just a convention, but you should respect it.
These, and other, naming conventions are described in detail in Style Guide for Python Code - Descriptive: Naming Styles
Briefly:
__double_leading_and_trailing_underscore__: "magic" objects or attributes that live in user-controlled namespaces. E.g. __init__, __import__ or __file__. Never invent such names; only use them as documented.
_single_leading_underscore: weak "internal use" indicator. E.g. from M import * does not import objects whose name starts with an underscore.
__init__.py is a special file that, when it exists in a folder, turns that folder into a package (an importable module). Upon importing the package, __init__.py gets executed. The other one is just a naming convention, but I would guess it says that you shouldn't import that file directly.
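A rough sketch of that behaviour, assuming a temporary directory and a made-up package name `demo_pkg` (nothing here comes from CherryPy itself):

```python
import importlib
import os
import sys
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Create root/demo_pkg/__init__.py with one assignment in it.
    pkg_dir = os.path.join(root, "demo_pkg")
    os.mkdir(pkg_dir)
    with open(os.path.join(pkg_dir, "__init__.py"), "w") as handle:
        handle.write("loaded = True\n")

    # Make the temporary directory importable just long enough to import.
    sys.path.insert(0, root)
    importlib.invalidate_caches()
    try:
        demo_pkg = importlib.import_module("demo_pkg")
    finally:
        sys.path.remove(root)

# The code in __init__.py ran at import time and filled the package namespace.
print(demo_pkg.loaded)                # True
# Packages carry a __path__ attribute; plain modules do not.
print(hasattr(demo_pkg, "__path__"))  # True
```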
Take a look here: 6.4. Packages for an explanation of how to create modules.
General rule: If anything in Python is named __anything__ then it is something special and you should read about it before using it (e.g. magic functions).
The current chosen answer already gave good explanation on the double-underscore notation for __init__.py.
And I believe there is no real need for the _cptools.py notation in a filename. It is presumably an unnecessary extension of the "single leading underscore" rule from the Style Guide for Python Code - Descriptive: Naming Styles:
_single_leading_underscore: weak "internal use" indicator. E.g. from M import * does not import objects whose name starts with an underscore.
If anything, the said Style Guide actually is against using _single_leading_underscore.py in filename. Its Package and Module Names section only mentions such usage when a module is implemented in C/C++.
In general, that _single_leading_underscore notation is typically observed in function names, method names and member variables, to differentiate them from other normal methods.
There is little need (if any at all) to use _single_leading_underscore.py in a filename, because developers are not scrapers; they are unlikely to pick up a file based on its filename. They would just follow a package's highest-level API (technically speaking, its exposed entities defined by __all__), so the filenames are not even noticeable, let alone a factor in whether a file (i.e. module) gets used.
One of the really nice things about Python is the simplicity with which you can name variables that have the same name as the accessor:
    class Example(object):
        def __init__(self):
            self.__value = 1

        def value(self):
            return self.__value
Is there a simple way of providing access to the private members of a class that I wish to subclass? Often I wish to simply work with the raw data objects inside of a class without having to use accessors and mutators all the time.
I know this seems to go against the general idea of private and public, but usually the class I am trying to subclass is one of my own which I am quite happy to expose the members from to a subclass but not to an instance of that class. Is there a clean way of providing this distinction?
Not conveniently, without further breaking encapsulation. The double-underscore attribute is name-mangled by prepending '_ClassName' for the class it is being accessed in. So, if you have a 'ContainerThing' class that has a '__value' attribute, the attribute is actually being stored as '_ContainerThing__value'. Changing the class name (or refactoring where the attribute is assigned to) would mean breaking all subclasses that try to access that attribute.
This is exactly why the double-underscore name-mangling (which is not really "private", just "inconvenient") is a bad idea to use. Just use a single leading underscore. Everyone will know not to touch your 'private' attribute and you will still be able to access it in subclasses and other situations where it's darned handy. The name-mangling of double-underscore attributes is useful only to avoid name-clashes for attributes that are truly specific to a particular class, which is extremely rare. It provides no extra 'security' since even the name-mangled attributes are trivially accessible.
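A minimal sketch of the problem, using hypothetical `Base` and `Child` classes: the subclass cannot reach the double-underscore attribute by its literal name, while the single-underscore one just works.

```python
class Base:
    def __init__(self):
        self.__value = 1   # stored as _Base__value
        self._shared = 2   # convention-only "internal" attribute


class Child(Base):
    def read_shared(self):
        return self._shared          # works: no mangling involved

    def read_value(self):
        return self.__value          # compiled as self._Child__value

    def read_value_by_mangled_name(self):
        return self._Base__value     # works, but hard-codes the parent's name


child = Child()
print(child.read_shared())  # 2

error_message = ""
try:
    child.read_value()
except AttributeError as exc:
    # The lookup was for _Child__value, which was never set.
    error_message = str(exc)
print(error_message)

print(child.read_value_by_mangled_name())  # 1
```

Renaming `Base` would silently break `read_value_by_mangled_name`, which is the fragility described above.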
For the record, '__value' and 'value' (and '_value') are not the same name. The underscores are part of the name.
"I know this seems to go against the general idea of private and public" Not really "against", just different from C++ and Java.
Private -- as implemented in C++ and Java is not a very useful concept. It helps, sometimes, to isolate implementation details. But it is way overused.
Python names beginning with two __ are special and you should not, as a normal thing, be defining attributes with names like this. Names of the form __name__ are part of the implementation, and exposed for your use; names with only a leading __ get name-mangled inside a class.
Names beginning with one _ are "private". Sometimes they are concealed, a little. Most of the time, the "consenting adults" rule applies -- don't use them foolishly, they're subject to change without notice.
We put "private" in quotes because it's just an agreement between you and your users. You've marked things with _. Your users (and yourself) should honor that.
Often, we have method function names with a leading _ to indicate that we consider them to be "private" and subject to change without notice.
The endless getters and setters that Java requires aren't used as often in Python. Python introspection is more flexible: you have access to an object's internal dictionary of attribute values, and you have built-in functions like getattr() and setattr().
Further, you have the property() function which is often used to bind getters and setters to a single name that behaves like a simple attribute, but is actually well-defined method function calls.
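A short sketch of both ideas, using a hypothetical `Thermometer` class: the value lives in a single-underscore attribute, and a property exposes it under a plain attribute name while still running method code.

```python
class Thermometer:
    """Hypothetical class used only for illustration."""

    def __init__(self, celsius):
        self._celsius = celsius  # "private" by convention

    @property
    def celsius(self):
        # Getter: runs on plain attribute access.
        return self._celsius

    @celsius.setter
    def celsius(self, value):
        # Setter: runs on plain assignment and can validate.
        if value < -273.15:
            raise ValueError("temperature below absolute zero")
        self._celsius = value


t = Thermometer(20.0)
t.celsius = 25.0                 # looks like an attribute, calls the setter
print(t.celsius)                 # 25.0

# getattr()/setattr() go through the same property machinery.
setattr(t, "celsius", 30.0)
reading = getattr(t, "celsius")
print(reading)                   # 30.0

# The internal dictionary holds only the underlying attribute.
print(vars(t))                   # {'_celsius': 30.0}
```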
Not sure of where to cite it from, but the following statement in regard to access protection is Pythonic canon: "We're all consenting adults here".
Just as Thomas Wouters has stated, a single leading underscore is the idiomatic way of marking an attribute as being a part of the object's internal state. Two underscores just provide name mangling, which prevents easy access to the attribute.
After that, you should just expect that the client of your library won't go and shoot themselves in the foot by meddling with the "private" attributes.