A very experienced engineer has told me to NEVER use double underscores when defining methods and variables inside a class, because they are reserved for magic methods, and to use only a single underscore. I understand that double underscores make attributes private to the class, and a single underscore makes them protected. I also understand that protected attributes are just a mutual understanding between developers. I find it hard to believe that private attributes should never be used; if so, why was that concept created in the first place? So my questions are:
Is it really bad practice to use double underscores even when it makes sense to make attributes non public?
Since protected attributes are "not really protected", wouldn't it make sense to just make them private, because there would be fewer mistakes this way?
Here are some corrections to your statements that will hopefully clarify what you are asking about:
Magic methods and attributes are prefixed and suffixed by a double underscore. A double underscore only in the prefix is specifically to make things private.
In Python, attributes that are only prefixed with a double underscore get their name mangled to make them more private. You will be unable to access them outside the class using the literal name. This can cause issues outside of classes, so do not use a double-underscore prefix for, say, module-level attributes: How to access private variable of Python module from class. However, do use them in classes to make things private. If the feature was not intended to be used, it would not have been added to Python.
As far as privacy and protection goes in general, there is no such concept in Python. It is just an expectation that object oriented programmers have coming in from other languages, so there is an established convention for marking attributes as private.
The single underscore prefix is generally the preferred way to mark things as private because it does not mangle the name, leaving privacy at the discretion of the API's user. This sort of privacy/protection is really more of a way to indicate that the attribute is an implementation detail that may change in future versions. There is nothing stopping you from using the attribute, especially if you are OK with your code breaking when it is linked against different versions of libraries.
Keep in mind that even mangled names follow a fixed pattern for a given version of Python. The mangling is intended more to prevent you from accidentally overriding something you didn't intend to than to make attributes truly private. It just adds the class name with a bunch of underscores to your attribute name, so you can still access it directly if you know how.
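To make the points above concrete, here is a minimal sketch of what the mangling does. The class and attribute names (Account, Reader, __balance, __module_secret) are made up for illustration:

```python
class Account:
    def __init__(self, balance):
        # Inside the class body, __balance is rewritten to _Account__balance.
        self.__balance = balance

    def balance(self):
        return self.__balance


acct = Account(100)

# The literal name does not exist on the instance...
try:
    acct.__balance
except AttributeError:
    literal_name_failed = True
else:
    literal_name_failed = False

# ...but the mangled name is an ordinary attribute anyone can read.
mangled_value = acct._Account__balance

# Module-level pitfall: a double-underscore global referenced inside a
# class body is mangled too, so the lookup fails.
__module_secret = 42


class Reader:
    def read(self):
        return __module_secret  # compiled as _Reader__module_secret


try:
    Reader().read()
except NameError:
    module_lookup_failed = True
else:
    module_lookup_failed = False
```

Both failures come from the same rewrite rule: any identifier of the form __name that appears inside a class body gets the class name prepended, whether it refers to an attribute or to a global.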
Here is a good description of pretty much everything I just wrote from the docs: https://docs.python.org/2/tutorial/classes.html#private-variables-and-class-local-references
PEP8 says:
Use one leading underscore only for non-public methods and instance
variables.
If a class has methods that are intended for usage inside the current package, but not for usage in other packages, how do you name them? (i.e. they are not interface methods, but they are used inside the package)
A leading underscore describes such a method as "private", which is not true. A name without a leading underscore describes it as "free to use for everybody", which is misleading too.
So how to name them? How do you deal with it?
In my projects, this question usually pops up, but I can't find answers on Stack Overflow. How do others avoid it?
Thanks
Reading this question on method ordering, I thought about where to put protected methods and whether they should be private _method(self) or public method(self) in Python. I know that Python doesn't provide a language feature for protected methods.
Private: By convention, attributes starting with an underscore are private. They can still normally be accessed from the outside but should not. Starting protected methods with an underscore feels weird since it is unclear that the subclass actually overrides the method rather than declaring its own implementation detail.
Public: Without the underscore, it is more likely that someone would take a look at the base class to see whether the method is already there. Thus this is nicer for people who subclass. However, people who want to use the subclass don't know that the method is just an implementation detail and might try to call it from the outside.
What is the preferred way to define protected methods in Python?
Just use names starting with a single underscore.
A protected method is an implementation detail that you want to share with subclasses, so such methods are not part of the public API. Anything not part of the public API is best named with an initial underscore.
In other words, 'protected' should be treated just the same as 'private'. Protected methods only need to exist in a language with a strict privacy model where making such implementation details private would preclude sharing such methods with subclasses. Python has no such problem.
Whatever you do, do not use a leading double underscore; such names are considered class private and are namespaced to the class that defines them (they are renamed by the compiler by prefixing _ClassName in front), to ensure that subclasses don't accidentally overwrite them.
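A small sketch of the difference, using made-up classes (Report, SalesReport). The single-underscore hook can be overridden by the subclass, while the double-underscore helper is silently kept separate by the mangling:

```python
class Report:
    def render(self):
        # Public API: calls the hook that subclasses may override.
        return self.__frame(self._body())

    def _body(self):
        # "Protected" by convention: subclasses are expected to override this.
        return "base body"

    def __frame(self, text):
        # Class-private: stored as _Report__frame, so subclasses cannot
        # replace it by accident.
        return "[" + text + "]"


class SalesReport(Report):
    def _body(self):
        # Overrides the base hook as intended.
        return "sales body"

    def __frame(self, text):
        # A different name after mangling (_SalesReport__frame), so
        # Report.render never calls it.
        return "<" + text + ">"


result = SalesReport().render()  # "[sales body]"
```

If __frame had been meant as an overridable "protected" method, the subclass version would never run, which is the kind of surprise the advice above is trying to avoid.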
In regards to naming conventions for class attributes, PEP 8 states:
__double_leading_and_trailing_underscore__: "magic" objects or attributes that live in user-controlled namespaces. E.g. __init__, __import__ or __file__. Never invent such names; only use them as documented.
Does this imply never using this convention, or only using it in a "user-controlled namespace"?
I have seen this used in other libraries not directly included within Python a few times. From what I gather, using this convention is more specific to implementing an API.
A couple of examples would be providing an __acl__ to a class in Pyramid, or adding __tablename__ to a class in SQLAlchemy.
Is using this convention in the context of an API OK, or should it only ever be used / reserved for Python?
Pyramid and SQLAlchemy have disobeyed the instruction not to invent such names.
It's not clear to me what you mean by "using this convention in the context of an API", but once they've invented the name, you don't have much choice but to use it as documented by them.
There's no difference between you inventing such names, and inventing them only in user-controlled namespaces. Since you're a user of Python, any namespace that you could put the name into is user-controlled. If you're modifying the Python source to add a new extension to the language that requires some "magic" name, then you can invent one. I expect if you're doing this, you'll usually be in communication with GvR one way or another, so you can ask his opinion directly :-)
What's happening here is that the library authors want a name that none of their users will use by accident to mean something else. The Python language also wants names that no user will use by accident to mean something else, so it "reserves" names of a particular format for use by the language. But then the library-writer has decided to "steal" one of those reserved names and use it anyway, because they feel that's the best way to avoid a clash with one of their users. They have tens of thousands of users, most of whom they don't know anything about. But there's only one Python language spec, and they have access to it. So if a clash occurs the library developers will know about it, which is the plus side, but it will be their fault and difficult to fix, which is the minus side.
Perhaps they're hoping that by using it, they have de facto reserved it for themselves, and that GvR will choose never to use a name that a popular library has already used. Or perhaps they've discussed it on the relevant mailing lists and obtained an exception to the usual rule -- I don't know whether there's a process for that.
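For illustration only, this is roughly how a library could pick up such a marker from a user's class. The helper table_name_for is hypothetical and is not how SQLAlchemy or Pyramid actually implement it; the sketch only shows why an unusual-looking name is attractive to a library author:

```python
def table_name_for(model_cls):
    # Hypothetical library helper: look for a dunder-style marker first,
    # then fall back to a default derived from the class name.
    return getattr(model_cls, "__tablename__", model_cls.__name__.lower())


class User:
    # The name also ends in two underscores, so it is not name-mangled.
    __tablename__ = "users"


class Order:
    pass


user_table = table_name_for(User)    # "users"
order_table = table_name_for(Order)  # "order"
```

A user is very unlikely to define __tablename__ by accident, which is the benefit; the cost, as described above, is that the name sits in the space PEP 8 asks everyone except the language to leave alone.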
I'd like to create a factory pattern in Python, where one class has some configuration, and knows how to build another class' object (or several classes) on demand. To make this complete, I would like to prevent the created class from being created outside of the factory. In Java, I would put both in the same package, and make the class' constructor package protected.
For regular method names or variables, one can follow the Python convention and use single or double underscores ("_foo" or "__foo"). Is there a way to do something like that for a constructor?
Thank you
You can't. The Python mentality is often summed up as "we're all grown-ups here"; that is, you can't stop people calling methods, changing attributes, instantiating classes, and so on. Instead, you should make an obvious way to construct an instance of your class and then assume that it will be used.
Don't bother, it's not the Python way.
The preferred solution is to simply document which constructor or factory method clients are supposed to call, and not worry too much about public/private (which doesn't mean much in Python anyway; everything is essentially public-in-code.)
The convention in Python is to prefix the name of internal things (members or classes) with an underscore. There is no way to enforce limited access, but the underscore serves as a signal that "you shouldn't be touching this".
From the Python tutorial:
“Private” instance variables that cannot be accessed except from inside an object don’t exist in Python. However, there is a convention that is followed by most Python code: a name prefixed with an underscore (e.g. _spam) should be treated as a non-public part of the API (whether it is a function, a method or a data member). It should be considered an implementation detail and subject to change without notice.
Based on a comment from Wim, one can name the class of the object to be created starting with a single or double underscore. This way it is clear that the constructor is private, and should not be called directly.
One of the really nice things about python is the simplicity with which you can name variables that have the same name as the accessor:
self.__value = 1

def value(self):
    return self.__value
Is there a simple way of providing access to the private members of a class that I wish to subclass? Often I wish to simply work with the raw data objects inside of a class without having to use accessors and mutators all the time.
I know this seems to go against the general idea of private and public, but usually the class I am trying to subclass is one of my own, and I am quite happy to expose its members to a subclass, just not to code that uses an instance of that class. Is there a clean way of providing this distinction?
Not conveniently, without further breaking encapsulation. The double-underscore attribute is name-mangled by prepending '_ClassName' for the class it is being accessed in. So, if you have a 'ContainerThing' class that has a '__value' attribute, the attribute is actually being stored as '_ContainerThing__value'. Changing the class name (or refactoring where the attribute is assigned to) would mean breaking all subclasses that try to access that attribute.
This is exactly why the double-underscore name-mangling (which is not really "private", just "inconvenient") is a bad idea to use. Just use a single leading underscore. Everyone will know not to touch your 'private' attribute and you will still be able to access it in subclasses and other situations where it's darned handy. The name-mangling of double-underscore attributes is useful only to avoid name-clashes for attributes that are truly specific to a particular class, which is extremely rare. It provides no extra 'security' since even the name-mangled attributes are trivially accessible.
For the record, '__value' and 'value' (and '_value') are not the same name. The underscores are part of the name.
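Here is a short sketch of the problem described above, reusing the ContainerThing name from the answer. The subclass SubThing and the _count attribute are made up for the example:

```python
class ContainerThing:
    def __init__(self):
        self.__value = 1   # stored as _ContainerThing__value
        self._count = 2    # stored as _count, no mangling


class SubThing(ContainerThing):
    def read_private(self):
        # Mangled with the *current* class name: _SubThing__value.
        return self.__value

    def read_private_explicitly(self):
        # Works, but breaks if ContainerThing is ever renamed.
        return self._ContainerThing__value

    def read_internal(self):
        # Single underscore: plain attribute, visible to subclasses.
        return self._count


sub = SubThing()
try:
    sub.read_private()
except AttributeError:
    subclass_access_failed = True
else:
    subclass_access_failed = False
```

The single-underscore attribute is the only one the subclass can use without hard-coding the parent's class name.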
"I know this seems to go against the general idea of private and public" Not really "against", just different from C++ and Java.
Private, as implemented in C++ and Java, is not a very useful concept. It helps, sometimes, to isolate implementation details. But it is way overused.
Python names beginning with two underscores (__) are special and you should not, as a normal thing, be defining attributes with names like this. Names with __ are special and part of the implementation, even though they are exposed for your use.
Names beginning with one _ are "private". Sometimes they are concealed, a little. Most of the time, the "consenting adults" rule applies -- don't use them foolishly, they're subject to change without notice.
We put "private" in quotes because it's just an agreement between you and your users. You've marked things with _. Your users (and yourself) should honor that.
Often, we have method function names with a leading _ to indicate that we consider them to be "private" and subject to change without notice.
The endless getters and setters that Java requires aren't used as often in Python. Python introspection is more flexible: you have access to an object's internal dictionary of attribute values, and you have built-in functions like getattr() and setattr().
Further, you have the property() function, which is often used to bind getters and setters to a single name that behaves like a simple attribute but is actually backed by well-defined method calls.
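As a rough sketch of those tools together, here is a made-up Thermostat class that uses the decorator form of property() and is then poked at with getattr(), setattr() and vars():

```python
class Thermostat:
    def __init__(self, celsius):
        self._celsius = celsius

    @property
    def celsius(self):
        # Getter: reads like a plain attribute from the outside.
        return self._celsius

    @celsius.setter
    def celsius(self, value):
        # Setter: validation runs on ordinary assignment.
        if value < -273.15:
            raise ValueError("below absolute zero")
        self._celsius = value


t = Thermostat(20.0)
t.celsius = 25.0                      # goes through the setter
via_getattr = getattr(t, "celsius")   # same lookup, by name
setattr(t, "celsius", 30.0)           # also goes through the setter
internal_state = vars(t)              # the instance dictionary
```

Callers write t.celsius as if it were a plain attribute, so a class can start with a simple attribute and add the property later without changing its public interface.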
Not sure of where to cite it from, but the following statement in regard to access protection is Pythonic canon: "We're all consenting adults here".
Just as Thomas Wouters has stated, a single leading underscore is the idiomatic way of marking an attribute as being a part of the object's internal state. Two underscores just provide name mangling to prevent easy access to the attribute.
After that, you should just expect that the client of your library won't go and shoot themselves in the foot by meddling with the "private" attributes.