I'd like to create a factory pattern in Python, where one class has some configuration and knows how to build objects of another class (or several classes) on demand. To make this complete, I would like to prevent the created class from being instantiated outside of the factory. In Java, I would put both in the same package and make the class's constructor package-private.
For regular method names or variables, one can follow the Python convention and use single or double underscores ("_foo" or "__foo"). Is there a way to do something like that for a constructor?
Thank you
You can't. The Python mentality is often summed up as "we're all grown-ups here"; that is, you can't stop people calling methods, changing attributes, instantiating classes, and so on. Instead, you should make an obvious way to construct an instance of your class and then assume that it will be used.
Don't bother, it's not the Python way.
The preferred solution is to simply document which constructor or factory method clients are supposed to call, and not worry too much about public/private (which doesn't mean much in Python anyway; everything is essentially public-in-code.)
The convention in Python is to prefix the name of internal things (members or classes) with an underscore. There is no way to enforce limited access, but the underscore serves as a signal that "you shouldn't be touching this".
From the python tutorial:
“Private” instance variables that cannot be accessed except from inside an object don’t exist in Python. However, there is a convention that is followed by most Python code: a name prefixed with an underscore (e.g. _spam) should be treated as a non-public part of the API (whether it is a function, a method or a data member). It should be considered an implementation detail and subject to change without notice.
Based on a comment from Wim, one can name the class of the object to be created starting with a single or double underscore. This way it is clear that the constructor is private, and should not be called directly.
Related
I have a class A that needs to implement a method meth().
Now, I don't want this method to be called by the end-user of my package. Thus, I have to make this method private (i.e. _meth(); I know that it's not really private, but conventions matter).
The problem, though, is that I have yet another class B in my package that has to call that method _meth(). I now get a warning saying that B tries to access a protected method of a class. Thus, I have to make the method public, i.e. without the leading underscore. This contradicts my intentions.
What is the most pythonic way to solve this dilemma?
I know I can re-implement that method outside of A, but it will lead to code duplication and, as meth() uses private attributes of A, will lead to the same problem.
Inheriting from a single metaclass is not an option as those classes have entirely different purposes and that will be contributing towards a ghastly mess.
The fact that pylint/your editor/whatever external tool gives you a warning doesn't prevent code execution. I don't know about your editor but pylint warnings can be disabled on a case-by-case basis using special comments (nb: "case by case" meaning: "do not warn me for this line or block", not "totally disable this warning").
And it's perfectly ok for your own code to access protected attributes and methods in the same package - the "_protected" naming convention does not mean "None shall pass", just "are you sure you understand what you're doing and are willing to take responsibility if you break something?". Since you're the author/maintainer of the package and this is intra-package access, you are obviously entitled to take this responsibility ;)
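For pylint specifically, the per-line comment could look like the sketch below. The class bodies are made up for illustration; `protected-access` is the pylint message name for this warning, and other linters or editors use their own suppression syntax.

```python
class A:
    def __init__(self):
        self._items = []

    def _meth(self, item):
        """Internal helper: not part of the public API."""
        self._items.append(item)
        return len(self._items)


class B:
    """Lives in the same package as A, so it may use A's internals."""

    def __init__(self, a):
        self._a = a

    def record(self, item):
        # Deliberate intra-package access; silence the warning for this line only.
        return self._a._meth(item)  # pylint: disable=protected-access


b = B(A())
print(b.record("x"))
```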
The "most pythonic way" would be to not care about private and protected, as these concepts do not exist in Python.
Everything is public. Adding an underscore to the name does not make it private; it just indicates that the method is for internal use in the class (it does not prevent usage by some end-user).
If you need to use the method from another class, it shows that you're not using classes and objects correctly, and you probably come from a different language like Java where classes are used to group methods together in some namespace.
Just move the function to the module level (outside the class), as you're not using the object (self) anyway.
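A rough sketch of the module-level approach, with a hypothetical helper `_normalise` standing in for meth(). It only fits when the logic does not need the instance's state; if meth() really reads private attributes of A, as the question says, this does not apply as-is.

```python
def _normalise(text):
    """Module-level helper: internal to the module, usable by any class in it."""
    return text.strip().lower()


class A:
    def __init__(self, name):
        self.name = _normalise(name)


class B:
    def matches(self, a, candidate):
        # B calls the shared helper directly instead of reaching into A.
        return a.name == _normalise(candidate)


a = A("  Alice ")
print(B().matches(a, "ALICE"))
```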
Reading this question on method ordering, I thought about where to put protected methods and whether they should be private _method(self) or public method(self) in Python. I know that Python doesn't provide a language feature for protected methods.
Private: By convention, attributes starting with an underscore are private. They can still normally be accessed from the outside but should not. Starting protected methods with an underscore feels weird since it is unclear that the subclass actually overrides the method rather than declaring its own implementation detail.
Public: Without the underscore, it is more likely that someone would take a look at the base class to see whether the method is already there. Thus this is nicer for people who subclass. However, people who want to use the subclass don't know that the method is just an implementation detail and might try to call it from the outside.
What is the preferred way to define protected methods in Python?
Just use names starting with a single underscore.
A protected method is an implementation detail that you want to share with subclasses, so such methods are not part of the public API. Anything not part of the public API is best named with an initial underscore.
In other words, 'protected' should be treated just the same as 'private'. Protected methods only need to exist in a language with a strict privacy model where making such implementation details private would preclude sharing such methods with subclasses. Python has no such problem.
Whatever you do, do not use a leading double underscore; such names are considered class private and are namespaced to the class that defines them (they are renamed by the compiler by prefixing _ClassName in front), to ensure that subclasses don't accidentally overwrite them.
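To illustrate the difference, here is a small sketch with made-up class and method names. The single-underscore method is overridden as a subclass author would expect; the double-underscore one is not, because each class gets its own mangled name.

```python
class Base:
    def run(self):
        # self.__helper is compiled as self._Base__helper inside Base.
        return self._step() + "/" + self.__helper()

    def _step(self):       # "protected": subclasses may override it
        return "base step"

    def __helper(self):    # stored as _Base__helper
        return "base helper"


class Child(Base):
    def _step(self):       # overrides Base._step as intended
        return "child step"

    def __helper(self):    # stored as _Child__helper; does NOT override
        return "child helper"


print(Child().run())
```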
My IDE keeps suggesting I convert my instance methods to static methods. I guess because I haven't referenced any self within these methods.
An example is :
class NotificationViewSet(NSViewSet):
    def pre_create_processing(self, request, obj):
        log.debug(" creating messages ")
        # Ensure data is consistent and belongs to the sending bot.
        obj['user_id'] = request.auth.owner.id
        obj['bot_id'] = request.auth.id
So my question would be: do I lose anything by just ignoring the IDE suggestions, or is there more to it?
This is a matter of workflow, intentions with your design, and also a somewhat subjective decision.
First of all, you are right, your IDE suggests converting the method to a static method because the method does not use the instance. It is most likely a good idea to follow this suggestion, but you might have a few reasons to ignore it.
Possible reasons to ignore it:
The code is soon to be changed to use the instance (on the other hand, the idea of soon is subjective, so be careful)
The code is legacy and not entirely understood/known
The interface is used in a polymorphic/duck typed way (e.g. you have a collection of objects with this method and you want to call them in a uniform way, but the implementation in this class happens to not need to use the instance - which is a bit of a code smell)
The interface is specified externally and cannot be changed (this is analogous to the previous reason)
The AST of the code is read/manipulated either by itself or something that uses it and expects this method to be an instance method (this again is an external dependency on the interface)
I'm sure there can be more, but failing these types of reasons I would follow the suggestion. However, if the method does not belong to the class (e.g. factory method or something similar), I would refactor it to not be part of the class.
I think that you might be mixing up some terminology - the example is not a class method. Class methods receive the class as the first argument, they do not receive the instance. In this case you have a normal instance method that is not using its instance.
If the method does not belong in the class, you can move it out of the class and make it a standard function. Otherwise, if it should be bundled as part of the class, e.g. it's a factory function, then you should probably make it a static method, as this (at a minimum) serves as useful documentation to users of your class that the method is coupled to the class but not dependent on its state.
Making the method static also has the advantage that it can be overridden in subclasses of the class. If the method were moved outside of the class as a regular function, overriding it in a subclass would not be possible.
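A short sketch of the overriding point, using hypothetical `Parser` classes. The static method is looked up through `self`, so a subclass can replace it even though it never touches instance state.

```python
class Parser:
    @staticmethod
    def clean(text):
        return text.strip()

    def parse(self, text):
        # Looked up through self, so a subclass override is picked up.
        return self.clean(text).split(",")


class LowercaseParser(Parser):
    @staticmethod
    def clean(text):
        return text.strip().lower()


print(Parser().parse(" A,B "))
print(LowercaseParser().parse(" A,B "))
```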
I know most of the ins and outs of Python's approach to private variables/members/functions/...
However, I can't make my mind up on how to distinguish between methods for external use or subclassing use.
Consider the following example:
class EventMixin(object):
    def subscribe(self, **kwargs):
        '''kwargs should be a dict of event -> callable, to be specialized in the subclass'''
    def event(self, name, *args, **kwargs):
        ...
    def _somePrivateMethod(self):
        ...
In this example, I want to make it clear that subscribe is a method to be used by external users of the class/object, while event is a method that should not be called from the outside, but rather by subclass implementations.
Right now, I consider both part of the public API, hence don't use any underscores. However, for this particular situation, it would feel cleaner to, for example, use no underscores for the external API, one underscore for the subclassable API, and two underscores for the private/internal API. However, that would become unwieldy because then the internal API would need to be invoked as
self._EventMixin__somePrivateMethod()
So, what are your conventions, coding-wise, documentation-wise, or otherwise?
use no underscores for the external API,
one underscore for the subclassable API,
and two underscores for the private/internal API
This is a reasonable and relatively common way of doing it, yes. The double-underline-for-actually-private (as opposed to ‘protected’ in C++ terms) is in practice pretty rare. You never really know what behaviours a subclass might want to override, so assuming ‘protected’ is generally a good bet unless there's a really good reason why messing with a member might be particularly dangerous.
However, that would become unwieldy because then the internal API would
need to be invoked as self._EventMixin__somePrivateMethod()
Nope, you can just use the double-underlined version and it will be munged automatically. It's ugly but it works.
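A small sketch of what that looks like; the method bodies and the `Button` subclass are invented for illustration. Inside the class body the short name is enough, and the mangled form is only needed from outside the class.

```python
class EventMixin(object):
    def event(self, name):
        # Written with the short name; the compiler rewrites it to
        # self._EventMixin__somePrivateMethod(name).
        return self.__somePrivateMethod(name)

    def __somePrivateMethod(self, name):
        return "handled " + name


class Button(EventMixin):
    pass


print(Button().event("click"))
```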
I generally find using a double __ to be more trouble than it is worth, as it makes unit testing very painful. Using a single _ as the convention for methods/attributes that are not intended to be part of the public interface of a particular class/module is my preferred approach.
I'd like to make the suggestion that when you find yourself encountering this kind of distinction, it may be a good idea to consider using composition instead of inheritance; in other words, instantiating EventMixin (presumably the name would change) instead of inheriting it.
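One way the composition idea might look, as a sketch only: `EventDispatcher` and `Button` are hypothetical names, and the handler bookkeeping is an assumption about what the mixin does. The owning class decides which parts of the dispatcher to re-expose, so `event` never becomes part of its public surface.

```python
class EventDispatcher:
    def __init__(self):
        self._handlers = {}

    def subscribe(self, **kwargs):
        # kwargs maps event name -> callable, as in the question's docstring.
        for name, handler in kwargs.items():
            self._handlers.setdefault(name, []).append(handler)

    def event(self, name, *args, **kwargs):
        return [h(*args, **kwargs) for h in self._handlers.get(name, [])]


class Button:
    def __init__(self):
        self._events = EventDispatcher()  # owned, not inherited

    def subscribe(self, **kwargs):
        # Only subscribe is re-exposed; event() stays internal to Button.
        self._events.subscribe(**kwargs)

    def click(self):
        return self._events.event("click", self)


button = Button()
button.subscribe(click=lambda source: "clicked")
print(button.click())
```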
One of the really nice things about python is the simplicity with which you can name variables that have the same name as the accessor:
self.__value = 1

def value(self):
    return self.__value
Is there a simple way of providing access to the private members of a class that I wish to subclass? Often I wish to simply work with the raw data objects inside of a class without having to use accessors and mutators all the time.
I know this seems to go against the general idea of private and public, but usually the class I am trying to subclass is one of my own which I am quite happy to expose the members from to a subclass but not to an instance of that class. Is there a clean way of providing this distinction?
Not conveniently, without further breaking encapsulation. The double-underscore attribute is name-mangled by prepending '_ClassName' for the class it is being accessed in. So, if you have a 'ContainerThing' class that has a '__value' attribute, the attribute is actually being stored as '_ContainerThing__value'. Changing the class name (or refactoring where the attribute is assigned to) would mean breaking all subclasses that try to access that attribute.
This is exactly why the double-underscore name-mangling (which is not really "private", just "inconvenient") is a bad idea to use. Just use a single leading underscore. Everyone will know not to touch your 'private' attribute and you will still be able to access it in subclasses and other situations where it's darned handy. The name-mangling of double-underscore attributes is useful only to avoid name-clashes for attributes that are truly specific to a particular class, which is extremely rare. It provides no extra 'security' since even the name-mangled attributes are trivially accessible.
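A quick sketch of both behaviours, reusing the 'ContainerThing' name from above; `SubThing` and `_count` are made up for the example. The double-underscore attribute is unreachable by its short name from the subclass, while the single-underscore one just works.

```python
class ContainerThing:
    def __init__(self):
        self.__value = 1   # stored as _ContainerThing__value
        self._count = 2    # stored under its own name


class SubThing(ContainerThing):
    def read_mangled(self):
        return self.__value  # compiled as self._SubThing__value

    def read_single(self):
        return self._count


thing = SubThing()
print(sorted(vars(thing)))
print(thing.read_single())
try:
    thing.read_mangled()
except AttributeError as exc:
    print("AttributeError:", exc)
# Still trivially accessible if you spell out the mangled name.
print(thing._ContainerThing__value)
```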
For the record, '__value' and 'value' (and '_value') are not the same name. The underscores are part of the name.
"I know this seems to go against the general idea of private and public" Not really "against", just different from C++ and Java.
Private -- as implemented in C++ and Java is not a very useful concept. It helps, sometimes, to isolate implementation details. But it is way overused.
Python names that begin and end with two underscores (such as __init__) are special, and you should not, as a normal thing, be defining attributes with names like this. They are part of the implementation, and are exposed for your use.
Names beginning with one _ are "private". Sometimes they are concealed, a little. Most of the time, the "consenting adults" rule applies -- don't use them foolishly, they're subject to change without notice.
We put "private" in quotes because it's just an agreement between you and your users. You've marked things with _. Your users (and yourself) should honor that.
Often, we have method function names with a leading _ to indicate that we consider them to be "private" and subject to change without notice.
The endless getters and setters that Java requires aren't as often used in Python. Python introspection is more flexible, you have access to an object's internal dictionary of attribute values, and you have first class functions like getattr() and setattr().
Further, you have the property() function which is often used to bind getters and setters to a single name that behaves like a simple attribute, but is actually well-defined method function calls.
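For instance, a minimal sketch with a hypothetical `Temperature` class: the attribute is stored under a single-underscore name, and `property()` binds a getter and a setter to the public name `celsius`.

```python
class Temperature:
    def __init__(self, celsius):
        self._celsius = celsius

    def _get_celsius(self):
        return self._celsius

    def _set_celsius(self, value):
        if value < -273.15:
            raise ValueError("below absolute zero")
        self._celsius = value

    # Reads and writes of .celsius go through the two functions above.
    celsius = property(_get_celsius, _set_celsius)


t = Temperature(20)
t.celsius = 25
print(t.celsius)
```

Callers keep writing plain attribute access, so validation can be added later without changing the class's interface.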
Not sure of where to cite it from, but the following statement in regard to access protection is Pythonic canon: "We're all consenting adults here".
Just as Thomas Wouters has stated, a single leading underscore is the idiomatic way of marking an attribute as being part of the object's internal state. Two underscores just provide name mangling to prevent easy access to the attribute.
After that, you should just expect that the client of your library won't go and shoot themselves in the foot by meddling with the "private" attributes.