In a nutshell, I receive JSON events via an API, and recently I've been learning a lot more about classes. One of the recommended ways to use classes is to implement getters, setters, etc. However, my classes aren't too sophisticated: all they do is parse data from a JSON object and pass better-formatted data on to further ETL processes.
Below is a simple example of what I've encountered.
data = {'status': 'ready'}

class StatusHandler:
    def __init__(self, data):
        self.status = data.get('status', None)

class StatusHandler2:
    def __init__(self, data):
        self._status = data.get('status', None)

    @property
    def status(self):
        return self._status

without_getter = StatusHandler(data)
print(without_getter.status)
with_getter = StatusHandler2(data)
print(with_getter.status)
Is there anything wrong with me using the class StatusHandler, referencing its status instance variable, and using that to pass information forward to other bits of code? I'm just wondering whether this would become an issue further down the line as my project gets more complicated, since it doesn't seem to be standard, although I could be wrong...
The point of getters/setters is to let you replace plain attribute access with computed access without breaking client code if and when you have to change your implementation. This only makes sense for languages that have no support for computed attributes.
Python has quite strong support for computed attributes through the descriptor protocol, including the generic builtin property type, so you don't need explicit getters/setters: if you have to change your implementation, just replace the affected public attributes with computed ones.
Just make sure to not abuse computed attributes - they should not make any heavy computation, external resource access or so. No one expects what looks like an attribute to have a high cost or raise IOErrors or so ;-)
EDIT
With regard to your example: computed attributes are a way to control attribute access, and making an attribute read-only (not providing a setter for your property) IS a perfectly valid use case - IF you have a reason to make it read-only of course.
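As a minimal sketch of the read-only case, here is the question's StatusHandler2 with a property that has no setter. The try/except and the variable names handler and assignment_failed are only there for illustration.

```python
class StatusHandler2:
    def __init__(self, data):
        self._status = data.get('status')

    @property
    def status(self):
        # No setter is defined, so the attribute is read-only.
        return self._status


handler = StatusHandler2({'status': 'ready'})
print(handler.status)

try:
    handler.status = 'done'
    assignment_failed = False
except AttributeError:
    # Assigning to a property without a setter raises AttributeError.
    assignment_failed = True

print(assignment_failed)
```

Client code keeps reading handler.status exactly as it would read a plain attribute; only assignment behaves differently.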
I'm coming from the Java world and reading Bruce Eckel's Python 3 Patterns, Recipes and Idioms.
While reading about classes, it goes on to say that in Python there is no need to declare instance variables. You just use them in the constructor, and boom, they are there.
So for example:
class Simple:
    def __init__(self, s):
        print("inside the simple constructor")
        self.s = s

    def show(self):
        print(self.s)

    def showMsg(self, msg):
        print(msg + ':', self.show())
If that’s true, then any object of class Simple can just change the value of variable s outside of the class.
For example:
if __name__ == "__main__":
    x = Simple("constructor argument")
    x.s = "test15" # this changes the value
    x.show()
    x.showMsg("A message")
In Java, we have been taught about public/private/protected variables. Those keywords make sense because at times you want variables in a class that no one outside the class has access to.
Why is that not required in Python?
It's cultural. In Python, you don't write to other classes' instance or class variables. In Java, nothing prevents you from doing the same if you really want to - after all, you can always edit the source of the class itself to achieve the same effect. Python drops that pretence of security and encourages programmers to be responsible. In practice, this works very nicely.
If you want to emulate private variables for some reason, you can always use the __ prefix from PEP 8. Python mangles the names of variables like __foo so that they're not easily visible to code outside the namespace that contains them (although you can get around it if you're determined enough, just like you can get around Java's protections if you work at it).
By the same convention, the _ prefix means _variable should be used internally in the class (or module) only, even if you're not technically prevented from accessing it from somewhere else. You don't play around with another class's variables that look like __foo or _bar.
Private variables in Python are more or less a hack: the interpreter intentionally renames the variable.
class A:
    def __init__(self):
        self.__var = 123

    def printVar(self):
        print(self.__var)
Now, if you try to access __var outside the class definition, it will fail:
>>> x = A()
>>> x.__var # this will return error: "A has no attribute __var"
>>> x.printVar() # this gives back 123
But you can easily get away with this:
>>> x.__dict__ # this will show everything that is contained in object x
# which in this case is something like {'_A__var' : 123}
>>> x._A__var = 456 # you now know the masked name of private variables
>>> x.printVar() # this gives back 456
You probably know that methods in OOP are invoked like this: x.printVar() => A.printVar(x). If A.printVar() can access some field in x, this field can also be accessed outside A.printVar()... After all, functions are created for reusability, and there isn't any special power given to the statements inside.
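A small sketch of that equivalence, using a hypothetical class with a method get_var that returns the value instead of printing it:

```python
class A:
    def __init__(self):
        self.var = 123

    def get_var(self):
        return self.var


x = A()
via_instance = x.get_var()   # bound-method call
via_class = A.get_var(x)     # the same function, with x passed explicitly
direct = x.var               # whatever the method can read, outside code can read too
print(via_instance, via_class, direct)
```

All three expressions read the same field of the same object; the method body has no privileges that the calling code lacks.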
As correctly mentioned by many of the comments above, let's not forget the main goal of Access Modifiers: To help users of code understand what is supposed to change and what is supposed not to. When you see a private field you don't mess around with it. So it's mostly syntactic sugar which is easily achieved in Python by the _ and __.
Python does not have any private variables like C++ or Java does. You can access any member variable at any time if you want to. However, you don't need private variables in Python, because in Python it is not bad to expose your classes' member variables. If you need to encapsulate a member variable, you can do this by using "@property" later on without breaking existing client code.
In Python, the single underscore "_" is used to indicate that a method or variable is not considered as part of the public API of a class and that this part of the API could change between different versions. You can use these methods and variables, but your code could break, if you use a newer version of this class.
The double underscore "__" does not mean a "private variable". You use it to define variables which are "class local" and which cannot be easily overridden by subclasses. It mangles the variable's name.
For example:
class A(object):
    def __init__(self):
        self.__foobar = None # Will be automatically mangled to self._A__foobar

class B(A):
    def __init__(self):
        self.__foobar = 1 # Will be automatically mangled to self._B__foobar
self.__foobar's name is automatically mangled to self._A__foobar in class A. In class B it is mangled to self._B__foobar. So every subclass can define its own variable __foobar without overriding its parent's variable(s). But nothing prevents you from accessing variables beginning with double underscores. However, name mangling prevents you from accessing these variables/methods accidentally.
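To see both mangled names side by side, here is a minimal sketch. It differs from the classes above in one respect, which is my addition: B.__init__ calls super().__init__() so that A's attribute is set as well.

```python
class A:
    def __init__(self):
        self.__foobar = None   # stored as _A__foobar


class B(A):
    def __init__(self):
        super().__init__()     # added here so that A's attribute exists too
        self.__foobar = 1      # stored as _B__foobar


b = B()
names = sorted(vars(b))
print(names)
```

The instance ends up with two separate attributes, one per class, so neither class clobbers the other's __foobar.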
I strongly recommend you watch Raymond Hettinger's Python's Class Development Toolkit from PyCon 2013, which gives a good example of why and how you should use @property and "__" instance variables.
If you have exposed public variables and you need to encapsulate them, then you can use @property. This means you can start with the simplest solution possible: leave member variables public unless you have a concrete reason not to. Here is an example:
class Distance:
    def __init__(self, meter):
        self.meter = meter

d = Distance(1.0)
print(d.meter)
# prints 1.0

class Distance:
    def __init__(self, meter):
        # Customer request: Distances must be stored in millimeters.
        # Public available internals must be changed.
        # This would break client code in C++.
        # This is why you never expose public variables in C++ or Java.
        # However, this is Python.
        self.millimeter = meter * 1000

    # In Python we have @property to the rescue.
    @property
    def meter(self):
        return self.millimeter * 0.001

    @meter.setter
    def meter(self, value):
        self.millimeter = value * 1000

d = Distance(1.0)
print(d.meter)
# prints 1.0
There is a variation of private variables in the underscore convention.
In [5]: class Test(object):
   ...:     def __private_method(self):
   ...:         return "Boo"
   ...:     def public_method(self):
   ...:         return self.__private_method()
   ...:
In [6]: x = Test()
In [7]: x.public_method()
Out[7]: 'Boo'
In [8]: x.__private_method()
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-8-fa17ce05d8bc> in <module>()
----> 1 x.__private_method()
AttributeError: 'Test' object has no attribute '__private_method'
There are some subtle differences, but for the sake of programming pattern ideological purity, it's good enough.
There are examples out there of @private decorators that more closely implement the concept, but your mileage may vary. Arguably, one could also write a class definition that uses meta.
As mentioned earlier, you can indicate that a variable or method is private by prefixing it with an underscore. If you don't feel like this is enough, you can always use the property decorator. Here's an example:
class Foo:
    def __init__(self, bar):
        self._bar = bar

    @property
    def bar(self):
        """Getter for '_bar'."""
        return self._bar
This way, someone or something that references bar is actually referencing the return value of the bar function rather than the variable itself, and therefore it can be accessed but not changed. However, if someone really wanted to, they could simply use _bar and assign a new value to it. There is no surefire way to prevent someone from accessing variables and methods that you wish to hide, as has been said repeatedly. However, using property is the clearest message you can send that a variable is not to be edited. property can also be used for more complex getter/setter/deleter access paths, as explained here: https://docs.python.org/3/library/functions.html#property
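For the more complex getter/setter/deleter case, a minimal sketch might look like this. The "only integers" rule and the variable names are made up purely for illustration.

```python
class Foo:
    def __init__(self, bar):
        self.bar = bar          # goes through the setter below

    @property
    def bar(self):
        """Getter for '_bar'."""
        return self._bar

    @bar.setter
    def bar(self, value):
        # Hypothetical rule for illustration: only integers are accepted.
        if not isinstance(value, int):
            raise TypeError("bar must be an int")
        self._bar = value

    @bar.deleter
    def bar(self):
        del self._bar


foo = Foo(1)
foo.bar = 2
value_after_set = foo.bar
print(value_after_set)

try:
    foo.bar = "three"
    rejected = False
except TypeError:
    rejected = True
print(rejected)

del foo.bar
print(hasattr(foo, "bar"))
```

As with the read-only version, nothing stops a determined caller from assigning to foo._bar directly; the property only defines what the public name does.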
Python has limited support for private identifiers, through a feature that automatically prepends the class name to any identifiers starting with two underscores. This is transparent to the programmer, for the most part, but the net effect is that any variables named this way can be used as private variables.
See here for more on that.
In general, Python's implementation of object orientation is a bit primitive compared to other languages. But I enjoy this, actually. It's a very conceptually simple implementation and fits well with the dynamic style of the language.
The only time I ever use private variables is when I need to do other things when writing to or reading from the variable and as such I need to force the use of a setter and/or getter.
Again this goes to culture, as already stated. I've been working on projects where reading and writing other classes' variables was a free-for-all. When one implementation became deprecated, it took a lot longer to identify all code paths that used that function. When use of setters and getters was forced, a debug statement could easily be written to identify that the deprecated method had been called and the code path that called it.
When you are on a project where anyone can write an extension, notifying users about deprecated methods that are to disappear in a few releases hence is vital to keep module breakage at a minimum upon upgrades.
So my answer is: if you and your colleagues maintain a simple code set, then protecting class variables is not always necessary. If you are writing an extensible system, then it becomes imperative, because changes made to the core need to be caught by all extensions using the code.
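In Python, that kind of "report who is still using the deprecated thing" hook can be sketched with a property and the standard warnings module. The class Sensor and its attribute names are hypothetical.

```python
import warnings


class Sensor:
    def __init__(self, celsius):
        self.celsius = celsius

    @property
    def temp(self):
        # Hypothetical deprecated name, kept as a property so that every
        # access can be reported together with the calling location.
        warnings.warn(
            "Sensor.temp is deprecated; use Sensor.celsius",
            DeprecationWarning,
            stacklevel=2,
        )
        return self.celsius


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    value = Sensor(21.5).temp

print(value)
print(caught[0].category.__name__)
```

With stacklevel=2 the warning points at the line that accessed the attribute rather than at the property itself, which is what makes the offending code path easy to find.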
"In java, we have been taught about public/private/protected variables"
"Why is that not required in python?"
For the same reason, it's not required in Java.
You're free to use -- or not use -- private and protected.
As a Python and Java programmer, I've found that private and protected are very, very important design concepts. But as a practical matter, in tens of thousands of lines of Java and Python, I've never actually used private or protected.
Why not?
Here's my question "protected from whom?"
Other programmers on my team? They have the source. What does protected mean when they can change it?
Other programmers on other teams? They work for the same company. They can -- with a phone call -- get the source.
Clients? It's work-for-hire programming (generally). The clients (generally) own the code.
So, who -- precisely -- am I protecting it from?
In Python 3, if you just want to "encapsulate" the class attributes, like in Java, you can just do the same thing like this:
class Simple:
    def __init__(self, str):
        print("inside the simple constructor")
        self.__s = str

    def show(self):
        print(self.__s)

    def showMsg(self, msg):
        print(msg + ':', self.show())
To instantiate this do:
ss = Simple("lol")
ss.show()
Note that: print(ss.__s) will throw an error.
In practice, Python 3 mangles the attribute name (here to _Simple__s), which makes it behave much like a "private" attribute in Java. The attribute is still reachable from outside the class, just not under its original name.
But don't be afraid of it. It doesn't matter. It does the job too. ;)
Private and protected concepts are very important. But Python is just a tool for prototyping and rapid development with restricted resources available for development, and that is why some of the protection levels are not so strictly followed in Python. You can use "__" in a class member. It works properly, but it does not look good enough. Each access to such field contains these characters.
Also, you can notice that the Python OOP concept is not perfect. Smalltalk or Ruby are much closer to a pure OOP concept. Even C# or Java are closer.
Python is a very good tool. But it is a simplified OOP language. Syntactically and conceptually simplified. The main goal of Python's existence is to bring to developers the possibility to write easy readable code with a high abstraction level in a very fast manner.
Here's how I handle Python 3 class fields:
class MyClass:
    def __init__(self, public_read_variable, private_variable):
        self.public_read_variable_ = public_read_variable
        self.__private_variable = private_variable

I access __private_variable (two leading underscores) only inside MyClass methods.
I read public_read_variable_ (one trailing underscore) outside the class, but never modify it:
my_class = MyClass("public", "private")
print(my_class.public_read_variable_) # OK
my_class.public_read_variable_ = 'another value' # NOT OK, don't do that.
So I’m new to Python but I have a background in C# and JavaScript. Python feels like a mix of the two in terms of features. JavaScript also struggles in this area, and the way around it is to create a closure. This prevents access to data you don’t want to expose by returning a different object.
def print_msg(msg):
    # This is the outer enclosing function
    def printer():
        # This is the nested function
        print(msg)
    return printer # returns the nested function

# Now let's try calling this function.
# Output: Hello
another = print_msg("Hello")
another()
https://www.programiz.com/python-programming/closure
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Closures#emulating_private_methods_with_closures
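To make the "hidden data" part more visible, here is a minimal sketch of a closure that keeps mutable state. The names make_counter and increment are made up for this example.

```python
def make_counter():
    count = 0  # lives only in the closure; it is not an attribute of the returned function

    def increment():
        nonlocal count
        count += 1
        return count

    return increment


counter = make_counter()
counter()
second = counter()
print(second)
print(hasattr(counter, "count"))
```

Note that this is still convention rather than enforcement: the value can be reached through counter.__closure__[0].cell_contents if someone goes looking for it.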
About sources (to change the access rights and thus bypass language encapsulation like Java or C++):
You don't always have the sources, and even if you do, in a professional context they are managed by a system that only lets certain programmers access a given source file. Often, every programmer is responsible for certain classes and therefore knows what they can and cannot do. The source manager also locks sources that are being modified and, of course, manages the programmers' access rights.
So, from experience, I trust software more than humans. Convention is good, but multiple protections are better: access control in the language (real private variables) plus source management.
I have been thinking about private class attributes and methods (called members below) since I started developing a package that I want to publish. The idea was never to make it impossible to overwrite these members, but to warn those who touch them. I came up with a few solutions that might help. The first solution is used in one of my favorite Python books, Fluent Python.
Upsides of technique 1:
It is unlikely to be overwritten by accident.
It is easily understood and implemented.
It's easier to handle than a leading double underscore for instance attributes.
*In the book the hash symbol was used, but you could use a name that starts with a digit as well. Either way the name is not a valid identifier, so Python forbids ordinary access such as klass.1
class Technique1:
    def __init__(self, name, value):
        setattr(self, f'private#{name}', value)
        setattr(self, f'1{name}', value)
Downsides of technique 1:
Methods are not easily protected with this technique, although it is possible.
Attribute lookups are only possible via getattr
Still no warning to the user
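A short sketch of what the getattr-only lookup looks like in practice, using a trimmed-down copy of the class above (only the 'private#' variant is kept):

```python
class Technique1:
    def __init__(self, name, value):
        setattr(self, f'private#{name}', value)


t = Technique1('x', 42)
stored = getattr(t, 'private#x')   # the only way to read it back
print(stored)
print(sorted(vars(t)))
```

Writing t.private#x in source code would be parsed as t.private followed by a comment, which is why the value cannot be reached by accident with normal dotted access.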
Another solution I came across was to write __setattr__. Pros:
It is easily implemented and understood
It works with methods
Lookup is not affected
The user gets a warning or error
class Demonstration:
    def __init__(self):
        self.a = 1

    def method(self):
        return None

    def __setattr__(self, name, value):
        # hasattr (rather than a truthiness test) also protects names whose current value is falsy
        if not hasattr(self, name):
            super().__setattr__(name, value)
        else:
            raise ValueError(f'Already reserved name: {name}')

d = Demonstration()
# d.a = 2        # would raise ValueError
d.method = None  # raises ValueError
Cons:
You can still overwrite the class
To have variables not just constants, you need to map allowed input.
Subclasses can still overwrite methods
To prevent subclasses from overwriting methods you can use __init_subclass__:
class Demonstration:
    __protected = ['method']

    def method(self):
        return None

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        protected_methods = Demonstration.__protected
        for i in protected_methods:
            p = getattr(Demonstration, i)
            j = getattr(cls, i)
            if p is not j:
                raise ValueError(f'Protected method "{i}" was touched')
As you can see, there are ways to protect your class members, but none of them guarantees that users won't overwrite them anyway. This should just give you some ideas. In the end, you could also use a metaclass, but that might introduce new pitfalls. The techniques shown here are deliberately simple; take a look at the documentation, where you can find useful features to build on and customize to your needs.
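To show the __init_subclass__ check firing, here is a self-contained sketch that repeats the class in slightly condensed form and then defines two subclasses. Fine and Broken are hypothetical names for this example.

```python
class Demonstration:
    __protected = ['method']

    def method(self):
        return None

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        for name in Demonstration.__protected:
            if getattr(cls, name) is not getattr(Demonstration, name):
                raise ValueError(f'Protected method "{name}" was touched')


class Fine(Demonstration):
    # Adds a new method but leaves the protected one alone.
    def other(self):
        return 1


try:
    class Broken(Demonstration):
        def method(self):
            return "overridden"
    error = None
except ValueError as exc:
    # The check runs while the class statement is being executed.
    error = str(exc)

print(error)
```

The error is raised at class-definition time, so a subclass author finds out immediately rather than at the first call.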
The question refers to which one is preferable to be used in which use case, not about the technical background.
In python, you can control the access of attributes via a property, a descriptor, or magic methods. Which one is most pythonic in which use case? All of them seem to have the same effect (see the examples below).
I am looking for an answer like:
Property: Should be used in case of …
Descriptor: In the case of … it should be used instead of a property.
Magic method: Only use if ….
Example
A use case would be an attribute that cannot be set in the __init__ method, for example because the object is not present in the database yet, but can be set at a later time. Each time the attribute is accessed, the class should try to set it and return it.
As an example that works with copy and paste in the Python shell, here is a class that presents its attribute only the second time it is asked for it. So, which one is the best way, or are there different situations in which one of them is preferable? Here are the three ways to implement it:
With property::

    class ContactBook(object):
        intents = 0

        def __init__(self):
            self.__first_person = None

        def get_first_person(self):
            ContactBook.intents += 1
            if self.__first_person is None:
                if ContactBook.intents > 1:
                    value = 'Mr. First'
                    self.__first_person = value
                else:
                    return None
            return self.__first_person

        def set_first_person(self, value):
            self.__first_person = value

        first_person = property(get_first_person, set_first_person)

With __getattribute__::

    class ContactBook(object):
        intents = 0

        def __init__(self):
            self.first_person = None

        def __getattribute__(self, name):
            if name == 'first_person' \
                    and object.__getattribute__(self, name) is None:
                ContactBook.intents += 1
                if ContactBook.intents > 1:
                    value = 'Mr. First'
                    self.first_person = value
                else:
                    value = None
            else:
                value = object.__getattribute__(self, name)
            return value

Descriptor::

    class FirstPerson(object):
        def __init__(self, value=None):
            self.value = value

        def __get__(self, instance, owner):
            if self.value is None:
                ContactBook.intents += 1
                if ContactBook.intents > 1:
                    self.value = 'Mr. First'
                else:
                    return None
            return self.value

    class ContactBook(object):
        intents = 0
        first_person = FirstPerson()

Each of them has this behavior::

    book = ContactBook()
    print(book.first_person)
    # >>None
    print(book.first_person)
    # >>Mr. First

Basically, use the simplest one you can. Roughly speaking, the order of complexity/heavy-duty-ness goes: regular attribute, property, __getattr__, __getattribute__/descriptor. (__getattribute__ and custom descriptors are both things you probably won't need to do very often.) This leads to some simple rules of thumb:
Don't use a property if a normal attribute will work.
Don't write your own descriptor if a property will work.
Don't use __getattr__ if a property will work.
Don't use __getattribute__ if __getattr__ will work.
Stated a bit more specifically: use a property to customize handling of one or a small set of attributes; use __getattr__ to customize handling of all attributes, or all except a small set; use __getattribute__ if you were hoping to use __getattr__ but it doesn't quite work; write your own descriptor class if you are doing something very complicated.
You use a property when you have one or a small set of attributes whose getting/setting you want to hook into. That is, you want things like obj.prop and obj.prop = 2 to secretly call a function that you write to customize what happens.
You would use __getattr__ when you want to do that for so many attributes that you don't actually want to define them all individually, but rather want to customize the whole attribute-access process as a whole. In other words, instead of hooking into obj.prop1, and obj.prop2, etc., you have so many that you want to be able to hook into obj.<anything>, and handle that in general.
However, __getattr__ still won't let you override what happens for attributes that really do exist, it just lets you hook in with a blanket handling for any use of attributes that would otherwise raise an AttributeError. Using __getattribute__ lets you hook in to handle everything, even normal attributes that would have worked without messing with __getattribute__. Because of this, using __getattribute__ has the potential to break fairly basic behavior, so you should only use it if you considered using __getattr__ and it wasn't enough. It also can have a noticeable performance impact. You might for instance need to use __getattribute__ if you're wrapping a class that defines some attributes, and you want to be able to wrap those attributes in a custom way, so that they work as usual in some situations but get custom behavior in other situations.
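The difference between the two hooks can be seen in a small sketch. The class names Fallback and Everything are made up for illustration.

```python
class Fallback:
    def __init__(self):
        self.real = 1

    def __getattr__(self, name):
        # Only reached when normal lookup fails.
        return f"fallback:{name}"


class Everything:
    def __init__(self):
        self.real = 1

    def __getattribute__(self, name):
        # Reached for every attribute access, including existing attributes.
        value = object.__getattribute__(self, name)
        return ("seen", name, value)


f = Fallback()
print(f.real)      # found normally, __getattr__ is not involved
print(f.missing)   # not found, so __getattr__ supplies a value

e = Everything()
print(e.real)      # even an existing attribute goes through the hook
```

Inside __getattribute__, the real lookup has to go through object.__getattribute__; writing self.real there would call the hook again and recurse without end.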
Finally, I would say writing your own descriptor is a fairly advanced task. property is a descriptor, and for probably 95% of cases it's the only one you'll need. A good simple example of why you might write your own descriptor is given here: basically, you might do it if you would otherwise have to write several properties with similar behavior; a descriptor lets you factor out the common behavior to avoid code repetition. Custom descriptors are used, for instance, to drive systems like Django and SQLAlchemy. If you find yourself writing something at that level of complexity, you might need to write a custom descriptor.
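As a rough sketch of the "several properties with similar behavior" situation, a descriptor can hold the shared validation once. Positive and Rectangle are hypothetical names, and the positivity rule is just an example.

```python
class Positive:
    """Hypothetical descriptor: rejects values that are not greater than zero."""

    def __set_name__(self, owner, name):
        # Called once per attribute when the owning class is created.
        self.storage_name = "_" + name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        return getattr(instance, self.storage_name)

    def __set__(self, instance, value):
        if value <= 0:
            raise ValueError("value must be positive")
        setattr(instance, self.storage_name, value)


class Rectangle:
    width = Positive()
    height = Positive()

    def __init__(self, width, height):
        self.width = width
        self.height = height


r = Rectangle(2, 3)
print(r.width * r.height)

try:
    r.height = -1
    rejected = False
except ValueError:
    rejected = True
print(rejected)
```

Written with property, the same check would have to be repeated in a getter/setter pair for each of width and height.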
In your example, property would be the best choice. It is usually (not always) a red flag if you're doing if name == 'somespecificname' inside __getattribute__. If you only need to specially handle one specific name, you can probably do it without stooping to the level of __getattribute__. Likewise, it doesn't make sense to write your own descriptor if all you write for its __get__ is something you could have written in a property's getter method.
__getattribute__ is the hook that enables property (and other descriptors) to work in the first place and is called for all attribute access on an object. Consider it a lower-level API for when a property or even a custom descriptor is not enough for your needs. __getattr__ is the fallback: it is called when __getattribute__ raises AttributeError, that is, when no attribute has been located through other means.
Use property for dynamic attributes with a fixed name, __getattr__ for attributes of a more dynamic nature (e.g. a series of attributes that map to values in an algorithmic manner).
Descriptors are used when you need to bind arbitrary objects to an instance, for example when you need to replace method objects with something more advanced; a recent example is a class-based decorator wrapping methods that needed to support additional attributes and methods on the method object. Generally, when you are still thinking in terms of scalar attributes, you don't need descriptors.
I am trying to implement a class in which an attempt to access any attributes that do not exist in the current class or any of its ancestors will attempt to access those attributes from a member. Below is a trivial version of what I am trying to do.
class Foo:
    def __init__(self, value):
        self._value = value

    def __getattr__(self, name):
        return getattr(self._value, name)

if __name__ == '__main__':
    print(Foo(5) > Foo(4)) # should do 5 > 4 (or (5).__gt__(4))
However, this raises a TypeError. Even using the operator module's attrgetter class does the same thing. I was taking a look at the documentation regarding customizing attribute access, but I didn't find it an easy read. How can I get around this?
If I understand you correctly, what you are doing is correct, but it still won't work for what you're trying to use it for. The reason is that implicit magic-method lookup does not use __getattr__ (or __getattribute__ or any other such thing). The methods have to actually explicitly be there with their magic names. Your approach will work for normal attributes, but not magic methods. (Note that if you do Foo(5).__lt__(4) explicitly, it will work; it's only the implicit "magic" lookup, e.g. calling __lt__ when < is used, that is blocked.)
This post describes an approach for autogenerating magic methods using a metaclass. If you only need certain methods, you can just define them on the class manually.
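For the "define them manually" route, a minimal sketch based on the question's Foo could look like this. The _unwrap helper is my own addition so that comparisons work against both Foo instances and plain values.

```python
class Foo:
    def __init__(self, value):
        self._value = value

    def __getattr__(self, name):
        # Still handles ordinary (non-magic) attributes.
        return getattr(self._value, name)

    @staticmethod
    def _unwrap(other):
        # Hypothetical helper: compare against the wrapped value if given a Foo.
        return other._value if isinstance(other, Foo) else other

    def __gt__(self, other):
        return self._value > self._unwrap(other)

    def __lt__(self, other):
        return self._value < self._unwrap(other)


greater = Foo(5) > Foo(4)
less = Foo(5) < 6
bits = Foo(5).bit_length()   # delegated through __getattr__
print(greater, less, bits)
```

Only the operators you actually define are supported; anything else, such as ==, keeps its default behavior until you add the matching method.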
__*__ methods will not work unless they actually exist, so neither __getattr__ nor __getattribute__ will allow you to proxy those calls. You must create every single method manually.
Yes, this does involve quite a bit of copy&paste. And yes, it's perfectly fine in this case.
You might be able to use the werkzeug LocalProxy class as a base or instead of your own class; your code would look like this when using LocalProxy:
print(LocalProxy(lambda: 5) > LocalProxy(lambda: 4))
I am used to Python allowing some neat tricks to delegate functionality to other objects. One example is delegation to contained objects.
But it seems that I am out of luck when I want to delegate __contains__:
class A(object):
    def __init__(self):
        self.mydict = {}
        self.__contains__ = self.mydict.__contains__

a = A()
1 in a
I get:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: argument of type 'A' is not iterable
What am I doing wrong? When I call a.__contains__(1), everything goes smoothly. I even tried to define an __iter__ method in A to make A look more like an iterable, but it did not help. What am I missing here?
Special methods such as __contains__ are only special when defined on the class, not on the instance (except in legacy classes in Python 2, which you should not use anyway).
So, do your delegation at class level:
class A(object):
    def __init__(self):
        self.mydict = {}

    def __contains__(self, other):
        return self.mydict.__contains__(other)
I'd actually prefer to spell the latter as return other in self.mydict, but that's a minor style issue.
Edit: if and when "totally dynamic per-instance redirecting of special methods" (like old-style classes offered) is indispensable, it's not hard to implement it with new-style classes: you just need each instance that has such peculiar need to be wrapped in its own special class. For example:
class BlackMagic(object):
    def __init__(self):
        self.mydict = {}
        self.__class__ = type(self.__class__.__name__, (self.__class__,), {})
        self.__class__.__contains__ = self.mydict.__contains__
Essentially, after the little bit of black magic reassigning self.__class__ to a new class object (which behaves just like the previous one but has an empty dict and no other instances except this one self), anywhere in an old-style class you would assign to self.__magicname__, assign to self.__class__.__magicname__ instead (and make sure it's a built-in or staticmethod, not a normal Python function, unless of course in some different case you do want it to receive the self when called on the instance).
Incidentally, the in operator on an instance of this BlackMagic class is faster, as it happens, than with any of the previously proposed solutions -- or at least so I'm measuring with my usual trusty -mtimeit (going directly to the built-in method, instead of following normal lookup routes involving inheritance and descriptors, shaves a bit of the overhead).
A metaclass to automate the self.__class__-per-instance idea would not be hard to write (it could do the dirty work in the generated class's __new__ method, and maybe also set all magic names to actually assign on the class if assigned on the instance, either via __setattr__ or many, many properties). But that would be justified only if the need for this feature was really widespread (e.g. porting a huge ancient Python 1.5.2 project that liberally uses "per-instance special methods" to modern Python, including Python 3).
Do I recommend "clever" or "black magic" solutions? No, I don't: almost invariably it's better to do things in simple, straightforward ways. But "almost" is an important word here, and it's nice to have at hand such advanced "hooks" for the rare, but not non-existent, situations where their use may actually be warranted.