In a nutshell, I receive JSON events via an API, and recently I've been learning a lot more about classes. One of the recommended ways to use classes is to implement getters, setters, etc. However, my classes aren't very sophisticated: all they do is parse data from a JSON object and pass better-formatted data on to further ETL processes.
Below is a simple example of what I've encountered.
data = {'status': 'ready'}


class StatusHandler:
    def __init__(self, data):
        self.status = data.get('status', None)


class StatusHandler2:
    def __init__(self, data):
        self._status = data.get('status', None)

    @property
    def status(self):
        return self._status


without_getter = StatusHandler(data)
print(without_getter.status)

with_getter = StatusHandler2(data)
print(with_getter.status)
Is there anything wrong with me using the class StatusHandler and referencing a status instance variable and using that to pass information forward to other bits of code? I'm just wondering if further down the line as my project gets more complicated that this would be an issue as it doesn't seem to be standard although I could be wrong...
The point of getters/setters is to let you replace plain attribute access with computed access without breaking client code, if and when you have to change your implementation. This only makes sense for languages that have no support for computed attributes.
Python has quite strong support for computed attributes through the descriptor protocol, including the generic builtin property type, so you don't need explicit getters/setters - if you have to change your implementation, just replace the affected public attributes with computed ones.
Just make sure not to abuse computed attributes - they should not do any heavy computation, access external resources or the like. No one expects what looks like an attribute to have a high cost or raise IOErrors ;-)
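To make the descriptor protocol mentioned above a bit more concrete, here is a minimal sketch of a hand-written descriptor. The names NonNegative, Event and retries are made up for illustration; in most real code the builtin property is all you need.

```python
class NonNegative:
    """Hypothetical descriptor that rejects negative values."""

    def __set_name__(self, owner, name):
        # Keep the real value under a private name on the instance.
        self._storage = '_' + name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self              # accessed on the class itself
        return getattr(instance, self._storage)

    def __set__(self, instance, value):
        if value < 0:
            raise ValueError('value must be non-negative')
        setattr(instance, self._storage, value)


class Event:
    retries = NonNegative()          # looks like a plain attribute to clients

    def __init__(self, retries):
        self.retries = retries       # goes through NonNegative.__set__


event = Event(3)
event.retries = 5                    # plain attribute syntax, validated write
print(event.retries)                 # 5
```

Client code keeps writing event.retries, so swapping a plain attribute for something like this later does not change the public interface.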
EDIT
With regard to your example: computed attributes are a way to control attribute access, and making an attribute read-only (not providing a setter for your property) IS a perfectly valid use case - IF you have a reason to make it read-only of course.
I'm coming from the Java world and reading Bruce Eckel's Python 3 Patterns, Recipes and Idioms.
While reading about classes, it goes on to say that in Python there is no need to declare instance variables. You just use them in the constructor, and boom, they are there.
So for example:
class Simple:
    def __init__(self, s):
        print("inside the simple constructor")
        self.s = s

    def show(self):
        print(self.s)

    def showMsg(self, msg):
        print(msg + ':', self.show())
If that’s true, then any object of class Simple can just change the value of variable s outside of the class.
For example:
if __name__ == "__main__":
    x = Simple("constructor argument")
    x.s = "test15"  # this changes the value
    x.show()
    x.showMsg("A message")
In Java, we have been taught about public/private/protected variables. Those keywords make sense because at times you want variables in a class that no one outside the class has access to.
Why is that not required in Python?
It's cultural. In Python, you don't write to other classes' instance or class variables. In Java, nothing prevents you from doing the same if you really want to - after all, you can always edit the source of the class itself to achieve the same effect. Python drops that pretence of security and encourages programmers to be responsible. In practice, this works very nicely.
If you want to emulate private variables for some reason, you can always use the __ prefix from PEP 8. Python mangles the names of variables like __foo so that they're not easily visible to code outside the namespace that contains them (although you can get around it if you're determined enough, just like you can get around Java's protections if you work at it).
By the same convention, the _ prefix means _variable should be used internally in the class (or module) only, even if you're not technically prevented from accessing it from somewhere else. You don't play around with another class's variables that look like __foo or _bar.
Private variables in Python are more or less a hack: the interpreter intentionally renames the variable.
class A:
    def __init__(self):
        self.__var = 123

    def printVar(self):
        print(self.__var)
Now, if you try to access __var outside the class definition, it will fail:
>>> x = A()
>>> x.__var # this will return error: "A has no attribute __var"
>>> x.printVar() # this gives back 123
But you can easily get away with this:
>>> x.__dict__ # this will show everything that is contained in object x
# which in this case is something like {'_A__var' : 123}
>>> x._A__var = 456 # you now know the masked name of private variables
>>> x.printVar() # this gives back 456
You probably know that methods in OOP are invoked like this: x.printVar() => A.printVar(x). If A.printVar() can access some field in x, this field can also be accessed outside A.printVar()... After all, functions are created for reusability, and there isn't any special power given to the statements inside.
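A small sketch of that equivalence, assuming a variant of the class above whose method returns the value instead of printing it (get_var is a made-up name):

```python
class A:
    def __init__(self):
        self.__var = 123

    def get_var(self):
        return self.__var


x = A()

# Calling through the instance and through the class is the same call.
assert x.get_var() == A.get_var(x) == 123

# The method body has no special privilege: the same field is reachable
# from outside the class under its mangled name.
assert x._A__var == 123
```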
As correctly mentioned by many of the comments above, let's not forget the main goal of Access Modifiers: To help users of code understand what is supposed to change and what is supposed not to. When you see a private field you don't mess around with it. So it's mostly syntactic sugar which is easily achieved in Python by the _ and __.
Python does not have any private variables like C++ or Java do. You can access any member variable at any time if you want to. However, you don't need private variables in Python, because in Python it is not bad to expose your classes' member variables. If you need to encapsulate a member variable, you can do this with "@property" later on without breaking existing client code.
In Python, the single underscore "_" is used to indicate that a method or variable is not considered as part of the public API of a class and that this part of the API could change between different versions. You can use these methods and variables, but your code could break, if you use a newer version of this class.
The double underscore "__" does not mean a "private variable". You use it to define variables which are "class local" and which cannot easily be overridden by subclasses. It mangles the variable's name.
For example:
class A(object):
    def __init__(self):
        self.__foobar = None  # Will be automatically mangled to self._A__foobar


class B(A):
    def __init__(self):
        self.__foobar = 1  # Will be automatically mangled to self._B__foobar
self.__foobar's name is automatically mangled to self._A__foobar in class A. In class B it is mangled to self._B__foobar. So every subclass can define its own variable __foobar without overriding its parent's variable(s). But nothing prevents you from accessing variables beginning with double underscores. However, name mangling prevents you from calling these variables/methods accidentally.
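To see both mangled names living side by side on one object, here is a slightly extended sketch of the two classes. The super().__init__() call and the a_value/b_value helpers are additions for illustration only.

```python
class A:
    def __init__(self):
        self.__foobar = None      # stored as _A__foobar

    def a_value(self):
        return self.__foobar      # always reads _A__foobar


class B(A):
    def __init__(self):
        super().__init__()
        self.__foobar = 1         # stored as _B__foobar, does not clobber A's

    def b_value(self):
        return self.__foobar      # always reads _B__foobar


b = B()
print(sorted(vars(b)))            # ['_A__foobar', '_B__foobar']
print(b.a_value())                # None
print(b.b_value())                # 1
```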
I strongly recommend you watch Raymond Hettinger's Python's class development toolkit from PyCon 2013, which gives a good example why and how you should use @property and "__"-instance variables.
If you have exposed public variables and you have the need to encapsulate them, then you can use @property. Therefore you can start with the simplest solution possible. You can leave member variables public unless you have a concrete reason to not do so. Here is an example:
class Distance:
    def __init__(self, meter):
        self.meter = meter


d = Distance(1.0)
print(d.meter)
# prints 1.0


class Distance:
    def __init__(self, meter):
        # Customer request: Distances must be stored in millimeters.
        # Public available internals must be changed.
        # This would break client code in C++.
        # This is why you never expose public variables in C++ or Java.
        # However, this is Python.
        self.millimeter = meter * 1000

    # In Python we have @property to the rescue.
    @property
    def meter(self):
        return self.millimeter * 0.001

    @meter.setter
    def meter(self, value):
        self.millimeter = value * 1000


d = Distance(1.0)
print(d.meter)
# prints 1.0
There is a variation of private variables in the underscore convention.
In [5]: class Test(object):
   ...:     def __private_method(self):
   ...:         return "Boo"
   ...:     def public_method(self):
   ...:         return self.__private_method()
   ...:
In [6]: x = Test()
In [7]: x.public_method()
Out[7]: 'Boo'
In [8]: x.__private_method()
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-8-fa17ce05d8bc> in <module>()
----> 1 x.__private_method()
AttributeError: 'Test' object has no attribute '__private_method'
There are some subtle differences, but for the sake of programming pattern ideological purity, it's good enough.
There are examples out there of @private decorators that more closely implement the concept, but your mileage may vary. Arguably, one could also write a class definition that uses a metaclass.
As mentioned earlier, you can indicate that a variable or method is private by prefixing it with an underscore. If you don't feel like this is enough, you can always use the property decorator. Here's an example:
class Foo:
    def __init__(self, bar):
        self._bar = bar

    @property
    def bar(self):
        """Getter for '_bar'."""
        return self._bar
This way, someone or something that references bar is actually referencing the return value of the bar function rather than the variable itself, and therefore it can be accessed but not changed. However, if someone really wanted to, they could simply use _bar and assign a new value to it. There is no surefire way to prevent someone from accessing variables and methods that you wish to hide, as has been said repeatedly. However, using property is the clearest message you can send that a variable is not to be edited. property can also be used for more complex getter/setter/deleter access paths, as explained here: https://docs.python.org/3/library/functions.html#property
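A quick sketch of what that looks like from the caller's side, reusing the Foo class above: assigning to bar is rejected because the property has no setter, while assigning to _bar still goes through.

```python
class Foo:
    def __init__(self, bar):
        self._bar = bar

    @property
    def bar(self):
        """Getter for '_bar'."""
        return self._bar


foo = Foo(1)

try:
    foo.bar = 2                  # no setter defined, so this is rejected
except AttributeError as exc:
    print('rejected:', exc)

foo._bar = 2                     # nothing stops a determined caller
print(foo.bar)                   # 2
```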
Python has limited support for private identifiers, through a feature that automatically prepends the class name to any identifiers starting with two underscores. This is transparent to the programmer, for the most part, but the net effect is that any variables named this way can be used as private variables.
See here for more on that.
In general, Python's implementation of object orientation is a bit primitive compared to other languages. But I enjoy this, actually. It's a very conceptually simple implementation and fits well with the dynamic style of the language.
The only time I ever use private variables is when I need to do other things when writing to or reading from the variable and as such I need to force the use of a setter and/or getter.
Again this goes to culture, as already stated. I've been working on projects where reading and writing other classes' variables was a free-for-all. When one implementation became deprecated, it took a lot longer to identify all code paths that used it. When use of setters and getters was enforced, a debug statement could easily be written to flag that the deprecated method had been called and to identify the code path that called it.
When you are on a project where anyone can write an extension, notifying users about deprecated methods that will disappear in a few releases is vital to keep module breakage to a minimum upon upgrades.
So my answer is: if you and your colleagues maintain a simple code set, then protecting class variables is not always necessary. If you are writing an extensible system, then it becomes imperative, because changes to the core need to be caught by all extensions using the code.
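As a rough sketch of the deprecation idea described above, one way to do it in Python is to keep the old name alive as a property that emits a warning. The class Connection and the attribute names timeout and time_out are invented for this example.

```python
import warnings


class Connection:
    def __init__(self, timeout):
        self.timeout = timeout

    @property
    def time_out(self):
        """Deprecated alias for 'timeout' (hypothetical name)."""
        warnings.warn(
            "'time_out' is deprecated, use 'timeout'",
            DeprecationWarning,
            stacklevel=2,        # attribute the warning to the caller's line
        )
        return self.timeout


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    conn = Connection(30)
    value = conn.time_out

print(value)                          # 30
print(caught[0].category.__name__)    # DeprecationWarning
```

Because the warning carries the caller's location, extension authors can find every code path that still uses the old name.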
"In java, we have been taught about public/private/protected variables"
"Why is that not required in python?"
For the same reason, it's not required in Java.
You're free to use -- or not use private and protected.
As a Python and Java programmer, I've found that private and protected are very, very important design concepts. But as a practical matter, in tens of thousands of lines of Java and Python, I've never actually used private or protected.
Why not?
Here's my question "protected from whom?"
Other programmers on my team? They have the source. What does protected mean when they can change it?
Other programmers on other teams? They work for the same company. They can -- with a phone call -- get the source.
Clients? It's work-for-hire programming (generally). The clients (generally) own the code.
So, who -- precisely -- am I protecting it from?
In Python 3, if you just want to "encapsulate" the class attributes, like in Java, you can just do the same thing like this:
class Simple:
    def __init__(self, str):
        print("inside the simple constructor")
        self.__s = str

    def show(self):
        print(self.__s)

    def showMsg(self, msg):
        print(msg + ':', self.show())
To instantiate this do:
ss = Simple("lol")
ss.show()
Note that print(ss.__s) will raise an AttributeError.
In practice, Python 3 mangles the attribute name (here to _Simple__s). This makes it behave like a "private" attribute in Java: the attribute still exists on the object, but it is not reachable from outside the class under the name __s.
But don't be afraid of it. It doesn't matter. It does the job too. ;)
Private and protected concepts are very important. But Python is just a tool for prototyping and rapid development with restricted resources available for development, and that is why some of the protection levels are not so strictly followed in Python. You can use "__" in a class member. It works properly, but it does not look good enough. Each access to such field contains these characters.
Also, you can notice that the Python OOP concept is not perfect. Smalltalk or Ruby are much closer to a pure OOP concept. Even C# or Java are closer.
Python is a very good tool. But it is a simplified OOP language, both syntactically and conceptually. The main goal of Python's existence is to let developers write easily readable code at a high level of abstraction very quickly.
Here's how I handle Python 3 class fields:
class MyClass:
    def __init__(self, public_read_variable, private_variable):
        self.public_read_variable_ = public_read_variable
        self.__private_variable = private_variable
I access __private_variable (two leading underscores) only inside MyClass methods.
I read public_read_variable_ (one trailing underscore) from outside the class, but never modify it:
my_class = MyClass("public", "private")
print(my_class.public_read_variable_) # OK
my_class.public_read_variable_ = 'another value' # NOT OK, don't do that.
So I’m new to Python but I have a background in C# and JavaScript. Python feels like a mix of the two in terms of features. JavaScript also struggles in this area, and the way around it is to create a closure. This prevents access to data you don’t want to expose by returning a different object.
def print_msg(msg):
    # This is the outer enclosing function

    def printer():
        # This is the nested function
        print(msg)

    return printer  # returns the nested function


# Now let's try calling this function.
# Output: Hello
another = print_msg("Hello")
another()
https://www.programiz.com/python-programming/closure
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Closures#emulating_private_methods_with_closures
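To tie the closure idea back to private state, here is a minimal sketch in the spirit of the JavaScript pattern linked above. The names make_counter, increment and current are made up for this example.

```python
def make_counter(start=0):
    count = start                 # lives only in the enclosing scope

    def increment():
        nonlocal count            # rebind the enclosed variable
        count += 1
        return count

    def current():
        return count

    return increment, current


increment, current = make_counter()
increment()
increment()
print(current())                  # 2
```

There is no attribute on the returned functions named count, so callers have to go through increment and current. As with everything else in this thread, it is not a hard guarantee: the value can still be dug out through the functions' __closure__ cells.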
About sources (to change the access rights and thus bypass language encapsulation like Java or C++):
You don't always have the sources and even if you do, the sources are managed by a system that only allows certain programmers to access a source (in a professional context). Often, every programmer is responsible for certain classes and therefore knows what he can and cannot do. The source manager also locks the sources being modified and of course, manages the access rights of programmers.
So, from experience, I trust software more than humans. Convention is good, but multiple protections are better, such as access management (real private variables) plus source management.
I have been thinking about private class attributes and methods (called members from here on) since I started developing a package that I want to publish. The thought behind it was never to make it impossible to overwrite these members, but to have a warning for those who touch them. I came up with a few solutions that might help. The first solution is used in one of my favorite Python books, Fluent Python.
Upsides of technique 1:
It is unlikely to be overwritten by accident.
It is easily understood and implemented.
It's easier to handle than a leading double underscore for instance attributes.
*In the book the hash-symbol was used, but you could use integer converted to strings as well. In Python it is forbidden to use klass.1
class Technique1:
    def __init__(self, name, value):
        setattr(self, f'private#{name}', value)
        setattr(self, f'1{name}', value)
Downsides of technique 1:
Methods are not easily protected with this technique, though it is possible.
Attribute lookups are only possible via getattr
Still no warning to the user
Another solution I came across was to write __setattr__. Pros:
It is easily implemented and understood
It works with methods
Lookup is not affected
The user gets a warning or error
class Demonstration:
    def __init__(self):
        self.a = 1

    def method(self):
        return None

    def __setattr__(self, name, value):
        # hasattr rather than truthiness, so falsy values such as 0 stay protected
        if hasattr(self, name):
            raise ValueError(f'Already reserved name: {name}')
        super().__setattr__(name, value)


d = Demonstration()
# d.a = 2  # would raise ValueError
d.method = None  # raises ValueError
Cons:
You can still overwrite the class
To have variables not just constants, you need to map allowed input.
Subclasses can still overwrite methods
To prevent subclasses from overwriting methods you can use __init_subclass__:
class Demonstration:
    __protected = ['method']

    def method(self):
        return None

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        for name in Demonstration.__protected:
            # A subclass that overrides the method has a different function object
            if getattr(cls, name) is not getattr(Demonstration, name):
                raise ValueError(f'Protected method "{name}" was touched')
You see, there are ways to protect your class members, but none of them guarantees that users won't overwrite them anyway. This should just give you some ideas. In the end, you could also use a metaclass, but this might open up new dangers. The techniques used here are also very simple-minded, and you should definitely take a look at the documentation, where you can find useful features for these techniques and customize them to your needs.
So I've used Python as a functional language for a while but I'm trying to do things "right" and use classes now... and falling down. I'm trying to write a classmethod that can instantiate multiple members of the class (the use case is loading rows from SQLAlchemy). I'd like to just be able to call the classmethod and have it return a status code (success/failure) rather than returning a list of objects. Then to access the objects I'll iterate through the class. Here's my code so far (which fails to iterate when I use the classmethod, but works fine when I use the normal constructor). Am I way off-base/crazy here? What's the "pythonic" way to do this? Any help is appreciated and thank you.
from collections import defaultdict
import weakref


class KeepRefs(object):
    __refs__ = defaultdict(list)

    def __init__(self):
        self.__refs__[self.__class__].append(weakref.ref(self))

    @classmethod
    def get_instances(cls):
        for inst_ref in cls.__refs__[cls]:
            inst = inst_ref()
            if inst is not None:
                yield inst


class Credentials(KeepRefs):
    def __init__(self, name, username, password):
        super(Credentials, self).__init__()
        self.name = name
        self.username = username
        self.password = password

    @classmethod
    def loadcreds(cls):
        Credentials('customer1', 'bob', 'password')
        return True


success = Credentials.loadcreds()
for i in Credentials.get_instances():
    print(i.name)
In your own words - yes, you are off-base and crazy :)
Status codes are a thing of C, not of languages with proper exception semantics such as Python. Modifying global state is a sure recipe for disaster. So don't do it. Return a list of objects. Raise an exception if something disastrous happens, and just return an empty list if there happen to be no objects. This allows the client code to just do
for item in Thingies.load_thingies():
    ...  # this won't do anything if load_thingies gave us an empty list
without having to painstakingly check something before using it.
Functional languages have certain advantages, and you are going too far the other way in your exploration of the procedural style. Global variables and class variables have their place, but what will happen if you need to fire off two SQLAlchemy queries and consume the results in parallel? The second query will stomp over the class attributes that the first one still needs. Using an instance attribute solves the problem, since each result contains its own handle.
If your concern is to avoid pre-fetching the array of results, you are in luck because Python offers the perfect solution: generators, which are basically lazy functions. They are so nicely integrated into Python that every for-loop you write already consumes its input the same way, one item at a time through the iterator protocol.
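Here is a minimal sketch of the generator approach, assuming the rows arrive as an iterable of (name, username, password) tuples. The rows argument stands in for whatever your SQLAlchemy query returns, and load is a made-up method name.

```python
class Credentials:
    def __init__(self, name, username, password):
        self.name = name
        self.username = username
        self.password = password

    @classmethod
    def load(cls, rows):
        """Yield one Credentials per row, lazily."""
        for name, username, password in rows:
            yield cls(name, username, password)


rows = [('customer1', 'bob', 'password'), ('customer2', 'alice', 'hunter2')]

for cred in Credentials.load(rows):
    print(cred.name)

# An empty result needs no special handling: the loop body simply never runs.
for cred in Credentials.load([]):
    print('never reached')
```

Each call to load gets its own independent iteration, so two queries can be consumed side by side without sharing any class-level state.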
I have a library which stores additional data for foreign user objects in a WeakKeyDictionary:
import weakref

extra_stuff = weakref.WeakKeyDictionary()
def get_extra_stuff_for_obj(o):
    return extra_stuff[o]
When a user object is copied, I want the copy to have the same extra stuff. However, I have limited control over the user object. I would like to define a class decorator for user object classes which will be used in this manner:
def has_extra_stuff(klass):
    def copy_with_hook(self):
        new = magic_goes_here(self)
        extra_stuff[new] = extra_stuff[self]
        return new

    klass.__copy__ = copy_with_hook
    return klass
This is easy if klass already defines __copy__, because I can close copy_with_hook over the original and call it. However, typically it's not defined. What to call here? This obviously can't be copy.copy, because that would result in infinite recursion.
I found this question which appears to ask the exact same question, but afaict the answer is wrong because this results in a deepcopy, not a copy. I would also be unable to do this, as I need to install hooks for both deepcopy and copy. (Incidentally, I would have continued the discussion in that question, but having no reputation I am not able to do this.)
I looked at what the copy module does, which is a bunch of voodoo involving __reduce_ex__(). I can obviously cut/paste this into my code, or call its private methods directly, but I would consider this an absolute last resort. This seems like such a simple thing, I'm convinced I'm missing a simple solution.
Essentially, you need to (A) copy and preserve the original __copy__ if present (and delegate to it), otherwise (B) trick copy.copy into not using your newly-added __copy__ (and delegate to copy.copy).
So, for example...:
import copy
import threading

copylock = threading.RLock()


def has_extra_stuff(klass):
    def simple_copy_with_hook(self):
        with copylock:
            new = original_copy(self)
            extra_stuff[new] = extra_stuff[self]
            return new

    def tricky_case(self):
        with copylock:
            try:
                # Hide our hook so copy.copy falls back to its default path
                klass.__copy__ = None
                new = copy.copy(self)
            finally:
                klass.__copy__ = tricky_case
            extra_stuff[new] = extra_stuff[self]
            return new

    original_copy = getattr(klass, '__copy__', None)
    if original_copy is None:
        klass.__copy__ = tricky_case
    else:
        klass.__copy__ = simple_copy_with_hook
    return klass
Not the most elegant code ever written, but at least it just plays around with klass, without monkey-patching nor copy-and-pasting copy.py itself:-)
Added: since the OP mentioned in a comment he can't use this solution because the app is multi-threaded, added appropriate locking to make it actually usable. Using a single global re-entrant lock to ensure against deadlocks due to out-of-order acquires of multiple locks among multiple threads, and perhaps over-locked "just in case" although I suspect the simple case and the dict assignment in the tricky case probably don't need the lock... but, when threading threatens, better safe than sorry:-)
After some playing I've come up with the following:
import copy
import copyreg  # named copy_reg in Python 2


# Library
def hook(new):
    print("new object: %s" % new)


def pickle_hooked(o):
    pickle = o.__reduce_ex__(2)
    creator = pickle[0]

    def creator_hook(*args, **kwargs):
        new = creator(*args, **kwargs)
        hook(new)
        return new

    return (creator_hook,) + pickle[1:]


def with_copy_hook(klass):
    copyreg.pickle(klass, pickle_hooked)
    return klass


# Application
@with_copy_hook
class A(object):
    def __init__(self, value):
        self.value = value
This registers a pass-through copy hook which also has the advantage of working for both copy and deepcopy. The only detail of the return value of __reduce_ex__ it needs to concern itself with is that the first element in the tuple is a creator function. All other details are handed off to existing library code. It is not perfect, because I still don't see a way of detecting if the target class has already registered a pickler.
I have several TextField columns on my UserProfile object which contain JSON objects. I've also defined a setter/getter property for each column which encapsulates the logic for serializing and deserializing the JSON into Python data structures.
The nature of this data ensures that it will be accessed many times by view and template logic within a single request. To save on deserialization costs, I would like to memoize the Python data structures on read, invalidating on direct write to the property or on a save signal from the model object.
Where/How do I store the memo? I'm nervous about using instance variables, as I don't understand the magic behind how any particular UserProfile is instantiated by a query. Is __init__ safe to use, or do I need to check the existence of the memo attribute via hasattr() at each read?
Here's an example of my current implementation:
class UserProfile(Model):
    text_json = models.TextField(default=text_defaults)

    @property
    def text(self):
        if not hasattr(self, "text_memo"):
            self.text_memo = None
        self.text_memo = self.text_memo or simplejson.loads(self.text_json)
        return self.text_memo

    @text.setter
    def text(self, value=None):
        self.text_memo = None
        self.text_json = simplejson.dumps(value)
You may be interested in a built-in Django decorator, django.utils.functional.memoize.
Django uses this to cache expensive operations like URL resolving.
Generally, I use a pattern like this:
def get_expensive_operation(self):
    if not hasattr(self, '_expensive_operation'):
        self._expensive_operation = self.expensive_operation()
    return self._expensive_operation
Then you use the get_expensive_operation method to access the data.
However, in your particular case, I think you are approaching this in slightly the wrong way. You need to do the deserialization when the model is first loaded from the database, and serialize on save only. Then you can simply access the attributes as a standard Python dictionary each time. You can do this by defining a custom JSONField type, subclassing models.TextField, which overrides to_python and get_db_prep_save.
In fact someone's already done it: see here.
For methods on a class, you should use django.utils.functional.cached_property.
Since the first argument of such a method is self, memoize will maintain a reference to the object and the results of the function even after you've thrown it away. This can cause memory leaks by preventing the garbage collector from cleaning up the stale object. cached_property turns Daniel's suggestion into a decorator.
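To illustrate the pattern without needing a Django project, here is a small sketch that uses functools.cached_property from the standard library (Python 3.8+) as a stand-in. As far as I know Django's decorator likewise stores the result on the instance, but treat that as an assumption and check the Django docs. Profile, set_text and the loads counter are made up for the demo.

```python
import json
from functools import cached_property


class Profile:
    def __init__(self, text_json):
        self.text_json = text_json
        self.loads = 0                      # counts deserializations, demo only

    @cached_property
    def text(self):
        self.loads += 1
        return json.loads(self.text_json)

    def set_text(self, value):
        self.text_json = json.dumps(value)
        self.__dict__.pop('text', None)     # drop the cached value, if any


profile = Profile('{"greeting": "hi"}')
profile.text
profile.text
print(profile.loads)                        # 1

profile.set_text({'greeting': 'hello'})
print(profile.text['greeting'])             # hello
print(profile.loads)                        # 2
```

The cache lives in the instance's own __dict__, so it is garbage-collected together with the object, which avoids the leak described above.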