Closed. This question is opinion-based. It is not currently accepting answers.
Closed 9 days ago.
The community is reviewing whether to reopen this question as of 9 days ago.
I have Python classes, of which I need only one instance at runtime, so it would be sufficient to have the attributes only once per class and not per instance. If there were more than one instance (which won't happen), all instances should have the same configuration. I wonder which of the following options would be better or more "idiomatic" Python.
Class variables:
class MyController(Controller):
    path = "something/"
    children = [AController, BController]

    def action(self, request):
        pass
Instance variables:
class MyController(Controller):
    def __init__(self):
        self.path = "something/"
        self.children = [AController, BController]

    def action(self, request):
        pass
If you have only one instance anyway, it's best to make all variables per-instance, simply because they will be accessed (a little bit) faster (one less level of "lookup" due to the "inheritance" from class to instance), and there are no downsides to weigh against this small advantage.
Further echoing Mike's and Alex's advice and adding my own color...
Using instance attributes is the typical, more idiomatic Python. Class attributes are not used as much, since their use cases are specific. The same is true for static and class methods vs. "normal" methods. They're special constructs addressing specific use cases; otherwise it's code created by an aberrant programmer wanting to show off that they know some obscure corner of Python programming.
Alex mentions in his reply that access will be (a little bit) faster due to one less level of lookup... let me further clarify for those who don't know about how this works yet. It is very similar to variable access -- the search order of which is:
locals
nonlocals
globals
built-ins
For attribute access, the order is:
instance
class
base classes as determined by the MRO (method resolution order)
Both techniques work in an "inside-out" manner, meaning the most local objects are checked first, then outer layers are checked in succession.
In your example above, let's say you're looking up the path attribute. When it encounters a reference like "self.path", Python will look at the instance attributes first for a match. When that fails, it checks the class from which the object was instantiated. Finally, it will search the base classes. As Alex stated, if your attribute is found in the instance, it doesn't need to look elsewhere, hence your little bit of time savings.
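To make the search order concrete, here is a minimal sketch. The class names and values are hypothetical stand-ins for the question's controller, and it ignores descriptors such as properties, which can change the order:

```python
class Base:
    path = "base/"            # found last: base class, reached via the MRO


class MyController(Base):     # hypothetical stand-in for the question's class
    pass


c = MyController()
where = []

where.append(c.path)          # neither instance nor class defines it -> Base.path

MyController.path = "class/"
where.append(c.path)          # the class attribute now wins over the base class

c.path = "instance/"
where.append(c.path)          # the instance __dict__ is checked first

print(where)
print([cls.__name__ for cls in MyController.__mro__])
```

Each assignment adds a layer closer to the instance, and the lookup stops at the first layer that has the name.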
However, if you insist on class attributes, you need that extra lookup. Your other alternative is to refer to the object via the class instead of the instance, e.g., MyController.path instead of self.path. That's a direct lookup which will get around the deferred lookup, but as Alex mentions below, it's a global variable, so you lose that bit that you thought you were going to save (unless you create a local reference to the [global] class name).
The bottom line is that you should use instance attributes most of the time. However, there will be occasions where a class attribute is the right tool for the job. Code using both at the same time will require the most diligence, because using self will only get you the instance attribute object, which shadows the class attribute of the same name. In this case, you must access the attribute by the class name in order to reference it.
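A small sketch of that shadowing, using a hypothetical Config class that is not part of the original question:

```python
class Config:
    retries = 3              # class attribute


cfg = Config()
cfg.retries = 10             # creates an instance attribute; the class one is untouched

via_instance = cfg.retries   # instance attribute shadows the class attribute
via_class = Config.retries   # going through the class name still reaches the original

del cfg.retries              # removing the instance attribute un-shadows the class one
after_delete = cfg.retries

print(via_instance, via_class, after_delete)
```

Assigning through self (or any instance) never modifies the class attribute; it only adds a name to that one instance.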
When in doubt, you probably want an instance attribute.
Class attributes are best reserved for special cases where they make sense. The only very-common use case is methods. It isn't uncommon to use class attributes for read-only constants that instances need to know (though the only benefit to this is if you also want access from outside the class), but you should certainly be cautious about storing any state in them, which is seldom what you want. Even if you will only have one instance, you should write the class like you would any other, which usually means using instance attributes.
Same question at "Performance of accessing class variables in Python"; the code here is adapted from @Edward Loper.
Local Variables are the fastest to access, pretty much tied with Module Variables, followed by Class Variables, followed by Instance Variables.
There are 4 scopes you can access variables from:
Instance Variables (self.varname)
Class Variables (Classname.varname)
Module Variables (VARNAME)
Local Variables (varname)
The test:
import timeit

setup = '''
XGLOBAL = 5

class A:
    xclass = 5
    def __init__(self):
        self.xinstance = 5
    def f1(self):
        xlocal = 5
        x = self.xinstance
    def f2(self):
        xlocal = 5
        x = A.xclass
    def f3(self):
        xlocal = 5
        x = XGLOBAL
    def f4(self):
        xlocal = 5
        x = xlocal

a = A()
'''
print('access via instance variable: %.3f' % timeit.timeit('a.f1()', setup=setup, number=300000000))
print('access via class variable: %.3f' % timeit.timeit('a.f2()', setup=setup, number=300000000))
print('access via module variable: %.3f' % timeit.timeit('a.f3()', setup=setup, number=300000000))
print('access via local variable: %.3f' % timeit.timeit('a.f4()', setup=setup, number=300000000))
The result:
access via instance variable: 93.456
access via class variable: 82.169
access via module variable: 72.634
access via local variable: 72.199
Related
I'm coming from the Java world and reading Bruce Eckel's Python 3 Patterns, Recipes and Idioms.
While reading about classes, it goes on to say that in Python there is no need to declare instance variables. You just use them in the constructor, and boom, they are there.
So for example:
class Simple:
    def __init__(self, s):
        print("inside the simple constructor")
        self.s = s

    def show(self):
        print(self.s)

    def showMsg(self, msg):
        print(msg + ':', self.show())
If that’s true, then any object of class Simple can just change the value of variable s outside of the class.
For example:
if __name__ == "__main__":
    x = Simple("constructor argument")
    x.s = "test15"  # this changes the value
    x.show()
    x.showMsg("A message")
In Java, we have been taught about public/private/protected variables. Those keywords make sense because at times you want variables in a class that no one outside the class can access.
Why is that not required in Python?
It's cultural. In Python, you don't write to other classes' instance or class variables. In Java, nothing prevents you from doing the same if you really want to - after all, you can always edit the source of the class itself to achieve the same effect. Python drops that pretence of security and encourages programmers to be responsible. In practice, this works very nicely.
If you want to emulate private variables for some reason, you can always use the __ prefix from PEP 8. Python mangles the names of variables like __foo so that they're not easily visible to code outside the namespace that contains them (although you can get around it if you're determined enough, just like you can get around Java's protections if you work at it).
By the same convention, the _ prefix means _variable should be used internally in the class (or module) only, even if you're not technically prevented from accessing it from somewhere else. You don't play around with another class's variables that look like __foo or _bar.
Private variables in Python are more or less a hack: the interpreter intentionally renames the variable.
class A:
    def __init__(self):
        self.__var = 123

    def printVar(self):
        print(self.__var)
Now, if you try to access __var outside the class definition, it will fail:
>>> x = A()
>>> x.__var # this will return error: "A has no attribute __var"
>>> x.printVar() # this gives back 123
But you can easily get away with this:
>>> x.__dict__ # this will show everything that is contained in object x
# which in this case is something like {'_A__var' : 123}
>>> x._A__var = 456 # you now know the masked name of private variables
>>> x.printVar() # this gives back 456
You probably know that methods in OOP are invoked like this: x.printVar() => A.printVar(x). If A.printVar() can access some field in x, this field can also be accessed outside A.printVar()... After all, functions are created for reusability, and there isn't any special power given to the statements inside.
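As a rough illustration of that equivalence, here is a sketch with a hypothetical get_value method and an ordinary function defined outside the class. Neither name comes from the answer above:

```python
class A:
    def __init__(self):
        self.value = 123

    def get_value(self):
        return self.value


def get_value_outside(obj):
    # An ordinary function: it has exactly the same access as the method.
    return obj.value


x = A()
bound_call = x.get_value()     # usual method call
explicit_call = A.get_value(x) # same call with the instance passed explicitly
outside_call = get_value_outside(x)

print(bound_call, explicit_call, outside_call)
```

The method body has no privileges that the plain function lacks, which is why a language-level "private" would have to be enforced somewhere other than the function itself.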
As correctly mentioned by many of the comments above, let's not forget the main goal of Access Modifiers: To help users of code understand what is supposed to change and what is supposed not to. When you see a private field you don't mess around with it. So it's mostly syntactic sugar which is easily achieved in Python by the _ and __.
Python does not have any private variables like C++ or Java does. You can access any member variable at any time if you want to. However, you don't need private variables in Python, because in Python it is not bad to expose your classes' member variables. If you need to encapsulate a member variable, you can do this by using "@property" later on without breaking existing client code.
In Python, the single underscore "_" is used to indicate that a method or variable is not considered as part of the public API of a class and that this part of the API could change between different versions. You can use these methods and variables, but your code could break, if you use a newer version of this class.
The double underscore "__" does not mean a "private variable". You use it to define variables which are "class local" and which cannot be easily overridden by subclasses. It mangles the variable's name.
For example:
class A(object):
    def __init__(self):
        self.__foobar = None  # Will be automatically mangled to self._A__foobar

class B(A):
    def __init__(self):
        self.__foobar = 1  # Will be automatically mangled to self._B__foobar
self.__foobar's name is automatically mangled to self._A__foobar in class A. In class B it is mangled to self._B__foobar. So every subclass can define its own variable __foobar without overriding its parent's variable(s). Nothing prevents you from accessing variables beginning with double underscores, but name mangling prevents you from touching these variables/methods accidentally.
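To see both mangled names side by side, here is a small sketch. It assumes the subclass calls the parent initializer, which the example above does not do:

```python
class A:
    def __init__(self):
        self.__foobar = None      # stored as _A__foobar


class B(A):
    def __init__(self):
        super().__init__()        # assumption: run A's initializer as well
        self.__foobar = 1         # stored as _B__foobar, does not clobber A's


b = B()
names = sorted(vars(b))
print(names)
print(b._A__foobar, b._B__foobar)  # both remain reachable under their mangled names
```

The two attributes live in the same instance dictionary under different keys, so neither class overwrites the other's value.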
I strongly recommend you watch Raymond Hettinger's Python's class development toolkit from PyCon 2013, which gives a good example of why and how you should use @property and "__"-instance variables.
If you have exposed public variables and you need to encapsulate them, then you can use @property. Therefore you can start with the simplest solution possible. You can leave member variables public unless you have a concrete reason not to. Here is an example:
class Distance:
    def __init__(self, meter):
        self.meter = meter

d = Distance(1.0)
print(d.meter)
# prints 1.0
class Distance:
    def __init__(self, meter):
        # Customer request: Distances must be stored in millimeters.
        # Publicly available internals must be changed.
        # This would break client code in C++.
        # This is why you never expose public variables in C++ or Java.
        # However, this is Python.
        self.millimeter = meter * 1000

    # In Python we have @property to the rescue.
    @property
    def meter(self):
        return self.millimeter * 0.001

    @meter.setter
    def meter(self, value):
        self.millimeter = value * 1000

d = Distance(1.0)
print(d.meter)
# prints 1.0
There is a variation of private variables in the underscore convention.
In [5]: class Test(object):
   ...:     def __private_method(self):
   ...:         return "Boo"
   ...:     def public_method(self):
   ...:         return self.__private_method()
   ...:
In [6]: x = Test()
In [7]: x.public_method()
Out[7]: 'Boo'
In [8]: x.__private_method()
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-8-fa17ce05d8bc> in <module>()
----> 1 x.__private_method()
AttributeError: 'Test' object has no attribute '__private_method'
There are some subtle differences, but for the sake of programming pattern ideological purity, it's good enough.
There are examples out there of @private decorators that more closely implement the concept, but your mileage may vary. Arguably, one could also write a class definition that uses a metaclass.
As mentioned earlier, you can indicate that a variable or method is private by prefixing it with an underscore. If you don't feel like this is enough, you can always use the property decorator. Here's an example:
class Foo:
    def __init__(self, bar):
        self._bar = bar

    @property
    def bar(self):
        """Getter for '_bar'."""
        return self._bar
This way, someone or something that references bar is actually referencing the return value of the bar function rather than the variable itself, and therefore it can be accessed but not changed. However, if someone really wanted to, they could simply use _bar and assign a new value to it. There is no surefire way to prevent someone from accessing variables and methods that you wish to hide, as has been said repeatedly. However, using property is the clearest message you can send that a variable is not to be edited. property can also be used for more complex getter/setter/deleter access paths, as explained here: https://docs.python.org/3/library/functions.html#property
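A short sketch of what that looks like in use, repeating the Foo class so the snippet runs on its own. The variable names blocked and foo are only for illustration:

```python
class Foo:
    def __init__(self, bar):
        self._bar = bar

    @property
    def bar(self):
        """Getter for '_bar'."""
        return self._bar


foo = Foo(1)

blocked = False
try:
    foo.bar = 2                # no setter defined, so assignment is refused
except AttributeError:
    blocked = True

foo._bar = 2                   # the underscore name is still writable
print(blocked, foo.bar)
```

The property refuses assignment through the public name, while the underlying _bar stays writable, as described above.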
Python has limited support for private identifiers, through a feature that automatically prepends the class name to any identifiers starting with two underscores. This is transparent to the programmer, for the most part, but the net effect is that any variables named this way can be used as private variables.
See here for more on that.
In general, Python's implementation of object orientation is a bit primitive compared to other languages. But I enjoy this, actually. It's a very conceptually simple implementation and fits well with the dynamic style of the language.
The only time I ever use private variables is when I need to do other things when writing to or reading from the variable and as such I need to force the use of a setter and/or getter.
Again this goes to culture, as already stated. I've been working on projects where reading and writing other classes' variables was a free-for-all. When one implementation became deprecated, it took a lot longer to identify all code paths that used that function. When use of setters and getters was forced, a debug statement could easily be written to identify that the deprecated method had been called and the code path that called it.
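One way this could look, as a sketch only: the Engine class and its speed and rpm attributes are invented for illustration, and the standard warnings module stands in for the "debug statement":

```python
import warnings


class Engine:                      # hypothetical class, not from the answer
    def __init__(self):
        self._rpm = 0

    @property
    def rpm(self):
        return self._rpm

    @property
    def speed(self):
        # Deprecated accessor: every read goes through here, so it can report itself.
        warnings.warn(
            "Engine.speed is deprecated; use Engine.rpm",
            DeprecationWarning,
            stacklevel=2,          # point the warning at the caller's line
        )
        return self._rpm


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    value = Engine().speed

print(caught[0].category.__name__, caught[0].message)
```

Because the access is funnelled through one function, the caller's file and line show up in the warning, which is what makes the remaining code paths easy to find.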
When you are on a project where anyone can write an extension, notifying users about deprecated methods that are to disappear in a few releases hence is vital to keep module breakage at a minimum upon upgrades.
So my answer is: if you and your colleagues maintain a simple code set, then protecting class variables is not always necessary. If you are writing an extensible system, then it becomes imperative when changes are made to the core that need to be caught by all extensions using the code.
"In java, we have been taught about public/private/protected variables"
"Why is that not required in python?"
For the same reason, it's not required in Java.
You're free to use -- or not use -- private and protected.
As a Python and Java programmer, I've found that private and protected are very, very important design concepts. But as a practical matter, in tens of thousands of lines of Java and Python, I've never actually used private or protected.
Why not?
Here's my question: "protected from whom?"
Other programmers on my team? They have the source. What does protected mean when they can change it?
Other programmers on other teams? They work for the same company. They can -- with a phone call -- get the source.
Clients? It's work-for-hire programming (generally). The clients (generally) own the code.
So, who -- precisely -- am I protecting it from?
In Python 3, if you just want to "encapsulate" the class attributes, like in Java, you can just do the same thing like this:
class Simple:
    def __init__(self, str):
        print("inside the simple constructor")
        self.__s = str

    def show(self):
        print(self.__s)

    def showMsg(self, msg):
        print(msg + ':', self.show())
To instantiate this do:
ss = Simple("lol")
ss.show()
Note that: print(ss.__s) will throw an error.
In practice, Python 3 will obfuscate the attribute name. This makes it behave like a "private" attribute in Java. The attribute still exists on the object, but under a name that is not directly accessible, like a private attribute in other languages.
But don't be afraid of it. It doesn't matter. It does the job too. ;)
Private and protected concepts are very important. But Python is just a tool for prototyping and rapid development with restricted resources available for development, and that is why some of the protection levels are not so strictly followed in Python. You can use "__" in a class member. It works properly, but it does not look good enough, because each access to such a field contains these characters.
Also, you can notice that the Python OOP concept is not perfect. Smalltalk or Ruby are much closer to a pure OOP concept. Even C# or Java are closer.
Python is a very good tool. But it is a simplified OOP language, both syntactically and conceptually. The main goal of Python's existence is to give developers the possibility to write easily readable code with a high abstraction level very quickly.
Here's how I handle Python 3 class fields:
class MyClass:
    def __init__(self, public_read_variable, private_variable):
        self.public_read_variable_ = public_read_variable
        self.__private_variable = private_variable
I access the __private_variable with two underscores only inside MyClass methods.
I read the public_read_variable_ with one underscore outside the class, but never modify the variable:
my_class = MyClass("public", "private")
print(my_class.public_read_variable_) # OK
my_class.public_read_variable_ = 'another value' # NOT OK, don't do that.
So I’m new to Python but I have a background in C# and JavaScript. Python feels like a mix of the two in terms of features. JavaScript also struggles in this area, and the way around it there is to create a closure. This prevents access to data you don’t want to expose by returning a different object.
def print_msg(msg):
    # This is the outer enclosing function
    def printer():
        # This is the nested function
        print(msg)
    return printer  # returns the nested function

# Now let's try calling this function.
# Output: Hello
another = print_msg("Hello")
another()
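Going one step further, a closure can also hold state that changes. This is a minimal sketch with a hypothetical make_counter function, not taken from the linked pages:

```python
def make_counter():
    count = 0                      # lives only in the enclosing scope

    def increment():
        nonlocal count             # rebind the enclosed variable, not a new local
        count += 1
        return count

    return increment


counter = make_counter()
values = [counter(), counter()]
print(values)

# The state is hidden from normal attribute access, but not truly private:
hidden = counter.__closure__[0].cell_contents
print(hidden)
```

As with name mangling, the data is out of the way rather than locked away; the closure cell can still be inspected by anyone who goes looking.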
https://www.programiz.com/python-programming/closure
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Closures#emulating_private_methods_with_closures
About sources (to change the access rights and thus bypass language encapsulation like Java or C++):
You don't always have the sources and even if you do, the sources are managed by a system that only allows certain programmers to access a source (in a professional context). Often, every programmer is responsible for certain classes and therefore knows what he can and cannot do. The source manager also locks the sources being modified and of course, manages the access rights of programmers.
So, from experience, I trust software more than humans. Convention is good, but multiple protections are better, such as access management (real private variables) plus source management.
I have been thinking about private class attributes and methods (called members from here on) since I started to develop a package that I want to publish. The thought behind it was never to make it impossible to overwrite these members, but to have a warning for those who touch them. I came up with a few solutions that might help. The first solution is used in one of my favorite Python books, Fluent Python.
Upsides of technique 1:
It is unlikely to be overwritten by accident.
It is easily understood and implemented.
It's easier to handle than a leading double underscore for instance attributes.
*In the book the hash symbol was used, but you could use integers converted to strings as well. In Python it is forbidden to write klass.1, so such a name cannot be reached with dot notation.
class Technique1:
    def __init__(self, name, value):
        setattr(self, f'private#{name}', value)
        setattr(self, f'1{name}', value)
Downsides of technique 1:
Methods are not easily protected with this technique, although it is possible.
Attribute lookups are only possible via getattr.
There is still no warning to the user.
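For illustration, this is roughly what reading such an attribute looks like. The sketch keeps only the hash-symbol variant, and the instance t and its values are made up:

```python
class Technique1:
    def __init__(self, name, value):
        # The '#' makes the name unreachable with dot notation.
        setattr(self, f'private#{name}', value)


t = Technique1('x', 42)

looked_up = getattr(t, 'private#x')   # the only practical way to read it
stored_names = list(vars(t))          # the name is still visible in the instance dict

print(looked_up, stored_names)
```

Writing t.private#x would not even parse as an attribute access, since everything after the # is treated as a comment.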
Another solution I came across was to write __setattr__. Pros:
It is easily implemented and understood
It works with methods
Lookup is not affected
The user gets a warning or error
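class Demonstration:
    def __init__(self):
        self.a = 1

    def method(self):
        return None

    def __setattr__(self, name, value):
        # Allow the assignment only if the name is not already taken
        # (note: names holding falsy values such as 0 or None pass this check).
        if not getattr(self, name, None):
            super().__setattr__(name, value)
        else:
            raise ValueError(f'Already reserved name: {name}')

d = Demonstration()
# d.a = 2        # would raise ValueError: Already reserved name: a
d.method = None  # raises ValueError: Already reserved name: method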
Cons:
You can still overwrite the class
To have variables not just constants, you need to map allowed input.
Subclasses can still overwrite methods
To prevent subclasses from overwriting methods you can use __init_subclass__:
class Demonstration:
    __protected = ['method']

    def method(self):
        return None

    def __init_subclass__(cls):
        # Inside the class body, __protected is mangled to _Demonstration__protected.
        protected_methods = Demonstration.__protected
        for i in protected_methods:
            p = getattr(Demonstration, i)
            j = getattr(cls, i)
            if p is not j:
                raise ValueError(f'Protected method "{i}" was touched')
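To show when the check fires, here is a self-contained sketch. The subclasses Fine and Broken are hypothetical, and the hook is written slightly more defensively (accepting keyword arguments and calling super()) than the version above:

```python
class Demonstration:
    __protected = ['method']

    def method(self):
        return None

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        for name in Demonstration.__protected:
            # An overriding subclass has a different function object under this name.
            if getattr(cls, name) is not getattr(Demonstration, name):
                raise ValueError(f'Protected method "{name}" was touched')


class Fine(Demonstration):         # adds a method, leaves the protected one alone
    def other(self):
        return 1


error = None
try:
    class Broken(Demonstration):   # overrides the protected method
        def method(self):
            return 'overridden'
except ValueError as exc:
    error = str(exc)

print(error)
```

The error is raised while the class statement for Broken is being executed, so the offending subclass is never created.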
You see, there are ways to protect your class members, but there is no guarantee that users won't overwrite them anyway. This should just give you some ideas. In the end, you could also use a metaclass, but this might open up new dangers. The techniques used here are also very simple-minded, and you should definitely take a look at the documentation, where you can find useful features for these techniques and customize them to your needs.
Python is supposed to be fun, simple and easy to learn.
Instead, it's been a huge pain.
I've discovered that all the errors I'm getting are related to me not declaring each variable global in each function.
So for my toy program of dressUp, I have to write:
hatColor = ""
shirtColor = ""
pantsColor = ""
def pickWardrobe(hat, shirt, pants):
global hatColor
global shirtColor
global pantsColor
...
This gets really annoying when I have 20 functions, and each one needs to have 20 global declarations at the beginning.
Is there any way to avoid this?
Thanks!
ADDED
I am getting tons of UnboundLocalError - local variable X referenced before assignment.
Why am I doing this? Because I need to write a py file that can do some calculations for me. I don't want it all in the same function, or it gets messy and I can't reuse code. But if I split the work among a few functions, I have to declare these annoying globals over and over.
Classes versus global variables
A global is common to everything in the module.
A class is a template for an object representing something; here it could be a person dressed up somehow.
A class might have class properties. These are not so commonly used, as they are shared by all instances (a sort of "global for classes").
Classes start living as soon as you instantiate them, meaning the pattern defined by the class definition is realized in the form of a unique object.
Such an object, called an instance, might have its own properties, which are not shared with other instances.
I sometimes think of a class as a can: the class definition means "a can is something you can put things into", and an instance is a real, tangible can with a name of its own. In Python I put property values into it, and they are bound to the name of that particular can.
DressUp class with real instance properties
Properties in the "holmeswatson" solution are bound to the class definition. You would run into problems if you used multiple instances of DressUp, as they would share the properties through the class definition.
It is better and safer to use instance variables, which are bound through self to the instance of the class, not to the class definition.
Modified code:
class DressUp:
    def __init__(self, name, hatColor="", shirtColor=""):
        self.name = name
        self.hatColor = hatColor
        self.shirtColor = shirtColor

    def pickWardrobe(self, hat, shirt):
        self.hatColor = hat
        self.shirtColor = shirt

    def __repr__(self):
        name = self.name
        hatColor = self.hatColor
        shirtColor = self.shirtColor
        templ = "<Person:{name}: hat:{hatColor}, shirt:{shirtColor}>"
        return templ.format(name=name, hatColor=hatColor, shirtColor=shirtColor)

tom = DressUp("Tom")
tom.pickWardrobe("red", "yellow")
print("tom's hat is", tom.hatColor)
print("simple print:", tom)
print("__repr__ call:", tom.__repr__())

jane = DressUp("Jane")
jane.pickWardrobe("pink", "pink")
print("jane's hat is", jane.hatColor)
print("simple print:", jane)
print("__repr__ call:", jane.__repr__())
The __repr__ method is used when you call print(tom) or print(jane).
It is used here to show how an instance method can access instance properties.
Is there any way around it? Yes, there are several. If you're using global variables on a regular basis, you're making a mistake in your design.
One common pattern when you have many functions that will operate on the same, related data is to create a class and then declare instances of that class. Each instance has its own set of data and methods, and the methods within that instance can operate on the data within that instance.
This is called object-oriented programming; it is a common and basic paradigm in modern programming.
Several respondents have sketched out what a class might look like in your case but I don't think you've given enough information (which would include the method signatures of the other functions) to actually write out what you need. If you post more information you might get some better examples.
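In the meantime, here is a minimal sketch of where the UnboundLocalError comes from and one way around it without a class. All names (hat_color, read_only, broken, pick_hat) are hypothetical and not taken from the question's real program:

```python
hat_color = "red"


def read_only():
    return hat_color                 # only reading a module-level name: no `global` needed


def broken():
    hat_color = hat_color + "!"      # the assignment makes hat_color local to this function
    return hat_color


message = None
try:
    broken()
except UnboundLocalError as exc:
    message = str(exc)               # the local was read before it was assigned


def pick_hat(current, suffix):
    # Alternative: take the value as a parameter and return the new one.
    return current + suffix


hat_color = pick_hat(hat_color, "!")
print(message)
print(read_only())
```

Only functions that rebind a module-level name need the global declaration; functions that just read it, or that receive and return values, do not.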
If it is appropriate, you could use classes.
class DressUp:
    def __init__(self, name):
        self.name = name

    def pickWardrobe(self, hat, shirt, pants):
        self.hatColor = hat
        self.shirtColor = shirt
        self.pantsColor = pants

obj1 = DressUp("Tom")
obj1.pickWardrobe("red", "yellow", "blue")
print(obj1.hatColor)
Have a look:
http://www.tutorialspoint.com/python/python_classes_objects.htm
I tried this example code:
class testclass:
    classvar = 'its classvariable LITERAL'

    def __init__(self, x, y):
        self.z = x
        self.classvar = 'its initvariable LITERAL'
        self.test()

    def test(self):
        print('class var', testclass.classvar)
        print('instance var', self.classvar)

if __name__ == '__main__':
    x = testclass(2, 3)
I need some clarification. In both cases, I'm able to access the class attribute and the instance attribute in the test method.
So, suppose I have to define a literal that needs to be used across all functions. Which would be the better way to define it: an instance attribute or a class attribute?
I found this in an old presentation made by Guido van Rossum in 1999 ( http://legacy.python.org/doc/essays/ppt/acm-ws/sld001.htm ) and I think it explains the topic beautifully:
Instance variable rules
On use via instance (self.x), search order:
(1) instance, (2) class, (3) base classes
this also works for method lookup
On assignment via instance (self.x = ...):
always makes an instance variable
Class variables "default" for instance variables
But...!
mutable class variable: one copy shared by all
mutable instance variable: each instance its own
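The lookup and assignment rules above can be sketched as follows. This is only an illustration: the classes Base and Config and their attributes are hypothetical names, not taken from the presentation.

```python
class Base:
    level = "base"          # found last: base class


class Config(Base):
    mode = "class default"  # class attribute, acts as a default


c1 = Config()
c2 = Config()

# Lookup order: instance, then class, then base classes.
from_class = c1.mode        # not on the instance, so found on Config
from_base = c1.level        # not on the instance or Config, so found on Base

# Assignment via the instance always creates an instance attribute.
c1.mode = "instance override"

shadowed = c1.mode          # the instance attribute now wins
untouched = c2.mode         # other instances still see the class default
on_class = Config.mode      # the class attribute itself is unchanged
in_instance_dict = ("mode" in c1.__dict__, "mode" in c2.__dict__)
```

After the assignment, only c1 carries its own "mode" entry; c2 and Config keep the default, which is what "class variables default for instance variables" means in practice.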
Class variables are quite good for "constants" used by all the instances (that's all methods are technically). You could use module globals, but using a class variable makes it more clearly associated with the class.
There are often uses for class variables that you actually change, too, but it's usually best to stay away from them for the same reason you stay away from having different parts of your program communicate by altering global variables.
Instance variables are for data that is actually part of the instance. They could be different for each particular instance, and they often change over the lifetime of a single particular instance. It's best to use instance variables for data that is conceptually part of an instance, even if in your program you happen to only have one instance, or you have a few instances that in practice always have the same value.
It's good practice to only use class attributes if they are going to remain fixed, and one great thing about them is that they can be accessed outside of an instance:
class MyClass():
    var1 = 1
    def __init__(self):
        self.var2 = 2
MyClass.var1 # 1 (you can reference var1 without instantiating)
MyClass.var2 # AttributeError: class MyClass has no attribute 'var2'
If MyClass.var1 is defined as a class attribute, it should be the same in every instance of MyClass; otherwise you get the following behaviour, which is considered confusing.
a = MyClass()
b = MyClass()
a.var1, a.var2 # (1,2)
a.var1, a.var2 = (3,4) # you can change these variables
a.var1, a.var2 # (3,4)
b.var1, b.var2 # (1,2) # but they don't change in b
MyClass.var1 # 1 nor in MyClass
You should define it as a class attribute if you want it to be shared among all instances. You should define it as an instance variable if you want a separate one for each instance (e.g., if different instances might have different values for the variable).
I'm interested in hearing some discussion about class attributes in Python. For example, what is a good use case for class attributes? For the most part, I can not come up with a case where a class attribute is preferable to using a module level attribute. If this is true, then why have them around?
The problem I have with them is that it is almost too easy to clobber a class attribute value by mistake, and then your "global" value has turned into a local instance attribute.
Feel free to comment on how you would handle the following situations:
Constant values used by a class and/or sub-classes. This may include "magic number" dictionary keys or list indexes that will never change, but possibly need one-time initialization.
Default class attribute that on rare occasions is updated for a special instance of the class.
Global data structure used to represent an internal state of a class shared between all instances.
A class that initializes a number of default attributes, not influenced by constructor arguments.
Some Related Posts:
Difference Between Class and Instance Attributes
#4:
I never use class attributes to initialize default instance attributes (the ones you normally put in __init__). For example:
class Obj(object):
    def __init__(self):
        self.users = 0
and never:
class Obj(object):
    users = 0
Why? Because it's inconsistent: it doesn't do what you want when you assign anything but an immutable object:
class Obj(object):
    users = []
causes the users list to be shared across all objects, which in this case isn't wanted. It's confusing to split these into class attributes and assignments in __init__ depending on their type, so I always put them all in __init__, which I find clearer anyway.
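To make the pitfall concrete, here is a small sketch with two hypothetical classes, SharedUsers and OwnUsers (both names are invented for illustration). Mutating the class-level list through one instance is visible through every other instance, whereas a list created in __init__ belongs to a single instance.

```python
class SharedUsers(object):
    users = []                   # one list object, stored on the class

    def add(self, name):
        self.users.append(name)  # mutates the shared list in place


class OwnUsers(object):
    def __init__(self):
        self.users = []          # a fresh list per instance

    def add(self, name):
        self.users.append(name)


a, b = SharedUsers(), SharedUsers()
a.add("tom")
shared_seen_by_b = list(b.users)  # b sees the change made through a

c, d = OwnUsers(), OwnUsers()
c.add("jane")
own_seen_by_d = list(d.users)     # d is unaffected
```

Note that self.users.append(...) never assigns to self.users, so no instance attribute is created in SharedUsers; the lookup simply falls through to the class every time.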
As for the rest, I generally put class-specific values inside the class. This isn't so much because globals are "evil"--they're not so big a deal as in some languages, because they're still scoped to the module, unless the module itself is too big--but if external code wants to access them, it's handy to have all of the relevant values in one place. For example, in module.py:
class Obj(object):
    class Exception(Exception): pass
    ...
and then:
from module import Obj
try:
    o = Obj()
    o.go()
except o.Exception:
    print("error")
Aside from allowing subclasses to change the value (which isn't always wanted anyway), it means I don't have to laboriously import exception names and a bunch of other stuff needed to use Obj. "from module import Obj, ObjException, ..." gets tiresome quickly.
what is a good use case for class attributes
Case 0. Class methods are just class attributes. This is not just a technical similarity - you can access and modify class methods at runtime by assigning callables to them.
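As a rough illustration of Case 0, the sketch below uses a hypothetical class Greeter and a hypothetical replacement function shout. "Class methods" here means ordinary methods defined on a class, not functions decorated with @classmethod.

```python
class Greeter:
    def greet(self):
        return "hello"


g = Greeter()
before = g.greet()

# A method is an ordinary class attribute holding a function.
is_class_attribute = "greet" in Greeter.__dict__


def shout(self):
    return "HELLO"


# Rebinding the class attribute changes behaviour for existing instances too,
# because method lookup goes through the class at call time.
Greeter.greet = shout
after = g.greet()
```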
Case 1. A module can easily define several classes. It's reasonable to encapsulate everything about class A into A... and everything about class B into B.... For example,
# module xxx
class X:
    MAX_THREADS = 100
    ...
# main program
from xxx import X
if nthreads < X.MAX_THREADS: ...
Case 2. This class has lots of default attributes which can be modified in an instance. Here the ability to leave an attribute as a 'global default' is a feature, not a bug.
class NiceDiff:
    """Formats time difference given in seconds into a form '15 minutes ago'."""
    magic = .249
    pattern = 'in {0}', 'right now', '{0} ago'
    divisions = 1
    # there are more default attributes
One creates an instance of NiceDiff to use the existing or slightly modified formatting, but a localizer to a different language subclasses the class to implement some functions in a fundamentally different way and to redefine constants:
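class Разница(NiceDiff): # NiceDiff localized to Russian
    '''Из разницы во времени, типа -300, делает конкретно '5 минут назад'.'''
    # (Russian for: turns a time difference such as -300 into '5 minutes ago')
    pattern = 'через {0}', 'прям щас', '{0} назад' # 'in {0}', 'right now', '{0} ago'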
Your cases:
constants -- yes, I put them in the class. It would be strange to write self.CONSTANT = ..., so I don't see a big risk of clobbering them.
Default attribute -- mixed: as above, it may go in the class, but may also go in __init__, depending on the semantics.
Global data structure -- goes in the class if used only by the class, but may also go in the module; in either case it must be very well documented.
Class attributes are often used to allow overriding defaults in subclasses. For example, BaseHTTPRequestHandler has class constants sys_version and server_version, the latter defaulting to "BaseHTTP/" + __version__. SimpleHTTPRequestHandler overrides server_version to "SimpleHTTP/" + __version__.
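A short sketch of this override pattern, assuming Python 3, where both handler classes live in the http.server module. MyHandler and its version string are hypothetical, added only to show a user-defined override.

```python
from http.server import BaseHTTPRequestHandler, SimpleHTTPRequestHandler


class MyHandler(BaseHTTPRequestHandler):
    # Hypothetical subclass: override the class-level default.
    server_version = "MyServer/0.1"


base_default = BaseHTTPRequestHandler.server_version
simple_default = SimpleHTTPRequestHandler.server_version
custom = MyHandler.server_version

# Attributes that are not overridden are still inherited from the base class.
inherits_sys_version = MyHandler.sys_version == BaseHTTPRequestHandler.sys_version
```

No instance is needed to read or override these values, which is what makes class attributes convenient as subclass-level configuration.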
Encapsulation is a good principle: when an attribute is inside the class it pertains to instead of being in the global scope, this gives additional information to people reading the code.
In your situations 1-4, I would thus avoid globals as much as I can, and prefer using class attributes, which allow one to benefit from encapsulation.